Entry Wang:2012:IGD from talip.bib
Last update: Sun Oct 15 02:55:04 MDT 2017
BibTeX entry
@Article{Wang:2012:IGD,
author = "Kun Wang and Chengqing Zong and Keh-Yih Su",
title = "Integrating Generative and Discriminative
Character-Based Models for {Chinese} Word
Segmentation",
journal = j-TALIP,
volume = "11",
number = "2",
pages = "7:1--7:??",
month = jun,
year = "2012",
DOI = "https://doi.org/10.1145/2184436.2184440",
ISSN = "1530-0226 (print), 1558-3430 (electronic)",
ISSN-L = "1530-0226",
bibdate = "Tue Jun 12 11:20:16 MDT 2012",
bibsource = "http://portal.acm.org/;
http://www.math.utah.edu/pub/tex/bib/talip.bib",
abstract = "Among statistical approaches to Chinese word
segmentation, the word-based n-gram (generative)
model and the character-based tagging (discriminative)
model are two dominant approaches in the literature.
The former gives excellent performance for the
in-vocabulary (IV) words; however, it handles
out-of-vocabulary (OOV) words poorly. On the other
hand, though the latter is more robust for OOV words,
it fails to deliver satisfactory performance for IV
words. These two approaches behave differently due to
the unit they use (word vs. character) and the model
form they adopt (generative vs. discriminative). In
general, character-based approaches are more robust
than word-based ones, as the vocabulary of characters
is a closed set; and discriminative models are more
robust than generative ones, since they can flexibly
include all kinds of available information, such as
future context. This article first proposes a
character-based n-gram model to enhance the robustness
of the generative approach. Then the proposed
generative model is further integrated with the
character-based discriminative model to take advantage
of both approaches. Our experiments show that this
integrated approach outperforms all the existing
approaches reported in the literature. Afterwards, a
complete and detailed error analysis is conducted.
Since a significant portion of the critical errors is
related to numerical/foreign strings, character-type
information is then incorporated into the model to
further improve its performance. Last, the proposed
integrated approach is tested on cross-domain corpora,
and a semi-supervised domain adaptation algorithm is
proposed and shown to be effective in our
experiments.",
acknowledgement = ack-nhfb,
articleno = "7",
fjournal = "ACM Transactions on Asian Language Information
Processing (TALIP)",
journal-URL = "http://portal.acm.org/browse_dl.cfm?&idx=J820",
}
Related entries
- adaptation,
2(1)49,
3(2)94,
5(3)209,
8(1)2,
9(2)5,
10(2)7
- adopt,
7(3)10,
11(4)14,
13(3)13
- advantage,
8(2)7,
9(3)11,
12(3)12
- algorithm,
1(4)281,
5(2)165,
6(3)11,
6(4)2,
6(4)3,
7(1)3,
7(2)7,
7(3)8,
7(3)10,
7(4)12,
8(1)4,
8(3)12,
9(1)3,
9(1)4,
10(2)7,
10(2)9,
10(2)10,
10(4)19,
11(1)3,
11(2)4,
11(3)9,
11(3)11,
11(4)14,
12(1)1,
12(1)2,
12(2)6,
13(1)4,
13(3)13,
13(4)18
- all,
6(4)2,
7(1)1,
7(2)7,
7(3)8,
7(4)11,
7(4)12,
8(2)7,
8(4)16,
8(4)17,
9(2)5,
9(3)11,
9(3)12,
10(2)9,
10(3)15,
11(2)6,
11(4)18,
12(2)5,
12(3)9,
13(1)2,
13(2)6,
13(4)16
- analysis,
2(4)301,
3(2)94,
3(3)169,
4(3)263,
7(1)1,
7(1)3,
7(2)5,
7(3)9,
8(1)2,
8(3)12,
8(4)19,
9(2)6,
9(2)7,
9(3)11,
9(4)15,
10(1)4,
10(3)16,
10(4)20,
11(2)4,
11(4)16,
11(4)18,
12(2)6,
13(2)9,
13(3)11
- article,
3(4)227,
4(3)321,
5(2)121,
6(2)6,
6(2)7,
6(2)8,
6(4)3,
7(1)1,
7(1)3,
7(2)5,
7(2)6,
7(2)7,
7(3)8,
7(3)9,
7(4)11,
7(4)12,
7(4)13,
8(1)2,
8(1)3,
8(1)4,
8(2)6,
8(2)8,
8(2)9,
8(3)10,
8(3)11,
8(3)12,
8(4)14,
8(4)16,
8(4)17,
8(4)18,
9(1)2,
9(1)4,
9(2)6,
9(3)10,
9(3)11,
9(3)12,
9(4)13,
9(4)14,
10(1)3,
10(1)5,
10(1)6,
10(2)7,
10(2)9,
10(2)10,
10(3)12,
10(3)13,
10(3)14,
10(3)15,
10(4)17,
10(4)18,
10(4)20,
10(4)21,
11(1)1,
11(2)4,
11(2)5,
11(3)8,
11(3)10,
11(3)11,
11(4)13,
11(4)14,
11(4)15,
11(4)16,
11(4)17,
11(4)18,
12(1)1,
12(1)3,
12(1)4,
12(2)5,
12(2)6,
12(2)7,
12(3)9,
12(3)10,
12(3)11,
12(3)12,
12(4)14,
13(1)1,
13(1)2,
13(1)3,
13(1)4,
13(2)6,
13(2)7,
13(2)8,
13(2)9,
13(3)12,
13(3)13,
13(4)16
- available,
5(2)89,
8(1)3,
8(4)17,
9(2)5,
10(2)8,
11(1)1,
11(2)6,
11(4)18,
12(4)14,
13(1)1
- Based, Character-,
9(2)5
- based, character-,
1(4)297,
9(2)5
- based, word-,
1(3)173,
1(4)297,
6(3)9,
9(2)7,
9(3)11
- both,
6(2)6,
6(2)7,
6(3)10,
7(1)1,
7(1)2,
7(1)3,
7(3)8,
8(2)7,
8(3)11,
8(3)12,
9(1)2,
9(1)4,
9(2)5,
10(1)2,
10(1)4,
10(1)6,
10(2)10,
10(3)13,
10(3)15,
10(4)20,
10(4)21,
11(2)4,
11(2)6,
11(3)9,
11(4)18,
12(2)5,
12(2)7,
12(4)17,
13(2)9,
13(4)17
- character,
1(3)269,
2(1)27,
6(2)6,
6(2)8,
7(4)11,
8(2)9,
8(3)11,
9(4)14,
10(2)10,
12(1)1,
12(1)2,
12(2)6,
12(3)9,
12(4)16,
13(2)6,
13(2)8,
13(3)12,
13(3)14,
13(4)18
- Character-Based,
9(2)5
- character-based,
1(4)297,
9(2)5
- complete,
12(1)2,
12(1)3,
12(4)16
- conducted,
6(2)8,
7(4)12,
8(1)2,
9(2)7,
9(3)10,
9(3)12,
11(2)4,
13(1)3,
13(4)18
- context,
6(4)2,
8(4)15,
9(2)7,
9(4)14,
10(1)3,
10(4)18,
10(4)19,
11(2)6,
11(3)9,
12(1)2,
13(1)2
- corpora,
5(2)89,
5(2)121,
6(2)6,
6(3)11,
7(2)6,
9(2)5,
9(4)13,
10(3)15,
10(4)19,
10(4)21,
11(2)6,
11(3)11,
12(2)7,
13(1)1,
13(1)3,
13(2)9,
13(3)11
- critical,
8(2)9,
9(4)13,
11(4)17
- detailed,
6(4)3
- discriminative,
1(1)34,
3(2)128,
5(4)413,
8(2)8,
8(4)15,
10(4)17,
13(4)17
- domain,
5(2)165,
7(1)2,
7(4)11,
8(2)7,
8(2)8,
10(1)5,
12(1)3,
13(3)12
- dominant,
12(1)4,
13(3)14
- due,
5(2)121,
6(2)8,
7(2)7,
9(1)1,
9(1)2,
9(4)15,
10(2)10,
10(3)12,
10(4)21,
12(4)14,
13(1)4,
13(4)17
- effective,
4(2)78,
6(2)7,
6(3)11,
7(4)12,
7(4)13,
8(1)2,
8(3)10,
9(2)5,
9(3)12,
10(2)10,
10(3)14,
10(4)17,
10(4)18,
11(2)4,
11(4)18,
12(2)7,
12(4)14,
13(3)13,
13(4)16
- enhance,
6(2)7,
10(4)20,
13(2)8
- error,
4(1)18,
6(3)9,
7(1)2,
7(3)10,
9(1)2,
9(2)6,
10(1)2,
10(1)5,
10(1)6,
10(2)7,
10(2)10,
11(1)3,
11(4)18,
12(1)2,
13(2)8,
13(3)14
- existing,
6(2)8,
6(4)2,
8(1)2,
8(2)7,
9(1)3,
9(2)5,
9(4)13,
10(3)16,
11(2)4,
11(4)18,
12(2)5,
12(4)17,
13(1)2,
13(2)8
- experiment,
2(2)101,
2(2)143,
5(2)146,
5(2)165,
5(3)245,
6(2)7,
6(2)8,
7(1)1,
7(1)2,
7(1)3,
7(2)5,
7(4)11,
8(1)2,
8(2)6,
8(3)11,
8(4)17,
9(2)6,
9(2)7,
9(3)11,
9(3)12,
10(4)20,
11(2)4,
11(2)5,
11(3)10,
11(3)11,
11(4)15,
11(4)17,
12(1)2,
12(2)5,
12(2)7,
12(3)9,
12(3)11,
12(3)12,
12(4)16,
12(4)17,
13(1)2,
13(1)3,
13(2)6,
13(2)8,
13(3)13,
13(4)18
- fail,
9(1)1,
11(3)8,
13(2)8
- first,
5(2)165,
6(2)6,
6(4)3,
7(1)1,
7(3)8,
7(3)10,
8(2)7,
8(3)10,
8(3)11,
8(4)19,
9(3)10,
9(3)11,
10(3)13,
11(2)6,
11(3)8,
11(3)9,
11(3)11,
12(1)3,
12(1)4,
12(2)5,
12(3)10,
12(4)17,
13(2)7,
13(2)9,
13(3)13,
13(4)17
- form,
6(3)9,
7(1)3,
8(1)3,
8(4)18,
9(1)3,
9(3)12,
9(4)13,
10(2)8,
11(4)13,
12(1)4,
12(3)11
- former,
6(4)1
- further,
10(3)13,
10(4)18,
11(1)3,
12(1)3,
12(2)5,
12(4)15,
13(2)6,
13(4)18
- future,
8(4)14,
11(3)10
- general,
8(4)14,
9(3)12,
10(3)12,
11(2)6,
11(3)10,
12(2)6,
12(4)14,
12(4)15
- generative,
8(1)3,
12(3)9
- give,
7(4)11,
9(2)7
- gram,
3(2)113,
6(2)6,
9(2)5,
9(2)7,
9(3)11,
12(1)1,
12(4)15
- gram, n-,
8(1)4
- hand,
8(2)7
- handle,
8(4)18,
10(3)16
- however,
6(2)8,
7(2)7,
7(3)8,
7(3)10,
7(4)11,
7(4)12,
7(4)13,
8(3)10,
9(1)1,
9(3)12,
10(3)12,
10(4)18,
11(2)6,
12(1)2,
12(2)7,
12(3)10,
12(3)11,
12(4)14,
13(3)11,
13(4)17
- improve,
6(4)1,
6(4)3,
7(1)1,
7(1)2,
7(2)6,
7(3)8,
8(1)3,
8(2)9,
8(3)10,
8(4)17,
8(4)18,
9(4)13,
10(2)7,
10(3)13,
10(4)17,
11(3)8,
11(4)14,
11(4)17,
12(1)4,
12(3)12,
12(4)16,
13(2)7
- include,
7(3)8,
8(2)9,
9(4)15,
10(1)2,
10(4)19,
12(3)10,
13(2)7
- incorporated,
10(1)5,
11(3)10
- information,
1(1)65,
1(1)83,
1(4)281,
2(3)245,
2(3)295,
3(1)1,
3(4)227,
4(2)57,
4(2)78,
4(3)243,
4(3)357,
4(4)375,
4(4)475,
5(1)44,
5(2)89,
5(3)264,
5(4)291,
5(4)296,
5(4)323,
6(2)7,
6(4)2,
7(1)2,
7(1)3,
7(2)5,
7(3)8,
7(4)12,
8(1)2,
8(1)3,
8(3)10,
8(3)11,
8(4)15,
8(4)16,
8(4)17,
9(1)1,
9(2)7,
9(3)9,
9(3)10,
9(3)11,
9(3)12,
9(4)13,
9(4)14,
9(4)15,
10(2)8,
10(2)10,
10(3)15,
10(3)16,
10(4)19,
10(4)20,
10(4)21,
11(1)2,
11(2)6,
11(4)15,
11(4)18,
12(2)5,
12(3)11,
12(4)16,
13(2)7,
13(3)13
- integrated,
8(2)9,
8(4)15,
10(4)19
- integrating,
7(3)8,
10(3)13
- kind,
7(4)13,
8(1)2,
8(4)16,
10(3)13,
11(2)5
- last,
7(1)3,
8(4)17,
8(4)18,
11(1)1,
12(3)10,
13(1)2,
13(1)3,
13(3)14
- latter,
8(1)3
- literature,
5(1)22,
5(1)44,
13(1)4
- more,
5(2)146,
6(2)7,
6(3)10,
7(3)9,
7(4)13,
8(1)4,
8(2)7,
8(3)12,
8(4)14,
8(4)16,
9(1)2,
9(2)6,
9(3)11,
9(3)12,
10(1)4,
10(4)19,
11(2)4,
12(1)1,
12(1)2,
12(3)9,
13(1)1,
13(1)3,
13(1)4,
13(2)8,
13(4)18
- n,
6(2)6,
9(2)7,
12(1)1
- n-gram,
6(3)11,
8(1)4
- of-vocabulary, out-,
7(2)5,
10(3)16
- one,
5(2)89,
5(2)121,
6(2)6,
6(3)9,
6(4)3,
7(3)8,
7(3)9,
7(4)11,
7(4)13,
8(2)9,
8(4)16,
8(4)17,
9(1)1,
9(2)5,
9(2)7,
9(3)12,
9(4)14,
10(1)5,
10(3)12,
10(3)13,
10(4)19,
11(2)4,
11(2)6,
11(4)14,
12(1)1,
12(1)2,
12(2)5,
12(2)7,
12(3)11,
12(4)16,
13(1)4,
13(2)10,
13(4)17,
13(4)18
- OOV,
4(2)57,
7(2)5,
10(3)16
- other,
5(2)165,
7(2)6,
7(2)7,
7(3)8,
7(4)11,
8(1)2,
8(2)7,
8(4)14,
8(4)16,
8(4)17,
9(1)1,
10(2)7,
10(4)20,
11(2)4,
11(4)18,
12(1)1,
12(1)3,
12(3)11,
12(4)16,
13(1)3,
13(1)4,
13(2)7
- out-of-vocabulary,
7(2)5,
10(3)16
- outperform,
6(4)3,
7(2)6,
7(3)10,
7(4)13,
8(2)6,
9(2)5,
9(2)6,
10(3)15,
11(3)8,
11(4)14,
12(2)5,
13(3)13,
13(4)17
- performance,
5(2)121,
5(2)165,
6(2)8,
6(3)9,
6(4)1,
6(4)3,
7(1)1,
7(1)2,
7(2)5,
7(2)6,
7(2)7,
7(3)9,
7(3)10,
7(4)13,
8(1)2,
8(1)3,
8(2)7,
8(2)8,
8(2)9,
8(3)10,
8(4)16,
8(4)17,
8(4)18,
9(1)2,
9(1)4,
9(2)5,
9(2)6,
9(3)11,
9(3)12,
9(4)14,
10(2)8,
10(3)13,
10(3)14,
11(3)10,
11(3)11,
11(4)14,
11(4)15,
11(4)17,
12(1)2,
12(3)9,
12(3)11,
12(4)14,
12(4)15,
12(4)16,
13(1)3,
13(1)4,
13(2)7,
13(2)9,
13(4)16,
13(4)17
- portion,
7(4)13,
8(3)11
- propose,
5(2)89,
6(2)6,
6(2)8,
6(3)11,
7(3)8,
7(3)10,
7(4)12,
8(1)2,
8(1)4,
8(2)9,
8(4)19,
9(2)7,
9(4)13,
10(2)10,
10(3)12,
10(3)15,
10(4)17,
10(4)20,
11(3)9,
11(3)11,
11(4)15,
11(4)16,
11(4)18,
12(1)1,
12(1)2,
12(1)3,
12(1)4,
12(2)5,
12(2)6,
12(3)9,
12(3)10,
12(3)12,
12(4)16,
13(1)2,
13(1)3,
13(2)8,
13(2)9,
13(3)12,
13(3)13,
13(4)17,
13(4)18
- proposed,
5(2)121,
5(2)165,
6(2)7,
7(1)1,
7(1)2,
7(3)9,
7(3)10,
7(4)11,
7(4)13,
8(1)4,
8(2)6,
8(3)10,
8(3)11,
8(4)14,
8(4)19,
9(1)1,
9(2)5,
9(2)7,
10(2)7,
10(2)9,
10(3)14,
10(4)18,
11(1)3,
11(2)5,
11(2)6,
11(3)8,
11(3)9,
11(3)10,
11(3)11,
11(4)16,
11(4)17,
12(1)4,
12(2)5,
12(2)7,
12(3)12,
12(4)16,
12(4)17,
13(2)6,
13(2)8,
13(3)13,
13(4)18
- related,
5(1)22,
6(2)7,
6(3)10,
6(4)2,
7(2)6,
7(3)8,
10(2)10,
10(4)20,
11(2)5,
12(1)3
- reported,
7(4)13,
11(1)1,
11(1)2,
13(1)4,
13(2)6,
13(3)14
- robust,
5(2)89,
7(1)2,
7(3)10,
12(1)4,
12(2)5,
12(2)7,
13(3)12,
13(4)18
- robustness,
7(1)2,
7(3)10
- segmentation,
8(2)7,
8(4)16,
9(1)2,
9(1)3,
9(2)5,
9(4)15,
12(1)2,
12(1)4,
12(4)16,
13(2)9
- semi-supervised,
8(3)10,
12(2)7
- set,
1(3)269,
5(2)121,
6(1)z,
7(1)3,
7(3)8,
7(4)11,
7(4)13,
8(3)10,
8(3)12,
8(4)15,
9(1)1,
9(1)3,
9(2)5,
10(1)4,
10(2)8,
10(4)20,
11(2)5,
11(3)10,
11(3)11,
11(4)13,
11(4)14,
12(1)2,
12(1)4,
12(3)9,
13(2)8,
13(2)9,
13(3)12,
13(3)13,
13(4)17
- show,
5(2)89,
5(2)146,
7(1)1,
7(1)2,
7(1)3,
7(4)11,
7(4)12,
7(4)13,
8(1)4,
8(2)7,
8(2)9,
8(3)12,
8(4)16,
8(4)17,
9(1)1,
9(1)2,
9(1)3,
9(2)5,
9(2)6,
9(2)7,
9(3)11,
9(3)12,
9(4)14,
10(1)3,
10(3)15,
11(2)4,
11(2)5,
11(3)8,
11(3)11,
11(4)14,
11(4)15,
11(4)17,
11(4)18,
12(1)2,
12(1)4,
12(2)5,
12(2)7,
12(3)9,
12(3)10,
12(3)11,
12(4)15,
12(4)16,
13(1)3,
13(2)6,
13(2)7,
13(2)9,
13(3)14
- shown,
5(2)146,
8(4)18,
10(2)8,
13(3)12
- significant,
5(2)121,
7(1)3,
8(1)4,
8(4)15,
8(4)16,
8(4)17,
8(4)18,
9(2)5,
9(3)11,
10(1)5,
10(2)8,
10(3)14,
11(2)6,
12(1)1,
12(4)16,
13(1)3,
13(3)14,
13(4)16
- since,
5(2)89,
5(2)165,
8(2)9,
8(4)16,
8(4)18,
10(1)4,
10(1)5,
10(4)19,
10(4)21,
11(4)18,
12(1)2,
12(4)16,
13(4)17
- statistical,
1(1)3,
3(2)87,
3(4)243,
5(2)121,
5(4)323,
5(4)360,
6(1)z-4,
7(1)1,
8(1)2,
8(1)4,
8(2)6,
8(2)7,
8(2)8,
8(2)9,
8(3)10,
8(4)15,
8(4)19,
9(2)6,
9(2)7,
9(3)11,
10(4)18,
11(2)6,
11(3)8,
11(4)15,
12(1)1,
12(3)12,
12(4)14,
12(4)16,
12(4)17,
13(1)2,
13(1)3,
13(1)4,
13(4)17
- such,
7(2)7,
7(3)8,
7(3)10,
7(4)12,
8(2)8,
8(3)10,
8(3)11,
8(3)12,
8(4)14,
8(4)16,
8(4)17,
9(1)1,
9(3)12,
9(4)13,
9(4)15,
10(1)5,
10(2)8,
10(3)12,
10(4)21,
11(1)2,
11(2)5,
11(3)8,
11(3)10,
11(3)11,
11(4)13,
11(4)16,
11(4)17,
11(4)18,
12(1)1,
12(1)2,
12(2)6,
12(3)10,
12(3)11,
12(4)14,
12(4)17,
13(1)1,
13(3)12,
13(4)17
- supervised, semi-,
8(3)10,
12(2)7
- tagging,
1(2)145,
3(1)51,
9(2)5,
9(4)15,
10(1)4,
13(1)1
- take,
8(2)8,
10(1)4,
10(4)20
- tested,
6(3)11,
9(1)3,
12(1)4,
13(2)6,
13(4)16
- than,
5(2)146,
6(3)10,
7(3)8,
7(4)11,
7(4)13,
8(2)8,
8(4)16,
9(1)2,
9(2)7,
9(3)11,
9(3)12,
10(1)3,
10(1)4,
10(3)14,
11(2)4,
11(3)8,
11(3)9,
11(4)13,
11(4)15,
12(2)7,
12(3)10,
12(4)16,
13(1)1,
13(4)17
- then,
5(2)121,
6(2)6,
7(1)1,
7(3)10,
7(4)12,
8(1)4,
8(2)7,
8(3)10,
8(3)11,
8(3)12,
8(4)14,
9(1)1,
9(2)7,
9(3)11,
10(2)7,
10(3)13,
10(4)20,
11(1)3,
11(3)11,
11(4)15,
12(1)3,
12(3)10,
12(4)17,
13(1)4,
13(2)9,
13(3)13,
13(4)16
- though,
6(4)3,
9(2)5,
10(4)21
- two,
5(2)89,
7(2)7,
7(3)8,
7(4)11,
7(4)12,
7(4)13,
8(1)4,
8(2)7,
8(4)17,
9(1)2,
9(3)11,
9(4)13,
10(1)2,
10(3)12,
10(3)14,
10(3)15,
10(4)20,
11(2)4,
11(2)5,
11(3)8,
11(3)9,
11(3)11,
11(4)17,
12(1)1,
12(1)2,
12(1)4,
12(2)5,
12(3)10,
12(3)11,
12(4)16,
13(1)1,
13(1)3,
13(1)4,
13(2)6,
13(2)9,
13(3)11,
13(4)17
- unit,
6(3)9,
9(3)12,
10(1)6,
11(2)5,
11(4)16,
11(4)18,
13(2)9
- use,
4(2)159,
5(2)89,
5(2)146,
6(2)8,
6(3)11,
7(2)6,
7(3)9,
7(4)11,
7(4)12,
8(1)3,
8(2)9,
8(3)10,
8(3)11,
9(1)1,
9(1)3,
9(3)11,
10(1)3,
10(1)4,
11(1)1,
11(2)6,
11(3)8,
11(3)10,
11(4)14,
11(4)18,
12(1)1,
12(2)6,
12(3)9,
12(3)10,
13(2)6,
13(2)9,
13(2)10,
13(3)12
- vocabulary,
6(3)9,
8(1)2,
9(3)12,
12(4)14
- vocabulary, out-of-,
7(2)5,
10(3)16
- word-based,
1(3)173,
1(4)297,
6(3)9,
9(2)7,
9(3)11
- Zong, Chengqing,
7(1)1