Entry Zhao:2010:UCB from talip.bib

Last update: Sun Oct 15 02:55:04 MDT 2017

BibTeX entry

@Article{Zhao:2010:UCB,
  author =       "Hai Zhao and Chang-Ning Huang and Mu Li and Bao-Liang
                 Lu",
  title =        "A Unified Character-Based Tagging Framework for
                 {Chinese} Word Segmentation",
  journal =      j-TALIP,
  volume =       "9",
  number =       "2",
  pages =        "5:1--5:??",
  month =        jun,
  year =         "2010",
  CODEN =        "????",
  DOI =          "https://doi.org/10.1145/1781134.1781135",
  ISSN =         "1530-0226 (print), 1558-3430 (electronic)",
  ISSN-L =       "1530-0226",
  bibdate =      "Mon Jun 21 18:03:02 MDT 2010",
  bibsource =    "http://portal.acm.org/;
                 http://www.math.utah.edu/pub/tex/bib/talip.bib",
  abstract =     "Chinese word segmentation is an active area in Chinese
                 language processing though it is suffering from the
                 argument about what precisely is a word in Chinese.
                 Based on corpus-based segmentation standard, we
                 launched this study. In detail, we regard Chinese word
                 segmentation as a character-based tagging problem. We
                 show that there has been a potent trend of using a
                 character-based tagging approach in this field. In
                 particular, learning from segmented corpus with or
                 without additional linguistic resources is treated in a
                 unified way in which the only difference depends on how
                 the feature template set is selected. It differs from
                 existing work in that both feature template selection
                 and tag set selection are considered in our approach,
                 instead of the previous feature template focus only
                 technique. We show that there is a significant
                 performance difference as different tag sets are
                 selected. This is especially applied to a six-tag set,
                 which is good enough for most current segmented
                 corpora. The linguistic meaning of a tag set is also
                 discussed. Our results show that a simple learning
                 system with six $n$-gram feature templates and a
                 six-tag set can obtain competitive performance in the
                 cases of learning only from a training corpus. In cases
                 when additional linguistic resources are available, an
                 ensemble learning technique, assistant segmenter, is
                 proposed and its effectiveness is verified. Assistant
                 segmenter is also proven to be an effective method as
                 segmentation standard adaptation that outperforms
                 existing ones. Based on the proposed approach, our
                 system provides state-of-the-art performance in all 12
                 corpora of three international Chinese word
                 segmentation bakeoffs.",
  acknowledgement = ack-nhfb,
  articleno =    "5",
  fjournal =     "ACM Transactions on Asian Language Information
                 Processing",
  journal-URL =  "http://portal.acm.org/browse_dl.cfm?&idx=J820",
  keywords =     "assistant segmenter; character-based tagging method;
                 Chinese word segmentation; conditional random field;
                 tag set selection",
}
