Entry Jiang:2013:LRC from talip.bib

Last update: Sun Oct 15 02:55:04 MDT 2017

BibTeX entry

@Article{Jiang:2013:LRC,
  author =       "Mike Tian-Jian Jiang and Tsung-Hsien Lee and Wen-Lian
                 Hsu",
  title =        "The Left and Right Context of a Word: Overlapping
                 {Chinese} Syllable Word Segmentation with Minimal
                 Context",
  journal =      j-TALIP,
  volume =       "12",
  number =       "1",
  pages =        "2:1--2:??",
  month =        mar,
  year =         "2013",
  CODEN =        "????",
  DOI =          "https://doi.org/10.1145/2425327.2425329",
  ISSN =         "1530-0226 (print), 1558-3430 (electronic)",
  ISSN-L =       "1530-0226",
  bibdate =      "Sat Mar 2 09:25:42 MST 2013",
  bibsource =    "http://portal.acm.org/;
                 http://www.math.utah.edu/pub/tex/bib/talip.bib",
  abstract =     "Since a Chinese syllable can correspond to many
                 characters (homophones), the syllable-to-character
                 conversion task is quite challenging for Chinese
                 phonetic input methods (CPIM). There are usually two
                 stages in a CPIM: 1. segment the syllable sequence into
                 syllable words, and 2. select the most likely character
                 words for each syllable word. A CPIM usually assumes
                 that the input is a complete sentence, and evaluates
                 the performance based on a well-formed corpus. However,
                 in practice, most Pinyin users prefer progressive text
                 entry in several short chunks, mainly in one or two
                 words each (most Chinese words consist of two or more
                 characters). Short chunks do not provide enough
                 contexts to perform the best possible
                 syllable-to-character conversion, especially when a
                 chunk consists of overlapping syllable words. In such
                 cases, a conversion system often selects the boundary
                 of a word with the highest frequency. Short chunk input
                 is even more popular on platforms with limited
                 computing power, such as mobile phones. Based on the
                 observation that the relative strength of a word can be
                 quite different when calculated leftwards or
                 rightwards, we propose a simple division of the word
                 context into the left context and the right context.
                 Furthermore, we design a double ranking strategy for
                 each word to reduce the number of errors in Step 1. Our
                 strategy is modeled as the minimum feedback arc set
                 problem on bipartite tournament with approximate
                 solutions derived from genetic algorithm. Experiments
                 show that, compared to the frequency-based method (FBM)
                 (low memory and fast) and the conditional random fields
                 (CRF) model (larger memory and slower), our double
                 ranking strategy has the benefits of less memory and
                 low power requirement with competitive performance. We
                 believe a similar strategy could also be adopted to
                 disambiguate conflicting linguistic patterns
                 effectively.",
  acknowledgement = ack-nhfb,
  articleno =    "2",
  fjournal =     "ACM Transactions on Asian Language Information
                 Processing",
  journal-URL =  "http://portal.acm.org/browse_dl.cfm?&idx=J820",
}
