Entry Leveling:2010:SWI from talip.bib

Last update: Sun Oct 15 02:55:04 MDT 2017

BibTeX entry

@Article{Leveling:2010:SWI,
  author =       "Johannes Leveling and Gareth J. F. Jones",
  title =        "Sub-Word Indexing and Blind Relevance Feedback for
                 {English}, {Bengali}, {Hindi}, and {Marathi} {IR}",
  journal =      j-TALIP,
  volume =       "9",
  number =       "3",
  pages =        "12:1--12:??",
  month =        sep,
  year =         "2010",
  CODEN =        "????",
  DOI =          "https://doi.org/10.1145/1838745.1838749",
  ISSN =         "1530-0226 (print), 1558-3430 (electronic)",
  ISSN-L =       "1530-0226",
  bibdate =      "Sat Sep 18 15:58:58 MDT 2010",
  bibsource =    "http://portal.acm.org/;
                 http://www.math.utah.edu/pub/tex/bib/talip.bib",
  abstract =     "The Forum for Information Retrieval Evaluation (FIRE)
                 provides document collections, topics, and relevance
                 assessments for information retrieval (IR) experiments
                 on Indian languages. Several research questions are
                 explored in this article: (1) How to create a simple,
                 language-independent corpus-based stemmer, (2) How to
                 identify sub-words and which types of sub-words are
                 suitable as indexing units, and (3) How to apply blind
                 relevance feedback on sub-words and how feedback term
                 selection is affected by the type of the indexing unit.
                 More than 140 IR experiments are conducted using the
                 BM25 retrieval model on the topic titles and
                 descriptions (TD) for the FIRE 2008 English, Bengali,
                 Hindi, and Marathi document collections.\par

                 The major findings are: The corpus-based stemming
                 approach is effective as a knowledge-light term
                 conflation step and useful in the case of few
                 language-specific resources. For English, the
                 corpus-based stemmer performs nearly as well as the
                 Porter stemmer and significantly better than the
                 baseline of indexing words when combined with query
                 expansion. In combination with blind relevance
                 feedback, it also performs significantly better than
                 the baseline for Bengali and Marathi IR.\par

                 Sub-words such as consonant-vowel sequences and word
                 prefixes can yield similar or better performance in
                 comparison to word indexing. There is no best
                 performing method for all languages. For English,
                 indexing using the Porter stemmer performs best; for
                 Bengali and Marathi, overlapping 3-grams obtain the
                 best result; and for Hindi, 4-prefixes yield the
                 highest MAP. However, in combination with blind
                 relevance feedback using 10 documents and 20 terms,
                 6-prefixes for English and 4-prefixes for Bengali,
                 Hindi, and Marathi IR yield the highest
                 MAP.\par

                 Sub-word identification is a general case of
                 decompounding. It results in one or more index terms
                 for a single word form and increases the number of
                 index terms but decreases their average length. The
                 corresponding retrieval experiments show that relevance
                 feedback on sub-words benefits from selecting a larger
                 number of index terms in comparison with retrieval on
                 word forms. Similarly, selecting the number of
                 relevance feedback terms depending on the ratio of word
                 vocabulary size to sub-word vocabulary size almost
                 always slightly increases information retrieval
                 effectiveness compared to using a fixed number of terms
                 for different languages.",
  acknowledgement = ack-nhfb,
  articleno =    "12",
  fjournal =     "ACM Transactions on Asian Language Information
                 Processing",
  journal-URL =  "http://portal.acm.org/browse_dl.cfm?&idx=J820",
  keywords =     "blind relevance feedback; evaluation; FIRE;
                 Information retrieval; stemming; sub-word indexing",
}
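
The abstract names two sub-word indexing units, overlapping character n-grams and fixed-length word prefixes. The following is a minimal sketch of how such units could be derived from text. It is not the authors' implementation: the function names, the whitespace tokenization, the lowercasing, and the treatment of words shorter than the unit size are all assumptions made for illustration. It also slices on Unicode code points, which may not match the character unit the article uses for Bengali, Hindi, and Marathi.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Overlapping character n-grams of a word.

    Assumption: a word no longer than n is kept whole as a single term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def k_prefix(word: str, k: int) -> str:
    """First k characters of a word (the whole word if it is shorter)."""
    return word[:k]


def subword_terms(text: str, unit: str, size: int) -> list[str]:
    """Turn text into index terms using the chosen sub-word unit.

    unit is "ngram" or "prefix" (hypothetical labels for this sketch).
    Tokenization is plain whitespace splitting after lowercasing.
    """
    terms: list[str] = []
    for word in text.lower().split():
        if unit == "ngram":
            terms.extend(char_ngrams(word, size))
        elif unit == "prefix":
            terms.append(k_prefix(word, size))
        else:
            raise ValueError(f"unknown sub-word unit: {unit!r}")
    return terms
```

With the n-gram unit a single word form yields several shorter index terms, which matches the abstract's remark that sub-word identification increases the number of index terms while decreasing their average length. The prefix unit instead maps each word to exactly one truncated term.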
