All words in every title are candidates for the title word index, with these transformations and reductions:
Non-English letters, such as Scandinavian æ, ø, and å, and French œ, are sorted as if they were replaced by ae, o, a, and oe, respectively. [More precisely, TeX control sequences for accents, and for these letters, are reduced by dropping the non-letters when forming the sorting key.]
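The reduction described in the bracketed note can be sketched in a few lines of Python. This is only an illustration, not the indexing software's own code: the function name sort_key is hypothetical, and the final lowercasing step is an assumption added so that keys compare without regard to case.

```python
import re

def sort_key(word: str) -> str:
    """Form a sorting key by dropping every non-letter character.

    Braces, backslashes, and accent marks in TeX markup disappear,
    so an accented letter sorts as its unaccented base letter.
    Lowercasing is an assumption, not something the text specifies.
    """
    return re.sub(r"[^A-Za-z]", "", word).lower()

print(sort_key(r'Schr{\"o}dinger'))  # schrodinger
print(sort_key(r"S{\o}rensen"))      # sorensen
print(sort_key(r"{\oe}uvre"))        # oeuvre
```

Only the key is reduced in this sketch; the word itself would still be printed in the index with its original markup.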
a about above after also am among an and are as at be before beside between but by can do for from go he her hers him his i if in into is it its me my no of on or our out over she so some that the their them these they this those to under up us we with within without you your
Words in languages other than English are not considered for membership in this list, even though titles in at least several Western European languages may be present in the bibliographic data.
Lowercasing such words would require the software to distinguish between a valid lowercasing of Transitive to transitive, and an invalid lowercasing of Weierstrass to weierstrass. This is an impossible task for a computer, because it requires context-sensitive analysis, and human understanding of the text and subject area, to handle ambiguous cases: consider Green functions and green beans!
In some cases, this processing can lead to incorrect lowercasing, such as a capitalized German noun Software being indexed under the English software, but this minor error is acceptable, because it will never prevent a human from finding the word in the index.
In the indexing software used here, a simpler approach is taken. A plural form is reduced to a singular form by stripping a final s or es, reducing a final ies to y, or reducing an ices ending to ex or ix. However, the resulting word is rejected unless it contains only letters and is already present in the list of words to be indexed. That list is thus treated as a dictionary.
If only a plural form is found, then that form is indexed.
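The reduction rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual indexing software: the function names are hypothetical, and the order in which the candidate singulars are tried (s, es, ies, ices) is an assumption, since the text does not say which rule takes precedence.

```python
def singular_candidates(word: str) -> list[str]:
    """List possible singular forms, in an assumed order of preference."""
    cands = []
    if word.endswith("s"):
        cands.append(word[:-1])              # cubes -> cube
    if word.endswith("es"):
        cands.append(word[:-2])              # boxes -> box
    if word.endswith("ies"):
        cands.append(word[:-3] + "y")        # libraries -> library
    if word.endswith("ices"):
        cands.append(word[:-4] + "ex")       # indices -> index
        cands.append(word[:-4] + "ix")       # matrices -> matrix
    return cands

def reduce_plural(word: str, index_words: set[str]) -> str:
    """Return the first candidate that is all letters and already indexed.

    The set of index words acts as the dictionary; if no candidate
    qualifies, the plural form itself is kept.
    """
    for cand in singular_candidates(word):
        if cand.isalpha() and cand in index_words:
            return cand
    return word

words = {"cube", "box", "library", "matrix", "index"}
for plural in ["cubes", "boxes", "libraries", "matrices", "analysis"]:
    print(plural, "->", reduce_plural(plural, words))
```

With this word set, the first four plurals reduce to cube, box, library, and matrix, while analysis is left alone because no candidate appears in the set. If the set contained cub but not cube, cubes would reduce to cub.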
This algorithm can produce false reductions and ambiguities, such as cubes to cube or cub, but doing a better job would require a more sophisticated algorithm for plural-to-singular conversion, and an exception dictionary. Furthermore, even that algorithm would fail completely when confronted with a non-English word, or a highly technical word that is absent from English dictionaries, both of which are very likely to occur in scientific bibliography data.
The indexing software therefore takes a conservative approach: it permits the user to supply a supplemental dictionary containing singular words, and one or more plural forms for each of them (e.g., index indexes indices, and symposium symposia symposiums). This dictionary need not be a comprehensive list for the English language, but only for the few hundred plurals that might occur in the journal index. Such a list can be constructed by filtering the index word list to extract all of those with plural endings, and then manually augmenting them with corresponding singular forms. To avoid errors, the resulting list should itself be checked with spelling programs, such as UNIX spell or GNU ispell.
Any candidate word that is found in the list of singular forms from the supplemental dictionary will not be stripped of plural suffixes, so that, e.g., news can be prevented from reducing to new, even if the latter word occurs elsewhere in the index.
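A minimal Python sketch of how such a supplemental dictionary might be consulted is shown below. The file layout is an assumption: one entry per line, with the singular form first and its plurals after it, separated by whitespace, and a line holding a single word (such as news) taken to mean a singular that must never be reduced. All function names are hypothetical.

```python
def load_plural_dictionary(lines):
    """Parse entries of the assumed form: singular plural1 plural2 ..."""
    plural_to_singular = {}
    singulars = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        singular, *plurals = fields
        singulars.add(singular)
        for plural in plurals:
            plural_to_singular[plural] = singular
    return plural_to_singular, singulars

def reduce_word(word, plural_to_singular, singulars, fallback=lambda w: w):
    """Apply the dictionary first, then a fallback suffix-stripping rule."""
    if word in singulars:
        return word                       # protected: never strip a suffix
    if word in plural_to_singular:
        return plural_to_singular[word]   # explicit plural entry
    return fallback(word)

entries = ["index indexes indices", "symposium symposia symposiums", "news"]
plurals, singulars = load_plural_dictionary(entries)

def strip_s(word):
    """Deliberately naive stand-in for the suffix-stripping step."""
    return word[:-1] if word.endswith("s") else word

for w in ["indices", "symposia", "news", "cubes"]:
    print(w, "->", reduce_word(w, plurals, singulars, strip_s))
```

Here indices and symposia map to index and symposium through the dictionary, news is left untouched because it is listed as a singular, and cubes falls through to the naive fallback and becomes cube.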
To eliminate remaining unwanted final periods, it is sufficient to make suitable entries in the plural dictionary manually, then regenerate the index.
While this procedure can produce a few sorting irregularities, the order is readily discernible to anyone with even limited exposure to TeX mathematics markup, which generally labels mathematical symbols by their English names. In any event, the number of index entries with mathematical material is usually fairly small, so a human reader can easily do a linear scan through that section.