CITESUB 1 "16 October 1996" "Version 0.02"



citesub - substitute standardized BibTeX citation labels


citesub [ -f substitution-file ] [ -v ] [ file(s) ] >outfile


citesub applies citation label substitutions to input BibTeX, LaTeX, and TeX files. The substitutions will usually be generated automatically by the companion biblabel program (see below), which can be used to standardize the form of BibTeX citation labels to the conventions adopted for the BibNet Project.

Most existing BibTeX bibliography files have haphazardly chosen, unsystematic citation labels that are very likely to conflict with labels in other bibliography files; biblabel and citesub provide an automatic way to rectify this.

To avoid confusion between labels with common prefixes, such as Smith80 and Smith80a, citesub checks for leading context of a left brace, quote, comma, whitespace, or beginning of line and trailing context of a right brace, comma, quote, percent, whitespace, or end of line so as to match these styles:


crossref = "Smith:1980:ABC",

crossref = {Smith:1980:ABC},
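The context check described above can be illustrated with a short Python sketch. It is not citesub's own code: the regular expression and the function name replace_label are assumptions made for illustration, built only from the leading and trailing context characters listed in the text.

```python
import re

# Context classes copied from the description above. Expressing them as
# lookaround assertions is an assumption of this sketch, not citesub's method.
LEADING = r'(?:^|(?<=[{",\s]))'    # left brace, quote, comma, whitespace, or line start
TRAILING = r'(?=[}",%\s]|$)'       # right brace, comma, quote, percent, whitespace, or line end


def replace_label(line, old, new):
    """Replace `old` by `new` only where `old` stands alone as a label."""
    pattern = LEADING + re.escape(old) + TRAILING
    # A function replacement avoids any backslash interpretation in `new`.
    return re.sub(pattern, lambda match: new, line, flags=re.MULTILINE)


print(replace_label('crossref = {Smith80},', 'Smith80', 'Smith:1980:ABC'))
print(replace_label('crossref = {Smith80a},', 'Smith80', 'Smith:1980:ABC'))
```

The first call rewrites the label, while the second leaves Smith80a untouched, because the character following Smith80 there is not one of the permitted trailing contexts.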




Although one might expect that simple application of standard software tools like the UNIX awk(1) and sed(1) utilities could do the string substitution job, this is not the case. For one thing, the required context sensitivity complicates the regular-expression patterns that are needed. For another, most UNIX sed(1) implementations have a built-in limit of about 100 substitutions, which is far too few for typical bibliographies. Finally, simple application of awk(1) and sed(1) involves matching every input line against every substitution pattern, so the run time grows with the product of the number of lines and the number of substitutions; this quadratic behavior proves impractically slow for large bibliographies.

citesub provides an efficient solution whose run time is essentially proportional to the size of the input files, and independent of the number of substitutions to be carried out. This is achieved by tokenizing the input lines, and then looking up each token in a constant-access time (hash) table of substitutions. An initial prototype programmed in the awk language led to a final version in C that ran about 50 times faster, processing about 4000 input lines/sec on an entry-level Sun SPARCstation LX workstation.
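The tokenize-and-look-up idea can be sketched in a few lines of Python. This is an illustration only, not the C implementation: the delimiter set, the name substitute_line, and the treatment of percent as a delimiter on both sides of a token are simplifying assumptions.

```python
import re

# Split on runs of delimiter characters, keeping the delimiters so that the
# line can be reassembled exactly. The delimiter set is an assumption based
# on the context characters named in this manual page.
TOKEN_SPLIT = re.compile(r'([{}",%\s]+)')


def substitute_line(line, table):
    """Replace every token found in `table`; leave all other text alone."""
    parts = TOKEN_SPLIT.split(line)
    # Each token costs one dictionary (hash table) lookup, so the work per
    # line does not depend on how many substitutions the table holds.
    return "".join(table.get(part, part) for part in parts)


table = {"Smith80": "Smith:1980:ABC", "Smith80a": "Smith:1980:DEF"}
print(substitute_line('crossref = {Smith80a},', table))
```

Because whole tokens are looked up, Smith80 and Smith80a are distinct keys and cannot be confused, and adding more entries to the table does not slow down the scan of the input.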


Except for the options described below, all command-line words are assumed to be input files. Should a filename begin with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g., /tmp/-foo.bib or ./-foo.bib.
-f substitution-file
This option specifies the name of a file containing pairs of old and new citation labels, one pair per line, surrounded by arbitrary amounts of whitespace. This file is most easily generated by the companion program biblabel(1).
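As an illustration of the file format just described, the following Python sketch reads old/new label pairs into a lookup table. The function name load_substitutions is hypothetical, and the handling of blank and malformed lines is an assumption; this page does not say how citesub itself treats them.

```python
def load_substitutions(lines):
    """Build an old-label -> new-label table from an iterable of text lines."""
    table = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()          # discards surrounding whitespace
        if not fields:
            continue                   # assumption: blank lines are ignored
        if len(fields) != 2:
            # Assumption: anything other than exactly one pair is an error.
            raise ValueError("line %d: expected an old/new label pair" % lineno)
        old, new = fields
        table[old] = new
    return table


sample = ["  Smith80    Smith:1980:ABC  ", "", "Smith80a\tSmith:1980:DEF"]
print(load_substitutions(sample))
```

Any iterable of lines works here, so the same function could be handed an open file or standard input.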

If this option is omitted, then the substitution filename will be derived from that of the first input file by replacing its extension by .sub. Thus, the commands

citesub -f foo.sub foo.bib >foo.bib-new
citesub foo.bib >foo.bib-new
are equivalent.
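The default naming rule can be pictured with a brief Python sketch. The function name default_substitution_file is hypothetical, and the behavior for a filename with no extension is an assumption, since the text does not cover that case.

```python
import os


def default_substitution_file(first_input):
    """Derive the substitution filename by replacing the extension with .sub."""
    root, _extension = os.path.splitext(first_input)
    # Assumption: a name without an extension simply gains ".sub".
    return root + ".sub"


print(default_substitution_file("foo.bib"))
```

For foo.bib this yields foo.sub, matching the pair of equivalent commands shown above.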

If the substitution file is named "-", then citesub follows the common UNIX convention and interprets it to mean standard input, allowing the substitutions to be provided from a pipeline, such as

biblabel foo.bib | citesub -f - >
-v
Display the program version number, and possibly installer, location, and compile-time information, on stderr.


citesub will issue warning messages in the following cases:


awk(1), bibcheck(1), bibclean(1), bibextract(1), bibjoin(1), biblabel(1), biblex(1), biborder(1), bibparse(1), bibsort(1), bibtex(1), bibunlex(1), sed(1).


Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: <>