BIBSORT 1 "17 January 2000" "Version 0.15"

NAME

bibsort - sort a BibTeX bibliography file

SYNOPSIS

bibsort [-?] [-author] [-byday or -bylabel or -bypages or -byseriesvolume or -byvolume or -byyear] [-copyright] [-help] [-reverse] [-version] [ optional sort(1) options ] [ <infile or BibTeXfile(s) ] >outfile

DESCRIPTION

bibsort filters a BibTeX bibliography, or bibliography fragment, on its standard input, printing on standard output a sorted bibliography.

Sorting is normally by BibTeX citation label name, or by @String macro name, and letter case is always ignored in the sorting.


OPTIONS

Command-line options may be abbreviated to a unique leading prefix, and letter case is ignored, so that -option, -Option, -OPTION, -oPtIoN, etc. are all equivalent.

For the sort order options beginning -by, the last one seen overrides all earlier ones.

All options are parsed before any input bibliography files are read, no matter what their order on the command line.

Except for the options described below, command-line words beginning with a hyphen are assumed to be options to be passed to sort(1).

The leading hyphen that distinguishes an option from a filename may be doubled, for compatibility with GNU and POSIX conventions. Thus, -author and --author are equivalent.

All remaining command-line words are assumed to be input files. Should such a filename begin with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g., /tmp/-foo.bib or ./-foo.bib.
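
The rules above for telling bibsort options, sort(1) options, and input files apart can be sketched in Python. This is an illustration only, not the actual shell implementation: the name classify_word is hypothetical, and passing an ambiguous prefix such as -by through to sort(1) is an assumption, since the text does not say how ambiguity is handled.

```python
# Option names taken from the SYNOPSIS, without their leading hyphen.
BIBSORT_OPTIONS = [
    "?", "author", "byday", "bylabel", "bypages", "byseriesvolume",
    "byvolume", "byyear", "copyright", "help", "reverse", "version",
]

def classify_word(word):
    """Classify one command-line word as a bibsort option, a sort option, or a file."""
    if not word.startswith("-"):
        return ("file", word)
    # A doubled leading hyphen is accepted as equivalent to a single one.
    name = word[2:] if word.startswith("--") else word[1:]
    name = name.lower()  # letter case is ignored
    matches = [opt for opt in BIBSORT_OPTIONS if name and opt.startswith(name)]
    if len(matches) == 1:
        return ("bibsort", "-" + matches[0])
    # Assumption: no match, or an ambiguous prefix, is handed to sort(1) unchanged.
    return ("sort", word)
```

With this sketch, classify_word("--AUTH") yields ("bibsort", "-author"), while classify_word("./-foo.bib") is treated as a file because the directory prefix hides the hyphen.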

The sort(1) -f option to ignore letter case differences is always supplied. The -u option removes duplicate bibliography entries from the input stream; however, such entries must match exactly, including all white space.

Sort keys are constructed from several parts of the BibTeX entry. If non-numeric values are found where numbers are normally expected (that is, for BibTeX day, number, pages, volume, and year keys), they are replaced by large integers that will sort higher than any reasonable integer value likely to be present. Nondigits after the first character are ignored, so 20S will reduce to 20: such values are occasionally seen for volume, number, and pages values.

However, uncertain year values of the form 19xx or 20xx are sorted at the end of their century.
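
A minimal Python sketch of the numeric key handling just described follows. The placeholder value BIG, the function names, and the mapping of 19xx to 1999 are assumptions made for illustration; the text says only that large integers are substituted and that such years sort at the end of their century. Deleting every nondigit after the first character is one reading of "ignored".

```python
import re

# Assumed placeholder: the actual large integer used by bibsort is not stated.
BIG = 99999999

def numeric_key(value):
    """Integer sort key for a day, number, pages, volume, or year value."""
    value = value.strip()
    if not value or value[0] not in "0123456789":
        return BIG  # non-numeric values sort after ordinary numbers
    # Nondigits after the first character are dropped, so "20S" becomes 20.
    return int(value[0] + re.sub(r"[^0-9]", "", value[1:]))

def year_key(value):
    """Like numeric_key, but 19xx and 20xx sort at the end of their century."""
    match = re.fullmatch(r"([0-9]{2})[xX]{2}", value.strip())
    if match:
        return int(match.group(1)) * 100 + 99  # assumption: 19xx maps to 1999
    return numeric_key(value)
```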

-?
Give a brief help message on stderr, then process all further options, but exit with a successful status code (on UNIX, 0) before processing any files.
-author
Give an author credit on stderr, then process all further options, but exit with a successful status code (on UNIX, 0) before processing any files.
-byday
This option is intended for use with bibliographies of publications containing day, month, and year data, such as technical reports, newspapers, and magazines.

With -byday sorting, a day keyword is recognized (it will be standard in BibTeX 1.0), but for backward compatibility, month entries of the form

"daynumber " # monthname
"daynumber~" # monthname
{daynumber } # monthname
{daynumber~} # monthname
monthname # "daynumber "
monthname # "daynumber~"
monthname # {daynumber }
monthname # {daynumber~}

are also recognized, and will yield both a day and a month. If a day number is not available, a very large value is assumed, which will sort the entry after others that have day values in the same year and month.

The sort keys are: <part> <year> <month> <day> <start-pages> <end-pages> <citation-label>, in that order.

The <part> key represents one of the BibTeX file parts described in a later section.
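
The backward-compatible month forms recognized by -byday could be separated into a day and a month along these lines. This Python sketch is not the nawk code that bibsort uses; split_month and the placeholder NO_DAY are hypothetical, and the pattern is deliberately lenient (for example, it does not check that an opening quote is closed by a quote rather than a brace).

```python
import re

# Assumed stand-in for the "very large value" used when no day is available.
NO_DAY = 99

# A quoted or braced day number, optionally followed by a space or tie (~).
_DAY = r'["{]\s*([0-9]+)[ ~]*["}]'
_DAY_FIRST = re.compile(_DAY + r"\s*#\s*(\w+)")
_MONTH_FIRST = re.compile(r"(\w+)\s*#\s*" + _DAY)

def split_month(value):
    """Return (day, monthname) from a BibTeX month value."""
    value = value.strip()
    match = _DAY_FIRST.fullmatch(value)
    if match:
        return int(match.group(1)), match.group(2).lower()
    match = _MONTH_FIRST.fullmatch(value)
    if match:
        return int(match.group(2)), match.group(1).lower()
    # No day found: the entry sorts after dated ones in the same month.
    return NO_DAY, value.lower()
```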

-bylabel
Sort the input by BibTeX citation label. This is the default, if no -byxxx options are specified.

The sort keys are: <part> <citation-label> <journal> <year> <volume> <number> <start-pages> <end-pages>.

The use of additional sort keys after the initial two or three is intentional: that way, entries that are otherwise `equal' will be consistently ordered according to their publication times.

-bypages
This option is intended for use with bibliographies of articles from those journals where page numbers increase monotonically through the volume, across all issue numbers. Do not use it for bibliographies of journals or magazines where page numbers are reset at each issue.

-bypages is similar to -byvolume, except that the issue number is ignored.

The reason for ignoring the issue number is that some journal databases lack that information. If -byvolume were used, then articles lacking issue numbers would be sorted separately from those with issue numbers, which makes it harder to check for duplicates, or to compare entries with original journal issues.

The sort keys are: <part> <journal> <year> <volume> <start-pages> <end-pages> <citation-label>.

-byseriesvolume
This option is intended for use with bibliographies of series, such as Lecture Notes in Mathematics.

The sort keys are: <part> <volume> <citation-label> <journal> <year> <volume> <number> <start-pages> <end-pages>.

-byvolume
This option is intended for use with bibliographies of single journals.

The journal name is included in the sort keys, so that in a bibliography with multiple journals, output entries for each journal are kept together.

With -byvolume sorting, a warning is issued for any entry that lacks one of the fields used in the sort keys, and a value is supplied for the missing field that will sort higher than any printable value.

Because -byvolume sorting is first on journal name, it is essential that there be only one form of each journal name; the best way to ensure this is to always use @String{...} abbreviations for them. Order -byvolume is convenient for checking a bibliography against the original journal, but less convenient for a bibliography user.

The sort keys are: <part> <journal> <year> <volume> <number> <start-pages> <end-pages> <citation-label>.
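
As a rough illustration of the -byvolume keys and the treatment of missing fields, consider the Python sketch below. It omits the <part> key, and several details are assumptions rather than documented behavior: the set of fields checked (journal, year, volume, number, pages), the wording of the warning, the placeholder values, and the function name byvolume_key.

```python
import sys

HIGH_TEXT = "\x7f"      # assumed: DEL sorts after every printable ASCII character
HIGH_NUMBER = 99999999  # assumed placeholder for a missing or non-numeric number

def _number(text):
    """Convert a plain digit string to int, else return the high placeholder."""
    return int(text) if text.isascii() and text.isdigit() else HIGH_NUMBER

def byvolume_key(label, fields, log=sys.stderr):
    """Build a -byvolume style sort key, warning about missing fields."""
    for name in ("journal", "year", "volume", "number", "pages"):
        if name not in fields:
            print(f"warning: {label}: missing {name}", file=log)
    # Pages are assumed to look like "10--20"; a single page is its own end.
    start, _, end = fields.get("pages", "").partition("--")
    return (
        fields.get("journal", HIGH_TEXT).lower(),  # letter case is ignored
        _number(fields.get("year", "")),
        _number(fields.get("volume", "")),
        _number(fields.get("number", "")),
        _number(start),
        _number(end or start),
        label.lower(),
    )
```

Sorting a list of entries with such keys keeps each journal together and places entries that lack an issue number after those that have one within the same volume.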

-byyear
If this option is given, then sorting is first by year, then by citation label. This is useful for keeping a bibliography in approximate chronological order, ordered by citation label within each year.

The sort keys are: <part> <year> <citation-label> <journal> <year> <volume> <number> <start-pages> <end-pages>.

-copyright
Give a brief copyright message on stderr, then process all further options, but exit with a successful status code (on UNIX, 0) before processing any files.
-help
Give a brief help message on stderr, then process all further options, but exit with a successful status code (on UNIX, 0) before processing any files.
-reverse
Reverse the order of the sort. This option does not affect the ordering of the BibTeX file parts (see below). It applies only to the bibliographic entries, and within those entries, only to the citation label and `numeric' fields (volume, number, pages, day, month, and year).

Thus, bibsort -reverse -byvolume for a bibliography with multiple journals will sort entries for each journal in reverse publication order, but the journal blocks will still be in ascending order by journal name.
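
The effect of combining -reverse with -byvolume can be imitated with two stable sorts, as in the Python sketch below. It illustrates the resulting order only; bibsort itself achieves it through its sort keys, and the dictionary layout and function name here are hypothetical.

```python
def sort_byvolume_reversed(entries):
    """Reverse publication order within each journal, journals ascending."""
    # First pass: newest first, ignoring the journal name.
    newest_first = sorted(
        entries,
        key=lambda e: (e["year"], e["volume"], e["number"], e["label"].lower()),
        reverse=True,
    )
    # Second pass: Python's sort is stable, so entries of one journal keep
    # their newest-first order while the journal blocks become ascending.
    return sorted(newest_first, key=lambda e: e["journal"].lower())
```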

-version
Give a brief version number message on stderr, then process all further options, but exit with a successful status code (on UNIX, 0) before processing any files.

BIBTEX FILE PARTS

The input stream is conceptually divided into five parts, any of which may be absent.
1.
Introductory material such as comments, file headers, and edit logs that are ignored by BibTeX. No line in this part begins with an at-sign, ``@''.
2.
Preamble material delineated by ``@Preamble{'' and a matching closing ``}'', intended to be processed by TeX. Normally, there is only one such entry in a bibliography file, although BibTeX, and bibsort, permit more than one.
3.
Macro definitions (abbreviations) of the form ``@String{...}''. Any single @String specification may span multiple lines, and there are usually several such definitions.
4.
Bibliography entries such as ``@Article{...}'', ``@Book{...}'', ``@InProceedings{...}'', and so on, provided that their citation labels have not already been encountered in a crossref assignment in a preceding entry. For bibsort, any line that begins with an ``@'' followed by letters and digits and an open brace is considered to be such an entry. Optional spaces and tabs may surround the ``@'', and precede the first open brace; these spaces and tabs will be deleted from the output to help standardize the appearance.
5.
``@Proceedings{...}'' bibliography entries, which are likely to be cross-referenced by ``@InProceedings{...}'' entries, and any other bibliography entries for which a crossref assignment was met before the entry itself.

An unfortunate implementation limitation of the current BibTeX requires cross-referenced entries to appear after all other entries that cross-reference them, although this limitation works to the advantage of bibsort, allowing single-pass processing.

The order of these parts is preserved in the output stream. Part 1 will be unchanged, but parts 2--5 will be sorted within themselves.
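
The division into five parts might be approximated as follows. This Python sketch covers only the rules listed above; part_of is a hypothetical name, crossrefs_seen is assumed to hold the lowercased labels already met in crossref assignments, and refinements mentioned elsewhere in this document (such as ``@Book'' entries with a ``booktitle'' assignment) are left out.

```python
import re

# Optional blanks around "@", an entry type of letters and digits, an open
# brace, then whatever precedes the first comma or white space.
ENTRY_START = re.compile(r"^[ \t]*@[ \t]*([A-Za-z0-9]+)[ \t]*\{[ \t]*([^,\s]*)")

def part_of(entry_text, crossrefs_seen):
    """Return the part number (1 to 5) that a chunk of input belongs to."""
    match = ENTRY_START.match(entry_text)
    if match is None:
        return 1  # introductory material: no leading at-sign
    kind = match.group(1).lower()
    label = match.group(2).lower()
    if kind == "preamble":
        return 2
    if kind == "string":
        return 3
    if kind == "proceedings" or label in crossrefs_seen:
        return 5  # must follow the entries that cross-reference it
    return 4
```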

The sort key of ``@Preamble'' entries is their initial line; that of ``@String'' entries is the abbreviation name. For all other BibTeX entries, the sort key is the citation label between the open curly brace and the trailing comma, unless the sort key is prefixed with additional fields as requested by an option such as -byvolume or -byyear.

bibsort will correctly handle UNIX files with LF line terminators, as well as IBM PC DOS files with CR LF line terminators; the essential requirement is that input lines be delineated by LF characters. Thus, files from the Apple Macintosh, which uses bare CR to terminate lines, would first have to be converted to UNIX or PC DOS line format before giving them to bibsort.
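
One way to prepare a file that uses bare CR line terminators is sketched below in Python. The function name is hypothetical; it rewrites only CR characters that are not followed by LF, so CR LF pairs, which bibsort already accepts, are left alone.

```python
import re

def bare_cr_to_lf(data):
    """Replace each CR that is not part of a CR LF pair with LF."""
    return re.sub(rb"\r(?!\n)", b"\n", data)
```

The function works on bytes, so the file should be read and written in binary mode to avoid any newline translation by the I/O library.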


CAVEATS

BibTeX has loose syntactical requirements that the current simple implementation of bibsort does not support. In particular, outer parentheses may not be used in place of braces following ``@keyword'' patterns. If you have such a file, you can use bibclean(1) to prettyprint it into a form that bibsort can handle successfully.

The user must be aware that sorting a bibliography is not without peril, for at least these reasons:

1.
BibTeX has a requirement that entry labels given in crossref = label pairs in a bibliography entry must refer to entries defined later, rather than earlier, in the bibliography file. This regrettable implementation limitation of the current (pre-1.0) BibTeX prevents arbitrary ordering of entries when crossref values are present. To partially solve this problem, bibsort will place ``@Proceedings'' entries last, since they are frequently cross-referenced by ``@InProceedings'' entries. However, it is also possible for ``@Book'', ``@InBook'', and ``@InCollection'' entries to cross-reference ``@Book'' entries, and for ``@Article'' entries to cross-reference other ``@Article'' entries. Neither of these cases is dealt with by bibsort, except that ``@Book'' entries that contain a ``booktitle'' assignment, and entries that are explicitly cross-referenced before their definition, are sorted with ``@Proceedings''.
2.
If the BibTeX file contains interspersed commentary between ``@keyword{...}'' entries, this material will be considered part of the preceding entry, and will be sorted with it. Commentary that is meant to introduce the entry that follows it is the more common case, and such commentary will therefore be moved away from that entry to elsewhere in the file.

This is normally not a problem for the part 1 material before the ``@Preamble'', since it is kept together at the beginning of the output stream.

3.
Some kinds of bibliography files should be kept in a different order than alphabetically by citation labels. Good examples are a bibliography file with the contents of a journal, or a personal publication list, for both of which chronological publication order is likely to be preferred.

While a much more sophisticated implementation of bibsort could deal with the first point, and the -byvolume option provides a partial solution to the third point, in general, a satisfactory solution requires human intelligence and natural language understanding that computers lack.

bibsort uses the control characters with octal values 001 through 007, 0177, and 0377 for temporary modifications of the input stream. If any of these are already present in the input, they will be altered on output. This is unlikely to be a problem, because those characters have no printable representation and are not conventionally used to mark line or page boundaries in text files.
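
A file can be checked for the reserved characters before it is sorted. The following Python sketch is not part of bibsort, and the names in it are hypothetical; it simply reports which of the reserved byte values occur in the data.

```python
# Octal 001 through 007, plus 0177 and 0377, as listed above.
RESERVED = frozenset(range(0o001, 0o010)) | {0o177, 0o377}

def reserved_bytes_in(data):
    """Return the sorted list of reserved byte values present in data."""
    return sorted(set(data) & RESERVED)
```

An empty result means that none of the characters used internally by bibsort appear in the input.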


PROGRAMMING NOTES

Some text editors permit application of an arbitrary filter command to a region of text. For example, in GNU emacs(1), the command C-u M-x shell-command-on-region, or equivalently, C-u M-|, can be used to run bibsort on a region of the buffer that is devoid of cross references and other material that cannot be safely sorted.

Some implementations of BibTeX editing support in GNU emacs(1) have a sort-bibtex-entries command that is functionally similar to bibsort. However, the file size that can be processed by emacs(1) is limited, while bibsort can be used on arbitrarily large files, since it acts as a filter, processing a small amount of data at a time. The sort stage needs the entire data stream, but fortunately, the UNIX sort(1) command is clever enough to deal with very large inputs.

The current implementation of bibsort follows the UNIX tradition of combining simple already-available tools. A six-stage pipeline of egrep(1), nawk(1), sort(1), and tr(1) accomplishes the job in one pass with about 900 lines of heavily-commented shell script, about 500 lines of which is a nawk(1) program for insertion of sort keys. The initial prototype of bibsort was written and tested on several large bibliographies in a couple of hours, and after considerable use, was later extended with advanced sorting capabilities and cross-reference recognition in a couple of days of work. By contrast, bibtex(1) is more than 11 000 lines of code and documentation, and bibclean(1) is more than 15 000 lines long; both took months to develop, implement, and test.


BUGS

bibsort may fail on some UNIX systems if their sort(1) implementations cannot handle very long lines, because for sorting purposes, each complete bibliography entry is temporarily folded into a single line. You may be able to overcome this problem by adding a -znnnnn option to the sort(1) command (passed via the command line to bibsort) to increase the maximum line size to some larger value of nnnnn bytes. According to their documentation, some UNIX sort(1) implementations require a space after -z, others forbid it, and still others do not support it at all. If a space is required, you must quote the pair, to prevent the nnnnn value from being interpreted as a filename by bibsort.
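
To judge whether line length is likely to be a problem, the size of the longest entry can be estimated beforehand. The Python sketch below (Python 3.7 or later) is an approximation under stated assumptions: it splits the input before every line that starts with optional blanks and an at-sign, and it ignores the sort keys that bibsort prepends, so the result is a lower bound on the folded line length. The function names are hypothetical.

```python
import re

def split_entries(text):
    """Split text before each line that begins with optional blanks and '@'."""
    return [chunk for chunk in re.split(r"(?m)^(?=[ \t]*@)", text) if chunk]

def longest_entry_bytes(text):
    """Return the byte length of the longest chunk, or 0 for empty input."""
    return max((len(chunk.encode("utf-8")) for chunk in split_entries(text)), default=0)
```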

SEE ALSO

bibcheck(1), bibclean(1), bibdup(1), bibextract(1), bibjoin(1), biblabel(1), biblex(1), biborder(1), bibparse(1), bibsearch(1), bibsplit(1), bibtex(1), bibunlex(1), citesub(1), egrep(1), emacs(1), gawk(1), mawk(1), nawk(1), sort(1), tr(1).

AUTHOR

Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
University of Utah
Department of Mathematics, 322 INSCC
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
Tel: +1 801 581 5254
FAX: +1 801 585 1640, +1 801 581 4148
Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
WWW URL: http://www.math.utah.edu/~beebe

COPYRIGHT

########################################################################
########################################################################
########################################################################
###                                                                  ###
###             bibsort: sort a BibTeX bibliography file             ###
###                                                                  ###
###              Copyright (C) 2000 Nelson H. F. Beebe               ###
###                                                                  ###
### This program is covered by the GNU General Public License (GPL), ###
### version 2 or later, available as the file COPYING in the program ###
### source distribution, and on the Internet at                      ###
###                                                                  ###
###               ftp://ftp.gnu.org/gnu/GPL                          ###
###                                                                  ###
###               http://www.gnu.org/copyleft/gpl.html               ###
###                                                                  ###
### This program is free software; you can redistribute it and/or    ###
### modify it under the terms of the GNU General Public License as   ###
### published by the Free Software Foundation; either version 2 of   ###
### the License, or (at your option) any later version.              ###
###                                                                  ###
### This program is distributed in the hope that it will be useful,  ###
### but WITHOUT ANY WARRANTY; without even the implied warranty of   ###
### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    ###
### GNU General Public License for more details.                     ###
###                                                                  ###
### You should have received a copy of the GNU General Public        ###
### License along with this program; if not, write to the Free       ###
### Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,   ###
### MA 02111-1307 USA                                                ###
########################################################################
########################################################################
########################################################################

AVAILABILITY

Internet source distributions of bibsort are available at the World-Wide Web Uniform Resource Locator addresses

ftp://ftp.math.utah.edu/pub/tex/bib/bibsort-x.yy.jar
ftp://ftp.math.utah.edu/pub/tex/bib/bibsort-x.yy.tar.gz
ftp://ftp.math.utah.edu/pub/tex/bib/bibsort-x.yy.zip
ftp://ftp.math.utah.edu/pub/tex/bib/bibsort-x.yy.zoo

http://www.math.utah.edu/pub/tex/bib/bibsort-x.yy.jar
http://www.math.utah.edu/pub/tex/bib/bibsort-x.yy.tar.gz
http://www.math.utah.edu/pub/tex/bib/bibsort-x.yy.zip
http://www.math.utah.edu/pub/tex/bib/bibsort-x.yy.zoo

where x.yy is the current version (0.15 for the version whose documentation you are now reading).

That site is mirrored to several other Internet archives, so you may also be able to find it elsewhere on the Internet; try searching for the string bibsort at one or more of the popular Web search sites, such as

http://altavista.digital.com/
http://search.microsoft.com/us/default.asp
http://www.dejanews.com/
http://www.dogpile.com/index.html
http://www.euroseek.net/page?ifl=uk
http://www.excite.com/
http://www.go2net.com/search.html
http://www.google.com/
http://www.hotbot.com/
http://www.infoseek.com/
http://www.inktomi.com/
http://www.lycos.com/
http://www.northernlight.com/
http://www.snap.com/
http://www.stpt.com/
http://www.yahoo.com/