


BIBSORT(1)		  User Commands		       BIBSORT(1)



NAME
     bibsort - sort a BibTeX bibliography file

SYNOPSIS
     bibsort [-byvolume or -byyear] [ optional sort(1) switches ] <
     infile >outfile
     or
     bibsort [-byvolume or -byyear] [ optional sort(1) switches ]
     BibTeXfile(s) >outfile

DESCRIPTION
     bibsort filters a BibTeX bibliography, or bibliography frag-
     ment,  on	its standard input, printing on	standard output	a
     sorted bibliography.

     Sorting is	normally by BibTeX citation  label  name,  or  by
     @String macro name, and letter case is always ignored in the
     sorting.
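     The effect of ignoring letter case can be seen with sort(1)
     alone.  The following is only a sketch: the helper name
     sort_labels and the sample labels are invented for
     illustration, and the C locale is forced so that the result
     does not depend on the local collating sequence.

```shell
# Hypothetical helper: order citation labels the way "sort -f" does,
# folding lower case to upper case before comparing.
sort_labels() {
    LC_ALL=C sort -f
}

# Without -f, the C locale would put "Beebe:1993" first, because
# upper-case letters precede lower-case ones in ASCII.
printf '%s\n' 'knuth:1984' 'Beebe:1993' 'adams:1990' | sort_labels
```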

     The -byvolume switch is intended for use with bibliographies
     of	single journals.  It causes entries to be sorted by jour-
     nal, volume, number, page,	year, and citation label, so that
     the entries appear	in their original publication order.  The
     journal name is included in the  sort  key,  so  that  in	a
     bibliography with multiple	journals, output entries for each
     journal are kept together.

     With -byvolume sorting, a warning is issued for any entry in
     which any of these fields is missing, and the missing field
     is given a value that sorts higher than any printable value.

     Because -byvolume sorting is first on journal name, it is
     essential that there be only one form of each journal name;
     the best way to ensure this is to always use @String{...}
     abbreviations for them.  The -byvolume order is convenient
     for checking a bibliography against the original journal,
     but less convenient for a bibliography user.
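     The ordering that -byvolume aims for can be pictured with
     sort(1) applied to records whose fields have already been
     extracted.  This is a sketch under stated assumptions, not
     the key construction that bibsort itself uses: the helper
     name byvolume_order and the tab-separated record layout are
     invented here, and missing fields are not handled.

```shell
# Hypothetical helper: order tab-separated records of the form
#   journal <TAB> volume <TAB> number <TAB> page <TAB> year <TAB> label
# by journal and label ignoring case, and numerically by the
# volume, number, page, and year fields in between.
byvolume_order() {
    tab=$(printf '\t')
    LC_ALL=C sort -f -t "$tab" -k1,1 -k2,2n -k3,3n -k4,4n -k5,5n -k6,6
}
```

     Comparing the volume numerically matters: as text, volume 12
     would sort before volume 9.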

     If	the optional -byyear switch is given, then the entry year
     value  is prefixed	to the sort key, so that sorting is first
     by	year, then by citation label.  This is useful for keeping
     a	bibliography  in approximate chronological order, ordered
     by	citation label within each year.
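     Prefixing the year to the key can be sketched as follows.
     The helper name byyear_order and the two-field input format
     (label, tab, year) are assumptions made for the example;
     bibsort extracts these values from the entries itself.

```shell
# Hypothetical helper: read "label <TAB> year" records, put the year in
# front of the label to form the key, sort ignoring case, and print the
# labels in the resulting order.
byyear_order() {
    awk -F '\t' '{ print $2 "\t" $1 }' | LC_ALL=C sort -f | cut -f2
}
```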

     Other command-line	words beginning	with a hyphen are assumed
     to	be options to be passed	to sort(1).

     All remaining command-line	words are  assumed  to	be  input
     files.   Should such a filename begin with	a hyphen, it must
     be	disguised by a leading	absolute  or  relative	directory
     path, e.g.	 /tmp/-foo.bib or ./-foo.bib.
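     A script that receives arbitrary filenames could apply the
     disguise automatically.  The helper below is a minimal
     sketch; the name disguise_name is invented for the example.

```shell
# Hypothetical helper: prefix "./" to a filename that begins with a
# hyphen, so that it is not mistaken for a switch; other names are
# printed unchanged.
disguise_name() {
    case "$1" in
        -*) printf './%s\n' "$1" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

# Possible use (not run here):
#   bibsort "$(disguise_name -foo.bib)" >sorted.bib
```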



Version	0.07	 Last change: 21 September 1994			1









     The sort(1) -f switch to ignore letter case  differences  is
     always  supplied.	 The  -r switch	reverses the order of the
     sort. The -u switch removes duplicate  bibliography  entries
     from  the	input  stream;	however,  such entries must match
     exactly, including	all white space.
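     The white-space sensitivity of -u can be demonstrated with
     sort(1) directly.  This sketch uses one-line entries for
     simplicity; the helper name count_unique and the sample
     entries are invented for illustration.

```shell
# Hypothetical helper: count the lines that survive "sort -f -u".
count_unique() {
    LC_ALL=C sort -f -u | wc -l | tr -d ' '
}

# The first and third lines are identical; the second differs only in
# the amount of white space after the comma, so it is kept as well.
printf '%s\n' '@Misc{x, note = "a"}' \
              '@Misc{x,  note = "a"}' \
              '@Misc{x, note = "a"}' | count_unique
```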

     The input stream is conceptually divided  into  five  parts,
     any of which may be absent.

	  1.  Introductory  material  such  as	 comments,   file
	      headers,	and edit logs that are ignored by BibTeX.
	      No line in this part begins with an at-sign, ``@''.

	  2.  Preamble material	delineated by ``@Preamble{''  and
	      a	 matching closing ``}'', intended to be	processed
	      by TeX.  Normally, there is only one such	entry  in
	      a	 bibliography file, although BibTeX, and bibsort,
	      permit more than one.

	  3.  Macro  definitions  (abbreviations)  of  the   form
	      ``@String{...}''.	 Any single @String specification
	      may span multiple	 lines,	 and  there  are  usually
	      several such definitions.

	  4.  Bibliography  entries  such  as  ``@Article{...}'',
	      ``@Book{...}'', ``@InProceedings{...}'', and so on,
	      provided	that  their  citation  labels  have   not
	      already  been  encountered in a crossref assignment
	      in a preceding entry.  For bibsort, any  line  that
	      begins with an ``@'' followed by letters and digits
	      and an open brace	 is  considered	 to  be	 such  an
	      entry.   Optional	 spaces	and tabs may surround the
	      ``@'', and precede  the  first  open  brace;  these
	      spaces  and tabs will be deleted from the	output to
	      help standardize the appearance.

	  5.  ``@Proceedings{...}'' bibliography  entries,  which
	      are  likely to be	cross-referenced by ``@InProceed-
	      ings{...}'' entries,  and	 any  other  bibliography
	      entries  for  which  a  crossref assignment was met
	      before the entry itself.

	  An unfortunate implementation	limitation of the current
	  BibTeX  requires  cross-referenced  entries  to  appear
	  after	all  other  entries  that  cross-reference  them,
	  although this	limitation works to the	advantage of bib-
	  sort,	allowing single-pass processing.

     The order of these parts is preserved in the output stream.
     Part 1 will be unchanged, but parts 2 through 5 will be
     sorted within themselves.
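     The division into parts can be approximated by looking at
     the line that opens each entry.  The following is a rough
     sketch only: the helper name classify_line is invented, it
     examines one line at a time, and it does not track crossref
     assignments, so it cannot tell when an ordinary entry
     belongs in part 5.

```shell
# Hypothetical helper: for each input line, print which kind of entry
# (if any) the line opens.
classify_line() {
    awk '{
        line = tolower($0)
        if (line !~ /^[ \t]*@[ \t]*[a-z0-9]+[ \t]*[{]/)
            print "other"           # commentary or a continuation line
        else if (line ~ /^[ \t]*@[ \t]*preamble[ \t]*[{]/)
            print "preamble"        # part 2
        else if (line ~ /^[ \t]*@[ \t]*string[ \t]*[{]/)
            print "string"          # part 3
        else if (line ~ /^[ \t]*@[ \t]*proceedings[ \t]*[{]/)
            print "proceedings"     # part 5
        else
            print "entry"           # part 4
    }'
}
```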







     The sort key of ``@Preamble'' entries is their initial line;
     of ``@String'' entries, the abbreviation name.  For all
     other BibTeX entries, the sort key is the citation label
     between the open curly brace and the trailing comma, unless
     the sort key is prefixed with additional fields as requested
     by the -byvolume or -byyear switches.
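     Extraction of the default key from the opening line of an
     entry might look like the sketch below.  The helper name
     sort_key is invented, ``@Preamble'' lines are not treated
     specially, and the nawk(1) program inside bibsort may well
     proceed differently.

```shell
# Hypothetical helper: print the citation label (or, for @String, the
# abbreviation name) found on the opening line of an entry.
sort_key() {
    awk '{
        line = $0
        sub(/^[^{]*[{][ \t]*/, "", line)  # drop everything through the open brace
        sub(/[ \t]*[,=].*$/, "", line)    # drop the first "," or "=" and what follows
        print line
    }'
}
```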

     bibsort will correctly handle UNIX	files with LF line termi-
     nators, as	well as	IBM PC DOS files with CR LF line termina-
     tors; the essential requirement is	that input lines be  del-
     ineated by	LF characters.	Thus, files from the Apple Macin-
     tosh, which uses bare CR to  terminate  lines,  would  first
     have  to  be  converted to	UNIX or	PC DOS line format before
     giving them to bibsort.
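     One way to carry out that conversion is with tr(1).  The
     helper name mac_to_unix and the filenames in the usage
     comment are invented for the example.

```shell
# Hypothetical helper: turn bare CR line terminators into LF.
# Do not use this on CR LF files, which bibsort already accepts:
# it would produce an empty line after every line.
mac_to_unix() {
    tr '\r' '\n'
}

# Possible use (not run here):
#   mac_to_unix <mac.bib | bibsort >sorted.bib
```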

CAVEATS
     BibTeX has	loose syntactical requirements that  the  current
     simple  implementation of bibsort does not	support.  In par-
     ticular, outer parentheses	may  not  be  used  in	place  of
     braces  following ``@keyword'' patterns.  If you have such	a
     file, you can use bibclean(1) to prettyprint it into a  form
     that bibsort can handle successfully.
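     Before sorting, a file can be checked for entries that use
     outer parentheses.  This is a sketch only; the helper name
     find_paren_entries is invented, and the pattern looks just
     at the line that opens each entry.

```shell
# Hypothetical helper: list, with line numbers, the lines that open an
# entry with "(" instead of "{".
find_paren_entries() {
    grep -n -E '^[[:blank:]]*@[[:blank:]]*[[:alnum:]]+[[:blank:]]*\('
}
```

     If this prints anything, the file is a candidate for
     bibclean(1) before it is given to bibsort.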

     The user must be aware that sorting a  bibliography  is  not
     without peril, for	at least these reasons:

	  1.  BibTeX has a requirement that entry labels given in
	      crossref = label pairs in	a bibliography entry must
	      refer to entries defined later,  rather  than  ear-
	      lier,  in	 the bibliography file.	 This regrettable
	      implementation limitation	of the current	(pre-1.0)
	      BibTeX  prevents arbitrary ordering of entries when
	      crossref values are present.   To	 partially  solve
	      this  problem,  bibsort will place ``@Proceedings''
	      entries last,  since  they  are  frequently  cross-
	      referenced by ``@InProceedings'' entries.	 However,
	      it is also possible for ``@Book'', ``@InBook'', and
	      ``@InCollection''	   entries   to	  cross-reference
	      ``@Book''	entries, and for ``@Article'' entries  to
              cross-reference other ``@Article'' entries.  Neither
              of these cases is dealt with by bibsort, except that
              ``@Book'' entries that contain a ``booktitle''
              assignment, and entries that are explicitly
              cross-referenced before their definition, are sorted
              with ``@Proceedings''.

          2.  If the BibTeX file contains interspersed commentary
              between ``@keyword{...}'' entries, this material
              will be considered part of the preceding entry, and
              will be sorted with it.  Leading commentary is more
              common, and will be moved elsewhere in the file.
              This is normally not a problem for the part 1
              material before the ``@Preamble'', since it is kept
              together at the beginning of the output stream.

	  3.  Some kinds of bibliography files should be kept  in
	      a	 different  order than alphabetically by citation
	      labels.  Good examples are a bibliography	file with
	      the  contents  of	a journal, or a	personal publica-
	      tion list, for both of which chronological publica-
	      tion order is likely to be preferred.

     While a much more sophisticated  implementation  of  bibsort
     could  deal  with	the first point, and the -byvolume switch
     provides a	partial	solution to the	third point, in	 general,
     a	satisfactory  solution	requires  human	 intelligence and
     natural language understanding that computers lack.

     bibsort uses the characters with octal values 001 through
     007, 0177, and 0377 for temporary modifications of the input
     stream.  If any of these are already present in the input,
     they will be altered on output.  This is unlikely to be a
     problem, because those characters have no printable
     representation, nor are they conventionally used to mark
     line or page boundaries in text files.
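     A file can be checked for these characters beforehand.  The
     sketch below counts them with tr(1); the helper name
     count_reserved_bytes is invented for the example.

```shell
# Hypothetical helper: count the input bytes whose octal values are
# 001 through 007, 0177, or 0377.  "tr -cd" deletes every byte that is
# NOT in the set, so only the reserved bytes reach wc(1).
count_reserved_bytes() {
    LC_ALL=C tr -cd '\001-\007\177\377' | wc -c | tr -d ' '
}
```

     A result of 0 means that none of the reserved characters
     occurs in the input.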

PROGRAMMING NOTES
     Some text editors permit application of an	arbitrary  filter
     command  to a region of text.  For	example, in GNU	emacs(1),
     the   command   C-u    M-x	   shell-command-on-region,    or
     equivalently,  C-u	 M-|,  can  be	used  to run bibsort on	a
     region of the buffer that is devoid of cross references  and
     other material that cannot	be safely sorted.

     Some  implementations  of	BibTeX	editing	 support  in  GNU
     emacs(1)  have  a	sort-bibtex-entries command that is func-
     tionally similar to bibsort.  However, the	 file  size  that
     can  be  processed	by emacs(1) is limited,	while bibsort can
     be	used on	arbitrarily large  files,  since  it  acts  as	a
     filter,  processing  a  small amount of data at a time.  The
     sort stage	needs the entire data  stream,	but  fortunately,
     the  UNIX sort(1) command is clever enough	to deal	with very
     large inputs.

     The current implementation of bibsort follows the UNIX
     tradition of combining simple already-available tools.  A
     six-stage pipeline of egrep(1), nawk(1), sort(1), and tr(1)
     accomplishes the job in one pass with about 500 lines of
     heavily-commented shell script, about 225 lines of which are
     a nawk(1) program for insertion of sort keys.  The initial
     prototype of bibsort was written and tested on several large
     bibliographies in a couple of hours, and after considerable
     use, was later extended with advanced sorting capabilities
     and cross-reference recognition in a couple of days of work.
     By contrast, bibtex(1) is more than 11 000 lines of code and
     documentation, and bibclean(1) is more than 15 000 lines
     long; both took months to develop, implement, and test.

BUGS
     bibsort may fail on some UNIX systems if their sort(1)
     implementations cannot handle very long lines, because for
     sorting purposes, each complete bibliography entry is
     temporarily folded into a single line.  You may be able to
     overcome this problem by adding a -znnnnn switch to the
     sort(1) command (passed via the command line to bibsort) to
     increase the maximum line size to some larger value of nnnnn
     bytes.  According to their documentation, some UNIX sort(1)
     implementations require a space after -z, others forbid it,
     and still others do not support it at all.  If a space is
     required, you must quote the pair, to prevent the nnnnn
     value from being interpreted as a filename by bibsort.
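     The folding that gives rise to these long lines can be
     pictured with a much reduced pipeline.  This is a sketch of
     the general fold, sort, and unfold technique, not the
     bibsort script itself: the helper name fold_sort_unfold is
     invented, every entry is assumed to begin with ``@'' in
     column one, and the whole folded line (rather than an
     inserted sort key) is compared.

```shell
# Hypothetical helper: join the lines of each entry with the control
# character 001, sort the resulting one-line entries ignoring case,
# then restore the original line breaks.
fold_sort_unfold() {
    awk '
        /^@/ && NR > 1 { printf "\n"; sep = "" }    # new entry: end the folded line
        { printf "%s%s", sep, $0; sep = "\001" }    # append this line to the entry
        END { if (NR > 0) printf "\n" }
    ' | LC_ALL=C sort -f | tr '\001' '\n'
}
```

     Because each entry travels through sort(1) as one line, the
     longest entry in the file determines the line length that
     sort(1) must be able to handle.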

SEE ALSO
     bibclean(1),  bibextract(1),  biblex(1),  biborder(1),  bib-
     parse(1),	 bibtex(1),   bibunlex(1),   egrep(1),	emacs(1),
     nawk(1), sort(1), tr(1).

AUTHOR
     Nelson H. F. Beebe, Ph.D.
     Center for	Scientific Computing
     Department	of Mathematics
     University	of Utah
     Salt Lake City, UT	84112
     Tel: +1 801 581 5254
     FAX: +1 801 581 4148
     Email: <beebe@math.utah.edu>

























