


User Commands					       BIBSORT(1)



NAME
     bibsort - sort a BibTeX bibliography file

SYNOPSIS
     bibsort [-?]  [-author]
	     [-byday  or -bypages  or -byseriesvolume
	      or -byvolume  or -byyear]
	     [-copyright] [-help] [-version]
	     [ optional	sort(1)	options	]
	     [ <infile or BibTeXfile(s)	] >outfile

DESCRIPTION
     bibsort filters a BibTeX bibliography, or bibliography frag-
     ment,  on	its standard input, printing on	standard output	a
     sorted bibliography.

     Sorting is	normally by BibTeX citation  label  name,  or  by
     @String macro name, and letter case is always ignored in the
     sorting.

OPTIONS
     Command-line options may be abbreviated to	a unique  leading
     prefix.

     For the sort order	options	beginning -by, the last	one  seen
     overrides all earlier ones.

     All options are parsed before any input  bibliography  files
     are read, no matter what their order on the command line.

     Except for	the options described below,  command-line  words
     beginning	with  a	 hyphen	 are  assumed to be options to be
     passed to sort(1).

     The leading hyphen	 that  distinguishes  an  option  from	a
     filename  may  be	doubled,  for  compatibility with GNU and
     POSIX  conventions.   Thus,   -author   and   --author   are
     equivalent.
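
     The two rules above (unique leading prefixes and the optional
     doubled hyphen) can be illustrated with a short Python sketch.
     It is not taken from the bibsort script: the function name
     resolve_option, the exact-match shortcut, and the handling of
     ambiguous prefixes are assumptions made for illustration.

```python
# Option names as listed in the SYNOPSIS, without the leading hyphen.
OPTIONS = ["?", "author", "byday", "bypages", "byseriesvolume",
           "byvolume", "byyear", "copyright", "help", "version"]


def resolve_option(word):
    """Return the full option name for a command-line word, or None.

    None means the word is not a unique prefix of any option listed
    above; in this sketch such words are left for sort(1).
    """
    if not word.startswith("-"):
        raise ValueError("not an option word: %r" % word)
    # A doubled leading hyphen is accepted as equivalent to a single one.
    name = word[2:] if word.startswith("--") else word[1:]
    if not name:
        return None
    if name in OPTIONS:
        return name
    matches = [opt for opt in OPTIONS if opt.startswith(name)]
    return matches[0] if len(matches) == 1 else None


print(resolve_option("-byv"))      # byvolume
print(resolve_option("--author"))  # author
print(resolve_option("-by"))       # None (ambiguous prefix)
```

     In this sketch an ambiguous prefix such as -by simply resolves
     to nothing; how bibsort itself reports that case is not stated
     here.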

     All remaining command-line	words are  assumed  to	be  input
     files.   Should such a filename begin with	a hyphen, it must
     be	disguised by a leading	absolute  or  relative	directory
     path, e.g.	 /tmp/-foo.bib or ./-foo.bib.

     The sort(1) -f option to ignore letter case  differences  is
     always  supplied.	 The  -r option	reverses the order of the
     sort. The -u option removes duplicate  bibliography  entries
     from  the	input  stream;	however,  such entries must match
     exactly, including	all white space.

     Sort keys are constructed from several parts of  the  BibTeX
     entry.   If  non-numeric  values are found	where numbers are
     normally expected (that is, for the BibTeX day, number, pages,
     volume, and year keys), they are replaced by large integers
     that will sort higher  than  any  reasonable  integer  value
     likely  to	 be present.  Nondigits	after the first	character
     are ignored, so 20S will reduce  to  20:	such  values  are
     occasionally seen for volume, number, and pages values.

     However, uncertain	year values of the form	19xx or	20xx  are
     sorted at the end of their	century.
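
     The numeric-key rules above can be sketched in Python as
     follows.  This is an illustration under stated assumptions, not
     the nawk(1) code that bibsort uses: the sentinel BIG, the
     function names, the choice to keep only the leading run of
     digits, and the tie-breaking tuple for uncertain years are all
     hypothetical.

```python
import re

BIG = 10**9  # hypothetical "large integer" that sorts after real values


def numeric_key(value):
    """Reduce a day, number, pages, volume, or year value to an integer.

    Assumption: only the leading run of digits is kept, so "20S"
    becomes 20; a value that does not start with a digit becomes BIG.
    """
    match = re.match(r"\d+", value.strip())
    return int(match.group(0)) if match else BIG


def year_key(value):
    """Sort uncertain years such as 19xx at the end of their century."""
    text = value.strip()
    match = re.fullmatch(r"(\d\d)[xX]{2}", text)
    if match:
        # Same century, last year, but after any exact year (assumed).
        return (int(match.group(1)) * 100 + 99, 1)
    return (numeric_key(text), 0)


print(numeric_key("20S"))  # 20
print(sorted(["19xx", "1999", "2000", "1987"], key=year_key))
# ['1987', '1999', '19xx', '2000']
```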

     -?               Give a brief help message on stdout, then
                      process all further options, but exit with
                      a successful status code (on UNIX, 0)
                      before processing any files.

     -author	      Give an author credit on stdout, then  pro-
		      cess  all	 further options, but exit with	a
		      successful status	code (on UNIX, 0)  before
		      processing any files.

     -byday	      This  option  is	intended  for  use   with
		      bibliographies  of  publications containing
		      day, month, and year data, such as  techni-
		      cal reports, newspapers, and magazines.  It
		      causes entries to	be sorted by year, month,
		      day,   and  citation  label,  so	that  the
		      entries appear in	their  original	 publica-
		      tion order.

		      With  -byday  sorting,  a	 day  keyword  is
		      recognized  (it  will be standard	in BibTeX
		      1.0), but	for backward compatibility, month
		      entries of the form


		      "daynumber " # monthname
		      "daynumber~" # monthname
		      {daynumber } # monthname
		      {daynumber~} # monthname
		      monthname	# "daynumber "
		      monthname	# "daynumber~"
		      monthname	# {daynumber }
		      monthname	# {daynumber~}

		      are also recognized, and will yield both	a
		      day  and	a  month.  If a	day number is not
		      available, a very	large value  is	 assumed,
		      which will sort the entry	after others that
		      have day values in the same year and month.

     -bypages	      This option is intended for use with  jour-
		      nal  article bibliographies.  It is similar
		      to -byvolume, except that	the issue  number
		      is  ignored:  thus, it causes entries to be
		      sorted  by  journal,  year,  volume,  page,
		      year,  and  citation  label,  so	that  the
		      entries appear in	their  original	 publica-
		      tion  order.   The journal name is included
		      in the sort key, so that in a  bibliography
		      with  multiple journals, output entries for
		      each journal are kept together.

		      The reason for ignoring the issue	number is
		      that   some  journal  databases  lack  that
		      information.  If -byvolume were used,  then
		      articles	lacking	 issue	numbers	 would be
		      sorted separately	 from  those  with  issue
		      numbers, which makes it harder to	check for
		      duplicates, or to	compare	entries	with ori-
		      ginal journal issues.

     -byseriesvolume  This  option  is	intended  for  use   with
		      bibliographies  of  series, such as Lecture
		      Notes  in	 Mathematics.	Only  the  volume
		      number  and  citation  label  are	 used  in
		      preparing	the sort key.

     -byvolume	      This  option  is	intended  for  use   with
		      bibliographies   of  single  journals.   It
		      causes entries to	 be  sorted  by	 journal,
		      year, volume, number, page, year,	and cita-
		      tion label, so that the entries  appear  in
		      their   original	publication  order.   The
		      journal name is included in the  sort  key,
		      so  that	in  a  bibliography with multiple
		      journals,	output entries for  each  journal
		      are kept together.

		      With -byvolume sorting, a warning is issued
		      for any entry in which any of these fields
		      is missing, and a value is supplied for the
		      missing field that will sort higher than
		      any printable value.

		      Because -byvolume	sorting	is first on jour-
		      nal  name,  it  is  essential that there be
		      only one form of	each  journal  name;  the
		      best  way	 to  ensure this is to always use
		      @String{...}    abbreviations   for   them.
		      Order  -byvolume is convenient for checking
		      a	bibliography against the  original  jour-
		      nal, but less convenient for a bibliography
		      user.

     -byyear	      If this option is	 given,	 then  the  entry
		      year  value is prefixed to the sort key, so
		      that sorting is  first  by  year,	 then  by
		      citation label.  This is useful for keeping
		      a	bibliography in	approximate chronological
		      order,  ordered  by  citation  label within
		      each year.

     -copyright	      Give a brief copyright message  on  stdout,
		      then  process all	further	options, but exit
		      with a successful	status code (on	UNIX,  0)
		      before processing	any files.

     -help	      Give a brief help	message	on  stdout,  then
		      process  all further options, but	exit with
		      a	 successful  status  code  (on	UNIX,  0)
		      before processing	any files.

     -version	      Give a  brief  version  number  message  on
		      stdout,  then  process all further options,
		      but exit with a successful status	code  (on
		      UNIX, 0) before processing any files.
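
     The backward-compatible month forms accepted by -byday can be
     recognized with a pair of regular expressions.  The Python
     sketch below is only an illustration: the function name
     split_month and the patterns are assumptions, and for brevity
     it does not check that the opening and closing delimiters of
     the day number match.

```python
import re

# A quoted or braced day number, optionally followed by a space or tie (~).
_DAY = r'["{]\s*(\d+)[ ~]?["}]'
_MONTH = r'([A-Za-z]+)'
_DAY_FIRST = re.compile(rf'\s*{_DAY}\s*#\s*{_MONTH}\s*')
_MONTH_FIRST = re.compile(rf'\s*{_MONTH}\s*#\s*{_DAY}\s*')
_MONTH_ONLY = re.compile(rf'\s*{_MONTH}\s*')


def split_month(value):
    """Return (day, month_name) from a BibTeX month value.

    day is None when the value carries no day number; both items
    are None when the value matches none of the recognized forms.
    """
    match = _DAY_FIRST.fullmatch(value)
    if match:
        return int(match.group(1)), match.group(2).lower()
    match = _MONTH_FIRST.fullmatch(value)
    if match:
        return int(match.group(2)), match.group(1).lower()
    match = _MONTH_ONLY.fullmatch(value)
    if match:
        return None, match.group(1).lower()
    return None, None


print(split_month('"14~" # jul'))  # (14, 'jul')
print(split_month('jul # {4 }'))   # (4, 'jul')
print(split_month('dec'))          # (None, 'dec')
```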

BIBTEX FILE PARTS
     The input stream is conceptually divided  into  five  parts,
     any of which may be absent.

	  1.  Introductory  material  such  as	 comments,   file
	      headers,	and edit logs that are ignored by BibTeX.
	      No line in this part begins with an at-sign, ``@''.

	  2.  Preamble material	delineated by ``@Preamble{''  and
	      a	 matching closing ``}'', intended to be	processed
	      by TeX.  Normally, there is only one such	entry  in
	      a	 bibliography file, although BibTeX, and bibsort,
	      permit more than one.

	  3.  Macro  definitions  (abbreviations)  of  the   form
	      ``@String{...}''.	 Any single @String specification
	      may span multiple	 lines,	 and  there  are  usually
	      several such definitions.

	  4.  Bibliography  entries  such  as  ``@Article{...}'',
	      ``@Book{...}'', ``@InProceedings{...}'', and so on,
	      provided	that  their  citation  labels  have   not
	      already  been  encountered in a crossref assignment
	      in a preceding entry.  For bibsort, any  line  that
	      begins with an ``@'' followed by letters and digits
	      and an open brace	 is  considered	 to  be	 such  an
	      entry.   Optional	 spaces	and tabs may surround the
	      ``@'', and precede  the  first  open  brace;  these
	      spaces  and tabs will be deleted from the	output to
	      help standardize the appearance.

	  5.  ``@Proceedings{...}'' bibliography  entries,  which
	      are  likely to be	cross-referenced by ``@InProceed-
	      ings{...}'' entries,  and	 any  other  bibliography
	      entries  for  which  a  crossref assignment was met
	      before the entry itself.

	  An unfortunate implementation	limitation of the current
	  BibTeX  requires  cross-referenced  entries  to  appear
	  after	all  other  entries  that  cross-reference  them,
	  although this	limitation works to the	advantage of bib-
	  sort,	allowing single-pass processing.

     The order of these parts is preserved in the output stream.
     Part 1 will be unchanged, but parts 2 through 5 will be sorted
     within themselves.
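
     The division into parts can be sketched in a single pass, as
     the description above suggests.  The following Python fragment
     is a simplified illustration, not bibsort's own logic: the
     function name classify, the regular expressions, and the
     assumption that the input has already been split into one
     string per entry are all hypothetical.

```python
import re

# "@", a keyword of letters and digits, an open brace, then the label.
ENTRY_RE = re.compile(
    r'^[ \t]*@[ \t]*([A-Za-z0-9]+)[ \t]*\{[ \t]*([^,\s}]*)')
CROSSREF_RE = re.compile(r'crossref\s*=\s*["{]\s*([^"}\s]+)',
                         re.IGNORECASE)


def classify(entries):
    """Return the part number (1 to 5) for each entry string, in order."""
    seen_crossrefs = set()  # labels named in earlier crossref assignments
    parts = []
    for text in entries:
        match = ENTRY_RE.match(text)
        if match is None:
            parts.append(1)  # introductory material
            continue
        kind = match.group(1).lower()
        label = match.group(2).lower()
        if kind == "preamble":
            part = 2
        elif kind == "string":
            part = 3
        elif kind == "proceedings" or label in seen_crossrefs:
            part = 5
        else:
            part = 4
        parts.append(part)
        seen_crossrefs.update(
            name.lower() for name in CROSSREF_RE.findall(text))
    return parts


sample = [
    '% file header\n',
    '@Preamble{"\\input macros"}',
    '@String{j-X = "Journal of X"}',
    '@InProceedings{a:1, crossref = "conf:1"}',
    '@Book{conf:1, title = "T"}',
    '@Proceedings{p:2, title = "P"}',
    '@Article{b:2, year = 1999}',
]
print(classify(sample))  # [1, 2, 3, 4, 5, 5, 4]
```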

     The sort key of ``@Preamble'' entries is their initial line;
     that of ``@String'' entries is the abbreviation name.  For all
     other BibTeX entries, the sort key is the citation label between
     the open curly brace and the trailing comma, unless the sort
     key is prefixed with additional fields as requested by the
     -byvolume or -byyear options.
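
     As an illustration of a sort key prefixed with additional
     fields, the Python sketch below builds a -byvolume style key
     from an entry that has already been parsed into a dictionary.
     The dictionary layout, the names byvolume_key, leading_int, and
     MISSING, and the placeholder values for missing fields are
     assumptions; the sketch also omits the repeated year field
     listed under -byvolume.

```python
import re

MISSING = 10**9          # hypothetical value that sorts after real numbers
MISSING_TEXT = "\x7f"    # hypothetical text that sorts after printable ASCII
REQUIRED = ("journal", "year", "volume", "number", "pages")


def leading_int(value):
    """Return the leading integer of a field value, or MISSING."""
    match = re.match(r"\s*(\d+)", value or "")
    return int(match.group(1)) if match else MISSING


def byvolume_key(entry):
    """Return (sort_key, warnings) for a parsed entry dictionary."""
    warnings = ["%s: missing %s" % (entry["label"], name)
                for name in REQUIRED if not entry.get(name)]
    key = (
        (entry.get("journal") or MISSING_TEXT).lower(),
        leading_int(entry.get("year")),
        leading_int(entry.get("volume")),
        leading_int(entry.get("number")),
        leading_int(entry.get("pages")),
        entry["label"].lower(),   # letter case is ignored in sorting
    )
    return key, warnings


first = {"label": "B99", "journal": "j-CACM", "year": "1999",
         "volume": "42", "number": "3", "pages": "10--20"}
second = {"label": "A99", "journal": "j-CACM", "year": "1999",
          "volume": "42", "pages": "5--9"}
ordered = sorted([second, first], key=lambda e: byvolume_key(e)[0])
print([e["label"] for e in ordered])  # ['B99', 'A99']
print(byvolume_key(second)[1])        # ['A99: missing number']
```

     The entry without an issue number sorts after the one that has
     it, which is the separation effect that -bypages avoids by
     leaving the issue number out of the key.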

     bibsort will correctly handle UNIX	files with LF line termi-
     nators, as	well as	IBM PC DOS files with CR LF line termina-
     tors; the essential requirement is	that input lines be  del-
     ineated by	LF characters.	Thus, files from the Apple Macin-
     tosh, which uses bare CR to  terminate  lines,  would  first
     have  to  be  converted to	UNIX or	PC DOS line format before
     giving them to bibsort.
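
     A file with bare CR line terminators can be converted before
     sorting.  The Python sketch below is one way to do so; the
     function name to_lf_delimited is hypothetical, and existing
     CR LF pairs are deliberately left untouched, since the text
     above says they are already acceptable.

```python
import re


def to_lf_delimited(data: bytes) -> bytes:
    """Replace each bare CR with LF, leaving CR LF pairs unchanged."""
    return re.sub(rb"\r(?!\n)", b"\n", data)


print(to_lf_delimited(b"a\rb\r\nc\r"))  # b'a\nb\r\nc\n'
```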

CAVEATS
     BibTeX has	loose syntactical requirements that  the  current
     simple  implementation of bibsort does not	support.  In par-
     ticular, outer parentheses	may  not  be  used  in	place  of
     braces  following ``@keyword'' patterns.  If you have such	a
     file, you can use bibclean(1) to prettyprint it into a  form
     that bibsort can handle successfully.

     The user must be aware that sorting a  bibliography  is  not
     without peril, for	at least these reasons:

	  1.  BibTeX has a requirement that entry labels given in
	      crossref = label pairs in	a bibliography entry must
	      refer to entries defined later,  rather  than  ear-
	      lier,  in	 the bibliography file.	 This regrettable
	      implementation limitation	of the current	(pre-1.0)
	      BibTeX  prevents arbitrary ordering of entries when
	      crossref values are present.   To	 partially  solve
	      this  problem,  bibsort will place ``@Proceedings''
	      entries last,  since  they  are  frequently  cross-
	      referenced by ``@InProceedings'' entries.	 However,
	      it is also possible for ``@Book'', ``@InBook'', and
	      ``@InCollection''	   entries   to	  cross-reference
	      ``@Book''	entries, and for ``@Article'' entries  to
	      cross-reference other ``@Article'' entries.  Neither
	      of these cases is dealt with by bibsort, except that
	      ``@Book'' entries that contain a ``booktitle''
	      assignment, and entries that are explicitly
	      cross-referenced before their definition, are sorted
	      with ``@Proceedings''.

	  2.  If the BibTeX file contains interspersed commentary
	      between  ``@keyword{...}''  entries,  this material
	      will be considered part of the preceding entry, and
	      will be sorted with it.  Leading commentary is more
	      common, and will be moved	elsewhere in the file.

	      This is normally not  a  problem	for  the  part	1
	      material before the ``@Preamble'', since it is kept
	      together at the beginning	of the output stream.

	  3.  Some kinds of bibliography files should be kept  in
	      a	 different  order than alphabetically by citation
	      labels.  Good examples are a bibliography	file with
	      the  contents  of	a journal, or a	personal publica-
	      tion list, for both of which chronological publica-
	      tion order is likely to be preferred.

     While a much more sophisticated  implementation  of  bibsort
     could  deal  with	the first point, and the -byvolume option
     provides a	partial	solution to the	third point, in	 general,
     a	satisfactory  solution	requires  human	 intelligence and
     natural language understanding that computers lack.

     bibsort uses the characters with octal values 001 through 007,
     0177, and 0377 for temporary modifications of the input
     stream.  If any of these are already present in the input,
     they will be altered on output.  This is unlikely to be a
     problem, because those characters have no printable
     representation and are not conventionally used to mark line
     or page boundaries in text files.
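
     A file can be checked in advance for the characters that
     bibsort reserves.  The Python sketch below is a hypothetical
     pre-flight check, not part of bibsort; the names RESERVED and
     reserved_bytes_present are made up for illustration.

```python
# Octal 001 through 007, plus 0177 and 0377, as listed above.
RESERVED = frozenset(range(0o001, 0o010)) | {0o177, 0o377}


def reserved_bytes_present(data: bytes):
    """Return the sorted byte values in data that bibsort reserves."""
    return sorted(set(data) & RESERVED)


print(reserved_bytes_present(b"abc\x01\xffx"))  # [1, 255]
print(reserved_bytes_present(b"plain\n\ttext"))  # []
```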

PROGRAMMING NOTES
     Some text editors permit application of an	arbitrary  filter
     command  to a region of text.  For	example, in GNU	emacs(1),
     the   command   C-u    M-x	   shell-command-on-region,    or
     equivalently,  C-u	 M-|,  can  be	used  to run bibsort on	a
     region of the buffer that is devoid of cross references  and
     other material that cannot	be safely sorted.

     Some  implementations  of	BibTeX	editing	 support  in  GNU
     emacs(1)  have  a	sort-bibtex-entries command that is func-
     tionally similar to bibsort.  However, the	 file  size  that
     can  be  processed	by emacs(1) is limited,	while bibsort can
     be	used on	arbitrarily large  files,  since  it  acts  as	a
     filter,  processing  a  small amount of data at a time.  The
     sort stage	needs the entire data  stream,	but  fortunately,
     the  UNIX sort(1) command is clever enough	to deal	with very
     large inputs.

     The current implementation of bibsort follows the UNIX
     tradition of combining simple already-available tools.  A
     six-stage pipeline of egrep(1), nawk(1), sort(1), and tr(1)
     accomplishes the job in one pass with about 800 lines of
     heavily-commented shell script, about 400 lines of which are
     a nawk(1) program for insertion of sort keys.  The initial
     prototype of bibsort was written and tested on several large
     bibliographies  in	a couple of hours, and after considerable
     use, was later extended with advanced  sorting  capabilities
     and cross-reference recognition in	a couple of days of work.
     By	contrast, bibtex(1) is more than 11 000	lines of code and
     documentation,  and  bibclean(1)  is  more	than 15	000 lines
     long; both	took months to develop,	implement, and test.

BUGS
     bibsort may fail on  some	UNIX  systems  if  their  sort(1)
     implementations  cannot  handle very long lines, because for
     sorting purposes, each complete bibliography entry	 is  tem-
     porarily  folded  into  a	single	line.  You may be able to
     overcome this problem by adding  a	 -znnnnn  option  to  the
     sort(1)  command (passed via the command line to bibsort) to
     increase the maximum line size to some larger value of nnnnn
     bytes.   According	to their documentation,	some UNIX sort(1)
     implementations require a space after -z, others forbid  it,
     and  still	 others	 do not	support	it at all.  If a space is
     required, you must	quote the  pair,  to  prevent  the  nnnnn
     value from	being interpreted as a filename	by bibsort.
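
     Because each entry is folded into a single line before
     sorting, the longest folded entry determines the line length
     that sort(1) must handle.  The Python sketch below illustrates
     the fold, sort, and unfold idea and reports that length.  It is
     not the bibsort pipeline: the marker byte chosen for newlines,
     the function names, and sorting on the whole folded line rather
     than on an inserted sort key are assumptions.

```python
NEWLINE_MARK = "\x01"  # hypothetical stand-in for a line break


def fold(entry):
    """Fold a multi-line entry into one line."""
    return entry.rstrip("\n").replace("\n", NEWLINE_MARK)


def unfold(line):
    """Restore the line breaks of a folded entry."""
    return line.replace(NEWLINE_MARK, "\n") + "\n"


def sort_folded(entries):
    """Return (sorted_entries, longest_folded_line_length)."""
    folded = sorted((fold(e) for e in entries), key=str.lower)
    longest = max((len(line) for line in folded), default=0)
    return [unfold(line) for line in folded], longest


entries = ["@Book{b,\n  year = 1990\n}\n",
           "@Article{A,\n  year = 1980\n}\n"]
result, longest = sort_folded(entries)
print(result[0].splitlines()[0])  # @Article{A,
```

     The reported length gives a lower bound for any line-size
     value passed to sort(1).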

SEE ALSO
     bibcheck(1),  bibclean(1),	 bibdup(1),  bibextract(1),  bib-
     join(1),  biblabel(1),  biblex(1),	biborder(1), bibparse(1),
     bibsearch(1), bibtex(1), bibunlex(1), citesub(1),	egrep(1),
     emacs(1), gawk(1),	mawk(1), nawk(1), sort(1), tr(1).

AUTHOR
     Nelson H. F. Beebe, Ph.D.
     Center for	Scientific Computing
     University	of Utah
     Department	of Mathematics,	322 INSCC
     155 S 1400	E RM 233
     Salt Lake City, UT	84112-0090
     USA
     Tel: +1 801 581 5254
     FAX: +1 801 585 1640, +1 801 581 4148
     Email: beebe@math.utah.edu, beebe@acm.org,	beebe@ieee.org (Internet)
     WWW URL: http://www.math.utah.edu/~beebe

AVAILABILITY
     bibsort is	freely available; its master distribution can  be
     found at

	  ftp://ftp.math.utah.edu/pub/tex/bib/

     in	the file bibsort-x.yy.tar.gz where x.yy	 is  the  current
     version.	Other  distribution formats are	usually	available
     in	the same location.

     That site is mirrored to several other Internet archives, so
     you  may  also be able to find it elsewhere on the	Internet;
     try searching for the string bibsort at one or more  of  the
     popular Web search	sites, such as

	  http://altavista.digital.com/
	  http://search.microsoft.com/us/default.asp
	  http://www.dejanews.com/
	  http://www.dogpile.com/index.html
	  http://www.euroseek.net/page?ifl=uk
	  http://www.excite.com/
	  http://www.go2net.com/search.html
	  http://www.google.com/
	  http://www.hotbot.com/
	  http://www.infoseek.com/
	  http://www.inktomi.com/
	  http://www.lycos.com/
	  http://www.northernlight.com/
	  http://www.snap.com/
	  http://www.stpt.com/
	  http://www.yahoo.com/



















Version	0.14	  Last change: 08 October 1999			8



