


BIBJOIN(1)		  User Commands		       BIBJOIN(1)



NAME
     bibjoin - join duplicate or  similar  entries  in	a  BibTeX
     bibliography file

SYNOPSIS
     bibjoin [-check-missing] [-version] < infile > outfile
     or
     bibjoin [-check-missing] [-version] BibTeXfile(s) > outfile

DESCRIPTION
     bibjoin  filters  one  or	more  BibTeX  bibliographies,  or
     bibliography  fragments,  from  the specified files, or from
     its standard input	if no filenames	are provided, printing on
     standard  output a	bibliography in	which adjacent duplicate,
     or	similar, entries have been joined into one  entry.   Such
     action  may  be necessary when bibliography entries are col-
     lected from many sources.

     bibjoin should be applied to a bibliography file only  after
     entries  have  been  suitably ordered so that candidates for
     joining appear consecutively.  This can be done mostly
     automatically by first generating standardized citation
     labels, and then sorting the bibliography by citation label,
     for example with bibsort(1).

     Only a human reader can reliably decide when two  bibliogra-
     phy  entries  are truly the same.	bibjoin	can help automate
     much of this work,	but manual editing will	almost	certainly
     still be necessary.  Two entries are joined only if all of
     the following conditions are satisfied:

	  o  identical citation	labels;

	  o  identical year;

	  o  if	a journal article entry, identical volume, and if
	    both   have	 page  numbers,	 identical  initial  page
	    numbers.
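
     The three conditions above can be expressed as a small
     predicate.  This is a minimal Python sketch, not bibjoin's
     own code: the dictionary layout (`type', `label', and the
     field names as keys) and the function names are assumptions
     made for illustration, and the sketch assumes that the
     volume and page tests apply only when both entries are
     journal articles.

```python
def initial_page(pages):
    """Return the text before the first hyphen of a pages value."""
    return pages.split("-", 1)[0].strip()


def may_join(a, b):
    """Return True if entries a and b meet the stated join conditions."""
    # Condition 1: identical citation labels.
    if a["label"] != b["label"]:
        return False
    # Condition 2: identical year.
    if a.get("year") != b.get("year"):
        return False
    # Condition 3 (assumed to apply when both are journal articles):
    # identical volume and, if both have pages, identical initial page.
    if a["type"].lower() == "article" and b["type"].lower() == "article":
        if a.get("volume") != b.get("volume"):
            return False
        pa, pb = a.get("pages"), b.get("pages")
        if pa and pb and initial_page(pa) != initial_page(pb):
            return False
    return True
```

     An entry with no `pages' value still passes the third test,
     because the initial page numbers are compared only when both
     entries supply them.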

     When two `equal' value strings are found for the same key,
     one of them is deleted.  Otherwise, both key/value pairs are
     output.  Manual editing will then be required to choose
     between them.

     Special handling is supplied for `pages' values.  When
     entries are found with identical initial page numbers, but
     some of them have question marks in place of the final page
     number, or have no final page number at all, those values
     are dropped.  For example, given "123--127", "123--??", and
     "123", only "123--127" is retained.  This facilitates
     merging in data from library databases that do not record
     final page numbers.
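
     The `pages' rule can be sketched in Python as follows.  This
     is an illustration under stated assumptions, not the actual
     implementation: the function names are hypothetical, and the
     sketch keeps every value when no candidate with the same
     initial page has a real final page, a case the text above
     does not specify.

```python
def page_parts(pages):
    """Split a pages value into (initial, final); final may be empty."""
    first, _, last = pages.partition("-")
    return first.strip(), last.lstrip("-").strip()


def has_final_page(pages):
    """True if the value has a final page that is not just question marks."""
    _, last = page_parts(pages)
    return last != "" and set(last) != {"?"}


def merge_pages(values):
    """Drop incomplete values when a complete one shares the initial page."""
    complete = {page_parts(v)[0] for v in values if has_final_page(v)}
    return [v for v in values
            if has_final_page(v) or page_parts(v)[0] not in complete]
```

     For instance, merge_pages(["123--127", "123--??", "123"])
     keeps only "123--127".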




Version 0.02     Last change: 16 September 1996

     Special handling is also supplied for `author' and `editor'
     fields.  When a personal name appears in two forms, one with
     initials, and one without, such as `P. D. Q. Bach' and
     `Philippe D. Q. Bach', the names are considered to match, and
     the longer form is retained.  In addition, to deal with the
     UnCover database practice of omitting authors 4, 5, ..., N-1,
     two author/editor personal name lists are considered to match
     if one has 3 names and the other more than 3, and the first,
     second, and last match as above; the longer form is
     retained.
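
     A minimal Python sketch of these two name rules follows.  It
     is a simplification under assumptions: names are taken to be
     in `First Last' order and already split into a list, a name
     part matches if it is identical or if one side is a single
     initial of the other, and all function names are
     hypothetical.  bibjoin's real matching may differ in detail.

```python
def part_match(x, y):
    """Match two name parts, treating a one-letter part as an initial."""
    x, y = x.rstrip(".").lower(), y.rstrip(".").lower()
    if x == y:
        return True
    short, long_ = sorted((x, y), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_match(a, b):
    """Match two personal names part by part."""
    pa, pb = a.split(), b.split()
    return len(pa) == len(pb) and all(
        part_match(x, y) for x, y in zip(pa, pb))


def lists_match(la, lb):
    """Match two name lists, allowing the 3-name truncated form."""
    if len(la) == len(lb):
        return all(names_match(x, y) for x, y in zip(la, lb))
    short, long_ = sorted((la, lb), key=len)
    if len(short) == 3 and len(long_) > 3:
        # Compare first, second, and last names only.
        return (names_match(short[0], long_[0])
                and names_match(short[1], long_[1])
                and names_match(short[2], long_[-1]))
    return False


def longer(a, b):
    """Return the longer of two matching forms (the one retained)."""
    return max(a, b, key=len)
```

     With these definitions, `P. D. Q. Bach' and `Philippe D. Q.
     Bach' match, and longer() returns the second form.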

     Value strings are considered equal	if they	match  after  all
     non-alphanumeric  characters are removed, and letter case is
     ignored.  This choice helps to eliminate many match failures
     that  arise  from	minor variations in punctuation, spacing,
     and capitalization.  bibjoin has no way of	determining which
     of	the two	strings	should be preserved, so	it uniformly dis-
     cards the shorter one (which presumably has  less	`informa-
     tion'): this choice will frequently be wrong!
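
     The comparison described above can be sketched in a few
     lines of Python.  The function names are hypothetical, and
     Python's str.isalnum() is used here as a stand-in for
     whatever character test bibjoin applies.

```python
def normalize(value):
    """Remove non-alphanumeric characters and ignore letter case."""
    return "".join(ch for ch in value if ch.isalnum()).lower()


def join_values(v1, v2):
    """Return the value strings to output for one key."""
    if normalize(v1) == normalize(v2):
        # Considered equal: the shorter string is discarded.
        return [max(v1, v2, key=len)]
    # Not equal: both are kept for later manual editing.
    return [v1, v2]
```

     Because only the string length decides which form survives,
     a longer but less accurate value wins over a shorter correct
     one, which is the risk noted above.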

     Syntax errors in the input	stream will cause abrupt termina-
     tion  with	 a  fatal error	message	and a non-zero exit code.
     The output	will be	incomplete, so you should always  examine
     the  output  file	before	assuming that you can replace the
     input file	with the output	file.

     Key/value pairs in	output entries are sorted  alphabetically
     by	 key  name,  so	that duplicate keys arising from the join
     operation appear consecutively, simplifying  the  subsequent
     manual editing task.
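
     As an illustration, the ordering can be obtained with a
     stable sort on the key name, so that duplicate keys end up
     adjacent without changing their relative order.  This Python
     sketch assumes fields are held as (key, value) pairs and
     that key names compare without regard to letter case; both
     are assumptions, not documented behavior.

```python
def sort_fields(fields):
    """Sort (key, value) pairs by key name, keeping duplicates in order."""
    # sorted() is stable, so equal keys keep their original order.
    return sorted(fields, key=lambda pair: pair[0].lower())
```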

     After completion of manual	corrections,  it  is  recommended
     that  the bibliography be processed by biborder(1)	to stand-
     ardize key/value order,  and  to  check  for  any	remaining
     duplicate keys or citation	labels.

OPTIONS
     Command-line options may be abbreviated to	a unique  leading
     prefix.

     To	avoid confusion	with options, if a filename begins with	a
     hyphen,  it must be disguised by a	leading	absolute or rela-
     tive directory path, e.g.	/tmp/-foo.bib or ./-foo.bib.
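
     The two rules above can be illustrated with a short Python
     sketch.  It follows the unique-leading-prefix rule as
     documented here and is not taken from the bibjoin source;
     the function names are hypothetical.

```python
OPTIONS = ("-check-missing", "-version")  # the options documented here


def expand_option(arg):
    """Expand an abbreviated option to its full name, or raise ValueError."""
    matches = [opt for opt in OPTIONS
               if len(arg) > 1 and opt.startswith(arg)]
    if len(matches) != 1:
        raise ValueError("unrecognized option: " + arg)
    return matches[0]


def classify_args(args):
    """Split command-line arguments into (options, filenames)."""
    options, filenames = [], []
    for arg in args:
        if arg.startswith("-"):
            options.append(expand_option(arg))
        else:
            # A name such as ./-foo.bib starts with ".", not "-".
            filenames.append(arg)
    return options, filenames
```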

     -check-missing   If this option is specified, missing
		      expected key fields will be supplied, with
		      the key field name prefixed with OPT, and
		      the value string set to a pair of question
		      marks, e.g.
			OPTvolume =    "??",
		      The OPT prefix ensures that the key is
		      ignored by BibTeX, so that the question
		      marks will not appear in an output .bbl
		      file.  The GNU Emacs bibtex-mode editing
		      support has functions for removing the OPT
		      prefixes, and so does bibclean(1).

		      The  doubled  question   marks   are   dis-
		      tinguished from single ones that might leg-
		      itimately	appear in value	strings, and also
		      serve  as	 a  convenient regular-expression
		      pattern for  bibextract(1),  allowing  easy
		      preparation  of  a  printed listing of just
		      those entries that have incomplete  biblio-
		      graphic data:
			   bibextract '' '[?][?]' BibTeXfiles |	 lpr
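
     The effect of -check-missing can be sketched in Python as
     shown here.  The table of expected fields is hypothetical
     and covers one entry type only; the set of fields that
     bibjoin actually expects is not listed in this manual page.
     The function names are likewise assumptions.

```python
import re

# Hypothetical table: expected key fields per entry type.
EXPECTED = {
    "article": ["author", "journal", "pages", "title", "volume", "year"],
}

INCOMPLETE = re.compile(r"[?][?]")  # the doubled question-mark pattern


def supply_missing(entry_type, fields):
    """Return a copy of fields with OPT placeholders for missing keys."""
    out = dict(fields)
    expected = EXPECTED.get(entry_type.lower())
    if expected is None:
        return out  # unrecognized type: output without checking
    for key in expected:
        if key not in out and "OPT" + key not in out:
            out["OPT" + key] = "??"
    return out


def is_incomplete(fields):
    """True if any value contains a pair of question marks."""
    return any(INCOMPLETE.search(value) for value in fields.values())
```

     A single question mark in a value, as in a title that ends
     with one, does not match the doubled pattern.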

     -version	       Display the  bibjoin  version  number  and
		      date on stdout, and then exit immediately.

WARNING	AND ERROR MESSAGES
     bibjoin will issue	warning	messages in the	following cases:

     o	With -check-missing, for unrecognized BibTeX entry types.
       The  entry will be output without checking for missing key
       names.

     o	For duplicate key names.  Such key/value pairs are sorted
       together	by name, preserving their original order.

     bibjoin will issue	an error message and terminate with  exit
     code 1, and incomplete output, in the following cases:

     o	For an unrecognized command-line argument (only	the first
       letter of each option is	currently examined).

     o	End-of-file is reached while collecting	an entry.

     o	A line beginning with `@' is encountered while collecting
       an entry, before	balanced braces	have been found.
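
     The last two error cases can be illustrated by a simple
     brace-counting collector.  This Python sketch is an
     assumption about how such checks might work, not bibjoin's
     actual code: it ignores text between entries, does not treat
     braces inside quoted strings specially, and uses a
     hypothetical exception class where bibjoin would print a
     message and exit with code 1.

```python
class BibSyntaxError(Exception):
    """Raised where this sketch stands in for bibjoin's fatal errors."""


def collect_entries(lines):
    """Group input lines into entries by balancing braces."""
    entries, current, depth, opened = [], None, 0, False
    for number, line in enumerate(lines, 1):
        if current is None:
            if not line.startswith("@"):
                continue  # text between entries is skipped here
            current, depth, opened = [], 0, False
        elif line.startswith("@"):
            raise BibSyntaxError(
                "line %d: `@' found before braces balanced" % number)
        current.append(line)
        depth += line.count("{") - line.count("}")
        opened = opened or "{" in line
        if opened and depth == 0:
            entries.append("".join(current))
            current = None
    if current is not None:
        raise BibSyntaxError("end-of-file while collecting an entry")
    return entries
```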

CAVEATS
     BibTeX permits looser syntax than the current simple
     implementation of bibjoin supports.  In particular, outer
     parentheses may not be used in place of braces following
     ``@keyword'' patterns.  If you have such a file, you can use
     bibclean(1) to prettyprint it into a form that bibjoin can
     handle successfully.

SEE ALSO
     bibcheck(1), bibclean(1), bibdup(1), bibextract(1),
     biblabel(1), biblex(1), biborder(1), bibparse(1), bibsort(1),
     bibtex(1), bibunlex(1), citesub(1), emacs(1).




AUTHOR
     Nelson H. F. Beebe, Ph.D.
     Center for	Scientific Computing
     Department	of Mathematics
     University	of Utah
     Salt Lake City, UT	84112
     Tel: +1 801 581 5254
     FAX: +1 801 581 4148
     Email: <beebe@math.utah.edu>
     WWW URL: http://www.math.utah.edu/~beebe












































