%% /u/sy/beebe/tex/bibcheck-0.09/README, Sat Oct 14 17:22:18 1995
%% Edit by Nelson H. F. Beebe <beebe@plot79.math.utah.edu>

=================
Table of Contents
=================

	Jump start
	Bug and problem reports
	Background literature
	Copyright issues
	Distribution contents
	Installation
	UNIX systems
		TO-BE-UPDATED: IBM PC DOS
		TO-BE-UPDATED: DEC Alpha OpenVMS
		TO-BE-UPDATED: DEC VAX VMS
	Performance
	Code optimization
	Details of UNIX installation attempts


==========
Jump start
==========

As with most GNUware, you can build, test, and install this program on
most UNIX systems by these simple steps:

csh et amici:
	setenv CC ...your favorite C or C++ compiler...
	./configure && make all check install

sh et amici:
	CC=...your favorite C or C++ compiler...
	export CC
	./configure && make all check install

If you don't set the CC environment variable, then gcc (or cc, if gcc
is not available) will be assumed.

Additional flexibility is provided by environment variables CPP (for
the C preprocessor: not recommended) and LEX (for choosing between lex
and flex; flex is the default, when available).  If you set LEX, it
MUST be set to either lex or flex; do NOT specify a leading pathname,
because it will produce incorrect Makefile and config.h settings, and
incorrect bibcheck execution.
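
The restriction on LEX can be checked before configure is run.  The
following sh sketch is only an illustration: the function name
check_lex is hypothetical and is not part of the distribution.  It
accepts an unset or empty LEX, or the bare names lex and flex, and
rejects anything else, such as a value with a leading pathname.

```shell
# check_lex VALUE: hypothetical helper, not part of the distribution.
# Succeeds if VALUE is empty, "lex", or "flex"; fails otherwise.
check_lex() {
    case "${1-}" in
        "" | lex | flex)
            return 0
            ;;
        *)
            echo "LEX must be lex or flex, without a path: $1" >&2
            return 1
            ;;
    esac
}

# Usage sketch (assumes you are in the top-level source directory):
#   check_lex "${LEX-}" && ./configure && make all check install
```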

If you wish to undo a "make install", just do "make uninstall"; this
will remove any files in system directories put there by "make
install".

See below for further details, and for instructions for non-UNIX
systems.


=======================
Bug and problem reports
=======================

Please send notification of bugs and problems with bibcheck, and news
about ports to any new systems, to the author:

	Nelson H. F. Beebe
	Center for Scientific Computing
	Department of Mathematics
	University of Utah
	Salt Lake City, UT 84112
	USA
	Tel: +1 801 581 5254
	FAX: +1 801 581 4801
	Email: beebe@math.utah.edu (Internet)
	WWW URL: http://www.math.utah.edu/~beebe


=====================
Background literature
=====================

bibcheck is based on biblex, which is one of the tools described in
the following paper:

@String{TUGboat = "TUGboat"}

@Article{Beebe:TB14-4-395-419,
  author =       "Nelson H. F. Beebe",
  title =        "Bibliography Prettyprinting and Syntax Checking",
  journal =      TUGboat,
  year =         "1993",
  volume =       "14",
  number =       "4",
  pages =        "395--419",
  month =        dec,
  bibdate =      "Fri Dec 31 12:15:07 1993",
}

The complete text of the TUGboat article is included with bibclean
2.08 and later versions.

You can find distributions of bibclean in a variety of formats in the
same place that you found the bibcheck distribution.  The
master location is ftp.math.utah.edu:/pub/tex/bib.  Via electronic
mail, a message with the body

        help
        send index from tex/bib

to tuglib@math.utah.edu will get you started.


================
Copyright issues
================

Running
	bibcheck -copyright
will produce:

*********************************************
*********************************************
*** This program is in the PUBLIC DOMAIN. ***
*********************************************
*********************************************


=====================
Distribution contents
=====================

This directory contains bibcheck, a tool for applying heuristic checks
to BibTeX bibliography files.  The contents are:

CHANGELOG		revision history log
Makefile.in		GNU autoconf input file (autoconf NOT needed
			at end-user site)
README			this file
bibcheck.awk		bibcheck prototype in awk
bibcheck.c		bibcheck program in C (also compilable with C++)
bibcheck.hlp		ASCII text file with formatted manual pages in
			VAX VMS HELP format
bibcheck.man		manual pages (nroff/troff input)
bibcheck.ps		PostScript version of typeset manual pages
bibcheck.sh		bibcheck shell script template to run
			bibcheck.awk (automatically customized to
			local site by "make install")
bibcheck.sok		spelling exception dictionary for "make spell"
bibcheck.txt		ASCII text file with formatted manual pages
biblex.c		lex (or flex) output from biblex.l (may be
			usable for manual installation)
biblex.l		Lex program for parsing BibTeX files rigorously
bibyydcl.h		Function prototypes for lex and biblex functions
config.hin		GNU autoconf input file (autoconf NOT needed
			at end-user site)
configure		automatic script for creating Makefile and
			config.h
configure.in		GNU autoconf input file (autoconf NOT needed
			at end-user site)
configure.sed		sed script for patching configure (not used at
			end-user site)
custom.h		support header file
hash.c			hash table support
hash.h			support header file
man2ps			UNIX shell script for directing conversion of
			nroff/troff files to PostScript
rofvms.awk		awk script to convert .txt file to .hlp file
regexp/*		regular-expression support
save/*			copies of Makefile, config.h, and configure
			for bootstrapping and manual installation
strdup.c		string primitive
stricm.c		string primitive
strnic.c		string primitive
test/check001.in	Test BibTeX file with errors for simple
			validation test
test/check001.eok	Expected stderr output of "bibcheck check001.in"
test/check001.ook	Expected stdout output of "bibcheck check001.in"
xalloc.c		memory management functions
xalloc.h		support header file
xctype.h		support header file
xerrno.h		support header file
xstddef.h		support header file
xstdlib.h		support header file
xstring.h		support header file
xtypes.h		support header file
xunistd.h		support header file

Since several public and commercial implementations of nawk are
available for UNIX, IBM PC DOS, and DEC OpenVMS, this code should be
readily usable on most of the world's computers.


============
Installation
============

Starting with version 0.09, bibcheck has been adapted to use the GNU
autoconf automatic configuration system for UNIX installations.

GNU autoconf is run at the author's site to produce the configure
script from configure.in.

The configure script is run at each installer's UNIX site to produce
Makefile from Makefile.in, and config.h from config.hin.  The
configure script is a large (1500+ lines) Bourne shell program that
investigates various aspects of the local C (or C++) implementation, and
records its conclusions in config.h.  Interestingly, its probes
uncovered a bug in one compiler: lcc 3.4b on Sun Solaris 2.x has an
incorrect definition of toupper() in its ctype.h!

autoconf, at least at the current 2.4 version, is not as C++-aware as
it should be.  The Makefile must carry out minor edits of the
configure script to get it to even work with C++ compilers.  The small
test programs run by configure to determine the existence of assorted
Standard C library functions all lead to incorrect conclusions for
config.h, because they intentionally contain function prototypes with
different argument types.  Since C++ functions are compiled into
external names that encode the function and argument types, along with
the function name, these prototypes produce references to non-existent
functions, causing program linking to fail.  Fortunately, I've been
able to fix this problem too with additional automatic edits, all
carried out by "make configure".

Should you do a "make maintainer-clean" [NOT recommended, except at the
author's site], the configure script will be deleted, and you will
need recent versions of both GNU m4 and autoconf correctly installed
to reconstruct things, which can be done this way:

	autoconf	# Regenerate unedited configure
	./configure	# Regenerate config.h and Makefile
	rm configure	# Delete configure
	make configure	# Regenerate edited configure

For convenience and safety, the distribution includes a subdirectory
named save that contains read-only copies of the files Makefile,
config.h, and configure created by autoconf and "make configure".
This will allow recovery from a lost or damaged configure file.
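
Recovery from the save subdirectory might look like the following sh
sketch.  The function name restore_configure is hypothetical.  The
sketch assumes that it is run from the top-level source directory,
and that the saved copy needs write and execute permission restored
after copying, because the saved files are read-only.

```shell
# restore_configure: hypothetical helper, not part of the distribution.
# Copies save/configure over a lost or damaged ./configure.
restore_configure() {
    if [ ! -f save/configure ]; then
        echo "save/configure not found" >&2
        return 1
    fi
    # -f replaces an existing (possibly read-only) configure file.
    cp -f save/configure configure &&
    chmod u+wx configure
}

# Usage sketch:
#   restore_configure && ./configure
```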

Suitable hand-crafted config.h files are provided for non-UNIX
systems, and in the unlikely event of a failure of the configure
script on a UNIX system, config.h can be manually produced from a copy
of config.hin with a few minutes' editing work.  If you do this,
remember to save a copy of your config.h under a different name,
because running configure will destroy it.  If you have GNU autoconf
installed (the installation is very simple and source code is
available from prep.ai.mit.edu:/pub/gnu/autoconf-x.y.tar.gz), you
might try augmenting config.hin instead, then run autoconf and
configure.

Thus, on UNIX, installation normally consists of just two steps
(assuming a csh-compatible shell):

	setenv CC ...your favorite C or C++ compiler...
	./configure && make all check install

If you like, add OPT='your favorite optimization flags' to the make
command; by default, only -g (debug) is assumed.  If your compiler
won't accept -g with other optimization levels, then set CFLAGS
instead of OPT on the command line; be sure NOT to override any
non-optimizing flags in the CFLAGS set in the Makefile.

The GNU standard installation directories /usr/local/bin for binaries,
and /usr/local/man/man1 for manual pages are assumed.  The prefix
/usr/local can be overridden by providing an alternate definition on
the command line:

	make prefix=/some/other/path install
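
As a rough illustration of what the prefix controls, this sh sketch
prints the two installation directories for a given prefix.  The
function name install_dirs is hypothetical, and the mapping to
prefix/bin and prefix/man/man1 is an assumption based on the defaults
named above; consult the Makefile for the actual definitions.

```shell
# install_dirs [PREFIX]: hypothetical helper, not part of the Makefile.
# Prints the binary and manual-page directories for PREFIX,
# which defaults to /usr/local.
install_dirs() {
    prefix=${1:-/usr/local}
    echo "$prefix/bin"
    echo "$prefix/man/man1"
}

install_dirs /some/other/path
```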

After installation, you can do
	make distclean
to restore the directories to their distribution state.  You should
also do this between builds for different architectures from the same
source tree; neglecting to do so will almost certainly lead to
failure, because the config.cache file created by configure will lead
to an incorrect config.h for the next build.


============
UNIX systems
============

The code can be compiled with either C (K&R or ISO/ANSI Standard C) or
C++ compilers; the Makefile contains several different suitable
settings of the CC macro.

On UNIX systems, the only changes that you are likely to need in the
Makefile are the settings of CC and CFLAGS, and possibly, DEFINES, and
if you wish to do "make install", the settings of bindir, MANDIR, and
MANEXT.

If you are installing bibcheck on a new system, you should definitely
run "make check" before installing it on your system.  This will
perform some simple validation checks on bibcheck; the only output
expected is the test name.  Any differences should be reported to the
author, if the cause cannot be determined locally.
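
If "make check" reports a difference, it can help to repeat the
comparison by hand.  The sh sketch below is an assumption about how
such a check can be done, not a copy of the Makefile rule: it runs the
program on a test file and compares stdout and stderr with the
expected .ook and .eok files from the test subdirectory.  The function
name run_check and the .out and .err file names are hypothetical.

```shell
# run_check PROGRAM NAME: hypothetical helper, not the Makefile's rule.
# PROGRAM must be a full path, because the run happens inside test/.
run_check() {
    prog=$1
    name=$2
    (
        cd test || exit 1
        # The exit status is ignored here; only the output is compared.
        "$prog" "$name.in" >"$name.out" 2>"$name.err" || :
    ) || return 1
    # Succeed only if both output streams match the expected files.
    cmp -s "test/$name.out" "test/$name.ook" &&
    cmp -s "test/$name.err" "test/$name.eok"
}

# Usage sketch (the path to bibcheck is an example only):
#   run_check "$PWD/bibcheck" check001 || echo "check001 differs"
```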

The code has been tested under more than 50 different C and C++
compilers, and is in regular use to maintain the TeX Users Group
bibliography collection stored on ftp.math.utah.edu:/pub/tex/bib, as
well as several other local bibliographies.  These files total more
than 200K lines and 18K bibliography entries.  Some of these
bibliographies are mirrored to the Comprehensive TeX Archive Network
(CTAN) hosts.  Do "finger ctan@pip.shsu.edu" to find a CTAN site on
the Internet near you.

bibcheck is also used for the BibNet Project, which collects
bibliographies in numerical analysis.  The master collection is
available on ftp.math.utah.edu:/pub/bibnet, and is mirrored from there
to netlib servers at AT&T and Oak Ridge National Laboratory.

If you port bibcheck to a new system, please select maximal error and
warning messages in your compiler, to better uncover problems.  If you
find massive numbers of errors complaining about function and argument
type mismatches, it is likely that this can be remedied by suitable
modifications of config.h.  As C implementations move towards
conformance with the December 1989 ISO/ANSI C Language Standard, the C
language is a moving target that must be tracked by config.h, which is
why that file is normally automatically generated on UNIX systems by
the configure script.  With C compilers, you can safely ignore
complaints about implicit declaration of library functions; they are
caused by deficiencies in the vendor-provided header files.

If you have a C++ compiler, please try that as well.  This code has
been successfully compiled under at least 12 C++ compilers, and the
stricter type checking has uncovered problems that slipped past other
compilers.

bibcheck has been (mostly successfully) built and tested with C and
C++ compilers, with both lex and flex, on these systems for the 0.09
release:

	DECstation 5000		ULTRIX 4.2	cc, gcc, g++, lcc
	DEC Alpha		OSF/1 3.0, 3.2c	cc, c89, cxx, gcc, g++
	HP 9000/380		BSD 4.3		cc, CC, gcc, g++
	HP 9000/735		HP-UX 9.0	cc, c89, CC, gcc, g++
	HP 9000/850		HP-UX 9.04	cc, c89, CC, gcc, g++
	IBM RS/6000		AIX 3.2		cc, c89, xlC, gcc, g++
	Intel 486		Linux 1.3.15	gcc, g++
	MIPS RC6280		RISCos 2.1.1AC	cc
	NeXT 68040		Mach 3.0	cc, gcc, g++
	SGI 4D/210		IRIX 4.0.5c	cc, gcc, lcc
	SGI Indigo/2		IRIX 5.3	cc, CC, gcc, g++, lcc
	SGI Power Challenge	IRIX 6.0.1	cc, CC
	Sun SPARCstation	Solaris 2.3,2.4	cc, CC, gcc, g++, lcc
	Sun SPARCstation	SunOS 4.1.3	acc, cc, CC, gcc, g++

Further details are given below.  Where builds have failed, it is
usually because of (a) conflicts between system header files, or
(b) lex output code that is incompatible with C++.  Neither problem
can be satisfactorily worked around in the distribution, although an
installer could do so at a local site by hand-patching copies of the
problem files.

The author uses the build-all.sh script for these tests; it tries
builds with every known compiler on the development systems.  If your
UNIX system has other compilers that can be tested, please send their
full path names to the author.


TO-BE-UPDATED: ==========
TO-BE-UPDATED: IBM PC DOS
TO-BE-UPDATED: ==========
TO-BE-UPDATED:
TO-BE-UPDATED: The ibmpc subdirectory contains these files for the IBM PC:
TO-BE-UPDATED:
TO-BE-UPDATED: 	bibcheck.exe		bibcheck executable program
TO-BE-UPDATED: 	bibcheck.uue		uuencoded version of bibcheck.exe
TO-BE-UPDATED: 	config.h		hand-coded configuration file
TO-BE-UPDATED: 	makefile.tcc		Makefile for Turbo C/C++ 3.0
TO-BE-UPDATED: 	makefile.msc		Makefile for Microsoft C 5.1, 6.0, 7.0
TO-BE-UPDATED: 				with Microsoft nmake (available in 6.0
TO-BE-UPDATED: 				and 7.0 distributions)
TO-BE-UPDATED: 	msc51bld.bat		Build bibcheck with Microsoft C 5.1
TO-BE-UPDATED: 	msc51pth.bat		Set PATH variable for Microsoft C 5.1
TO-BE-UPDATED: 	msc60bld.bat		Build bibcheck with Microsoft C 6.0
TO-BE-UPDATED: 	msc60pth.bat		Set PATH variable for Microsoft C 6.0
TO-BE-UPDATED: 	msc70bld.bat		Build bibcheck with Microsoft C 7.0
TO-BE-UPDATED: 	msc70pth.bat		Set PATH variable for Microsoft C 7.0
TO-BE-UPDATED: 	tcc20bld.bat		Build bibcheck with Turbo C 2.0
TO-BE-UPDATED: 	tcc20pth.bat		Set PATH variable for Turbo C 2.0
TO-BE-UPDATED: 	tcc30bld.bat		Build bibcheck with Turbo C++ 3.0
TO-BE-UPDATED: 	tcc30pth.bat		Set PATH variable for Turbo C/C++ 3.0
TO-BE-UPDATED: 	ibmtest.bat		Test bibcheck
TO-BE-UPDATED:
TO-BE-UPDATED: The executable program has been compiled with Borland Turbo C and C++
TO-BE-UPDATED: 3.0 under MS DOS 4.0 running on a 25MHz Intel 486/DX board in a Sun
TO-BE-UPDATED: SPARCstation 2 with SunPC.  With the 486 board, SunPC provides DOS on
TO-BE-UPDATED: native hardware.  Without it, or under an additional session, it
TO-BE-UPDATED: provides DOS by emulating the Intel instruction set.  bibcheck has
TO-BE-UPDATED: also been compiled and successfully tested with Turbo C 2.0, but the C
TO-BE-UPDATED: 3.0 compiled version is the one distributed, because it gives the
TO-BE-UPDATED: smallest bibcheck.exe file size:
TO-BE-UPDATED:
TO-BE-UPDATED: 	Turbo C 2.0:	55844
TO-BE-UPDATED: 	Turbo C 3.0:	53936
TO-BE-UPDATED: 	Turbo C++ 3.0:	97874
TO-BE-UPDATED:
TO-BE-UPDATED: [The distribution size may be slightly different; the given sizes are
TO-BE-UPDATED: a snapshot near the end of the 2.05 development.]
TO-BE-UPDATED:
TO-BE-UPDATED: bibcheck has also been successfully built with Microsoft C 5.1 and
TO-BE-UPDATED: 6.0.  Version 5.0 has fatal compiler errors that prevent its use for
TO-BE-UPDATED: this program; I could not find acceptable code workarounds.  I have
TO-BE-UPDATED: been unable to run version 7.0 on my SPARCstation; the compiler dies
TO-BE-UPDATED: with a "runtime error R6018 - unexpected heap error", despite my
TO-BE-UPDATED: having installed the Qualitas 386MAX memory manager that comes with
TO-BE-UPDATED: Microsoft C 7.0.  Thus, although there are .bat files and makefile.msc
TO-BE-UPDATED: for 7.0, they have not been tested, and regrettably, 7.0 C++ cannot be
TO-BE-UPDATED: used.  Each C++ compiler I've tried has exposed new things that need
TO-BE-UPDATED: fixing.  I tried unsuccessfully to install Microsoft C 7.0 on our lab
TO-BE-UPDATED: PCs, but that too failed, because 7.0 needs a 386 with more than 4MB
TO-BE-UPDATED: of memory, and our PCs are old 286 models, sigh...
TO-BE-UPDATED:
TO-BE-UPDATED: The bibcheck executable sizes from Microsoft C are larger than with
TO-BE-UPDATED: Turbo C, so I'm not distributing them:
TO-BE-UPDATED:
TO-BE-UPDATED: 	Microsoft C 5.1:	98601
TO-BE-UPDATED: 	Microsoft C 6.0:	92499
TO-BE-UPDATED:
TO-BE-UPDATED: The Microsoft port gave considerable trouble, because I was unable to
TO-BE-UPDATED: get the stack and data segments to be separated, despite using the
TO-BE-UPDATED: compiler option (-Asfu) to do so.  Compilation would complete
TO-BE-UPDATED: normally, then the linker would complain that stack + data exceeded
TO-BE-UPDATED: 64KB.  Since the Turbo C build had been successful in the compact
TO-BE-UPDATED: memory model, I did not want to go to the large model for Microsoft C.
TO-BE-UPDATED: I finally succeeded by using the -Gt1024 option, which forces objects
TO-BE-UPDATED: larger than 1024 bytes to be placed in a separate section.  This left
TO-BE-UPDATED: room for a stack size of 0xb000 (45056 bytes), which is more than
TO-BE-UPDATED: ample.
TO-BE-UPDATED:
TO-BE-UPDATED: A few small code changes were needed for the Microsoft C
TO-BE-UPDATED: compilers.  In particular, they would not accept const modifiers in
TO-BE-UPDATED: certain (legal) places.  The workaround is to rewrite them as a macro
TO-BE-UPDATED: CONST which expands to an empty string when M_I86 is defined (the only
TO-BE-UPDATED: symbol that appears to uniquely differentiate between C compilers from
TO-BE-UPDATED: Borland, Microsoft, and TopSpeed), and to const otherwise.
TO-BE-UPDATED:
TO-BE-UPDATED: Since bibcheck uses no floating-point arithmetic, and PC DOS has no
TO-BE-UPDATED: shared libraries, I expect that the executable will run on DOS
TO-BE-UPDATED: version 4.0 or later.  It may also run on earlier versions.  At the
TO-BE-UPDATED: time of writing 6.1 is current, and this bibcheck executable works
TO-BE-UPDATED: fine on it.
TO-BE-UPDATED:
TO-BE-UPDATED: To rebuild bibcheck, you will need to adjust directory paths in
TO-BE-UPDATED: makefile.msc, makefile.tcc, msc*.bat and/or tcc*.bat.  The *bld.bat
TO-BE-UPDATED: files can be used to build bibcheck, or if you have Turbo C/C++ 3.0,
TO-BE-UPDATED: you can copy makefile.tcc to makefile and type "make".  With Microsoft
TO-BE-UPDATED: C 5.1 and the 6.0 nmake utility, you can copy makefile.msc to makefile
TO-BE-UPDATED: and do "nmake".  The Microsoft 6.0 compiler is too big to run
TO-BE-UPDATED: underneath nmake; the workaround is fortunately simple:
TO-BE-UPDATED:
TO-BE-UPDATED: 	nmake -n >foo.bat
TO-BE-UPDATED: 	foo
TO-BE-UPDATED:
TO-BE-UPDATED: You should definitely run ibmtest to make sure the newly-built program
TO-BE-UPDATED: is working correctly; you'll need to copy bibcheck.exe and ibmtest.bat
TO-BE-UPDATED: into the top-level bibcheck directory in order for it to find the test
TO-BE-UPDATED: files.  Unlike the UNIX "make check", ibmtest does not require that
TO-BE-UPDATED: latex or bibtex be installed on your system.
TO-BE-UPDATED:
TO-BE-UPDATED: If you don't have uuencode/uudecode for IBM PC DOS, you can get it via
TO-BE-UPDATED: e-mail; for details, send a message with the lines
TO-BE-UPDATED:
TO-BE-UPDATED: 	help
TO-BE-UPDATED: 	send index from support
TO-BE-UPDATED: 	send index from ftp/ibmpc
TO-BE-UPDATED: 	send uuarc.arc from ftp/ibmpc
TO-BE-UPDATED: 	send uuencode.arc from ftp/ibmpc
TO-BE-UPDATED:
TO-BE-UPDATED: to tuglib@math.utah.edu.  Alternatively, you can use anonymous ftp to
TO-BE-UPDATED: ftp.math.utah.edu and fetch the files uuarc.arc or uuencode.arc from
TO-BE-UPDATED: /pub/ibmpc.  uuarc.arc contains only .com executables; uuencode.arc
TO-BE-UPDATED: contains sources, makefiles, and .exe files.
TO-BE-UPDATED:
TO-BE-UPDATED: If your transfer of these files did not translate UNIX LF line
TO-BE-UPDATED: terminators to PC DOS CR LF terminators, the ux2dos and dos2ux
TO-BE-UPDATED: utilities can be of assistance.  You can find their source code in
TO-BE-UPDATED: /pub/ibmpc/dos2ux.shar (via e-mail to tuglib@math.utah.edu, "send
TO-BE-UPDATED: dos2ux.shar from support").
TO-BE-UPDATED:
TO-BE-UPDATED:
TO-BE-UPDATED: =================
TO-BE-UPDATED: DEC Alpha OpenVMS
TO-BE-UPDATED: =================
TO-BE-UPDATED:
TO-BE-UPDATED: The vms/alpha subdirectory contains these files for DEC Alpha OpenVMS:
TO-BE-UPDATED:
TO-BE-UPDATED: 	bibcheck.exe		bibcheck executable program
TO-BE-UPDATED: 	bibcheck.uue		uuencoded version of bibcheck.exe
TO-BE-UPDATED: 	config.h		hand-coded configuration file
TO-BE-UPDATED: 	recomp.com		do @recomp foo to recompile foo.c
TO-BE-UPDATED: 	vmsclean.com		do @vmsclean to cleanup after a build
TO-BE-UPDATED: 	vmsmake.com		do @vmsmake to build bibcheck
TO-BE-UPDATED: 	vmstest.com		do @vmstest to test bibcheck
TO-BE-UPDATED:
TO-BE-UPDATED: You will have to change one line in vmstest.com to define the disk
TO-BE-UPDATED: location of bibcheck.exe in the foreign command symbol for bibcheck.
TO-BE-UPDATED: If you don't have uuencode/uudecode for Alpha OpenVMS, you can get it
TO-BE-UPDATED: via e-mail; for details, send a message with the lines
TO-BE-UPDATED:
TO-BE-UPDATED: 	help
TO-BE-UPDATED: 	send index from support
TO-BE-UPDATED:
TO-BE-UPDATED: to tuglib@math.utah.edu.
TO-BE-UPDATED:
TO-BE-UPDATED: Unlike the UNIX "make check", execution of vmstest.com does not
TO-BE-UPDATED: require that latex or bibtex be installed on your system.  [I didn't
TO-BE-UPDATED: have either on the Alpha OpenVMS system that I built bibcheck on.]
TO-BE-UPDATED:
TO-BE-UPDATED:
TO-BE-UPDATED: ===========
TO-BE-UPDATED: DEC VAX VMS
TO-BE-UPDATED: ===========
TO-BE-UPDATED:
TO-BE-UPDATED: The vms/vax subdirectory contains these files for DEC VAX VMS:
TO-BE-UPDATED:
TO-BE-UPDATED: 	bibcheck.exe		bibcheck executable program
TO-BE-UPDATED: 	bibcheck.uue		uuencoded version of bibcheck.exe
TO-BE-UPDATED: 	config.h		hand-coded configuration file
TO-BE-UPDATED: 	recomp.com		do @recomp foo to recompile foo.c
TO-BE-UPDATED: 	vmsclean.com		do @vmsclean to cleanup after a build
TO-BE-UPDATED: 	vmsmake.com		do @vmsmake to build bibcheck
TO-BE-UPDATED: 	vmstest.com		do @vmstest to test bibcheck
TO-BE-UPDATED:
TO-BE-UPDATED: You will have to change one line in vmstest.com to define the disk
TO-BE-UPDATED: location of bibcheck.exe in the foreign command symbol for bibcheck.
TO-BE-UPDATED: If you don't have uuencode/uudecode for VAX VMS, you can get it via
TO-BE-UPDATED: e-mail; for details, send a message with the lines
TO-BE-UPDATED:
TO-BE-UPDATED: 	help
TO-BE-UPDATED: 	send index from support
TO-BE-UPDATED:
TO-BE-UPDATED: to tuglib@math.utah.edu.
TO-BE-UPDATED:
TO-BE-UPDATED: Unlike the UNIX "make check", execution of vmstest.com does not require
TO-BE-UPDATED: that latex or bibtex be installed on your system.  [I didn't have
TO-BE-UPDATED: either on the VAX VMS system that I built bibcheck on.]
TO-BE-UPDATED:
TO-BE-UPDATED: You will find differences in the vmstest output between testbib1.bok
TO-BE-UPDATED: (correct Sun) and testbib1.bib (VAX VMS); characters with octal values
TO-BE-UPDATED: 211--215 and 240 disappear from the VAX VMS output.  The reason is
TO-BE-UPDATED: that on VAX VMS 5.4 (and likely other versions of VAX VMS) isspace()
TO-BE-UPDATED: from <ctype.h> classifies those characters as spaces.  This problem
TO-BE-UPDATED: does NOT exist on DEC Alpha OpenVMS 1.5.  As long as your .bib files
TO-BE-UPDATED: do not use those six characters, execution should be correct; for
TO-BE-UPDATED: portability, .bib files should restrict themselves to ASCII/ISO-8859
TO-BE-UPDATED: characters in the range 32--127, plus newline and tab.


===========
Performance
===========

The C implementation was based on the version 0.07 prototype in awk.
bibcheck.c is 3.5 times as long as bibcheck.awk.  When the hash table
and regular expression support code, and header files, are included,
the C code total rises to 8482 lines, compared to 378 lines of awk, a
factor of 22.4.

The C version is faster: on jacm.bib (a bibliography of the Journal of
the ACM), it runs 3.04 times faster than the stream

	biblex <../bib/jacm.bib | time nawk -f bibcheck.awk  >foo.old

on a Sun SPARCstation LX entry-level workstation running Solaris 2.3,
using Sun C++ compilation with -O4 (highest) optimization.  On a
high-end HP 9000/735 with C++ +O3 compilation on HP-UX 9.0, the
speedup is only 1.51.  On an entry-level DEC Alpha 3000/300LX system
with C++ -O2 compilation on OSF/1 3.0, the speedup is 1.85.

Profiling of the C implementation shows that major portions of time
are spent in regexec() (and its descendants) and strchr(), neither of
which can be sped up much.  The author of the regexp package used in
bibcheck has already spent a good deal of effort optimizing the code,
particularly for the common cases of simple regular expressions.

Here is part of a profile from the HP 9000/735 compilation using
jacm.bib as test input:

%time cumsecs seconds   calls   msec/call  name
 34.0   15.10   15.10                     _mcount
 26.5   26.87   11.78  5329777       0.00 regmatch(char*)
 10.5   31.54    4.67  5238794       0.00 _strchr
  8.9   35.49    3.96 12323351       0.00 regnext(char*)
  7.9   38.98    3.48  4324771       0.00 regtry(regexp*,const char*)
  2.5   40.11    1.13   842028       0.00 _strlen
  1.5   40.79    0.68   226627       0.00 regexec(regexp*,const char*)
  1.5   41.45    0.67   186771       0.00 yylook
  0.8   41.80    0.35   368481       0.00 stricmp(const char*,const char*)
  0.6   42.08    0.28   695279       0.00 regrepeat(char*)
  0.5   42.28    0.20   470573       0.00 next_char(void)
  0.4   42.47    0.19    65304       0.00 hash_lookup(const char*,hash_table*)
  0.3   42.61    0.14    65304       0.00 hash(const char*,const hash_table*)
  0.3   42.73    0.12    27152       0.00 _doprnt
  0.3   42.85    0.12    15750       0.01 out_string(void)
  0.2   42.96    0.11   186771       0.00 yylex
  0.2   43.05    0.09   360056       0.00 __toupper
  0.2   43.12    0.07      122       0.57 read
...

The _mcount function is part of the profiling software; it usually
accounts for the largest fraction of time.


=================
Code optimization
=================

An experiment was made with five different C and C++ compilers on a
Sun SPARCstation LX running Solaris 2.3, to see what the effect of
code optimization might be.  All compilers are recent releases (late
fall, 1994):
	gcc	2.6.0
	g++	2.6.0
	cc	3.0.1
	c++	3.0.1
	lcc	3.1

Here are the results, sorted in order of increasing CPU time, using
two large bibliographies (you need a display 150 characters wide to
view these tables).  Where possible, procedure inlining was requested
for functions known from profiling to be important, and for all but
lcc, code generation was requested for the more recent SPARC Version 8
architecture, which added integer multiply and divide instructions.

Two additional tests, indicated below with *********, were made with
elimination of the -mv8 option of the fastest case; the loss of
integer multiply and divide instructions slows the code by about 7%.

Finally, two tests, indicated below with #########, were made with
elimination of function inlining; this slows the code by about 2%.
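
The percentages quoted above can be recomputed from the user+sys
column of the tables.  The sh sketch below is only a convenience for
that arithmetic; the function name slowdown_pct is hypothetical, and
it assumes that awk is available.  The sample values are the cacm.bib
timings for the fastest gcc build (157), the build without -mv8
(167.6), and the build without inlining (159.5).

```shell
# slowdown_pct BASE SLOWER: hypothetical helper.
# Prints how much slower SLOWER is than BASE, as a percentage.
slowdown_pct() {
    awk -v base="$1" -v slow="$2" \
        'BEGIN { printf "%.1f\n", 100 * (slow - base) / base }'
}

slowdown_pct 157 167.6    # without -mv8: prints 6.8
slowdown_pct 157 159.5    # without inlining: prints 1.6
```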

======================================================================================================================================================
----------Time (sec)----------			cacm.bib (1548KB, 43807 lines, 2699 bibliographic entries)
real	user	sys   user+sys  make command
======================================================================================================================================================
157.7	156.7	0.3	157	make OPT=-O2\ -finline-functions\ -mv8 CC='gcc -D__solaris'
157.8	156.7	0.3	157	make OPT=-O3\ -finline-functions\ -mv8 CC='gcc -D__solaris'
160.2	159.1	0.4	159.5	make OPT=-O1\ -finline-functions\ -mv8 CC='gcc -D__solaris'
160.1	159.0	0.5	159.5	make OPT=-O2\ -mv8 CC='gcc -D__solaris' #########
168.8	167.3	0.3	167.6	make OPT=-O1\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
168.1	167.1	0.5	167.6	make OPT=-O2\ -finline-functions CC='gcc -D__solaris' *********
169.4	168.2	0.4	168.6	make OPT=-xO2\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
175.4	174.2	0.3	174.5	make OPT=-xO3\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
176.5	175.2	0.4	175.6	make OPT=-xO2\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
182.5	181.4	0.3	181.7	make OPT=-O3\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
182.6	181.4	0.3	181.7	make OPT=-O2\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
189.4	188.5	0.3	188.8	make OPT=-xO3\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
213	211.9	0.3	212.2	make CC='lcc -A -A -D__solaris'
250.7	249.3	0.3	249.6	make OPT=-g\ -finline-functions\ -mv8 CC='gcc -D__solaris'
259.8	258.4	0.4	258.8	make OPT=-xO1\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
292.4	291.1	0.3	291.4	make OPT=-g\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
356.2	354.7	0.4	355.1	make OPT=-xO1\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
364.6	361.3	0.3	361.6	make OPT=-g\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
426.3	424.1	0.4	424.5	make OPT=-g\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
======================================================================================================================================================

======================================================================================================================================================
----------Time (sec)----------			jacm.bib (990KB, 30548 lines, 2045 bibliographic entries)
real	user	sys   user+sys  make command
======================================================================================================================================================
79.1	78.4	0.2	78.6	make OPT=-O2\ -finline-functions\ -mv8 CC='gcc -D__solaris'
79.3	78.5	0.2	78.7	make OPT=-O3\ -finline-functions\ -mv8 CC='gcc -D__solaris'
80.8	79.8	0.3	80.1	make OPT=-O1\ -finline-functions\ -mv8 CC='gcc -D__solaris'
81.0	80.2	0.3	80.5	make OPT=-O2\ -mv8 CC='gcc -D__solaris' #########
83.9	83.2	0.2	83.4	make OPT=-xO2\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
85.8	84.3	0.2	84.5	make OPT=-O1\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
85.2	84.3	0.3	84.6	make OPT=-O2\ -finline-functions CC='gcc -D__solaris' *********
87.4	86.2	0.3	86.5	make OPT=-xO2\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
89.8	88.6	0.3	88.9	make OPT=-xO3\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
92.9	92	0.2	92.2	make OPT=-O3\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
94.9	92	0.3	92.3	make OPT=-O2\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
95.1	94.2	0.2	94.4	make OPT=-xO3\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
105.8	104.9	0.3	105.2	make CC='lcc -A -A -D__solaris'
121.9	121.1	0.2	121.3	make OPT=-g\ -finline-functions\ -mv8 CC='gcc -D__solaris'
135.2	133.0	0.3	133.3	make OPT=-xO1\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
138.9	137.7	0.3	138.0	make OPT=-g\ -finline-functions\ -mv8 CC='g++ -D__solaris -D__EXTERN_C__'
176.6	175.6	0.3	175.9	make OPT=-g\ -xcg92\ -xinline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='cc -Xc -D__ACC__ -D__solaris'
178.6	177.1	0.3	177.4	make OPT=-xO1\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
211.3	209	0.3	209.3	make OPT=-g\ -xcg92\ -inline=regmatch,regnext,regtry,strchr,strlen,yylook,stricmp CC='CC -D__solaris -D__EXTERN_C__'
======================================================================================================================================================
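
The user+sys column in these tables is simply the sum of the user
and sys columns.  If you want to add your own measurements in the
same layout, a minimal sketch follows; timing_row is a hypothetical
helper, not part of the bibcheck distribution, and it assumes only
that a POSIX sh and awk are available.

```sh
#!/bin/sh
# Sketch only: timing_row is a hypothetical helper, not part of bibcheck.
# Given real, user, and sys times in seconds, print one tab-separated
# row in the layout of the tables above: real, user, sys, user+sys.
timing_row() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        printf "%.1f\t%.1f\t%.1f\t%.1f\n", real, user, sys, user + sys
    }'
}

# Example: the times from the first jacm.bib row in the table above.
timing_row 79.1 78.4 0.2
```

Append the make command that you used to each row by hand, so that
the compiler and optimization options stay with the measurement.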


=====================================
Details of UNIX installation attempts
=====================================

Unless otherwise noted below, builds and tests were successful with
all available compilers (see list above), and with both lex and flex.


DEC Alpha OSF/1 (Digital UNIX) 3.0:
		/usr/local/bin/g++  -c  -g -O  -I. -I.  biblex.c
		fails with lex: lex produces non-C++ code


DEC ULTRIX 4.3:
		/usr/local/bin/g++  -c  -g -O  -I. -I.  bibcheck.c
		fails with lex: lex produces non-C++ code


HP HP-UX 9.0.3:
		With flex:
		/usr/local/bin/g++
		produces a bibcheck executable that fails at run time
		because of an unresolved symbol __builtin_vec_new in
		/usr/local/lib/libg++.sl; this appears to be a gcc
		installation problem.


IBM RS/6000 AIX 3.2.5:
		OK


Intel LINUX 1.3.15:
		OK


MIPS RISCos 2.1.1AC:
		OK


NeXT Mach 3.0:
	/bin/cc OK

	With flex:
	/usr/local/bin/g++ fails at link: ld: Undefined symbols: _isatty__Fi
	/usr/local/bin/gcc fails: conflicting types in system header files
	/usr/local/bin/lcc fails: ctype system header file conflict

	/usr/local/bin/g++  -c  -g -O  -I. -I.  biblex.c
	fails with lex because output is not C++-compatible


SGI IRIX 4.0.5c:
	With flex:
	/usr/bin/cc  -c  -g  -I. -I.  biblex.c
	*** Error code 1
	[Can be fixed by using "setenv CC '/usr/bin/cc -ansi'"]

	With lex:
	/usr/bin/cc -ansi -c  -g  -I. -I.  biblex.c
	accom: Error: biblex.c, line 20: integer constant expected
       FILE *yyin = {	(__stdin) }, *yyout = {	(__stdout) };
       ---------------------------^

	With both lex and flex, the executable from
	/usr/local/bin/lcc
	goes into an infinite loop, after apparently running
	successfully on its input file(s).  A debug traceback
	shows that the loop location is apparently in ioctl():

	(dbx) where
	>  0 ._ioctl._ioctl(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x0, 0x0) ["sys/_ioctl.s":16, 0x40ade8]
	   1 ._isatty._isatty(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x0, 0x0) ["_isatty.c":35, 0x40a6f4]
	   2 yy_init_buffer(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x402f98, 0x402f2c) [0x4037dc]
	   3 yyrestart(0x100006f4, 0x5408, 0x7fffc5a8, 0x1, 0x0, 0x0) [0x40359c]
	   4 yy_get_next_buffer(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x1, 0x100006f4) [0x402f94]
	   5 yylex(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x1, 0x7fffc6e4) [0x402b44]
	   6 dolex(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x0, 0x0) [0x4007f4]
	   7 main(0x0, 0x5408, 0x7fffc5a8, 0x1, 0x0, 0x0) [0x4005b8]


SGI IRIX 5.3:
	With flex and
	/usr/local/bin/lcc
	the link step fails with:
	Object file format error in:
		/usr/local/lib/libfl.a(libyywrap.o): shared Elf object
		"/usr/local/lib/libfl.a" cannot be linked non-shared.
	This might be fixable by rebuilding libfl.a with lcc, but I'm
	not going to.


SGI IRIX 6.0.1:
	/usr/local/bin/gcc fails because of bad code generation.
	Recompilation with OPT=-mips2 fixes the problem; there is an
	apparent bug in the gcc 2.7.0 -mips3 code generator.
	
	/usr/local/bin/g++ fails because of bad code generation.
	Recompilation with OPT=-mips2 fixes the problem, but the
	system I built this on was missing the libg++.a library file.

	With lex,
	/bin/CC
	produces a bibcheck executable that core dumps at the
	statement
		if (STREQUAL(key,"author"))
	in bibcheck.c:do_keyword().  Stepping instruction-wise
	in the debugger shows that the problem arises from bad
	code generation.  Recompilation with OPT=-mips3 (to
	generate 32-bit MIPS R4000 code instead of 64-bit MIPS R8000
	code) produces a correctly working executable.
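
	The IRIX 6.0.1 workarounds above all amount to passing a
	different OPT value to make.  A minimal sketch, assuming the
	"make OPT=... CC=..." form used in the timing tables;
	irix6_make_command is a hypothetical helper that only prints
	the command, and any other options your build needs are
	deliberately left out:

```sh
#!/bin/sh
# Sketch only: irix6_make_command is a hypothetical helper.
# It prints (does not run) a make command carrying the OPT value
# reported above to work around bad code generation on IRIX 6.0.1.
irix6_make_command() {
    case "${1##*/}" in              # look at the compiler's base name
        gcc|g++) printf 'make OPT=-mips2 CC=%s\n' "$1" ;;  # gcc 2.7.0 -mips3 bug
        CC)      printf 'make OPT=-mips3 CC=%s\n' "$1" ;;  # /bin/CC with lex
        *)       printf 'make CC=%s\n' "$1" ;;             # no workaround noted
    esac
}

irix6_make_command /usr/local/bin/gcc
irix6_make_command /bin/CC
```

	Note that the g++ case still needs a libg++.a library,
	which was missing on the system used for these tests.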


Sun Solaris 2.3 and 2.4:
	lcc will build only if -g is removed from the Makefile.
	configure does not provide any simple way to do so for
	compilers other than gcc.


Sun SunOS 4.1.3:
	With flex:
	/usr/lang/CC  -c  -g  -I. -I.  biblex.c
	"biblex.l", line 1100: error: bad argument  1 type for realloc():
		void * ( malloc_t  expected)
	"biblex.l", line 1110: error: bad argument  1 type for free():
		void * ( char * expected)

	Not resolvable: the code is generated by flex, and realloc() and
	free() take argument types to which this old C++ compiler will not
	coerce void*.

	With lex:
	/usr/lang/CC  -c  -g  -I. -I.  biblex.c
	"biblex.c", line 223: error:  undefined function yylook called
	"biblex.c", line 226: error:  undefined function yywrap called
	"biblex.c", line 352: error:  undefined function lex_input called
	"biblex.c", line 1181: error:  undefined function yyback called


	With lex:
	/usr/local/bin/g++  -c  -g -O  -I. -I.  biblex.c
	fails because lex-generated code is not C++ compatible.

	/usr/local/bin/lcc -A -A -c  -g  -I. -I.  strdup.c
	strdup.c: /usr/include/sys/stdtypes.h:27: redeclaration of `size_t'
	strdup.c: /usr/include/sys/stdtypes.h:30: redeclaration of `wchar_t'
