% /u/sy/beebe/tex/bibindex/2-6/NEWS, Wed Oct  6 14:35:32 1993
% Edit by Nelson H. F. Beebe <beebe@plot79.math.utah.edu>

This file records development highlights of bibindex and biblook.  See
the separate README file for a summary of the systems on which these
programs have been successfully tested, and a description of what they
do.  See the PROBLEMS file for reports of problems in porting the
programs to other systems.  See the bibindex.c and biblook.c files for
a detailed change history.

------------------------------------------------------------------------
[15-Sep-1993] -- [06-Oct-1993] 
Final (I hope) release of bibindex 2.6.

Eliminate fixed-length strings in most of bibindex and biblook.  This
reduces the size of the .bix files noticeably, at the cost of some
extra time to allocate them dynamically.

Make biblook ignore both .bix and .bib extensions on its argument file
name.
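
A minimal sketch of that kind of extension handling follows.  It is
not the code in biblook.c; the function name strip_bib_extension and
its interface are hypothetical, chosen only for illustration.

```c
#include <string.h>

/* Hypothetical helper (not from biblook.c): copy name into out, which
   holds outsize bytes, dropping one trailing ".bib" or ".bix". */
void strip_bib_extension(const char *name, char *out, size_t outsize)
{
    size_t len = strlen(name);

    if (outsize == 0)
        return;
    if (len >= 4 &&
        (strcmp(name + len - 4, ".bib") == 0 ||
         strcmp(name + len - 4, ".bix") == 0))
        len -= 4;               /* ignore the recognized extension */
    if (len >= outsize)
        len = outsize - 1;      /* truncate rather than overflow */
    memcpy(out, name, len);
    out[len] = '\0';
}
```

With such a helper, "refs", "refs.bib", and "refs.bix" would all yield
the same base name, while an unrelated extension is left alone.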

Add convenience targets to the Makefile; 24 such targets make it
simple to build the programs on several different UNIX systems with
various C and C++ compilers.

Revise the secondary hash function calculation according to Knuth's
guidelines, eliminating the infinite loops that were detected in
testing of older versions of bibindex on some systems.
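
For readers unfamiliar with the issue, here is a hedged sketch of
double hashing in which the probe step is kept relatively prime to the
table size, so that the probe sequence visits every slot instead of
cycling through a subset.  It is an illustration only, not the
bibindex code: the power-of-two table size, the names TABLE_SIZE,
EMPTY_SLOT, primary_hash, secondary_hash, and probe, and the hash
formulas themselves are all assumptions.

```c
#define TABLE_SIZE 8u            /* assumed power of two (illustration only) */
#define EMPTY_SLOT (~0u)         /* hypothetical marker for an unused slot */

/* Primary probe position, in 0 .. TABLE_SIZE-1. */
unsigned primary_hash(unsigned key)
{
    return key & (TABLE_SIZE - 1u);
}

/* Probe step, forced odd so it shares no factor with TABLE_SIZE. */
unsigned secondary_hash(unsigned key)
{
    return ((key >> 3) & (TABLE_SIZE - 1u)) | 1u;
}

/* Return the slot holding key, or the first empty slot on its probe
   path, or -1 if all TABLE_SIZE probes fail (table full). */
int probe(const unsigned table[TABLE_SIZE], unsigned key)
{
    unsigned i = primary_hash(key);
    unsigned step = secondary_hash(key);
    unsigned n;

    for (n = 0; n < TABLE_SIZE; n++) {
        if (table[i] == EMPTY_SLOT || table[i] == key)
            return (int)i;
        i = (i + step) & (TABLE_SIZE - 1u);
    }
    return -1;
}
```

A step of zero, or an even step with a power-of-two table, would
revisit the same few slots; bounding the loop by the table size is a
second line of defence against looping forever.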

Port bibindex and biblook to DEC VAX VMS and DEC Alpha OpenVMS.  The
distribution now contains command files for building the programs on
those systems, and VMS executables for VAX and Alpha architectures,
because many VMS sites do not license the C compiler.  A dozen years
of experience with VMS show that despite the disk-space-saving virtue
of shareable libraries, DEC has not been very good about keeping those
libraries usable from one minor release of VMS to the next, with the
result that shareable executables stop working on a new VMS release.
Consequently, these executables are linked with non-shareable
libraries, making them somewhat larger, in the hope that they will be
usable on a larger number of VMS systems.

Used tcov, prof, and gprof profilers to study the performance on
several machines, and made several small tweaks, and one large one, to
improve efficiency.

One important small tweak was to eliminate multiplication and modulus
operators in the hash function inner loops.  Sun SPARC systems (before
MicroSPARC and SuperSPARC), and HP PA-RISC 1.0 and 1.1 systems, lack
integer multiply and divide instructions, and on those systems,
profiling showed that a major chunk of time (10%-20%) was being spent in
the hash function computation.  Use of exclusive-OR brought the hash
function time down to 2% or less.  On the IBM 3090, printf() and
sprintf() now account for 97% of the run time of bibindex, and further
tuning is thus beyond user control.
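
As an illustration of the general idea (the real hash functions are in
bibindex.c and may differ), a string hash can be built from shifts and
exclusive-OR alone, with the final reduction done by a bit mask rather
than a modulus.  The names xor_hash and hash_index, the shift counts,
and the assumption that the table size is a power of two are mine.

```c
/* Illustrative shift-and-XOR string hash: no multiply, no divide. */
unsigned long xor_hash(const char *s)
{
    unsigned long h = 0;

    while (*s != '\0') {
        /* mix the old value with itself, then fold in the next byte */
        h = ((h << 5) ^ (h >> 27)) ^ (unsigned long)(unsigned char)*s++;
    }
    return h;
}

/* Reduce to a table index; mask must be (table size - 1), with the
   table size an assumed power of two. */
unsigned long hash_index(const char *s, unsigned long mask)
{
    return xor_hash(s) & mask;
}
```

The mask replaces the remainder operation, which is the part that
needs a software divide on machines without a hardware divider.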

Another important small change was the replacement of the safegetc()
function (through which EVERY character of the .bib file must pass) by
an inline macro.  This produced significant improvements on some
systems.
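
A sketch of how such a macro might look is given here; it is not the
macro actually used in the distribution.  The name SAFEGETC, the
scratch variable sg_c, and the stand-in die() are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

long line_number = 1;   /* current input line */
int sg_c;               /* scratch variable used by SAFEGETC (hypothetical) */

/* Stand-in for the program's fatal-error routine. */
void die(const char *msg, const char *what)
{
    fprintf(stderr, "%s: %s\n", msg, what);
    exit(EXIT_FAILURE);
}

/* Read one character, count newlines, and die at end of file.  The
   result has type int.  Where getc() is itself a macro, fp may be
   evaluated more than once, so it must have no side effects. */
#define SAFEGETC(fp, what)                                      \
    (((sg_c = getc(fp)) == EOF)                                 \
         ? (die("Unexpected end of file", (what)), EOF)         \
         : ((sg_c == '\n') ? (line_number++, sg_c) : sg_c))
```

Because the whole thing is one expression, it can replace a function
call anywhere a value is expected, while letting getc() expand in
line on systems where getc() is a macro.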

A third important small tweak was to use hashing for accessing the
badwords[] table; now a large table poses no performance penalty.
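
The following sketch shows one way a word list can be searched by
hashing instead of linearly.  It is not the bibindex implementation:
the open-addressing scheme with linear probing, the fixed 64-slot
table, and the names bad_table, bad_hash, bad_insert, and is_bad_word
are all assumptions made for the example.

```c
#include <string.h>

#define BAD_SLOTS 64u   /* assumed power of two, larger than the word count */

const char *bad_table[BAD_SLOTS];   /* NULL marks an empty slot */

/* Shift-and-XOR hash reduced to a slot number by masking. */
unsigned long bad_hash(const char *s)
{
    unsigned long h = 0;

    while (*s != '\0')
        h = ((h << 5) ^ (h >> 27)) ^ (unsigned long)(unsigned char)*s++;
    return h & (BAD_SLOTS - 1u);
}

/* Insert a word; the caller must keep the table less than full. */
void bad_insert(const char *word)
{
    unsigned long i = bad_hash(word);

    while (bad_table[i] != NULL && strcmp(bad_table[i], word) != 0)
        i = (i + 1u) & (BAD_SLOTS - 1u);
    bad_table[i] = word;
}

/* Return 1 if word is in the table, else 0. */
int is_bad_word(const char *word)
{
    unsigned long i = bad_hash(word);

    while (bad_table[i] != NULL) {
        if (strcmp(bad_table[i], word) == 0)
            return 1;
        i = (i + 1u) & (BAD_SLOTS - 1u);
    }
    return 0;
}
```

A lookup then costs a hash computation plus a few string comparisons,
largely independent of how many words the table holds, provided the
table is kept well below full.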

The large tweak is the addition of a clean implementation of memory
mapped input.  For biblook on the large 3.6MB test bibliography, this
decreased startup time on my Sun SPARCstation LX (Solaris 2.2) by a
factor of 75 (yes, that much), because there is now only one system
call to read the file, rather than many small ones.  I/O overall is
reduced by an average factor of two.

The memory-mapped input feature is designed as a separate module (and
indeed, may be freely redistributed independently of bibindex), and
can be used or not, based on a compile time choice.  It is definitely
worth using on those systems that support it: IBM (RS/6000, but not
PS/2 or S/370), Silicon Graphics (IRIX 4.0 and 5.1), and Sun (SunOS
and Solaris).  DEC (Ultrix 4.3) supports it, but in every test I made
with various arguments to mmap(), it always returned -1 (failure), and
set errno to EINVAL.  HP 9000/7xx systems with HP-UX support it, but
because of the fileno() overhead noted below, it is slower than
regular I/O.  The tradeoff with memory-mapped I/O is that the run-time
memory image is larger, since the entire .bix file is mapped into
memory.  I do not view this as a problem on any virtual memory
workstation today: the .bix file for my 3.6MB test .bib file is only
about 900KB, so only about 4.5MB is needed to have them both memory
mapped.
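
For orientation, here is a hedged sketch of read-only file mapping
with the POSIX mmap() interface.  It is not the memio module shipped
with bibindex; the function names map_file and unmap_file are
hypothetical, and error handling is reduced to returning NULL.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a whole file read-only.  On success, return its base address and
   store its length in *len; on any failure, return NULL. */
const char *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);              /* unreadable or empty: nothing to map */
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (p == MAP_FAILED)        /* mmap() reports failure this way */
        return NULL;
    *len = (size_t)st.st_size;
    return (const char *)p;
}

/* Release a mapping obtained from map_file(); 0 means success. */
int unmap_file(const char *base, size_t len)
{
    return munmap((void *)base, len);
}
```

A caller that gets NULL back can fall back to ordinary stdio input,
which is one way to cope with systems where mmap() exists but fails.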
  
On the HP 9000/735, integer modulus now accounts for less than 3% of
the time.  Curiously, fileno() is now a major consumer (7%): HP-UX 8.0
and 9.0 use a 16-bit file descriptor, and fileno() is always a
function call, not an inline macro.  The macros in memio.h were
revised to reduce the number of calls to fileno() because of the
performance issue on this system.  fileno() is called on average 2.3
times for every input character when memory-mapped input is used (via
feof(), getc(), and ungetc()).  Consequently, bibindex runs about 25%
SLOWER with memory-mapped input on this system, and the HP convenience
targets in the Makefile therefore build the programs without
memory-mapped input.  Other programs that do not require feof() or
ungetc(), but just getc(), would likely run faster on this system with
memory mapped input than without; biblook certainly does.

gprof results for bibindex on several of these systems (IBM RS/6000,
IBM 3090, HP 9000/7xx, Sun) show malloc() taking less than 2% of the
time, so the dynamic allocation of variable-length strings is
definitely NOT a performance issue.

tcov results for bibindex show that the three hash function loops in
GetHashTable (443,474), GetHashCell (2,160,305), and InHashCell
(1,218,161), and the basic character reading loop in GetNextWord
(1,730,713) are the most frequently executed statements (counts are
for the 3.6MB test file).

------------------------------------------------------------------------
[16-Sep-1993]
At the urging of Jeff Erickson, add support for pagination of help
output.

------------------------------------------------------------------------
[14-Sep-1993]
Complete merging and testing of changes from Jeff Erickson to make a
test release of bibindex 2.6.  Here is a copy of my mail to co-developers:

I've taken the bibindex 2.6 test code sent out last week by Jeff, and
tested it on about 25 compiler and O/S combinations, using a
concatenation of all of my .bib files (3.6MB, 11919 entries, 121338
lines) as the testbed.  This turned up several bugs and portability
problems, all of which have now been fixed.  I've placed the new code
on ftp.math.utah.edu in pub/tex/bib in these files:

-rw-rw-r--  1 beebe         596 Sep 13 22:18 bibindex-2-6.tar-lst
-rw-rw-r--  1 beebe       60689 Sep 13 22:18 bibindex-2-6.tar.z
-rw-rw-r--  1 beebe       49108 Sep 13 22:18 bibindex-2-6.zip
-rw-rw-r--  1 beebe         972 Sep 13 22:18 bibindex-2-6.zip-lst
-rw-rw-r--  1 beebe       69356 Sep 13 22:18 bibindex-2-6.zoo
-rw-rw-r--  1 beebe         763 Sep 13 22:18 bibindex-2-6.zoo-lst

(.tar.z == .tar.Z, i.e. compress, not pack or gzip).

The most interesting bug only showed up after tests passed on 23
systems.  All files in the distribution have been updated, and there
is a new PROBLEMS file as well.

The badwords list in bibindex is now searched via a hash table,
instead of linearly, which makes it cheap to have a large table; with
the linear search, the exception lookup took most of the time.  The
badwords list is still searched linearly in biblook, but that matters
little, because it is only done for each user "find" command, instead
of for the entire database.

There is now a short help and a long help; you get the latter by
asking for help a second time.  This is very important for things like
the siggraph.org "telnet biblio" service; remote users will not have a
manual page to look up details.

The README file lists the machines on which testing has been
successfully completed; it includes 10 C++ compilers.  There are 3
more C++ implementations recorded in the PROBLEMS file which failed
because of bugs in the C++ libraries or include files, neither of
which is fixable by ordinary users.

I've used the tcov, prof, and gprof profilers on various systems to
measure the performance of this new version, and I'm satisfied that
it is doing reasonably well.  The one thing that might
improve things is to use memory mapped I/O instead of
fread()/fwrite(); on some machines, this speeds I/O by a factor of 1.5
to 2.  Unfortunately, it is not available on all UNIX systems, so both
styles would need to be supported.  Maybe someday I'll make some
experiments along these lines.

Here is part of the flat profile from gprof on an HP-9000/735 with C++
compilation (which generates the funny long external names that encode
argument types).

%time cumsecs seconds   calls  msec/call name    
 23.1    5.73    5.73                    _mcount
 12.2    8.75    3.02                    $$remU
 10.3   11.31    2.56 1132957       0.00 memcpy
  7.8   13.24    1.93 3784501       0.00 safegetc__FP4FILEPCc
  4.6   14.38    1.15                    $$mulU
  4.2   15.44    1.05      65      16.15 ExtendHashTable__FP11ExHashTable
  4.0   16.42    0.98  280308       0.00 GetNextWord__FP4FILEPc
  3.5   17.30    0.88  334294       0.00 GetHashCell__FP11ExHashTablePCc
  2.7   17.97    0.67 1200453       0.00 _strcmp
  2.5   18.59    0.62  360428       0.00 _fwrite
  1.8   19.03    0.44  117196       0.00 tree_cut
  1.6   19.43    0.40 1644272       0.00 __tolower
  1.4   19.77    0.34  206695       0.00 InHashCell__FP11ExHashTablePCc
  1.1   20.04    0.28  720931       0.00 iskeychar__Fci
  1.0   20.28    0.24  268489       0.00 InsertEntry__FP11ExHashTablePcUs

mcount is part of the profiling code.

$$remU and $$mulU are unsigned integer modulus and multiply (HP
PA-RISC 1.1 and Sun SPARC before MicroSPARC and SuperSPARC lack
hardware for these operations), arising mostly from hash function
computations.

safegetc is where the .bib file is read; it is called in nearly 30
places, but I suppose it could become a macro to get the benefits of the
in-line access to the I/O buffer that getc() and putc() afford in most
reasonable C implementations.  The code currently reads

char safegetc(FILE *fp, const char *what)
{
    char c;

    if (feof(fp))
	die("Unexpected end of file", what);

    c = getc(fp);
    if (c == '\n')
	line_number++;
    return (c);
}

On most UNIX implementations, feof() is an in-line macro.  On VAX VMS,
both feof() and getc() are function calls, and it would be better to
rewrite this code as

char safegetc(FILE *fp, const char *what)
{
    int c;

    c = getc(fp);
    if (c == '\n')
	line_number++;
    else if (c == EOF)
	die("Unexpected end of file", what);
    return ((char)c);
}

to halve the number of function calls; I have not made this change,
however.

On Sun systems, there is a performance problem with large .bib files
because of the way Sun's malloc() works.  I'm going to look into this
some more, and see if I can fix it.  I already did so in another
program, so I know where to look.  On my 3.6MB test file on a Sun
SPARCstation 10/30 with SunOS 4.1.3, bibindex takes 23 sec of CPU
time, but on a SPARCstation ELC, it takes 173 CPU sec, and uses 26MB
of memory.  On an HP 9000/735, the same job takes 35MB of memory, but
"size bibindex" says "150019 + 51232 + 74520 = 275771", so all of the
memory in use has been snarfed by malloc().

More tomorrow or the next day.
