The library catalog server can be specified by a command-line option, with the default server being the world's largest library catalog, the US Library of Congress.
ANSI/NISO Standard Z39.50-1995 and ISO Standard 23950:1998 ``Information and documentation --- Information retrieval (Z39.50) --- Application service definition and protocol specification'' define a library catalog protocol that allows client programs to communicate with library catalog servers around the world, and retrieve data in a small number of different formats, notably USMARC (United States MAchine-Readable Cataloging) and SUTRS (Simple Unstructured Text Record Syntax).
Because of its frequency of use, the option may be abbreviated to a single letter, even though it shares a two-character prefix with other options.
http://www.indexdata.dk/targettest/
The option name may be abbreviated to a single letter.
The option can be specified as many times as needed, and all specified servers are accumulated into a master list that is searched on completion of command-line processing.
The special server name home means all of the aliases in the user-specific file $HOME/.cattobib.rc. Similarly, the special server name local means all of the aliases in the local system-wide file /usr/uumath/share/cattobib/cattobib-0.14/cattobib.rc. Any of the aliases in those files can also be used individually in the -server option. See the section INITIALIZATION FILES for more about such files.
In this list, vertical bars separate alternatives, and an asterisk matches any word with that prefix:
\s-2
ALL | A* All Z39.50 servers known to cattobib
NATIONAL | N* National libraries and union catalogs
alberta | ab University of Alberta
amherst | umass University of Massachusetts, Amherst
amicus | ca National Library of Canada
anu Australian National University
be | ulb Université Libre de Bruxelles, Belgium
bibsys Norwegian Union Catalog
bne | es National Library of Spain
bnp | pt National Library of Portugal
boulder | co University of Colorado, Boulder
british | br British Library
byu Brigham Young University
calgary University of Calgary
caltech California Institute of Technology
carnegie | cmu Carnegie Mellon University
chicago University of Chicago
columbia | cu Columbia University
congress | lc | loc US Library of Congress
copac | uk COPAC (union of 24 research-university
catalogs in the UK and Ireland)
cosmos Danish National Library of Science & Medicine
crl Center for Research Libraries
dartmouth Dartmouth College, NH
denmark | dk Royal Library of Denmark
dsb | dsl Danish State Library
duke Duke University
edinburgh | ed Edinburgh University
emory Emory University
eu European University Institute Library
florida | fl Florida Center for Library Automation
gbv German Union Catalog
gmu George Mason University
harvard Harvard University
hopkins | jhu The Johns Hopkins University
indiana | iu Indiana University
indystate | in Indiana State University Consortium
kings | kcl King's College (London)
ku-leuven | ku Katholieke Universiteit Leuven, Belgium
madrid Universidad Autonoma de Madrid, Spain
marriott | utah | ut University of Utah Marriott Library
mcgill McGill University
melbourne University of Melbourne
melvyl | cal University of California
michigan | mi University of Michigan
minn* | mn University of Minnesota
mit Massachusetts Institute of Technology
montreal | um Université de Montréal
nbi | bohr Niels Bohr Institut
newyork | ny New York University
nla | au National Library of Australia
nlm | nih National Library of Medicine (US)
nlnz | nz National Library of New Zealand
nls | scotland National Library of Scotland
nlw | cwm | wales National Library of Wales
northwestern | nw Northwestern University
norway | no National Library of Norway
nsw | unsw University of New South Wales
nus | sg National University of Singapore
odense | sdu University of Southern Denmark
oregon | uo University of Oregon
oxford | ox* Oxford University
pennstate | psu | pa Pennsylvania State University
poland | pl National Library of Poland
princeton Princeton University
quebec | uq Université de Québec
rlg Research Libraries Group
rutgers Rutgers University
sfu Simon Fraser University
sudoc | abes | fr French Union Catalog
sweden | se National Library of Sweden
stanford | su Stanford University
stockholm Stockholm University
tamu Texas A&M University
texas | tx University of Texas at Austin
toronto University of Toronto
trinity | tcd Trinity College, Dublin
tub | berlin Technische Universität Berlin, Germany
tud | darmstadt Technische Universität Darmstadt, Germany
tufts Tufts University
ub | bern Universität Bern, Switzerland
ucc | cork University College Cork, Ireland
ucd | dublin University College Dublin, Ireland
ucsf University of California, San Francisco
upenn | penn University of Pennsylvania
usc University of Southern California
uta University of Texas, Arlington
uwo University of Western Ontario
vanderbilt Vanderbilt University
victoria | vu Victoria University, Melbourne, Australia
westpoint | usma United States Military Academy West Point
witwatersrand | wit University of Witwatersrand, Johannesburg, South Africa
wustl Washington University St. Louis
yale Yale University\s+2
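The matching rule for the list above can be sketched in a few lines of Python. This is only an illustration, not cattobib's own code: the table is a four-entry excerpt, the function name lookup is hypothetical, and case-sensitive matching is an assumption (it keeps a lowercase name such as alberta from matching A*).

```python
from fnmatch import fnmatchcase

# Excerpt of the server-name list; each entry pairs its alternatives
# with the catalog description.
SERVER_NAMES = [
    (("ALL", "A*"), "All Z39.50 servers known to cattobib"),
    (("congress", "lc", "loc"), "US Library of Congress"),
    (("minn*", "mn"), "University of Minnesota"),
    (("oxford", "ox*"), "Oxford University"),
]

def lookup(word):
    """Return the description whose alternatives match word, else None."""
    for patterns, description in SERVER_NAMES:
        # Assumption: matching is case-sensitive, and an asterisk
        # stands for any (possibly empty) suffix.
        if any(fnmatchcase(word, pattern) for pattern in patterns):
            return description
    return None
```

With that excerpt, minnesota and mn both select the University of Minnesota, and any word beginning with ox selects Oxford University.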
A later section provides information about representing accented characters in searches of the catalog of the National Library of Poland.
The option name may be abbreviated to a single letter.
\s-2% cattobib 1-57586-011-2
%% Searching [z3950.loc.gov:7090/Voyager] for [1575860112]: flags = [@attr 1=7]
@Book{Knuth:1999:DT,
author = "Donald Ervin Knuth",
title = "Digital typography",
volume = "78",
publisher = "CSLI Publications",
address = "Stanford, Calif.",
pages = "xv + 685",
year = "1999",
ISBN = "1-57586-011-2 (cloth), 1-57586-010-4 (paperback)",
ISBN-13 = "978-1-57586-011-4 (cloth), 978-1-57586-010-7 (paperback)",
LCCN = "Z249.3 .K59 1999",
bibdate = "Wed Jun 22 18:49:36 2005",
bibsource = "z3950.loc.gov:7090/Voyager",
series = "CSLI lecture notes",
URL = "ftp://uiarchive.cso.uiuc.edu/pub/etext/gutenberg/;
http://www.loc.gov/catdir/description/cam029/98027331.html;
http://www.loc.gov/catdir/toc/cam022/98027331.html",
acknowledgement = ack-nhfb,
subject = "Printing; Data processing; Computerized typesetting;
Computer fonts; TeX (Computer file); METAFONT",
}\s+2
Remark: The ISBN is a unique identifier assigned to books published throughout the world since about 1972. It consists of ten decimal digits, the last of which may also be the letter X, divided into four hyphen- (or rarely, space-) separated parts: country or language, publisher, book number within the publisher, and a final check digit that can be used to detect invalid ISBNs.
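The check-digit test can be sketched as follows, using the standard ISBN-10 rule: the digits are weighted 10, 9, ..., 1 from left to right, X counts as 10, and the weighted sum must be divisible by 11. The function name is hypothetical, and the code is not taken from cattobib.

```python
def isbn10_is_valid(isbn):
    """Return True if isbn (hyphens and spaces optional) has a valid check digit."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10          # X is allowed only as the check digit
        else:
            return False
        total += (10 - position) * value   # weights 10, 9, ..., 1
    return total % 11 == 0
```

For example, 1-57586-011-2 gives a weighted sum of 231 = 11 * 21, so it is accepted, whereas changing the final digit to 3 is rejected.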
Country/language groups 0 and 1 are English, 2 is French, 3 is German, 4 is Japanese, 5 is Russian, and so on. The Republic of Srpska (1996 population about 1.4 million people) is 99938.
Large publishers have small numbers (e.g., Collins is 00, McGraw-Hill is 07, and Prentice-Hall is 13), and small publishers have big numbers (e.g., Peachpit Press is 938151 and Personal TeX is 9631044).
When a publisher exhausts its range of book numbers, it gets a new, shorter publisher number: for example, O'Reilly Media Inc. now has publisher numbers 937175, 56592, 4493, and 596. Each of those steps allows a ten-fold increase in the number of possible book numbers.
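The arithmetic behind that ten-fold step can be checked with a short sketch. It assumes a one-digit country/language group (as for the English groups 0 and 1), so that the group, publisher number, book number, and check digit together fill ten digits; the function name is hypothetical.

```python
def book_numbers_available(publisher, group_digits=1):
    """Count the book numbers that fit beside a publisher number in an ISBN-10."""
    # Ten digits in all, one of which is the check digit.
    book_digits = 10 - 1 - group_digits - len(publisher)
    return 10 ** book_digits

for publisher in ("937175", "56592", "4493", "596"):
    print(publisher, book_numbers_available(publisher))
```

Under that assumption, the four publisher numbers leave room for 100, 1000, 10000, and 100000 book numbers respectively.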
Because the 10-digit ISBNs were rapidly being exhausted, they have not been issued since 1-Jan-2007, and have been replaced by new 13-digit values based on the European Article Numbering (EAN) system. The name for the new system is ISBN-13, and such values are also EAN values.
From version 0.02, cattobib output includes both ISBN(-10) and ISBN-13 data, because the latter are now found in many online bookstore and library catalogs, and many publishers now print them both with the back-cover bar code.
ISBN-13 translations of ISBN-10 data are handled automatically by the biborder(1) utility, and consist of the prefix 978-, followed by the first nine digits of the ISBN-10 value with the same (optional) hyphenation as before, followed by a hyphen and a new check digit. The latter is computed by a different algorithm, and does not in general match the tenth digit (the check digit) of the ISBN-10 value.
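A minimal sketch of that translation follows. It is not the biborder(1) implementation; the function name is hypothetical, and the check digit uses the standard EAN-13 rule (weights 1 and 3 alternate over the first twelve digits, and the check digit brings the sum to a multiple of 10).

```python
def isbn10_to_isbn13(isbn10):
    """Translate an ISBN-10 to its 978- ISBN-13 form, keeping the hyphenation."""
    text = isbn10.strip()
    # Drop the old check digit and the separator in front of it.
    first_nine = text[:-1].rstrip("- ")
    digits = [c for c in first_nine if c in "0123456789"]
    if len(digits) != 9:
        raise ValueError("not a 10-digit ISBN: %r" % isbn10)
    core = "978" + "".join(digits)
    # EAN-13 check digit: weights 1, 3, 1, 3, ... over the twelve digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    check = str((10 - total % 10) % 10)
    separator = "-" if "-" in text else ""
    return "978" + separator + first_nine + separator + check
```

Applied to 1-57586-011-2, the sketch yields 978-1-57586-011-4, which agrees with the ISBN-13 value shown in the first example entry.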
ISBN-13 values can also begin with 979-, but they are still rare, and do not have ISBN-10 equivalents. They are needed when a publisher exhausts its assigned book-number range(s) in the 978- group, and no free ranges are available elsewhere from the assignment authority.
Search the default Z39.50 server for a book by its title:
\s-2% cattobib --title 'Digital Typography Sourcebook'
%% Searching [z3950.loc.gov:7090/Voyager] for [Digital Typography Sourcebook]: flags = []
@Book{Bryan:1996:DTS,
author = "Marvin Bryan",
title = "The digital typography sourcebook",
publisher = "Wiley",
address = "New York",
pages = "xxiv + 384, 3",
year = "1996",
ISBN = "0-471-14811-3 (paper/CD-ROM)",
ISBN-13 = "978-0-471-14811-1",
LCCN = "Z250.7 .B79 1996",
bibdate = "Wed Jun 22 18:49:36 2005",
bibsource = "z3950.loc.gov:7090/Voyager",
URL = "ftp://uiarchive.cso.uiuc.edu/pub/etext/gutenberg/;
http://www.loc.gov/catdir/bios/wiley047/96013161.html;
http://www.loc.gov/catdir/description/wiley033/96013161.html;
http://www.loc.gov/catdir/toc/onix04/96013161.html",
acknowledgement = ack-nhfb,
subject = "Computer fonts",
}\s+2
Search the British Library for the same book:
\s-2% cattobib --server br --title 'Digital Typography Sourcebook'
%% Searching [z3950cat.bl.uk:9909/BLAC] for [Digital Typography Sourcebook]: flags = []
%% IGNORED: Number of hits: 1, setno 1
...
@Book{Bryan:1997:DTS,
author = "Marvin Bryan",
title = "The digital typography sourcebook",
publisher = "Wiley",
address = "New York ; Chichester",
pages = "xxiv + 384",
year = "1997",
ISBN = "0-471-14811-3 (paperback)",
ISBN-13 = "978-0-471-14811-1",
bibdate = "Wed Jun 22 18:49:36 2005",
acknowledgement = ack-nhfb,
subject = "Computer fonts",
}\s+2
Search the National Library of Australia for two books by ISBN:
\s-2% cattobib -q --server au --ISBN 0-06-621285-5 0-19-860702-4
@Book{Winchester:2003:KDW,
author = "Simon Winchester",
title = "Krakatoa: the day the world exploded, 27 August 1883",
publisher = "HarperCollins Publishers",
address = "New York",
pages = "xvi + 416",
year = "2003",
ISBN = "0-06-621285-5",
ISBN-13 = "978-0-06-621285-2",
bibdate = "Wed Jun 22 18:49:36 2005",
bibsource = "catalogue.nla.gov.au:7090/Voyager",
acknowledgement = ack-nhfb,
remark = "Includes bibliographical references and index.",
subject = "Natural disasters; Indonesia; Krakatoa; Social
aspects; Volcanoes; Indonesia; Krakatoa; Krakatoa
(Indonesia); Eruption, 1883",
usmarc-019 = "019 1 $a 24669279",
usmarc-043 = "043 $a a-io---",
usmarc-250 = "250 $a 1st U.S. ed.",
usmarc-984 = "984 $a ANL $c YY 551.2109598 W759",
}
@Book{Winchester:2003:MES,
author = "Simon Winchester",
title = "The meaning of everything: the story of the Oxford
English Dictionary",
publisher = "Oxford University Press",
address = "Oxford",
pages = "xxv + 260",
year = "2003",
ISBN = "0-19-860702-4 (hbk.), 0-19-860702-4 (hbk.)",
ISBN-13 = "978-0-19-860702-1 (hbk.), 978-0-19-860702-1 (hbk.)",
bibdate = "Wed Jun 22 18:49:36 2005",
bibsource = "catalogue.nla.gov.au:7090/Voyager",
price = "No price",
acknowledgement = ack-nhfb,
remark = "Includes ndex.",
subject = "Oxford English dictionary; Lexicology; History",
usmarc-019 = "019 1 $a 25073662",
}\s+2
cattobib handles translation to TeX of just the parts of that character set (ftp://dkuug.dk/i18n/charmaps/117) that are needed for the Polish accented letters. Input search strings are, however, not in any standard encoding, but instead require an awkward and idiosyncratic representation of the 22 accented letters:
\s-2{834}A LATIN CAPITAL LETTER A WITH ACUTE
{834}C LATIN CAPITAL LETTER C WITH ACUTE
{834}E LATIN CAPITAL LETTER E WITH ACUTE
{834}N LATIN CAPITAL LETTER N WITH ACUTE
{834}O LATIN CAPITAL LETTER O WITH ACUTE
{834}S LATIN CAPITAL LETTER S WITH ACUTE
{834}Z LATIN CAPITAL LETTER Z WITH ACUTE
{834}a LATIN SMALL LETTER A WITH ACUTE
{834}c LATIN SMALL LETTER C WITH ACUTE
{834}e LATIN SMALL LETTER E WITH ACUTE
{834}n LATIN SMALL LETTER N WITH ACUTE
{834}o LATIN SMALL LETTER O WITH ACUTE
{834}s LATIN SMALL LETTER S WITH ACUTE
{834}z LATIN SMALL LETTER Z WITH ACUTE
{839}Z LATIN CAPITAL LETTER Z WITH DOT ABOVE
{839}z LATIN SMALL LETTER Z WITH DOT ABOVE
{846}A LATIN CAPITAL LETTER A WITH OGONEK
{846}E LATIN CAPITAL LETTER E WITH OGONEK
{846}a LATIN SMALL LETTER A WITH OGONEK
{846}e LATIN SMALL LETTER E WITH OGONEK
{888} LATIN CAPITAL LETTER L WITH STROKE
{888} LATIN SMALL LETTER L WITH STROKE\s+2
Notice that the same input encoding is used for both lowercase and uppercase l-with-stroke.
The prefix {834} represents the acute accent, {839} the
dot accent, and {846} the ogonek (a hook accent attached near
the lower right corner of the letter). Pictures of all of the Polish
accented letters are available on the Web at
http://www.eki.ee/letter/chardata.cgi?lang=pl+Polish&script=latin
Thus, to search for the author name represented in TeX as Bie{\'n}, use the command
cattobib --server pl --author "Bie{834}n"
For the title represented in TeX as Wi{\k{e}}{\'z}niowie Moskwy,
use the command
cattobib --server pl --title "Wi{846}e{834}zniowie Moskwy"
It would of course be much easier for users to allow matches of
unaccented letters with accented ones, but that feature is not
supported by the library catalog Z39.50 server. Instead, cattobib
provides a convenient alternative: TeX markup for the Polish
accented letters is silently translated to the form needed for the
National Library of Poland. You can then write the sample search
commands as:
cattobib --server pl --author "Bie{\'n}"
cattobib --server pl --title "Wi{\k{e}}{\'z}niowie Moskwy"
Outer braces surrounding accented letters may be omitted: both
{\k{a}} and \k{a} are recognized, as are {\'z},
\'z, {\.z}, \.z, and so on.
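The translation just described can be sketched with regular expressions. This is an illustration only, not cattobib's own code: the function and variable names are hypothetical, and the handling of l-with-stroke assumes the usual TeX markup \l and \L, which the text above does not spell out.

```python
import re

# Prefixes, and the letters each may modify, from the table of 22 letters.
PREFIX = {"'": "{834}", ".": "{839}", "k": "{846}"}
LETTERS = {"'": "ACENOSZacenosz", ".": "Zz", "k": "AEae"}

# Accent command plus letter: \'n, \'{n}, \.z, \.{z}, \k{e}
_CORE = r"\\(?:(['.])(?:\{([A-Za-z])\}|([A-Za-z]))|(k)\{([A-Za-z])\})"
# The same markup, with or without one pair of outer braces.
_ACCENT = re.compile(r"\{" + _CORE + r"\}|" + _CORE)
# Assumption: l-with-stroke is written \l, \L, {\l}, or {\L}.
_LSTROKE = re.compile(r"\{\\[lL]\}|\\[lL](?![A-Za-z])")

def tex_to_polish_query(text):
    """Rewrite TeX accent markup into the {834}, {839}, {846}, {888} forms."""
    def replace(match):
        # Exactly two groups take part in any match: the accent, then the letter.
        accent, letter = [g for g in match.groups() if g is not None]
        if letter not in LETTERS[accent]:
            return match.group(0)      # not a Polish letter: leave untouched
        return PREFIX[accent] + letter
    return _LSTROKE.sub("{888}", _ACCENT.sub(replace, text))
```

Markup for letters outside the Polish set, such as {\'u}, is deliberately left unchanged, because the table defines no encoding for it.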
- Optional comments run from sharp to end of line, and are discarded first.
- Long lines may be continued by a backslash at end of line; the backslash and newline are removed, so a continuation may even fall in the middle of a name if that is useful.
- Lines of the form alias name v1 v2 ... vk define name to be a possibly-empty whitespace-separated list of values. Each value is normally a Z39.50 server name, such as z3950.loc.gov:7090/Voyager, but may also be a previously-defined alias whose value is to be substituted for that alias.
- All other lines are silently ignored.
Here is a sample file to illustrate the syntax, with short uppercase names in place of long Z39.50 server names:
\s-2
### Test file for initialization file preprocessing
### Start with some basic definitions
alias one ONE # comment
alias two TWO # comment
alias three THREE # comment
alias four FOUR # comment
alias five FIVE # comment
alias six SIX # comment
alias seven SEVEN # comment
alias eight EIGHT # comment
alias nine NINE # comment
alias ten TEN # comment
### redefine an alias name
alias three UNO \
DOS \
TRES # comment
alias v3 three
### undefine an alias name
alias three # comment
alias v4_5 three FOUR FIVE # comment
alias nine_ten NINE TEN # comment
### show line continuation: backslash-newline disappears,
### and thus, may appear in the middle of a name
alias v6_10 six \
seven \
eight \
nine_\
ten # comment
alias v_even two four six eight ten
alias v_odd one three five seven nine
\s+2
cattobib reduces that file by discarding comments, joining wrapped lines, and
expanding aliases to produce a temporary file that looks like this:
alias one ONE
alias two TWO
alias three THREE
alias four FOUR
alias five FIVE
alias six SIX
alias seven SEVEN
alias eight EIGHT
alias nine NINE
alias ten TEN
alias three UNO DOS TRES
alias v3 UNO DOS TRES
alias three
alias v4_5 FOUR FIVE
alias nine_ten NINE TEN
alias v6_10 SIX SEVEN EIGHT NINE TEN
alias v_even TWO FOUR SIX EIGHT TEN
alias v_odd ONE FIVE SEVEN NINE
There are no restrictions on what characters may occur in the whitespace-separated words, except that sharp cannot survive the reduction, because it always starts a comment that is removed. No special marker, such as the Unix shell's dollar sign, is needed to request expansion; the number of alias names is likely to be small enough that no conflicts are likely.
Here is a fragment of an initialization file that shows the convenience of aliases of aliases, and alias expansion:
# Z39.50 catalogs of the eight members of the Ivy League
alias brown # no known Z39.50 server
alias columbia clio-db.cc.columbia.edu:7090/Voyager
alias dartmouth catalog-lib.dartmouth.edu:210/innopac
alias harvard z3950s://navigator.fas.harvard.edu/boston
alias mit library.mit.edu:9909/mit01
alias princeton catalog.princeton.edu:7090/voyager
alias upenn libdb.lib.upenn.edu:7090/voyager
alias yale prodorbis.library.yale.edu:7090/voyager

# alternate short abbreviations:
alias dar dartmouth
alias pen upenn
alias pri princeton
alias har harvard
alias col columbia

# all of the Ivy League Z39.50 university catalogs
alias ivy brown columbia dartmouth harvard mit princeton upenn yale

# large universities in Utah
alias byu catalog.lib.byu.edu:2200
alias usu ht02aggies.ser321.usu.edu:20003/OPAC
alias utah hip.library.utah.edu:210/horizon
alias ut-all byu utah usu
Each word in the value list is looked up just once in the table of already-defined aliases. The substituted value is not scanned for further aliases, so there is no possibility of an infinite loop during alias substitution.
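The reduction rules can be sketched in Python as follows. This is a minimal illustration under the rules stated in this section, not cattobib's actual preprocessor; the function name reduce_aliases is hypothetical, and an undefined alias (one with an empty value list) is assumed to expand to nothing, as in the reduced sample file.

```python
import re

def reduce_aliases(text):
    """Return the reduced 'alias name values...' lines for an initialization file."""
    # 1. Comments run from sharp to end of line, and are discarded first.
    text = re.sub(r"#[^\n]*", "", text)
    # 2. Continued lines are joined by dropping each backslash-newline pair.
    text = text.replace("\\\n", "")
    table = {}      # alias name -> list of already-expanded values
    reduced = []
    for line in text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0] != "alias":
            continue                    # all other lines are silently ignored
        name, values = words[1], words[2:]
        expanded = []
        for value in values:
            # One lookup per word; the substituted words are not rescanned,
            # so alias substitution cannot loop.
            expanded.extend(table.get(value, [value]))
        table[name] = expanded
        reduced.append(" ".join(["alias", name] + expanded))
    return reduced
```

Because each definition stores its values already expanded, an alias of an alias (such as ivy or ut-all above) resolves to server names with a single table lookup per word.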
- completely wrong author lists;
- duplicated records, sometimes with minor variations;
- faulty title capitalization;
- incomplete, inaccurate, or missing page numbers;
- incorrect author order;
- mangled and missing accents;
- off-by-one copyright years;
- truncated author lists and titles;
- ...
The best advice to the user is to search three or more catalogs for the same data, and then merge the results, using a majority vote to resolve discrepancies.
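A majority vote over BibTeX fields might be sketched like this. The function name is hypothetical, records are assumed to be plain dictionaries of field values, and a field without a strict majority is set to None so that it can be checked by hand.

```python
from collections import Counter

def majority_merge(records):
    """Merge field dictionaries, keeping each value backed by a strict majority."""
    merged = {}
    for field in sorted({key for record in records for key in record}):
        votes = Counter(record[field] for record in records if field in record)
        value, count = votes.most_common(1)[0]
        # Only records that supply the field take part in its vote.
        merged[field] = value if 2 * count > sum(votes.values()) else None
    return merged

# The first two records come from the EXAMPLES section; the third is a
# hypothetical result from another catalog, included only as a tie-breaker.
records = [
    {"year": "1996", "pages": "xxiv + 384, 3", "address": "New York"},
    {"year": "1997", "pages": "xxiv + 384", "address": "New York ; Chichester"},
    {"year": "1996", "pages": "xxiv + 384", "address": "New York"},
]
merged = majority_merge(records)
```

A field supplied by only one catalog is accepted as it stands, so sparse fields still deserve a manual look.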
Agreement among multiple catalogs suggests that the data are reliable. However, the user is warned that libraries around the world share cataloging data, so there may not be as much data independence as might appear from geographically-distant catalogs.
While the conversion of USMARC and SUTRS markup to BibTeX works reasonably well, there are many catalog record types that are not converted. When they are known not to be useful in BibTeX entries, they are silently discarded. Otherwise, cattobib preserves them as additional key/value pairs, such as the usmarc-nnn keys in the BibTeX output in the EXAMPLES section, or else complains about them in diagnostic messages.
cattobib produces only BibTeX @Book{...} entries, even for conference proceedings, for which a @Proceedings{...} entry is required. Library catalog information often does not distinguish between those document types, so the user must convert such entries.
A certain amount of manual cleanup of the BibTeX output is almost always necessary.
Nelson H. F. Beebe
University of Utah
Department of Mathematics, 110 LCB
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: beebe@math.utah.edu, beebe@acm.org, beebe@computer.org
WWW URL: http://www.math.utah.edu/~beebe