

3. Path searching

This chapter describes the generic path searching mechanism Kpathsea provides. For information about searching for particular file types (e.g., TeX fonts), see the next chapter.

3.1 Searching overview

A search path is a colon-separated list of path elements, which are directory names with a few extra frills. A search path can come from (a combination of) many sources; see below. To look up a file `foo' along a path `.:/dir', Kpathsea checks each element of the path in turn: first `./foo', then `/dir/foo', returning the first match (or possibly all matches).
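The element-by-element check described above can be sketched as follows. This is only an illustration, not the code in `kpathsea/pathsearch.c'; the function name `search_path` and its parameters are hypothetical, and the sketch ignores the database and every path expansion described later in this chapter.

```python
import os

def search_path(path, name, all_matches=False, sep=":"):
    """Look up `name` in each directory of a separator-delimited path.

    Hypothetical helper: returns a list holding the first match, or all
    matches when all_matches is true, or an empty list if nothing is found.
    """
    found = []
    for element in path.split(sep):
        candidate = os.path.join(element, name)
        if os.path.isfile(candidate):
            if not all_matches:
                return [candidate]      # stop at the first match
            found.append(candidate)
    return found
```

Calling `search_path(".:/dir", "foo")` checks `./foo' first and `/dir/foo' second, in that order.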

The "colon" and "slash" mentioned here aren't necessarily `:' and `/' on non-Unix systems. Kpathsea tries to adapt to other operating systems' conventions.

To check a particular path element e, Kpathsea first sees if a prebuilt database (see section 3.4 Filename database (ls-R)) applies to e, i.e., if the database is in a directory that is a prefix of e. If so, the path specification is matched against the contents of the database.

If the database does not exist, or does not apply to this path element, or contains no matches, the filesystem is searched (if this was not forbidden by the specification with `!!' and if the file being searched for must exist). Kpathsea constructs the list of directories that correspond to this path element, and then checks in each for the file being searched for. (To help speed future lookups of files in the same directory, the directory in which a file is found is floated to the top of the directory list.)

The "file must exist" condition comes into play with VF files and input files read by the TeX `\openin' command. These files may not exist (consider `cmr10.vf'), and so it would be wrong to search the disk for them. Therefore, if you fail to update `ls-R' when you install a new VF file, it will never be found.

Each path element is checked in turn: first the database, then the disk. If a match is found, the search stops and the result is returned (unless the search explicitly requested all matches). This avoids possibly-expensive processing of path specifications that are never needed on a particular run.

Although the simplest and most common path element is a directory name, Kpathsea supports additional features in search paths: layered default values, environment variable names, config file values, users' home directories, and recursive subdirectory searching. Thus, we say that Kpathsea expands a path element, meaning transforming all the magic specifications into the basic directory name or names. This process is described in the sections below. It happens in the same order as the sections.

Exception to all of the above: If the filename being searched for is absolute or explicitly relative, i.e., starts with `/' or `./' or `../', Kpathsea simply checks if that file exists.

Ordinarily, if Kpathsea tries to access a file or directory that cannot be read, it gives a warning. This is so you will be alerted to directories or files that accidentally lack read permission (for example, a `lost+found'). If you prefer not to see these warnings, include the value `readable' in the TEX_HUSH environment variable or config file value.

This generic path searching algorithm is implemented in `kpathsea/pathsearch.c'. It is employed by a higher-level algorithm when searching for a file of a particular type (see section 4.2 File lookup, and section 4.3 Glyph lookup).

3.2 Path sources

A search path can come from many sources. In the order in which Kpathsea uses them:

  1. A user-set environment variable, e.g., TEXINPUTS. Environment variables with an underscore and the program name appended override; for example, TEXINPUTS_latex overrides TEXINPUTS if the program being run is named `latex'.
  2. A program-specific configuration file, e.g., an `S /a:/b' line in Dvips' `config.ps' (see section `Config files' in Dvips).
  3. A line in a Kpathsea configuration file `texmf.cnf', e.g., `TEXINPUTS=/c:/d' (see below).
  4. The compile-time default (specified in `kpathsea/paths.h').

You can see each of these values for a given search path by using the debugging options (see section 2.6.3 Debugging).

These sources may be combined via default expansion (see section 3.3.1 Default expansion).
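As a rough sketch of the priority order listed above: the function below collects the values that are defined for one search path, highest priority first. The four dictionaries are hypothetical stand-ins for the environment, a program-specific configuration file, `texmf.cnf', and the compile-time defaults; Kpathsea does not expose such a function.

```python
def path_sources(var, progname, environ, program_config, texmf_cnf, compile_time):
    """Return the defined values for search path `var`, highest priority first.

    All four mappings are illustrative assumptions, not Kpathsea data structures.
    """
    candidates = [
        environ.get(f"{var}_{progname}"),   # e.g. TEXINPUTS_latex
        environ.get(var),                   # e.g. TEXINPUTS
        program_config.get(var),            # e.g. a line in Dvips' config.ps
        texmf_cnf.get(var),                 # e.g. TEXINPUTS=/c:/d
        compile_time.get(var),              # the compile-time default
    ]
    return [value for value in candidates if value is not None]
```

The first entry of the returned list is the path that would be used; the later entries are the ones default expansion can draw on.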

3.2.1 Config files

As mentioned above, Kpathsea reads runtime configuration files named `texmf.cnf' for search path and other definitions. The search path used to look for these configuration files is named TEXMFCNF, and is constructed in the usual way, as described above, except that configuration files cannot be used to define the path, naturally; also, an `ls-R' database is not used to search for them.

Kpathsea reads all `texmf.cnf' files in the search path, not just the first one found; definitions in earlier files override those in later files. Thus, if the search path is `.:$TEXMF', values from `./texmf.cnf' override those from `$TEXMF/texmf.cnf'.
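The override rule can be pictured with a small sketch. It assumes each `texmf.cnf' has already been parsed into a dictionary (a hypothetical representation) and that the dictionaries are supplied in search path order.

```python
def merge_cnf(definitions_in_path_order):
    """Merge per-file definitions so that earlier files override later ones.

    `definitions_in_path_order` is a hypothetical list of dicts, one per
    texmf.cnf file, ordered as the files were found along TEXMFCNF.
    """
    merged = {}
    for definitions in definitions_in_path_order:
        for name, value in definitions.items():
            merged.setdefault(name, value)   # keep the first definition seen
    return merged
```

With the path `.:$TEXMF', the dictionary for `./texmf.cnf' comes first, so its values win.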

While (or instead of) reading this description, you may find it helpful to look at the distributed `texmf.cnf', which uses or at least mentions most features. The format of `texmf.cnf' files follows:

Here is a configuration file fragment illustrating most of these points:

% TeX input files -- i.e., anything to be found by \input or \openin ...
latex209_inputs = .:$TEXMF/tex/latex209//:$TEXMF/tex//
latex2e_inputs = .:$TEXMF/tex/latex//:$TEXMF/tex//
TEXINPUTS = .:$TEXMF/tex//
TEXINPUTS.latex209 = $latex209_inputs
TEXINPUTS.latex2e = $latex2e_inputs
TEXINPUTS.latex = $latex2e_inputs

This format has obvious similarities to Bourne shell scripts: change the comment character to #, disallow spaces around the =, and get rid of the .name convention, and it could be run through the shell. But there seemed little advantage to doing this, since all the information would have to be passed back to Kpathsea and parsed there anyway, because the sh process couldn't affect its parent's environment.
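For readers who want to experiment with fragments like the one above, here is a minimal parsing sketch. It is not the parser in `kpathsea/cnf.c': it assumes only what the fragment shows, namely that lines beginning with `%' are comments and that definitions look like `name = value' or `name.progname = value'. The function names are hypothetical.

```python
def parse_cnf(text):
    """Parse `name[.progname] = value` lines into a dict keyed by (name, progname).

    Simplified sketch: requires an `=` and treats only whole-line `%` comments.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        lhs, _, value = line.partition("=")
        name, _, progname = lhs.strip().partition(".")
        entries.setdefault((name, progname or None), value.strip())
    return entries

def lookup(entries, name, progname):
    """Prefer the definition qualified with `.progname`, else the plain one."""
    return entries.get((name, progname), entries.get((name, None)))
```

Values are returned unexpanded; references such as `$latex2e_inputs' are left for variable expansion.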

The implementation of all this is in `kpathsea/cnf.c'.

3.3 Path expansion

Kpathsea recognizes certain special characters and constructions in search paths, similar to those available in shells. As a general example: `~$USER/{foo,bar}//baz' expands to all subdirectories under directories `foo' and `bar' in $USER's home directory that contain a directory or file `baz'. These expansions are explained in the sections below.

3.3.1 Default expansion

If the highest-priority search path (see section 3.2 Path sources) contains an extra colon (i.e., leading, trailing, or doubled), Kpathsea inserts at that point the next-highest-priority search path that is defined. If that inserted path has an extra colon, the same happens with the next-highest. (An extra colon in the compile-time default value has unpredictable results, so installers beware.)

For example, given an environment variable setting

setenv TEXINPUTS /home/karl:

and a TEXINPUTS value from `texmf.cnf' of

.:$TEXMF//tex

then the final value used for searching will be:

/home/karl:.:$TEXMF//tex

Since Kpathsea looks for multiple configuration files, it would be natural to expect that (for example) an extra colon in `./texmf.cnf' would expand to the path in `$TEXMF/texmf.cnf'. Or, with Dvips' configuration files, that an extra colon in `config.$PRINTER' would expand to the path in `config.ps'. This doesn't happen. It's not clear this would be desirable in all cases, and trying to devise a way to specify the path to which the extra colon should expand seemed truly baroque.

Technicality: Since it would be useless to insert the default value in more than one place, Kpathsea changes only one extra `:' and leaves any others in place (they will eventually be ignored). Kpathsea checks first for a leading `:', then a trailing `:', then a doubled `:'.
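The rule for a single level of default expansion can be sketched like this. It is an illustration only, not the code in `kpathsea/kdefault.c'; `expand_default` is a hypothetical name, and `default` stands for the next-highest-priority path.

```python
def expand_default(path, default, sep=":"):
    """Insert `default` at one extra separator: leading, then trailing, then doubled."""
    if path.startswith(sep):
        return default + path
    if path.endswith(sep):
        return path + default
    index = path.find(sep + sep)
    if index >= 0:
        # Keep both separators and put the default between them.
        return path[:index + 1] + default + path[index + 1:]
    return path    # no extra separator: nothing to insert
```

For the example above, `expand_default("/home/karl:", ".:$TEXMF//tex")` gives `/home/karl:.:$TEXMF//tex'. Only one extra separator is replaced per call; any others stay in place.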

You can trace this by debugging "paths" (see section 2.6.3 Debugging). Default expansion is implemented in the source file `kpathsea/kdefault.c'.

3.3.2 Variable expansion

`$foo' or `${foo}' in a path element is replaced by (1) the value of an environment variable `foo' (if defined); (2) the value of `foo' from `texmf.cnf' (if defined); (3) the empty string.

If the character after the `$' is alphanumeric or `_', the variable name consists of all consecutive such characters. If the character after the `$' is a `{', the variable name consists of everything up to the next `}' (braces may not be nested around variable names). Otherwise, Kpathsea gives a warning and ignores the `$' and its following character.

You must quote the $'s and braces as necessary for your shell. Kpathsea cannot see shell variables that have not been exported to the environment, i.e., ones defined by set in C shells and without export in Bourne shells.

For example, given

setenv tex /home/texmf
setenv TEXINPUTS .:$tex:${tex}prev

the final TEXINPUTS path is the three directories:

.:/home/texmf:/home/texmfprev

The `.progname' suffix on variables and `_progname' on environment variable names are not implemented for general variable expansions. These are only recognized when search paths are initialized (see section 3.2 Path sources).
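A minimal sketch of these rules follows. It is not the code in `kpathsea/variable.c': `expand_vars` is a hypothetical name, the two dictionaries stand in for the environment and `texmf.cnf', values are not re-expanded, and where Kpathsea would give a warning the sketch silently drops the `$' and the character after it.

```python
import re

# $name (letters, digits, underscore), ${anything up to the next brace},
# or a `$` followed by some other character (or nothing).
_VAR = re.compile(r"\$(?:(\w+)|\{([^}]*)\}|(.?))", re.ASCII | re.DOTALL)

def expand_vars(element, environ, cnf):
    """Replace $foo and ${foo}: environment first, then cnf, then empty string."""
    def replace(match):
        if match.group(1) is None and match.group(2) is None:
            return ""    # malformed reference: drop `$` and the next character
        name = match.group(1) or match.group(2)
        return environ.get(name, cnf.get(name, ""))
    return _VAR.sub(replace, element)
```

With `tex' set to `/home/texmf' in the environment, `expand_vars(".:$tex:${tex}prev", ...)` yields the three-directory path shown above.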

Variable expansion is implemented in the source file `kpathsea/variable.c'.

3.3.3 Tilde expansion

A leading `~' in a path element is replaced by the value of the environment variable HOME, or `.' if HOME is not set.

A leading `~user' in a path element is replaced by user's home directory from the system `passwd' database.

For example,

setenv TEXINPUTS ~/mymacros:

will prepend a directory `mymacros' in your home directory to the default path.

As a special case, if a home directory ends in `/', the trailing slash is dropped, to avoid inadvertently creating a `//' construct in the path. For example, if the home directory of the user `root' is `/', the path element `~root/mymacros' expands to just `/mymacros', not `//mymacros'.
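These rules, including the trailing-slash special case, can be sketched as below. This is an illustration rather than the code in `kpathsea/tilde.c'; `expand_tilde` and `passwd_home` are hypothetical names, and the `home_of` parameter exists only so the lookup can be replaced when experimenting.

```python
import pwd   # Unix-only: access to the system passwd database

def passwd_home(user):
    """Look up a user's home directory in the passwd database."""
    return pwd.getpwnam(user).pw_dir

def expand_tilde(element, environ, home_of=passwd_home):
    """Expand a leading `~` or `~user` in a path element."""
    if not element.startswith("~"):
        return element
    user, slash, rest = element[1:].partition("/")
    home = environ.get("HOME", ".") if user == "" else home_of(user)
    if slash and home.endswith("/"):
        home = home[:-1]    # avoid creating an accidental `//`
    return home + slash + rest
```

If the home directory of `root' is `/', then `~root/mymacros' expands to `/mymacros'.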

Tilde expansion is implemented in the source file `kpathsea/tilde.c'.

3.3.4 Brace expansion

`x{a,b}y' expands to `xay:xby'. For example:

foo/{1,2}/baz

expands to `foo/1/baz:foo/2/baz'. Here `:' stands for the path separator on the current system; on a DOS system, for example, it is `;'.

Braces can be nested; for example, `x{A,B{1,2}}y' expands to `xAy:xB1y:xB2y'.

Multiple non-nested braces are expanded from right to left; for example, `x{A,B}{1,2}y' expands to `x{A,B}1y:x{A,B}2y', which expands to `xA1y:xB1y:xA2y:xB2y'.

This feature can be used to implement multiple TeX hierarchies, by assigning a brace list to $TEXMF, as mentioned in `texmf.in'.

You can also use the path separator instead of the comma. The last example could have been written `x{A:B}{1:2}y'.
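The right-to-left order is easy to get wrong, so here is a small sketch that reproduces the examples in this section. It is not the Bash-derived code in `kpathsea/expand.c'; the function names are hypothetical, and both `,' and `:' are accepted as separators inside braces.

```python
def _last_group(text):
    """Return (start, end) of the rightmost top-level {...} group, or None."""
    depth, start, last = 0, -1, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                last = (start, i)
    return last

def _split_alternatives(body, seps=",:"):
    """Split a brace body at separators that are not inside nested braces."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch in seps and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

def expand_braces(element):
    """Expand the rightmost brace group first, then recurse on each result."""
    group = _last_group(element)
    if group is None:
        return [element]
    start, end = group
    results = []
    for alt in _split_alternatives(element[start + 1:end]):
        results.extend(expand_braces(element[:start] + alt + element[end + 1:]))
    return results
```

Joining the result of `expand_braces("x{A,B}{1,2}y")` with `:' gives `xA1y:xB1y:xA2y:xB2y', matching the order stated above.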

Brace expansion is implemented in the source file `kpathsea/expand.c'. It is a modification of the Bash sources, and is thus covered by the GNU General Public License, rather than the Library General Public License that covers the rest of Kpathsea.

3.3.5 KPSE_DOT expansion

When KPSE_DOT is defined in the environment, it names a directory that should be considered the current directory for the purpose of looking up files in the search paths. This feature is needed by the `mktex...' scripts (see section 2.2.9 `mktex' scripts), because these change the working directory. You should not ever define it yourself.

3.3.6 Subdirectory expansion

Two or more consecutive slashes in a path element following a directory d are replaced by all subdirectories of d: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified. (It's "directory order", and definitely not alphabetical.)

If you specify any filename components after the `//', only subdirectories which match those components are included. For example, `/a//b' would expand into directories `/a/1/b', `/a/2/b', `/a/1/1/b', and so on, but not `/a/b/c' or `/a/1'.
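A level-by-level sketch of expanding a single `d//tail' construct is shown below. It is not the code in `kpathsea/elt-dirs.c'. Three things are assumptions of the sketch: the hypothetical name `expand_subdirs`, the sorting of names (done only to make the output reproducible, since the real order is unspecified), and the fact that d itself is also checked for the trailing components. The sketch follows symbolic links and does not guard against link loops.

```python
import os
from collections import deque

def expand_subdirs(base, tail=""):
    """Expand `base//tail` into the existing directories it denotes."""
    matches = []
    queue = deque([base])               # breadth-first: one level at a time
    while queue:
        directory = queue.popleft()
        candidate = os.path.join(directory, tail) if tail else directory
        if os.path.isdir(candidate):
            matches.append(candidate)
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue                    # unreadable directory: skip it
        for name in names:
            child = os.path.join(directory, name)
            if os.path.isdir(child):
                queue.append(child)
    return matches
```

For `/a//b', call `expand_subdirs("/a", "b")`; directories such as `/a/1' are visited but not returned, because they do not end in `b'.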

You can include multiple `//' constructs in the path.

`//' at the beginning of a path is ignored; you didn't really want to search every directory on the system, did you?

I should mention one related implementation trick, which I took from GNU find. Matthew Farwell suggested it, and David MacKenzie implemented it.

The trick is that in every real Unix implementation (as opposed to the POSIX specification), a directory which contains no subdirectories will have exactly two links (namely, one for `.' and one for `..'). That is to say, the st_nlink field in the `stat' structure will be two. Thus, we don't have to stat everything in the bottom-level (leaf) directories--we can just check st_nlink, notice it's two, and do no more work.

But if you have a directory that contains a single subdirectory and 500 regular files, st_nlink will be 3, and Kpathsea has to stat every one of those 501 entries. Therein lies slowness.
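The shortcut can be sketched as follows. The function name is hypothetical, and the sketch relies on the same assumption as the trick itself: that the filesystem reports a link count of exactly two for a directory with no subdirectories.

```python
import os

def subdirectories(directory):
    """List the subdirectories of `directory`, skipping the scan for leaves."""
    if os.stat(directory).st_nlink == 2:
        return []                        # only `.` and `..`: no subdirectories
    found = []
    for name in sorted(os.listdir(directory)):
        child = os.path.join(directory, name)
        if os.path.isdir(child):         # one stat call per entry: the slow part
            found.append(child)
    return found
```

On a filesystem that reports some other link count for leaf directories, the first test simply never succeeds and every entry is checked.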

You can disable the trick by undefining UNIX_ST_LINK in `kpathsea/config.h'. (It is undefined by default except under Unix.)

Unfortunately, in some cases files in leaf directories are stat'd: if the path specification is, say, `$TEXMF/fonts//pk//', then files in a subdirectory `.../pk', even if it is a leaf, are checked. The reason cannot be explained without reference to the implementation, so read `kpathsea/elt-dirs.c' (search for `may descend') if you are curious. And if you can find a way to solve the problem, please let me know.

Subdirectory expansion is implemented in the source file `kpathsea/elt-dirs.c'.

3.4 Filename database (ls-R)

Kpathsea goes to some lengths to minimize disk accesses for searches (see section 3.3.6 Subdirectory expansion). Nevertheless, at installations with enough directories, searching each possible directory for a given file can take an excessively long time (depending on the speed of the disk, whether it's NFS-mounted, how patient you are, etc.).

In practice, a font tree containing the standard PostScript and PCL fonts is large enough for searching to be noticeably slow on typical systems these days. Therefore, Kpathsea can use an externally-built "database" file named `ls-R' that maps files to directories, thus avoiding the need to exhaustively search the disk.

A second database file `aliases' allows you to give additional names to the files listed in `ls-R'. This can be helpful to adapt to "8.3" filename conventions in source files.

The `ls-R' and `aliases' features are implemented in the source file `kpathsea/db.c'.

3.4.1 `ls-R'

As mentioned above, you must name the main filename database `ls-R'. You can put one at the root of each TeX installation hierarchy you wish to search ($TEXMF by default); most sites have only one hierarchy. Kpathsea looks for `ls-R' files along the TEXMFDBS path, so that should presumably match the list of hierarchies.

The recommended way to create and maintain `ls-R' is to run the mktexlsr script, which is installed in `$(bindir)' (`/usr/local/bin' by default). That script goes to some trouble to follow symbolic links as necessary, etc. It's also invoked by the distributed `mktex...' scripts.

At its simplest, though, you can build `ls-R' with the command

cd /your/texmf/root && ls -LAR ./ >ls-R

presuming your ls produces the right output format (see the section below). GNU ls, for example, outputs in this format. Also presuming your ls hasn't been aliased in a system file (e.g., `/etc/profile') to something problematic, e.g., `ls --color=tty'. In that case, you will have to disable the alias before generating `ls-R'. For the precise definition of the file format, see section 3.4.3 Database format.

Regardless of whether you use the supplied script or your own, you will almost certainly want to invoke it via cron, so when you make changes in the installed files (say if you install a new LaTeX package), `ls-R' will be automatically updated.

The `-A' option to ls includes files beginning with `.' (except for `.' and `..'), such as the file `.tex' included with the LaTeX tools package. (On the other hand, directories whose names begin with `.' are always ignored.)

If your system does not support symbolic links, omit the `-L'.

ls -LAR /your/texmf/root will also work. But using `./' avoids embedding absolute pathnames, so the hierarchy can be easily transported. It also avoids possible trouble with automounters or other network filesystem conventions.

Kpathsea warns you if it finds an `ls-R' file, but the file does not contain any usable entries. The usual culprit is running plain `ls -R' instead of `ls -LR ./' or `ls -R /your/texmf/root'. Another possibility is some system directory name starting with a `.' (perhaps if you are using AFS); Kpathsea ignores everything under such directories.

Because the database may be out-of-date for a particular run, if a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with `!!', however, only the database will be searched for that element, never the disk. If the database does not exist, nothing will be searched. Because this can surprise users ("I see the font `foo.tfm' when I do an ls; why can't Dvips find it?"), it is not in any of the default search paths.
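The effect of `!!' on a single path element can be sketched as below. The function name and the shape of `database` (a mapping from directory name to the set of filenames listed there, or None if no database exists) are assumptions made for illustration.

```python
import os

def lookup_element(element, name, database):
    """Look up `name` in one path element, honoring a leading `!!`."""
    db_only = element.startswith("!!")
    directory = element[2:] if db_only else element
    if database is not None and name in database.get(directory, ()):
        return os.path.join(directory, name)
    if db_only:
        return None                      # never fall back to the disk
    candidate = os.path.join(directory, name)
    return candidate if os.path.isfile(candidate) else None
```

With an out-of-date database, a file that exists on disk is still found through the plain element but not through the `!!' element.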

3.4.2 Filename aliases

In some circumstances, you may wish to find a file under several names. For example, suppose a TeX document was created using a DOS system and tries to read `longtabl.sty'. But now it's being run on a Unix system, and the file has its original name, `longtable.sty'. The file won't be found. You need to give the actual file `longtable.sty' an alias `longtabl.sty'.

You can handle this by creating a file `aliases' as a companion to the `ls-R' for the hierarchy containing the file in question. (You must have an `ls-R' for the alias feature to work.)

The format of `aliases' is simple: two whitespace-separated words per line; the first is the real name (`longtable.sty'), and the second is the alias (`longtabl.sty'). These must be base filenames, with no directory components. `longtable.sty' must be in the sibling `ls-R'.

Also, blank lines and lines starting with `%' or `#' are ignored in `aliases', to allow for comments.

If a real file `longtabl.sty' exists, it is used regardless of any aliases.
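The file format and the precedence of real files can be sketched as follows. This is not the code in `kpathsea/db.c'; the function names are hypothetical, and `listed_files` stands for the set of filenames found in the sibling `ls-R'.

```python
def parse_aliases(text):
    """Map each alias to the real names it stands for."""
    aliases = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("%", "#")):
            continue                     # blank line or comment
        words = line.split()
        if len(words) == 2:
            real, alias = words
            aliases.setdefault(alias, []).append(real)
    return aliases

def resolve(name, listed_files, aliases):
    """Return the filename to use for `name`, or None if nothing matches."""
    if name in listed_files:
        return name                      # a real file beats any alias
    for real in aliases.get(name, []):
        if real in listed_files:
            return real
    return None
```

A line `longtable.sty longtabl.sty' in `aliases' makes a request for `longtabl.sty' resolve to `longtable.sty', provided the latter is listed.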

3.4.3 Database format

The "database" read by Kpathsea is a line-oriented file of plain text. The format is that generated by GNU (and most other) ls programs given the `-R' option, as follows.

For example, here's the first few lines of `ls-R' (which totals about 30K bytes) on my system:

bibtex
dvips
fonts
ls-R
metafont
metapost
tex
web2c

./bibtex:
bib
bst
doc

./bibtex/bib:
asi.bib
btxdoc.bib
...
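A listing like the one above can be read into a simple lookup table with a sketch such as this. It is not the parser in `kpathsea/db.c'; the function name and the returned structure are hypothetical. The sketch treats a line that begins with `/', `./', or `../' and ends with a colon as a directory name, and ignores file lines that appear before any directory line.

```python
def parse_ls_r(text):
    """Return a dict mapping each directory to the set of names listed in it."""
    database = {}
    current = None
    for line in text.splitlines():
        if not line:
            continue                                 # blank lines separate blocks
        if line.endswith(":") and line.startswith(("/", "./", "../")):
            current = line[:-1]                      # start of a new directory
            database.setdefault(current, set())
        elif current is not None:
            database[current].add(line)              # entry in that directory
    return database
```

Looking up `btxdoc.bib' then amounts to finding the directories whose sets contain that name, without touching the disk.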

3.5 kpsewhich: Standalone path searching

The Kpsewhich program exercises the path searching functionality independent of any particular application. This can also be useful as a sort of find program to locate files in your TeX hierarchies, perhaps in administrative scripts. It is used heavily in the distributed `mktex...' scripts.

Synopsis:

kpsewhich option... filename...

The options and filename(s) to look up can be intermixed. Options can start with either `-' or `--', and any unambiguous abbreviation is accepted.

3.5.1 Path searching options

Kpsewhich looks up each non-option argument on the command line as a filename, and returns the first file found. There is no option to return all the files with a particular name (you can run the Unix `find' utility for that, see section `Invoking find' in GNU find utilities).

Various options alter the path searching behavior:

`--dpi=num'
Set the resolution to num; this only affects `gf' and `pk' lookups. `-D' is a synonym, for compatibility with Dvips. Default is 600.
`--format=name'
Set the format for lookup to name. By default, the format is guessed from the filename, with `tex' being used if nothing else fits. The recognized filename extensions (including any leading `.') are also allowable names. All formats also have a name, which is the only way to specify formats with no associated suffix. For example, for Dvips configuration files you can use `--format="dvips config"'. (The quotes are for the sake of the shell.) Here's the current list of recognized names and the associated suffixes. See section 4.1 Supported file formats, for more information on each of these.
    gf: gf
    pk: pk
    bitmap font
    afm: .afm
    base: .base
    bib: .bib
    bst: .bst
    cnf: .cnf
    ls-R: ls-R
    fmt: .fmt
    map: .map
    mem: .mem
    mf: .mf
    mfpool: .pool
    mft: .mft
    mp: .mp
    mppool: .pool
    MetaPost support
    ocp: .ocp
    ofm: .ofm .tfm
    opl: .opl
    otp: .otp
    ovf: .ovf
    ovp: .ovp
    graphic/figure: .eps .epsi
    tex: .tex
    TeX system documentation
    texpool: .pool
    TeX system sources
    PostScript header/font: .pro
    Troff fonts
    tfm: .tfm
    type1 fonts: .pfa .pfb
    vf: .vf
    dvips config
    ist: .ist
    truetype fonts: .ttf .ttc
    type42 fonts
    web2c files
    other text files
    other binary files
This option and `--path' are mutually exclusive.
`--interactive'
After processing the command line, read additional filenames to look up from standard input.
`-mktex=filetype'
`-no-mktex=filetype'
Turn on or off the `mktex' script associated with filetype. The only values that make sense for filetype are `pk', `mf', `tex', and `tfm'. By default, all are off in Kpsewhich. See section 2.2.9 `mktex' scripts.
`--mode=string'
Set the mode name to string; this also only affects `gf' and `pk' lookups. No default: any mode will be found. See section 2.2.9.3 `mktex' script arguments.
`--must-exist'
Do everything possible to find the files, notably including searching the disk. By default, only the `ls-R' database is checked, in the interest of efficiency.
`--path=string'
Search along the path string (colon-separated as usual), instead of guessing the search path from the filename. `//' and all the usual expansions are supported (see section 3.3 Path expansion). This option and `--format' are mutually exclusive. To output the complete directory expansion of a path, instead of doing a one-shot lookup, see `--expand-path' in the following section.
`--progname=name'
Set the program name to name; default is `kpsewhich'. This can affect the search paths via the `.progname' feature in configuration files (see section 3.2.1 Config files).
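When calling Kpsewhich from an administrative script, it can help to assemble the command line in one place. The helper below is a hypothetical convenience, not part of Kpathsea; it uses only options described above and rejects the combination of `--format' and `--path', which are mutually exclusive.

```python
def kpsewhich_argv(filename, fmt=None, path=None, progname=None, must_exist=False):
    """Build an argument list for kpsewhich from the options described above."""
    if fmt is not None and path is not None:
        raise ValueError("--format and --path are mutually exclusive")
    argv = ["kpsewhich"]
    if fmt is not None:
        argv.append(f"--format={fmt}")
    if path is not None:
        argv.append(f"--path={path}")
    if progname is not None:
        argv.append(f"--progname={progname}")
    if must_exist:
        argv.append("--must-exist")
    argv.append(filename)
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` on a system where Kpsewhich is installed. Passing the arguments as a list avoids the shell quoting needed for names such as `dvips config'.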

3.5.2 Auxiliary tasks

Kpsewhich provides some additional features not strictly related to path lookup:

3.5.3 Standard options

Kpsewhich accepts the standard GNU options:

