# A Brief Introduction to TeX and LaTeX

for the University of Utah SIAM Student Chapter Meeting, Tuesday 13 November 2018

by Nelson H. F. Beebe
University of Utah
Department of Mathematics, 110 LCB
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
E-mail: beebe@math.utah.edu, beebe@acm.org, beebe@computer.org
Web: http://www.math.utah.edu/~beebe
Tel: +1 801 581 5254
FAX: +1 801 581 4148

Last updates: Wed Nov 21 11:23:15 2018 … Fri May 21 11:18:43 2021

#### Abstract

This document, and its accompanying lecture, presents some of the background of the TeX and LaTeX typesetting systems, then shows outlines of simple, moderate, and complex documents, with a goal of taking a beginner far enough to be able to prepare articles, reports, and books, and optional indexes and bibliographies therein. It gives pointers to extensive online resources throughout the document, and lists important reference books in a final section.

## The TeX typesetting system

TeX (pronounced techhh), and its companion, the METAFONT font design system, were designed and programmed by Donald Knuth at Stanford University, with some help from his students, from 1977 to 1982; some work continued to about 1992.

TeX and METAFONT are remarkable pieces of software that remain in use 40 years after they were first conceived, and because of the care given to their design, they run on computers as small as watches and mobile devices, through tablets, laptops, and desktops, up to supercomputers. On all systems, they produce identical output with respect to fonts, line breaking, and page breaking.

TeX and METAFONT are copyrighted, but are freely available for both private and commercial use. Many large publishers use TeX behind the scenes for preparation of books and journals. The annual TeX Live releases provide TeX and METAFONT, and much else, as easy-to-install downloadable ISO images for most popular platforms. Many O/S distributions also offer similar software bundles in their package systems.

There are several national / language / regional groups devoted to the support, distribution, and use of TeX and METAFONT, most notably, the international TeX User Group (TUG), with a Web site at http://www.tug.org/ . Our Department of Mathematics hosts the master North American mirror of the Comprehensive TeX Archive Network (CTAN) at http://tug.ctan.org and TeX Live at http://www.math.utah.edu/pub/texlive-utah-2018/ .

TeX's view of the typesetting problem is to reduce the task to putting words into boxes whose dimensions are determined by those of the character glyphs in the current font, together with kerning information that sometimes adjusts interletter spacing (for example, AW usually has tighter spacing than AM). Those boxes are then separated with flexible horizontal space, called glue. The paragraph-breaking algorithm then finds optimal places to break lines to satisfy default, or user-specified, constraints. The lines are assembled into a page galley, and when the page is filled up, an output routine is automatically invoked to decide where to break the page, a process that can introduce additional vertical spacing to fill up an underfull page, or can output the page, with some lines left over for the next page.
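The boxes-and-glue idea can be made concrete with a small sketch. The words and dimensions below are chosen purely for illustration, and \fbox is used only to make the box edges visible.

```latex
\documentclass{article}
\begin{document}
% An \hbox at its natural width: interword glue keeps its natural size.
\noindent\fbox{\hbox{boxes and glue}}

% The same words in a 6cm box: \hfil is infinitely stretchable glue,
% so it absorbs all of the extra width.
\noindent\fbox{\hbox to 6cm{boxes\hfil and\hfil glue}}

% Explicit finite glue: natural width 1em, stretch component 2em,
% shrink component 0.5em.  In a natural-width box it stays at 1em.
\noindent\fbox{\hbox{boxes\hskip 1em plus 2em minus 0.5em glue}}
\end{document}
```

In ordinary documents, authors rarely write such low-level commands; the paragraph builder inserts and adjusts interword glue automatically.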

The boxes-and-glue model, global line breaking and page breaking algorithms, and powerful multilingual hyphenation, distinguish TeX from all other document formatting and typesetting systems, many of which place lines on a fixed grid, and do line breaking only locally.

The user interfaces to TeX and LaTeX have changed relatively little since they were first designed, and it is rare for changes to introduce incompatibilities. Thus, documents written decades ago can still be typeset without changes; that feature is of utmost importance for many documents, especially in academia and for publishers. In short, TeX and LaTeX are well suited for production of archival documents.

TeX and METAFONT are extraordinarily reliable programs, and their bug history is reflected in the number of fractional digits in their current version numbers: 3.14159265 and 2.7182818 respectively, or 8 and 7 bugs since March 1990, with fixes applied in 1991, 1992, 1995, 1996, 2002, 2003, 2008, and 2014.

Importantly, TeX has a powerful, albeit somewhat unusual, command language that enables extensive customizations of its behavior, as we illustrate in the rest of this document.

## LaTeX document preparation system

While TeX provides superb document typesetting, many of its controls are at a deeper level than most users are interested in. Consequently, in the mid-1980s, Leslie Lamport (then at SRI near Stanford) designed the LaTeX (pronounced La-Techhh or Lah-Techhh or Lay-Techhh, but never L. A. Techhhh) document preparation system to allow authors to describe their documents in terms of higher-level concepts such as document classes, font and style packages, sectioning commands (part, chapter, section, subsection, subsubsection, paragraph, and subparagraph), figures, tables, math displays, and so on.

Donald Knuth won the ACM Turing Award in 1974, and Leslie Lamport won it in 2013. In both cases, the award is for their fundamental research in areas other than TeX and METAFONT.

The goal of LaTeX is that authors should be able to write their documents in terms of common logical elements, such as equations, figures, front matter, index entries, literature citations, sectional commands, tables, and so on, all of which are independent of the typeset appearance of the document. It is the job of document classes, packages, and styles to turn logical elements into particular typeset appearances, and most LaTeX users never need to implement, or modify, those complex software systems.
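As a minimal sketch of that idea, the following small document uses only logical elements; the title, author, and text are placeholders invented for this illustration, and the article class decides how each element is typeset.

```latex
\documentclass{article}   % the class decides the overall appearance

\title{A Sample Article}  % placeholder title
\author{A. N. Author}     % placeholder author
\date{13 November 2018}

\begin{document}
\maketitle

\begin{abstract}
  A one-paragraph summary goes here.
\end{abstract}

\section{Introduction}
Ordinary text is typed as paragraphs separated by blank lines.

\subsection{Background}
Sectioning commands say what a heading is, not how it looks.

\end{document}
```

Changing the first line to another class, such as report, should alter the appearance without any change to the body.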

Importantly, typographic objects, such as sections, equations, figures, and tables can be automatically numbered, and with suitable symbolic labels, can be cross referenced elsewhere in the document.
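A short sketch of automatic numbering and cross referencing follows. The label names sec:results and eq:area are arbitrary choices for this example, and LaTeX normally needs two runs before the references are resolved.

```latex
\documentclass{article}
\begin{document}

\section{Results}
\label{sec:results}

The area of a circle is given by
\begin{equation}
  A = \pi r^2 .
  \label{eq:area}
\end{equation}

% \ref produces the number, \pageref the page, of the labeled object.
Equation~\ref{eq:area} in Section~\ref{sec:results} appears on
page~\pageref{eq:area}.

\end{document}
```

If a new section or equation is later inserted ahead of these, the numbers change automatically on the next runs.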

Because TeX and LaTeX files contain only ordinary printable text, they can be processed with numerous software tools, including editors, e-mail systems, spell checkers, and text filters.

Remember: LaTeX markup indicates what things are, not what they look like when typeset. If you find yourself creating input that reflects visual appearance, you are probably doing something undesirable, or just plain wrong.
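The contrast can be sketched as follows. The macro name \progname is hypothetical, introduced only for this illustration; it is not part of LaTeX or of any package.

```latex
\documentclass{article}

% Hypothetical semantic macro: says that its argument is a program name.
% The appearance is defined once, here, and can be changed later.
\newcommand{\progname}[1]{\textsf{#1}}

\begin{document}

% Discouraged: visual markup repeated at every use.
The {\sffamily makeindex} program sorts index entries.

% Preferred: logical markup.
The \progname{makeindex} program sorts index entries.
This point is \emph{important}.

\end{document}
```

Both sentences typeset identically, but only the second lets a single change to the definition restyle every program name in the document.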

## TeX's input and output

TeX normally reads a single input file, say, mydoc.tex, but that file can include other input files. Its output is normally a pair of files: a .log file that summarizes typesetting progress and records error and warning messages, and a .dvi (DeVice Independent) file that is a compact representation of typesetting decisions, most of which are of the flavor "move to this point on the page, and display this string". The files are related by a common basename, here, mydoc, that is available in TeX as the control word \jobname.
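A minimal sketch of file inclusion and \jobname follows. The included file name chapter1.tex is hypothetical, so the example guards the inclusion with \IfFileExists to remain runnable when that file is absent.

```latex
% File mydoc.tex
\documentclass{article}
\begin{document}

This document was typeset from the job named \texttt{\jobname}.

% Read the contents of chapter1.tex (a hypothetical file) at this
% point, but only if it can be found.
\IfFileExists{chapter1.tex}{\input{chapter1}}{}

\end{document}
```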

TeX has commands for reading and writing named files, so in more complex cases, more output files can be produced, such as for a table of contents, a list of figures, and data for preparing a document bibliography and index.
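As a rough sketch of those file-writing commands, the following document writes one line to an extra output file. The stream name \notesfile and the .notes extension are arbitrary choices made for this illustration.

```latex
\documentclass{article}
\begin{document}

% Allocate an output stream and attach it to the file <jobname>.notes.
\newwrite\notesfile
\immediate\openout\notesfile=\jobname.notes

% Write one line of text to the file right away, then close it.
\immediate\write\notesfile{Written during the run of \jobname.tex}
\immediate\closeout\notesfile

A file named \texttt{\jobname.notes} now exists beside the log file.

\end{document}
```

LaTeX uses the same mechanism internally for its table-of-contents and cross-reference files, so most authors never need to call these commands directly.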

The .dvi file cannot be output or viewed without translation to the requirements of some particular device, such as a workstation window, or a printer page. Thus, helper programs, such as dvialw and dvips (for PostScript output), dvipdfm (for PDF output), and xdvi (for X Window System workstation displays), are generally required after a successful TeX run.

TeX knows nothing about character glyph shapes, only their measurements, called metrics. Consequently, it does not read font files, but only .tfm (TeX Font Metric) files. LaTeX additionally reads .fd (Font Descriptor) files that are needed to support font families of various sizes and shapes.

## Document indexing

Although TeX can write out phrases and their page numbers to a raw data file, it cannot itself turn that data into a sorted index, with duplicates eliminated, and consecutive page numbers elided. A separate program, such as makeindex or xindy, handles that complex task, outputting a formatted index file that can be read by TeX on a subsequent run to typeset the index.
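The index workflow can be sketched with a small document that uses the standard makeidx package; the index terms are chosen only for illustration, and the run sequence in the comments assumes the file is named mydoc.tex.

```latex
% File mydoc.tex.  A typical run sequence is:
%     latex mydoc        (writes raw index data to mydoc.idx)
%     makeindex mydoc    (sorts mydoc.idx into mydoc.ind)
%     latex mydoc        (typesets the index from mydoc.ind)
\documentclass{article}
\usepackage{makeidx}
\makeindex               % enable writing of the .idx file

\begin{document}

TeX\index{TeX} separates words with glue\index{glue}.
\newpage
More about glue\index{glue} and boxes\index{boxes}.

\printindex              % typeset the sorted index, if it exists

\end{document}
```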

## Bibliographies

As with indexes, TeX does not itself produce references to bibliographic data that usually appear in a sorted list at the end of the document, or rarely, chapter. Instead, it must write citation requests and other information to an output file that another program, such as BibTeX, reads, then finds the needed citations, and prepares a sorted list of references formatted according to a user-chosen style. On a subsequent run, TeX can then include a suitable marker at the point of the citation, and typeset the bibliography, usually near the document end.
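A sketch of the citation workflow follows. The database name mybib.bib and the citation key Knuth:1986:TB are hypothetical; the key must match an entry in your own BibTeX database, and until it does, LaTeX only warns about an undefined citation.

```latex
% File mydoc.tex.  A typical run sequence is:
%     latex mydoc     (writes citation requests to mydoc.aux)
%     bibtex mydoc    (reads mydoc.aux and mybib.bib, writes mydoc.bbl)
%     latex mydoc     (reads mydoc.bbl)
%     latex mydoc     (resolves the citation markers)
\documentclass{article}
\begin{document}

The standard reference for TeX is
\emph{The \TeX book}~\cite{Knuth:1986:TB}.

\bibliographystyle{plain}   % user-chosen style
\bibliography{mybib}        % hypothetical database file mybib.bib

\end{document}
```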

## Graphics in figures

TeX itself contains little support for incorporating graphics files into a document: instead, it provides an escape mechanism that permits the name of a graphics file in any of several supported formats to be written to the output .dvi file, and the DVI translator program must then insert the graphic in the output display.
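In LaTeX, that escape mechanism is usually reached through the graphicx package, as in this sketch. It assumes that the sample graphic example-image from the mwe package is installed, as it is in a full TeX Live; otherwise, substitute the name of your own graphics file.

```latex
\documentclass{article}
\usepackage{graphicx}

\begin{document}

\begin{figure}
  \centering
  % example-image is a sample file assumed to be installed;
  % replace it with your own file, such as myplot.eps or myplot.pdf.
  \includegraphics[width=0.6\textwidth]{example-image}
  \caption{A sample figure.}
  \label{fig:sample}
\end{figure}

Figure~\ref{fig:sample} shows the included graphic.

\end{document}
```

The file extension is normally omitted so that each TeX variant can pick a format that it supports.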

## Extended TeX's

Because the source code for TeX is freely available, its author expected that some people would modify it for additional typographical features. Among the several such examples are these:

• ConTeXt [high-level document structuring defined entirely on top of TeX, originally without internal modifications]
• AmSLaTeX [AmSTeX features on top of LaTeX]
• AmSTeX [significantly extended mathematics typesetting, with new math fonts, defined entirely on top of TeX, originally without internal modifications]
• eTeX [enlarged internal tables, and a few extra low-level commands]
• JadeTeX [converts XML input for TeX typesetting]
• LAmSTeX [significantly extended mathematics typesetting, plus LaTeX-influenced document structuring commands, defined entirely on top of TeX, originally without internal modifications]
• LaTeX [high-level document structuring defined entirely on top of TeX, without internal modifications]
• LuaTeX [rewriting parts of TeX's internal algorithms in a robust scripting language, Lua, that is then accessible to the document]
• PDFTeX [PDF, instead of DVI, output, with additional typographic controls, and support for Unicode, document hyperlinks, and image transparency]
• pTeX [support of Unicode and both horizontal and vertical writing directions, targeting Japanese commercial publishing market]
• TeX-XeT [support for mixed left-to-right and right-to-left typesetting]
• XeTeX [access to system-dependent Unicode fonts, important for support of all human languages and writing systems]
• XMLTeX and PDFXMLTeX [typesetting of documents in any of several XML schemes to DVI and PDF]

Warning: Because of the complexity of some of the world's writing systems, Unicode fonts require special handling by a software layer provided by the operating system. Document formatters then do not work a character at a time, but instead, they pass a word to the Unicode software layer, and get back a description of the box that holds that word, along with the details of glyph placement and linking within the word. All text display on that operating system should then produce the same output words, but because the Unicode software layer was developed independently for Apple, Microsoft, and Unix systems, there is no guarantee that the returned word dimensions are identical across those three families. That in turn means that TeX is likely to produce different line and page breaking for the same input document, unlike its behavior with other fonts.

## TeX commands

All TeX commands are visible in the input stream, using only ordinary printable characters, so documents always consist of a mixture of commands to be acted on, and text to be typeset.

TeX has a powerful mechanism for classifying input characters into different types of commands, such as begin command, begin/end group, and begin/end math, based on their category codes. Few users ever change the default codes, so for most purposes, here are the default characters that are assigned each of the 16 codes:

• \ escape character that begins a command, also called a control sequence (control symbol or control word), or a macro
• { begin group
• } end group

    # ash, bash, dash, ksh, sh, zsh, ... Bourne-shell family syntax:
    $ TEXINPUTS=.:$HOME/tex/mymacros:
    $ export TEXINPUTS


There is more to TeX path searching than we describe here; for details, run the command texdoc texlive. That important document is available in about ten languages, so texdoc texlive-ru would display the Russian-language version.

That document also describes a useful command tool for exploring TeX's search-path algorithm, which is particularly useful when you are customizing search paths and want to ensure that the correct file is found. Here is an example:

    % kpsewhich article.cls
    /usr/uumath/texlive/2018/texmf-dist/tex/latex/base/article.cls


You can also find the values of search-path variables that might not be set in the environment, but are determined by local run-time defaults:

    % kpsewhich --var-value TEXINPUTS
    .:/usr/uumath/texlive/2018/texmf-local//:/usr/uumath/texlive/2018/texmf-extra//:


To find out which file types and environment variables are known to your TeX installation, run commands like these:

    % kpsewhich -help-formats
    [long output]

    % kpsewhich -help-formats | grep TEXINPUTS
    graphic/figure:  .eps .epsi  [variables: TEXPICTS TEXINPUTS]
    tex: .tex  .sty .cls .fd .aux .bbl .def .clo .ldf  [variables: TEXINPUTS]


Warning: One thing to be aware of with both cache files and recursive directory searching is that the order in which files are found is system dependent. It is most definitely not lexicographic order in general. Thus, if two or more files in a directory tree have the same name, but different contents, you cannot predict which one is found. TeX distributions normally try to avoid duplicate filenames when those files are expected to be read by TeX. The TeX .log file always records full pathnames of each file that it reads, so you can easily verify that it found the expected instance. If it did not, redefine the search path variable.

## More software for TeX, METAFONT, BibTeX, and Emacs

The binary executables directories in TeX Live distributions supply a large number of tools that users have found convenient to have on all platforms. The tool count has grown from about 250 in 2003 to almost 450 in 2018. Feel free to browse around, like this:

    % which tex
    /usr/uumath/texlive/2018/bin/x86_64-linux-centos-7/tex

    % ls /usr/uumath/texlive/2018/bin/x86_64-linux-centos-7 | less
    [long list to page through]


In addition, in our standard local installation directories, /usr/local/bin or /usr/uumath/bin, we supply several thousand more executables that support our users. Among the TeX-related ones are chktex, detex, lacheck, and texpretty.

This author has long been a software developer and supporter of BibTeX and bibliographic databases. Among the many related tools on our systems are bibcheck, bibclean, bibdup, bibextract, bibjoin, biblabel, biblex, biborder, bibparse, bibsearch, bibsort, bibsplit, bibsql, bibtosql, bibunlex, cattobib, chkdelim, citesub, doi-to-bibtex, dw, myspell, ref2bib, and scholar. He has developed more than 400 programs that support conversion of publisher Web metadata to BibTeX, plus several extended BibTeX styles that provide richer bibliographic information, and more than 950 functions for the emacs text editor. Feel free to communicate with him if such software could be useful to you.

## Customizing TeXware limits

Most users of TeX and LaTeX never encounter a TeX capacity exceeded message. Nevertheless, large documents sometimes require bigger internal tables. Originally, their sizes were compiled into the software, which meant that large documents might have to be produced in several parts. Work by this author, Karl Berry, and other members of the TeX Live team has largely eliminated compile-time table sizes in TeXware and MFware, in favor of dynamically allocated tables whose sizes are set in default, and optional user-supplied, configuration files named texmf.cnf. In TeX Live 2018, the default file is found in the directory texmf-dist/web2c, and it contains documentation comments before each customizable value.

You can supply personal settings in a private file, texmf.cnf, in the directory with your big document. For example, this author deals with large bibliography files using assignments like these:

    hash_size.bibtex   =   50000
    max_cites.bibtex   =   20000
    max_strings.bibtex = 2048000


There is no need to duplicate the default settings, because they are read before your customizations.

## Updating TeX Live

The annual TeX Live releases have proven to be reliable software distributions, and many users who install them on personal machines may not even update them yearly. Nevertheless, even though TeX and METAFONT can be viewed as essentially bug free, the same may not be true for the hundreds of other programs that come with TeX Live. Also, package development is active, and there are frequent updates. You might therefore decide that you always want to have the "latest stuff", in which case, you need to know how to install updates.

The first step is to choose a suitable package repository mirror. The master TeX Live site now resides in Paris, France, but there are numerous mirrors around the world that you can find listed here. That site can also alert you to problems at your chosen mirror, because it reports their status. Rarely, a mirror site may experience hardware or network problems that take it offline for an extended period, leaving it somewhat stale when it returns to service. In most cases, all should be well within a day or two.

This author's site at the University of Utah is the master North American mirror, and if that is suitably close to you network-wise, then you can make it your default source like this:

    % tlmgr option repository http://ctan.math.utah.edu/tex-archive/systems/texlive/tlnet


You could revert to the Paris repository like this:

    % tlmgr option repository http://mirror.ctan.org/systems/texlive/tlnet


Updating is now a simple two-step process. The first step installs any updates for the package-management system itself, and the second updates everything else:

    % tlmgr update --self
    % tlmgr update --all


## Books about TeX and LaTeX

This author keeps an extensive BibTeX bibliography of books and other publications about TeX and METAFONT here. It records more than 300 such books.

The standard reference for TeX is Donald Knuth's The TeXbook, the first of the five-volume Computers & Typesetting series.

The standard reference for LaTeX is the second edition of Leslie Lamport's LaTeX: a Document Preparation System: User's Guide and Reference Manual. However, it predates many support packages, and does not cover them at all.

Three supplementary volumes have been produced by members of the LaTeX team, in various languages and editions: The LaTeX Companion, The LaTeX Graphics Companion, and The LaTeX Web Companion: integrating TeX, HTML and XML. A new edition of the first of those may be available by late 2019.

For a student on a low budget who just wants one book, the Kopka & Daly books called Guide to LaTeX: Tools and Techniques for Computer Typesetting, available in German and English editions, are a good start. They cover core LaTeX, and a small set of useful LaTeX packages.

Mathematicians and scientists should also get the most recent edition of George Grätzer's excellent More Math Into LaTeX. It is a treasure trove for typesetting complex mathematical displays.

Several LaTeX packages support the preparation of lecture slides, and Herbert Voß's book Presentations with LaTeX is a useful guide.

Finally, you really do need an authoritative writing style guide. There are several such books, listed in the Handbooks on writing section of this author's Typesetting and Writing Hints for Theses and Dissertations.