CHECKSUM 1L "06 May 1994" "Version 1.06"


NAME
checksum - install or validate a file checksum


SYNOPSIS
checksum < infile > outfile
checksum infile > outfile
checksum infile outfile
checksum -c < infile
checksum -c infile
checksum -v < infile
checksum -v infile

DESCRIPTION
checksum computes a 16-bit cyclic redundancy checksum (CRC) for a file, as well as counts of the words, lines and characters.

With no filenames on the command line, stdin and stdout are assumed. With one filename, input is from that file, with output to stdout. With two filenames, input is from the first, and output to the second.

If the checksum has previously been installed in the input file, and the input file has not been corrupted since then, the output file will be identical to the input file.

With the -c option, only an input file is expected; it may be named on the command line or read from stdin. checksum then computes a new checksum and writes only the new checksum line to stdout. This is convenient for programmable editors that need to update a file's checksum.

With the -v option, only an input file is expected; it may be named on the command line or read from stdin. checksum then verifies whether the checksum embedded in the file is correct. A zero status code is returned for a correct checksum, and a non-zero one otherwise; in UNIX, the status code may be conveniently tested in shell scripts. In either case, an informative message is printed on stdout.
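
The status code can be tested in the same way from any language that can start a process. The following Python sketch assumes that a checksum executable is on the search path. The helper name checksum_is_valid and its command parameter are illustrative only and are not part of checksum.

```python
import subprocess

def checksum_is_valid(path, command=("checksum", "-v")):
    """Return True when `checksum -v path` exits with status 0.

    `command` is a hypothetical hook for pointing at a differently
    named or located executable; -v is the documented option.
    """
    # The informative message is left on stdout for the caller to read.
    completed = subprocess.run([*command, path])
    return completed.returncode == 0
```

If the executable cannot be found, subprocess.run raises FileNotFoundError, which a caller may want to treat separately from a failed verification.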


checksum searches for the first line of its input that contains the word checksum in lowercase and no other alphabetic characters. We refer to this line as the ``critical line'' of the file. If the critical line contains no quotation marks, the output file is a copy of the input file, except that in the critical line the word checksum is replaced by checksum = "xxxxx lc wc cc". Here, lc is the number of lines in the output file, written in decimal. Similarly, wc is the word count of the output file, and cc is its character count when each end of line is counted as the single ASCII newline character (octal 012).
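
The search rule can be read as follows. This Python sketch is one interpretation of the sentence above, not the utility's own code, and the function name find_critical_line is hypothetical.

```python
def find_critical_line(lines):
    """Return the index of the first critical line, or None if there is none."""
    for index, line in enumerate(lines):
        if "checksum" not in line:          # must contain the lowercase word
            continue
        # Drop one occurrence of the keyword, then require that no
        # alphabetic character remains (digits and punctuation are fine).
        rest = line.replace("checksum", "", 1)
        if not any(ch.isalpha() for ch in rest):
            return index
    return None

lines = [
    "% This file carries a checksum in the next line",  # other letters: skipped
    "% checksum",                                        # critical line
    "body text",
]
print(find_critical_line(lines))
```

A line that already holds an installed field, such as % checksum = "12345 3 11 52", also qualifies under this reading, since the field contains only digits.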

For many text files, it is possible to hide the ``critical line'' in a comment near the beginning of the file.

A file cannot easily contain its own checksum, because inserting the checksum changes the file. Instead, the field xxxxx holds the checksum of a modified copy of the output file, in which the checksum field is replaced by the string ZZZZZ. The checksum is written in decimal as five digits, with leading 0's where necessary.
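
The ZZZZZ step can be sketched as below. Two assumptions are made loudly: binascii.crc_hqx from the Python standard library stands in for the 16-bit CRC, and it is not claimed to be the polynomial that checksum uses, so the digits will not necessarily match the real utility. The names stamp and verify are hypothetical, the text is assumed to be ASCII, and the three counts are taken as already filled in.

```python
import binascii
import re

PLACEHOLDER = "ZZZZZ"
# The five-character checksum field directly after: checksum = "
FIELD = re.compile(r'(checksum = ")(\d{5}|ZZZZZ)( )')

def crc16(text):
    # Stand-in CRC (assumption); not necessarily checksum's own polynomial.
    return binascii.crc_hqx(text.encode("ascii"), 0)

def stamp(text):
    """Install the five-digit CRC of the ZZZZZ form of the text."""
    masked = FIELD.sub(lambda m: m.group(1) + PLACEHOLDER + m.group(3), text, count=1)
    digits = f"{crc16(masked):05d}"
    return FIELD.sub(lambda m: m.group(1) + digits + m.group(3), masked, count=1)

def verify(text):
    """Put ZZZZZ back, recompute, and compare with the stored digits."""
    match = FIELD.search(text)
    if match is None or match.group(2) == PLACEHOLDER:
        return False
    masked = text[:match.start(2)] + PLACEHOLDER + text[match.end(2):]
    return int(match.group(2)) == crc16(masked)

sample = '% checksum = "ZZZZZ 3 11 52"\nfirst line\nsecond line\n'
stamped = stamp(sample)
```

Because the placeholder and the decimal field have the same width, installing the checksum does not change the character count, and stamping an already stamped, uncorrupted text returns it unchanged.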

If, after the word ``checksum'', the critical line already contains precisely two quotation marks, and the first of them is the last character of the four-character string `` = "'' (i.e. <blank><equals><blank><quotation mark>), then the material between the two quotation marks is deleted and replaced by a checksum and three counts as described above.
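
The two rewriting cases (no quotation marks, or an existing quoted field) can be sketched together. This is an illustration under assumptions: the function name rewrite_critical_line is hypothetical, and leaving any other line unchanged is a guess, since the behaviour for malformed critical lines is not described here.

```python
def rewrite_critical_line(line, field):
    """Return the critical line with `field` installed between quotation marks."""
    head, keyword, tail = line.partition("checksum")
    if not keyword:
        return line
    if '"' not in line:
        # Case 1: no quotation marks; the word itself is expanded.
        return f'{head}checksum = "{field}"{tail}'
    first = tail.find('"')
    if tail.count('"') == 2 and first >= 3 and tail[first - 3:first + 1] == ' = "':
        # Case 2: exactly two quotation marks, the first one ending ' = "'.
        last = tail.rfind('"')
        return f"{head}checksum{tail[:first + 1]}{field}{tail[last:]}"
    return line  # assumption: anything else is left untouched

print(rewrite_critical_line("% checksum", "12345 3 11 52"))
print(rewrite_critical_line('% checksum = "stale" %', "12345 3 11 52"))
```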

While the counts of words, characters, and lines could be obtained by the UNIX wc(1) utility, that information is still not sufficient to detect character substitutions, or transpositions of characters, lines, and words. The CRC-16 checksum remedies that, since the resulting checksum depends on the order and value of every single byte in the file.
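
A small Python illustration of this point: transposing two characters leaves all three counts unchanged but alters a 16-bit CRC. The CRC here is binascii.crc_hqx from the standard library, used as a stand-in and not claimed to be the polynomial checksum uses. The helper name wc_counts is hypothetical.

```python
import binascii

def wc_counts(data):
    """Lines, words and characters, in the manner of wc(1)."""
    return data.count(b"\n"), len(data.split()), len(data)

original = b"the cat sat\n"
swapped = b"teh cat sat\n"   # two adjacent characters transposed

same_counts = wc_counts(original) == wc_counts(swapped)
same_crc = binascii.crc_hqx(original, 0) == binascii.crc_hqx(swapped, 0)
print(same_counts, same_crc)  # prints: True False
```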

checksum is intended to support the reliable exchange of text files between different computers, even ones with different operating systems. Thus, the newline character sequence that terminates each line is treated as if it were an ASCII newline (linefeed) character, even though it may be a carriage return, a carriage return and a line feed, or simply an end-of-record condition in the file, depending on the operating system and file type. The file checksum is therefore independent of the particular representation of end-of-line.
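
One way to picture this normalization is the Python sketch below. It assumes byte-stream files that use carriage return, linefeed, or both; record-oriented files, where end of line is an end-of-record condition, are outside its scope. The names normalize_newlines and counts are hypothetical.

```python
def normalize_newlines(data):
    """Map CR LF and lone CR line endings to a single ASCII newline."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def counts(data):
    """Line, word and character counts after end-of-line normalization."""
    text = normalize_newlines(data)
    return text.count(b"\n"), len(text.split()), len(text)

unix_style = b"one two\nthree\n"
dos_style = b"one two\r\nthree\r\n"
cr_style = b"one two\rthree\r"
print(counts(unix_style), counts(dos_style), counts(cr_style))
```

All three variants yield the same counts, and a CRC computed over the normalized bytes would likewise agree across them.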

Although UNIX systems have a file checksum utility, sum(1), the result it produces differs between UNIX variants, and in any event, it is neither publicly available for porting to other systems, nor independent of the end-of-line representation. checksum, by contrast, is freely available.


SEE ALSO
sum(1), wc(1).


AUTHORS
Robert M. Solovay
Department of Mathematics
University of California
Berkeley, CA, USA
Tel: +1 415 642 2252

Amiga support and many typographical formatting improvements:

Andreas Scherer
Abt Wolf Strasse 17
96215 Lichtenfels
Tel: (0 95 71) 2013

General maintenance for the TeX Users Group:

Nelson H. F. Beebe
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: (Internet)


AVAILABILITY
The master source distribution for checksum is maintained on the Internet archive host (University of Utah, Salt Lake City, UT, USA) in the anonymous ftp directory path /pub/tex/checksum. Copies should be mirrored from there to the Comprehensive TeX Archive Network hosts, which include at least these machines: (Aston University, Birmingham, UK), (University of Stuttgart, Stuttgart, Germany), and (Sam Houston State University, Huntsville, Texas, USA). On each of these machines, you should be able to locate this software in the archives with the ftp command quote site index checksum.