CHECKSUM 1L "19 June 1992"

NAME

checksum - install or validate a file checksum

SYNOPSIS

checksum <infile >outfile

checksum infile >outfile

checksum infile outfile

checksum -v < infile

checksum -v infile

checksum computes a 16-bit cyclic redundancy checksum (CRC) for a file, together with counts of its lines, words, and characters.

With no filenames on the command line, stdin and stdout are assumed. With one filename, input is from that file, with output to stdout. With two filenames, input is from the first, and output to the second.

If the checksum has previously been installed in the input file, and the input file has not been corrupted since then, the output file will be identical to the input file.

With the -v option, only an input file is expected, and it may come from the command line or from stdin. checksum then verifies whether the checksum embedded in the file is correct. A zero status code is returned for a correct checksum, and a non-zero one otherwise; under UNIX, the status code may be conveniently tested in shell scripts. In either case, an informative message is printed on stdout.


DESCRIPTION

checksum will search for the first line of its input which contains the word checksum in lowercase and no other alphabetic characters. We refer to this line as the ``critical line'' of the file. If the critical line contains no quotation marks, then the output file created is a copy of the input file, except that the critical line is replaced by a line where the word checksum is replaced by checksum = "xxxxx lc wc cc". Here, lc is the number of lines in the output file, written in decimal. Similarly, wc is the word count of the output file, and cc is the character count when the end of line character is taken to be the single character ASCII newline (octal 012).
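The search for the critical line can be sketched in Python as follows. This is only an illustration of the rule stated above, not the actual implementation: the function names are hypothetical, and the sketch assumes that "no other alphabetic characters" means the only run of ASCII letters on the line is the lowercase word checksum.

```python
import re


def is_critical_line(line):
    """Return True if the only run of letters on the line is 'checksum'.

    Assumption: digits, punctuation, and comment characters are allowed,
    but any other alphabetic text disqualifies the line.
    """
    return re.findall(r"[A-Za-z]+", line) == ["checksum"]


def find_critical_line(lines):
    """Return the index of the first critical line, or None if there is none."""
    for index, line in enumerate(lines):
        if is_critical_line(line):
            return index
    return None


if __name__ == "__main__":
    sample = ["% A file header", "% checksum", "body text"]
    print(find_critical_line(sample))
```

Under this reading, a line such as `%%% checksum = "12345 10 20 300"` still qualifies, since the quoted field holds only digits and blanks.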

For many text files, it is possible to hide the ``critical line'' in a comment near the beginning of the file.

It is difficult to arrange that a file contains its own checksum, since inserting the checksum changes the file. Instead, the field xxxxx contains the checksum of a modified copy of the output file, in which the field containing the checksum is replaced by the string ZZZZZ. The checksum is written in decimal in a five-digit field, with leading 0's where needed.
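The placeholder step can be sketched as follows. The text above does not say which CRC-16 variant checksum uses, so this sketch assumes the common CRC-16/ARC parameters (reflected polynomial 0xA001, initial value 0); the real program may differ. The function names are hypothetical.

```python
def crc16_arc(data):
    """Bitwise CRC-16/ARC (an assumed variant, not confirmed by the man page)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def checksum_field(text_with_placeholder):
    """Format the CRC of the ZZZZZ-placeholder text as five decimal digits.

    A 16-bit value is at most 65535, so five digits always suffice.
    """
    crc = crc16_arc(text_with_placeholder.encode("ascii"))
    return f"{crc:05d}"


if __name__ == "__main__":
    # The file as it looks with the checksum field replaced by ZZZZZ.
    placeholder_text = '% checksum = "ZZZZZ 1 4 26"\n'
    print(checksum_field(placeholder_text))
```

Because the checksum is computed with ZZZZZ in place of the five digits, writing the digits into the file afterwards does not invalidate it; a verifier substitutes ZZZZZ again before recomputing.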

If the critical line already contains, after the word ``checksum'', precisely two quotation marks, and the first of them is the last character of the four-character string `` = "'' (i.e. <blank><equals><blank><quotation mark>), then the material between the two quotation marks is deleted and replaced by a checksum and three counts as described above.
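A literal reading of this rule might look like the sketch below. It is an illustration under stated assumptions, not the program's actual logic: the function name is hypothetical, and the sketch returns None whenever the line does not match the pattern, leaving the handling of such lines open.

```python
def replace_quoted_field(line, new_field):
    """Replace the text between the two quotation marks after 'checksum'.

    Returns the rewritten line, or None if the line does not have exactly
    two quotation marks after the word, or if the first one is not
    preceded by <blank><equals><blank>.
    """
    head, word, tail = line.partition("checksum")
    if not word or tail.count('"') != 2:
        return None
    first = tail.index('"')
    last = tail.rindex('"')
    # The first quotation mark must end the four-character string ' = "'.
    if first < 3 or tail[first - 3:first + 1] != ' = "':
        return None
    return head + word + tail[:first + 1] + new_field + tail[last:]


if __name__ == "__main__":
    print(replace_quoted_field('% checksum = "old stuff"', "12345 1 2 3"))
```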

While the counts of words, characters, and lines could be obtained by the UNIX wc(1) utility, that information is still not sufficient to detect character substitutions, or transpositions of characters, lines, and words. The CRC-16 checksum remedies that, since the resulting checksum depends on the order and value of every single byte in the file.
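The point can be illustrated with a small Python experiment. The sketch uses binascii.crc_hqx from the Python standard library, a 16-bit CRC-CCITT, purely as a stand-in; it is not necessarily the CRC that checksum computes. The helper wc_counts is a hypothetical approximation of wc(1).

```python
import binascii


def wc_counts(data):
    """Return (lines, words, characters), roughly as wc(1) would report them."""
    return (data.count(b"\n"), len(data.split()), len(data))


def crc16(data):
    """A 16-bit CRC (CRC-CCITT from the standard library, used as a stand-in)."""
    return binascii.crc_hqx(data, 0)


original = b"pay 1200 to alice\n"
swapped = b"pay 2100 to alice\n"  # two adjacent characters transposed

if __name__ == "__main__":
    # The counts agree, so wc alone cannot tell the two texts apart,
    # while the CRCs differ.
    print(wc_counts(original) == wc_counts(swapped))
    print(crc16(original) == crc16(swapped))
```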

checksum is intended to support the reliable exchange of text files between different computers, even ones with different operating systems. Thus, the newline character sequence that terminates each line is treated as if it were an ASCII newline (linefeed) character, even though it may be a carriage return, a carriage return and a line feed, or simply an end-of-record condition in the file, depending on the operating system and file type. The file checksum is therefore independent of the particular representation of end-of-line.
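One way to picture this end-of-line independence is to map every line terminator to a single ASCII newline before counting or checksumming. The sketch below does that for byte strings; it is an assumption about the effect, not about how checksum itself reads files, and the function name is hypothetical. Record-oriented files with no terminator bytes are outside its scope.

```python
def normalize_newlines(data):
    """Map CR LF and bare CR line endings to a single LF (octal 012)."""
    # Replace CR LF first so that the pair does not become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    lf_text = b"alpha beta\ngamma\n"
    crlf_text = b"alpha beta\r\ngamma\r\n"
    cr_text = b"alpha beta\rgamma\r"
    # All three representations normalize to the same byte string, so any
    # checksum or count computed afterwards is the same for each of them.
    print(normalize_newlines(crlf_text) == lf_text)
    print(normalize_newlines(cr_text) == lf_text)
```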

Although UNIX systems have a file checksum utility, sum(1), the result it produces differs between UNIX variants, and in any event, it is neither publicly available for porting to other systems, nor independent of the end-of-line representation. checksum is freely available.


SEE ALSO

sum(1), wc(1).

AUTHOR

Robert M. Solovay

Department of Mathematics

University of California

Berkeley, CA, USA

Tel: (415) 642-2252

Email: solovay@math.berkeley.edu