checksum infile >outfile
checksum infile outfile
checksum -v < infile
checksum -v infile
checksum computes a 16-bit cyclic redundancy check (CRC) checksum for a file, together with counts of its words, lines, and characters.
With no filenames on the command line, stdin and stdout are assumed. With one filename, input is from that file, with output to stdout. With two filenames, input is from the first, and output to the second.
If the checksum has previously been installed in the input file, and the input file has not been corrupted since then, the output file will be identical to the input file.
With the -v option, only an input file is expected, and it may come from the command line, or from stdin. checksum will then verify whether the checksum embedded in the file is correct or not. A zero status code is returned for a correct checksum, and a non-zero one otherwise; in UNIX, the status code may be conveniently tested in shell scripts. In either case, an informative message is printed on stdout.
For many text files, it is possible to hide the ``critical line'' in a comment near the beginning of the file.
It is difficult to arrange for a file to contain its own checksum. Instead, the field xxxxx holds the checksum, written in decimal as a five-digit number (with leading 0's if necessary), of the file obtained from the output file by replacing the checksum field with the string ZZZZZ.
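The placeholder scheme can be illustrated with a minimal Python sketch. It is not the checksum program itself: the function names are hypothetical, the offset of the field is passed in by the caller, and the CRC-16 variant is an assumption (the CCITT/XMODEM CRC from Python's binascii.crc_hqx stands in for whichever 16-bit CRC checksum actually uses).

```python
import binascii

PLACEHOLDER = b"ZZZZZ"  # string substituted for the checksum field


def crc16(data: bytes) -> int:
    # Assumption: CRC-CCITT (XMODEM) is used here as a stand-in 16-bit CRC;
    # the real utility may use a different polynomial.
    return binascii.crc_hqx(data, 0)


def install(text: bytes) -> bytes:
    """Replace the single ZZZZZ placeholder with the five-digit checksum
    of the text as it stands with the placeholder in place."""
    if text.count(PLACEHOLDER) != 1:
        raise ValueError("expected exactly one ZZZZZ placeholder")
    return text.replace(PLACEHOLDER, b"%05d" % crc16(text))


def verify(text: bytes, start: int) -> bool:
    """Check the five-digit field at offset `start` by putting ZZZZZ back
    and recomputing the checksum over the masked text."""
    field = text[start:start + 5]
    masked = text[:start] + PLACEHOLDER + text[start + 5:]
    return field == b"%05d" % crc16(masked)
```

A 16-bit value is at most 65535, so five decimal digits always suffice, and the file length does not change when the placeholder is swapped for the checksum.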
If, after the word ``checksum'', the critical line already contains precisely two quotation marks, and the first of them is the last character of the four-character string `` = "'' (i.e. <blank><equals><blank><quotation mark>), then the material between the two quotation marks will be deleted and replaced by a checksum and three counts as described above.
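The matching rule in the preceding paragraph can be sketched in Python as follows. This is one possible reading of the rule, not the program's own logic: replace_field is a hypothetical helper, and the replacement text is supplied by the caller because the exact layout of the checksum and counts is not assumed here.

```python
from typing import Optional


def replace_field(line: str, new_field: str) -> Optional[str]:
    """Return the line with the quoted material replaced by new_field,
    or None if the line does not match the rule."""
    idx = line.find("checksum")
    if idx < 0:
        return None
    head_end = idx + len("checksum")
    rest = line[head_end:]          # everything after the word "checksum"
    if rest.count('"') != 2:        # precisely two quotation marks
        return None
    q1 = rest.index('"')
    q2 = rest.index('"', q1 + 1)
    # The first quotation mark must end the four-character string ' = "'.
    if q1 < 3 or rest[q1 - 3:q1 + 1] != ' = "':
        return None
    return line[:head_end] + rest[:q1 + 1] + new_field + rest[q2:]
```

For example, replace_field('% checksum = "old",', 'NEW') yields '% checksum = "NEW",', while a line with no blanks around the equals sign, or with a third quotation mark, is rejected.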
While the counts of words, characters, and lines could be obtained by the UNIX wc(1) utility, that information is still not sufficient to detect character substitutions, or transpositions of characters, lines, and words. The CRC-16 checksum remedies that, since the resulting checksum depends on the order and value of every single byte in the file.
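A short Python sketch makes the point concrete. The counts function mimics what wc(1) reports, and crc16 uses binascii.crc_hqx (CRC-CCITT/XMODEM) as an assumed stand-in for the CRC-16 that checksum computes; both names are illustrative only.

```python
import binascii


def counts(data: bytes) -> tuple:
    # Lines, words, and characters, in the manner of wc(1).
    text = data.decode("ascii")
    return (text.count("\n"), len(text.split()), len(text))


def crc16(data: bytes) -> int:
    # Assumption: a stand-in 16-bit CRC, not necessarily the utility's own.
    return binascii.crc_hqx(data, 0)


original = b"the quick brown fox\n"
swapped = b"the quikc brown fox\n"   # two adjacent characters transposed

print(counts(original) == counts(swapped))   # the counts cannot tell them apart
print(crc16(original) == crc16(swapped))     # the CRC can
```

A transposition of two adjacent bytes changes a run of at most 16 bits, and a 16-bit CRC detects every error burst of that length, so the two checksums are guaranteed to differ.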
checksum is intended to support the reliable exchange of text files between different computers, even ones with different operating systems. Thus, the newline character sequence that terminates each line is treated as if it were an ASCII newline (linefeed) character, even though it may be a carriage return, a carriage return and a line feed, or simply an end-of-record condition in the file, depending on the operating system and file type. The file checksum is therefore independent of the particular representation of end-of-line.
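The effect of this convention can be sketched in Python by normalizing line terminators before the CRC is computed. The sketch covers only the byte-stream cases (linefeed, carriage return, and carriage return plus linefeed); record-oriented files are outside its scope. The function names are hypothetical, and binascii.crc_hqx is again an assumed stand-in for the utility's CRC-16.

```python
import binascii


def normalize_newlines(data: bytes) -> bytes:
    # Map CR LF first, then any remaining lone CR, to a single LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def portable_crc(data: bytes) -> int:
    # Assumption: stand-in 16-bit CRC computed over the normalized bytes.
    return binascii.crc_hqx(normalize_newlines(data), 0)
```

With this normalization, portable_crc gives the same value for b"one\ntwo\n", b"one\r\ntwo\r\n", and b"one\rtwo\r".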
Although UNIX systems have a file checksum utility, sum(1), the result it produces differs between UNIX variants, and in any event, it is neither publicly available for porting to other systems, nor independent of the end-of-line representation. checksum is freely available.
Department of Mathematics
University of California
Berkeley, CA, USA
Tel: (415) 642-2252
Email: solovay@math.berkeley.edu