How can you know if a file has been corrupted -- by accident or by a malicious user? You can check the number of characters with ls -l (Section 50.2), but the corrupted file could have the same number of characters, just some different ones. You can check the last-modification date (Section 8.2), but that's easy to change, to any time you want, with touch. And, of course, you can read through the file, unless it's a binary (nonprintable) file or it's just too long.
Go to http://examples.oreilly.com/upt3 for more information on: md5sum
The easy way is to compute a checksum (an electronic fingerprint or message digest) that identifies the file at a time you know it's correct. Save that checksum in a secure place (on an unwritable CD-ROM, on a filesystem with write protection enabled in hardware, or just on a piece of paper). Then, when you want to verify the file, recompute the checksum and compare it to the original. That's just what the md5sum utility does.
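To see why a checksum catches what a size check misses, here is a minimal sketch. It assumes GNU md5sum and mktemp are available; the scratch directory, the file name ledger.txt, and its contents are made up for illustration, and wc -c stands in for the size column of ls -l.

```shell
# Work in a scratch directory so nothing real is touched (names are illustrative).
dir=$(mktemp -d)
printf 'pay alice 100\n' > "$dir/ledger.txt"

# Record the size and the checksum while the file is known to be good.
size_before=$(wc -c < "$dir/ledger.txt")
sum_before=$(md5sum < "$dir/ledger.txt")

# Simulate corruption: one character changes, the length does not.
printf 'pay alice 900\n' > "$dir/ledger.txt"

size_after=$(wc -c < "$dir/ledger.txt")
sum_after=$(md5sum < "$dir/ledger.txt")

# The sizes match, so a size check alone would miss the change...
[ "$size_before" -eq "$size_after" ] && echo "size unchanged"
# ...but the two checksums are different.
[ "$sum_before" != "$sum_after" ] && echo "checksum changed"
```

Both messages should print: the byte count stays the same while the digest does not.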
md5sum is a more secure version of the earlier Unix sum program, and it's also handier to use. By default, you give md5sum a list of pathnames; it will write checksums to its standard output. Later, use the md5sum -c ("check") option to compare the files to their checksums. The first command below calculates checksums for some gzipped tar archives and saves them in a temporary file. (If we were doing this "for real," I'd copy that temporary file someplace more secure!) The second command shows the file. The third command compares the files to their stored checksums:
    $ md5sum *.tar.gz > /tmp/sums.out
    $ cat /tmp/sums.out
    018f4aee79e049095a7b16ed1e7ec925  linux-ar-40.tar.gz
    52549f8e390db06f9366ee83e59f64de  nvi-1.79.tar.gz
    856b4af521fdb78c978e5576f269c1c6  palinux.tar.gz
    61dcb5614a61bf123e1345e869eb99d4  sp-1.3.4.tar.gz
    c22bc000bee0f7d6f4845eab72a81395  ssh-1.2.27.tar.gz
    e5162eb6d4a40e9e90d0523f187e615f  vmware-forlinux-103.tar.gz
        ...sometime later, maybe...
    $ md5sum -c /tmp/sums.out
    linux-ar-40.tar.gz: OK
    nvi-1.79.tar.gz: OK
    palinux.tar.gz: OK
    sp-1.3.4.tar.gz: OK
    ssh-1.2.27.tar.gz: OK
    vmware-forlinux-103.tar.gz: OK
    $ echo $?
    0
If all the files match, md5sum returns an exit status of 0. Files that don't match give a FAILED message and a nonzero exit status.
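The failure case can be tried safely with a sketch like the one below. It assumes GNU md5sum; the scratch directory and the names good.txt, bad.txt, and sums.out are invented for the demonstration.

```shell
# Build two throwaway files in a scratch directory and record their checksums.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'original contents\n' > good.txt
printf 'original contents\n' > bad.txt
md5sum good.txt bad.txt > sums.out

# Tamper with one file after the checksums were recorded.
printf 'tampered contents\n' > bad.txt

# Capture the per-file report and the exit status.
# (GNU md5sum also writes a summary warning to stderr; it is discarded here.)
if report=$(md5sum -c sums.out 2>/dev/null); then
    status=0
else
    status=$?
fi

echo "$report"
echo "exit status: $status"
```

The report should mark good.txt as OK and bad.txt as FAILED, and the exit status should be nonzero.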
The exit status, along with the options --status (no output, only return statuses) and -w (warn if the checksum line is improperly formatted), can help you set up an automated checking system. Some software downloading and distribution systems, like RPM (Section 40.11), can do this for you (although in automated systems, it's worth thinking about the integrity of the checksum: does it come from a system you can trust?). If you're a system administrator, look into Tripwire, a tool for tracking MD5 checksums of lots of files on your system.
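As a rough sketch of such an automated check, the function below wraps md5sum --status -c and reports only a one-line verdict. The name check_sums and its messages are hypothetical, not part of md5sum, and GNU md5sum is assumed.

```shell
# check_sums: hypothetical helper. $1 is a checksum list written earlier by md5sum.
# Relative pathnames in that list are resolved against the current directory.
check_sums() {
    # --status prints nothing; only the exit status says whether every file matched.
    if md5sum --status -c "$1"; then
        echo "all files verified"
    else
        echo "checksum mismatch in $1" >&2
        return 1
    fi
}
```

A script run from cron could call check_sums /tmp/sums.out from the directory that holds the files and act on a nonzero return, for example by stopping before the files are used. Where the checksum list itself is stored still matters, as noted above.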
-- JP
Copyright © 2003 O'Reilly & Associates. All rights reserved.