As I was working on this book, I was constantly editing lots of random files all through a directory tree. I archived some of the files in a revision control system (Section 39.4), but those archives, as well as the nonarchived files, still would be vulnerable if my disk crashed. (And naturally, close to a deadline, one hard disk started making whining noises...)
The answer I came up with was easy to use and simple to set up. It's a script named ptbk, and this article explains it. To run the script, I just type its name. It searches my directory tree for files that have been modified since the last time I ran ptbk. Those files are packed into a dated, compressed tar archive, which is then copied to a remote system using scp. The process looks like this:
$ ptbk
upt/upt3_changes.html
upt/BOOKFILES
upt/art/0548.sgm
upt/art/1420.sgm
upt/art/1430.sgm
upt/art/0524.sgm
upt/BOOKIDS
upt/ulpt3_table
Now copying this file to bserver:
-rw-rw-r--   1 jpeek    323740 Jan  3 23:08 /tmp/upt-200101032308.tgz
upt-200101032308.tgz |  316 KB |  63.2 kB/s | ETA: 00:00:00 | 100%
The script actually doesn't copy all of the files in my directory tree. I've set up a tar exclude file that makes the script skip some files that don't need backing up. For instance, it skips any filename that starts with a comma (,). Here's the file, named ptbk.exclude:
upt/ptbk.exclude
upt/tarfiles
upt/gmatlogs
upt/drv-jpeek-jpeek.ps
upt/drv-jpeek.3l
upt/BOOKFILES~
upt/ch*.ps.gz
upt/ch*.ps
upt/,*
upt/art/,*
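If you want to preview what an exclude file will skip, you can ask tar for a verbose listing while throwing the archive itself away. Here's a quick sketch, run from my home directory (the parent of upt); any tar that understands -X should do:

$ cd
$ tar cvf /dev/null -X upt/ptbk.exclude upt

Filenames that match a pattern in ptbk.exclude won't appear in the listing.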
After the script makes the tar file, it touches a timestamp file named ptbk.last. The next time the script runs, it uses find -newer (Section 9.8) to get only the files that have been modified since the timestamp file was touched.
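If you haven't used the find -newer idiom before, here's a minimal sketch of it outside the script; the "edited" filename is just an example:

$ touch upt/ptbk.last
  ...edit some files...
$ find upt -type f -newer upt/ptbk.last -print
upt/art/0548.sgm

find prints only pathnames whose modification time is later than the timestamp file's.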
The script uses scp and ssh-agent to copy the archive without asking for a password. You could hack it to use another method. For instance, it could copy using rcp (Section 1.21) or simply copy the file to another system with cp via an NFS-mounted filesystem (Section 1.21).
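To let scp run without a password prompt, I start an agent and load my key once per login session:

$ eval `ssh-agent`
$ ssh-add

If you'd rather copy over NFS, you could replace the script's last line with a plain cp. This is just a sketch; the mount point here is made up:

cp $outfile /net/bserver/tmp/upt_bak/.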
This doesn't take the place of regular backups, if only because re-creating days' worth of work from the little individual archives would be tedious. But this system makes it painless to take snapshots, as often as I want, by typing a four-letter command. Here's the ptbk script:
(The || operator is explained in Section 35.14; backquotes, `...`, in Section 28.14.)
#!/bin/sh
# ptbk - back up latest UPT changes, scp to $remhost
dirbase=upt
dir=$HOME/$dirbase
timestamp=$dir/ptbk.last     # the last time this script was run
exclude=$dir/ptbk.exclude    # file with (wildcard) pathnames to skip
remhost=bserver              # hostname to copy the files to
remdir=tmp/upt_bak/.         # remote directory (relative to $HOME)

cd $dir/.. || exit           # Go to parent directory of $dir
datestr=`date '+%Y%m%d%H%M'`
outfile=/tmp/upt-$datestr.tgz
# Don't send vim recovery files (.*.swp):
tar czvlf $outfile -X $exclude \
  `find $dirbase -type f -newer $timestamp ! -name '.*.swp' -print`
mv -f $timestamp $dir/,ptbk.last
echo "Timestamp file for $0.  Don't modify." > $timestamp
echo "Now copying this file to $remhost:"
ls -l $outfile
scp $outfile ${remhost}:${remdir}
If the copy fails (because the remote machine is down, for instance), I have to either copy the archive somewhere else or remember to copy it later. If you have an unreliable connection, you might want to modify the script to touch the timestamp file only if the copy succeeds -- at the possible cost of losing a data file that was modified while the previous archive was (not?) being transferred to the remote host.
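Here's one way that modification could look -- a sketch, not tested: move the timestamp update after the scp and check scp's exit status. The last five lines of the script would become:

echo "Now copying this file to $remhost:"
ls -l $outfile
if scp $outfile ${remhost}:${remdir}
then
    # Copy succeeded; advance the timestamp for the next run:
    mv -f $timestamp $dir/,ptbk.last
    echo "Timestamp file for $0.  Don't modify." > $timestamp
else
    echo "$0: copy to $remhost failed; timestamp not updated" 1>&2
    exit 1
fi

With this change, a failed copy leaves the old timestamp in place, so the next run's find picks up the same files again (plus anything modified since).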
-- JP
Copyright © 2003 O'Reilly & Associates. All rights reserved.