Interruptable gets with wget (Unix Power Tools, 3rd Edition)

40.7. Interruptable gets with wget

The GNU utility wget can be used to access files through the Internet using HTTP, HTTPS, or FTP. The best thing about the utility is that if a download is interrupted, wget can be started again with the appropriate option and continue from where it left off.

Go to http://examples.oreilly.com/upt3 for more information on: wget

The wget utility is installed by default on many systems, but if you can't find it, it can be downloaded from GNU, at http://www.gnu.org/software/wget/wget.html.

The basic syntax for wget is very simple: type wget followed by the URL of the file or files you're trying to download:

wget http://www.somefile.com/somefile.htm
wget ftp://www.somefile.com/somefile

The file is downloaded and saved and a status is printed out to the screen:

--16:51:58--  http://dynamicearth.com:80/index.htm
           => `index.htm'
Connecting to dynamicearth.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 9,144 [text/html]

    0K -> ........                                               [100%]

16:51:58 (496.09 KB/s) - `index.htm' saved [9144/9144]
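
When wget's output is captured in a log, the closing line of the status report can be checked mechanically. The following is a minimal sketch, assuming the log line has the saved [received/total] form shown above; the function name is_complete is hypothetical, and wget's own exit status is normally the simpler test.

```shell
# is_complete LOGLINE
# Succeed if the "saved [received/total]" byte counts in LOGLINE match.
# Returns 2 if no such counts are found (hypothetical helper, not part of wget).
is_complete() {
    counts=$(printf '%s\n' "$1" |
        sed -n 's/.*saved \[\([0-9][0-9]*\)\/\([0-9][0-9]*\)\].*/\1 \2/p')
    [ -n "$counts" ] || return 2
    # Split "received total" into $1 and $2 and compare numerically
    set -- $counts
    [ "$1" -eq "$2" ]
}

# Example use on the last line of a log file (the log name is an assumption):
#   is_complete "$(tail -n 1 wget.log)" && echo "download complete"
```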

The default use of wget downloads the file into your current directory. If the download is interrupted, by default wget does not resume at the point of interruption. You need to specify an option for this behavior. The wget options can be found in Table 40-2. Short and long forms of each option are specified, and options that don't require input can be grouped together:

> wget -drc URL

For those options that do require an input, you don't have to separate the option and the input with whitespace:

> wget -ooutput.file URL
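
Because resuming has to be requested explicitly, an interrupted download can be picked up by rerunning wget with -c. The following is a minimal sketch of a retry loop around that idea; the function name fetch_resumable, the default of five attempts, and the RETRY_DELAY variable are assumptions made for illustration, not part of wget.

```shell
# fetch_resumable URL [MAX_ATTEMPTS]
# Rerun "wget -c" until it succeeds or the attempt limit is reached.
fetch_resumable() {
    url=$1
    max_attempts=${2:-5}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c asks wget to continue a partially downloaded file
        if wget -c "$url"; then
            return 0
        fi
        echo "attempt $attempt failed" >&2
        attempt=$((attempt + 1))
        # Pause before the next attempt (RETRY_DELAY is a hypothetical knob)
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Example use:
#   fetch_resumable ftp://www.somefile.com/somefile 10
```

wget can also retry on its own with -t; the loop here only covers the case where wget itself exits and has to be restarted.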

Table 40-2. wget options

Option	Purpose	Examples
`-V`	Get version of wget	wget -V
`-h` or `--help`	Get listing of wget options	wget --help
`-b` or `--background`	Go to background after start	wget -b `url`
`-e` or `--execute=``COMMAND`	Execute a .wgetrc-style command	wget -e COMMAND `url`
`-o` or `--output-file=``file`	Log messages to file	wget -o `filename` `url`
`-a` or `--append-output=``file`	Append messages to log file	wget -a `filename` `url`
`-d` or `--debug`	Turn on debug output	wget -d `url`
`-q` or `--quiet`	Turn off wget's output	wget -q `url`
`-v` or `--verbose`	Turn on verbose output	wget -v `url`
`-nv` or `--non-verbose`	Turn off verbose output	wget -nv `url`
`-i` or `--input-file=``file`	Read URLs from file	wget -i `inputfile`
`-F` or `--force-html`	Force input to be treated as HTML	wget -F `url`
`-t` or `--tries=``number`	Number of re-tries to get file	wget -t 3 `url`
`-O` or `--output-document=``file`	Write all documents to the specified file	wget -O `savedfile` -i `inputfile`
`-nc` or `--no-clobber`	Don't clobber existing file	wget -nc `url`
`-c` or `--continue`	Continue getting file	wget -c `url`
`--dot-style=``style`	Retrieval indicator	wget --dot-style=binary `url`
`-N` or `--timestamping`	Turn on time-stamping	wget -N `url`
`-S` or `--server-response`	Print HTTP headers, FTP responses	wget -S `url`
`--spider`	Wget behaves as a web spider, doesn't download	wget --spider `url`
`-T` or `--timeout=``seconds`	Set the timeout	wget -T 30 `url`
`-w` or `--wait=``seconds`	Wait specified number of seconds	wget -w 20 `url`
`-Y` or `--proxy=``on/off`	Turn proxy on or off	wget -Y on `url`
`-Q` or `--quota=``quota`	Specify download quota size	wget -Q2M `url`
`-nd` or `--no-directories`	Do not create directories in recursive download	wget -nd `url`
`-x` or `--force-directories`	Opposite of -nd	wget -x `url`
`-nH` or `--no-host-directories`	Disable host-prefixed directories	wget -nH `url`
`--cut-dirs=``number`	Ignore number remote directory components	wget --cut-dirs=3 `url`
`-P` or `--directory-prefix=``prefix`	Set directory to prefix	wget -P test `url`
`--http-user=``user` `--http-passwd=``passwd`	Set username and password	wget --http-user=`user` --http-passwd=`password` `url`
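
Several of the options in Table 40-2 are often combined for unattended batch downloads. The following is a minimal sketch, assuming a text file with one URL per line; the function name fetch_list and the particular retry and wait values are illustrative choices, not recommendations from wget itself.

```shell
# fetch_list LISTFILE DESTDIR LOGFILE
# Download every URL listed in LISTFILE into DESTDIR, appending messages to LOGFILE.
fetch_list() {
    list=$1
    dest=$2
    log=$3
    # -c   continue partially downloaded files
    # -nv  turn off verbose output
    # -t 3 try each file up to three times
    # -w 2 wait two seconds between retrievals
    # -P   set the directory prefix for saved files
    # -a   append log messages to LOGFILE
    # -i   read URLs from LISTFILE
    wget -c -nv -t 3 -w 2 -P "$dest" -a "$log" -i "$list"
}

# Example use (file and directory names are assumptions):
#   fetch_list urls.txt downloads wget.log
```

Appending to the log with -a rather than overwriting it with -o keeps the record of earlier runs, which helps when a batch has to be restarted.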

-- SP

