$ wget --random-wait -r -p -e robots=off -U Mozilla \
http://web.archive.org/web/20110726051510/http://feedparser.org/docs/
Recursively downloads the content at the URL.
--random-wait - vary the pause between requests from 0.5 to 1.5 times the interval given with --wait, so it is meant to be combined with --wait.
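-r - turn on recursive retrieval.
-p - download the page requisites (images, stylesheets, and so on) needed to display each HTML page.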
-e robots=off - ignore robots.txt.
-U Mozilla - set the "User-Agent" header to "Mozilla". A full User-Agent string from a real browser is a better choice, for example "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)".
Some other useful options are:
--limit-rate=20k - limit download speed to 20 kilobytes per second.
-o logfile.txt - write wget's messages to logfile.txt instead of the terminal.
-l 0 - remove the recursion depth limit (the default depth is 5).
--wait=1h - be sneaky, download one file every hour.
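The options above can be combined into one command. What follows is only a sketch: the function name polite_wget, the WGET override used for the dry run, and the 2-second base wait are assumptions for illustration, not part of the original command.

```shell
#!/bin/sh
# URL taken from the original command.
url='http://web.archive.org/web/20110726051510/http://feedparser.org/docs/'

polite_wget() {
    # --wait=2 gives --random-wait a base interval to vary (2 is an arbitrary choice).
    # ${WGET:-wget} runs wget unless the caller sets WGET, e.g. WGET=echo for a dry run.
    ${WGET:-wget} --wait=2 --random-wait --limit-rate=20k \
        -r -l 0 -p -e robots=off -U Mozilla \
        -o logfile.txt "$1"
}

# Dry run: print the wget arguments instead of downloading anything.
WGET=echo polite_wget "$url"
```

Drop the WGET=echo prefix to actually start the download; progress then goes to logfile.txt rather than the terminal.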
[...] 10 to lower number, but it's hard to guess. Now there is a file introduction.html, introduction.html.1, introduction.html.2 and I rather ended the process. – xralf Nov 25 '11 at 18:14
[...] --mirror option for the links to direct to the filesystem? – xralf Nov 25 '11 at 18:16
[...] -nd, so different index.htmls are put in the same directory, and without -k, you'll not get rewriting of the links. – Ulrich Schwarz Nov 25 '11 at 18:35
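Following the last comment, here is a sketch that keeps the directory layout (no -nd) and adds -k so that links in the saved pages are rewritten to point at the local copies. --mirror is shorthand for -r -N -l inf --no-remove-listing. The function name mirror_for_offline and the WGET override are made up for illustration, and whether this fits the original poster's exact situation is an assumption.

```shell
#!/bin/sh
# URL taken from the original command.
url='http://web.archive.org/web/20110726051510/http://feedparser.org/docs/'

mirror_for_offline() {
    # --mirror : recursive, timestamped, unlimited depth
    # -k       : convert links so they work when browsing the local copy
    # -p       : also fetch images, stylesheets, and other page requisites
    # ${WGET:-wget} runs wget unless the caller sets WGET (e.g. WGET=echo).
    ${WGET:-wget} --mirror -k -p -e robots=off -U Mozilla "$1"
}

# Dry run: print the wget arguments instead of downloading anything.
WGET=echo mirror_for_offline "$url"
```

Without -nd, each index.html stays in its own directory, so wget has no reason to create numbered duplicates such as introduction.html.1.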