Apple appears to reject the HTML file download by default. I ran the commands you specified on my machine. If you look carefully at the output, you will see something like this:
Loading robots.txt; please ignore errors.
--2014-05-24 10:43:50-- https://itunes.apple.com/robots.txt
Connecting to itunes.apple.com|23.206.210.217|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 234 [text/plain]
Saving to: `robots.txt'
So, as per this answer, we can ignore the robots.txt file by adding -e robots=off to the command.
Wget by default honours the robots.txt standard for crawling pages, just like search engines do, and for archive.org, it disallows the entire /web/ subdirectory. To override, use -e robots=off.
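As a minimal sketch of that override, the flag can be wrapped in a small shell function. The function name fetch_ignoring_robots is hypothetical; the remaining options and the URL are whatever the original command used, and they are passed through untouched.

```shell
# Hypothetical helper name, used only for illustration.
fetch_ignoring_robots() {
    # -e runs a .wgetrc-style command for this invocation only;
    # "robots=off" tells wget not to honour robots.txt while crawling.
    wget -e robots=off "$@"
}
```

Call it with the same options and URL as the original wget command. The robots.txt setting only matters for recursive retrievals, so a single-page fetch behaves the same with or without it.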
So, I modified your command to add -e robots=off, and when I ran the command again, I got the output below.
Connecting to itunes.apple.com|23.204.162.217|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `id854897303?mt=12.html'
[ <=> ] 33,456 --.-K/s in 0.001s
2014-05-24 10:48:38 (30.1 MB/s) - `id854897303?mt=12.html' saved [33456]
Removing id854897303?mt=12.html since it should be rejected.
As you can see, the file is saved and then immediately removed as rejected. The download appears to be prevented by Apple, and we cannot do anything about it.
EDIT: Even without -e robots=off, we are not able to download the HTML file. Your original wget command also reports it as rejected. So, I suspect Apple is disallowing wget downloads.
--user-agent switch, to trick the server into believing this is a normal web browser. – goldilocks May 24 '14 at 16:30
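The --user-agent suggestion can be tried with a sketch like the following. The function name fetch_as_browser and the User-Agent string are assumptions chosen for illustration, and whether the server then responds differently is untested here.

```shell
# Hypothetical helper name; the User-Agent value is only an example
# of a browser-like string, not one known to be accepted by the server.
fetch_as_browser() {
    # --user-agent replaces wget's default "Wget/<version>" identification.
    wget --user-agent="Mozilla/5.0 (X11; Linux x86_64)" "$@"
}
```

Pass the original options and URL as arguments, for example together with -e robots=off if robots.txt handling should also be disabled.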