Running with -d shows what's going on:
Location: http://blogs.gamefilia.com/lord-areg [following]
....
Deciding whether to enqueue "http://blogs.gamefilia.com/lord-areg".
Going to "" would escape "lord-areg" with no_parent on.
Decided NOT to load it.
Redirection "http://blogs.gamefilia.com/lord-areg" failed the test.
The redirected page was outside the area specified, so although it was retrieved, its contents aren't followed when recursing.
Removing the final / means there's no redirection. But, as you found, it also means wget no longer treats lord-areg as a directory and uses the previous / instead, so the whole site matches:
Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what’s a directory and what isn’t. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
(4.3 Directory Based Limits)
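To make the trailing-slash rule concrete, here is a small POSIX sh sketch. It is a simplified model written for illustration, not wget's actual code. It assumes the "directory" of a URL is everything up to the last / of the path (ignoring any query string), and treats --no-parent as a plain prefix check against that directory. The helpers url_dir and under_start_dir and the another-blog URL are hypothetical.

```shell
#!/bin/sh
# Simplified model of the trailing-slash rule quoted above.
# Illustration only; this is NOT how wget is implemented internally.

# Directory implied by a URL: everything up to and including the last "/"
# of the path. Any query string (e.g. ?page=1) is dropped first.
url_dir() {
  u=${1%%\?*}
  printf '%s\n' "${u%/*}/"
}

# Succeeds if candidate URL $2 lies inside the directory implied by start
# URL $1, i.e. what --no-parent would allow under this simplified model.
under_start_dir() {
  d=$(url_dir "$1")
  case $2 in
    "$d"*) return 0 ;;
    *) return 1 ;;
  esac
}

url_dir 'http://foo/bar/'   # bar is a directory
url_dir 'http://foo/bar'    # bar is a file, so the directory is /

# With the trailing slash, the redirect target from the -d log is outside:
under_start_dir 'http://blogs.gamefilia.com/lord-areg/' \
                'http://blogs.gamefilia.com/lord-areg' || echo 'outside'

# Without it, the implied directory is "/", so any blog on the host is inside:
under_start_dir 'http://blogs.gamefilia.com/lord-areg' \
                'http://blogs.gamefilia.com/another-blog/' && echo 'inside'
```

Run as-is, it prints http://foo/bar/, http://foo/, outside and inside, which lines up with the manual's example and with the two behaviours described above.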
So you need to restrict the results in some other manner. -I lord-areg nearly works, but it skips pages of the form /lord-areg?page=1. To match those too, describe the required URLs in more detail:
--accept-regex '^http:\/\/blogs\.gamefilia\.com\/lord-areg[?/]'
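Before starting a long recursive run, it can help to check which URLs such a pattern accepts. The sketch below feeds candidate URLs through grep -E as a stand-in for wget's default --regex-type posix matching (assuming extended syntax). It uses the same pattern written without the \/ escapes, since a / needs no escaping in a POSIX regex. The sample URLs and the classify helper are hypothetical.

```shell
#!/bin/sh
# The accept pattern from above, minus the \/ escapes (a plain / is literal).
pattern='^http://blogs\.gamefilia\.com/lord-areg[?/]'

# Print "accept" or "reject" for one URL. grep -E is only a stand-in
# for wget's own POSIX regex matching.
classify() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo accept
  else
    echo reject
  fi
}

# Hypothetical sample URLs; only the host and blog name come from the question.
for url in \
  'http://blogs.gamefilia.com/lord-areg/' \
  'http://blogs.gamefilia.com/lord-areg?page=1' \
  'http://blogs.gamefilia.com/lord-areg/some-post' \
  'http://blogs.gamefilia.com/lord-areg' \
  'http://blogs.gamefilia.com/lord-aregs/' \
  'http://blogs.gamefilia.com/another-blog/'
do
  printf '%s %s\n' "$(classify "$url")" "$url"
done
```

This should print accept for the first three URLs and reject for the last three. The [?/] is what keeps a similarly named blog such as lord-aregs out. Note that the bare .../lord-areg URL is rejected as well, because the pattern requires a ? or / after the blog name. This check alone doesn't show how wget treats a start URL that fails --accept-regex.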
wget --recursive -e robots=off --no-parent http://blogs.gamefilia.com/lord-areg/ and it isn't working either. – 830b et May 10 '16 at 01:46
/. The initial redirection makes a difference (for me) but I haven't yet worked out why. – JigglyNaga May 10 '16 at 20:14
wget tries to mirror the entire gamefilia site, and it's huge... I'm sure there is a way to download individual blogs, but I just can't figure it out. Thank you very much anyways. – 830b et May 11 '16 at 01:07
-I lord-areg as well as omitting the trailing /. See the Note under --no-parent in the wget manual. – JigglyNaga May 11 '16 at 07:17