I am trying to wget a specific webpage using this command in a bash script:
wget --no-cookies --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -O "$2/content.html" "$1"
The result is that I get the site's bot page, which I think is because wget is reusing the existing connection. This command worked before I spam-tested it; now my server gets redirected to the site's bot check, so I can't use the output.
--2017-12-12 19:16:42-- https://www.kayak.co.uk/h/bots/human-redirect.vtl?url=%2Fflights%2FDUB-LAX%2F2018-06-04%2F2018-06-25%2F2adults%3Fsort%3Dbestflight_a
Reusing existing connection to [www.kayak.co.uk]:443.
HTTP request sent, awaiting response... 200 OK
My question is: is there any way to stop wget from using the existing connection and make it reconnect to the site for each download?
wget is not (and cannot be) reusing a connection from an earlier run; the "Reusing existing connection" line in your log only refers to the connection opened earlier in the same run, before the redirect. Your previous attempts probably triggered a protection system on the website, which now treats you as a bot and redirects you to a page that explains this and may offer a captcha to resolve it. Try opening the page in your browser to read it. It is probably your IP address that has been flagged, so no new wget command would change the result (you could try changing the headers, but that might make things worse and get you banned for real). – Patrick Mevzek Dec 12 '17 at 19:31
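To check whether a given run was caught by the bot check, rather than guessing from the saved HTML, something like the sketch below might help. It wraps the original command, captures wget's log, and looks for the `/h/bots/human-redirect` path seen in the output above. The function names `is_bot_redirect` and `fetch_page` and the return code 2 are made up for illustration. `--no-http-keep-alive` is a real wget option that disables connection reuse within a run, but if the block is IP-based, as the comment suggests, it will most likely not lift it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given text contains the bot-check
# path that appears in the wget log from the question.
is_bot_redirect() {
  case "$1" in
    */h/bots/human-redirect*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the original command.
#   $1 = URL to fetch, $2 = output directory
# Returns 0 on success, 1 if wget itself fails,
# and 2 if the log shows the bot-check page.
fetch_page() {
  local url=$1 outdir=$2 log

  # wget writes its log to stderr, so redirect it into the capture.
  log=$(wget --no-cookies --no-http-keep-alive \
          --header="Accept: text/html" \
          --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" \
          -O "$outdir/content.html" "$url" 2>&1) || {
    printf '%s\n' "$log" >&2
    return 1
  }

  if is_bot_redirect "$log"; then
    echo "Got the bot-check page; back off before retrying." >&2
    return 2
  fi
}
```

Inside the original script this would be called as `fetch_page "$1" "$2"`. A return code of 2 means the site is still redirecting to the bot check, so it is probably better to stop and wait than to retry in a loop.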