45

What's the Netscape format of wget's cookies.txt? I need to mirror a website that requires login. I use a Chrome extension that exports cookies in that format, save them to cookies.txt, and load them with wget, but to no avail: it downloads the content as if I'm not logged in at all.

I appreciate any help.

geckon
Zarko Djuric

3 Answers

67

wget expects the Netscape cookie file format, as stated in the man page. That format is described as follows:

The layout of Netscape's cookies.txt file is such that each line contains one name-value pair. An example cookies.txt file may have an entry that looks like this:

.netscape.com	TRUE	/	FALSE	946684799	NETSCAPE_ID	100103

Each line represents a single piece of stored information. A tab is inserted between each of the fields.

From left-to-right, here is what each field represents:

domain - The domain that created AND that can read the variable.

flag - A TRUE/FALSE value indicating if all machines within a given domain can access the variable. This value is set automatically by the browser, depending on the value you set for domain.

path - The path within the domain that the variable is valid for.

secure - A TRUE/FALSE value indicating if a secure connection with the domain is needed to access the variable.

expiration - The UNIX time that the variable will expire on. UNIX time is defined as the number of seconds since Jan 1, 1970 00:00:00 GMT.

name - The name of the variable.

value - The value of the variable.

(From "The Unofficial Cookie FAQ", edited for clarity)
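To make the field layout concrete, here is a minimal Python sketch that splits one line into the seven fields listed above. The names `NetscapeCookie` and `parse_cookie_line` are made up for illustration; this is not what wget does internally, just one way to sanity-check a line exported by a browser extension.

```python
from typing import NamedTuple


class NetscapeCookie(NamedTuple):
    """One data line of a Netscape cookies.txt file (hypothetical helper type)."""
    domain: str
    include_subdomains: bool  # the "flag" field
    path: str
    secure: bool
    expires: int              # UNIX time in seconds
    name: str
    value: str


def parse_cookie_line(line: str) -> NetscapeCookie:
    """Split a tab-separated cookie line into its seven fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    domain, flag, path, secure, expires, name, value = fields
    return NetscapeCookie(
        domain=domain,
        include_subdomains=(flag == "TRUE"),
        path=path,
        secure=(secure == "TRUE"),
        expires=int(expires),
        name=name,
        value=value,
    )


# The example entry from the FAQ, written with real tab characters.
example = ".netscape.com\tTRUE\t/\tFALSE\t946684799\tNETSCAPE_ID\t100103"
cookie = parse_cookie_line(example)
print(cookie)
```

If the file was pasted through something that turned the tabs into spaces, `parse_cookie_line` raises `ValueError`, which is a quick way to spot that kind of damage.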

Venning
ETL
6

Each data line of a Netscape cookies file follows the format above, but HTTP::Cookies::Netscape won't be able to read the file unless it begins with a header line, which the complete file format requires. The header looks like this:

# Netscape HTTP Cookie File

or this:

# HTTP Cookie File
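The same header check can be observed in Python's standard library, which is a different tool from HTTP::Cookies::Netscape and from wget, so treat this only as an illustration of how strict some parsers are. The sketch below writes a temporary cookie file with and without the header and tries to load it with `http.cookiejar.MozillaCookieJar`. The helper name `try_load` and the far-future expiry timestamp are assumptions added for the demo.

```python
import http.cookiejar
import os
import tempfile

# Same shape as the FAQ example, but with a far-future expiry (an assumption
# for this demo) so the loader does not skip the cookie as expired.
DATA_LINE = ".netscape.com\tTRUE\t/\tFALSE\t4102444800\tNETSCAPE_ID\t100103\n"


def try_load(contents: str) -> int:
    """Write contents to a temp file; return the number of cookies loaded,
    or -1 if the loader rejects the file outright."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(contents)
        jar = http.cookiejar.MozillaCookieJar(path)
        try:
            jar.load()
        except http.cookiejar.LoadError:
            return -1
        return len(jar)
    finally:
        os.remove(path)


without_header = try_load(DATA_LINE)
with_long_header = try_load("# Netscape HTTP Cookie File\n" + DATA_LINE)
with_short_header = try_load("# HTTP Cookie File\n" + DATA_LINE)
print(without_header, with_long_header, with_short_header)
```

Without the header the file is rejected before any cookie line is read; with either header form the single cookie loads.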
kenorb
  • This is such black voodoo... Do you have a reference to where in the docs it is mentioned? It's not in the curl docs (https://curl.haxx.se/docs/http-cookies.html) or in any of the pages linked from there... (It did solve my issue though! So it's definitely needed, like you said!) – Aviel Gross Apr 29 '20 at 17:28
  • I probably had a reference when I wrote that 4 years ago, but I don't remember it now. I haven't used Firefox for any web scraping since the great plugin massacre of Firefox 57. – Phil Goetz Apr 30 '20 at 18:37
  • For others wondering, this comment header is also required for youtube-dl to accept a cookie file with --cookies mycookies.txt – Someguy123 Nov 13 '20 at 10:38
5

One way of getting cookies for wget is to let wget save them itself, using its --keep-session-cookies option together with --save-cookies.

For example:

wget --keep-session-cookies --save-cookies cookies.txt "http://MYSITE/?__login=USER&__password=PASS"

The ?__login and __password parameters depend on the website you're trying to mirror; you might have to look at how its authentication form works.

Then you can use:

wget --mirror --load-cookies cookies.txt http://MYSITE/
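If the user name or password contains characters such as & or spaces, the login URL needs to be URL-encoded first. Here is a rough Python sketch that builds the two wget command lines shown above. The function name `build_wget_commands` is hypothetical, the `__login` / `__password` parameter names are just the ones from the example and will differ per site, and the sample password is invented.

```python
import shlex
from urllib.parse import urlencode


def build_wget_commands(site, user, password, cookie_file="cookies.txt"):
    """Return (login_cmd, mirror_cmd) as argument lists for wget."""
    # The query parameter names are site-specific; these mirror the example.
    query = urlencode({"__login": user, "__password": password})
    login_url = f"{site}/?{query}"
    login_cmd = [
        "wget", "--keep-session-cookies",
        "--save-cookies", cookie_file, login_url,
    ]
    mirror_cmd = ["wget", "--mirror", "--load-cookies", cookie_file, f"{site}/"]
    return login_cmd, mirror_cmd


login_cmd, mirror_cmd = build_wget_commands("http://MYSITE", "USER", "p&ss word")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(login_cmd))
print(shlex.join(mirror_cmd))
```

The lists can also be passed directly to `subprocess.run` instead of being printed, which avoids shell quoting altogether.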