I'm trying to download a directory using a recursive wget command:
wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/
This works for some of the files, but it also produces a flurry of 403 Forbidden errors such as:
--2023-06-13 08:43:51-- https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Reusing existing connection to data.darts.isas.jaxa.jp:443.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-13 08:43:51 ERROR 403: Forbidden.
However, if I download one of these files individually, it works:
wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
--2023-06-13 09:06:44-- https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Resolving data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)... 133.74.198.108
Connecting to data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)|133.74.198.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1382 (1.3K)
Saving to: ‘ck/SEL_M_200710_S_V03.lbl’
ck/SEL_M_200710_S_V03.lb 100%[================================>] 1.35K --.-KB/s in 0s
2023-06-13 09:06:44 (18.3 MB/s) - ‘ck/SEL_M_200710_S_V03.lbl’ saved [1382/1382]
FINISHED --2023-06-13 09:06:44--
Total wall clock time: 0.7s
Downloaded: 1 files, 1.3K in 0s (18.3 MB/s)
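One visible difference between the two logs above is that the failing request reports "Reusing existing connection", while the successful one opens a new connection. The following is a rough sketch for checking whether every 403 in a saved log follows a reused connection. It assumes the recursive run was logged to a file with wget's -o option; the helper name count_403_after_reuse and the file name wget.log are hypothetical.

```shell
#!/bin/sh
# count_403_after_reuse is a hypothetical helper, not part of wget.
# It reads a wget log and prints "<hits> of <total>": how many of the
# 403 responses came on a request that reused an existing connection.
count_403_after_reuse() {
    awk '
        /^--[0-9]/                    { reused = 0 }   # timestamp line: a new request starts
        /Reusing existing connection/ { reused = 1 }   # this request rides on a kept-alive connection
        /awaiting response\.\.\. 403/ { total++; if (reused) hits++ }
        END { printf "%d of %d\n", hits, total }
    ' "$1"
}
```

For example, after rerunning the recursive command with -o wget.log added, count_403_after_reuse wget.log prints the two counts. If they match, the errors are confined to reused connections.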
I have tried:
- -e robots=off
- --user-agent=Mozilla/5.0
- --trust-server-names
- Looking at the request headers in Chrome developer tools for a single file. There is no cookie and no referer that I can identify:
GET /pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200711_D_V03.BC HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: data.darts.isas.jaxa.jp
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
sec-ch-ua-mobile: ?0
By the way, these URLs are from the Data ARchives and Transmission System (DARTS), which archives high-level data products obtained by the space science missions of JAXA (Japan Aerospace Exploration Agency). It is meant for public download of these data products and should not have any authentication requirements.
Resources used
--no-http-keep-alive. – berndbausch Jun 14 '23 at 02:50
--no-http-keep-alive is the solution. If you want to write an answer, I'll accept it. Thank you. – 2cents Jun 14 '23 at 13:58
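Based on the comments above, here is a minimal sketch of the original command with --no-http-keep-alive added. That option turns off persistent HTTP connections, so wget opens a new connection for each request instead of reusing one. The wrapper name mirror_no_keepalive and the WGET override (handy for a dry run) are hypothetical conveniences, not part of wget; the wget options themselves are the ones from the question plus the one from the comments.

```shell
#!/bin/sh
# mirror_no_keepalive is a hypothetical wrapper around the command from the
# question, with --no-http-keep-alive added as suggested in the comments.
mirror_no_keepalive() {
    url=$1
    # -m                    mirror: recursive download with timestamping
    # -nH                   do not create a directory named after the host
    # --cut-dirs=5          drop the first five remote path components
    # --no-http-keep-alive  do not reuse connections between requests
    # WGET is an assumed override for testing; it defaults to wget.
    "${WGET:-wget}" -m -nH --cut-dirs=5 --no-http-keep-alive "$url"
}
```

Called as mirror_no_keepalive https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/, it runs the same mirror as in the question, only without keep-alive. Expect it to be somewhat slower, since every file needs its own connection setup.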