Downloading files using wget

Question

I am trying to download files from this website.

The URL is: http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

When I use this command:

wget http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file

I only get a file named index.html?acc=GSE48191, which is in some kind of binary format.

How can I download the files from this HTTP site?

Qeole · Answer 1 · 2014-07-22T17:17:08.710

27

I think your ? gets interpreted by the shell (correction by vinc17: more likely, it's the & that gets interpreted).

Just try putting single quotes around your URL:

wget 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
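To see what the quotes change, here is a minimal sketch that swaps wget for a hypothetical show_args function, which only prints the arguments it receives, so nothing is downloaded. With the quotes in place, the whole URL, including ? and &, arrives as a single argument.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints how many arguments it
# received, then each argument in brackets, one per line.
show_args() {
    printf 'argc=%s\n' "$#"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the whole URL, including "?" and "&", reaches the command as one argument.
show_args 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
```

This prints argc=1 followed by the complete URL in brackets.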

Note that the file you are requesting is a .tar file, but the above command will save it as index.html?acc=GSE48191&format=file. To give it the correct name, you can either rename it to .tar:

mv 'index.html?acc=GSE48191&format=file' GSE48191.tar

Or you can give the name as an option to wget:

wget -O GSE48191.tar 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

The above command will save the downloaded file as GSE48191.tar directly.
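If you fetch several GEO series this way, the -O approach can be wrapped in a small shell function. This is only a sketch: geo_url and geo_fetch are hypothetical names, and it assumes every series is served from the same download URL pattern as in the question.

```shell
#!/bin/sh
# geo_url is a hypothetical helper that builds the download URL from the
# question for a given GEO accession.
geo_url() {
    printf 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=%s&format=file\n' "$1"
}

# geo_fetch is a hypothetical helper: it saves the archive as <accession>.tar
# via -O, so no renaming is needed afterwards. The URL is quoted because it
# contains "&".
geo_fetch() {
    wget -O "$1.tar" "$(geo_url "$1")"
}

# Usage (requires network access):
#   geo_fetch GSE48191    # saves GSE48191.tar in the current directory
```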

edited Jul 22 '14 at 17:17

answered Jul 22 '14 at 16:46

Qeole


It gets downloaded but it is not even a directory. If you look at the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48191 , you can see there are multiple .gz files. I still can't access them?? – user3138373 Jul 22 '14 at 16:57
I suppose that the OP uses a shell that ignores ? as a wildcard since nothing matches. The main problem is &: this will run the part that precedes (thus with an incomplete URL) in the background. But the solution is the same: to quote the URL. – vinc17 Jul 22 '14 at 17:07
Thanks to terdon and vinc17 for the edits/corrections. @user3138373: I can't find your .gz files at the links you provided; could you please tell us again which URL you use to see/access them? – Qeole Jul 22 '14 at 17:10

@user3138373 the file you download is an archive (.tar file) that contains the .gz files. Once you have downloaded it, run tar xvf GSE48191.tar to expand the archive and access the files. – terdon Jul 22 '14 at 17:25
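terdon's suggestion can be sketched as a small function that first lists the archive members with tar -tf, so you can see the .gz files without unpacking anything, and then extracts them with tar -xf ... -C into a directory. The function name list_and_extract and the target directory are hypothetical choices.

```shell
#!/bin/sh
# list_and_extract is a hypothetical helper.
# $1: the .tar archive, $2: directory to extract into (created if missing).
list_and_extract() {
    mkdir -p "$2"
    tar -tf "$1"          # print the member names, one per line
    tar -xf "$1" -C "$2"  # unpack them; the .gz members stay compressed
}

# Usage after downloading:
#   list_and_extract GSE48191.tar GSE48191
#   gzip -dc GSE48191/NAME.gz | head    # NAME is a placeholder for a listed member
```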

score 3 · Answer 2 · edited Jul 22 '14 at 22:07


Another way that might work is to use this command:

wget -O nameOfTar.tar "http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file"

The -O option specifies the name of the file to save the download to.

Your initial problem was that the "&" was being interpreted by the shell; surrounding the URL with double quotes fixes the issue.
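One practical reason to prefer double quotes here: they protect ? and & from the shell just as single quotes do, but they still expand $variables, so the accession can be kept in a variable. A minimal sketch, where acc and url are illustrative variable names:

```shell
#!/bin/sh
# Inside double quotes, "?" and "&" are not special to the shell,
# but ${acc} is still expanded.
acc=GSE48191
url="http://www.ncbi.nlm.nih.gov/geo/download/?acc=${acc}&format=file"
printf '%s\n' "$url"

# The download itself (requires network access):
#   wget -O "${acc}.tar" "$url"
```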

edited Jul 22 '14 at 22:07

answered Jul 22 '14 at 17:02

ryekayo



-O option is used to specify the name of the file in which downloaded data is saved. It has no effect on the downloaded data itself (maybe that's what you meant, but I found it unclear). – Qeole Jul 22 '14 at 17:16
Yes, sorry, I will make the correction. – ryekayo Jul 22 '14 at 17:17
I'm not sure why this got downvoted. – ryekayo Jul 22 '14 at 17:51

I did not downvote, but that's probably because your solution does not fix the problem: & is interpreted by the shell, and the download of the .tar file will fail. – Qeole Jul 22 '14 at 17:54
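The failure described above can be reproduced safely by running the unquoted command in a child shell, with echo standing in for wget. In this sketch, the shell ends the command at &, runs the part before it in the background, and then treats format=file as a separate variable assignment, so the command only ever sees the truncated URL.

```shell
#!/bin/sh
# Unquoted URL inside a child shell: "echo ...?acc=GSE48191" runs in the
# background, "format=file" becomes a shell variable assignment, and "wait"
# makes sure the background echo finishes before the last line is printed.
out=$(sh -c 'echo http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file; wait; echo "format=$format"')
printf '%s\n' "$out"
```

This should print the URL cut off before &format=file on the first line and format=file on the second.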

score 1 · Answer 3 · answered Jan 14 '21 at 13:30

None of these answers worked for me.

However, you can find GSE* folders within the NCBI ftp page:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/

You can then copy the link address of the file and just run a simple wget:

wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar
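If you need the supplementary archive for a different series, the FTP path can be built from the accession. This is a sketch under an assumption inferred from the single example above: the parent directory is the accession with its last three digits replaced by nnn (GSE48191 becomes GSE48nnn), and the archive is named <accession>_RAW.tar. geo_suppl_url is a hypothetical helper; check the directory listing if a URL it produces does not exist.

```shell
#!/bin/sh
# geo_suppl_url is a hypothetical helper. It ASSUMES the layout seen in the
# URL above: parent directory = accession with the last three digits
# replaced by "nnn", archive name = <accession>_RAW.tar.
geo_suppl_url() {
    acc=$1
    digits=${acc#GSE}
    if [ "${#digits}" -le 3 ]; then
        prefix=""
    else
        prefix=${digits%???}   # drop the last three digits
    fi
    printf 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE%snnn/%s/suppl/%s_RAW.tar\n' \
        "$prefix" "$acc" "$acc"
}

# Usage (requires network access):
#   wget "$(geo_suppl_url GSE48191)"
```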

Samman Bikram Thapa · Answer 4 · 2015-07-22T04:43:02.643

0

wget -O "name-you-want-to-save-as.format" 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

That should save the file you want to download in your current directory.

edited Jul 22 '15 at 04:43

answered Jul 19 '15 at 17:39

Samman Bikram Thapa


wget: missing URL is what wget replies to that, because you are missing the argument to -O. Also, I think this probably doesn't solve the OP's problem anyway. – Celada Jul 19 '15 at 18:00
Because the URL contains &, this answer doesn't work unless you add "" or '' around the URL. – Aaron Franke Jan 08 '18 at 02:33

score 0 · Answer 5 · answered Nov 06 '18 at 04:43

Running curl -G http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191 returns:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191">here</a>.</p>
</body></html>

So you need to run:

wget https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

Notice the "s" after http. I tried it myself and it worked just fine.
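To see where a redirect points without reading the HTML body, you can ask for the response headers only and pick out the Location header. A minimal sketch, assuming curl is installed; location_of is a hypothetical helper:

```shell
#!/bin/sh
# location_of is a hypothetical helper: it reads HTTP response headers on
# stdin and prints the value of the first Location header, i.e. where a
# 301/302 redirect points. Header names are case-insensitive, and header
# lines end in CRLF, so carriage returns are stripped first.
location_of() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage (requires network access; -s hides progress, -I fetches headers only):
#   curl -sI 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191' | location_of
```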

score 0 · Answer 6 · answered Mar 11 '21 at 04:11

It would help to start from the page you got the link from, which is: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191

On that page, the clickable link is: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar

So use wget with that link:

wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48191/suppl/GSE48191_RAW.tar
