wget --content-disposition 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
The file you are downloading is a tar archive (a binary file), provided by a dynamic link from a web server. wget would normally save the file under a name taken from part of the URL you're using, but in this case that URL is just a REST API endpoint (or something similar), so the name would be awkward to work with (it would still be a valid name, and the file contents would be the same).
However, in this case the server provides a "Content-Disposition" header containing the actual file name, which wget is able to use if you give it the --content-disposition option. This option is marked "experimental" in my manual for wget.
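To make it concrete what wget takes from that header, here is a minimal sketch of the extraction step. filename_from_cd is a hypothetical helper, not part of wget. It only handles the plain filename= form (quoted or unquoted), ignores the RFC 6266 filename*= variant, and the sample header value is an illustration, not a confirmed copy of what NCBI sends.

```shell
# Hypothetical helper (not part of wget): pull the file name out of a
# Content-Disposition header line. Simplified sketch: handles
# filename=value and filename="value", but not RFC 6266 filename*=,
# and it would break on a ';' inside a quoted file name.
filename_from_cd() {
  # 1. strip the CR that ends HTTP header lines,
  # 2. keep only the text after "filename=" (other lines print nothing),
  # 3. cut any following "; param" and remove the double quotes.
  printf '%s\n' "$1" | tr -d '\r' |
    sed -n 's/.*[Ff]ilename=//p' |
    sed 's/;.*//; s/"//g'
}

# Illustrative header value; the real server response may differ.
filename_from_cd 'Content-Disposition: attachment; filename=GSE48191_RAW.tar'
```

This prints GSE48191_RAW.tar, which is the kind of name --content-disposition lets wget (and -J lets curl) use instead of one derived from the URL.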
You also need to quote the URL so that the shell does not interpret the & and ? characters in it.
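A quick way to see what quoting does is to print the arguments a command actually receives. show_args is a hypothetical helper used only for this demonstration; the unquoted case is described in comments rather than run, because it would background a command.

```shell
# Print how many arguments the shell passed, then each one in <...>,
# to make word splitting and special characters visible.
show_args() {
  printf 'argc=%s\n' "$#"
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Quoted: the URL, including ? and &, arrives as a single argument.
show_args 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

# Unquoted (not run here), & would end the command: the shell would run
#   show_args https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191
# in the background and then treat format=file as a separate variable
# assignment. The ? is also a glob wildcard and could, in principle,
# match existing file names.
```

The quoted call prints argc=1 followed by the full URL in angle brackets.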
The equivalent using curl:
curl -J -O 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
Or, using the equivalent long options:
curl --remote-header-name --remote-name 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
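If you need several series, the curl command can be wrapped in a small function. This is a sketch under assumptions: fetch_geo_series is a hypothetical name, the answer only shows the URL for GSE48191 so reusing the same acc=...&format=file pattern for other accessions is an assumption, and --fail (make curl exit non-zero on an HTTP error instead of saving the error page) is an addition to the original command.

```shell
# Hypothetical helper: download the supplementary archive of one GEO series
# into the current directory, named from the Content-Disposition header.
# Assumption: the URL pattern used for GSE48191 also works for other accessions.
fetch_geo_series() {
  # --fail: exit non-zero on HTTP errors instead of saving the error page.
  # --remote-header-name --remote-name: the long forms of -J -O.
  curl --fail --remote-header-name --remote-name \
    "https://www.ncbi.nlm.nih.gov/geo/download/?acc=$1&format=file"
}

# Usage (needs network access):
#   fetch_geo_series GSE48191
```

The URL is built inside double quotes, so the & and ? are still protected from the shell while $1 is expanded.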
Once you have downloaded the file, you need to unpack it:
tar -xvf GSE48191_RAW.tar
Due to the way this particular archive was created, this unpacks the archive's files directly into the current directory, so it may be a good idea to create a new directory, move the archive there and unpack it there. The files in this archive are gzip-compressed CEL files.
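The suggestion above (a separate directory, then unpacking) can be sketched as a function. unpack_geo_raw is a hypothetical name; instead of moving the archive it uses tar's -C option to extract into the target directory, and it assumes the gzip-compressed members end in .gz. The decompression loop is optional; drop it if you prefer to keep the files compressed.

```shell
# Hypothetical helper: unpack a GEO *_RAW.tar into its own directory and
# decompress the gzip-compressed files it contains.
# Assumes tar supports -C (GNU and BSD tar do) and members end in .gz.
unpack_geo_raw() {
  archive=$1
  dest=${2:-${archive%.tar}}     # default: GSE48191_RAW.tar -> GSE48191_RAW
  mkdir -p "$dest" || return 1
  # -C makes tar extract into $dest instead of the current directory.
  tar -xvf "$archive" -C "$dest" || return 1
  # Decompress each .gz in place (name.CEL.gz -> name.CEL).
  for f in "$dest"/*.gz; do
    [ -e "$f" ] || continue      # no .gz files: the glob stays unexpanded
    gunzip "$f" || return 1
  done
}

# Usage:
#   unpack_geo_raw GSE48191_RAW.tar
```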
--trust-server-names argument to wget – ivanivan Sep 26 '17 at 17:11