Analysis
I agree with the other answers: there is no way for tar to seek within the compressed archive. To find the file(s) you're after, the tool needs to process the archive from the beginning without skipping anything.
However, with GNU tar you don't necessarily need to process it to the end. Consider this scenario when creating an archive:
"Supposing you change the file blues and then append the changed version to collection.tar. […], the original blues is in the archive collection.tar. If you change the file and append the new version of the file to the archive, there will be two copies in the archive. When you extract the archive, the older version of the file will be extracted first, and then replaced by the newer version when it is extracted."
(source)
This means that when extracting a specific file, tar keeps processing the archive even after it has extracted the file, because another copy may appear later in the archive.
But then:
"If you wish to extract the first occurrence of the file blues from the archive, use --occurrence option"
(ibid.)
If you are sure the file you're after occurs exactly once in the archive, use tar --occurrence and tar will stop after extracting the file. Your wget will then abort due to SIGPIPE, so it won't download the rest of the archive in vain.
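The behavior quoted above can be reproduced locally. The following is a minimal sketch, assuming GNU tar; the scratch directory and the subdirectory names first and last are made up for illustration, while blues and collection.tar follow the manual's example.

```shell
# Work in a throwaway directory (all names here are illustrative).
demo=$(mktemp -d)
cd "$demo"

printf 'old\n' > blues
tar -cf collection.tar blues        # first copy of blues
printf 'new\n' > blues
tar -rf collection.tar blues        # append a second copy under the same name

tar -tf collection.tar              # the listing shows blues twice

mkdir first last
# Without --occurrence, tar reads the whole archive; the later copy replaces the earlier one.
tar -xf collection.tar -C last blues
# With --occurrence, tar stops after the first matching member.
tar -xf collection.tar -C first --occurrence blues

first_copy=$(cat first/blues)
last_copy=$(cat last/blues)
echo "first=$first_copy last=$last_copy"
```

On a local file, stopping early only saves some reading. In a pipeline fed by wget, it is what lets the download end early.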
Limited usefulness
Note this is not really useful in your exact case because phoenix/S6/zl548/MegaDepth_v1/0000 is a directory (right?). While extracting the directory with --occurrence, tar won't stop early unless it encounters another entry for the directory itself. The reason: there can always be a unique phoenix/S6/zl548/MegaDepth_v1/0000/foo at the very end of the archive. Until tar gets to the end, it cannot be sure the directory with all its content is complete.
Still, if you were after one or a few non-directories, if you knew the path(s), and if you knew there is exactly one instance of each in the archive, then --occurrence would allow you to download only as much of the archive as necessary. If you were lucky and the file(s) happened to be near the beginning of the archive, then --occurrence would make a significant difference.
Probably this answer won't help you much. It's for users who can provide a list of non-directories.
Unless…
If you saved the output of wget -qO- … | tar -tz (a run in which you most likely downloaded and processed the whole archive, then threw it away), you would now be able to provide a list of non-directories (possibly using --files-from= or --verbatim-files-from; this is especially useful if the list is too long for a single command line). In this case --occurrence may work for you. Additionally, the saved output of tar -t would allow you to confirm that each non-directory you're after occurs exactly once in the archive, so you would know --occurrence won't make you miss an updated version.
The above assumes MegaDepth_v1.tar.gz on the server does not change. In general, if the archive may have changed, your saved output of tar -t may no longer be valid.
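The uniqueness check against a saved listing could look roughly like this. It is only a sketch: listing.txt (the saved tar -t output) and wanted.txt (the non-directories to extract, one per line) are hypothetical file names, and the stand-in data is invented so the snippet runs on its own.

```shell
# Hypothetical files: listing.txt = saved `tar -tz` output,
# wanted.txt = non-directory paths to extract, one per line.
work=$(mktemp -d)
cd "$work"

# Stand-in data for illustration only; a/two is listed twice on purpose.
printf '%s\n' 'a/' 'a/one' 'a/two' 'b/' 'b/three' 'a/two' > listing.txt
printf '%s\n' 'a/one' 'a/two' 'b/three' > wanted.txt

# For each wanted path, count exact whole-line matches in the listing
# and report every path that does not occur exactly once.
not_unique=$(
  while IFS= read -r path; do
    count=$(grep -Fxc -- "$path" listing.txt)
    [ "$count" -eq 1 ] || printf '%s occurs %s times\n' "$path" "$count"
  done < wanted.txt
)
printf '%s\n' "$not_unique"
```

If nothing is reported, the same wanted.txt can be handed to tar with --files-from=wanted.txt together with --occurrence.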
Let's assume you can create a list of non-directories to extract. The list must not specify any directory explicitly, or else --occurrence won't help you. tar will still create the necessary directories, but only for the purpose of placing non-directories in them, not because it really extracts the directories from the archive. In other words, archive members for the directories themselves won't matter. This means directories will be created, but options like --preserve-permissions won't apply to them.
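As a small local illustration of extracting from a list that names only non-directories, here is a sketch assuming GNU tar. The names src, dir, file, demo.tar, wanted.txt and out are all made up for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Build a tiny archive holding a directory member (dir/) and a file member (dir/file).
mkdir -p src/dir
printf 'data\n' > src/dir/file
tar -cf demo.tar -C src dir

# The list names only the non-directory, never the directory itself.
printf '%s\n' 'dir/file' > wanted.txt

mkdir out
# Extract inside out/; tar creates out/dir only as a place to put the file.
(cd out && tar -xf ../demo.tar --occurrence --files-from=../wanted.txt)

extracted=$(cat out/dir/file)
echo "$extracted"
```

The directory out/dir exists afterwards even though the dir/ member was never requested.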
Proof of concept
I used your first command (the one with tar -t) and found out that phoenix/S6/zl548/MegaDepth_v1/0162/dense0/depths/16384199365_2b34b42cf4_b.h5 is a non-directory near the beginning of the archive. This pipeline:
wget -qO- https://www.cs.cornell.edu/projects/megadepth/dataset/Megadepth_v1/MegaDepth_v1.tar.gz \
| tar -xvz phoenix/S6/zl548/MegaDepth_v1/0162/dense0/depths/16384199365_2b34b42cf4_b.h5
extracts the file and keeps going (until I interrupt it with Ctrl+C); but this one:
wget -qO- https://www.cs.cornell.edu/projects/megadepth/dataset/Megadepth_v1/MegaDepth_v1.tar.gz \
| tar --occurrence -xvz phoenix/S6/zl548/MegaDepth_v1/0162/dense0/depths/16384199365_2b34b42cf4_b.h5
extracts the file and terminates automatically.