libarchive's bsdtar can handle most of those file formats, so you could do:
find . \( -name '*.zip' -o \
-name '*.tar' -o \
-name '*.tar.gz' -o \
-name '*.tar.bz2' -o \
-name '*.tar.xz' -o \
-name '*.tgz' -o \
-name '*.tbz2' -o \
-name '*.7z' -o \
-name '*.iso' -o \
-name '*.cpio' -o \
-name '*.a' -o \
-name '*.ar' \) \
-type f \
-exec bsdtar tf {} '*vacation*jpg' \; 2> /dev/null
With GNU find, you can simplify that (and make the match case-insensitive):
find . -regextype egrep \
-iregex '.*\.(zip|7z|iso|cpio|ar?|tar(|\.[gx]z|\.bz2)|tgz|tbz2)' \
-type f \
-exec bsdtar tf {} '*vacation*jpg' \; 2> /dev/null
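To sanity-check which names such a pattern accepts without touching the file system, the same extended regex can be run through grep on a sample list. This is only a sketch: archive_names is a hypothetical helper name, the sample paths are made up, and the empty alternative in tar(|...) is rewritten as an optional group because empty alternatives are not portable across grep implementations. grep's -x requires the whole line to match, as find's -iregex does for the whole path.

```shell
# Hypothetical helper: print only the paths (one per line on stdin) whose
# name ends in one of the archive extensions, case-insensitively.
# -x anchors the regex to the whole line, like find -iregex does.
archive_names() {
  grep -ixE '.*\.(zip|7z|iso|cpio|ar?|tar(\.[gx]z|\.bz2)?|tgz|tbz2)'
}

# Made-up sample paths; the last two are not expected to match.
printf '%s\n' ./A.ZIP ./lib/libfoo.a ./x.tar.gz ./x.tar ./notes.txt ./x.tar.lz |
  archive_names
```

Here ./notes.txt and ./x.tar.lz are filtered out, while ./A.ZIP passes thanks to the case-insensitive match.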
Neither command prints the path of the archive in which the *vacation*jpg
files are found, though. To print it, you could replace the last line with:
-exec sh -ac '
for ARCHIVE do
bsdtar tf "$ARCHIVE" "*vacation*jpg" |
awk '\''{print ENVIRON["ARCHIVE"] ": " $0}'\''
done' sh {} + 2> /dev/null
which gives an output like:
./a.zip: foo/blah_vacation.jpg
./a.zip: bar/blih_vacation.jpg
./a.tar.gz: foo/blah_vacation.jpg
./a.tar.gz: bar/blih_vacation.jpg
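The -a (allexport) option passed to sh is what makes $ARCHIVE visible to awk through ENVIRON: with it, every variable the shell assigns is exported to the commands it runs. A minimal sketch of that mechanism in isolation, with no archive involved (demo_prefix is a hypothetical name and the member list is made-up sample data):

```shell
# Hypothetical helper: prefix each line read on stdin with the name in $1.
# The inner sh runs with -a, so ARCHIVE is exported as soon as it is
# assigned and awk can read it from ENVIRON.
demo_prefix() {
  sh -ac 'ARCHIVE=$1; awk '\''{print ENVIRON["ARCHIVE"] ": " $0}'\''' sh "$1"
}

# Sample lines standing in for the output of bsdtar tf.
printf '%s\n' foo/blah_vacation.jpg bar/blih_vacation.jpg | demo_prefix ./a.zip
# prints:
# ./a.zip: foo/blah_vacation.jpg
# ./a.zip: bar/blih_vacation.jpg
```

Passing the name through the environment rather than splicing it into the awk code avoids having to escape whatever characters the archive path may contain.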
Or with zsh:
setopt extendedglob # best in ~/.zshrc
for archive (**/*.(#i)(zip|7z|iso|cpio|a|ar|tar(|.gz|.xz|.bz2)|tgz|tbz2)(.ND)) {
matches=("${(f@)$(bsdtar tf $archive '*vacation*jpg' 2> /dev/null)}")
(($#matches)) && printf '%s\n' "$archive: "$^matches
}
Note that a number of other file formats are just zip or tgz files in disguise, like .jar or .docx files. You can add those to your find/zsh search pattern: bsdtar doesn't care about the extension (as in, it doesn't rely on the extension to determine the type of the file).
Note that *vacation*jpg above is matched against the full archive member path, not just the file name, so it would match vacation.jpg but also vacation/2014/file.jpg.
To match on the file name only, one trick would be to use extract mode with -s (substitution), which takes regexps and accepts a p flag to print the names of the matching files, and then make sure no file is extracted (here, by rewriting every name to the empty string), like:
bsdtar -'s|.*vacation[^/]*$||p' -'s|.*||' -xf "$archive"
Note that it would output the list on stderr and append >> to every line. In any case, bsdtar, like most tar implementations, may mangle the file names on display if they contain some characters like newline or backslash (rendered as \n or \\).
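Another way to restrict the match to the last path component, as an alternative to the -s trick, is to list everything and post-filter the names. A rough sketch, assuming one member per line in the listing; basename_filter is a hypothetical helper and the sample names stand in for the output of bsdtar tf "$archive":

```shell
# Hypothetical helper: keep only the member paths (one per line on stdin)
# whose last component matches *vacation*jpg.
# [^/]* cannot cross a slash, so a "vacation" found in a directory name
# does not count.
basename_filter() {
  grep -E '(^|/)[^/]*vacation[^/]*jpg$'
}

# Made-up sample listing; in practice: bsdtar tf "$archive" | basename_filter
printf '%s\n' \
  foo/blah_vacation.jpg \
  vacation/2014/file.jpg \
  vacation.jpg | basename_filter
# prints:
# foo/blah_vacation.jpg
# vacation.jpg
```

This writes the matches to stdout without the >> suffix, at the cost of listing the whole archive first.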
IOError: [Errno 21] Is a directory: '.'
– golimar Jul 27 '17 at 08:24