3

I'm using this command to find patterns in zip files (similar to the one suggested here: https://superuser.com/questions/144926/unix-grep-for-a-string-within-all-gzip-files-in-all-subdirectories):

find . -regex ".*/.*zip" | xargs zgrep -m 1 -E "PATTERN"

Grepping still continues after the first match. Probably find/xargs is the culprit. How can I stop find once grep has found the first match?

P.S. The answers to "How to stop the find command after first match?" won't work here, because find needs to be stopped after the first file in which grep succeeds, not after the first file that find itself matches.

user13107
  • 5,335

4 Answers

3

Several things:

  • zgrep is to look into .z or .gz compressed files, not files inside compressed zip archives.

    There's a (broken) zipgrep script sometimes bundled with unzip, to look into zip archives, but what it does is run egrep on each member of the archive (so with -m1 each egrep would report the first match for each file).

    zgrep, similarly, is a script that comes with gzip and feeds the output of gzip -cdfq to grep for each file. gzip -d can uncompress zip files, but only does so for the first member of the archive, and only if that member is compressed (in zip files, not all members are necessarily compressed, especially small ones).

  • xargs runs as few commands as necessary but it may still run several if the list of files is big.
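A small illustration of that batching. Here -n 2 forces a tiny batch size so the split is visible; without it, xargs only splits when the argument list would exceed the system limit. The letters are just sample input.

```shell
# Five arguments, at most two per command: xargs has to run echo three times.
out=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$out"
```

Each output line comes from a separate echo invocation. In the question's pipeline, each batch would likewise be a separate zgrep run that knows nothing about matches found by an earlier one.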

Here, your best bet is probably to implement zipgrep by hand (here with GNU tools):

find . -name '*.zip' -type f -exec sh -c '
    unzip -Z1 "$1" |
      while IFS= read -r file; do
        unzip -p "$1" "$file" | grep --label="$1//$file" -Hm1 -- "$0" && exit
      done' PATTERN {} \; -quit

That runs one shell per file, but so would zipgrep and zipgrep runs many more commands.

It can fail if archive members have names that contain wildcard characters (*, [, ?) or other characters like ASCII characters 0x1 to 0x1f and various other ones, but that's mostly due to bugs and limitations in unzip, and that's not as bad as when using zipgrep.
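The -exec ... -quit part of that command can be tried on its own with plain text files, no zip archives needed. This is only a sketch: the scratch directory and file names are made up for the demonstration, and -quit is an extension found in GNU find, not part of POSIX.

```shell
# Hypothetical scratch directory with three sample files; two contain the pattern.
dir=$(mktemp -d)
printf 'nothing here\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'hello\n' > "$dir/c.txt"

# -exec acts as a test: -print -quit only run for the first file where grep -q succeeds.
found=$(find "$dir" -type f -name '*.txt' -exec grep -q -- hello {} \; -print -quit)
printf '%s\n' "$found"
rm -rf "$dir"
```

Exactly one path is printed, even though two files match. Which of the two it is depends on the order in which find reads the directory.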

  • But zgrep can actually search through zip archives, at least it does on my system. Is it somehow a bad idea nevertheless? – terdon Sep 19 '13 at 14:42
  • 3
    @terdon, that's only for zip files that contain only one compressed file. See "Files created by zip can be uncompressed by gzip only if they have a single member compressed with the 'deflation' method. This feature is only intended to help conversion of tar.zip files to the tar.gz format." in the gzip man page. – Stéphane Chazelas Sep 19 '13 at 14:52
2

Try:

find . -iname '*.zip' -print0 | xargs -0r zgrep -l -E 'PATTERN'

I've used -iname rather than -regex - it works as well for this and is, IMO, less confusing than find's weird regex handling. -print0 and xargs -0 are used so that any filenames with spaces or shell metacharacters in them will be handled correctly.

grep's -l option is documented in the man page:

   -l, --files-with-matches
          Suppress  normal  output;  instead  print the name of each input
          file from which output would normally have  been  printed.   The
          scanning  will  stop  on  the  first match.

The "first match" mentioned there is per file, so if multiple files match, they will all be printed. Note that this means grep will continue searching the other files even after it has found one match.
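A quick way to see this per-file behaviour with plain grep (the scratch directory and the file names foo, bar and baz are just placeholders for the demonstration):

```shell
dir=$(mktemp -d)
printf 'hello\nhello\n' > "$dir/foo"   # two matching lines
printf 'hello\n' > "$dir/bar"          # one matching line
printf 'bye\n' > "$dir/baz"            # no match

# Run from inside the directory so the names print without a path prefix.
out=$(cd "$dir" && grep -l hello foo bar baz)
printf '%s\n' "$out"
rm -rf "$dir"
```

This prints foo and bar, one per line: each matching file is listed once, and grep still goes on to open the files that follow the first match.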

If you want it to stop after the very first match, you could use grep's --line-buffered option and pipe grep's output into head -1. When the first match is printed, head will print it and exit; grep then has nowhere to write its output, so it terminates the next time it tries to write, and xargs and find follow.

find . -iname '*.zip' -print0 | xargs -0r zgrep --line-buffered -l -E 'PATTERN' | head -1
cas
  • 78,579
  • Thanks, but what I want to avoid is exactly that "grep will continue searching the other files, even after it has found one match". – user13107 Sep 19 '13 at 06:21
  • 1
    AFAIK, grep doesn't have any option to do that... but you might be able to do it with grep's --line-buffered option and by piping grep into head -1. I'll add an example to my answer. – cas Sep 19 '13 at 06:32
1

grep's (or zgrep's) -m option will cause it to stop reading the current file on the first match:

   -m NUM, --max-count=NUM
          Stop reading a file after NUM matching lines.  

That will not stop it from searching the next file. For example:

$ echo "hello" > foo
$ echo "hello" > bar
$ grep -m 1 hello foo bar
foo:hello
bar:hello

So, the issue is not xargs but the fact that you are grepping multiple files. In order to have grep (or zgrep) stop after the first matching file, you would have to run a little loop like @Stephane has suggested. Or, something like this with bash:

shopt -s globstar
for i in **/*.zip; do
  zgrep -l pattern "$i" && break
done

Or, for zip archives that contain multiple files (thanks @Stephane):

shopt -s globstar
for i in **/*.zip; do
  if unzip -p "$i" | grep -q hello; then
    echo "$i" && break
  fi
done
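To check that break really keeps the later files from being read, here is a sketch of the same loop on plain text files, using a POSIX glob instead of globstar. The scratch directory, the file names and the checked counter are only there for the demonstration.

```shell
dir=$(mktemp -d)
printf 'nothing\n' > "$dir/1.txt"
printf 'hello\n' > "$dir/2.txt"
printf 'hello\n' > "$dir/3.txt"

checked=0
match=
for f in "$dir"/*.txt; do
  checked=$((checked + 1))
  if grep -q -- hello "$f"; then
    match=$f
    break   # stop at the first matching file; 3.txt is never opened
  fi
done
printf 'matched %s after checking %s file(s)\n' "${match##*/}" "$checked"
rm -rf "$dir"
```

The glob expands in sorted order, so the loop stops at 2.txt after checking two files.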
terdon
  • 242,166
  • 1
    Note that **/*.zip, contrary to its find counterpart, would not include dot-files or dot-dirs. Also, the expansions may start with - (while with find . they start with ./), so you may want to add a --. Finally, contrary to zsh or ksh93, from which bash copied that feature, bash's **/ follows symlinks when descending the directory tree, so it is to be used with care. – Stéphane Chazelas Sep 19 '13 at 19:52
0

grep -m 1 lists the first match of every file.

There's an easy way of listing just the first match: pipe through head -n 1. Once head exits, the search will soon die of a SIGPIPE, the next time it tries to write output.

find . -regex ".*/.*zip" -print0 | xargs -0 zgrep -E "PATTERN" | head -n 1
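A minimal illustration of that mechanism, using yes as a stand-in for a long-running search:

```shell
# yes would print "hello" forever; head exits after one line, which closes the pipe,
# and yes is then killed by SIGPIPE (or fails with EPIPE) on its next write.
first=$(yes hello | head -n 1)
printf '%s\n' "$first"
```

The producer only dies when it next writes to the closed pipe, so in the find | xargs | zgrep case a search that finds no further matches still runs to the end.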