8

I have a directory with ~1M files and need to search for particular patterns. I know how to do it for all the files:

find /path/ -exec grep -H -m 1 'pattern' \{\} \;

The full output is not wanted (it is too slow). The first few hits are enough, so I tried to limit the number of lines:

find /path/ -exec grep -H -m 1 'pattern' \{\} \; | head -n 5

This results in 5 lines followed by

find: `grep' terminated by signal 13

and find keeps running. This is well explained here. I tried the -quit action:

find /path/ -exec grep -H -m 1 'pattern' \{\} \; -quit

This outputs only the first match.

Is it possible to limit find's output to a specific number of results (for example by passing an argument to -quit, similar to head -n)?

Andrey

4 Answers

6

Since you're already using GNU extensions (-quit, -H, -m1), you might as well use GNU grep's -r option together with --line-buffered, so that it outputs matches as soon as they are found and is more likely to be killed by SIGPIPE as soon as it writes the 6th line:

grep -rHm1 --line-buffered pattern /path | head -n 5

With find, you'd probably need to do something like:

find /path -type f -exec sh -c '
  grep -Hm1 --line-buffered pattern "$@"
  [ "$(kill -l "$?")" = PIPE ] && kill -s PIPE "$PPID"
  ' sh {} + | head -n 5

That is, wrap grep in sh (you still want to run as few grep invocations as possible, hence the {} +), and have sh kill its parent (find) when grep dies of a SIGPIPE.
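To see what the kill -l "$?" test does in isolation, here is a minimal sketch. It assumes a shell such as bash or dash on Linux, where a process killed by signal N is reported with exit status 128+N and SIGPIPE is signal 13; the status 141 is hard-coded purely for illustration.

```shell
# Assumption: bash/dash on Linux, where "killed by signal N" is reported
# as exit status 128+N and SIGPIPE is signal 13, hence 141.
status=141

# Given an exit status, kill -l prints the name of the corresponding signal.
signame=$(kill -l "$status")

if [ "$signame" = PIPE ]; then
  result="grep died of SIGPIPE; the wrapper would now signal find"
else
  result="grep ended normally or was killed by another signal"
fi
printf '%s\n' "$result"
```

In the real wrapper, "$?" takes the place of the hard-coded status, and the PIPE branch is where kill -s PIPE "$PPID" is run.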

Another approach is to use xargs instead of -exec {} +. xargs exits straight away when a command it spawns is killed by a signal, so in:

 find . -type f -print0 |
   xargs -r0 grep -Hm1 --line-buffered pattern |
   head -n 5

(-r and -0 being GNU extensions), as soon as grep writes to the broken pipe, both grep and xargs will exit, and find will exit as well the next time it prints something after that. Running find under stdbuf -oL might make it happen sooner.
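A quick way to try the xargs pipeline without touching real data is sketched below. It assumes GNU find, xargs and grep; the temporary directory, the file names and the word "needle" are made up for the test. The 2>/dev/null is only there to hide the "terminated by signal 13" message, and it hides every other error as well.

```shell
# Hypothetical test data: 20 small files that all contain the word "needle".
dir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  printf 'needle in file %s\n' "$i" > "$dir/file$i.txt"
  i=$((i + 1))
done

# Same pipeline as above (GNU find, xargs and grep assumed).
# 2>/dev/null silences xargs and grep entirely, not just the signal 13 message.
hits=$(find "$dir" -type f -print0 |
  xargs -r0 grep -Hm1 --line-buffered needle 2>/dev/null |
  head -n 5)

# Count the lines that head let through.
count=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
printf 'matches kept: %s\n' "$count"

rm -rf "$dir"
```

With only 20 tiny files, grep may well finish before head exits, so this checks the shape of the output rather than the early termination itself.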

A POSIX version could be:

trap - PIPE # restore default SIGPIPE handler in case it was disabled
RE=pattern find /path -type f -exec sh -c '
  for file do
    awk '\''
      $0 ~ ENVIRON["RE"] {
        print FILENAME ": " $0
        exit
      }'\'' < "$file"
    if [ "$(kill -l "$?")" = PIPE ]; then
      kill -s PIPE "$PPID"
      exit
    fi
  done' sh {} + | head -n 5

This is very inefficient, as it runs several commands for each file.

  • Thank you. I will use grep -r for this particular task, but it's good to know how to solve a general case. One note: xargs still prints error xargs: grep: terminated by signal 13, while find stops. I'd add 2> /dev/null here. – Andrey Dec 12 '16 at 20:07
  • The second approach may examine 100s or 1000s of files even though you're only interested in a couple because you are not limiting the lines that xargs will pass to grep. It also doesn't mask the SIGPIPE. Have a look at my suggestion. – V13 Dec 13 '16 at 09:15
  • @V13, you want to pass as many files to grep as possible, to run as few grep instances as possible. Executing grep (which implies forking a process, loading the executable and libraries, doing dynamic linking, initialising all libraries, internationalisation... and all the clean-up afterwards) has a formidable cost compared to find finding the files. With --line-buffered, grep should be killed early, so it should not examine too many extra files once it has found 5 matching files. – Stéphane Chazelas Dec 13 '16 at 10:11
  • @V13, also calling one grep for each file won't stop find finding extra files. A pipe (in a find | xargs -n 1 approach) can hold thousands of files before getting full and causing find to block. – Stéphane Chazelas Dec 13 '16 at 10:25
  • @Andrey, 2> /dev/null would remove every error by grep and xargs, not just the terminated by signal 13 error by xargs. – Stéphane Chazelas Dec 13 '16 at 10:37
  • Yes, I know, maybe it's just for my application I can ignore such errors. As a side note, -r is not supported by some implementations of zgrep, so only find-based solutions would work for compressed files. – Andrey Dec 13 '16 at 11:01
  • @Andrey, zgrep is generally a script that calls gzip and grep for each file anyway. – Stéphane Chazelas Dec 13 '16 at 11:53
2

A solution to avoid the errors could be this:

find / -type f -print0 \
  | xargs -0 -L 1 grep -H -m 1 --line-buffered 'pattern' 2>/dev/null \
  | head -10

In this example, xargs stops as soon as a grep it spawned is killed by a signal, so there will be just one pipe error, which is filtered out by the stderr redirection.

V13
  • Looks ok, and seems more elegant than mine. – xhienne Dec 13 '16 at 00:43
  • The combination of -0 with -L is not documented. In current versions of GNU xargs at least, -L does appear to work like -n with -0 but I wouldn't count on it as that's not what it's meant to do without -0. Using -n 1 would make more sense. In any case, running one grep per file would generally be terribly inefficient. – Stéphane Chazelas Dec 13 '16 at 10:22
1

You grep one file at a time. With your -quit, you stop the find at the first successful grep.

[update] My first solution was to grep multiple files at once:

find /path/ -type f -exec grep -H -m 1 'pattern' \{\} + -quit | head -n 5

(the magic is in the + at the end of the -exec sub-command. Added -type f. You may want to remove the -H option to grep if you are certain that /path/ contains several files)

The problem here, as reported by @StéphaneChazelas, is that with the + form the -exec command is run later, in batches, and -exec always returns true, so -quit makes find stop at the first file.
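The "always true" behaviour of the + form can be checked with a small sketch. The temporary directory and file names are made up, and false stands in for any command that fails:

```shell
# Hypothetical test data: three empty files.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# With ';', -exec is true only if the command succeeds. "false" never
# succeeds, so -print is never reached.
with_semicolon=$(find "$dir" -type f -exec false {} \; -print | wc -l | tr -d ' ')

# With '+', -exec always evaluates as true, whatever the command returns,
# so the next action (-print here, -quit in the command above) always runs.
with_plus=$(find "$dir" -type f -exec false {} + -print | wc -l | tr -d ' ')

printf 'printed with ";": %s, with "+": %s\n' "$with_semicolon" "$with_plus"
rm -rf "$dir"
```

Replacing -print with -quit in the + variant shows why find gives up right after the first file it visits.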

If we want find to stop when head has finished, find must also receive the SIGPIPE that grep is getting (signal 13). That means that find must send something through the pipe.

Here is a quick-and-dirty hack, enhanced with Stéphane's suggestions:

find /path/ -type f -exec grep -H -m 1 --line-buffered 'pattern' {} + -printf '\r' | head -n 5

With -printf '\r' I force find to output a harmless character that will (hopefully) not alter the output of grep. Once head has stopped, find will receive a SIGPIPE and stop too.

[update2] I warned you that this is a dirty hack. Here is a better solution:

find /path/ -type f -exec grep --quiet 'pattern' {} ";" -print | head -n 5

Here it is no longer grep that prints the filename but find, so there is no more "grep terminated by signal 13" and find stops together with head. The drawback is that the matched lines are no longer printed.
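A small sketch of this variant on throwaway data follows. The directory, the file names and the word "needle" are made up, and -q is used as the short spelling of --quiet:

```shell
# Hypothetical test data: three files that match and two that do not.
dir=$(mktemp -d)
for name in a b c; do
  printf 'some needle here\n' > "$dir/match_$name"
done
for name in d e; do
  printf 'nothing to see\n' > "$dir/other_$name"
done

# grep -q prints nothing and only reports success or failure;
# find itself prints the name of each file for which grep succeeded.
matched=$(find "$dir" -type f -exec grep -q needle {} \; -print | head -n 5)

count=$(printf '%s\n' "$matched" | wc -l | tr -d ' ')
printf 'files reported by find: %s\n' "$count"
rm -rf "$dir"
```

Because find is now the process writing to the pipe, it is find that receives SIGPIPE once head has exited.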

[update3] Finally, as suggested by @Andrey, the shamelessly hideous command below would solve this last issue:

find /path/ -type f \
    -exec grep --quiet 'pattern' {} \; \
    -printf '%p:' \
    -exec grep -h -m 1 'pattern' {} \; \
| head -n 5
xhienne
  • @StéphaneChazelas Thank you for your suggestions. The problem here, as I understood it in the OP, is that find doesn't die because it never receives the SIGPIPE itself. Without -quit the only solution I see is to force find sending something to stdout. I'll amend my answer this way. Your opinion is welcome. – xhienne Dec 12 '16 at 16:28
  • Another option is to use grep -r since we're already using several other GNU extensions here. – Stéphane Chazelas Dec 12 '16 at 16:47
  • @StéphaneChazelas grep -r is a very different solution that should be proposed independently, probably by you. (as for your kill "$PPID", thanks, I'm no longer ashamed of my ugly hacks ;-) – xhienne Dec 12 '16 at 16:56
  • Nice trick! A couple of thoughts: grep -m 1 is essential in my case since each file might have thousands of matches; it's possible to emulate grep -H with find /path/ -type f -printf '%p:' -exec grep -hm 1 'pattern' \{\} \; | head -n 5. – Andrey Dec 12 '16 at 20:13
  • @Andrey I did preserve the -m1 flag, except in my last proposal, where the --quiet option also makes grep stop at the first match. As for your command, I'm not sure this is what you want, since you're printing every file name before checking that grep matches. Maybe you meant find /path/ -type f -exec grep -q 'pattern' {} \; -printf '%p:' -exec grep -hm 1 'pattern' \{\} \;. That would do the trick indeed (in which case, tell me, and I may add this to my answer) – xhienne Dec 13 '16 at 00:38
  • Yes, you're right, just checked man for -q. The compound command grep-printf-grep works as expected. – Andrey Dec 13 '16 at 01:05
-1

An alternative for simpler cases is to use a here-string instead of a pipe. For example:

find . -exec stat -c %y {} \; | head -n1

This would hit the same issue as above. One easy method to consider:

head -n1 <<<"$(find . -exec stat -c %y {} \;)"
Jay