Piped greps for looking inside files

Question

(self-migrated from Ask Ubuntu because it's Linux-related, not Ubuntu-specific, and my OS isn't Ubuntu)

I'm trying to make a grep that looks like this:

grep -r 2019 | grep -riv FAILED | grep -rl DSL

I want to get the filenames (-l) of files that contain 2019, do NOT (-v) contain FAILED, and do contain DSL.

Here, only the last grep has any effect. I understand it's because of the -r: each grep searches all the files instead of the previous grep's output. But I can't figure out how to make it work without -r.

Maybe there's another way to use multiple patterns in a single grep, but I haven't found anything that combines "positive" and "negative" matches.

(a) You can ask text-processing questions on Ask Ubuntu if you're using Ubuntu, and (b) this is not a migrated post, this is manually cross-posted (which is not allowed here). To migrate your post from Ask Ubuntu to here, flag it for moderator attention. — muru, Jul 24 '19 at 08:12
@muru Yeah I wasn't sure how to do that, but I realized I had no reason to post it on Ask Ubuntu, as my OS isn't even Ubuntu. Should I delete my other post? — Teleporting Goat, Jul 24 '19 at 08:14
You should, but you can't, since it has an upvoted answer. Flag it for moderator attention, and ask it to be migrated here, instead. — muru, Jul 24 '19 at 08:15
Have you tried removing the -r from the 2nd and 3rd grep? — ctrl-alt-delor, Jul 24 '19 at 08:58
Are you trying to process file contents or filenames? Your question is not clear on this. — ctrl-alt-delor, Jul 24 '19 at 08:59

Kusalananda · Accepted Answer · 2019-07-24T10:13:37.723

The last grep in the pipeline would be reading from the previous grep (if it hadn't used the -r option, see later), so it would have no idea which file the data came from, which in turn means it can't report the pathname of the file.

Instead, consider using find like so:

find . -type f \
    -exec grep -q 2019 {} \; \
    -exec grep -q DSL {} \; \
    ! -exec grep -qi FAILED {} \; \
    -print

This would take each regular file in the current directory and any subdirectory (recursively) and test whether it contains the strings 2019, DSL, and FAILED (the last one case-insensitively). It would print the pathnames of the files that contain the first two strings but do not contain the third.

If a file does not contain 2019, the other two tests will not be carried out, and if it does not contain DSL, the last test will not be carried out.
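A minimal, self-contained way to try the command above, assuming a `mktemp` that supports `-d`. The files `a.txt` to `d.txt` are hypothetical test data, not from the question. Only `a.txt` contains both 2019 and DSL without any case variant of FAILED, so it should be the only pathname printed.

```shell
#!/bin/sh
# Hypothetical test data in a temporary directory (removed on exit).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

printf '2019 report\nDSL line up\n' > "$dir/a.txt"   # 2019 + DSL, no FAILED: should match
printf '2019 DSL\nstatus: failed\n' > "$dir/b.txt"   # contains "failed": excluded by the -i test
printf '2019 only\n'                > "$dir/c.txt"   # no DSL: excluded
printf 'DSL only\n'                 > "$dir/d.txt"   # no 2019: excluded

# Same tests as in the answer; each -exec only runs if the previous test succeeded.
matches=$(cd "$dir" && find . -type f \
    -exec grep -q 2019 {} \; \
    -exec grep -q DSL {} \; \
    ! -exec grep -qi FAILED {} \; \
    -print)

printf '%s\n' "$matches"   # ./a.txt
```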

Note that instead of grep -v -qi FAILED I'm using a negation of grep -qi FAILED as the third test. I'm not interested in whether the file contains lines not containing FAILED, I'm interested in whether the file contains FAILED, and in that case I'd like to skip this file.
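The difference between the two forms can be seen on a single hypothetical file that has one line mentioning FAILED and one that doesn't. This sketch only compares the exit statuses of the two tests; the variable names are illustrative.

```shell
#!/bin/sh
# One hypothetical file: one line mentions FAILED, the other does not.
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf '2019 DSL ok\n2019 DSL FAILED\n' > "$f"

# -v: succeeds if ANY line does not match, so this file would still "pass".
if grep -v -qi FAILED "$f"; then v_result=pass; else v_result=skip; fi

# Negated -q: succeeds only if NO line matches, so this file is skipped.
if ! grep -qi FAILED "$f"; then neg_result=pass; else neg_result=skip; fi

echo "grep -v -qi FAILED: $v_result"    # pass
echo "! grep -qi FAILED:  $neg_result"  # skip
```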

See also: Understanding the -exec option of `find`

The issue with your pipeline,

grep -r 2019 | grep -riv FAILED | grep -rl DSL

is that the last grep will look recursively in all the files in the current directory and below, and will ignore the input from the previous stages of the pipeline. The two initial grep invocations may produce some data, but they have no way to forward it to the last grep, and they will eventually be killed by SIGPIPE if they are still writing when the grep after them exits.
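A small sketch of the ignored-input problem, assuming GNU grep 2.11 or later, where `-r` with no file operand searches the current directory rather than standard input. The sample file is hypothetical.

```shell
#!/bin/sh
# Assumes GNU grep >= 2.11: with -r and no file operand it searches ".", not stdin.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'nothing relevant\n' > "$dir/x.txt"

# The piped text contains DSL, but grep -r never reads it.
with_r=$(cd "$dir" && printf 'DSL from the pipe\n' | grep -rl DSL)

# Without -r, grep reads stdin, and -l can only name it "(standard input)".
without_r=$(printf 'DSL from the pipe\n' | grep -l DSL)

echo "with -r:    '$with_r'"      # empty: the pipe was ignored
echo "without -r: '$without_r'"   # (standard input)
```

The second case also shows why a grep reading from a pipe cannot report real pathnames.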

Also, as I already noted above, the middle grep would not find files that do not contain FAILED; it would find lines that contain something other than FAILED. Incidentally, it would also ignore the input from the preceding grep.

Stéphane Chazelas · Answer 2 · 2019-07-24T09:07:57.017

With GNU grep (-r is already a GNU extension) and GNU xargs or compatible:

grep -rlZ 2019 . |
  xargs -r0 grep -LiZ FAILED |
  xargs -r0 grep -l DSL

You need xargs to pass the list of files output by one grep as arguments to the next grep, and -Z to make that list of files NUL-delimited. To report the files that don't contain FAILED, you need -L (also a GNU extension), not -vl, which reports the files that contain at least one line that doesn't match.
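The `-L` versus `-vl` distinction, sketched on two hypothetical files and assuming GNU grep:

```shell
#!/bin/sh
# Assumes GNU grep (-L is a GNU extension). Two hypothetical files.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '2019\nFAILED\n' > "$dir/bad.txt"   # contains FAILED
printf '2019 DSL\n'     > "$dir/good.txt"  # does not contain FAILED

# -L: files with NO matching line at all.
only_L=$(cd "$dir" && grep -L FAILED bad.txt good.txt)

# -vl: files with at least one NON-matching line; bad.txt qualifies via its "2019" line.
vl=$(cd "$dir" && grep -vl FAILED bad.txt good.txt)

printf 'grep -L FAILED:\n%s\n' "$only_L"   # good.txt
printf 'grep -vl FAILED:\n%s\n' "$vl"      # bad.txt and good.txt
```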

That should limit the number of grep invocations to a minimum, and for a large number of files could leverage up to three processors concurrently.
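A sketch that runs the pipeline above on a few hypothetical files in a temporary directory, assuming GNU grep and GNU xargs; the function name `find_dsl_2019` is only illustrative. Each unwanted file is eliminated at a different stage, so only `./a.txt` should remain.

```shell
#!/bin/sh
# Assumes GNU grep and GNU xargs (for -r, -0, -Z, -L). Hypothetical sample files.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '2019 report\nDSL line up\n' > "$dir/a.txt"  # should be reported
printf '2019 DSL\nstatus: failed\n' > "$dir/b.txt"  # has "failed": dropped by stage 2
printf '2019 only\n'                > "$dir/c.txt"  # no DSL: dropped by stage 3
printf 'DSL only\n'                 > "$dir/d.txt"  # no 2019: dropped by stage 1

find_dsl_2019() {
  grep -rlZ 2019 . |               # stage 1: NUL-delimited list of files containing 2019
    xargs -r0 grep -LiZ FAILED |   # stage 2: of those, files with no FAILED (any case)
    xargs -r0 grep -l DSL          # stage 3: of those, files containing DSL
}

result=$(cd "$dir" && find_dsl_2019)
printf '%s\n' "$result"   # ./a.txt
```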

edited Jul 24 '19 at 09:07

answered Jul 24 '19 at 08:33

I'm not familiar with xargs so I don't know what I'm doing wrong but when I try your command I get xargs: illegal option -- r and xargs: Usage: xargs: [-t] [-p] [-e[eofstr]] [-E eofstr] [-I replstr] [-i[replstr]] [-L #] [-l[#]] [-n # [-x]] [-s size] [cmd [args ...]] – Teleporting Goat Jul 24 '19 at 09:08
@mosvy, it would mean loading files entirely in memory (which you can only do with -z if the files don't contain NUL bytes). Also note that -P is not always enabled and is still considered experimental in GNU grep. – Stéphane Chazelas Jul 24 '19 at 09:10
@TeleportingGoat, your xargs doesn't appear to be GNU xargs. You can drop -r as it's only there to prevent running commands for empty inputs. But from the usage, it looks like it doesn't support -0 either so it can probably not be used reliably. What system are you on? – Stéphane Chazelas Jul 24 '19 at 09:13
@mosvy, no I meant that for grep -Pr to be able to report files that contain as a whole DSL, 2019 and not FAILED (using lookaround operators), you'd need to use -z and assume the files don't contain NUL characters. – Stéphane Chazelas Jul 24 '19 at 09:15
ok right, got it. – Jul 24 '19 at 09:17
@StéphaneChazelas Solaris 10 1/13 – Teleporting Goat Jul 24 '19 at 09:26
@TeleportingGoat, xargs is not usable reliably on Solaris, you may want to install the GNU tools there. If you have a grep that supports -r and -L, you probably have some GNU tools already. Maybe GNU xargs is available as gxargs. – Stéphane Chazelas Jul 24 '19 at 10:03
@StéphaneChazelas It's a company VM, I can't install anything. I'll have to use what's available. – Teleporting Goat Jul 24 '19 at 12:19
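Given the constraints in the last comments (Solaris 10, no GNU xargs, nothing can be installed), one portable option is to let find batch the files with `-exec ... {} +` and have a single awk process check all three conditions per file. This is a different technique from both answers, offered only as a sketch: it assumes a POSIX awk (on Solaris typically /usr/xpg4/bin/awk or nawk, not the old /usr/bin/awk) and a find that supports `{} +`. The function name `report_matches`, the `AWK` variable, and the sample files are hypothetical, and the demo part also assumes `mktemp -d`.

```shell
#!/bin/sh
# Sketch: one awk pass per batch of files, no xargs needed.
# Assumes a POSIX awk; on Solaris try AWK=/usr/xpg4/bin/awk or AWK=nawk.
AWK=${AWK:-awk}

report_matches() {
  find "$1" -type f -exec "$AWK" '
    function flush() {
      # Report the previous file if it had 2019 and DSL but no FAILED in any case.
      if (file != "" && y && d && !f) print file
    }
    FNR == 1               { flush(); file = FILENAME; y = d = f = 0 }
    /2019/                 { y = 1 }
    /DSL/                  { d = 1 }
    toupper($0) ~ /FAILED/ { f = 1 }
    END                    { flush() }
  ' {} +
}

# Demo on hypothetical files.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '2019 report\nDSL line up\n' > "$dir/a.txt"
printf '2019 DSL\nstatus: failed\n' > "$dir/b.txt"
printf '2019 only\n'                > "$dir/c.txt"

result=$(report_matches "$dir")
printf '%s\n' "$result"   # the path of a.txt only
```

Empty files never reach the `FNR == 1` rule, which is harmless here since they cannot match anyway.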
