I have been looking at grep -e where you are doing an "and" operation which is the sort of thing I want. However, if I get it right, the two terms have to be in the same line to be returned.

What I am interested in instead is finding all documents in a directory that contain both terms, possibly on different lines. If order matters, I do know that one term will always occur before the other, but a general-purpose solution would be just fine.

1 Answer

TL;DR

Note: You have to test which one is the fastest for yourself.

grep -rlzE '(TermOne.*TermTwo)|(TermTwo.*TermOne)'

find . -type f -exec grep -q 'TermOne' {} \; \
               -exec grep -q 'TermTwo' {} \; \
               -print

awk 'FNR==1{p=0}
     /TermOne/{if(p==0)p=1; if(p==2)p=3}
     /TermTwo/{if(p==0)p=2; if(p==1)p=3}
     p==3{print FILENAME;p=0;nextfile}' ./*

One File

By default grep matches one line at a time, so no single regex can match two strings that sit on different lines of a file.

It is possible to search for two terms with either alternation:

grep -E '(TermOne.*TermTwo)|(TermTwo.*TermOne)' file

or lookahead:

grep -P '(?=.*TermOne)(?=.*TermTwo)' file

but only if the two terms are on the same line.

It is also possible to make the whole file act as one line (if the file doesn't contain NULs; Unix text files don't) with the GNU grep -z option:

grep -zE '(TermOne.*TermTwo)|(TermTwo.*TermOne)' file
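A small sketch of the difference -z makes, assuming GNU grep and a throwaway sample file (the file name and contents are made up for illustration). The same pattern is tried first line by line and then with the file read as a single record:

```shell
# Assumes GNU grep: -z is a GNU extension.
tmp=$(mktemp -d)
printf 'TermOne is here\nsome filler\nTermTwo is there\n' > "$tmp/both.txt"

# Default mode: the pattern is tried against each line separately,
# and no single line contains both terms.
if grep -qE '(TermOne.*TermTwo)|(TermTwo.*TermOne)' "$tmp/both.txt"; then
    per_line=match
else
    per_line=nomatch
fi

# With -z, records end at NUL bytes instead of newlines, so a file
# without NULs is searched as one record and ".*" can span newlines.
if grep -qzE '(TermOne.*TermTwo)|(TermTwo.*TermOne)' "$tmp/both.txt"; then
    whole_file=match
else
    whole_file=nomatch
fi

echo "per line: $per_line, whole file: $whole_file"
rm -rf "$tmp"
```

Note that -z makes grep hold the whole file in memory as one record, which may matter for very large files.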

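Older GNU grep releases could not combine -z with -P, which rules out the lookahead solution there. Where the combination is accepted, the lookahead pattern also needs (?s) so that the dot matches newlines, because -P does not do that by default.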

The other alternative is to grep twice:

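grep -q 'TermOne' file && grep -q 'TermTwo' file

The exit code of the whole command list will be 0 only if both terms were found in the file. (Piping the output of the first grep into the second would not do: the second grep would then see only the lines that contain TermOne, so both terms would again have to be on the same line.)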
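The two-pass check can be wrapped in a small shell function. This is only a sketch: has_both is a hypothetical helper name, and the sample files are made up. Each grep reads the file on its own, so the terms may sit on different lines:

```shell
# has_both is a hypothetical helper, not a standard command.
# Usage: has_both FILE TERM1 TERM2
# Succeeds (exit 0) only if FILE contains both terms, on any lines.
has_both() {
    # -q: no output, only the exit status
    # -e: keeps a term that starts with "-" from being read as an option
    grep -q -e "$2" -- "$1" && grep -q -e "$3" -- "$1"
}

tmp=$(mktemp -d)
printf 'TermOne\nfiller\nTermTwo\n' > "$tmp/both.txt"
printf 'TermOne\nfiller\n' > "$tmp/one.txt"

for f in "$tmp/both.txt" "$tmp/one.txt"; do
    if has_both "$f" TermOne TermTwo; then
        echo "${f##*/}: both terms found"
    else
        echo "${f##*/}: at least one term missing"
    fi
done
rm -rf "$tmp"
```

Because of the &&, the second grep runs only when the first one has already found its term.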

Or, to use awk:

awk '/TermOne/{if(p==0)p=1; if(p==2)p=3}
     /TermTwo/{if(p==0)p=2; if(p==1)p=3}
     p==3{print "both terms found"; exit}' file

List Files

The grep -z solution from above will recursively list all matching files once the options -r (recursive, so no filename argument is needed) and -l (list the names of matching files) are added:

grep -rlzE '(TermOne.*TermTwo)|(TermTwo.*TermOne)'

Or, using find (two grep calls):

find . -type f -exec grep -q 'TermOne' {} \; -exec grep -q 'TermTwo' {} \; -print
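A possible variation on the two-grep idea, sketched under the assumption that GNU grep and GNU xargs are available (-Z, -0 and -r are extensions): the first grep lists the files that contain TermOne, and the second grep searches only those files for TermTwo. The sample directory and file names are made up for illustration.

```shell
# Assumes GNU grep and GNU xargs.
tmp=$(mktemp -d)
printf 'TermOne\nfiller\nTermTwo\n' > "$tmp/both.txt"
printf 'TermOne\n' > "$tmp/only one.txt"
printf 'TermTwo\n' > "$tmp/only two.txt"

# grep -rlZ : recurse, print names of matching files, NUL-terminated
# xargs -0  : read NUL-terminated names (safe for spaces and newlines)
# xargs -r  : do not run the second grep at all if no names arrive
found=$(cd "$tmp" && grep -rlZ -e 'TermOne' . | xargs -0 -r grep -l -e 'TermTwo')
echo "$found"
rm -rf "$tmp"
```

Compared with find and -exec ... \;, this starts far fewer processes, and a third term could be handled by adding another xargs stage (with -lZ on every grep except the last).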

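Or, using awk (the glob covers only the current directory; p is reset on the first line of each file so that state from one file cannot leak into the next):

awk 'FNR==1{p=0}
     /TermOne/{if(p==0)p=1; if(p==2)p=3}
     /TermTwo/{if(p==0)p=2; if(p==1)p=3}
     p==3{print FILENAME;p=0;nextfile}' ./*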
  • I had tried the grep -E method, that was what I was trying to work with, but somehow I did not get it 100% right I guess. With the options and form above, I am seeing the correct number of results come back now. That is the solution I will go with as I have 2 terms (not more) in each file. – demongolem Apr 21 '20 at 12:09