167

I would like to get the multi pattern match with implicit AND between patterns, i.e. equivalent to running several greps in a sequence:

grep pattern1 | grep pattern2 | ...

So how can I convert it to something like this?

grep pattern1 & pattern2 & pattern3

I would like to use a single grep because I am building the arguments dynamically, so everything has to fit in one string. Piping through filters is a shell feature, not a grep feature, so it is not what I am after.


Don't confuse this question with:

grep "pattern1\|pattern2\|..."

This is an OR multi pattern match. I am looking for an AND pattern match.

greenoldman
  • 6,176

9 Answers

143

To find the lines that match each and every one of a list of patterns, agrep (the original one, now shipped with glimpse, not the unrelated one in the TRE regexp library) can do it with this syntax:

agrep 'pattern1;pattern2'

With GNU grep, when built with PCRE support, you can do:

grep -P '^(?=.*pattern1)(?=.*pattern2)'
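Since the question is about fitting everything into one dynamically built string, the lookahead form above could be assembled from a list of patterns roughly like this. This is only a sketch: `build_and_regex` is a hypothetical helper name, and wrapping each pattern in `(?:...)` is an extra precaution not present in the command above, so that a pattern such as `a|b` stays inside its own lookahead.

```shell
# Hypothetical helper: print one PCRE that matches lines containing all patterns.
# The ( ... ) function body runs in a subshell, so 're' and 'p' stay local.
build_and_regex() (
    re='^'
    for p in "$@"; do
        # One lookahead per pattern; (?:...) keeps alternations grouped.
        re="$re(?=.*(?:$p))"
    done
    printf '%s\n' "$re"
)

regex=$(build_and_regex 'pattern1' 'pattern2' 'pattern3')
printf '%s\n' "$regex"
# With a PCRE-enabled GNU grep, the string would then be used as:
#   grep -P -- "$regex" file
```

The patterns are pasted in verbatim, so they must already be valid PCRE fragments.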

With ast grep:

grep -X '.*pattern1.*&.*pattern2.*'

(adding .*s as <x>&<y> matches strings that match both <x> and <y> exactly, a&b would never match as there's no such string that can be both a and b at the same time).

If the patterns don't overlap, you may also be able to do:

grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'
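To see why the "don't overlap" condition matters, here is a small sketch comparing the chained greps with the ordered alternatives on a line where the two matches share a character. The sample line and patterns are made up for illustration.

```shell
line='abc'

# Chained greps: each pattern is tested independently, so overlap is fine.
chained=$(printf '%s\n' "$line" | grep 'ab' | grep 'bc' || true)

# Ordered alternatives: both require the matches to sit side by side
# without sharing characters, so 'abc' is rejected.
ordered=$(printf '%s\n' "$line" | grep -e 'ab.*bc' -e 'bc.*ab' || true)

printf 'chained: [%s]\nordered: [%s]\n' "$chained" "$ordered"
```

Here `chained` ends up as `abc` while `ordered` is empty, so the two approaches are only interchangeable when the patterns cannot overlap.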

The best portable way is probably with awk as already mentioned:

awk '/pattern1/ && /pattern2/'

Or with sed:

sed -e '/pattern1/!d' -e '/pattern2/!d'

Or perl:

perl -ne 'print if /pattern1/ && /pattern2/'

Please beware that all those will have different regular expression syntaxes.

The awk/sed/perl ones don't reflect whether any line matched the patterns in their exit status. To do that you need:

awk '/pattern1/ && /pattern2/ {print; found = 1}
     END {exit !found}'
perl -ne 'if (/pattern1/ && /pattern2/) {print; $found = 1}
          END {exit !$found}'

Or pipe the command to grep '^'.
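The `grep '^'` trick works because `^` matches every line, including empty ones, so grep passes the output through unchanged and only fails when it receives no lines at all. A small sketch with made-up input:

```shell
plain=0
piped=0

# No line contains both foo and bar, yet awk still exits with 0.
printf 'foo\n' | awk '/foo/ && /bar/' || plain=$?

# Piped to grep '^', the pipeline's status is grep's: 1 when nothing came through.
printf 'foo\n' | awk '/foo/ && /bar/' | grep '^' || piped=$?

echo "awk alone: $plain, piped to grep: $piped"
```

This relies on the shell reporting the exit status of the last command in a pipeline, which is the default behaviour.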

For potentially gzip-compressed files, you can use zgrep, which is generally a shell script wrapper around grep, and use one of the grep solutions above (not the ast-open one, as that grep implementation cannot be used by zgrep), or you could use the PerlIO::gzip module of perl, which can transparently uncompress files upon input:

perl -MPerlIO::gzip -Mopen='IN,gzip(autopop)' -ne '
  print "$ARGV:$_" if /pattern1/ && /pattern2/' -- *.gz

(which, at least if the files are small enough, is even going to be more efficient than zgrep, as the decompression is done internally without having to run gunzip for each file).

  • 4
    The agrep syntax is not working for me... which version was it introduced in? – Raman Sep 05 '16 at 22:15
  • @Raman 2.04 from 1992 already had it. I've no reason to believe it wasn't there from the start. Newer (after 1992) versions of agrep can be found included with glimpse/webglimpse. Possibly you have a different implementation. I had a mistake for the ast-grep version though, the option for augmented regexps is -X, not -A. – Stéphane Chazelas Sep 06 '16 at 05:55
  • @StéphaneChazelas Thanks, I have agrep 0.8.0 on Fedora 23. This appears to be a different agrep than the one you reference. – Raman Sep 06 '16 at 06:37
  • 2
    @Raman, yours sounds like TRE agrep. – Stéphane Chazelas Sep 06 '16 at 07:01
  • @StéphaneChazelas Indeed it is. Too bad Fedora doesn't have the agrep you are referring to, out of the box. – Raman Sep 13 '16 at 18:15
  • awk '/pattern1/ && /pattern2/' is good, but how do i calculate count for this? – Yogesh D Jun 28 '17 at 17:40
  • @Techiee, you mean the count of lines that match either pattern or the count of occurrences of each pattern? In any case, it seems it would be a different question. – Stéphane Chazelas Jun 28 '17 at 18:06
  • @StéphaneChazelas, Right now awk '/pattern1/ && /pattern2/' is printing all the lines, i want the print the count of such lines. Can you help? Thanks – Yogesh D Jun 28 '17 at 18:31
  • @StéphaneChazelas: I got it, awk '/pattern1/ && /pattern2/' filename | wc -l will give me desired output. – Yogesh D Jun 28 '17 at 18:35
  • 3
    @Techiee, or just awk '/p1/ && /p2/ {n++}; END {print 0+n}' – Stéphane Chazelas Jun 28 '17 at 20:23
  • The awk solution doesn't work for me on Cygwin. Looks like it is but checking the process shows that it's doing nothing, no CPU usage at all. – Hashim Aziz Dec 12 '19 at 20:15
  • @StéphaneChazelas I am getting grep: invalid matcher .*pattern1.*&.*pattern2.* – Chaminda Bandara Feb 17 '21 at 02:04
  • 1
    @ChamindaBandara, you ran that with GNU grep instead of ast grep. GNU grep has no support for ast augmented regexp. It does have an undocumented -X option, but that's for something unrelated, it's to specify the regexp flavour (matcher) like in grep -X perl being the same as grep -P. – Stéphane Chazelas Feb 17 '21 at 09:00
  • Honestly, I've tried at least half of these suggestions in the example and none of them work as described. – Daniel Kaplan Mar 09 '22 at 20:50
  • 2
    @DanielKaplan, from your recent question, I suspect you're looking for something different from what this Q&A is about. Here we're trying to find lines that match all patterns, while you may be trying to find files for which all patterns are matched by any line (there are several Q&As here covering that). I've edited the answer to maybe make that more obvious. – Stéphane Chazelas Mar 10 '22 at 07:32
  • 1
    Ah! OK, I see. My mistake. – Daniel Kaplan Mar 10 '22 at 07:36
  • Note that awk exits with a 0 status code even when no matches are found. You can fix this by piping to grep . – BallpointBen Jul 14 '23 at 13:34
  • @BallpointBen, see edit. – Stéphane Chazelas Jul 17 '23 at 09:25
30

You didn't specify the grep version, which is important. Some regexp engines allow multiple patterns to be combined with AND using '&', but this is a non-standard and non-portable feature. GNU grep, at least, doesn't support it.

OTOH you can simply replace grep with sed, awk, perl, etc. (listed in order of increasing weight). With awk, the command would look like

awk '/regexp1/ && /regexp2/ && /regexp3/ { print; }'

and it is easy to construct dynamically on the command line.
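As a rough sketch of that construction, the awk program can be assembled from the arguments in a loop. `and_grep` is a hypothetical name, and the sketch assumes that no pattern contains a `/`, since that would end the awk regex literal early.

```shell
# Hypothetical helper: and_grep ERE... filters stdin, keeping lines that match every ERE.
# The ( ... ) function body runs in a subshell, so 'prog' and 'p' stay local.
and_grep() (
    prog=''
    for p in "$@"; do
        # Assumption: patterns contain no '/'.
        prog="${prog:+$prog && }/$p/"
    done
    # With no patterns at all, fall back to '1', which prints every line.
    awk "${prog:-1}"
)

printf '%s\n' 'foo bar' 'foo' 'bar foo baz' | and_grep foo bar
```

With the sample input, the program passed to awk is `/foo/ && /bar/`, and the lines `foo bar` and `bar foo baz` are printed.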

Netch
  • 2,529
  • 5
    Just remember that awk uses ERE's, e.g. the equivalent of grep -E, as opposed to the BRE's that plain grep uses. – jw013 Nov 10 '12 at 09:42
  • 4
    awk's regexes are called EREs, but in fact they're a bit idiosyncratic. Here are probably more details than anyone cares for: http://wiki.alpinelinux.org/wiki/Regex – dubiousjim Nov 10 '12 at 15:35
  • Thank you, grep 2.7.3 (openSUSE). I upvoted you, but I will keep question open for a while, maybe there is some trick for grep (not that I dislike awk -- simply knowing more is better). – greenoldman Nov 10 '12 at 15:42
  • 3
    The default action is to print the matching line so the { print; } part isn't really necessary or useful here. – tripleee Apr 20 '17 at 11:58
  • This still returns a 0 status code if the match fails. – BallpointBen Jul 14 '23 at 04:48
  • @BallpointBen If you mean return code from awk, well, it requires explicit generation because awk program doesn't "know" what is positive result for the programmer. You may add a variable used as boolean and to select exit code in END block. – Netch Jul 15 '23 at 05:26
  • @Netch Actually I think the easiest way is just pipe awk to grep .. – BallpointBen Jul 15 '23 at 14:34
19

git grep

Here is the syntax using git grep combining multiple patterns using Boolean expressions:

git grep --no-index -e pattern1 --and -e pattern2 --and -e pattern3

The above command will print lines matching all the patterns at once.

--no-index Search files in the current directory that is not managed by Git.

Check man git-grep for help.

kenorb
  • 20,988
17

grep pattern1 | grep pattern2 | ...

I would like to use single grep because I am building arguments dynamically, so everything has to fit in one string

It's actually possible to build the pipeline dynamically (without resorting to eval):

# Executes: grep "$1" | grep "$2" | grep "$3" | ...
function chained-grep {
    local pattern="$1"
    if [[ -z "$pattern" ]]; then
        cat
        return
    fi
    shift
    grep -- "$pattern" | chained-grep "$@"
}

cat something | chained-grep all patterns must match order but matter dont

It's probably not a very efficient solution though.
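For shells without `local` or `[[ ]]`, the same recursive idea might be written along these lines. This is a sketch rather than a drop-in replacement: `chained_grep` (with an underscore, as plain sh function names cannot contain hyphens) is a hypothetical name, and it stops when the argument list is exhausted instead of at the first empty pattern.

```shell
# Executes: grep -e "$1" | grep -e "$2" | ... using only POSIX sh features.
chained_grep() {
    if [ "$#" -eq 0 ]; then
        cat    # no patterns left: pass the remaining lines through
    else
        # "$1" is expanded for this grep; the subshell drops it and recurses.
        grep -e "$1" | ( shift; chained_grep "$@" )
    fi
}

printf '%s\n' 'a b c' 'b a' 'c' | chained_grep a b
```

No variable needs to be kept per recursion level, which sidesteps the need for `local`.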

  • 2
    Use either chained-grep() or function chained-grep but not function chained-grep(): https://unix.stackexchange.com/questions/73750/difference-between-function-foo-and-foo – nisetama Jan 19 '19 at 17:08
  • Can you describe what the trick is? Can you add it to the answer (*without* "Edit:", "Update:", or similar ) by editing it? – Peter Mortensen Oct 30 '20 at 20:40
  • Reformulated the answer to make the trick clearer (ie.: build a shell pipeline dynamically) – olejorgenb Oct 30 '20 at 23:21
  • 1
    The important part here is that shell allows recursion which makes this possible. Note the keyword local in front of variable that must be unique for the recursion. Also note that keyword local is not POSIX so using shebang #!/bin/sh may not be safe, see details here: https://unix.stackexchange.com/a/493743/20336 – Mikko Rantalainen Jul 07 '22 at 07:15
12

If patterns contains one pattern per line, you can do something like this:

awk 'NR==FNR{a[$0];next}{for(i in a)if($0!~i)next}1' patterns -

Or this matches substrings instead of regular expressions:

awk 'NR==FNR{a[$0];next}{for(i in a)if(!index($0,i))next}1' patterns -

To print all instead of no lines of the input in the case that patterns is empty, replace NR==FNR with FILENAME==ARGV[1], or with ARGIND==1 in gawk.
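A small worked run of the regular-expression variant, using a temporary patterns file. The file contents and input lines are made up for illustration, and `mktemp` is assumed to be available.

```shell
# Create a throwaway patterns file with one ERE per line.
patterns=$(mktemp)
printf '%s\n' 'foo' 'ba[rz]' > "$patterns"

# The first file fills the array; stdin ("-") is then filtered against every entry.
result=$(printf '%s\n' 'foo bar' 'foo' 'baz foo' 'qux' |
    awk 'NR==FNR{a[$0];next}{for(i in a)if($0!~i)next}1' "$patterns" -)

rm -f "$patterns"
printf '%s\n' "$result"
```

Only `foo bar` and `baz foo` survive, since they are the only lines matching both `foo` and `ba[rz]`.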

These functions print the lines of STDIN which contain each string specified as an argument as a substring. ga stands for grep all and gai ignores case.

ga(){ awk 'FILENAME==ARGV[1]{a[$0];next}{for(i in a)if(!index($0,i))next}1' <(printf %s\\n "$@") -; }
gai(){ awk 'FILENAME==ARGV[1]{a[tolower($0)];next}{for(i in a)if(!index(tolower($0),i))next}1' <(printf %s\\n "$@") -; }
nisetama
  • 1,097
4

Here's my take, and this works for words in multiple lines:

Use find . -type f followed by as many
-exec grep -q 'first_word' {} \;
and the last keyword with
-exec grep -l 'nth_word' {} \;

-q quiet / silent
-l show files with matches

The following returns list of filenames with words 'rabbit' and 'hole' in them:
find . -type f -exec grep -q 'rabbit' {} \; -exec grep -l 'hole' {} \;
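This works because each `-exec ... \;` acts as a test that is true only when the command succeeds, so find moves on to the next `-exec` only for files where the previous grep matched. A sketch with a throwaway directory (file names and contents are made up, and `mktemp -d` is assumed to be available):

```shell
dir=$(mktemp -d)
printf 'rabbit\nhole\n' > "$dir/both.txt"   # contains both words, on different lines
printf 'rabbit\n' > "$dir/one.txt"          # contains only the first word

# grep -q filters silently; the final grep -l prints the surviving file names.
found=$(find "$dir" -type f -exec grep -q 'rabbit' {} \; -exec grep -l 'hole' {} \;)

rm -rf "$dir"
printf '%s\n' "$found"
```

Only the path of `both.txt` is printed. Note that this selects files, not lines, which is a different task from the one in the question.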

StackRover
  • 141
  • 2
2

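To search multiple files for the presence of two patterns anywhere in the same paragraph (RS="" puts awk in paragraph mode, so each block of text separated by blank lines is one record; a file without blank lines is a single record), use:

awk -v RS="" '/pattern1/&&/pattern2/{print FILENAME}' file1 ... filen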
Archemar
  • 31,554
  • Grep is all too often used where (IMO) awk would be better. I like this answer for exactly that reason, and of course awk can do further processing such as printing only fields 6 and 2 from the input. – Graham Nicholls Jun 01 '21 at 10:20
  • This doesn't actually address the OP's question, but +1 'cause I think it's very useful for other related situations & reveals the strength of awk... if you had to choose awk or grep, I think it's clear. Fortunately we don't have to make this choice :) – Seamus Feb 19 '22 at 00:54
1

ripgrep

Here is an example using rg (note that this regex only matches lines where the patterns occur in the given order):

rg -N '(?P<p1>.*pattern1.*)(?P<p2>.*pattern2.*)(?P<p3>.*pattern3.*)' file.txt

It's one of the quickest grepping tools, since it's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.

See also related feature request at GH-875.

kenorb
  • 20,988
-2

To find all of the words (or patterns), you can run grep in a for loop. The main advantage here is searching from a list of regular expressions.

A real example:

# File 'search_all_regex_and_error_if_missing.sh'

find_list="
^a+$
^b+$
^h+$
^d+$
"

for item in $find_list; do
  if grep -E "$item" file_to_search_within.txt; then
    echo "$item found in file."
  else
    echo "Error: $item not found in file. Exiting!"
    exit 1
  fi
done

Now let's run it on this file:

hhhhhhhhhh
aaaaaaa
bbbbbbbbb
ababbabaabbaaa
ccccccc
dsfsdf
bbbb
cccdd
aa
caa
$ ./search_all_regex_and_error_if_missing.sh
aaaaaaa aa
^a+$ found in file.
bbbbbbbbb bbbb
^b+$ found in file.
hhhhhhhhhh
^h+$ found in file.
Error: ^d+$ not found in file. Exiting!
Noam Manos
  • 1,031
  • 2
    Your logic is faulty -- I asked for ALL operator, your code works as OR operator, not AND. And btw. for that (OR) is much easier solution given right in the question. – greenoldman Aug 14 '18 at 22:18
  • @greenoldman The logic is simple: The for will loop on ALL of the words/patterns in the list, and if it is found in file - will print it. So just remove the else if you don't need action in case word was not found. – Noam Manos Aug 16 '18 at 15:07
  • 1
    I understand your logic as well as my question -- I was asking about AND operator, meaning the file is only a positive hit if it matches pattern A and pattern B and pattern C and... In your case, the file is a positive hit if it matches pattern A or pattern B or... Do you see the difference now? – greenoldman Aug 17 '18 at 06:19
  • @greenoldman not sure why you think this loop does not check AND condition for all patterns? So I've edited my answer with a real example: It will search in file for all regex of list, and on the first one which is missing - will exit with error. – Noam Manos Aug 19 '18 at 15:04
  • You have it right in front of your eyes, you have positive match just after first match is executed. You should have "collect" all outcomes and compute AND on them. Then you should rewrite the script to run on multiple files -- then maybe you realize that the question is already answered and your attempt does not bring anything to the table, sorry. – greenoldman Aug 20 '18 at 05:56
  • @greenoldman sorry I don't get your point. ^a+$ , ^b+$ , ^h+$ are all positive match, but ^d+$ is not a match, so the search then breaks. That's exactly the meaning of AND condition! You don't have to print anything during the loop, just after it ends, if that's what you want. – Noam Manos Aug 20 '18 at 09:51
  • Of course, I could add variable tracking whether AND condition is fulfilled, and then I would have an extra script instead of short and concise call of grep which was posted and accepted as solution six years ago. Take signal to noise into consideration and please delete your entire answer -- it does not add anything really. – greenoldman Aug 20 '18 at 16:38
  • @greenholdman, why do you think it doesn't add anything? It's a great solution to verify numerous words/regexs. Imagine grep -e 'pattern1.pattern2' -e 'pattern2.pattern1' on 10+ regex, not just two... – Noam Manos Aug 23 '18 at 07:58
  • 1
    Note that this answer is about searching for all patterns and reporting if each pattern cannot be found at least once in the file. The original question was about matching ALL the patterns against ALL the lines instead of matching files. – Mikko Rantalainen Jul 07 '22 at 07:27
  • The accepted answer starts with the phrase, "To find the lines that match each and everyone of a list of patterns... ". So presumably that's what the OP desires. Does perchance this solution return ___filewise___ matches instead of ___linewise___ matches? If so, it seems this answer could be tweaked to return the latter. – jubilatious1 Jan 13 '24 at 21:03