grep'ping files for multiple strings (not necessarily on the same line)

Question

More often than I like to admit, I look for a file that contains several strings.

Currently I do:

grep -rl string1 | xargs grep -l string2 | xargs grep -l string3
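By default xargs splits its input on whitespace and quotes, so the pipeline above breaks on file names that contain spaces or newlines. Below is a NUL-delimited sketch of the same three-stage filter. It assumes GNU grep (where -Z means "end file names with a NUL byte"; BSD grep uses -Z for something else) and GNU xargs. The name grep_chain3 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files under the current directory that contain
# all three of its arguments. Assumes GNU grep and GNU xargs.
# grep -Z ends each file name with NUL and xargs -0 splits on NUL, so names
# with spaces or newlines pass through intact; xargs -r skips running grep
# when the previous stage printed no file names at all.
grep_chain3() {
    grep -rlZ -e "$1" . |
        xargs -0 -r grep -lZ -e "$2" |
        xargs -0 -r grep -l -e "$3"
}
```

Usage mirrors the original pipeline: `grep_chain3 string1 string2 string3`. Only the last stage drops -Z, so the final list is newline-separated for display.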

Is there a tool that does this more elegantly?

This lists files containing string1 or string2 or string3:

grep -rl -e string1 -e string2 -e string3

I want files that contain string1 and string2 and string3 but not necessarily on the same line.

Maybe one of the modern greps (ag/ack/rg/sift) can do this?

αғsнιη · Answer 1 · 2020-06-09T15:13:04.013

You could use grep in this way:

grep -rzlP '(?s)(?=.*?string1)(?=.*?string2)(?=.*?string3)' .

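-z: read the input as NUL-separated records, so a text file (which contains no NUL bytes) is treated as one single record.
(?s): known as "dot-all" mode; it allows the dot . to match newline (\n) characters as well.
(?=.*?pattern): a positive lookahead that matches any characters (.) zero or more times (*), non-greedily (?), followed by the pattern (string1, string2, ...). Lookaheads consume no input, so all of them are tested from the same position and the strings may occur in any order.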
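To try the regex without touching any files, here is a minimal sketch that feeds text on standard input. It assumes GNU grep built with PCRE support (-P), and has_all is a hypothetical helper whose arguments are treated as PCRE patterns.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every argument (a PCRE pattern)
# occurs somewhere in standard input. Assumes GNU grep with -P support.
# With -z the whole input is one record, unless it contains NUL bytes,
# in which case each NUL-separated chunk is checked on its own.
# The subshell body keeps re and p from leaking into the caller.
has_all() (
    re='(?s)'
    for p in "$@"; do
        re="$re(?=.*?$p)"
    done
    grep -qzP -- "$re"
)

# Out of order, on different lines, with a blank line in between.
printf 'string3\n\nstring1\nfoo string2\n' | has_all string1 string2 string3 && echo 'all found'
```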

You can wrap it in a function as follows (bash and zsh):

mgrep() { eval grep -rzlP $(printf ''\''(?s)';
          printf '(?=.*?'\''"$%d"'\'')' $(eval echo {1..$#}); printf ''\''') . ; }

Then call it as below; it searches the current working directory recursively for files that contain all the patterns.

mgrep string1 string2 string3

It also handles any kind of pattern that grep -P itself supports (adjust the grep options inside the function to suit your needs beforehand).

mgrep string 'pattern with space' '\d+' [0-9]  [...]
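If you want the arguments taken as literal strings rather than regexes (so that a . or + is not special), one option is PCRE's \Q...\E quoting. This is a hypothetical variant named mgrep_fixed, sketched under the same GNU grep -P assumption; an argument that itself contains \E would end the quoting early and still be read as a pattern.

```shell
#!/bin/sh
# Hypothetical variant of mgrep: every argument is matched literally by
# wrapping it in PCRE \Q...\E quoting. Assumes GNU grep with -P.
mgrep_fixed() (
    re='(?s)'
    for s in "$@"; do
        # Inside double quotes \\ yields one backslash, so this appends
        # (?=.*?\Q<string>\E) to the regex.
        re="$re(?=.*?\\Q$s\\E)"
    done
    # Same options as mgrep: recursive, whole file as one record, list names.
    grep -rzlP -- "$re" .
)
```

For example, `mgrep_fixed 'a.b' '1+1' 'C++'` only lists files containing those exact character sequences.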

@Fólkvangr See my solution based on αғsнιη's. It is basically just dequoting the solution here. — Ole Tange, Sep 02 '18 at 11:09

score 4 · Stéphane Chazelas · Answer 2 · 2018-09-01T18:10:26.633

With agrep (the original approximate grep, not the one from tre), you can do

agrep -ld '$x' 'pattern1;pattern2;pattern3'

Here we use a regexp that can never match ($x, i.e. something after the end of a line) as the record delimiter.

(Use find or zsh recursive globs to search all files in a directory recursively.)

Though note the patterns are matched against the whole content of the files, not each line of each file.

You can also script it in gawk:

PATTERNS='pattern1;pattern2;pattern3' gawk -e '
  BEGIN{n = split(ENVIRON["PATTERNS"], a, ";")}
  BEGINFILE{for (i in a) p[a[i]]; found = 0}
  {
    for (i in p)
      if ($0 ~ i) {
        if (++found == n) {print FILENAME; nextfile}
        delete p[i]
      }
  }' -E /dev/null file1 file2...

(though it's pretty slow).
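For a single file, the same per-line idea can be sketched in plain POSIX awk, without the gawk-only BEGINFILE. This is an illustrative rewrite, not the answerer's code: awk_has_all is a hypothetical name, the patterns are EREs passed as arguments after the file name, and awk stops reading as soon as every pattern has matched some line. No claim is made about it being faster than the gawk version.

```shell
#!/bin/sh
# Hypothetical helper. Usage: awk_has_all FILE ERE...
# Exits 0 if every ERE matches at least one line of FILE, 1 otherwise.
awk_has_all() {
    awk '
        BEGIN {
            # ARGV[1] is the file; move the patterns out of ARGV so that
            # awk does not try to open them as input files.
            k = ARGC - 2
            for (i = 2; i < ARGC; i++) {
                pat[i - 1] = ARGV[i]
                ARGV[i] = ""
            }
        }
        {
            for (j = 1; j <= k; j++)
                if (!(j in seen) && $0 ~ pat[j]) {
                    seen[j] = 1
                    if (++found == k) exit   # all matched: stop reading
                }
        }
        END { exit (found < k) }
    ' "$@"
}
```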

edited Sep 01 '18 at 18:10

answered Sep 01 '18 at 07:37

Stéphane Chazelas


This seems to work only if the whole file is a single paragraph (i.e. has no empty lines). Thus it is not equivalent to the original solution. – Ole Tange Sep 01 '18 at 14:12
@OleTange, Oh yes, you're right. I had assumed that ^$ would have worked like for gawk's RS="^$", but no. See edit: -d '$x' should work. – Stéphane Chazelas Sep 01 '18 at 18:12

score 1 · Answer 3 · answered Sep 02 '18 at 08:30


Based on αғsнιη's answer:

mgrep() {
    grep -rzlP "(?s)$(printf "(?=.*?%s)" "$@")" .
}

mgrep string1 string2 string3
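The function works because printf reuses its format string for each remaining argument, so one `(?=.*?%s)` format yields one lookahead per pattern. A small sketch that prints the assembled regex (build_re is a hypothetical name used only for illustration):

```shell
#!/bin/sh
# Prints the regex that the mgrep above hands to grep -P.
# printf repeats the format once per argument; with no arguments it still
# runs once with an empty %s, giving (?=.*?), which matches anything.
build_re() {
    printf '(?s)'
    printf '(?=.*?%s)' "$@"
}

build_re string1 string2 string3; echo
# prints (?s)(?=.*?string1)(?=.*?string2)(?=.*?string3)
```

Since each pattern is passed as an argument rather than spliced into the format, a `%` inside a pattern is copied through unchanged.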


Ole Tange


This is a terrific answer. No need to even add the function to bashrc, just copy and paste it when needed. Then one can use it and refine the terms as necessary. – dotancohen Oct 21 '21 at 06:58

score 0 · Answer 4 · 2018-09-02T10:00:08.537


The following approach is simple, though it could probably be made more efficient and robust.

#!/bin/bash

tab=(one three five)

# grep_all's return status indicates if all patterns have at least
# one matching result in the text file specified as argument.

grep_all()
{
    local -n patterns=$1      # nameref to the caller's array (bash 4.3+)
    local file=$2 pattern     # keep the loop variable local too

    # abort if a pattern is not found
    for pattern in "${patterns[@]}"; do
        if ! grep -q -e "$pattern" "$file"; then
            return 1
        fi
    done
    return 0
}

grep_all tab file.txt
echo $?
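To apply the same per-file, pattern-by-pattern check to a whole directory tree, POSIX find can chain -exec tests: each -exec succeeds when grep exits 0, and find stops evaluating at the first failing test, so -print is reached only for files containing every pattern. This sketch hard-codes three patterns to mirror the question, and find_all is a hypothetical name. It starts one grep per file per pattern, so it is not fast on large trees.

```shell
#!/bin/sh
# Hypothetical helper. Usage: find_all DIR PATTERN1 PATTERN2 PATTERN3
# Prints regular files under DIR that match all three basic regexps.
find_all() {
    find "$1" -type f \
        -exec grep -q -e "$2" {} \; \
        -exec grep -q -e "$3" {} \; \
        -exec grep -q -e "$4" {} \; \
        -print
}
```

With the patterns from the script above: `find_all . one three five`.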

edited Sep 02 '18 at 10:00

answered Sep 01 '18 at 07:25


It would (possibly) be handier (or at least more portable) if the grep_all function took the pathname as the first argument and then a list of patterns. You would then shift the pathname off first, and then loop over $@. – Kusalananda Sep 01 '18 at 07:42

The grep in his solution is definitely not POSIX grep though. – Kusalananda Sep 02 '18 at 10:37
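The signature suggested in the first comment above (pathname first, then the patterns) could look like this as a plain POSIX sh sketch. It is a hypothetical rewrite rather than the answerer's code; it needs neither namerefs (local -n) nor arrays.

```shell
#!/bin/sh
# Usage: grep_all FILE PATTERN...
# Exit status 0 if every pattern matches somewhere in FILE, 1 otherwise.
# The subshell body keeps file and pattern from leaking into the caller.
grep_all() (
    file=$1
    shift
    for pattern in "$@"; do
        # Stop at the first pattern that has no match in the file.
        grep -q -e "$pattern" -- "$file" || exit 1
    done
)
```

Called as `grep_all file.txt one three five; echo $?`.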
