32

I am trying to figure out the correct syntax to list files that contain two strings, each in full, anywhere in the file (they don't have to be near each other). Any file that contains both foo and, say, the number 321 should match; neither string has to stand alone, and each may be a substring of something longer. I've tried the following without much luck:

grep 'foo\|321' *

grep 'foo|321'

5 Answers

33

GNU grep

Should be a little faster because the second grep may operate on a list of files.

grep -lZ 'foo' * | xargs -0 grep -l '321'

POSIX grep with find

find is more useful if you want to search directories recursively (in that case, drop the -mindepth and -maxdepth options).

find . -mindepth 1 -maxdepth 1 -type f -exec grep -q 'foo' {} \; -exec grep -l '321' {} +
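The GNU pipeline above can also be made recursive by adding -r to the first grep. The following is only a sketch under assumptions: it relies on the GNU extensions -Z, -0 and xargs -r, and the demo directory and file names are made up for illustration.

```shell
# Demo tree; the directory and file names are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'foo\n321\n' > "$dir/sub/both here.txt"
printf 'foo\n'      > "$dir/only-foo.txt"
printf '321\n'      > "$dir/only-321.txt"

# First grep: recursive, emits a NUL-terminated list of files containing foo.
# Second grep: keeps only those files that also contain 321.
# xargs -r skips the second grep entirely if the first one finds nothing.
matches=$(cd "$dir" && grep -rlZ 'foo' . | xargs -0 -r grep -l '321')
printf '%s\n' "$matches"

rm -rf "$dir"
```

Because the list is NUL-terminated, the file name containing a space is passed through intact.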
kojiro
    -r worked fine on the first grep to make that GNU solution recursive for me, instead of using the POSIX line with all those execs – Hashbrown Nov 03 '17 at 09:12
16

You can do this with a short script:

for FILE in *
do
  # Quote "$FILE" so names containing spaces or glob characters survive.
  grep -q foo "$FILE" && grep -q 321 "$FILE" && echo "$FILE"
done

You can also do this on one line:

for FILE in *; do grep -q foo "$FILE" && grep -q 321 "$FILE" && echo "$FILE"; done

grep returns 0 (true) if it found the string and the && separating the commands means that the second one will only run if the first one was true. The -q option makes sure that grep does not output anything.

The echo will only run if both strings were found in the same file.
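To make the exit-status logic concrete, here is a small sketch that wraps the two greps in a function. The name has_both and the demo files are hypothetical, chosen only for illustration.

```shell
# has_both FILE STR1 STR2: exit status 0 only if FILE contains both strings.
# (has_both is a hypothetical helper name, not part of grep.)
has_both() {
  # The second grep runs only if the first one succeeded.
  grep -q -- "$2" "$1" && grep -q -- "$3" "$1"
}

dir=$(mktemp -d)
printf 'foo bar\nabc 321\n' > "$dir/both.txt"
printf 'foo only\n'         > "$dir/one.txt"

result=""
for f in "$dir"/*; do
  if has_both "$f" foo 321; then
    # Collect the base names of matching files, separated by spaces.
    result="${result:+$result }${f##*/}"
  fi
done
printf '%s\n' "$result"

rm -rf "$dir"
```

Only both.txt is reported, since one.txt fails the second test.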


I thought of a different way to do it. This way will probably be more efficient if the files in question are larger than your installed RAM as it only has to grep through each file once.

 for FILE in *
 do
   # grep -E replaces the deprecated egrep; "$FILE" is quoted to survive spaces.
   test "$(grep -Eo "foo|321" "$FILE" | uniq | sort | uniq | wc -l)" -eq 2 && echo "$FILE"
 done

and the one-line version:

 for FILE in *; do test "$(grep -Eo "foo|321" "$FILE" | uniq | sort | uniq | wc -l)" -eq 2 && echo "$FILE"; done
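To see why the count is 2 only when both strings occur, here is a rough sketch of the counting step on its own. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), uses sort -u in place of uniq | sort | uniq, and count_distinct is a hypothetical name.

```shell
# count_distinct: print how many different strings out of foo and 321
# occur anywhere on standard input. (Hypothetical helper for illustration.)
count_distinct() {
  # grep -o prints every match on its own line; sort -u collapses repeats.
  # tr strips the padding that some wc implementations add.
  grep -Eo 'foo|321' | sort -u | wc -l | tr -d ' '
}

only_foo=$(printf 'foo foo\nfoo\n' | count_distinct)
both=$(printf 'foo 321 foo\n' | count_distinct)
neither=$(printf 'bar\n' | count_distinct)
printf '%s %s %s\n' "$only_foo" "$both" "$neither"   # prints: 1 2 0
```

Repeated occurrences of the same string collapse to a single line, so only input containing both strings reaches a count of 2.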
  • For your more efficient solution: what if the file has "foo foo" in it? – Jeff Ferland Mar 14 '13 at 22:59
  • 1
    That's what uniq | sort | uniq is for. "foo foo" ends up being one line but "foo 321" ends up being two lines because grep -o outputs all found strings on separate lines, even if they started on the same line. – Ladadadada Mar 14 '13 at 23:04
  • If the files become so huge that it would be OK to fork at least six times per file then it probably makes sense to use awk instead so that the files need not be searched to the end. – Hauke Laging Mar 14 '13 at 23:05
  • @Ladadadada got it. :) – Jeff Ferland Mar 14 '13 at 23:49
  • 4
    @HaukeLaging grep -q and grep -l do not search to the end of the file: they exit as soon as a match is found. It makes me wonder why the first solution isn't for FILE in *; do grep -q foo "$FILE" && grep -l 321 "$FILE"; done – kojiro Mar 15 '13 at 01:42
  • @kojiro That's neat. I didn't know about the -l option until I read @HaukeLaging's edit to his answer. – Ladadadada Mar 15 '13 at 08:04
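One of the comments above suggests awk so that a file is read only once and not necessarily to the end. A minimal sketch of that idea, assuming plain fixed strings (no backslashes) and using the hypothetical helper name both_in_file:

```shell
# both_in_file FILE STR1 STR2: exit status 0 if FILE contains both strings.
# Reads the file once and stops as soon as both have been seen.
# (both_in_file is a hypothetical name; strings are matched literally.)
both_in_file() {
  awk -v a="$2" -v b="$3" '
    index($0, a) { seen_a = 1 }
    index($0, b) { seen_b = 1 }
    seen_a && seen_b { found = 1; exit }   # stop reading early
    END { exit (found ? 0 : 1) }
  ' "$1"
}

dir=$(mktemp -d)
printf 'foo\nx\n321\nmore lines\n' > "$dir/both.txt"
printf '321\n321\n'                > "$dir/one.txt"

if both_in_file "$dir/both.txt" foo 321; then r1=yes; else r1=no; fi
if both_in_file "$dir/one.txt"  foo 321; then r2=yes; else r2=no; fi
printf '%s %s\n' "$r1" "$r2"

rm -rf "$dir"
```

This still starts one awk process per file, so it mainly helps with a few large files rather than many small ones.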
3

Strange. For me both variants work (grep (GNU grep) 2.13):

grep 'foo\|321'
grep -E 'foo|321'

Edit 1 - show files with both matches only

The for file in * answer works but can become a performance nightmare with a large number of files, because it starts at least two processes per file. This is faster (in the GNU world):

find . -type f -print0 | xargs -0 -r grep --files-with-matches --null -- string1 |
  xargs -0 -r grep --files-with-matches -- string2

string1 should be the one which results in fewer matches.

Hauke Laging
  • The asker is looking for the result to return true only if a file contains both strings rather than if it just matches at least one. – Jeff Ferland Mar 14 '13 at 22:13
  • I like that --files-with-matches option. Just read up on it in the man page and it causes grep to stop after it finds the first match, meaning it's very efficient for large files if the match happens early. It also says that the short option -l is specified by POSIX, so it can be used outside the GNU world. – Ladadadada Mar 14 '13 at 22:53
  • @Ladadadada The efficiency is no advantage over your -q, though. Mentioning GNU I wasn't thinking about --files-with-matches but about -0 / --null. What just comes to my mind: pathname expansion contains alphabetic sorting (really bad: It seems that can't even be turned off) so for huge amounts of files even your for file in * really stops being fun. – Hauke Laging Mar 14 '13 at 22:59
  • Indeed, the efficiency optimisations would be quite different for a few large files compared to lots of little ones and * just isn't going to work at all after a few thousand files. – Ladadadada Mar 14 '13 at 23:07
  • @Ladadadada It wouldn't work as part of a command line for an external command: grep foo * But for file in * is a shell-internal structure so I would assume that the command line limit is not applicable here. – Hauke Laging Mar 14 '13 at 23:12
  • How's this for inefficient, then?: find . -type f -exec sh -c 'grep -q "foo" "$1" && grep -q "bar" "$1" && echo "$1"' _ {} \; – SmallClanger Mar 14 '13 at 23:15
3

Basically, to find all files including a particular string in a directory, you can use:

grep -lir "pattern" /path/to/the/dir
  • -l: print only the names of matching files; scanning of each file stops at the first match
  • -i: to ignore case distinctions in both the pattern and the input files
  • -r: search all files under directory, recursively

To search for two patterns, try this:

grep -lr "321" $(grep -lr "foo" /path/to/the/dir)
quanta
  • Standard boring comments about $() not handling spaces well apply. In practice for one liners at the shell this command will work fine. – Att Righ Oct 24 '17 at 23:01
-1

Should be

grep -e "foo" -e "321" *

Use -e for multiple patterns

EDIT

In case you need both to match:

grep -e ".*foo.*321.*" *

If the order does not matter:

grep -e ".*foo.*321.*" -e ".*321.*foo.*" *
ghm1014
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. – mdpc Mar 14 '13 at 21:48
  • 1
    @mdpc I think it does provide an answer. What makes you think different? – Hauke Laging Mar 14 '13 at 21:52
  • 2
    @HaukeLaging Because it returns if either pattern matches. The OP is looking for a case where it only returns true if both are found in the file. – Jeff Ferland Mar 14 '13 at 22:15
  • My understanding is that this only finds files where both strings occur on the same line (I don't think . matches newlines by default) – Att Righ Oct 24 '17 at 23:01
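The point in the last comment can be checked with a small experiment. This is only a sketch: the file names are made up, and the second pipeline assumes the GNU extensions -Z, -0 and xargs -r.

```shell
dir=$(mktemp -d)
printf 'foo\n321\n' > "$dir/split.txt"   # the two strings on different lines
printf 'foo 321\n'  > "$dir/same.txt"    # the two strings on the same line

# Single-line patterns: grep matches line by line, so split.txt is missed.
same_line=$(cd "$dir" && grep -l -e 'foo.*321' -e '321.*foo' *)

# Chained greps: each string may appear on any line of the file.
any_line=$(cd "$dir" && grep -lZ 'foo' * | xargs -0 -r grep -l -- '321')

printf 'same line only:\n%s\n' "$same_line"
printf 'anywhere in file:\n%s\n' "$any_line"

rm -rf "$dir"
```

The pattern-based command reports only same.txt, while the chained greps report both files.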