How do I grep for lines containing either of two words, but not both?

Question

I'm trying to use grep to show only the lines that contain one of two words, when only one of them appears in the line, but not the lines that contain both.

So far I've tried grep pattern1 | grep pattern2 | ... but didn't get the result I expected.

(1) You talk about “words” and “patterns”. Which is it? Ordinary words like “quick”, “brown” and “fox”, or regular expressions like [a-z][a-z0-9]\{,7\}\(\.[a-z0-9]\{,3\}\)+? (2) What if one of the words / patterns appears more than once in a line (and the other one doesn’t appear)? Is that equivalent to the word appearing once, or does it count as multiple occurrences? — G-Man Says 'Reinstate Monica', Jan 31 '19 at 04:05

Chris · Answer 1 · 2019-01-30T16:29:11.797

63

A tool other than grep is the way to go.

Using perl, for instance, the command would be:

perl -ne 'print if /pattern1/ xor /pattern2/'

perl -ne runs the given code for each line of input (stdin here, or any files named as arguments). In this case it prints the line if it matches /pattern1/ xor /pattern2/, in other words if it matches one pattern but not the other (exclusive or).

This works whichever order the two patterns appear in, should perform better than piping through several grep invocations, and is less typing as well.

Or, even shorter, with awk:

awk 'xor(/pattern1/,/pattern2/)'

or, for versions of awk that don't have xor() (it is a GNU awk extension):

awk '/pattern1/+/pattern2/==1'
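If the patterns come from shell variables, one sketch for any POSIX awk is to pass them with -v and test them as dynamic regexes: each ($0 ~ re) comparison yields 1 or 0, so the same sum-equals-one test applies. The variable names p1 and p2 and the sample lines are arbitrary, and note that awk processes backslash escapes in -v values, so backslashes in a pattern need doubling:

```shell
# Print the lines matching exactly one of two regexes given with -v.
# Prints "only foo" and "only bar".
printf '%s\n' 'neither' 'only foo' 'only bar' 'foo and bar' |
  awk -v p1='foo' -v p2='bar' '($0 ~ p1) + ($0 ~ p2) == 1'
```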

edited Jan 30 '19 at 16:29

answered Jan 30 '19 at 12:03

Chris


Just curious, how could those methods be modified to be word-sensitive? The OP uses the phrase "two words". – Jim L. Jan 30 '19 at 22:35

@JimL. You could put word boundaries (\b) in the patterns themselves, i.e. \bword\b. – wjandrea Jan 30 '19 at 22:48
awk '/pattern1/!=/pattern2/' – Quasímodo Feb 03 '21 at 15:58

Haxiel · Answer 2 · 2019-01-30T13:47:41.617

33

With GNU grep, you could pass both words to grep and then remove the lines containing both patterns.

$ cat testfile.txt
abc
def
abc def
abc 123 def
1234
5678
1234 def abc
def abc

$ grep -w -e 'abc' -e 'def' testfile.txt | grep -v -e 'abc.*def' -e 'def.*abc'
abc
def
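The same pipeline can be wrapped in a small function so each word is written only once. This is a sketch with a made-up name (xor_words), assuming the words contain no regular-expression metacharacters, since they are inserted into grep patterns; any further arguments are passed to the first grep as file names:

```shell
# xor_words (hypothetical name): lines containing exactly one of two words.
# The first grep keeps lines with either word; the second drops lines in
# which both appear, in either order.
xor_words() {
  w1=$1 w2=$2
  shift 2
  grep -w -e "$w1" -e "$w2" "$@" | grep -v -e "$w1.*$w2" -e "$w2.*$w1"
}

# Same data as testfile.txt above; prints "abc" and "def".
printf '%s\n' abc def 'abc def' 'abc 123 def' 1234 5678 '1234 def abc' 'def abc' |
  xor_words abc def
```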

edited Jan 30 '19 at 13:47

answered Jan 30 '19 at 11:46

Haxiel


That is cool. I don't know whether it handles the "both" case, since I did not test that, but I did use your answer to grep for either of two words: grep -e foo -e goo. Thanks. – netskink Aug 15 '22 at 14:48

Siva · Answer 3 · 2019-01-30T14:56:04.607

17

Try with egrep

egrep  'pattern1|pattern2' file | grep -v -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'

edited Jan 30 '19 at 14:56

answered Jan 30 '19 at 11:45

Siva


can also be written as grep -e foo -e bar | grep -v -e 'foo.*bar' -e 'bar.*foo' – glenn jackman Jan 30 '19 at 13:40

Also, note from the grep man page: Direct invocation as either egrep or fgrep is deprecated -- prefer grep -E – glenn jackman Jan 30 '19 at 13:41
I was referring to the text "Direct invocation as either egrep or fgrep is deprecated" (or similar) in the manual page, @terdon. It is not in AIX or the BSDs, but it is in HP-UX and Linux. – Grump Feb 01 '19 at 23:12

Stéphane Chazelas · Answer 4 · 2019-02-01T14:08:05.623

With grep implementations that support perl-like regular expressions (like pcregrep or GNU or ast-open grep -P), you can do it in one grep invocation with:

grep -P '^(?=.*pat1)(?!.*pat2)|^(?=.*pat2)(?!.*pat1)'

That is find the lines that match pat1 but not pat2, or pat2 but not pat1.

(?=...) and (?!...) are respectively look ahead and negative look ahead operators. So technically, the above looks for the beginning of the subject (^) provided it's followed by .*pat1 and not followed by .*pat2, or the same with pat1 and pat2 reversed.

That's suboptimal for lines that contain both patterns as they would then be looked for twice. You could instead use more advanced perl operators like:

grep -P '^(?=.*pat1|())(?(1)(?=.*pat2)|(?!.*pat2))'

(?(1)yespattern|nopattern) matches against yespattern if the 1st capture group (empty () above) matched, and nopattern otherwise. If that () matches, that means pat1 didn't match, so we look for pat2 (positive look ahead), and we look for not pat2 otherwise (negative look ahead).

With sed, you could write it:

sed -ne '/pat1/{/pat2/!p;d;}' -e '/pat2/p'
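A quick check of the sed version with made-up input. Lines matching pat1 enter the braces, are printed only if they lack pat2, and are then deleted with d, so the second -e script only ever sees lines without pat1:

```shell
# Prints "pat1" and "pat2".
printf '%s\n' pat1 pat2 'pat1 pat2' 'pat2 pat1' other |
  sed -ne '/pat1/{/pat2/!p;d;}' -e '/pat2/p'
```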

Jim L. · Answer 5 · 2019-01-30T23:42:47.367

In Boolean terms, you're looking for A xor B, which can be written as (A and not B) or (B and not A).

Given that your question doesn't say the order of the output matters, so long as the matching lines are shown, the Boolean expansion of A xor B is pretty darn simple in grep:

$ cat << EOF > foo
> a b
> a
> b
> c a
> c b
> b a
> b c
> EOF
$ grep -w 'a' foo | grep -vw 'b'; grep -w 'b' foo | grep -vw 'a';
a
c a
b
c b
b c
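If the original line order does matter, one sketch (the function name xor_in_order is hypothetical) is to number the lines with grep -n, run the same two passes, sort on the line number and cut it off again. This assumes neither word can match the "N:" prefix that grep -n adds, which holds for the letters used here:

```shell
# xor_in_order WORD1 WORD2 FILE (hypothetical helper): the two grep passes
# from above, with line numbers kept so the output can be put back in order.
xor_in_order() {
  {
    grep -nw "$1" "$3" | grep -vw "$2"
    grep -nw "$2" "$3" | grep -vw "$1"
  } | sort -t: -k1,1n | cut -d: -f2-
}

# Same file as above; prints a, b, c a, c b, b c in input order.
printf '%s\n' 'a b' a b 'c a' 'c b' 'b a' 'b c' > foo
xor_in_order a b foo
```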

Zhro · Answer 6 · 2019-01-31T07:13:37.010

-2

For the following example:

# Patterns:
#    apple
#    pear

# Example line
line="a_apple_apple_pear_a"

This can be done purely with grep -E, sort, and wc.

# Grep for regex pattern, sort as unique, and count the number of lines
result=$(grep -oE 'apple|pear' <<< "$line" | sort -u | wc -l)

If grep is built with Perl regular expression support (-P), then you can match only the last occurrence of each word instead of needing to pipe to sort -u:

# Grep for regex pattern and count the number of lines
result=$(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l)

Output the result:

# Only one of the words exists if the result is < 2
((result > 0)) &&
   if (($result < 2)); then
      echo Only one word matched
   else
      echo Both words matched
   fi

A one-liner:

(($(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l) == 1)) && echo Only one word matched
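For comparison, the same single-line check without bash-only syntax (no <<< or (( ))), written as a function that succeeds only when exactly one distinct word matched. The function name only_one_word is made up, grep -o is a GNU/BSD extension rather than POSIX, and like the original it examines one line at a time:

```shell
# only_one_word LINE (hypothetical helper): extract every apple/pear match,
# keep the distinct ones with sort -u, and succeed only if exactly one remains.
only_one_word() {
  count=$(printf '%s\n' "$1" | grep -oE 'apple|pear' | sort -u | wc -l)
  [ "$count" -eq 1 ]
}

only_one_word 'a_apple_apple_a' && echo 'Only one word matched'
only_one_word 'a_apple_apple_pear_a' || echo 'Both words (or neither) matched'
```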

If you don't want to hard-code the pattern, a function can assemble it from a variable list of words.
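One possible sketch of such a function (build_pattern is a hypothetical name): it turns each word into word(?!.*word) and joins the alternatives with |, assuming the words contain no regular-expression metacharacters:

```shell
# build_pattern WORD... (hypothetical helper): prints
# '(w1(?!.*w1)|w2(?!.*w2)|...)' for use with grep -P.
build_pattern() {
  pattern=
  for word in "$@"; do
    # A single-quoted printf format keeps "!" away from bash history expansion.
    alt=$(printf '%s(?!.*%s)' "$word" "$word")
    pattern="$pattern${pattern:+|}$alt"
  done
  printf '(%s)\n' "$pattern"
}

build_pattern apple pear
```

The result can then replace the hard-coded pattern, e.g. grep -oP "$(build_pattern apple pear)" <<< "$line" | wc -l.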

This can also be done natively in Bash as a function without pipes or additional processes but would be more involved and is probably outside the scope of your question.

edited Jan 31 '19 at 07:13

answered Jan 31 '19 at 04:16

Zhro


(1) I was wondering when somebody was going to give an answer using Perl regular expressions. If you focused on that part of your post, and explained how it worked, this could be a good answer. (2) But I’m afraid the rest isn’t so good. The question says “show only *lines* containing either of the two words” (emphasis added). If the output is supposed to be *lines,* then it stands to reason that the input must also be multiple lines. But your approach works *only* when looking at only a single line. … (Cont’d) – G-Man Says 'Reinstate Monica' Feb 02 '19 at 23:35
(Cont’d) … For example, if the input contains the lines Big apple\n and pear-shaped\n, then the output should contain both of those lines. Your solution would get a count of 2; the long version would report “Both words matched” (which is an answer to the wrong question) and the short version would say nothing at all. (3) A suggestion: using -o here is a really bad idea, because it hides the lines that contain the matches, so you can’t see when both words appear on the same line. … (Cont’d) – G-Man Says 'Reinstate Monica' Feb 02 '19 at 23:35
(Cont’d) … (4) Bottom line: your use of uniq / sort -u and the fancy Perl regular expression to match only the last occurrence on each line don’t really add up to a useful answer to this question. But, even if they did, it would still be a bad answer because you don’t explain how they contribute to answering the question. (See Stéphane Chazelas’s answer for an example of a good explanation.) – G-Man Says 'Reinstate Monica' Feb 02 '19 at 23:35
The OP says that they wanted to "show only lines containing either of the two words" which means that each line has to be evaluated on its own. I don't see why you feel that this doesn't answer the question. Please provide an example input that you feel would fail. – Zhro Feb 02 '19 at 23:46
Oh, is *that* what you meant? “Read the input a line at a time and execute these two or three commands *for every line*”? (1) It’s painfully unclear that that’s what you meant. (2) It’s painfully inefficient. Four answers before yours showed how to handle the *entire* file in a few commands (one, two or four), and you want to run 3 × n commands for n lines of input? Even if it works, it earns a down vote for unnecessarily expensive execution. (3) At the risk of splitting hairs, it still doesn’t do the job of *showing* the appropriate lines. – G-Man Says 'Reinstate Monica' Feb 03 '19 at 01:09
That's purely subjective micro-optimization. I don't know how large the file is, and it may be small enough that it doesn't even matter. If some action needs to be performed on each of the resulting lines, then the output would be looped over anyway. There is nothing wrong with this answer other than it somehow displeasing you. – Zhro Feb 03 '19 at 01:40
Why is using a shell loop to process text considered bad practice? – G-Man Says 'Reinstate Monica' Feb 03 '19 at 03:27
