How to use grep to search for a line with one of two words but not both?

Question

I want to search a text file for lines containing 'word1' XOR 'word2': the output should include lines with word1 or word2, but not lines with both of these words. I wanted to use XOR, but I do not know how to write that on the Linux command line.

I tried:

grep 'word1\|word2' text.txt
grep word1 word2 text.txt
grep word1 text.txt | grep word2
grep 'word1\^word2' text.txt

and many more, but without success.

Stéphane Chazelas · Answer 1 · 2018-02-06T20:32:27.777

17

With GNU awk:

$ printf '%s\n' {foo,bar}{bar,foo} neither | gawk 'xor(/foo/,/bar/)'
foofoo
barbar

Or portably:

awk '((/foo/) + (/bar/)) % 2'
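
The portable form works because, in awk, a regular expression match used as a value is 1 when the line matches and 0 otherwise, so the sum is odd only when exactly one pattern matches. A minimal sketch of the same idea with the patterns passed as arguments; the wrapper name xor_awk and the variables w1 and w2 are assumptions made for illustration, not part of the answer above:

```shell
# xor_awk PATTERN1 PATTERN2  (hypothetical helper)
# Prints the lines of standard input that match exactly one of the two
# patterns. Both arguments are treated as awk regular expressions; note
# that awk -v also interprets backslash escapes in the values.
xor_awk() {
  awk -v w1="$1" -v w2="$2" '(($0 ~ w1) + ($0 ~ w2)) % 2'
}

printf '%s\n' foofoo foobar barfoo barbar neither | xor_awk foo bar
# prints:
# foofoo
# barbar
```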

With a grep with support for -P (PCRE):

grep -P '^((?=.*foo)(?!.*bar)|(?=.*bar)(?!.*foo))'

With sed:

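sed '
  # lines containing foo: delete them if they also contain bar, else print
  /foo/{
    /bar/d
    b
  }
  # lines without foo: keep only those containing bar
  /bar/!d'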

If you want to match whole words only (so that foobar or barbar, for instance, is not considered to contain foo or bar), you need to decide how those words are delimited. If the delimiter is any character other than letters, digits and underscore, as with the -w option of many grep implementations, then you'd change those to:

gawk 'xor(/\<foo\>/,/\<bar\>/)'
awk '((/(^|[^[:alnum:]_])foo([^[:alnum:]_]|$)/) + \
      (/(^|[^[:alnum:]_])bar([^[:alnum:]_]|$)/)) % 2'
grep -P '^((?=.*\bfoo\b)(?!.*\bbar\b)|(?=.*\bbar\b)(?!.*\bfoo\b))'

For sed that becomes a bit complicated unless you have a sed implementation like GNU sed that supports \</\> as word boundaries like GNU awk does.
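
For the GNU sed case, the whole-word variant might look like the sketch below. It assumes GNU sed, since \< and \> are a GNU extension and other sed implementations may reject or ignore them. The function name xor_words_sed is made up for illustration.

```shell
# xor_words_sed  (hypothetical helper, assumes GNU sed for \< and \>)
# Prints the lines of standard input that contain foo or bar as a whole
# word, but not both.
xor_words_sed() {
  sed '
    /\<foo\>/{
      /\<bar\>/d
      b
    }
    /\<bar\>/!d'
}

printf '%s\n' 'foo' 'foobar' 'a bar b' 'foo bar' | xor_words_sed
# prints:
# foo
# a bar b
```

Here foobar is dropped because it contains neither foo nor bar as a separate word, and "foo bar" is dropped because it contains both.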

edited Feb 06 '18 at 20:32

answered Feb 06 '18 at 17:48


Stephane, please write a book about shell scripting! – pfnuesel Feb 06 '18 at 18:04
Sorry, I only started using the command line a few weeks ago. How would I force it to only search for whole words? I tried -Pw and -wP but this gave me the wrong output. I also tried to use ' ' between word1/word2 and around word1/word2. – Lukali Feb 06 '18 at 18:18
@Lukali, see edit. – Stéphane Chazelas Feb 06 '18 at 20:33

score 10 · Accepted Answer · answered Feb 06 '18 at 21:23

grep 'word1\|word2' text.txt searches for lines containing word1 or word2. This includes lines that contain both.

grep word1 text.txt | grep word2 searches for lines containing word1 and word2. The two words can overlap (e.g. foobar contains foo and ob). Another way to search for lines containing both words, but only in a non-overlapping way, is to search for them in either order: grep 'word1.*word2\|word2.*word1' text.txt

grep word1 text.txt | grep -v word2 searches for lines containing word1 but not word2. The -v option tells grep to keep non-matching lines and remove matching lines, instead of the opposite. This gives you half the results you wanted. By adding the symmetric search, you get all the lines containing exactly one of the words.

grep word1 text.txt | grep -v word2
grep word2 text.txt | grep -v word1
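
Because the two pipelines run one after the other, the output is grouped by word (all word1-only lines first, then all word2-only lines) rather than kept in file order. A small sketch that wraps the pair in a function; the name exactly_one and its argument order are assumptions made for illustration:

```shell
# exactly_one WORD1 WORD2 FILE  (hypothetical helper)
# Prints the lines of FILE containing WORD1 but not WORD2, followed by
# the lines containing WORD2 but not WORD1. Both words are used as grep
# basic regular expressions, as in the commands above.
exactly_one() {
  grep -e "$1" -- "$3" | grep -v -e "$2"
  grep -e "$2" -- "$3" | grep -v -e "$1"
}
```

It would be called as exactly_one word1 word2 text.txt. The file is read twice, which is usually fine for ordinary text files but means this form cannot be fed from a pipe.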

Alternatively, you can start from the lines containing either word, and remove the lines containing both words. Given the building blocks above, this is easy if the words don't overlap.

grep 'word1\|word2' text.txt | grep -v 'word1.*word2\|word2.*word1'
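
To illustrate why the non-overlap condition matters, here is a sketch using the overlapping pair foo and ob from earlier in this answer. The function name either_minus_both is hypothetical, and it uses repeated -e options, which grep treats as alternatives, in place of \|.

```shell
# either_minus_both WORD1 WORD2  (hypothetical helper)
# Keeps stdin lines containing either word, then removes lines where the
# two words appear one after the other without overlapping.
either_minus_both() {
  grep -e "$1" -e "$2" | grep -v -e "$1.*$2" -e "$2.*$1"
}

# "foob" contains both foo and ob, but only in an overlapping way, so
# neither foo.*ob nor ob.*foo matches and the line is NOT removed.
printf '%s\n' foob | either_minus_both foo ob
# prints:
# foob
```

The two-pipeline form (grep word1 | grep -v word2, plus the symmetric search) does not have this problem, because each grep tests for its word independently of the other.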

Thank you, this is exactly what I was looking for. The other answers are also very interesting, so I'll look into them. Thank you everyone for contributing. – Lukali Feb 07 '18 at 14:56

score 2 · Answer 3 · answered Feb 06 '18 at 22:08

A bash solution:

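#!/bin/bash
# Each command-line argument is treated as one line of input.
while (( $# )); do
    a=0 ; [[ $1 =~ foo ]] && a=1    # a=1 if the argument contains foo
    b=0 ; [[ $1 =~ bar ]] && b=1    # b=1 if the argument contains bar
    (( a ^ b )) && echo "$1"        # print when exactly one of them matched
    shift
done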

To test it:

$ ./script {foo,bar}\ {foo,bar} neither
foo foo
bar bar
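
The script above tests its command-line arguments rather than the lines of a file. A possible variant that reads lines from standard input is sketched below. It is an adaptation, not the answer's own code: the function name xor_stdin is made up, and it uses case patterns instead of [[ =~ ]] so that it should also run under a plain POSIX sh.

```shell
# xor_stdin  (hypothetical helper)
# Prints each line of standard input that contains foo or bar, but not both.
xor_stdin() {
  # The "|| [ -n ... ]" part handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    a=0; case $line in *foo*) a=1 ;; esac   # a=1 if the line contains foo
    b=0; case $line in *bar*) b=1 ;; esac   # b=1 if the line contains bar
    if [ "$((a ^ b))" -eq 1 ]; then         # bitwise XOR of the two flags
      printf '%s\n' "$line"
    fi
  done
}

printf '%s\n' 'foo foo' 'foo bar' 'bar foo' 'bar bar' neither | xor_stdin
# prints:
# foo foo
# bar bar
```

To search a file, redirect it into the function: xor_stdin < text.txt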
