13

I want to search a text file for lines containing 'word1' XOR 'word2'. That is, it should output lines that contain word1 or word2, but not lines that contain both. I wanted to use XOR, but I don't know how to write that on the Linux command line.

I tried:

grep 'word1\|word2' text.txt
grep word1 word2 text.txt
grep word1 text.txt | grep word2
grep 'word1\^word2' text.txt

and many more, but had no success.

αғsнιη
  • 41,407
Lukali
  • 243

3 Answers

17

With GNU awk:

$ printf '%s\n' {foo,bar}{bar,foo} neither | gawk 'xor(/foo/,/bar/)'
foofoo
barbar

Or portably:

awk '((/foo/) + (/bar/)) % 2'
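
To apply the portable form to the words and file from the question, the patterns can be passed in from the shell. This is only a minimal sketch: xor_lines and the XOR_P1/XOR_P2 variable names are hypothetical, and the patterns are handed to awk through the environment (ENVIRON) rather than -v so that backslashes in them are not processed as escapes:

```shell
# xor_lines PATTERN1 PATTERN2 [FILE...]
# Hypothetical helper: prints lines matching exactly one of two
# extended regular expressions. Reads stdin when no FILE is given.
xor_lines() {
    p1=$1 p2=$2
    shift 2
    # A match expression in awk evaluates to 1 or 0, so the sum
    # equals 1 only when exactly one of the patterns matches.
    XOR_P1=$p1 XOR_P2=$p2 awk \
        '(($0 ~ ENVIRON["XOR_P1"]) + ($0 ~ ENVIRON["XOR_P2"])) == 1' "$@"
}
```

It could then be called as xor_lines word1 word2 text.txt.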

With a grep that supports -P (PCRE):

grep -P '^((?=.*foo)(?!.*bar)|(?=.*bar)(?!.*foo))'

With sed:

sed '
  /foo/{
    /bar/d
    b
  }
  /bar/!d'

If you want to match whole words only (so that foobar or barbar, for instance, contains neither foo nor bar), you need to decide how those words are delimited. If a word is delimited by any character other than letters, digits, and underscore, as with the -w option of many grep implementations, then you'd change those to:

gawk 'xor(/\<foo\>/,/\<bar\>/)'
awk '((/(^|[^[:alnum:]_])foo([^[:alnum:]_]|$)/) + \
      (/(^|[^[:alnum:]_])bar([^[:alnum:]_]|$)/)) % 2'
grep -P '^((?=.*\bfoo\b)(?!.*\bbar\b)|(?=.*\bbar\b)(?!.*\bfoo\b))'

With sed, that gets more complicated unless your implementation, like GNU sed, supports \< and \> as word boundaries the way GNU awk does.
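
For the GNU sed case, a whole-word variant of the earlier sed script might look like the sketch below. It assumes GNU sed, since \< and \> are not portable, and xor_words_sed is a hypothetical wrapper name used only for illustration:

```shell
# xor_words_sed [FILE...]
# Hypothetical wrapper, GNU sed only: keeps lines that contain foo or bar
# as a whole word, but not both.
xor_words_sed() {
    sed '
      /\<foo\>/{
        /\<bar\>/d
        b
      }
      /\<bar\>/!d' "$@"
}
```

As in the original script, a line with the word foo is deleted if it also has the word bar and is otherwise printed as is; any remaining line is deleted unless it has the word bar.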

10

grep 'word1\|word2' text.txt searches for lines containing word1 or word2. This includes lines that contain both.

grep word1 text.txt | grep word2 searches for lines containing word1 and word2. The two words can overlap (e.g. foobar contains foo and ob). Another way to search for lines containing both words, but only in a non-overlapping way, is to search for them in either order: grep 'word1.*word2\|word2.*word1' text.txt

grep word1 text.txt | grep -v word2 searches for lines containing word1 but not word2. The -v option tells grep to keep non-matching lines and remove matching lines, instead of the opposite. This gives you half the results you wanted. By adding the symmetric search, you get all the lines containing exactly one of the words.

grep word1 text.txt | grep -v word2
grep word2 text.txt | grep -v word1

Alternatively, you can start from the lines containing either word, and remove the lines containing both words. Given the building blocks above, this is easy if the words don't overlap.

grep 'word1\|word2' text.txt | grep -v 'word1.*word2\|word2.*word1'
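
When the words might overlap, the pair of symmetric searches is the safer building block, because each pipeline tests the two words independently. A minimal sketch wrapping it in a function (xor_grep is a hypothetical name, not an existing command):

```shell
# xor_grep WORD1 WORD2 FILE
# Hypothetical helper: lines containing exactly one of the two patterns.
xor_grep() {
    # Lines with WORD1 but not WORD2, then lines with WORD2 but not WORD1.
    # Overlapping words (foo and ob in foobar) still count as "both present".
    grep -e "$1" -- "$3" | grep -v -e "$2"
    grep -e "$2" -- "$3" | grep -v -e "$1"
    return 0   # grep exits non-zero when nothing matches; not an error here
}
```

Note that the output is grouped by word rather than kept in the file's original order.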
  • Thank you, this is exactly what I was looking for. The other answers are also very interesting, so I'll look into them. Thank you everyone for contributing. – Lukali Feb 07 '18 at 14:56
2

A bash solution:

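#!/bin/bash
# Test each command-line argument for foo and bar
while (( $# )); do
    a=0 ; [[ $1 =~ foo ]] && a=1   # a=1 if the argument contains foo
    b=0 ; [[ $1 =~ bar ]] && b=1   # b=1 if the argument contains bar
    (( a ^ b )) && echo "$1"       # print only when exactly one matched
    shift
done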

To test it:

$ ./script {foo,bar}\ {foo,bar} neither
foo foo
bar bar
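
The script above tests its command-line arguments. To filter the lines of a file, as in the question, the same a ^ b test can be run in a read loop. This is a sketch under a few assumptions: xor_filter is a hypothetical name, the words are matched as fixed substrings, and case patterns replace [[ =~ ]] so that it also runs under a plain POSIX sh:

```shell
# xor_filter WORD1 WORD2
# Hypothetical helper: reads lines from stdin and prints those containing
# exactly one of the two words (fixed substrings, not regexps).
xor_filter() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        a=0; case $line in *"$1"*) a=1 ;; esac
        b=0; case $line in *"$2"*) b=1 ;; esac
        # a ^ b is 1 only when exactly one of the words is present
        if [ "$((a ^ b))" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done
    return 0
}
```

It could be used as xor_filter word1 word2 < text.txt. A shell loop like this is much slower than grep or awk on large files.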