With grep implementations that support perl-like regular expressions (like pcregrep or GNU or ast-open grep -P), you can do it in one grep invocation with:
grep -P '^(?=.*pat1)(?!.*pat2)|^(?=.*pat2)(?!.*pat1)'
That is, find the lines that match pat1 but not pat2, or pat2 but not pat1.
(?=...) and (?!...) are respectively look ahead and negative look ahead operators. So technically, the above looks for the beginning of the subject (^) provided it's followed by .*pat1 and not followed by .*pat2, or the same with pat1 and pat2 reversed.
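As a quick check of that behaviour, here is a minimal sketch run against a made-up four-line input (the sample text and the variable names sample and xor_lookahead are only illustrative). It assumes a grep built with PCRE support, such as GNU grep with -P:

```shell
# Hypothetical sample input: one line per combination of the two patterns.
sample='pat1 only
pat2 only
pat1 and pat2
neither'

# Assumes a grep with PCRE support (for example GNU grep with -P).
xor_lookahead=$(printf '%s\n' "$sample" |
  grep -P '^(?=.*pat1)(?!.*pat2)|^(?=.*pat2)(?!.*pat1)')

printf '%s\n' "$xor_lookahead"
# prints:
# pat1 only
# pat2 only
```

The line containing both patterns and the line containing neither are both rejected, which is the exclusive-or behaviour described above.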
That regexp is suboptimal for lines that contain both patterns, since each of them is then searched for twice. You could instead use more advanced perl operators like:
grep -P '^(?=.*pat1|())(?(1)(?=.*pat2)|(?!.*pat2))'
(?(1)yespattern|nopattern) matches against yespattern if the 1st capture group (empty () above) matched, and nopattern otherwise. If that () matches, that means pat1 didn't match, so we look for pat2 (positive look ahead), and we look for not pat2 otherwise (negative look ahead).
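The conditional form can be tried on the same kind of input. This is only a sketch: the sample lines and the variable names sample and xor_conditional are made up for illustration, and it again assumes a grep with PCRE support (such as GNU grep -P):

```shell
# Hypothetical sample input: one line per combination of the two patterns.
sample='pat1 only
pat2 only
pat1 and pat2
neither'

# Group 1 (the empty "()") is only set when the pat1 look ahead fails,
# so (?(1)...) requires pat2 in that case and forbids it otherwise.
xor_conditional=$(printf '%s\n' "$sample" |
  grep -P '^(?=.*pat1|())(?(1)(?=.*pat2)|(?!.*pat2))')

printf '%s\n' "$xor_conditional"
# prints:
# pat1 only
# pat2 only
```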
With sed, you could write it:
sed -ne '/pat1/{/pat2/!p;d;}' -e '/pat2/p'
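To see how the sed version behaves, here is a small sketch on an illustrative input (the sample lines and the variable names sample and xor_sed are assumptions made for the demonstration):

```shell
# Hypothetical sample input: one line per combination of the two patterns.
sample='pat1 only
pat2 only
pat1 and pat2
neither'

# Lines matching pat1: print unless they also match pat2, then start the next cycle (d).
# Remaining lines (no pat1): print only if they match pat2.
xor_sed=$(printf '%s\n' "$sample" |
  sed -ne '/pat1/{/pat2/!p;d;}' -e '/pat2/p')

printf '%s\n' "$xor_sed"
# prints:
# pat1 only
# pat2 only
```

The d in the first expression matters: without it, a line matching both patterns would fall through to the second expression and be printed by /pat2/p.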