I want to count the lines that contain both of the words/patterns "gene" and "+". Is it possible to do this with grep?
1 Answer
Yes, you can do this with grep:
grep -c 'gene.*+' file
That will look for lines where gene appears first and then, on the same line, a + appears somewhere after it. The pattern is a basic regular expression, in which an unescaped + is a literal plus sign. The -c flag tells grep to print the number of matching lines. If you also need to find cases where the + comes before gene, you can use an extended regular expression, in which the + must be escaped:
grep -Ec '(gene.*\+)|(\+.*gene)' file
This, however, will also match things like "Eugene+Mary came for dinner", which is probably not what you want. Given the words you are looking for, I am guessing that you are looking at GFF/GTF files, so you might want to do something more sophisticated and only look for gene in the third field of each line and + in the seventh, on lines that don't start with a # (the GFF headers). If this is indeed what you need, you can do:
awk -F"\t" '!/^#/ && $3=="gene" && $7=="+"{c++}END{print c+0}' file
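To see how the field-based count differs from the pattern-based one, here is a small sketch. The sample file is made up for illustration (the `demo` source column, coordinates and IDs are not from any real annotation), and it assumes tab-separated, GFF-like lines with nine columns.

```shell
# Build a tiny, made-up GFF-like file (tab-separated, 9 columns).
tmp=$(mktemp)
printf '##gff-version 3\n' > "$tmp"
printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1\n' >> "$tmp"
printf 'chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n' >> "$tmp"
printf 'chr1\tdemo\tgene\t2000\t2500\t.\t-\t.\tID=gene2\n' >> "$tmp"
printf 'chr2\tdemo\tgene\t50\t400\t.\t+\t.\tID=gene3\n' >> "$tmp"

# Loose count: "gene" and a literal "+" anywhere on the line, in either order.
loose_count=$(grep -Ec '(gene.*\+)|(\+.*gene)' "$tmp")

# Strict count: field 3 is exactly "gene" and field 7 is "+", headers skipped.
strict_count=$(awk -F'\t' '!/^#/ && $3=="gene" && $7=="+" {c++} END {print c+0}' "$tmp")

echo "grep: $loose_count, awk: $strict_count"
rm -f "$tmp"
```

On this sample the grep count is 3 and the awk count is 2: the mRNA line is on the + strand and mentions gene1 in its attributes, so the pattern match counts it even though it is not a gene feature.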

For the Eugene case and grep, we can use word boundary markers:
grep -Ec '(\<gene\>.*\+)|(\+.*\<gene\>)' file
– glenn jackman Oct 15 '20 at 17:55
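A quick way to check the effect of the word boundary markers is to run both patterns on a few made-up lines. This sketch assumes GNU grep, where `\<` and `\>` are supported; the sample sentences are only for illustration.

```shell
# Sample lines: only the second one has "gene" as a separate word together with a "+".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Eugene+Mary came for dinner
the gene is on the + strand
+ strand, two genes
no match here
EOF

# Without word boundaries: "Eugene" and "genes" also count as matches.
plain=$(grep -Ec '(gene.*\+)|(\+.*gene)' "$tmp")

# With \< and \>: "gene" must start and end at a word boundary.
bounded=$(grep -Ec '(\<gene\>.*\+)|(\+.*\<gene\>)' "$tmp")

echo "plain: $plain, word-bounded: $bounded"
rm -f "$tmp"
```

For these four lines the plain pattern counts 3 and the word-bounded pattern counts 1.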
Does gene always occur before + on the lines that you are interested in? Would the basic regular expression gene.*+ be enough? Do you need to filter out lines that contain words like genes or thegene (i.e. where gene is just a substring and not its own word)? Can you show some example data? – Kusalananda Oct 15 '20 at 14:41
grep gene | grep + . That is a kind of AND operator. You also need to consider all the questions Kusalananda is asking. – nobody Oct 15 '20 at 15:20
wc should be used at the end to count the lines: grep gene | grep + | wc -l. – nobody Oct 15 '20 at 15:28
grep -c + to count matching lines – steeldriver Oct 15 '20 at 15:36
wc, you can use grep -c. – terdon Oct 15 '20 at 15:48