The standard usage of grep is to return lines that match a pattern.
If a line can contain several matches of the pattern, how can I count each match individually, rather than only the number of matching lines?
The grep command has a -c option that counts the number of lines matched by a pattern. Since the standard usage of grep is to return lines that match a pattern, this solves the task "count the number of matches".
If a line can contain several matches of the pattern, you may use grep with its non-standard -o option if you want to count each match individually. This isolates each match on a line of its own.
You may then count the number of matches by passing the result through wc -l. This uses wc to do the actual counting, not grep. However, you could cheat and use grep -c . in place of wc -l to count the number of non-empty lines returned from the first grep. Since that is a bit of a hack, and since wc -l does literally what we want, we'll use wc in the examples below.
See the manuals for grep and wc on your system.
Example: The number of lines matching the pattern G in file:
$ grep -c -e G file
7
Example: The number of matches in the same file, but counting each match individually:
$ grep -o -e G file | wc -l
18
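To make the difference easy to reproduce, here is a minimal sketch that builds a small sample file and runs both commands on it. The file contents and the variable names are invented for illustration; this is not the file used in the examples above.

```shell
# Hypothetical sample data, not the file from the examples above.
tmpfile=$(mktemp)
printf '%s\n' 'GGATC' 'ATTC' 'GAGG' > "$tmpfile"

# Number of lines containing at least one G.
line_count=$(grep -c -e G "$tmpfile")

# Number of individual G matches: -o prints each match on its own line,
# and wc -l counts those lines. The arithmetic expansion strips the
# leading padding that some wc implementations add.
match_count=$(( $(grep -o -e G "$tmpfile" | wc -l) ))

echo "lines: $line_count, matches: $match_count"
rm -f "$tmpfile"
```

For this sample, two lines contain a G and there are five G characters in total, so the two counts differ.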
grep -o only prints the non-empty matches. For instance, seq 10 | grep -c '^' prints 10, but seq 10 | grep -o '^' | wc -l prints 0.
– Stéphane Chazelas
Oct 06 '23 at 21:17
seq 10 | grep -c '7*' prints 10, but seq 10 | grep -o '7*' | wc -l prints 1.
– G-Man Says 'Reinstate Monica'
Oct 14 '23 at 06:28
Using awk:
$ awk '{a += gsub(/pat/,"&"); } END{print a}' file
Or, to count the whitespace-separated fields that contain a match (at most one per field):
$ awk '{for(i=1;i<=NF;i++)if ($i ~ /pat/) ++a}END{print a}' file
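The two awk commands above do not count the same thing, which a small sketch may make clearer. The sample input and variable names are made up for illustration, and a+0 is used instead of a so that 0 is printed when nothing matches at all.

```shell
# Hypothetical input: two lines, four occurrences of G spread over three fields.
sample='GG G
xGx'

# gsub() returns the number of substitutions it made, i.e. the number of
# matches on the current line; replacing with "&" leaves the line unchanged.
per_match=$(printf '%s\n' "$sample" | awk '{a += gsub(/G/,"&")} END{print a+0}')

# The field loop counts fields that contain a match, at most one per field.
per_field=$(printf '%s\n' "$sample" | awk '{for(i=1;i<=NF;i++) if ($i ~ /G/) ++a} END{print a+0}')

echo "matches: $per_match, fields: $per_field"
```

Here the gsub variant reports 4 and the field loop reports 3, because the field GG contains two matches but counts as one field.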
For overlapping matches, the command below is a slightly modified version of one taken from this answer:
$ echo abababa | awk '{ while (a=index($0,"aba")) {++count; $0=substr($0,a+1)}}END{print count}'
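As a rough comparison, the sketch below runs the gsub approach and the index loop on the same input. The variable names are illustrative only. Note that index() searches for a fixed string, not a regular expression, so this loop only suits literal patterns.

```shell
# gsub() scans left to right and never reuses characters,
# so abababa is seen as aba, b, aba.
non_overlapping=$(echo abababa | awk '{n += gsub(/aba/,"&")} END{print n+0}')

# The index() loop restarts one character after the start of each hit,
# so matches that share characters are counted as well.
overlapping=$(echo abababa |
  awk '{ while (a = index($0, "aba")) { ++count; $0 = substr($0, a + 1) } } END { print count + 0 }')

echo "non-overlapping: $non_overlapping, overlapping: $overlapping"
```

On this input the first count is 2 and the second is 3.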
With perl, you could do:
perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='perl regex'
That has the advantage of also counting empty matches such as:
$ seq 10 | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='\b'
20
(20 word boundaries in the contents of the lines of the output of seq 10).
With perl regexps, you can also handle some cases of overlapping matches by using look-around operators:
$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='aba'
2
$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='(?=aba)'
3
Instead of matching occurrences of aba, the second regex matches the positions within the line where aba can be seen ahead.