
The standard usage of grep is to return lines that match a pattern.

If a line can contain several matches of the pattern, how can I count each match individually, rather than the number of matching lines?

terdon

3 Answers


The grep command has a -c option that counts the number of lines matched by a pattern. Since the standard usage of grep is to return lines that match a pattern, this solves the task "count the number of matches".

If a line can contain several matches of the pattern, you may use grep with its non-standard -o option if you want to count each match individually. This isolates each match on a line of its own. You may then count the number of matches by passing the result through wc -l. This uses wc to do the actual counting, not grep. However, you could cheat and use grep -c . in place of wc -l to count the number of non-empty lines returned from the first grep. Since that is a bit of a hack, and since wc -l does literally what we want, we'll use wc in the examples below.
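To make the difference between the two counts concrete, here is a small sketch run against a made-up sample file (the file contents and variable names are only assumptions for illustration). It also shows that the grep -c . variant gives the same number as wc -l:

```shell
# Hypothetical sample data: 3 lines, 2 of which contain "G", with 4 "G"s in total.
sample=$(mktemp)
printf '%s\n' 'GATTGCA' 'ACCTA' 'GGT' > "$sample"

lines_matched=$(grep -c -e G "$sample")              # number of lines containing G
matches_wc=$(grep -o -e G "$sample" | wc -l)         # one output line per match, counted by wc
matches_grep=$(grep -o -e G "$sample" | grep -c .)   # same count using grep -c . instead of wc

# $((...)) strips the leading padding that some wc implementations print.
echo "lines: $lines_matched, matches: $((matches_wc)), via grep -c: $matches_grep"
rm -f "$sample"
```

With this sample, the line count is 2 and both match counts are 4.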

See the manuals for grep and wc on your system.

Example: The number of lines matching the pattern G in file:

$ grep -c -e G file
7

Example: The number of matches in the same file, but counting each match individually:

$ grep -o -e G file | wc -l
      18
Kusalananda

Using awk:

$ awk '{a += gsub(/pat/,"&"); } END{print a}' file

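Or, to count the whitespace-separated fields that contain a match (rather than every individual match):

$ awk '{for(i=1;i<=NF;i++)if ($i ~ /pat/)  ++a}END{print a}' file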
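The two awk commands do not always agree. As a rough illustration (the input string is made up, and a+0 is added only so that 0 is printed when nothing matches), a field that contains the pattern twice counts twice with gsub but only once with the field loop:

```shell
# Hypothetical input: "pat" occurs 3 times, but in only 2 whitespace-separated fields.
input='patpat xyz pat'

# gsub returns the number of substitutions made, i.e. the number of matches on the line.
by_match=$(printf '%s\n' "$input" | awk '{a += gsub(/pat/,"&")} END{print a+0}')

# The field loop adds one for each field that contains at least one match.
by_field=$(printf '%s\n' "$input" | awk '{for(i=1;i<=NF;i++) if ($i ~ /pat/) ++a} END{print a+0}')

echo "matches: $by_match, fields: $by_field"
```

Here the first command reports 3 and the second reports 2.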

To count overlapping matches, use the following command, slightly adapted from this answer:

$ echo abababa | awk '{ while (a=index($0,"aba"))  {++count; $0=substr($0,a+1)}}END{print count}'
3

With perl, you could do:

perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='perl regex'

That has the advantage of also counting empty matches such as:

$ seq 10 | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='\b'
20

(20 word boundaries in the contents of the lines of the output of seq 10).

With perl regexps, you can also handle some cases of overlapping matches by using look-around operators:

$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='aba'
2
$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='(?=aba)'
3

Instead of matching occurrences of aba, the second regex matches the positions within the line where aba can be seen ahead.