Counting matches with grep, when multiple matches per line are possible

Question

The standard usage of grep is to return lines that match a pattern.

If a line can contain several matches of the pattern, how can I count each match individually, rather than the number of matching lines?

Does this answer your question? Count total number of occurrences using grep — G-Man Says 'Reinstate Monica', Oct 12 '23 at 08:23
Also similar: Counting occurrences of [a] word in [a] text file. — G-Man Says 'Reinstate Monica', Oct 12 '23 at 08:23
Note that despite OP's confusing edits, this question is asking the same thing: "what will isolate each match on a line of its own" is exactly what is done in "grep's -o will only output the matches" and is what's in the accepted answer as well. — muru, Nov 06 '23 at 08:03
And now OP has edited the question yet again to asking something else altogether. Please don't drastically change questions that have been answered multiple times. — muru, Nov 07 '23 at 09:16
Please don't change your question after receiving answers. You can clarify, but not completely change so that the answers are no longer relevant. I have rolled back your edits to the last version that seems to match the answers you have been given. If you have more questions, please ask them separately, as new questions. — terdon, Nov 07 '23 at 09:28

Kusalananda · Accepted Answer · 2023-10-06T18:20:03.580

4

The grep command has a -c option that counts the number of lines matched by a pattern. Since the standard usage of grep is to return lines that match a pattern, this solves the task "count the number of matches".

If a line can contain several matches of the pattern, you may use grep with its non-standard -o option if you want to count each match individually. This isolates each match on a line of its own. You may then count the number of matches by passing the result through wc -l. This uses wc to do the actual counting, not grep. However, you could cheat and use grep -c . in place of wc -l to count the number of non-empty lines returned from the first grep. Since that is a bit of a hack, and since wc -l does literally what we want, we'll use wc in the examples below.

See the manuals for grep and wc on your system.

Example: The number of lines matching the pattern G in file:

$ grep -c -e G file
7

Example: The number of matches in the same file, but counting each match individually:

$ grep -o -e G file | wc -l
      18
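To make the difference between the two counts reproducible, here is a minimal sketch that uses a made-up three-line sample in place of file (the sample data and the variable names are assumptions, not from the question). It also pipes the wc output through tr, since some wc implementations pad the number with spaces, as in the output above.

```shell
# Hypothetical sample input: three lines holding 2, 0 and 3 occurrences of "G".
sample=$(printf '%s\n' 'GAG' 'ACT' 'GGG')

# Number of lines that contain at least one match.
lines=$(printf '%s\n' "$sample" | grep -c -e G)

# Number of individual matches: -o puts each match on its own line,
# wc -l counts those lines, and tr strips any padding wc may add.
matches=$(printf '%s\n' "$sample" | grep -o -e G | wc -l | tr -d ' ')

echo "matching lines: $lines"
echo "matches: $matches"
```

With this sample, the first count is 2 and the second is 5.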

edited Oct 06 '23 at 18:20

answered Oct 06 '23 at 18:13


1

Beware grep -o only prints the non-empty matches. For instance seq 10 | grep -c '^' prints 10 but seq 10 | grep -o '^' | wc -l prints 0. – Stéphane Chazelas Oct 06 '23 at 21:17
2

There's also the usual question of whether overlapping matches should be counted (like if there are one or two occurrences of 99 in 999 or of aba in ababa). – Stéphane Chazelas Oct 06 '23 at 21:23
1

A perhaps less trivial example: seq 10 | grep -c '7*' prints 10, but seq 10 | grep -o '7*' | wc -l prints 1. – G-Man Says 'Reinstate Monica' Oct 14 '23 at 06:28
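The overlapping-match caveat from the comments above can be checked the same way. This is only a small sketch (the variable names are made up for illustration); it suggests that grep -o reports non-overlapping matches, so each of the two inputs mentioned yields a single match.

```shell
# grep -o scans left to right and resumes after the end of each match,
# so overlapping occurrences are not reported separately.
nines=$(echo 999 | grep -o -e 99 | wc -l | tr -d ' ')
abas=$(echo ababa | grep -o -e aba | wc -l | tr -d ' ')

echo "99 in 999: $nines"
echo "aba in ababa: $abas"
```

Both counts come out as 1 here, not 2.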

Prabhjot Singh · Answer 2 · 2023-11-02T15:06:56.850

1

Using awk:

$ awk '{a += gsub(/pat/,"&"); } END{print a}' file

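Or, to count the whitespace-separated fields that contain a match (each field counts once, however many matches it holds):

$ awk '{for(i=1;i<=NF;i++)if ($i ~ /pat/)  ++a}END{print a}' file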

For overlapping matches, here is a slightly modified version of the command from this answer:

$ echo abababa | awk '{ while (a=index($0,"aba"))  {++count; $0=substr($0,a+1)}}END{print count}'
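For comparison, here is a rough sketch that runs the gsub approach and the index loop on the same input. The variable names are made up, and the loop works on a copy of the line rather than on $0 itself, which is a small change from the command above; printing n + 0 gives 0 rather than an empty line when nothing matches.

```shell
# Non-overlapping count: gsub returns the number of substitutions it made.
nonoverlap=$(echo abababa |
  awk '{ n += gsub(/aba/, "&") } END { print n + 0 }')

# Overlapping count: after each hit, resume the search one character
# past the start of that hit instead of past its end.
overlap=$(echo abababa |
  awk '{ s = $0
         while (p = index(s, "aba")) { ++n; s = substr(s, p + 1) } }
       END { print n + 0 }')

echo "non-overlapping: $nonoverlap"
echo "overlapping: $overlap"
```

On abababa, the first count is 2 and the second is 3.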

edited Nov 02 '23 at 15:06

answered Oct 23 '23 at 11:38


score 0 · Answer 3 · answered Oct 12 '23 at 08:18

With perl, you could do:

perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='perl regex'

That has the advantage of also counting empty matches such as:

$ seq 10 | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='\b'
20

(20 word boundaries in the contents of the lines of the output of seq 10).

With perl regexps, you can also handle some cases of overlapping matches by using look-around operators:

$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='aba'
2

$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='(?=aba)'
3

The second command, instead of matching occurrences of aba, matches the positions within the line from which aba can be seen ahead.
