
grep -c is useful for finding how many times a string occurs in a file, but it only counts each occurrence once per line. How can I count multiple occurrences per line?

I'm looking for something more elegant than:

perl -e '$_ = <>; print scalar ( () = m/needle/g ), "\n"'
030

7 Answers


grep's -o option outputs only the matched parts of each line, one match per output line, so wc -l can count them:

grep -o 'needle' file | wc -l

This will also match 'needles' or 'multineedle'.

To match only whole words, use one of the following commands:

grep -ow 'needle' file | wc -l
grep -o '\bneedle\b' file | wc -l
grep -o '\<needle\>' file | wc -l
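To see the difference between grep -c and grep -o | wc -l, here is a small sketch run against a made-up three-line sample (the file contents and the variable names are illustrative assumptions; -o and -w are assumed to be supported, as they are in GNU grep):

```shell
# Hypothetical sample: two matches on line 1, two substring matches on line 2.
demo=$(mktemp)
printf 'needle needle haystack\nneedles multineedle\nno match here\n' > "$demo"

lines=$(grep -c 'needle' "$demo")            # lines containing at least one match
all=$(grep -o 'needle' "$demo" | wc -l)      # every match, repeats on a line included
words=$(grep -ow 'needle' "$demo" | wc -l)   # whole-word matches only

echo "lines=$lines all=$all words=$words"
rm -f "$demo"
```

With GNU grep and coreutils this should report lines=2 all=4 words=2: grep -c sees only two matching lines, grep -o finds four needles, and -w drops the two that are part of needles and multineedle.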
wag
  • Note that this requires GNU grep (Linux, Cygwin, FreeBSD, OSX). – Gilles 'SO- stop being evil' May 15 '11 at 14:37
  • @wag What magic do \b and \B do here? – Geek Jun 12 '14 at 08:36
  • @Geek \b matches a word boundary, \B matches NOT a word boundary. The answer above would be more correct if it used \b at both ends. – Liam Sep 25 '15 at 21:02
  • For a count of occurrences per line, combine with grep's -n option and uniq -c: grep -no '\<needle\>' file | uniq -c – jameswarren Oct 07 '16 at 13:56
  • @jameswarren uniq only removes adjacent identical lines; you need to sort before feeding to uniq if you are not already sure that duplicates will always be immediately adjacent. – tripleee Nov 03 '16 at 12:21
  • How can I find the occurrences of multiple words separately? – ZhaoGang Sep 26 '18 at 03:11
  • Doesn't seem to work on WSL; it reports a smaller number of occurrences on large files. grep 'needle' file -c works in my case – quent Sep 13 '21 at 07:40
  • @tripleee For efficiency, use sort -u rather than sort | uniq. Here, sort is not necessary since matches from the same line in the source will be consecutive lines in the output. – Jivan Pal May 17 '22 at 16:44
  • @JivanPal This was in the context of uniq -c, which sort cannot do. Of course, if you know identical lines will always be adjacent, you don't need sort at all, which they will be if your pattern is just a static string, but not in the general case. – tripleee May 17 '22 at 17:06
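The per-line variant from the comments can be tried the same way. This sketch assumes GNU grep and a made-up sample file; because the pattern is a static string, matches from the same input line produce identical, adjacent output lines, so uniq -c works here without sort:

```shell
demo=$(mktemp)
printf 'needle needle haystack\nno match here\nneedle\n' > "$demo"

# -n prefixes each match with its line number; uniq -c then collapses the
# adjacent identical "N:needle" lines and prefixes each with its count.
per_line=$(grep -no 'needle' "$demo" | uniq -c)
echo "$per_line"
rm -f "$demo"
```

Each output line is a count followed by line-number:match, so line 1 should show a count of 2 and line 3 a count of 1; line 2 has no match and does not appear.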

If you have GNU grep (always on Linux and Cygwin, occasionally elsewhere), you can count the output lines from grep -o: grep -o needle | wc -l.

With Perl, here are a few ways I find more elegant than yours (even after it's fixed).

perl -lne 'END {print $c} map ++$c, /needle/g'
perl -lne 'END {print $c} $c += s/needle//g'
perl -lne 'END {print $c} ++$c while /needle/g'

With only POSIX tools, one approach, if possible, is to split the input into lines that each contain at most one match before passing it to grep. For example, if you're looking for whole words, first turn every non-word character into a newline.

# equivalent to grep -ow 'needle' | wc -l
tr -c '[:alnum:]' '[\n*]' | grep -c '^needle$'
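Here is a sketch comparing the POSIX-only pipeline with grep -ow on a made-up sample (the file contents are assumptions; grep -o is used only for the comparison and needs GNU grep or another grep that has it):

```shell
demo=$(mktemp)
printf 'needle,needle;needles\nmultineedle needle\n' > "$demo"

# tr -c replaces every character that is NOT alphanumeric (including the
# newline) with a newline, so each word ends up on a line of its own.
posix=$(tr -c '[:alnum:]' '[\n*]' < "$demo" | grep -c '^needle$')
gnu=$(grep -ow 'needle' "$demo" | wc -l)
echo "posix=$posix gnu=$gnu"
rm -f "$demo"
```

Both should report 3 here. The two only agree when the text has no underscores, since grep -w treats _ as a word character while [:alnum:] does not.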

Otherwise, there's no standard command to do this particular bit of text processing, so you need to turn to sed (if you're a masochist) or awk.

awk '{while (match($0, /needle/)) {++c; $0=substr($0, RSTART+RLENGTH)}}
     END {print c+0}'
sed -n -e 's/needle/\n&\n/g' -e 's/^/\n/' -e 's/$/\n/' \
       -e 's/\n[^\n]*\n/\n/g' -e 's/^\n//' -e 's/\n$//' \
       -e '/./p' | wc -l
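Another awk approach, not from the answer above: POSIX gsub() returns the number of substitutions it made, so replacing each match with itself (&) counts matches without changing the line. A sketch on a made-up sample, checked against the match() loop:

```shell
demo=$(mktemp)
printf 'needle needle haystack\nneedleneedle\nnothing here\n' > "$demo"

# gsub() returns how many replacements it made; "&" stands for the matched
# text, so every line is left as it was.
via_gsub=$(awk '{ c += gsub(/needle/, "&") } END { print c + 0 }' "$demo")
# The match() loop from the answer, for comparison.
via_match=$(awk '{while (match($0, /needle/)) {++c; $0=substr($0, RSTART+RLENGTH)}}
     END {print c+0}' "$demo")
echo "$via_gsub $via_match"
rm -f "$demo"
```

Both should print 4: two matches on line 1 and two back-to-back matches in needleneedle.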

Here's a simpler solution using sed and grep, which works for strings or even by-the-book regular expressions but fails in a few corner cases with anchored patterns (e.g. it finds two occurrences of ^needle or \bneedle in needleneedle).

sed 's/needle/\n&\n/g' | grep -cx 'needle'
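A sketch of both the normal case and the anchored-pattern corner case the answer mentions; it assumes GNU sed, which reads \n in the replacement as a newline:

```shell
# Two separate occurrences on one line: counted correctly.
plain=$(printf 'a needle b needle\n' | sed 's/needle/\n&\n/g' | grep -cx 'needle')
# Corner case: ^needle occurs once in "needleneedle", but after sed puts the
# first match on its own line, the leftover "needle" also matches -x.
anchored=$(printf 'needleneedle\n' | sed 's/^needle/\n&\n/g' | grep -cx '^needle')
echo "plain=$plain anchored=$anchored"
```

Both print 2, which is right for the first input and one too many for the second.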

Note that in the sed substitutions above, I used \n to mean a newline. This is standard in the pattern part, but in the replacement text, for portability, substitute backslash-newline for \n.
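For example, here is the sed | grep one-liner written with backslash-newline instead of \n in the replacement (a sketch; the sample input is made up):

```shell
# A backslash followed by a real newline in the replacement inserts a newline;
# this form does not rely on sed understanding \n there.
count=$(printf 'a needle b needle\n' | sed 's/needle/\
&\
/g' | grep -cx 'needle')
echo "$count"
```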


If, like me, you actually wanted "both; each exactly once" (which is really "either; twice"), then it's simple:

grep -E "thing1|thing2" -c

and check that the output is 2.

The benefit of this approach (if exactly once is what you want) is that it scales easily.
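A sketch of that check, on made-up input. Because grep -c counts matching lines, this only confirms that exactly two lines mention thing1 or thing2, not which word each line contains:

```shell
demo=$(mktemp)
printf 'thing1 is here\nunrelated line\nthing2 is there\n' > "$demo"

# -E enables the alternation; -c prints the number of matching lines.
n=$(grep -E -c 'thing1|thing2' "$demo")
if [ "$n" -eq 2 ]; then
  echo "exactly two matching lines"
fi
rm -f "$demo"
```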

OJFord
  • I'm not sure you're actually checking it's only appearing once? All you're looking for there is that either one of those words exist at least once. – Steve Gore Jul 11 '18 at 02:29
  • This should be the accepted answer. No need to use wc -l, grep has a built-in option to count things, and it is even named as obvious as -c for “count”! – rugk Aug 06 '20 at 20:03
  • @rugk You completely missed the first sentence in OP's post, which explicitly explains that -c only counts one occurrence per line. If a string occurs 1000 times on the same line, grep -c will still only count it as one. This answer makes no sense at all for this question. – Alexia Luna Aug 06 '21 at 21:52
  • The whole point of the question is exactly that the -c option does not work. – Hi-Angel Nov 04 '23 at 13:58

Another solution using awk and needle as field separator:

awk -F'^needle | needle | needle$' '{c+=NF-1}END{print c}'

If you want to match needle followed by punctuation, change the field separator accordingly, e.g.:

awk -F'^needle[ ,.?]|[ ,.?]needle[ ,.?]|[ ,.?]needle$' '{c+=NF-1}END{print c}'

Or use the class [^[:alnum:]] to cover all non-alphanumeric characters.
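A sketch of the punctuation-aware variant using [^[:alnum:]], on made-up input; it assumes an awk that supports POSIX character classes in -F (gawk and current mawk do). Each separator match splits off one more field, so NF-1 is the number of matches on the line:

```shell
demo=$(mktemp)
printf 'one needle, two needle. three\nneedles are not counted\n' > "$demo"

# needle must be surrounded by non-alphanumeric characters or the start/end
# of the line; c+0 prints 0 rather than an empty line when nothing matches.
n=$(awk -F'^needle[^[:alnum:]]|[^[:alnum:]]needle[^[:alnum:]]|[^[:alnum:]]needle$' \
    '{c+=NF-1} END{print c+0}' "$demo")
echo "$n"
rm -f "$demo"
```

This should print 2. Separator matches cannot overlap, so two needles separated by a single character may be counted as one; test on your own data before relying on it.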

ripat

I had a need to do this but for more than one search term. And I wanted them to be listed in columns with the number of occurrences of each.

My shell one-liner solution is as follows:

grep -o -E 'borp|flarb' flarb.log | sort | uniq -c

Output:

 910 borp
9090 flarb
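A sketch of the same pipeline on a made-up log, with an extra sort -rn (not in the answer) to list the most frequent term first:

```shell
demo=$(mktemp)
printf 'borp flarb flarb\nflarb borp\n' > "$demo"

# -o prints every match of either alternative on its own line; sort groups
# identical matches so uniq -c can count them; sort -rn orders by count.
counts=$(grep -o -E 'borp|flarb' "$demo" | sort | uniq -c | sort -rn)
echo "$counts"
rm -f "$demo"
```

Here flarb (3) should be listed before borp (2).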
JDS

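This is my bash solution:

#!/bin/bash
# For every distinct word in /tmp/a, count all of its occurrences
# (grep -o prints one line per match, so repeats on a line are counted).
B=$(for i in $(tr -s '[:space:]' '\n' < /tmp/a | sort -u); do
  echo "$(grep -owF -- "$i" /tmp/a | wc -l) $i"
done)

echo "$B" | sort -rn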
Felipe

Your example only prints out the number of occurrences per line, and not the total in the file. If that's what you want, something like this might work:

perl -nle '$c+=scalar(()=m/needle/g);END{print $c}' 
jsbillings
  • You are right -- my example only counts the occurrences in the first line. –  Feb 06 '11 at 15:49