Count number of occurrences of a pattern on same line

Question

I need to solve this in a shell script. I am counting the number of occurrences of the string abc below, and I want to get 3 as the answer.

echo abcsdabcsdabc | grep -o abc
abc
abc
abc

Assuming we do not have the -o option in grep, how do we approach this then?

If your input string is known to contain no newlines, you can just feed the output to wc to count the lines: echo abcsdabcsdabc | grep -o abc | wc -l – zeppelin, Mar 20 '19 at 09:30

Stéphane Chazelas · Answer 1 · 2022-05-08T09:12:34.323

With awk:

awk -- 'BEGIN{print gsub(ARGV[2], "&", ARGV[1])}' abcsdabcsdabc abc

Note that the pattern (here abc) is taken by awk as an extended regular expression (similar to those supported by grep -E/egrep).

That syntax allows both the subject and regexp to contain multiple lines. We also avoid the usual problems associated with echo which can't output arbitrary data.
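Because the pattern is an extended regular expression, a character such as . matches more than itself. As a minimal sketch (not part of the original answer), a literal string can be counted with index() and substr() instead. The function name count_fixed is hypothetical, and matches are counted without overlap.

```shell
# Hypothetical helper: count non-overlapping occurrences of the literal
# string $2 in $1, without treating $2 as a regular expression.
count_fixed() {
  awk -- 'BEGIN {
    s = ARGV[1]; t = ARGV[2]; n = 0
    if (t == "") { print 0; exit }      # an empty needle has no meaningful count
    while ((i = index(s, t)) > 0) {     # position of the next literal match
      n++
      s = substr(s, i + length(t))      # continue searching after that match
    }
    print n
  }' "$1" "$2"
}

count_fixed 'a.c abc a.c' 'a.c'   # prints 2; with gsub() the regexp a.c would also match "abc"
```

As in the gsub() version, both strings are passed as arguments rather than through echo, so they may contain arbitrary text.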

To use perl regular expressions (similar to GNU grep -P's):

perl -le 'print scalar (() = $ARGV[0] =~ m{$ARGV[1]}g)' -- abcsdabcsdabc abc

(Note, however, that the arguments are not decoded as text according to the locale's encoding. For instance, in a UTF-8 locale, with é and . as arguments, it would report 2 (bytes) instead of 1 (character).)

With zsh, you can do:

occurrences() {
  set -o localoptions -o extendedglob
  local n=0
  : ${1//(#m)$2/$((++n))}
  echo $n
}
occurrences abcsdabcsdabc abc

Here, the second argument (abc) is interpreted as a fixed string; replace $2 with $~2 for it to be interpreted as an extended zsh glob pattern instead (with a wider feature set than extended regexps, but a different syntax).

Kusalananda · Answer 2 · 2019-07-23T12:22:47.940

4

Treating the string as consisting of fields that are delimited by abc:

$ echo abcsdabcsdabc | awk -F 'abc' '{ print (length > 0 ? NF - 1 : 0) }'
3

The number of occurrences of the delimiter abc is one less than the number of fields that it delimits (NF - 1). The length test handles empty lines, which have no fields at all.
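An empty line has zero fields, so without the test on length the expression NF - 1 would give -1 for it, while a line with no match has exactly one field and gives 0. A small sketch of these edge cases, wrapped in a hypothetical function count_per_line (the name and the three sample lines are assumptions for illustration):

```shell
# Hypothetical wrapper: print, for each input line, how many times the
# separator $1 occurs. A multi-character separator is used by awk as an
# extended regular expression.
count_per_line() {
  awk -F "$1" '{ print (length > 0 ? NF - 1 : 0) }'
}

# Three input lines: three matches, an empty line, and a line with no match.
printf '%s\n' abcsdabcsdabc '' xyz | count_per_line abc
# prints 3, 0 and 0, one number per line
```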

$ echo abcsdabcsdabc | awk '{ n=0; while (sub("abc", "xxx")) n++; print n }'
3

This repeatedly replaces the first remaining occurrence of abc in the line with xxx and counts how many times that succeeds, then outputs the count. Resetting n=0 is only needed when there is more than one line of input.

The gsub() function in awk returns the number of substitutions made, so the above could be simplified into

$ echo abcsdabcsdabc | awk '{ print gsub("abc", "xxx") }'
3
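The choice of replacement text matters for the sub() loop: if each match were replaced with nothing, the leftover pieces could join up into a new match and be counted again. A minimal sketch of the difference, using the ababcc input mentioned in the comments (both function names are hypothetical):

```shell
# Loop that deletes each match: the leftovers can form a new "abc".
count_by_removal() {
  printf '%s\n' "$1" | awk '{ n = 0; while (sub("abc", "")) n++; print n }'
}

# Loop that replaces each match with a placeholder, as in the answer.
count_by_placeholder() {
  printf '%s\n' "$1" | awk '{ n = 0; while (sub("abc", "xxx")) n++; print n }'
}

count_by_removal ababcc       # prints 2: "ababcc" -> "abc" -> ""
count_by_placeholder ababcc   # prints 1: "ababcc" -> "abxxxc"
```

The placeholder only works if it cannot itself create a match together with its neighbours; xxx is safe for the pattern abc.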

In bash, you can do the same thing as in that awk program that uses sub():

string=abcsdabcsdabc

n=0
while [[ $string == *abc* ]]; do
    n=$(( n+1 ))
    string=${string/abc/xxx}  # replace first occurrence of "abc" with "xxx"
done
printf '%d\n' "$n"

This uses a while loop to replace the first occurrence of abc in the value of $string with xxx until no further occurrences of abc are found in $string, just as the second awk program above does.
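For shells that lack ${variable/pattern/word}, a similar loop can be sketched with only POSIX parameter expansion, cutting the string off after each match instead of replacing it. The function name count_occurrences is hypothetical, and its variables are global because POSIX sh has no local:

```shell
# Hypothetical helper: count non-overlapping occurrences of the literal
# string $2 in $1 using only POSIX sh features.
count_occurrences() {
  rest=$1
  n=0
  # An empty needle would match forever without shortening $rest.
  [ -n "$2" ] || { echo 0; return; }
  while :; do
    case $rest in
      *"$2"*)
        n=$((n + 1))
        rest=${rest#*"$2"}   # drop everything up to and including the first match
        ;;
      *) break ;;
    esac
  done
  printf '%d\n' "$n"
}

count_occurrences abcsdabcsdabc abc   # prints 3
count_occurrences ababcc abc          # prints 1
```

Quoting "$2" inside the patterns makes the shell treat it as a fixed string rather than as a glob.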

edited Jul 23 '19 at 12:22

answered Mar 20 '19 at 09:23


Hi, can you explain the bash piece of code? I can't see how n gets its value when we are dealing with just a single line of input. Please explain it line by line. – Machine Mar 20 '19 at 09:38

@Machine I've explained it further now. The loop is looping until no abc is found in $string. – Kusalananda Mar 20 '19 at 09:45
string=${string/abc/} -- I have never used this, so I assume it is a built-in feature of Unix. Could you elaborate on it? – Machine Mar 20 '19 at 10:34

@Machine This is a bash-specific variable substitution that replaces the first occurrence of abc in $string with nothing. The general form is ${variable/pattern/word}, which replaces the first part of $variable that matches pattern with word. Using ${variable//pattern/word} replaces all matches. This is described in the bash manual. It is a feature of the shell, not of Unix. – Kusalananda Mar 20 '19 at 10:44
Note that the first one returns -1 for empty lines. – Stéphane Chazelas Jul 23 '19 at 11:20
The second and last ones report 2 on an ababcc input. – Stéphane Chazelas Jul 23 '19 at 11:21
${variable/pattern/word} is not bash-specific. It comes from ksh93 and is also supported by mksh, zsh, yash, busybox sh at least. – Stéphane Chazelas Jul 23 '19 at 11:32
@StéphaneChazelas All your comments are correct. Now fixed. Thanks! – Kusalananda Jul 23 '19 at 11:33

jubilatious1 · Answer 3 · 2020-09-12T21:33:45.980

0

Using Raku (formerly known as Perl 6):

~$ echo "abcsdabcsdabc" | raku -ne '.match("abc", :global).say;'
(｢abc｣ ｢abc｣ ｢abc｣)

The above prints the matches for each line. The following prints the number of matches for each line:

~$ echo "abcsdabcsdabc" | raku -ne '.match("abc", :global).elems.say;'
3

Note: the :global argument can be shortened to :g.

HTH.

https://raku.org/

edited Sep 12 '20 at 21:33

answered Sep 12 '20 at 21:23


score -1 · Answer 4 · edited May 08 '22 at 09:11


Using GNU sed

echo abcsdabcsdabc | sed 's/abc/abc\n/g' | wc -w

edited May 08 '22 at 09:11 by Kusalananda

answered May 08 '22 at 08:58 by Maulik Madhavi

This gives the right answer for the example, but so does echo 3. This fails for a *lot* of cases, like “xyz” and “The quick brown abc jumps over the lazy abc.” – G-Man Says 'Reinstate Monica' May 08 '22 at 15:32
