Counting consecutive repetitions of a specific character (e.g. ,)

Question

Say I have a file with lines like the two following:

foo,bar,,baz,xy
foo,,bar,baz,xy,,

I would like to count how many times I have ,, (two consecutive commas surrounded by any other character) in each line.

My approach so far: I thought I could first get rid of everything but any pair of consecutive commas on each line, and then replace those two commas by a single comma so that I can count them later. How can I do this?

For the two example lines above, the output should be (if we replace each double comma with a single comma and throw everything else away):

,
,,

or simply:

1
2

Note that a double comma in the middle of a line denotes one empty field, but a double comma at the beginning or the end denotes two empty fields. What would you like to do at the beginning or end of a line? How many non-empty fields do you want to see on each line? — glenn jackman, Feb 13 '14 at 01:55
Thank you @glennjackman - That's a great observation. Counting empty fields is precisely my goal - This question is probably a victim of the XY Problem. — Amelio Vazquez-Reina, Feb 13 '14 at 13:27

score 5 · Accepted Answer · edited Apr 13 '17 at 12:36

A Perl one-liner for the job:

perl -nle '$n = () = /(?<!,),,(?!,)/g; print $n' your_file

Or, even shorter, with awk:

awk -F',,' '{print NF-1}' your_file

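The awk version treats a run of four commas (,,,,) as two occurrences of the double comma, while the Perl version does not count it at all, because it only matches a pair of commas that is not adjacent to another comma. Choose the one that suits your use case.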
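To see that behaviour on concrete input, here is a minimal sketch that wraps the awk counter in a shell function and feeds it a few sample lines. The function name `count_pairs` and the third sample line are illustrative assumptions, not part of the original answer. The sketch also adds a guard for empty lines, where NF is 0 and a plain NF-1 would print -1.

```shell
# count_pairs (hypothetical helper name): for each line on stdin, print how
# many non-overlapping ",," sequences it contains.
count_pairs() {
  # With FS set to ",,", a line containing k separators has k+1 fields.
  # An empty line has NF == 0, so report 0 instead of -1.
  awk -F',,' '{ print (NF > 0 ? NF - 1 : 0) }'
}

# The two lines from the question, plus a run of four commas.
printf '%s\n' 'foo,bar,,baz,xy' 'foo,,bar,baz,xy,,' 'a,,,,b' | count_pairs
```

This should print 1, 2 and 2: the run of four commas in the last line is split into two separators.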

Update

From your comment, it seems that your original intent was to count the number of empty fields on each line. If that's the case, this Perl one-liner should help (it assumes that there are no quoted fields containing commas):

perl -nle 'print scalar grep { $_ eq "" } split /,/, $_, -1' your_file

The same in awk if Perl is not available:

awk -F, '{empty=0; for(i=1;i<=NF;i++) if($i=="") empty++; print empty}' your_file
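As a quick check of the empty-field count, the sketch below runs the same kind of awk loop over the two lines from the question plus one line that starts with a double comma. The function name `count_empty`, the variable `n`, and the third sample line are assumptions made for illustration. As in the answer, it assumes no quoted fields containing commas.

```shell
# count_empty (hypothetical helper name): for each line on stdin, print the
# number of empty comma-separated fields.
count_empty() {
  awk -F, '{
    n = 0                      # reset the counter for every line
    for (i = 1; i <= NF; i++)  # visit every field, including trailing ones
      if ($i == "") n++
    print n
  }'
}

printf '%s\n' 'foo,bar,,baz,xy' 'foo,,bar,baz,xy,,' ',,a' | count_empty
```

This should print 1, 3 and 2. A double comma at the start or end of a line contributes two empty fields, which matches the observation in the comments on the question.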
