group and count by a regex

Question

I have dozens of values in a file, such as:

(1608926678.237962) vcan0 123#0000000158
(1608926678.251533) vcan0 456#0000000186

I want to count how many of each there are, based on the number before the hash symbol (the hash itself can be included too).

I have tried the following but keep getting zero:

 grep -o '\b\d+#\b' ./file.log | wc -l

Any ideas? For the above example I would want:

123# 1
456# 1

Neither \d nor the + quantifier is supported in grep's default BRE syntax - see for example Why does my regular expression work in X but not in Y? – steeldriver, Dec 21 '20 at 19:40
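To make the comment concrete, here is a minimal sketch of the same extraction written in both regex dialects. The function names `extract_ids_bre` and `extract_ids_ere` are hypothetical. In POSIX BRE, digits are written `[0-9]` and "one or more" as the interval `\{1,\}`; with `-E` (ERE), `+` works as a quantifier. Note that `-o` is a GNU/BSD extension rather than POSIX, and that piping the matches to `wc -l` would only give the total number of matches, not a count per value.

```shell
#!/bin/sh
# Hypothetical helpers: both print every "digits#" token found on stdin,
# one per line. They differ only in regex dialect.

# POSIX BRE: no \d and no +, so use [0-9] and the interval \{1,\}
extract_ids_bre() {
    grep -o '[0-9]\{1,\}#'
}

# POSIX ERE (-E): + is a quantifier here
extract_ids_ere() {
    grep -Eo '[0-9]+#'
}

sample='(1608926678.237962) vcan0 123#0000000158
(1608926678.251533) vcan0 456#0000000186'

printf '%s\n' "$sample" | extract_ids_bre   # prints 123# and 456#
printf '%s\n' "$sample" | extract_ids_ere   # same two lines
```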

jesse_b · Answer 1 · score 4 · 2020-12-21T19:36:27.527

It's not exactly the output you described, but it can be massaged into that format if that is a hard requirement:

awk -F'[ #]' '{print $3}' input | sort -n | uniq -c

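The awk command extracts the number before the # (with both space and # acting as field separators, it is the third field) and passes it to sort and uniq. The sort is needed because uniq only collapses adjacent duplicate lines; uniq -c then prints a count for each value.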
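To see why the number ends up in `$3`, this sketch (the helper name `show_fields` is hypothetical) prints every field awk produces for one sample line when the field separator is the bracket expression `[ #]`:

```shell
#!/bin/sh
# Hypothetical helper: print each field awk sees, numbered, when FS is
# [ #], i.e. each single space or "#" separates two fields.
show_fields() {
    awk -F'[ #]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

printf '%s\n' '(1608926678.237962) vcan0 123#0000000158' | show_fields
# 1: (1608926678.237962)
# 2: vcan0
# 3: 123
# 4: 0000000158
```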

To get your output format:

awk -F'[ #]' '{print $3}' input | sort -n | uniq -c | awk '{print $2"#",$1}'
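A quick demonstration of why `sort` comes before `uniq -c`: on unsorted input, `uniq` counts the same value once per run of adjacent lines. The IDs below are made up for illustration.

```shell
#!/bin/sh
# Made-up IDs in which 123 is not adjacent to its duplicate.
ids='123
456
123'

echo 'without sort:'
printf '%s\n' "$ids" | uniq -c          # 3 output lines: 123 appears twice, each with count 1

echo 'with sort:'
printf '%s\n' "$ids" | sort | uniq -c   # 2 output lines: 123 with count 2, 456 with count 1
```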

edited Dec 21 '20 at 19:36

answered Dec 21 '20 at 19:31

score 4 · Answer 2 · answered Dec 21 '20 at 19:33

grep + Bash:

$ grep -Eo '\b[0-9]+#\b' ./file.log  | sort | uniq -c  | while read -r a b; do echo "$b" "$a"; done
123# 1
456# 1
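As suggested in the comments below, the `while read` loop can be replaced by a single awk call that swaps the two columns. A sketch, wrapped in a hypothetical `count_ids` function that reads the named files, or stdin if none are given; `\b` is a GNU/BSD grep extension, as in the original command.

```shell
#!/bin/sh
# Hypothetical wrapper: extract "digits#" tokens, count them, then let
# awk print "value count" instead of uniq's "count value".
count_ids() {
    grep -Eo '\b[0-9]+#\b' "$@" | sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' \
    '(1608926678.237962) vcan0 123#0000000158' \
    '(1608926678.251533) vcan0 456#0000000186' |
    count_ids
# 123# 1
# 456# 1
```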

Arkadiusz Drabczyk

That while loop is just awk '{print $2, $1}, and I’m sure there are options with other tools. Why write a loop you don’t need? – D. Ben Knoble Dec 22 '20 at 14:24
First, you forgot ' and second - why not? There are many ways to do what OP requested. This solution uses Bash, other solutions use awk which was added to the list of tags after OP asked the question - see https://unix.stackexchange.com/posts/625570/revisions – Arkadiusz Drabczyk Dec 22 '20 at 14:27
Well, if we’re nit-picking to that level, your answer isn’t grep + bash either, since you use sort and uniq. I am in favor of not using a while-read loop in bash where possible—they tend to be slower than the equivalent approach using a dedicated tool. And since you already used a few other tools, as mentioned, there’s no harm in throwing another (awk) into the mix for the field re-writing. – D. Ben Knoble Dec 22 '20 at 14:29
Yes, I'm the one who's nit-picking :) Have a nice day. – Arkadiusz Drabczyk Dec 22 '20 at 14:32

The while loop is much slower than an awk equivalent, but more importantly, I don't understand why you would want it. What does it offer that uniq -c doesn't do already? If you just want to change 1 123# to 123# 1, then using a shell loop is probably the most inefficient and slow way of doing it, so it seems like an odd choice. – terdon Dec 22 '20 at 15:23
As for "why not?" - see why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Ed Morton Dec 22 '20 at 15:25

αғsнιη · Answer 3 · 2020-12-21T20:37:32.013

With GNU awk:

awk -v FPAT=' [0-9]+#' '{ c[$1]++; }; END{ for(x in c) print x, c[x]; }' infile
 123# 1
 456# 1

This assumes there is always exactly one match of " [0-9]+#" per line, as in your sample input.

To strip the leading whitespace from the result, and to handle input with varying amounts of whitespace such as:

(1608926678.237962) vcan0        123#0000000158
(1608926678.251533) vcan0 456#0000000186
(1608926678.237962) vcan0    123#0000000158
(1608926678.251533) vcan0 456#0000000186
(1608926678.237962) vcan0      123#0000000158
(1608926678.251533) vcan0                       456#0000000186
(1608926678.237962) vcan0 123#0000000158

awk -v FPAT='[ \t][0-9]+#' '{
    filter=$1; sub(/[ \t]/, "", filter);
    c[filter]++;
};
END{ for(x in c) print x, c[x]; }' infile
456# 3
123# 4

For input with more than one match of " [0-9]+#" on some or all lines, loop over every field:

awk -v FPAT='[ \t][0-9]+#' '{
    for (i=1; i<=NF; i++){ 
        filter=$i; sub(/[ \t]/, "", filter); c[filter]++;
    };
};
END{ for(x in c) print x, c[x]; }' infile
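`FPAT` is specific to GNU awk. For other awks, a rough equivalent of the multi-match version (a sketch, not the answer's own code) loops over `match()`, using `RSTART` and `RLENGTH` to cut out each blank-plus-`digits#` token and drop its leading blank. The function name `count_tokens` is hypothetical, and because `for (x in c)` visits keys in an unspecified order, the output is piped through `sort`.

```shell
#!/bin/sh
# Hypothetical POSIX-awk take on the FPAT idea: find each blank (space
# or tab) followed by digits and "#", count the token without the blank,
# then keep scanning the rest of the line after the match.
count_tokens() {
    awk '{
        s = $0
        while (match(s, /[ \t][0-9]+#/)) {
            c[substr(s, RSTART + 1, RLENGTH - 1)]++
            s = substr(s, RSTART + RLENGTH)
        }
    }
    END { for (x in c) print x, c[x] }' "$@" | sort
}

printf '%s\n' \
    '(1608926678.237962) vcan0 123#0000000158 456#0000000186' \
    '(1608926678.251533) vcan0    123#0000000158' |
    count_tokens
# 123# 2
# 456# 1
```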

score 2 · Answer 4 · answered Dec 21 '20 at 20:57

With any awk in any shell on every Unix box:

$ awk -F'[ #]' '{cnt[$3]++} END{for (val in cnt) print val"#", cnt[val]}' file
123# 1
456# 1
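One caveat: with `-F'[ #]'`, every single space is a separator, so a run of spaces (like the input shown in the previous answer) creates empty fields and the number is no longer in `$3`. A sketch that keeps awk's default whitespace splitting and splits only the third field on `#`, assuming the ID is always the third blank-separated field; `count_ids` is a hypothetical name, and the output is sorted because `for (val in cnt)` has no defined order.

```shell
#!/bin/sh
# Default FS treats any run of blanks as one separator, so $3 is the
# "ID#data" token however many spaces precede it (assumption: the ID is
# always the third field). split() then isolates the part before "#".
count_ids() {
    awk '{ split($3, part, "#"); cnt[part[1]]++ }
         END { for (val in cnt) print val "#", cnt[val] }' "$@" | sort
}

printf '%s\n' \
    '(1608926678.237962) vcan0        123#0000000158' \
    '(1608926678.251533) vcan0 456#0000000186' \
    '(1608926678.237962) vcan0    123#0000000158' |
    count_ids
# 123# 2
# 456# 1
```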

Ed Morton

score 0 · Answer 5 · answered Dec 22 '20 at 18:38

awk '{for(i=1;i<=NF;i++){if($i ~ /#/){print $i}}}' filename | awk -F "#" '{cnt[$1]++} END{for (v in cnt) print v"#", cnt[v]}'

output

123# 1
456# 1

Praveen Kumar BS
