Best way to group & count lines on stdin?

Question

Since time immemorial I have used ... | sort | uniq -c | sort -nr to group and count input lines, i.e. to count how many times each distinct line occurs in the input.

Is there any better way? Have I just picked up a bad habit? Is there a better way using standard Unix tools that will be installed on Ubuntu Linux 18.04+ (or things that are an apt-get away)?

Related: https://unix.stackexchange.com/q/170043/117549, https://unix.stackexchange.com/q/452569/117549 and https://unix.stackexchange.com/q/41479/117549 (comment by Jeff Schaller, Aug 14 '20 at 13:49)

score 0 · Answer 1 · answered Dec 02 '23 at 11:31

I think you have the standard, obvious *nix way there. It is a perfectly good and reasonable approach:

$ printf 'aa\nbb\ncc\ndd\naa\ncc\n' | sort | uniq -c | sort -nr
      2 cc
      2 aa
      1 dd
      1 bb
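If you use this a lot, it can be wrapped in a small shell function. The sketch below assumes GNU coreutils, and the name count_lines is hypothetical. Setting LC_ALL=C makes sort compare raw bytes, which is usually faster than locale-aware collation, and the final sort -k1,1nr -k2 breaks ties between equal counts by the line text, so the output order is reproducible (here aa comes before cc).

```shell
#!/bin/sh
# count_lines: hypothetical helper wrapping the sort | uniq -c | sort -nr idiom.
# The ( ) body runs in a subshell, so the exported LC_ALL does not leak out.
# LC_ALL=C: byte-wise comparison (usually faster, locale-independent order).
# -k1,1nr:  sort by the count, numerically, largest first.
# -k2:      among equal counts, sort by the line text itself.
count_lines() (
  export LC_ALL=C
  sort | uniq -c | sort -k1,1nr -k2
)

printf 'aa\nbb\ncc\ndd\naa\ncc\n' | count_lines
```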

Sure, you could use a little script instead and so use only one command. For example, with gawk:

$ printf 'aa\nbb\ncc\ndd\naa\ncc\n' |
  gawk '
    BEGIN { PROCINFO["sorted_in"] = "@val_num_desc" }
    { count[$0]++ }
    END { for (line in count) print count[line], line }'
2 cc
2 aa
1 dd
1 bb
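Note that PROCINFO["sorted_in"] is a gawk extension. Ubuntu's default awk is usually mawk, which does not support it, so there the for (line in count) loop would come out in an unspecified order. A portable sketch (the function name count_lines_awk is hypothetical) lets any POSIX awk do the counting and hands only the distinct lines to sort:

```shell
#!/bin/sh
# count_lines_awk: hypothetical name. Any POSIX awk builds the counts in an
# associative array (one entry per distinct line); sort then orders only
# those entries: by count, largest first, ties broken by the line text.
count_lines_awk() {
  awk '{ count[$0]++ } END { for (line in count) print count[line], line }' |
    LC_ALL=C sort -k1,1nr -k2
}

printf 'aa\nbb\ncc\ndd\naa\ncc\n' | count_lines_awk
```

Because awk keeps one entry per distinct line, the final sort usually handles far less data than the first sort in the original pipeline, which can help when the input is highly repetitive.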

Or with Perl:

$ printf 'aa\nbb\ncc\ndd\naa\ncc\n' | 
   perl -lne '$k{$_}++ }{ for $i (sort { $k{$b} <=> $k{$a} } keys %k ){print "$k{$i} $i"}'
2 aa
2 cc
1 bb
1 dd

But that's just reinventing the wheel. Plus, both scripts have to keep every distinct line and its count in memory, which can be an issue when dealing with large amounts of data, whereas sort falls back to temporary files when its input does not fit in memory. So just stick with what you are doing. It is a fine solution, probably the most efficient one there is.
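If memory or temporary disk space is the concern, GNU sort lets you tune how it spills to disk: -S sets the in-memory buffer size, -T chooses the directory for temporary files, and --parallel sets the number of sorting threads. These options are GNU extensions; the values below are arbitrary examples rather than recommendations, and the function name count_lines_big is hypothetical.

```shell
#!/bin/sh
# count_lines_big: hypothetical name; requires GNU sort (coreutils).
# -S 64M:       in-memory buffer size before sort spills to temp files
# -T DIR:       where those temp files go (default is $TMPDIR or /tmp)
# --parallel=2: number of concurrent sort threads
count_lines_big() (
  export LC_ALL=C
  sort -S 64M -T "${TMPDIR:-/tmp}" --parallel=2 | uniq -c | sort -k1,1nr -k2
)

printf 'aa\nbb\ncc\ndd\naa\ncc\n' | count_lines_big
```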
