Get count/histogram of occurrences of each word in document

Question

How can I find the count of every word in a file?

I want a histogram of each word in a text pipe or document.

I have already split the document into a list of words, so each word is on its own line. If you can get it working directly from the text document, a solution starting from there is fine too.

> cat doc.txt 
word
second
third
word
really
> cat doc.txt | ... # then count occurrences of each word \
                      and print in descending order separated by delimiter
word 2
really 1
second 1
third 1

It needs to be reasonably efficient, as the file is 1 GB of text and a solution with exponential running time will not work.

score 5 · Accepted Answer · answered Aug 04 '20 at 13:35

Here's one way:

$ sort file | uniq -c | sort -nrk1 | awk '{print $2,$1}'
word 2
third 1
second 1
really 1
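
If the input is running text rather than one word per line, a rough sketch of an alternative is below. It assumes a word is any run of alphanumeric characters, and `word_hist` is a hypothetical function name used only for illustration. Here `awk` does the counting in a single pass instead of sorting the whole input, so only the distinct words are held in memory and sorted afterwards.

```shell
# Hypothetical helper: read text on stdin, print "word count" lines,
# most frequent word first.
word_hist() {
    # Turn every run of non-alphanumeric characters into one newline,
    # so each word ends up on its own line (assumed definition of "word").
    tr -cs '[:alnum:]' '\n' |
    # Count each word in one pass with an associative array;
    # NF skips any empty line left at the start of the stream.
    awk 'NF { count[$0]++ } END { for (w in count) print w, count[w] }' |
    # Sort by count, descending, then by word so ties come out in a stable order.
    sort -k2,2nr -k1,1
}

printf 'word second third\nword really\n' | word_hist
```

Ties are ordered alphabetically by the second sort key. To use a different delimiter in the output, setting `OFS` in the `awk` step should work, though the `sort` keys would then need a matching `-t` option.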

terdon
