Using only standard tools:
$ tr -sc '[:alpha:]' '\n' <file | sort | uniq -c
3 I
2 am
8 groot
1 love
2 me
1 so
This first replaces every character that is not alphabetic with a newline. The -c option makes tr operate on the complement of the set [:alpha:], so each non-alphabetic character is translated to \n, and the -s option squeezes every run of consecutive newlines in the result down to a single newline.
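The two flags can be seen in isolation on a small made-up input (the sample string here is an assumption, not the contents of file):

```shell
# -c complements the set, so every non-alphabetic character is translated;
# -s squeezes the resulting runs of newlines down to a single newline.
printf 'I am groot, so groot!' | tr -sc '[:alpha:]' '\n'
```

Without -s, the comma and the space after it would each produce a newline of their own, leaving a blank line between "groot" and "so".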
The generated words (one per line) are then sorted with sort, and uniq -c counts the number of times each word occurs.
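The sort step is not optional: uniq -c only merges adjacent duplicate lines, so unsorted input yields fragmented counts. A quick illustration with a made-up three-line input:

```shell
# Without sort, the two "b" lines are not adjacent and are counted separately.
printf 'b\na\nb\n' | uniq -c
# With sort, identical lines become adjacent and merge into a single count.
printf 'b\na\nb\n' | sort | uniq -c
```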
The sort | uniq -c part of the pipeline could be made slightly more time efficient with a single awk program:
$ tr -sc '[:alpha:]' '\n' <file | awk '{ count[$0]++ } END { for (word in count) print count[word], word }'
1 love
8 groot
2 am
3 I
1 so
2 me
The awk code uses each word read from tr as a key into the count associative array and increments the associated value each time the word is seen. At the end, it prints the count for every word.