
I have a file, file.txt, with the following content:

I am groot, groot me, me groot,I love groot, groot groot, am I groot groot so

I want to count how many times each word occurs.

To count a single word, I used this command:

tr ' ' '\n' < file.txt | grep "groot" | wc -l
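One caveat with this command: after splitting on spaces, grep "groot" counts lines that contain groot anywhere, so a token such as groot,groot would be counted once and a longer word such as groots would also match. A minimal sketch that counts whole-word matches instead, assuming a grep that supports -o and -w (GNU and BSD grep both do); the first line only recreates the sample file from the question:

```shell
# Recreate the sample file from the question (illustration only).
printf '%s\n' 'I am groot, groot me, me groot,I love groot, groot groot, am I groot groot so' > file.txt

# -o prints every match on its own line, so two matches in one token count twice;
# -w only accepts "groot" when it is not part of a longer word.
grep -ow 'groot' file.txt | wc -l
```

For the sample file this gives 8, the same as the original command, but it stays correct for inputs where words are joined by punctuation.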

Is there a way to count all the words at once? The desired output would look like this:

word        count
I           4
am          3
groot       8
me          2

Can this be done with a bash script?

AdminBee

2 Answers

grep -o '\w\+' file.txt | sort | uniq -c

Explanation:

  • grep -o prints each match on a separate line.
  • \w\+ matches a run of one or more word characters (letters, digits and _). Neither \w nor \+ is part of POSIX basic regular expressions; GNU grep supports both.
  • uniq -c collapses each run of identical adjacent lines into a single line, prefixed with the number of times it occurred.
  • The sort before uniq -c is necessary because uniq only compares adjacent lines; sorting brings all occurrences of each word together into one consecutive run.
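The question asks for a word/count table rather than the count-first layout that uniq -c produces. A sketch that reuses the pipeline above and swaps the columns with awk; the header text, the 12-character column width and LC_ALL=C (for a byte-wise, locale-independent sort order) are choices made here, not part of the answer:

```shell
# Recreate the sample file from the question (illustration only).
printf '%s\n' 'I am groot, groot me, me groot,I love groot, groot groot, am I groot groot so' > file.txt

{
  # Header line, padded to the same width as the rows below.
  printf '%-12s%s\n' word count
  # uniq -c prints "count word"; awk swaps the two fields into "word count".
  grep -o '\w\+' file.txt | LC_ALL=C sort | uniq -c | awk '{ printf "%-12s%s\n", $2, $1 }'
}
```

Note that for this sample text the counts for I and am come out as 3 and 2, which is what the file actually contains, rather than the 4 and 3 shown in the question's example table.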
Amir

Using only standard tools:

$ tr -sc '[:alpha:]' '\n' <file.txt | sort | uniq -c
   3 I
   2 am
   8 groot
   1 love
   2 me
   1 so

This first replaces every character that is not a letter with a newline: -c complements the set [:alpha:], so tr translates every non-alphabetic character to \n. The -s option then squeezes each run of consecutive newlines resulting from this down to a single newline, leaving exactly one word per line.
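To see what the tr step produces on its own, here is a sketch that runs it on a fragment of the sample line (the fragment is chosen only for illustration):

```shell
# Spaces, commas and the final newline are all non-letters: each becomes a newline,
# and -s squeezes runs such as ", " (two newlines) into one.
printf 'me groot,I love groot, groot\n' | tr -sc '[:alpha:]' '\n'
# Output, one word per line:
# me
# groot
# I
# love
# groot
# groot
```

One consequence of this approach: if the input starts with a non-letter, the first output line is empty, and uniq -c will then report a count for a blank "word".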

The words, one per line, are then sorted so that identical words are adjacent, and uniq -c counts how many times each one occurs.
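If the most frequent words should come first, one option (an addition here, not part of the answer) is to sort the uniq -c output numerically in reverse:

```shell
# Recreate the sample file from the question (illustration only).
printf '%s\n' 'I am groot, groot me, me groot,I love groot, groot groot, am I groot groot so' > file.txt

# uniq -c puts the count in the first column, so sort -rn orders by it, highest first.
tr -sc '[:alpha:]' '\n' < file.txt | sort | uniq -c | sort -rn
```

Words with equal counts (am and me, for example) may appear in either order depending on the sort implementation and locale.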

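The sort | uniq -c part of the pipeline can be replaced by a single awk program, which is slightly more time-efficient because it avoids sorting the whole word list (the output is then in no particular order):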

$ tr -sc '[:alpha:]' '\n' <file.txt | awk '{ count[$0]++ } END { for (word in count) print count[word], word }'
1 love
8 groot
2 am
3 I
1 so
2 me

The awk code uses each word read from tr as a key into the associative array count and increments the associated value each time the word is seen. At the end, it prints the count for every word. The order in which for (word in count) visits the keys is unspecified, which is why the output above is unordered.
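Since the question asks about a bash script: the same counting idea can be expressed with a bash associative array. This is a sketch, assuming bash 4 or later (for declare -A) and reusing tr for the word splitting; it is not taken from either answer, and a bash read loop is likely to be noticeably slower than awk on large files.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays (declare -A).

# Recreate the sample file from the question (illustration only).
printf '%s\n' 'I am groot, groot me, me groot,I love groot, groot groot, am I groot groot so' > file.txt

declare -A count

# tr emits one word per line; read each one and increment its counter.
while IFS= read -r word; do
  if [[ -n $word ]]; then   # skip the empty line a leading non-letter would produce
    count[$word]=$(( ${count[$word]:-0} + 1 ))
  fi
done < <(tr -sc '[:alpha:]' '\n' < file.txt)

# Print in the word/count layout from the question; key order is unspecified.
printf '%-12s%s\n' word count
for word in "${!count[@]}"; do
  printf '%-12s%s\n' "$word" "${count[$word]}"
done
```

As with the awk version, piping the result through sort would give a stable order if one is needed.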

Kusalananda