Using only standard tools:
$ tr -sc '[:alpha:]' '\n' <file | sort | uniq -c
3 I
2 am
8 groot
1 love
2 me
1 so
This first replaces every character that is not alphabetic with a newline. The -c option makes tr operate on the complement of the set [:alpha:], so each non-alphabetic character is translated to \n, and the -s option squeezes every run of consecutive newlines in the result down to a single newline.
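The two flags can be seen in isolation on a small made-up input (the sample string here is an assumption, not the contents of file):

```shell
# -c complements the set, so every non-alphabetic character is translated;
# -s squeezes the resulting runs of newlines down to a single newline.
printf 'I am groot, so groot!' | tr -sc '[:alpha:]' '\n'
```

Without -s, the comma and the space after it would each produce a newline of their own, leaving a blank line between "groot" and "so".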
The generated words (one per line) are then sorted with sort, and uniq -c counts the number of times each word occurs.
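The sort step is not optional: uniq -c only merges adjacent duplicate lines, so unsorted input yields fragmented counts. A quick illustration with a made-up three-line input:

```shell
# Without sort, the two "b" lines are not adjacent and are counted separately.
printf 'b\na\nb\n' | uniq -c
# With sort, identical lines become adjacent and merge into a single count.
printf 'b\na\nb\n' | sort | uniq -c
```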
The sort | uniq -c part of the pipeline could be made slightly more time efficient with a single awk program:
$ tr -sc '[:alpha:]' '\n' <file | awk '{ count[$0]++ } END { for (word in count) print count[word], word }'
1 love
8 groot
2 am
3 I
1 so
2 me
The awk code uses each word read from tr as a key into the count associative array and increments the associated value each time the word is seen. At the end, it prints the count for every word.