I need to find the 10 most frequent words in a .csv file. Each line of the file contains comma-separated words. If a word is repeated within the same line, it should be counted only once for that line. So, in the example below:
green,blue,blue,yellow,red,yellow
red,blue,green,green,green,brown
green, blue and red should each be counted as 2, and yellow and brown as 1.
I know similar questions have been asked before, and one solution was:
<file.csv tr -c '[:alnum:]' '[\n*]' | sort|uniq -c|sort -nr|head -10
But this counts every occurrence of a word, including repeats within the same line, like this:
4 green
3 blue
2 yellow
2 red
1 brown
This is not what I need. Any help? I would also appreciate a short explanation of the command, and of why the command I found in similar questions does not do what I need.
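For context on why the pipeline above over-counts: `tr -c '[:alnum:]' '[\n*]'` replaces every non-alphanumeric character (commas and the original newlines alike) with a newline, so the information about which line a word came from is gone before `sort | uniq -c` ever sees it. One possible way around this, sketched here rather than offered as a definitive answer, is to de-duplicate within each line before counting. The function name `top_words` is made up for illustration, and the sketch assumes no spaces around the commas and Unix line endings.

```shell
# Hypothetical helper (the name top_words is not from the original post).
# For each word, count the number of lines on which it appears at least once,
# then print the 10 highest counts.
top_words() {
    awk -F, '
        {
            split("", seen)                  # portable way to empty the per-line set
            for (i = 1; i <= NF; i++) {
                # count a word only the first time it shows up on this line
                if ($i != "" && !($i in seen)) {
                    seen[$i] = 1
                    count[$i]++
                }
            }
        }
        END { for (w in count) print count[w], w }
    ' "$1" | sort -k1,1nr -k2,2 | head -n 10  # by count descending, ties by word
}

# Usage (assuming the data is in file.csv):
#   top_words file.csv
```

With the two sample lines above, this should report green, blue and red with a count of 2, and yellow and brown with a count of 1. The second sort key only makes the order of ties predictable; it can be dropped if that does not matter.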
red,blue,green,brown,green,green
? – Kusalananda Jun 01 '20 at 21:20