This answered question explains how to find and sort the most frequent words in a single file, but how would you do this for an entire directory tree? I have 1 million text files that I need to search for the ten most frequently used words.
The files are laid out as follows (first to last): /data/000/0000000/s##_date/*.txt - /data/999/0999999/s##_data/*txt
Everything I have tried so far either sorts file names and paths or fails with directory errors.
I have made some progress with grep, but parts of the file names show up in my results:
grep -r . * | tr -c '[:alnum:]' '[\n*]' | sort | uniq -c | sort -nr | head -10
output:
1145
253 txt
190 s01
132 is
126 of
116 the
108 and
104 test
92 with
84 in
The 'txt' and 's01' entries come from the file names, not from the text inside the files. I know there are ways to exclude common words like "the", but I would rather not sort and count file names at all.
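A likely cause is that grep -r prefixes every matching line with "path:" when it searches more than one file, so the path fragments go through tr and get counted along with the words. The sketch below avoids grep entirely and feeds only file contents into the pipeline. It is a minimal example that assumes the files all end in .txt under one root directory; the function name top_words is hypothetical, and /data is taken from the layout above.

```shell
#!/bin/sh
# top_words DIR [N]  (hypothetical helper name)
# Print the N (default 10) most frequent words found inside the *.txt
# files under DIR. Only file contents enter the pipeline, so path
# fragments such as "txt" or "s01" are never counted.
#
# - find passes files to cat in batches (-exec ... +), which avoids the
#   "Argument list too long" error a shell glob like * can hit.
# - tr -cs turns each run of non-alphanumeric characters into a single
#   newline, giving one word per line with no blank lines in between.
# - grep -v '^$' drops the one empty line left over if the stream starts
#   with a non-alphanumeric character.
top_words() {
  find "$1" -type f -name '*.txt' -exec cat -- {} + |
    tr -cs '[:alnum:]' '\n' |
    grep -v '^$' |
    sort | uniq -c | sort -nr | head -n "${2:-10}"
}
```

Usage: top_words /data (or top_words /data 20 for the top 20). If you would rather keep grep, GNU grep's -h option suppresses the file-name prefix, e.g. grep -rh --include='*.txt' . /data | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -nr | head -10. Two optional tweaks: add tr '[:upper:]' '[:lower:]' before sort so "The" and "the" are counted together, and set LC_ALL=C on the sort commands, which usually speeds up sorting very large inputs.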