0

I have more than 30 different text files, and each of them contains the same word repeated a different number of times. For example, in text1 "esr" is repeated 12 times and in text2 "esr" is repeated 21 times.

Is it possible to output the number of times that the word is repeated separately with one command?

αғsнιη
  • "repeated separately"? The sum of all the occurrences, or the number of occurrences in each file, but presented separately? – Kusalananda Aug 28 '17 at 12:31
  • @Kusalananda the number of occurrences in separate format –  Aug 28 '17 at 12:32
  • Are you interested specifically in "esr", or in every word that can be found in any file? – pawel7318 Aug 28 '17 at 12:41
  • Your title is completely different from what you described in the body of your question! If it's the title, then sort <file | uniq -c would be enough. – αғsнιη Aug 28 '17 at 14:31

5 Answers

5

With grep + wc pipeline:

for f in *.txt; do echo -n "$f "; grep -wo 'esr' "$f" | wc -l; done

grep options:

  • -w - word-regexp (to match whole/separate word)

  • -o - print only matched substrings


  • wc -l - count the number of lines (matched words in our case) for each file
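To see what -w changes, here is a small sketch run on a throwaway file. The sample text is made up for illustration, and -o is not specified by POSIX, although GNU and BSD grep both provide it.

```shell
# Throwaway sample: "esr" appears once as a word and twice inside longer words.
tmp=$(mktemp)
printf 'esr esrs desr\n' > "$tmp"

# With -w, only the standalone word is matched.
with_w=$(grep -wo 'esr' "$tmp" | wc -l | tr -d ' ')
# Without -w, every substring match is counted.
without_w=$(grep -o 'esr' "$tmp" | wc -l | tr -d ' ')

echo "with -w: $with_w, without -w: $without_w"
rm -f "$tmp"
```

On this sample the first count is 1 and the second is 3, which is why -w matters when the word can also occur inside other words.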
4
strings ./*.txt|tr " " "\n"|sort|uniq -c
αғsнιη
pawel7318
3

Use grep to find all instances, then count unique lines using uniq -c.

grep "word" * | sort | uniq -c

If you want matches per input file, use grep -c. Note that this counts matching lines, so a line containing the word twice is counted once:

grep -c "word" * 
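The difference between counting matching lines and counting individual occurrences can be checked with a short sketch. The sample file is invented for the demonstration, and grep -o is assumed to be available (it is in GNU and BSD grep, but not in POSIX).

```shell
# Throwaway sample: the first line contains the word twice, the second not at all.
tmp=$(mktemp)
printf 'esr and esr again\nnothing here\n' > "$tmp"

# grep -c reports the number of lines that contain at least one match.
lines=$(grep -c 'esr' "$tmp")
# grep -o prints each match on its own line, so wc -l counts occurrences.
hits=$(grep -o 'esr' "$tmp" | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $hits"
rm -f "$tmp"
```

Here grep -c reports 1 while the -o pipeline reports 2, so the two only agree when the word never appears more than once per line.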
sebasth
2
for name in file*.txt; do
    printf 'Pattern occurs %d times in "%s"\n' "$(grep -wo 'pattern' "$name" | wc -l)" "$name"
done
Kusalananda
0

If you want to count every word in any number of files, you could use AWK, e.g.:

awk 'BEGIN{RS="[[:space:]]+"}
     {counts[$0]++}
     END{for(word in counts){print word " - " counts[word]}}
     ' file1 file2 file...

This treats each file as if every word were on a separate line (that's the BEGIN{RS="[[:space:]]+"} part), then counts each time it sees a given line. A multi-character RS is treated as a regular expression by gawk and mawk, but POSIX leaves its behaviour unspecified. Removing the BEGIN portion would count each normal line instead.

If you're only interested in one specific word, you could change the END block to look something like:

END{print counts["esr"]}

This prints only the number of times "esr" shows up, but remember that the comparison is case-sensitive.

To remove case-sensitivity, use counts[tolower($0)]++ or counts[toupper($0)]++.

Checks can be added to print out data when the count goes from one file to the next as well.
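One way the per-file reporting could look is sketched below. The function name count_word_per_file is hypothetical. The sketch also departs from the command above by looping over whitespace-separated fields instead of setting RS, so it does not depend on regular-expression record separators.

```shell
# Hypothetical helper: print "<file> <count>" for one word across several files.
# Usage: count_word_per_file WORD FILE...
count_word_per_file() {
    word=$1
    shift
    awk -v word="$word" '
        # Default field splitting is on whitespace, so each field is one word.
        { for (i = 1; i <= NF; i++) if ($i == word) counts[FILENAME]++ }
        # Report in command-line order; "+ 0" prints 0 for files with no match.
        END { for (i = 1; i < ARGC; i++) print ARGV[i], counts[ARGV[i]] + 0 }
    ' "$@"
}
```

Calling it as count_word_per_file esr ./*.txt prints one line per file. As with the command above, the comparison is case-sensitive; comparing tolower($i) with a lowercase word would relax that.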