Grep for words not lines

Question

I need to search a text file for a word count. The file contains lines of sentences, and I only care about the number of times a word shows up, not the number of lines. How do I tell grep to search for words instead of lines?

For instance, if I use grep -c '^ab' (words that start with ab), it only returns the number of lines that begin with ab, not the number of words that begin with ab.

Are you using a grep that has the non-standard -o option? Are you interested in substrings of words, like world in the string otherworldly? If not, what constitutes a word? Is Unix-like one word, or the two words Unix and like, and does you'd match the word you? — Kusalananda, Mar 28 '22 at 22:21
@they yeah, essentially I'd like to apply the regex to search for words rather than lines. For instance, if I use grep -c '^ab' (words that start with ab), it only returns the number of lines that begin with ab, not the number of words — magnus reeves, Mar 28 '22 at 22:28
Or try grepping without the ^, the beginning-of-line anchor. — waltinator, Mar 28 '22 at 23:33
(1) You start by saying “I only care about the number of times *A word* shows up …” This makes it sound like you have a word (one word) that you are interested in. But then you say “words that start with ab”. Please clarify: are you looking for a word, or for all words that match a pattern? (2) If you want to look for a pattern, think about multiple occurrences. For example, if you were looking for all words containing ab, would “habitable” count as one or two? — G-Man Says 'Reinstate Monica', Mar 29 '22 at 05:48
@G-Man Says 'Reinstate Monica' I answered this in a comment above: I'd like to apply the regex to the words in a file, not the lines — magnus reeves, Mar 30 '22 at 16:57
@waltinator thanks for the help, this was really useful. I want to find the count of words that start with "ab", not the count of lines that start with ab — magnus reeves, Mar 30 '22 at 16:58
Does this answer your question? Count total number of occurrences using grep — G-Man Says 'Reinstate Monica', Oct 12 '23 at 08:14
Also similar: Counting occurrences of [a] word in [a] text file. — G-Man Says 'Reinstate Monica', Oct 12 '23 at 08:15

score 1 · Answer 1 · answered Mar 28 '22 at 23:31

If you want to count words in file.txt, not lines, simply put each word on its own line:

tr " " "\n" < file.txt | grep -c '^ab'
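The tr above only splits on space characters, so two words separated by a tab (for example about and absent) stay on one line and are counted once. A slightly more robust sketch, assuming words are separated by any whitespace, uses the POSIX [:space:] class and -s to squeeze each run of whitespace into a single newline. The sample file contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file; substitute your own file.txt.
printf 'abc about\tabsent\n  cab  ab\n' > file.txt

# Translate every whitespace character (space, tab, newline) to a newline,
# and -s squeezes repeated newlines so each word sits on its own line.
# tr reads standard input only, hence the < redirection.
tr -s '[:space:]' '\n' < file.txt | grep -c '^ab'   # prints 4
```

Punctuation is still attached to words with this approach, so a word written as (about would not match '^ab'.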

waltinator

I like this solution, although my Mac didn't like the tr input and it didn't do anything – magnus reeves Mar 30 '22 at 17:04
Even with GNU tr, this should probably be tr " " "\n" < file.txt | grep -c '^ab'. – frabjous Mar 30 '22 at 18:27
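If you would rather skip the tr step entirely, a different technique (awk instead of grep) is to let awk split each line into fields and test each field. This is a sketch assuming a "word" is any run of non-blank characters, which is awk's default field splitting; the sample file is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file; replace with your own file.txt.
printf 'abc about\tabsent\n  cab  ab\n' > file.txt

# awk splits each line on runs of blanks (spaces and tabs) by default.
# Count every field that begins with "ab"; "n + 0" prints 0 instead of
# an empty line when nothing matches.
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^ab/) n++ } END { print n + 0 }' file.txt   # prints 4
```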

score 1 · Accepted Answer · answered Mar 28 '22 at 23:45

With GNU grep you can use the -o flag to get all the matches, and then count them afterwards with wc -l:

grep -o '\<ab' file.txt | wc -l

Or I suppose you could count with grep itself:

grep -o '\<ab' file.txt | grep -c ''

("\<" means "start of a word".)
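To see what the \< anchor changes, compare it with a plain 'ab' pattern on a small made-up file (this assumes GNU grep, as in the answer). Without the anchor, grep -o also reports "ab" inside words, including twice inside "habitable".

```shell
#!/bin/sh
# Hypothetical sample file with "ab" both inside and at the start of words.
printf 'about cab habitable\nabsent-minded\n' > file.txt

# Plain 'ab' matches anywhere: about, cab, habitable (twice), absent-minded.
grep -o 'ab' file.txt | wc -l      # 5

# '\<ab' only matches at the start of a word: about, absent-minded.
# The hyphen is not a word character, so "minded" starts a new word,
# but it does not begin with "ab".
grep -o '\<ab' file.txt | wc -l    # 2
```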

frabjous

Spectacular! If I wanted to include words that began with 'a' but ended with 'b', how would that look? – magnus reeves Mar 30 '22 at 17:32
Assuming a "word" can only have letters in it, you could use \<a[A-Za-z]*b\>; if what you count as "words" can have other things in them like hyphens or underscores or digits, you may need to add to what's in the brackets. – frabjous Mar 30 '22 at 17:48
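Here is a small sketch of that pattern in use, assuming GNU grep and a made-up sample line. Only whole words that start with a and end with b are reported; "crab" starts with c, "abs" ends with s, and "a-b" has a non-letter between the a and the b.

```shell
#!/bin/sh
# Hypothetical sample line.
printf 'ab absorb arab crab abs a-b\n' > file.txt

# \<a        word starts with "a"
# [A-Za-z]*  any run of letters, possibly empty (so "ab" itself matches)
# b\>        the word's last letter is "b"
grep -o '\<a[A-Za-z]*b\>' file.txt          # prints ab, absorb, arab (one per line)
grep -o '\<a[A-Za-z]*b\>' file.txt | wc -l  # 3
```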
Thank you so much! Would you mind pointing me to a resource where I can find more documentation on the -o regex? I have a billion more questions I don't want to bug you with. I think it's called string matching regex? Sorry, I'm brand new to this in school; I don't know if my questions make sense – magnus reeves Mar 30 '22 at 18:06
-o doesn't use a different kind of regex; it is just an option for grep which makes it so it only outputs the matches rather than the entire lines containing the matches (grep's normal behavior), and if there is more than one match on the same line, it puts them on separate lines in the output. See the man page for GNU grep (or man grep in the terminal). – frabjous Mar 30 '22 at 18:25
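The difference described above is exactly what the original question runs into: grep -c counts matching lines, while grep -o | wc -l counts matches. A minimal sketch on a hypothetical three-line file, assuming GNU grep:

```shell
#!/bin/sh
# Hypothetical sample file: two "ab" words on line 1, none on line 2, one on line 3.
printf 'abc abd\nxyz\nabe\n' > file.txt

grep -c '\<ab' file.txt          # 2: counts matching LINES
grep '\<ab' file.txt             # prints the whole lines "abc abd" and "abe"
grep -o '\<ab' file.txt          # prints "ab" three times, one per line
grep -o '\<ab' file.txt | wc -l  # 3: counts MATCHES
```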