How can I find all lines containing two specified words, case-insensitively?

Question

I need to check whether two specified words both occur on any line of a text file. The words may contain any characters. For example:

I want to find lines of a text file that contain the two words “cat” and “elephant” together (i.e., on the same line; not necessarily side-by-side):

Cat is smaller than elephant
Elephant is larger than cat
Cats are cute!
Elephants are very strong
Cat and elephants live in different environments
cats are friendly

In the example above, how can I find the lines that contain both words? That is, the result should be:

Cat is smaller than elephant
Elephant is larger than cat
Cat and elephants live in different environments

I tried grep and awk without success. The problem is that the words may appear in upper or lower case, so how can I match both words regardless of letter case?

Try with grep again, but use grep -i. This makes its matching disregard the case of the letters. Also, please show what you've tried so that others can comment and give suggestions for improvement. For example, does your command distinguish that "catnip" is not the word "cat"? – Kusalananda, Oct 16 '18 at 22:21
How to run grep with multiple AND patterns?, How to use grep to match multiple strings in the same line?, grep for 2 words existing on the same line, Grep searching two words in a line – phuclv, Oct 17 '18 at 14:25
A proposed edit wants to add the [case-sensitivity] tag, which seems ok, and mention it in the title. But it also proposes removing the [grep] tag for no reason, which is bad. I don't have enough rep on this site to approve-and-edit or reject-and-edit. (@Kusalananda perhaps you'd care to take a look.) – Peter Cordes, Jan 16 '24 at 10:10

score 8 · Answer 1 · 2018-10-17T02:18:49.667


With grep

grep -i "cat" file | grep -i "elephant"

Cat is smaller than elephant
Elephant is larger than cat
Cat and elephants live in different environments

The -i flag tells grep to ignore case (upper/lower):

 -i, --ignore-case         ignore case distinctions

or with awk (IGNORECASE is specific to GNU awk)

awk 'BEGIN{IGNORECASE=1} /cat/&&/elephant/{print $0}' file

@glenn jackman suggested that the awk command can be shortened as follows:

awk '/cat/&&/elephant/' IGNORECASE=1 file

edited Oct 17 '18 at 02:18

answered Oct 16 '18 at 22:20

The {print $0} block is optional since it is the default action. – glenn jackman Oct 17 '18 at 01:54
I would leave your answer as it is. FYI the awk command can be "golfed" to awk '/cat/&&/elephant/' IGNORECASE=1 file -- also I believe IGNORECASE is specific to GNU awk – glenn jackman Oct 17 '18 at 01:58

@glenn: IGNORECASE is indeed GNU; tolower($0)~/cat/ (or similarly with toupper) is standard, but may give undesired results in some (non-English) locales with accented letters and especially Turkish with its dotted and dotless i's. – dave_thompson_085 Oct 17 '18 at 04:49
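
A minimal sketch of the standard tolower() approach mentioned in the comment above, assuming plain ASCII words so that the locale caveats do not come into play. The function name both_ci is only an illustrative helper, not something from the answer:

```shell
# both_ci [FILE...]: print lines containing both "cat" and "elephant",
# in any letter case. (both_ci is a hypothetical helper name; with no
# FILE argument it reads standard input.)
both_ci() {
    # First rule: keep a lower-cased copy of the current line.
    # Second rule: a bare pattern, so awk prints the original line on a match.
    awk '{ line = tolower($0) } line ~ /cat/ && line ~ /elephant/' "$@"
}

# Demo on the sample lines from the question.
both_ci <<'EOF'
Cat is smaller than elephant
Elephant is larger than cat
Cats are cute!
Elephants are very strong
Cat and elephants live in different environments
cats are friendly
EOF
```

Like the IGNORECASE versions, this matches substrings, so "catnip" would still count as containing "cat".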

Kusalananda · Answer 2 · 2018-10-17T05:52:49.857

$ grep -Fiw cat <file | grep -Fiw elephant
Cat is smaller than elephant
Elephant is larger than cat

We first extract all lines from the file named file that contain the word cat, and then narrow those lines down to the ones that also contain the word elephant.

This is done using grep -F -i -w where

-F makes grep treat the pattern as a fixed string, not as a regular expression,
-i makes grep do case-insensitive matching, and
-w makes grep match complete words only.

The -w option is an extension to the POSIX standard for grep, but is implemented by most common grep implementations. It disallows matches of the given pattern when the matching string is part of a longer word.

Note that I'm not matching the line

Cat and elephants live in different environments

This is due to the final s in elephants. I would also not match the line

elephantiasis is catastrophic

for the same reason.

If you want to allow for a plural s at the end of the words, use

$ grep -Eiw 'cats?' <file | grep -Eiw 'elephants?'
Cat is smaller than elephant
Elephant is larger than cat
Cat and elephants live in different environments

Here, we use an (extended) regular expression instead of a fixed string in both invocations of grep. The expressions will match an optional s at the end of the two words. Now we match cat and cats (case-insensitively), but would not match catnip, catsup, or scat.
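
Since the question says the two words may contain any characters, the fixed-string pipeline can be wrapped in a small function that takes the words as arguments. This is only a sketch; the name lines_with_both is made up for illustration. Passing each word with -e keeps a word that starts with a dash from being read as an option, and -F keeps characters such as . or * literal:

```shell
# lines_with_both WORD1 WORD2 [FILE]
# Print lines containing both words as whole words, case-insensitively.
# (lines_with_both is a hypothetical helper name; with no FILE it reads
# standard input. -w is not in POSIX but is widely implemented.)
lines_with_both() {
    w1=$1
    w2=$2
    shift 2
    # The first grep selects lines with WORD1, the second narrows to WORD2.
    grep -Fiw -e "$w1" -- "$@" | grep -Fiw -e "$w2"
}

# Demo on the sample lines from the question.
lines_with_both cat elephant <<'EOF'
Cat is smaller than elephant
Elephant is larger than cat
Cats are cute!
Elephants are very strong
Cat and elephants live in different environments
cats are friendly
EOF
```

As with the plain -Fiw pipeline, the plural forms are not matched here.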

score 3 · Answer 3 · answered Oct 17 '18 at 01:56


with GNU sed:

sed -n '/cat/I {/elephant/I p}' file

or perl

perl -ne 'print if /cat/i and /elephant/i' file

or a single grep

grep -i -e 'cat.*elephant' -e 'elephant.*cat' file

glenn jackman

how do I insert a text at the end of found line using sed? – hoangpx Jul 28 '20 at 23:18

G-Man Says 'Reinstate Monica' · Answer 4 · 2022-05-02T19:26:49.087

You can do it in non-GNU awk by using the “poor man’s” trick to get case insensitivity:

awk  '/[Cc][Aa][Tt]/ && /[Ee][Ll][Ee][Pp][Hh][Aa][Nn][Tt]/'  file

where, just as [aeiou] matches any one of a, e, i, o or u, [Ee] matches either E or e — that is, a case-insensitive match for “e”.

Note that this approach (like all the other answers posted here so far) will match the line

There are many ways to catch an elephant.

because the word “catch” contains the string “cat”. If you want to avoid this, try

awk  '/(^|\W)[Cc][Aa][Tt](\W|$)/ && /(^|\W)[Ee][Ll][Ee][Pp][Hh][Aa][Nn][Tt](\W|$)/'  file

where you constrain each word to be preceded by a non-word character (or the beginning of the line) and followed by a non-word character (or the end of the line) — \W matches a non-word character (i.e., a space (or tab) or other non-alphanumeric * character).

(Note that \W is not part of POSIX extended regular expressions; GNU awk supports it as an extension, so this may not work in every awk.)

Note that this will now not match

Cat and elephants live in different environments

because the word “elephants” is not the same as the word “elephant”.
__________________________
* In this context, underscore (the “_” character) counts as a letter.
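
Since \W may not be available in every awk, here is a sketch of one possible workaround that uses only standard awk functions. It is not part of the answer above, and it assumes the words are plain ASCII: each line is lower-cased, every character that is not a letter, digit or underscore is turned into a space, and the words are then looked up with surrounding spaces. The function name whole_words_awk is made up for illustration:

```shell
# whole_words_awk [FILE...]: print lines containing "cat" and "elephant"
# as whole words, in any letter case, without \W or IGNORECASE.
# (whole_words_awk is a hypothetical helper name; with no FILE it reads
# standard input. Non-ASCII letters are treated as separators here.)
whole_words_awk() {
    awk '
        {
            # Lower-case copy of the line, padded so that words at the
            # start or end of the line are also surrounded by spaces.
            padded = " " tolower($0) " "
            # Replace every non-word character with a space.
            gsub(/[^a-z0-9_]/, " ", padded)
        }
        # Bare pattern: print the original line if both words are present.
        index(padded, " cat ") && index(padded, " elephant ")
    ' "$@"
}

# Demo: the first, second and last lines are printed.
whole_words_awk <<'EOF'
Cat is smaller than elephant
Elephant is larger than cat
Cat and elephants live in different environments
There are many ways to catch an elephant.
Is that an ELEPHANT, or a cat?
EOF
```

As with the \W version, "elephants" is not treated as the word "elephant".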
