finding multiple strings in a line

Question

I'm brand new to Unix and am using Cygwin64 on Windows. I have a huge number of text files (tens of thousands) that I need to search for specific strings. I have taught myself to search for a single string, but after several days of trying I cannot figure out how to search for two strings.

My files reside in c:/BF/data/

My single string command is

grep -Rinw c:/BF/data/ -e 'string'

I have tried many examples from online and cannot get any command to work with two strings (an AND construction, not an OR construction). If both strings are present in a line, I want that line to be shown on the screen. Again, I have been able to do this with one string. A string might contain a space, if that makes any difference. For example, one string might be 'Miami' and the other 'New York City'.

I have tried various grep and awk commands, and nothing works.

Can someone please point me in the right direction?

https://unix.stackexchange.com/questions/177513/grep-with-logic-operators — steeldriver, Jun 23 '20 at 18:42

glenn jackman · Answer 1 · 2020-06-24T16:05:59.700

To find 2 strings in a line:

Using GNU grep with Perl-compatible regexes:

grep -RinP '^(?=.*\bMiami\b)(?=.*\bNew York City\b)' dir/

Perl regexes use \b as a word boundary.
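A small sketch of why the lookahead version behaves as an AND, assuming GNU grep built with PCRE support (-P). The helper name pcre_both and the sample lines are hypothetical, chosen only to show that order does not matter and that \b rejects partial words such as "Miamis".

```shell
#!/bin/sh
# Sketch, assuming GNU grep with PCRE support (-P).
# Each (?=.*WORD) is a zero-width lookahead anchored at ^: it checks that
# WORD occurs somewhere on the line without consuming any text, so both
# checks start from the beginning and the strings may appear in any order.
# pcre_both is a hypothetical helper name used only for this demonstration.
pcre_both() {
  grep -iP '^(?=.*\bMiami\b)(?=.*\bNew York City\b)'
}

printf '%s\n' \
  'Miami banana' \
  'Miami New York City' \
  'New York City Miami' \
  'Miamis in New York City' |
  pcre_both
```

Only the second and third lines should be printed: the first lacks the phrase, and in the last "Miamis" fails the \b word boundary.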

With GNU awk:

gawk -v IGNORECASE=1 '
    /\<Miami\>/ && /\<New York City\>/ {
        print FILENAME ":" FNR ":" $0
    }
' file

GNU awk uses \< and \> as word boundaries (these are GNU regex operators, not part of POSIX extended regexes).

However, awk has no equivalent of grep's -R, so you could use find to do the recursion:

find dir/ -type f -exec gawk -v IGNORECASE=1 '...' '{}' +

bu5hman · Answer 2 · 2020-06-24T17:18:07.717

The following solution comes from the post @steeldriver linked, courtesy of @Campa.

grep -Rinw Miami . | grep -iw "new york city"

I have just added a few switches to get your recursive search and output format.

Using this test file:

Miami banana
Miami New York City
Miami banana
New York City banana
Miami banana
New York City Miami
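A sketch for reproducing the pipeline, assuming a POSIX shell with GNU grep. It writes the six test lines to a file named file in a scratch directory (so the output paths match ./file in the timings) and runs the same two-stage search.

```shell
#!/bin/sh
# Sketch: recreate the test file in a scratch directory and run the
# two-grep pipeline. The first grep keeps lines containing the word Miami;
# the second keeps only those that also contain the phrase, giving an AND.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
printf '%s\n' \
  'Miami banana' \
  'Miami New York City' \
  'Miami banana' \
  'New York City banana' \
  'Miami banana' \
  'New York City Miami' > file

grep -Rinw Miami . | grep -iw "new york city"
```

One caveat of this design: the second grep sees the whole ./file:N: prefix, so a path that itself contains the phrase would satisfy the second test on its own.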

If you have a large number of files, then avoiding Perl regexes (-P) in grep seems like a good idea:

time grep -Rinw Miami . | grep -iw "new york city"
./file:2:Miami New York City
./file:6:New York City Miami
real    0m0.014s
user    0m0.004s
sys     0m0.016s
time grep -RinwP Miami . | grep -iwP "new york city"
./file:2:Miami New York City
./file:6:New York City Miami
real    0m0.059s
user    0m0.060s
sys     0m0.004s

There also looks to be a time advantage of the piped grep over @glennjackman's Perl version:

time grep -RinP '^(?=.*\bMiami\b)(?=.*\bNew York City\b)' .
./file:2:Miami New York City
./file:6:New York City Miami
real    0m0.069s
user    0m0.062s
sys     0m0.007s

Looping each search 1,000 times over the same six-line file (for i in {1..1000}; do ....; done) seems to confirm this:

@glennjackman Perl grep

real    0m49.276s
user    0m47.414s
sys     0m1.790s

@Campa Perl grep

real    0m42.841s
user    0m42.305s
sys     0m3.346s

@Campa simple grep

real    0m8.813s
user    0m8.837s
sys     0m3.081s

But the hands-down winner over the 1,000-repetition sprint is @glennjackman's awk:

real    0m2.975s
user    0m2.259s
sys     0m0.772s
