I want to recursively search a directory for files that have either a .txt extension or no extension, and that contain all of the strings I'm searching for at the same time. How can I manage that?

For example, there are 5 files somewhere inside this directory that contain "string 1", "string 2" and "string 3". Two of them are .pdf and .html files, which I am not interested in. The remaining 3 are .txt files and/or have no extension. I would like to get the paths of the files that contain all the strings and that have the .txt extension or no extension at all.

3 Answers

Updated for the modified question:

find directory -type f \( -name '*.txt' -o ! -name '*.*' \) \
    -exec grep -q -F -e 'string 1' {} \; \
    -exec grep -q -F -e 'string 2' {} \; \
    -exec grep -q -F -e 'string 3' {} \; \
    -print

This searches the directory called directory recursively for regular files with a .txt filename suffix, and for regular files with no dot in their names. When such a file is found, grep is used in a way similar to what I previously described (see below) to figure out whether all three strings are present in the file.

If the strings are found, then the pathname of the file is printed.
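To try the command without touching real data, one could first build a throwaway tree. The following is only a sketch: the file names (notes.txt, README, partial.txt, page.html) are made up for illustration, mktemp -d is assumed to be available, and sort is there just to make the output order predictable.

```shell
# Build a scratch tree (all names here are hypothetical test data).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/directory/sub"
printf 'string 1\nstring 2\nstring 3\n' > "$tmpdir/directory/notes.txt"
printf 'string 3\nstring 2\nstring 1\n' > "$tmpdir/directory/sub/README"
printf 'string 1\nstring 2\n'           > "$tmpdir/directory/partial.txt"
printf 'string 1\nstring 2\nstring 3\n' > "$tmpdir/directory/page.html"

# Same find command as above, run from inside the scratch tree.
result=$(
    cd "$tmpdir" &&
    find directory -type f \( -name '*.txt' -o ! -name '*.*' \) \
        -exec grep -q -F -e 'string 1' {} \; \
        -exec grep -q -F -e 'string 2' {} \; \
        -exec grep -q -F -e 'string 3' {} \; \
        -print | LC_ALL=C sort
)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Only notes.txt and sub/README should be listed: partial.txt lacks one of the strings, and page.html is rejected by the name tests before grep is ever run on it.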

Alternatively, using the code from my first installment of this answer (from below):

find directory -type f \( -name '*.txt' -o ! -name '*.*' \) -exec sh -c '
    for pathname do
        if  grep -q -F -e "string 1" "$pathname" &&
            grep -q -F -e "string 2" "$pathname" &&
            grep -q -F -e "string 3" "$pathname"
        then
            printf "All were found in \"%s\"\n" "$pathname"
        fi
    done' sh {} +


Old answer from before the modification to the question:

The name of the file is of no consequence as Unix does not infer a file type from the file name.

To test whether a string is present in some file called file, one may do

if grep -q -F -e 'some string' file; then
    echo 'The string is present'
else
    echo 'The string is not present'
fi

The options used with grep here are

  • -q: This makes grep quiet, and it also makes it terminate as soon as the pattern matches. Instead of extracting the line(s) where the pattern matches, it exits with an exit status reflecting whether a match was found or not. This exit status is what I'm using in the if statement above.
  • -F: This makes grep treat the pattern as a string rather than a regular expression. This makes it possible to test whether a string like a * [in the] sky occurs in a text, without having to escape the special characters in it.
  • -e: This makes grep treat the next argument as the pattern to use for matching with. This makes it possible to use a pattern starting with - without grep thinking it's a command line option.
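The effect of -F and -e can be checked with a small experiment. This is a sketch under the assumption that mktemp is available; the sample text and the variable names (sample, literal, regex, dash) are invented for the demonstration.

```shell
# Scratch file with text containing regex metacharacters and a leading dash.
sample=$(mktemp)
printf '%s\n' 'a * [in the] sky' '-v is not an option here' > "$sample"

# -F: the pattern is matched literally, so * [ ] need no escaping.
if grep -q -F -e 'a * [in the] sky' "$sample"; then literal=found; else literal=missing; fi

# Without -F the same text is read as a regular expression: " *" means
# "zero or more spaces" and "[in the]" matches a single character, so the
# literal line no longer matches.
if grep -q -e 'a * [in the] sky' "$sample"; then regex=found; else regex=missing; fi

# -e: a pattern starting with "-" is taken as a pattern, not as an option.
if grep -q -F -e '-v is not' "$sample"; then dash=found; else dash=missing; fi

printf 'literal=%s regex=%s dash=%s\n' "$literal" "$regex" "$dash"
rm -f "$sample"
```

The three tests rely only on grep's exit status, just like the if statement above.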

To test several strings, add further grep tests like this:

if  grep -q -F -e 'string 1' file &&
    grep -q -F -e 'string 2' file &&
    grep -q -F -e 'string 3' file
then
    echo 'All three strings were found in the file'
else
    echo 'One or more strings were not found in the file'
fi

Assuming one is using a shell that has named arrays (such as bash), one could also store the strings in an array and do a loop like so:

strings=( 'string 1' 'string 2' 'string 3' )

found=true
for string in "${strings[@]}"; do
    if ! grep -q -F -e "$string" file; then
        found=false
        break
    fi
done

if "$found"; then
    echo 'All strings were found'
else
    echo 'Not all strings were found'
fi

This iterates over the strings, and if one of them is not found (note the ! which negates the result of the grep test), then the variable found is set to false and the loop is exited (we don't need to test further strings).

We then test whether $found is true or false and act on the result of that test.

The above shell code rewritten for /bin/sh (without named arrays):

set -- 'string 1' 'string 2' 'string 3'

found=true
for string do
    if ! grep -q -F -e "$string" file; then
        found=false
        break
    fi
done

if "$found"; then
    echo 'All strings were found'
else
    echo 'Not all strings were found'
fi
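If the same test is needed in several places, the loop could be wrapped in a small function. The sketch below is one possible way to do that in plain /bin/sh; the function name contains_all and the sample file are hypothetical, not something from the code above.

```shell
# contains_all FILE STRING...   (hypothetical helper name)
# Succeeds only if every STRING occurs somewhere in FILE.
contains_all() {
    file=$1
    shift
    for string do
        # Stop at the first string that is missing.
        grep -q -F -e "$string" "$file" || return 1
    done
    return 0
}

# Demonstration on a scratch file (made-up content).
sample=$(mktemp)
printf 'string 1\nstring 3\nstring 2\n' > "$sample"

if contains_all "$sample" 'string 1' 'string 2' 'string 3'; then all=yes; else all=no; fi
if contains_all "$sample" 'string 1' 'string 4'; then some=yes; else some=no; fi

printf 'all=%s some=%s\n' "$all" "$some"
rm -f "$sample"
```

Returning early on the first miss plays the same role as the break in the loop: no further strings are tested once one is known to be absent.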

Kusalananda

Searching for multiple strings is a job for awk, not for grep:

find directory -type f \( -name '*.txt' -o ! -name '*.*' \) \
    -exec awk '
              index($0,"string 1"){x=1}
              index($0,"string 2"){y=1}
              index($0,"string 3"){z=1}
              x && y && z { f=1; exit }
              END { exit !f }
              ' {} \; \
    -print

Note that in the above awk is only called once per input file instead of once per string per input file. It's also trivial to write a script to find any number of strings instead of hard-coding them a line at a time and still just call awk once per file, e.g.:

find directory -type f \( -name '*.txt' -o ! -name '*.*' \) \
    -exec awk '
              BEGIN {
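                  # One required string per "\n"-separated entry. (A
                  # backslash-newline inside an awk string is a line
                  # continuation and would not put a newline in the string.)
                  totReqd = split("string 1\nstring 2\nstring 3", strings, /\n/)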
              }
              {
                  for (idx in strings) {
                      if ( index($0,strings[idx]) ) {
                          totFound++
                          delete strings[idx]
                      }
                  }
              }
              totFound == totReqd { f=1; exit }
              END { exit !f }
              ' {} \; \
    -print

Both of the above are untested but should be close if not exactly correct. They could further be easily modified to operate on multiple files at a time.
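As an illustration of the multiple-files variant, here is one possible sketch. It assumes a POSIX awk and mktemp -d; the scratch tree, its file names, and the variable names x, y, z and reported are invented for the example. Because awk now receives many files per invocation (-exec ... {} +), its exit status can no longer select files, so the script prints FILENAME itself.

```shell
# Build a scratch tree (all names here are hypothetical test data).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/directory/sub"
printf 'string 1\nstring 2\nstring 3\n' > "$tmpdir/directory/notes.txt"
printf 'string 3\nstring 2\nstring 1\n' > "$tmpdir/directory/sub/README"
printf 'string 1\nstring 2\n'           > "$tmpdir/directory/partial.txt"
printf 'string 1\nstring 2\nstring 3\n' > "$tmpdir/directory/page.html"

result=$(
    cd "$tmpdir" &&
    find directory -type f \( -name '*.txt' -o ! -name '*.*' \) -exec awk '
        FNR == 1 { x = y = z = reported = 0 }   # first line of a new file: reset
        reported { next }                       # this file is already listed
        index($0, "string 1") { x = 1 }
        index($0, "string 2") { y = 1 }
        index($0, "string 3") { z = 1 }
        x && y && z { print FILENAME; reported = 1 }
    ' {} + | LC_ALL=C sort
)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This version keeps reading a file after all strings have been seen. With an awk that supports nextfile, the reported flag could be replaced by nextfile to skip the rest of the file.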

Ed Morton

Edited below for the updated question. You can give grep two patterns using the -e option (a line is printed if it matches either pattern). The files you are looking at don't need to have an extension; just use a wildcard, so your command would look something like this:

grep -e "word1" -e "word 2"  /your/folder/*

or this way for three strings, in files whose names contain "txt":

 grep 'word1\|word2\|word3'  /your/folder/*txt*

Try it to see what you get

If you would like to find both strings on the same line, you can do

grep "word 1"  /your/folder/* | grep "word 2"

That pipes the results of the first grep into a second grep with a different string. Alternatively, do the following:

grep -e 'word1.*word2\|word2.*word1'  /your/folder/*

so that it looks for word1 followed by word2 on a line, or vice versa.
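The difference between "either pattern" and "both patterns on the same line" can be seen on a small sample. This is a sketch with made-up content, assuming mktemp is available; it uses grep -c to count matching lines instead of printing them.

```shell
# Scratch file: two lines hold both words, two lines hold only one of them.
sample=$(mktemp)
printf '%s\n' 'word1 then word2' 'word2 then word1' \
              'only word1 here' 'only word2 here' > "$sample"

# Lines containing both words, in either order (first grep piped into a second).
both=$(grep -F -e 'word1' "$sample" | grep -c -F -e 'word2')

# Lines containing at least one of the words (several -e patterns).
either=$(grep -c -F -e 'word1' -e 'word2' "$sample")

printf 'both=%s either=%s\n' "$both" "$either"
rm -f "$sample"
```

Note that both forms work line by line; neither tells you whether a whole file contains all the strings when they sit on different lines.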

DenisZ