
I am attempting to write a bash script that searches the contents of files in a specified directory tree for a specified substring.

Using grep's recursive option alone is not sufficient, since I potentially need to iterate over the / directory (and all subdirectories) of a system, which makes grep run out of memory and abort. I therefore decided to build a list of all directories and subdirectories in the specified tree using find. The following variables stand for the arguments passed to the script.

searchdir=$HOME     # passed in a script argument
searchstr="secret"  # passed in a script argument

I call the find utility and store its output in a temporary file.

TF=$(mktemp)
find ${searchdir} -type d 1>$TF 2>/dev/null

With the list of all directories in the temporary file, I proceed to iterate over the lines of this file using a while-do loop with the intention to perform a search over all files in each directory. For grep, I use the format of parameters provided in this answer to search all files, including the hidden ones, in the single directory.

cat $TF | while read line || [[ -n $line ]];
do
    grepdir="${line}/{*,.*}"
    grep -sHn "${searchstr}" ${grepdir}
done

... however, that code produces no output.

I verified that...

The ${TF} does contain the correct list of all directories. Outputting the ${grepdir} variable gives the output I'm expecting to find.

/home/user/{*,.*}
/home/user/.ssh/{*,.*}
/home/user/test/{*,.*}
# ... and so on

If I run the grep command with a hardcoded directory, particularly the ~/test/ directory, which contains two test files with the string it's supposed to find

grep -sHn "${searchstr}" /home/user/test/{*,.*}

... it correctly outputs the two files containing the substring "secret".

/home/user/test/asdf:7:secret
/home/user/test/test.txt:5:asdfasfdsecretaasdfafd

A format that works for me is the one originally mentioned in the answer discussing the recursive use of grep. If I do this:

cat $TF | while read line || [[ -n $line ]];
do
    grep -rn "${line}" -e "${searchstr}"
done

... I get some output (technically correct, but with many duplicate entries). Since grep processes each directory recursively and I have a list of all directories, I am bound to get the same results many times. On directories such as the aforementioned root directory, grep will fail entirely, which is what I'm trying to avoid.


I should also probably mention that my desperate hacks to get it working, such as passing $(echo "${grepdir}") as the parameter, led to no results as well.

There is most likely a misconception in my thinking or understanding of bash. Shouldn't bash expand the ${grepdir} variable before making a call to grep? Where is my script going wrong?

2 Answers

3

Rule #1: When a command or script isn’t doing what you want it to, look at the error messages.  Don’t throw them into /dev/null.

You are getting error messages like

grep: /home/user/{*,.*}: No such file or directory
grep: /home/user/.ssh/{*,.*}: No such file or directory
grep: /home/user/test/{*,.*}: No such file or directory

but you aren’t seeing them.

If we look at bash(1), we see

Expansion is performed on the command line after it has been split into words.  There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

The important part for your situation is that brace expansion occurs before variable expansion.  So, if you said

grep -sHn "${searchstr}" "${line}"/{*,.*}

then

  • brace expansion would turn the last token into "${line}"/* and "${line}"/.*,
  • variable expansion would turn the above into /home/user/* and /home/user/.*, and then
  • pathname expansion would turn the above into a list of filenames.

But, when you say

grep -sHn "${searchstr}" ${grepdir}

then

  • variable expansion turns the last token into /home/user/{*,.*},

and then it’s too late for brace expansion to occur.  grep looks for a file called literally /home/user/{*,.*}.
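The difference can be seen without grep at all. The following is a minimal sketch, assuming bash with default glob settings; the scratch directory and file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with one visible and one hidden file.
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden"

# Braces typed on the command line: brace expansion runs first, then the
# variable is substituted, then each resulting pattern is globbed.
direct=( "$demo"/{*.txt,.hid*} )

# Braces stored in a variable: by the time $pattern is substituted, brace
# expansion is already over, so one literal word survives.
pattern="$demo/{*.txt,.hid*}"
stored=( $pattern )

printf 'direct: %d word(s)\n' "${#direct[@]}"
printf 'stored: %d word(s): %s\n' "${#stored[@]}" "${stored[0]}"

rm -rf "$demo"
```

The first array holds the two real file names, while the second holds a single word that still contains the braces, which is what grep was being handed in the loop.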


P.S.

grep -sHn "${searchstr}" "${line}/{*,.*}"

wouldn’t work, either, because the quotes would prevent the brace expansion and pathname expansion from occurring.

P.P.S. You don’t need all those braces;

grep -sHn "$searchstr" "$line"/{*,.*}

would be fine.
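Putting that together, the loop from the question might look roughly like this. It is only a sketch: the test tree is hypothetical, and piping find straight into the loop (instead of going through a temporary file) is my own simplification, not something the question requires.

```shell
#!/usr/bin/env bash
# Hypothetical test tree standing in for the real search directory.
searchdir=$(mktemp -d)
searchstr="secret"
mkdir -p "$searchdir/test/.cfg"
printf 'no match\nsecret\n' > "$searchdir/test/asdf"
printf 'a secret here\n' > "$searchdir/test/.cfg/.hidden"

# IFS= and -r keep leading blanks and backslashes in directory names intact.
results=$(
    find "$searchdir" -type d |
    while IFS= read -r line; do
        # The braces and globs sit on the command line, outside the quotes,
        # so bash expands them. -s hides the complaints about directories
        # and about patterns that matched nothing.
        grep -sHn -- "$searchstr" "$line"/{*,.*}
    done
)
printf '%s\n' "$results"

rm -rf "$searchdir"
```

Note that -s is doing real work here: without it, grep would report every subdirectory (and, depending on the bash version, . and ..) that the globs pick up.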

2

The reason grep aborts when recursing over the whole system is likely not that it can't cope with the amount of data, but that it trips over one of the pseudo-files or device files in /proc, /sys or /dev. You could exclude the offending directories with the --exclude-dir option on the command line.
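As a rough illustration, assuming GNU grep (which provides --exclude-dir), here is a small sketch on a made-up tree where a directory named proc stands in for the real /proc:

```shell
#!/usr/bin/env bash
# Hypothetical tree: "proc" stands in for a directory that should be skipped.
root=$(mktemp -d)
mkdir -p "$root/home" "$root/proc"
echo "secret" > "$root/home/notes.txt"
echo "secret" > "$root/proc/kcore"

# --exclude-dir skips any subdirectory whose base name matches the pattern;
# -l lists only the names of the files that contain a match.
matches=$(grep -rl --exclude-dir=proc -- "secret" "$root")
printf '%s\n' "$matches"

rm -rf "$root"
```

The option can be repeated (for example once each for proc, sys and dev). Keep in mind that it matches by base name anywhere in the tree, so a directory such as /home/user/proc would be skipped too.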

The reason it doesn't expand the wildcards is that they're quoted in this line:

    grepdir="${line}/{*,.*}"

Changing it to this will probably allow them to be expanded.

    grepdir="${line}/"{*,.*}

Another way to achieve this (with less scripting on your part) would be to select the files using find and pipe the file paths to xargs for processing: find / ... -print0 | xargs -0 ...

However, either way would probably still trip over whatever file(s) the original recursive grep tripped over, unless you exclude them.
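A possible shape for the find and xargs pipeline is sketched below. The test tree is hypothetical, and the -type f and -xdev tests are my assumptions rather than part of the suggestion above: -type f restricts the search to regular files, and -xdev keeps find on the starting filesystem, which on a typical Linux system means separately mounted trees such as /proc and /sys are not entered.

```shell
#!/usr/bin/env bash
# Hypothetical test tree, with blanks in names to exercise -print0 / -0.
searchdir=$(mktemp -d)
searchstr="secret"
mkdir -p "$searchdir/sub dir/.hidden dir"
printf 'secret\n' > "$searchdir/sub dir/plain file.txt"
printf 'top secret\n' > "$searchdir/sub dir/.hidden dir/.dotfile"
printf 'nothing here\n' > "$searchdir/other.txt"

# find emits NUL-terminated paths; xargs -0 splits on NUL, so blanks and
# most other odd characters in file names pass through unharmed.
found=$(find "$searchdir" -xdev -type f -print0 |
        xargs -0 grep -sHn -- "$searchstr")
printf '%s\n' "$found"

rm -rf "$searchdir"
```

Since find visits each file exactly once, this also avoids the duplicate results that come from running a recursive grep on every directory in the list.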

nilsph
  • Good points about special files potentially causing a problem, and find being a better approach. – G-Man Says 'Reinstate Monica' Nov 25 '19 at 20:51
  • I got the grep: memory exhausted error message when performing a recursive search on / and didn't look for the exact cause. As mentioned here it indeed is likely one of the large system files, since the trip happens in the /proc/ directory and the virtual machine the script is running in has under 500 MB of RAM. Good point bringing it into attention as it is a potential failure point further on! – Marty Cagas Nov 25 '19 at 21:53
  • Did you actually try grepdir="${line}/"{*,.*}? IIRC unquoted brace expansions don't expand on the LHS of a simple assignment in bash (although other types of expansion do occur if unquoted) – steeldriver Nov 26 '19 at 01:20
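The behaviour raised in the last comment is easy to check. The sketch below, assuming bash and a made-up scratch directory, compares the scalar assignment with an array assignment, where the words between the parentheses are expanded like ordinary command arguments:

```shell
#!/usr/bin/env bash
# Hypothetical directory with one visible and one hidden file.
line=$(mktemp -d)
echo "secret" > "$line/a"
echo "secret" > "$line/.b"

# Scalar assignment: no brace or pathname expansion takes place,
# so the braces end up in the variable literally.
grepdir="${line}/"{*,.*}
printf 'scalar: %s\n' "$grepdir"

# Array assignment: brace and pathname expansion both happen, giving one
# element per match (older bash versions also include . and ..).
files=( "$line"/{*,.*} )
printf 'array:  %d element(s)\n' "${#files[@]}"

# The array can then be handed to grep as separate arguments.
hits=$(grep -sHn -- "secret" "${files[@]}")
printf '%s\n' "$hits"

rm -rf "$line"
```

So if the list of files really has to be stored before grep is called, an array seems to be the safer container than a plain string.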