-2

contents of /tmp/fefile

amx/eng/prf.amx
amx/eng/det.amx
bmb/menu.bmb
bmx/eng/menu.bmx
dll/tlnt.dll
dlx/eng/dlx
for file in `cat /tmp/fefile`
do
    if [ -f $file ]
    then
        echo "File '${file}' found in $(pwd) path."
        echo " Now i need to check if the  that file modified in last 10 mins with the below find command "
        find . -mmin -10 -type f -name ${file} -regextype posix-egrep -regex ".*/(dir1|dir1|dir3|dir4)/.+" -printf "%P\n" > /tmp/base
        echo "The files that are modified recently are below"
        File=(cat /tmp/base)
    echo " now i am verifying that $file is matched with $File "
    if[ $file == $File ]
    then
        echo " tmp file matched with base file."
    else
        echo " file doesn't match"
    fi # Originally missing
else
    echo "File '{file} not found."
fi

done

Please help me correct the find command in the above script: it should read each file name and check whether that file was modified in the last 10 minutes,

and, if it was modified, check whether both files match.

Andy Dalton
  • 13,993
afrin
  • 61

3 Answers

1

In the find command, you use a regex. find matches this regex against the paths it visits (the filenames themselves, not the contents of the files), and none of the names listed in the fefile file can match it.

find . -mmin -10 -type f -name ${file} -regextype posix-egrep -regex ".*/(dir1|dir1|dir3|dir4)/.+" -printf "%P\n" > /tmp/base

None of them have:

  • anything, in any amount, PLUS
  • dir1 or dir1 (again?! presumably dir2 was meant) or dir3 or dir4, PLUS
  • anything again, but at least one character

Another problem is the regex itself, it is:

".*/(dir1|dir1|dir3|dir4)/.+"

It lists dir1 twice; the second one should presumably be dir2:

".*/(dir1|dir2|dir3|dir4)/.+"

(There is no need to escape the / characters: / is not a regex metacharacter, so \/ and / are equivalent here.)

Also:

File=(cat /tmp/base)

Should be:

File=$(cat /tmp/base)

or

File=`cat /tmp/base`
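The difference is easy to see in bash (a small illustrative sketch; the file name is made up):

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'hello\n' > "$tmp"

File=(cat "$tmp")    # bash array with two elements: 'cat' and the file name
echo "${File[0]}"    # first element is the literal word 'cat'

File=$(cat "$tmp")   # command substitution: the file's contents
echo "$File"
```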

Another point is the end of find line:

(...) -printf "%P\n" > /tmp/base

It's better change > to >>:

(...) -printf "%P\n" >> /tmp/base

Otherwise each iteration of the loop overwrites /tmp/base and only the last file found is kept. (If you switch to >>, remember to truncate or delete /tmp/base before the loop starts.)
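If it helps, here is a minimal sketch combining the fixes above (GNU find assumed; the dir2/ layout and file name are invented for illustration). Note it matches only the basename with -name, because a -name pattern containing / never matches anything:

```shell
#!/bin/sh
# Sketch only: build a throwaway tree, then run the corrected fragment.
tmpd=$(mktemp -d) && cd "$tmpd" || exit 1
mkdir -p dir2/amx/eng
touch dir2/amx/eng/prf.amx            # fresh file => modified within 10 min

file="amx/eng/prf.amx"                # one line from /tmp/fefile
base=$(mktemp)

# -name only sees the last path component, so match the basename here;
# quote the variable and use dir2 instead of the duplicated dir1
find . -mmin -10 -type f -name "${file##*/}" \
     -regextype posix-egrep -regex ".*/(dir1|dir2|dir3|dir4)/.+" \
     -printf "%P\n" >> "$base"

File=$(cat "$base")                   # command substitution, not (cat ...)
if [ "dir2/$file" = "$File" ]; then   # space after 'if'; '=' not '==' in sh
    echo "tmp file matched with base file."
else
    echo "file doesn't match"
fi
```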

Daniel
  • 363
1

The main problem with your script fragment is that you are running find multiple times in a loop (once for each filename in /tmp/fefile).

This is extremely slow and inefficient because find is an "expensive" operation (recursing a directory tree with any tool is expensive in both time and disk I/O), not something you should run repeatedly in a loop unless you have no other choice (and there's almost always another, better choice).

It is much better to run find just once and process its output (e.g. with grep or awk or sed or whatever).

Try something more like this:

find ./dir[1234]/ -type f -mmin -10 -printf '%P\n' | grep -F -f /tmp/fefile

That will output a list of all files in dir1..dir4 that a) were modified in the last 10 minutes and b) match the fixed-string patterns in /tmp/fefile.

BTW, notice that this does not need a /tmp/base temporary file. (Also BTW, hard-coding tempfile names into a script is generally a bad idea; use mktemp or similar instead. I'd guess your /tmp/fefile should almost certainly not be hard-coded either, but I don't know what the rest of your script does or how this script fragment is executed.)
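For example, a sketch of the mktemp pattern mentioned above (the file contents are just a placeholder):

```shell
#!/bin/sh
# Create a private temp file instead of hard-coding /tmp/base,
# and remove it automatically when the script exits.
base=$(mktemp) || exit 1
trap 'rm -f "$base"' EXIT

printf 'some result\n' > "$base"
cat "$base"
```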

You may need to tweak the find and/or grep options a bit to get exactly what you want - it took me a few minutes examining your script fragment to figure out what it is you're trying to do, and I'm still not 100% sure. I do know that you're using ~20 lines of shell code to very inefficiently do something that you could do much better and faster with either find alone or with find and grep (or some other common tool such as sed or awk or perl).

Note: this will not work correctly if any filenames contain newline characters. You can use \0 instead of \n in the -printf format string, along with GNU grep's -z option.

find ./dir[1234]/ -type f -mmin -10 -printf '%P\0' | grep -z -F -f /tmp/fefile

(to view the output in a terminal, you might want to convert the NUL separators to newlines, e.g. by piping the output to tr '\0' '\n'. This is fine for just displaying a list of filenames, but not safe if you need to do something with the filenames)

And, speaking of doing things with the filenames, one of the best and safest ways to do that is to store them in an array. e.g. by using the bash built-in mapfile (AKA readarray) along with process substitution to populate an array with all the matching filenames.

declare -a found
mapfile -d '' -t found < <(find ./dir[1234]/ -type f -mmin -10 -printf '%P\0' |
                             grep -z -F -f /tmp/fefile)

found will then be an array containing all the matching filenames. You can view the array with declare -p found (this is mostly useful for debugging purposes, to verify that the array contains what you think it should contain) or use it as args for a command, or in a loop, e.g.:

for f in "${found[@]}"; do
  echo "$f"
done

You can do anything else you want to do with "$f" in the loop, but remember to double-quote both the variable AND the array because they can contain any character except NUL.

Which reminds me, you use ${file} in your find command, rather than "$file". This is a very commonly-made mistake: curly braces around variables are NOT a substitute for quoting.

They are used for parameter substitution (run man bash and search for the Parameter Expansion heading) and to eliminate ambiguity when interpolating variable names in a string (e.g. when you have a variable called $foo and you need to print it in a string immediately adjacent to a valid variable-name character - echo "$food" will output the value of $food, while echo "${foo}d" will output the value of $foo followed by a literal d character).
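That ambiguity is easy to demonstrate (the variable names here are purely illustrative):

```shell
#!/bin/sh
foo=bar
food=pizza
echo "$food"      # expands the variable named 'food'
echo "${foo}d"    # expands 'foo', then appends a literal 'd'
```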

See $VAR vs ${VAR} and to quote or not to quote.

See also Why does my shell script choke on whitespace or other special characters?, When is double-quoting necessary? and Security implications of forgetting to quote a variable in bash/POSIX shells

Finally, since this question is about find and processing its output, and because you have been asking several find-related questions, see Why is looping over find's output bad practice?. And don't forget to read the related questions it links to.

cas
  • 78,579
  • Great answer! The OP using -name ${file} suggests though that they may want for amx/eng/prf.amx for instance to find a file called prf.amx in the eng subdir of an amx directory, and not any file whose path contain amx/eng/prf.amx (like kabamx/eng/prf.amxil), in which case it may be necessary to switch from grep -zF to perl -0 and construct the appropriate regex. – Stéphane Chazelas Jun 05 '23 at 06:44
  • @StéphaneChazelas that's pretty much why i wrote the "You may need to tweak " paragraph. And regardless of the exact details of what the OP wants to do, running find in a loop is pretty much guaranteed to be the wrong way to do it (by which I mean that I can't think of any valid reason to do that but there might be some extreme corner case where it might be appropriate. probably not, though). – cas Jun 05 '23 at 06:53
0

If the point is to find the regular files, last modified in the last 10 minutes, whose last path components are any of the strings that make up the lines of /tmp/fefile, and whose path contains at least one directory component from the list dir1, dir2, dir3, dir4, then you can't do the matching with -name; it has to be done on the full -path.

-path (originally from BSD but now standard), and with some implementations -wholename (same as -path), -ipath, -regex, -iregex match on the whole path.

So some of the options are

  • generate a find command line that uses a -path predicate for each line of /tmp/fefile:

    LC_ALL=C find . '(' -path '*/dir1/*' -o \
                        -path '*/dir2/*' -o \
                        -path '*/dir3/*' -o \
                        -path '*/dir4/*' \
                    ')' '(' \
                        -path '*/amx/eng/prf.amx' -o \
                        -path '*/amx/eng/det.amx' -o \
                        ... \
                    ')' -type f -mmin -10
    

    which with bash you could do with:

    readarray -t args < <(
      </tmp/fefile LC_ALL=C sed '
    1!i\
    -o
    i\
    -path
    s|[*/?\\]|\\&|g; # escape glob operators
    s|.*|*/&|')
    LC_ALL=C find . '(' -path '*/dir1/*' -o \
                        -path '*/dir2/*' -o \
                        -path '*/dir3/*' -o \
                        -path '*/dir4/*' \
                    ')' '(' "${args[@]}" ')'
    

    (or */dir[1234]/*, I'm assuming those are placeholders for some real directory names that can't easily be factorised into one pattern).

  • leave the path matching to a post-processing command as shown by @cas

  • Or since here your find implementation seems to be GNU find which supports a -regex predicate, construct the regex dynamically:

    regex=".*/($(
      </tmp/fefile LC_ALL=C sed -e 's/[][$^*()+{}\\|.?]/\\&/g' |
      paste -sd '|' -))\$"
    LC_ALL=C find . -regextype posix-extended \
                    -regex '.*/(dir1|dir2|dir3|dir4)/.*' \
                    -regex "$regex" \
                    -type f -mmin -10
    

    (assuming /tmp/fefile is not empty).

Add some -exec predicate if you need to run commands on those files, or -print0 (or -printf '%P\0' to strip the leading ./) to pass the list NUL-delimited to some other command that can handle NUL-delimited list (a safe way to pass an arbitrary list of file paths).
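As a rough sketch of both options (GNU find and xargs assumed; the tree below is throwaway, created only for illustration):

```shell
#!/bin/sh
# Sketch only: build a scratch tree, then show -exec and NUL-delimited output.
d=$(mktemp -d) && cd "$d" || exit 1
mkdir -p dir1
touch dir1/prf.amx                       # fresh file => modified within 10 min

# run a command once per matching file
find ./dir1 -type f -mmin -10 -exec echo 'found: {}' \;

# or hand the list over NUL-delimited to a NUL-aware consumer
find ./dir1 -type f -mmin -10 -print0 | xargs -0 -r ls -l > /dev/null
```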