With xargs + find
One solution is to use xargs to build insanely long find commands that search for thousands of files at once:
sed -e 's/^/-o -name /' "${Region}_${date}.txt" \
| xargs find "$DataDir" -false \
> "${runDir}/st_$Region"
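The sed command turns each filename into the expression -o -name filename, which xargs appends to the find command. The leading -false gives the first -o an operand on its left; since -false never matches, the whole expression is true exactly when one of the -name tests is. xargs then executes the find command(s) it has built, and the result is stored directly in the st_$Region file.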
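To make the construction concrete, here is a small self-contained sketch that runs the same pipeline on throwaway data. The temporary directory, the .dat names and list.txt are hypothetical stand-ins for "$DataDir" and "${Region}_${date}.txt". When the list is long, xargs spreads the words over several find invocations, each starting again with find "$DataDir" -false, so every chunk remains a valid expression.

```bash
#!/usr/bin/env bash
# Sketch only: list.txt and data/ are hypothetical stand-ins for
# "${Region}_${date}.txt" and "$DataDir".
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data/a" "$work/data/b"
touch "$work/data/a/one.dat" "$work/data/b/two.dat" "$work/data/b/other.dat"
printf '%s\n' one.dat two.dat three.dat > "$work/list.txt"

# Each filename becomes the words "-o -name FILENAME":
sed -e 's/^/-o -name /' "$work/list.txt"
# -o -name one.dat
# -o -name two.dat
# -o -name three.dat

# xargs appends those words to its fixed arguments, so this runs
#   find "$work/data" -false -o -name one.dat -o -name two.dat -o -name three.dat
sed -e 's/^/-o -name /' "$work/list.txt" \
  | xargs find "$work/data" -false \
  | sort > "$work/found.txt"

cat "$work/found.txt"   # .../data/a/one.dat and .../data/b/two.dat
```

Keep in mind that xargs splits its input on blanks and interprets quotes and backslashes, so this pipeline assumes filenames without such characters.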
Fine. But how are we going to build ${Region}_filesnotfound_$date.txt, the list of files that were not found? Simply by comparing the full original list with the list of files found:
comm -3 \
<(sort -u "${Region}_${date}.txt") \
<(xargs -L1 basename < "${runDir}/st_$Region" | sort -u) \
> "${Region}_filesnotfound_$date.txt"
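comm -3 suppresses the lines common to both files (its third column). What remains are the names that appear in only one of the lists, which in practice are the requested files that were not found. Both "files" are actually pseudo-files created by process substitution: the second is the result of the basename command applied to each file found. Both are sorted, as comm requires.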
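A tiny run of the same comm step shows what survives. Here list.txt and found.txt are hypothetical sample files standing in for ${Region}_${date}.txt and st_$Region.

```bash
#!/usr/bin/env bash
# Sketch only: list.txt and found.txt are hypothetical sample data.
set -euo pipefail

work=$(mktemp -d)
printf '%s\n' one.dat two.dat three.dat > "$work/list.txt"        # requested names
printf '%s\n' /data/a/one.dat /data/b/two.dat > "$work/found.txt" # paths found by find

# Column 1: only in the request list; column 2: only among the found basenames;
# column 3 (suppressed by -3): in both lists.
comm -3 \
  <(sort -u "$work/list.txt") \
  <(xargs -L1 basename < "$work/found.txt" | sort -u) \
  > "$work/notfound.txt"

cat "$work/notfound.txt"   # three.dat
```

If some found basename were absent from the request list, comm -3 would print it too, indented by a tab (column 2); comm -23 would keep only column 1.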
With find + grep
Another solution is to grep the filenames from the output of find. grep can read a series of patterns from a file (via its -f option). We already have the filenames in a file, so let's turn it into a pattern list and feed it to grep:
find "$DataDir" \
| grep -f <(sed 's|.*|/&$|' "${Region}_${date}.txt") \
> "${runDir}/st_$Region"
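The sed command is mandatory: it turns each filename into the pattern /filename$, anchoring it at the end of the path. Without the anchor, foo.txt would also match paths such as /x/barfoo.txt or /x/foo.txt.bak.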
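A quick check of the anchoring on hypothetical files in a temporary directory; of the three candidates, only the exact name foo.txt should come out.

```bash
#!/usr/bin/env bash
# Sketch only: the directory layout and list.txt are hypothetical.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data/x"
touch "$work/data/x/foo.txt" "$work/data/x/barfoo.txt" "$work/data/x/foo.txt.bak"
printf '%s\n' foo.txt > "$work/list.txt"

# The pattern generated for each name:
sed 's|.*|/&$|' "$work/list.txt"   # /foo.txt$

find "$work/data" \
  | grep -f <(sed 's|.*|/&$|' "$work/list.txt") \
  > "$work/found.txt"

cat "$work/found.txt"   # only .../data/x/foo.txt
```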
The list of missing files would be built the same way as in the first solution.
The problem with this solution is that filenames may contain characters that grep interprets as regular-expression metacharacters: ., *, [, etc. We would have to escape them with sed (I leave that as an exercise for the reader). That's why the first solution is preferable IMHO.
Finally, note that I have used some bashisms here (e.g. process substitution, <(...)). Don't expect either of my solutions to be POSIX compliant.
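If portability matters, the process substitutions can be replaced by ordinary temporary files. Below is a minimal sketch of the comm step written that way, using only sh syntax; the sample data is hypothetical, and mktemp is assumed to be available even though it is a convenience rather than something every system is guaranteed to have.

```sh
#!/bin/sh
# Sketch only: list.txt and found.txt are hypothetical sample data.
# Same comm step as above, but with temporary files instead of <(...).
set -eu

work=$(mktemp -d)
printf '%s\n' one.dat two.dat three.dat > "$work/list.txt"
printf '%s\n' /data/a/one.dat /data/b/two.dat > "$work/found.txt"

# Write both sorted lists to real files first.
sort -u "$work/list.txt" > "$work/wanted.sorted"
xargs -L1 basename < "$work/found.txt" | sort -u > "$work/found.sorted"

comm -3 "$work/wanted.sorted" "$work/found.sorted" > "$work/notfound.txt"
cat "$work/notfound.txt"   # three.dat

# Remove the intermediate sorted lists.
rm -f "$work/wanted.sorted" "$work/found.sorted"
```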
find once to write all the file names below $DataDir into a temporary file, then look up file names in that file, possibly all at once with grep. – AlexP Jan 10 '17 at 01:23
find executed across the entire $DataDir for each line in the source file. – Chris Davies Jan 10 '17 at 14:05