If your `filelist.csv` contained exact matches for the files, you could just use something like `find ... -print0 | grep -z -F -f filelist.csv | xargs -0r` ... but it seems you want to match partial filenames listed in that file (with any characters before the filename and `.txt` appended). The easiest way to do that is with a regular expression.
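For instance, here's a minimal sketch of that exact-match pipeline. The search root `.` and the final `ls -ld` are placeholders for whatever you'd actually use, and it assumes filelist.csv lists the paths exactly as find prints them (e.g. `./test.txt`):

```
# -x makes grep -F require whole-line matches, so "exact" is literal.
$ find . -type f -print0 |
    grep -z -F -x -f filelist.csv |
    xargs -0r ls -ld
```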
You can use process substitution to transform the partial filenames in `filelist.csv` into appropriate regular expressions as they're being read by `grep`.

BTW, unless you use sed's `-i` option (don't do that for this particular task), this transformation is not permanent. It won't affect the original `filelist.csv` file, only the stream of text being fed into `grep -f`.
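To see what that transformation produces, you can run the sed command by itself. With the two-line filelist.csv from the setup below, it turns each partial name into an anchored regular expression:

```
$ sed -e 's/^\(.*\)/.*\1\.txt$/' filelist.csv
.*test\.txt$
.*foo\.txt$
```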
Alternatively, you could pipe the output of `find . -name '*.txt'` into `grep`. That way, the input that grep sees is already filtered for filenames ending in `.txt`, so `sed` isn't needed to modify the regular expressions.
Anyway, try something like this:
First, some setup stuff for this experiment:
```
$ cat filelist.csv
test
foo

$ touch test test.txt foo foo.txt footest footest.txt

$ ls -l
total 4
-rw-r--r-- 1 cas cas 10 Sep  8 04:01 filelist.csv
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 foo
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 footest
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 footest.txt
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 foo.txt
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 test
-rw-r--r-- 1 cas cas  0 Sep  8 04:01 test.txt
```
Then use the bash built-in `mapfile` to populate an array called `out` with the output of find and grep:
```
$ mapfile -d '' out < \
    <(find . -type f -print0 |
        grep -z -f <(sed -e 's/^\(.*\)/.*\1\.txt$/' filelist.csv))
```
or:
```
$ mapfile -d '' out < \
    <(find . -type f -name '*.txt' -print0 |
        grep -z -f filelist.csv)
```
The results:
```
$ declare -p out
declare -a out=([0]="./foo.txt" [1]="./footest.txt" [2]="./test.txt")

$ ls -l "${out[@]}"
-rw-r--r-- 1 cas cas 0 Sep  8 04:01 ./footest.txt
-rw-r--r-- 1 cas cas 0 Sep  8 04:01 ./foo.txt
-rw-r--r-- 1 cas cas 0 Sep  8 04:01 ./test.txt
```
Note how the `out` array only contains `foo.txt`, `footest.txt`, and `test.txt`, but not `foo`, `test`, or `footest`.
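For contrast: if you fed find's unfiltered output straight into `grep -z -f filelist.csv` (no sed transform, no `-name '*.txt'`), the bare patterns `test` and `foo` would match as substrings anywhere in the path, so you'd expect all six files to turn up (order will vary):

```
$ find . -type f -print0 | grep -z -f filelist.csv | tr '\0' '\n'
./footest.txt
./foo.txt
./footest
./foo
./test
./test.txt
```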
BTW, you can iterate over the filenames in the `out` array with something like:

```
for f in "${out[@]}"; do
    echo "$f"
    do-something-else-with "$f"
done
```
Or iterate over the indices of the array (0, 1, 2) rather than the values. Sometimes that is more useful, e.g. when you have two or more arrays with the same indices that you want to use together in some way, or when you need the index for some other purpose:

```
for i in "${!out[@]}"; do
    echo "${out[$i]}"
done
```
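That index form is handy with parallel arrays. For example (the `sizes` array here is made up just to illustrate the pairing):

```
sizes=(0 0 0)   # hypothetical second array sharing out's indices
for i in "${!out[@]}"; do
    printf '%s is %s bytes\n' "${out[$i]}" "${sizes[$i]}"
done
```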
Remember:
- Double-quote your variables (i.e. type `"$var"`, not just `$var`) when you don't want the shell to word-split them, expand globs, or act on shell metacharacters like `;` or `&` that may be in them. This is almost always. Rule of thumb: if you don't know exactly WHY you need to use a variable without double-quoting it in any particular situation, then double-quote it. Not quoting `$out` or `$filename` was the proximate cause of your initial problem (see the short demo after this list).
- Never assume filenames won't have annoying characters like spaces and newlines in them. These are perfectly valid characters in unix filenames, so your scripts will have to deal with them. In fact, the only character which can't appear anywhere in a path is NUL (`/` can't appear within a single filename, but can in a path).
- Always use NUL as the separator between arbitrary or unknown filenames. It's the only separator which works with any filename.
- There are lots of exceptions, but most of the time you should use an array when you want a variable to hold multiple values, not a whitespace-separated string or some similar way of faking an array. Especially when those values are filenames or anything else where your separator is a valid character in one of the values.
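A quick demo of the first point (the variable and its contents are invented for illustration):

```
$ f='foo bar'              # a value containing a space
$ printf '<%s>\n' $f       # unquoted: word-split into two arguments
<foo>
<bar>
$ printf '<%s>\n' "$f"     # quoted: passed as a single argument
<foo bar>
```

And for the NUL-separator points, a sketch of a NUL-safe read loop, if you'd rather process filenames one at a time as find produces them:

```
find . -type f -print0 |
    while IFS= read -r -d '' f; do
        printf 'found: %s\n' "$f"
    done
```

Note that the while loop runs in a subshell here, which is one reason this answer uses `mapfile` to capture the names into an array instead.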