When I run the following script:

#!/bin/bash
while IFS= read -r -d '' file; do
    files+=$file
done < <(find -type f -name '*.c' -print0)
echo "${files[@]}"

I do not get the same result as this one:

#!/bin/bash
find_args="-type f '*.c' -print0"
while IFS= read -r -d '' file; do
    files+=$file
done < <(find $find_args)
echo "${files[@]}"

How can I fix the second scenario to be equivalent to the first one?

My understanding is that, because the single quotes are inside double quotes, they get escaped, which produces a bad expansion that looks something like this:

find -type f -name ''\''*.c'\'' -print0

3 Answers


BLayer's answer is correct, but to deconstruct what's really happening here (ignoring the typo of the missing -name primary):

#!/bin/bash
while IFS= read -r -d '' file; do
    files+=$file
done < <(find -type f -name '*.c' -print0)
echo "${files[@]}"

In the shell started by process substitution (<(...)), the following command is parsed by bash:

find -type f -name '*.c' -print0

Because the glob *.c is quoted, bash does not expand it. However, the single quotes are stripped off. So when the find process starts, what it sees as its argument list is:

-type
f
-name
*.c
-print0

Note that find receives these as five separate arguments: each one is its own null-terminated string, not part of a single string split on spaces or newlines. This happens at the C level, not at the shell level; it is how programs are started with execve().
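One way to check this argument list for yourself is to hand the same words to a small helper that prints each argument it receives on its own line. This is only a sketch: show_args is a hypothetical helper name, and a shell function stands in for a separately executed program, on the assumption that the words it receives are built the same way.

```shell
#!/bin/bash
# show_args is a hypothetical helper (not from the question): it prints each
# argument it receives on its own line, wrapped in angle brackets.
show_args() {
    printf '<%s>\n' "$@"
}

# The single quotes stop the glob from expanding and are then stripped,
# so the helper receives the bare pattern *.c as its fourth argument.
direct_args=$(show_args -type f -name '*.c' -print0)
printf '%s\n' "$direct_args"
```

The fourth line of output is <*.c>, without quotes.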

Now to contrast, in the following snippet:

#!/bin/bash
find_args="-type f -name '*.c' -print0"
while IFS= read -r -d '' file; do
    files+=$file
done < <(find $find_args)
echo "${files[@]}"

The value of the variable find_args is set to:

-type f -name '*.c' -print0

(The double quote marks are not part of the value, but the single quote characters are.)

When the command find $find_args is run, as per man bash, the token $find_args is subject to parameter expansion followed by word splitting followed by pathname expansion (a.k.a. glob expansion).

After parameter expansion, you have -type f -name '*.c' -print0. Quote removal only strips quote characters that were typed in the command itself, not ones that result from an expansion, so these single quotes will not be removed.

After word splitting, you have the following as separate words:

-type
f
-name
'*.c'
-print0

Then comes pathname expansion. Of course '*.c' isn't likely to match anything as you don't usually put single quotes in your filenames, so the result would likely be that '*.c' will be passed as a literal pattern to find, and thus the -name primary will fail on all files. (It would succeed only if there is a file whose name starts with a single quote and ends with the three characters .c')
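The effect of word splitting on the variable can be seen with a small argument-printing helper. A minimal sketch, assuming the default IFS and that nullglob is not set; show_args and the scratch directory are hypothetical names introduced only for this illustration.

```shell
#!/bin/bash
# show_args is a hypothetical helper: prints each argument on its own line.
show_args() {
    printf '<%s>\n' "$@"
}

find_args="-type f -name '*.c' -print0"

# Run in an empty scratch directory so that the glob '*.c' matches nothing
# and is passed on unchanged.
scratch=$(mktemp -d)
split_args=$(cd "$scratch" && show_args $find_args)   # deliberately unquoted
rmdir "$scratch"
printf '%s\n' "$split_args"
```

Here the fourth line of output is <'*.c'>: the single quotes are still part of the word, so find would look for names that contain them.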


Edit: Actually, if there is such a file, the glob '*.c' will expand to match that file and any other such files and then the expansion [the actual file name] will be passed to find as a pattern. So whether the -print0 primary will ever be reached or not depends on (a) whether there is only one such filename, and (b) whether that filename, interpreted as a glob, matches itself.

Examples:

If you run touch "'something.c'", then the glob '*.c' will expand to 'something.c', and then the find primary -name 'something.c' will match that file as well and it will be printed.

If you run touch "'namewithcharset[a].c'", the glob '*.c' will be expanded to that by the shell, but the find primary -name 'namewithcharset[a].c' will not match itself—it would only match 'namewithcharseta.c', which doesn't exist—so -print0 would not be reached.

If you run touch "'x.c'" "'y.c'", the glob '*.c' will expand to both filenames, which will cause an error to be output from find because 'y.c' isn't a valid primary (and it can't be as it doesn't start with a hyphen).
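The first and third examples can be reproduced without involving find at all, by printing the words that find would have received. This is a sketch under the same assumptions as the question (default IFS, nullglob not set); show_args and scratch are hypothetical names used only here.

```shell
#!/bin/bash
# show_args is a hypothetical helper: prints each argument on its own line.
show_args() {
    printf '<%s>\n' "$@"
}

find_args="-type f -name '*.c' -print0"
scratch=$(mktemp -d)

# One matching file: the glob '*.c' is replaced by that file's name.
one_match=$(cd "$scratch" && touch "'something.c'" && show_args $find_args)
rm "$scratch/'something.c'"

# Two matching files: the glob expands to two words, so find would see an
# extra argument after the -name pattern.
many_matches=$(cd "$scratch" && touch "'x.c'" "'y.c'" && show_args $find_args)
rm -r "$scratch"

printf 'one match:\n%s\nseveral matches:\n%s\n' "$one_match" "$many_matches"
```

In the second case the words after -name are 'x.c' and 'y.c', which is why find complains.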


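If the nullglob option is set, you'll get different behavior: when nothing matches, the word '*.c' is removed altogether instead of being passed on literally, so -name takes -print0 as its pattern.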
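A quick way to see what nullglob does to the argument list is to print the words after expansion in a directory where nothing matches. A minimal sketch; show_args and scratch are hypothetical names, and the option is enabled only inside the command substitution.

```shell
#!/bin/bash
# show_args is a hypothetical helper: prints each argument on its own line.
show_args() {
    printf '<%s>\n' "$@"
}

find_args="-type f -name '*.c' -print0"

# Empty scratch directory: the glob '*.c' has nothing to match.
scratch=$(mktemp -d)
nullglob_args=$(cd "$scratch" && shopt -s nullglob && show_args $find_args)
rmdir "$scratch"
printf '%s\n' "$nullglob_args"
```

Only four words are left, and -print0 directly follows -name.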

Wildcard
  • 36,499

(Note, you have a typo. You left off the -name flag in the second example.)

One approach is to put the args in an array and pass the array appropriately to find...

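#!/bin/bash
# Keep each find argument as a separate array element.
find_args=(-type f -name '*.c' -print0)
files=()
while IFS= read -r -d '' file; do
    files+=("$file")    # append a new element rather than concatenating strings
done < <(find "${find_args[@]}")
echo "${files[@]}"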

The format "${foo[@]}" (with the double quotes) expands to all of the elements of the array, each as an individual word (rather than expanding to a single string). This is closer in intent to the original script.
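The difference between the per-element and single-string expansions can be checked by counting the words each one produces. A small sketch; count_args is a hypothetical helper introduced for this illustration.

```shell
#!/bin/bash
find_args=(-type f -name '*.c' -print0)

# count_args is a hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

elements=$(count_args "${find_args[@]}")   # one word per array element
joined=$(count_args "${find_args[*]}")     # all elements joined into one word
printf '[@] gives %s words, [*] gives %s word\n' "$elements" "$joined"
```

The pattern is stored in the array without its quotes, so find receives *.c exactly as in the first script.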

B Layer
  • 5,171

In addition to what has already been said, you need to:

  • declare the files variable as an array and append with the var+=(something) syntax. By default the variable is a scalar, and var+=something on a scalar does string concatenation (or arithmetic addition if the scalar has been given the integer attribute). In bash, var+=something on an array does not add an element either: it appends the string to element 0. The var+=(something) syntax always adds an element, and automatically converts a scalar variable to an array.
  • initialise the variable (as unset or to an empty list), as otherwise you may inherit an initial value from the environment.

Doing:

files=()
while ...
  files+=("$file")
done

would be enough unless the files variable has been declared as an associative array earlier in the script (in which case files+=something would be like files["0"]+=something and files+=("$file") would be an error).
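How the += forms behave in bash can be checked directly. A minimal sketch; the variable names scalar, list and first_only are made up for this illustration.

```shell
#!/bin/bash
# += on a scalar concatenates strings.
scalar=
scalar+=a.c
scalar+=b.c            # scalar is now "a.cb.c"

# +=( ... ) on an array appends a new element each time.
list=()
list+=("a.c")
list+=("b.c")          # list now has two elements

# += without parentheses on an array appends to element 0 in bash.
first_only=()
first_only+=a.c
first_only+=b.c        # still one element, holding "a.cb.c"

printf '%s\n' "$scalar" "${#list[@]}" "${#first_only[@]}" "${first_only[0]}"
```

Only the parenthesised form keeps the file names apart, which matters as soon as the array is expanded with "${files[@]}".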

If you can't guarantee that files hasn't been defined as an associative array earlier in the script, you may need:

typeset -a files=()

instead, though that would have the side effect of limiting the scope of the variable to the enclosing function. typeset -ga files=() doesn't work properly as a workaround for that in bash, as it would declare the variable in the global scope. unset files; files=() might not work either, as unset files may in some cases reveal a files variable from an outer scope (which may be an associative array) instead of leaving the name unset.
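The unset caveat can be illustrated with two nested functions. This is a sketch that assumes bash's default options (in particular, that the localvar_unset shell option is off); the function names outer and inner are hypothetical.

```shell
#!/bin/bash
files="outer value"

inner() {
    # files is not local to inner, so this unsets the caller's local variable
    # and uncovers the global one that it was shadowing.
    unset files
    echo "${files-<unset>}"
}

outer() {
    local files="local value"
    inner
}

revealed=$(outer)
printf '%s\n' "$revealed"
```

Under those assumptions the variable is not reported as unset after the call to unset: the global value shows through, which is the situation the paragraph above warns about.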