
Background

I want to pass a list of filenames (listed via find) containing spaces to my custom python script. Therefore, I set up find to add quotes around each result:

find ./testdata -type f -printf "\"%p\" "

Result:

"./testdata/export (1).csv" "./testdata/export (2).csv" "./testdata/export (3).csv"

For the sake of this question, let's suppose my custom script (test.py) does the following:

#!/usr/bin/python3
import sys 


print(sys.argv)

Observations

Case 1:

Manually listing the quoted arguments.

Input: ./test.py "./testdata/export (1).csv" "./testdata/export (2).csv" "./testdata/export (3).csv"

Output: ['./test.py', './testdata/export (1).csv', './testdata/export (2).csv', './testdata/export (3).csv']

Case 2:

Using xargs

Input: find ./testdata -type f -printf "\"%p\" " | xargs ./test.py

Output: ['./test.py', './testdata/export (1).csv', './testdata/export (2).csv', './testdata/export (3).csv']

(I.e., output is the same as case 1)

Case 3:

Using backticks.

Input: ./test.py `find ./testdata -type f -printf "\"%p\" "`

Output: ['./test.py', '"./testdata/export', '(1).csv"', '"./testdata/export', '(2).csv"', '"./testdata/export', '(3).csv"']

Two things have changed:

  • "./testdata/export (1).csv" is now split into two separate arguments: "./testdata/export and (1).csv".
  • the quotes remained part of the arguments

Questions

  1. Why does the version with backticks behave differently?

  2. Is there a way to still include the quotes with the backticks? I.e., make them behave the same as with xargs?

Remark

I really can't imagine what is going on here. One logical explanation could have been that the output of the command in backticks is treated as one big argument. But then, why is it split at the whitespace?

So the next best explanation seems to be that every whitespace-separated string is treated as a separate argument, without any regard to quoting. Is this correct? And if so, why do backticks have this strange behaviour? I guess this is not what we would want most of the time...

Attilio
  • why not pass it via find's -exec? – Jeff Schaller Sep 05 '19 at 15:20
  • ^ That's how you actually should do it. To answer the immediate problem: xargs interprets quotes unless you use -d. The shell doesn't do quote removal on the output of command substitution. Hence the difference. The xargs behaviour is what I'd usually not want. See also https://unix.stackexchange.com/a/523809/70524 – muru Sep 05 '19 at 15:23

3 Answers

2

So the next best explanation seems to be that every white-space separated string is treated as a separate argument, without any regards to quoting. Is this correct?

Yes, see e.g. https://mywiki.wooledge.org/WordSplitting and Why does my shell script choke on whitespace or other special characters? and When is double-quoting necessary?

The shell processes quotes only when they are originally on the command line, and not the result of any expansions (like command substitution you're using here, or parameter expansion), and aren't themselves quoted.
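A minimal sketch of that rule, assuming Bash with the default $IFS. The count_args helper and the sample string are hypothetical, introduced only for illustration:

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

listing='"export (1).csv" "export (2).csv"'

# Quotes typed on the command line are shell syntax: two arguments.
typed_count=$(count_args "export (1).csv" "export (2).csv")

# Quotes that arrive through an unquoted expansion are ordinary characters;
# the value is split on whitespace only: four arguments.
expanded_count=$(count_args $listing)

# A quoted expansion is not split at all: one argument.
quoted_count=$(count_args "$listing")

echo "typed=$typed_count expanded=$expanded_count quoted=$quoted_count"
```

The same thing happens whether the text comes from a variable or from a command substitution such as the backticks in the question.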

And if so, why do backticks have this strange behaviour? I guess this is not what we would want most of the time...

Well, strangeness is relative. And what one wants in one case might not be at all what anyone wants in some other case.

But consider something like this:

a="blah blah"
somecmd -f "$a"

The way it works, is that somecmd gets as argument the string contained in the variable a, regardless of what it contains. This is similar to how it works in "real" programming languages, say subprocess.call(["somecmd", "-f", a]) in Python. Straightforward, clean and completely safe: no special characters in the variable can mess things up.

That's important if the string comes from outside the script, read from a file, entered by a user or as the result of a filename expansion.

echo "Please enter a filename: "
read -r a
somecmd -f "$a"

If the result of expansions was processed for quotes, then you couldn't enter Don't stop me now.mp3 as the filename, as there's an unpaired quote.

Also, should results of all expansions be processed for further expansions, too? Setting a to $(rm -rf $HOME).txt would then do some rather nasty things. Note that that's a perfectly valid filename, so it can come up as the result of a glob like *.txt.

I know, that's a bit of a hyperbole, since we could propose that only quotes and escapes should get processed after expansions, not any further expansions. Unpaired single-quotes would still be an issue, and $(find -printf "\"%p\"") still wouldn't work for filenames that contain double-quotes.

Probably something like that could be made to work, but the less silent magic processing there is, the less chance for accidents to happen. (And with the shell, I sometimes think we should be glad it's even this sane.)


But you're right, this means that there's no immediately obvious straightforward way to get a list of strings out of find to the shell. That's actually what you really want, a list of strings, like sys.argv in Python. Not quotes.

Here's what you can do:

find -print0 | xargs -0 ./test.py

-print0 asks find to print the filenames with a NUL byte as separator (instead of a newline), and -0 tells xargs to expect just that. This works since the NUL byte is the only thing that can't be contained in a filename. -print0 and -0 are found in at least GNU and FreeBSD.
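To see this on a small scale, here is a sketch that assumes a find with -print0 and an xargs with -0. The scratch directory and the inline sh command are stand-ins for ./testdata and ./test.py:

```shell
#!/bin/bash
# Hypothetical scratch directory holding names with spaces.
tmpdir=$(mktemp -d)
touch "$tmpdir/export (1).csv" "$tmpdir/export (2).csv" "$tmpdir/export (3).csv"

# Each NUL-terminated name becomes exactly one argument; the inline sh
# stands in for ./test.py and reports how many arguments it received.
nul_count=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "arguments received: $nul_count"
rm -rf "$tmpdir"
```

No quoting is added to the names at any point; the separator alone carries the boundaries.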

Or, in Bash:

mapfile -d '' files < <(find -print0)
./test.py "${files[@]}"

That's the same NUL-separated strings used with a process substitution and an array.
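A self-contained variant of the same idea, as a sketch only: it assumes Bash 4.4 or newer (for mapfile -d) and a find that supports -print0, and the file names are made up to include spaces and a newline:

```shell
#!/bin/bash
# Assumes bash >= 4.4 for "mapfile -d"; file names below are hypothetical.
tmpdir=$(mktemp -d)
touch "$tmpdir/export (1).csv" \
      "$tmpdir/two  spaces.csv" \
      "$tmpdir/line"$'\n'"break.csv"

# Read NUL-delimited names into an array, one element per file.
mapfile -d '' files < <(find "$tmpdir" -type f -print0)
file_count=${#files[@]}

printf 'got %d names\n' "$file_count"
# "${files[@]}" expands to one word per element, whitespace intact.
printf '<%s>\n' "${files[@]##*/}"

rm -rf "$tmpdir"
```

Because the list lives in an array, it can be reused for several commands, which is the case the -exec approach does not cover.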

Or, in Bash (with shopt -s globstar) and others that have a similar feature, and if you don't need to filter based on anything but the filename:

shopt -s globstar
./test.py ./testdata/**

** is like *, just recursive.

Or, with standard tools:

find -exec ./test.py {} +

This bypasses the whole issue by asking find to run test.py itself, without passing the list of filenames anywhere else. It doesn't help if you actually do need to store the list somewhere, though. Note the + at the end: -exec ./test.py {} \; would run test.py once for each file.
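The difference between the two terminators can be sketched like this, with a scratch directory and an inline sh command standing in for ./testdata and ./test.py:

```shell
#!/bin/bash
# Hypothetical scratch directory holding names with spaces.
tmpdir=$(mktemp -d)
touch "$tmpdir/export (1).csv" "$tmpdir/export (2).csv" "$tmpdir/export (3).csv"

# With "+", find batches the names: one invocation that sees all three.
plus_args=$(find "$tmpdir" -type f -exec sh -c 'echo "$#"' sh {} +)

# With "\;", the command runs once per file: three invocations, one name each.
each_runs=$(find "$tmpdir" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

echo "with +: $plus_args arguments in one run; with \\;: $each_runs runs"
rm -rf "$tmpdir"
```

With a very large number of files, the + form may still split the list across several invocations, much as xargs does.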

ilkkachu
  • Note that in bash5, backslashes are processed after unquoted expansions (as it became a glob operator). – Stéphane Chazelas Sep 05 '19 at 17:32
  • @StéphaneChazelas, yes, though it doesn't work in the sense of escaping whitespace, which I suppose people want when they try to output quoted filenames from a command substitution or such. – ilkkachu Sep 05 '19 at 17:59
0

You are losing the quotes due to shell expansion in the command substitution; you just need to quote it again. It's also recommended to use the $() form instead of backticks, as it makes your code more readable.

eval ./test.py "$(find ./testdata -type f  -printf "\"%p\" ")"

Updated: I have put eval in front so that this behaves like the other examples. eval makes the shell re-parse the quoted output, so Python receives separate arguments.

Adam D.
  • Did you try that? It makes the output of find one single argument. Which doesn't help at all. – ilkkachu Sep 05 '19 at 16:24
  • python would get one arg but they are quoted within. This is what it looks like: ['./her.py', '"./testdata/export (2).csv" "./testdata/export (3).csv" "./testdata/export (1).csv" '] – Adam D. Sep 05 '19 at 17:15
  • 1
    Which is completely different from ['./test.py', './testdata/export (1).csv', './testdata/export (2).csv', './testdata/export (3).csv'] and would require quote processing within the Python script, and would be incompatible with using a shell glob, e.g. ./test.py ./testdata/export*.csv – ilkkachu Sep 05 '19 at 17:18
  • With eval it would work. But it's not a good idea to run eval with untrusted input: it would also process any expansions in the file names, including command substitutions. A file called $(uname -a >&2) would produce unexpected output, and you can imagine other commands in place of uname. It'd be ok if you know there are no files with dollar signs or backticks in their names. Quotes would break it too. – ilkkachu Sep 05 '19 at 19:06
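The concern raised in the last comment can be reproduced without touching the filesystem. This is only a sketch: the two strings below are hypothetical stand-ins for what find -printf "\"%p\" " would print, and the embedded command is a harmless echo:

```shell
#!/bin/bash
# Benign names: eval re-parses the quotes and yields two clean arguments.
listing='"./testdata/export (1).csv" "./testdata/export (2).csv" '
eval "set -- $listing"
benign_count=$#
benign_first=$1

# A name containing a command substitution: eval runs it while re-parsing.
hostile='"$(echo INJECTED).txt" '
eval "set -- $hostile"
hostile_arg=$1

echo "benign: $benign_count args, first is <$benign_first>"
echo "hostile name became: <$hostile_arg>"
```

Here the substituted command only echoes a word, but any command could appear in its place, which is why eval is reasonable only when the names are known to be safe.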
0

xargs does its own special processing of its input.

It treats all sequences of newlines and blanks (at least space and tab, more in some implementations) as delimiters, ignores leading and trailing ones and handles quoting in its own special way: '...', "..." and \ can be used for quoting but in a different way from that of the sh syntax (both "..." and '...' are strong quotes but can't contain newline and \newline is a literal newline instead of a line continuation).

So on an input like:

   "foo \ bar" 'x'\
y

xargs generates two arguments: foo \ bar and x<newline>y.
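A simpler input shows the three quoting forms side by side. This is a sketch with a made-up input line; the inline sh command just prints each argument it receives in brackets:

```shell
#!/bin/bash
# Hypothetical input line using double quotes, single quotes and a backslash.
input="\"a b\" 'c d' e\\ f"

# xargs applies its own quote processing before building the argument list.
xargs_view=$(printf '%s\n' "$input" | xargs sh -c 'printf "[%s]" "$@"' sh)

echo "input line: $input"
echo "xargs saw:  $xargs_view"
```

Each quoted or escaped group arrives as a single argument with the quoting characters removed.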

Leaving a command substitution (both the archaic `...` and modern $(...) forms) unquoted in list contexts in POSIX shells is the split+glob operator. The input is split on $IFS characters using complex rules and the resulting words are subject to filename generation. There is no quote handling at all.

On an input like

  "a* b"

With the default value of $IFS (SPC, TAB, NL), it generates a "a* word, which is further expanded to the list of filenames in the current directory that start with "a, and a b" word.
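This can be tried in a scratch directory. The sketch below assumes Bash with the default $IFS, globbing enabled, and a filesystem that allows a double quote in file names; the two file names are made up for the demonstration:

```shell
#!/bin/bash
# Hypothetical scratch directory with two files whose names start with "a
tmpdir=$(mktemp -d)
touch "$tmpdir/\"a1" "$tmpdir/\"a2"

input='"a* b"'

# Unquoted expansion: split into the words "a* and b", then "a* is globbed.
result=$(cd "$tmpdir" && printf '[%s]' $input)

echo "$result"
rm -rf "$tmpdir"
```

The quote characters take part in the glob pattern as ordinary characters, and the b" word stays as it is because it contains no wildcard.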

A command line like:

cmd "a* b"
cmd2 "x\"y"

is code in the shell syntax. In that syntax, blanks, newlines and quotes also have a special meaning, but they are interpreted differently from the way xargs interprets them. The code above is parsed as two commands, since a newline separates commands. cmd "a* b" is parsed as two words, cmd and a* b, since space separates words and "..." is a shell quoting operator that prevents the * and SPC within from being treated specially, and so on.

To do tokenisation the same way as the shell does, zsh has a z parameter expansion flag (note that zsh is not POSIX by default, in that it does only split and not split+glob upon unquoted command substitutions in list contexts), and also a Q parameter expansion flag to remove one layer of quoting. In that shell you can do:

output_of_cmd=$(find...) # no split+glob here as we're assigning to
                         # scalar variable. It's not a list context

words=("${(Q@)${(z)output_of_cmd}}") # array assignment
your-app "${words[@]}"