Isn't there some way to protect spaces in backtick (or $(...)) expansion?
No, there isn't. Why is that?
Bash has no way of knowing what should be protected and what shouldn't.
There are no arrays in a Unix file or pipe; it's just a byte stream. The command inside the `` or $() outputs a stream, which bash swallows and treats as a single string. At that point, you have only two choices: put it in quotes, to keep it as one string, or leave it unquoted, so that bash splits it up according to its configured behavior (the value of IFS).
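A minimal sketch of those two choices. The function list_names is a made-up stand-in for whatever command sits inside the $(...); it prints two names, one of which contains a space.

```bash
# Hypothetical stand-in for a command that prints two names, one with a space.
list_names() { printf '%s\n' 'a file.txt' 'b.txt'; }

# Quoted: the whole output stays one string (trailing newline stripped).
set -- "$(list_names)"
quoted_count=$#

# Unquoted: the output is split on $IFS (space, tab, newline by default),
# so 'a file.txt' is broken into two words.
set -- $(list_names)
unquoted_count=$#

echo "quoted: $quoted_count, unquoted: $unquoted_count"
# prints "quoted: 1, unquoted: 3"
```

Neither count is the 2 you actually wanted, which is the whole problem.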
So what you have to do if you want an array is to define a byte format that has an array, and that's what tools like xargs and find do: if you run xargs with the -0 argument and find with -print0, they work according to a binary array format which terminates elements with the null byte, adding semantics to the otherwise opaque byte stream.
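To make the difference concrete, here is a small sketch. The scratch directory and the file names are made up for the demo, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do; they are not in POSIX).

```bash
# Scratch directory with awkward names (all invented for this demo).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/with
newline"

# Newline-delimited: a consumer cannot tell a separator from a newline
# that is part of a name, so three files show up as four lines.
line_count=$(find "$demo_dir" -type f | wc -l)

# Null-delimited: xargs -0 hands each name to the command as one argument.
arg_count=$(find "$demo_dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "lines: $line_count, arguments: $arg_count"
rm -rf "$demo_dir"
```

The null byte works as a terminator because it is the one byte that cannot appear in a file name or in a command-line argument.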
Unfortunately, bash cannot be configured to split strings on the null byte. Thanks to https://unix.stackexchange.com/a/110108/17980 for showing us that zsh can.
xargs
You want your command to run once, and you said that xargs -0 -n 10000 solves your problem. It doesn't; it only ensures that if you have more than 10000 parameters, your command will run more than once.
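The same effect on a small scale, as a sketch with -n 2 standing in for -n 10000 and echo standing in for your command:

```bash
# Five null-terminated arguments, at most two per invocation:
# xargs has to run echo three times (a b / c d / e).
runs=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo | wc -l)
echo "echo was run $runs times"
```

Each output line comes from a separate invocation of echo, so -n is an upper bound per run, not a guarantee of a single run.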
If you want to make it strictly either run once or fail, you have to provide the -x argument and an -n argument larger than the -s argument (really: large enough that a whole bunch of zero-length arguments plus the name of the command do not fit in the -s size). (man xargs, see excerpt far below)
The system I'm currently on has a stack limited to about 8M, so here's my limit:
$ printf '%s\0' -- {1..1302582} | xargs -x0n 2076858 -s 2076858 /bin/true
xargs: argument list too long
$ printf '%s\0' -- {1..1302581} | xargs -x0n 2076858 -s 2076858 /bin/true
(no output)
bash
If you don't want to involve an external command, the while-read loop feeding an array, as shown in https://unix.stackexchange.com/a/110108/17980, is the only way for bash to split things at the null byte.
The idea to source the script ( . ... "$@" ) to avoid the stack size limit is cool (I tried it, it works!), but probably not important for normal situations.
Using a special fd for the process pipe is important if you want to read something else from stdin, but otherwise you won't need it.
So, the simplest "native" way, for everyday household needs:
files=()
while IFS= read -rd '' file; do
    files+=("$file")
done < <(find ... -print0)
myscriptornonscript "${files[@]}"
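For trying this out, here is a self-contained sketch of the same loop. The scratch directory and its file names are invented for the demo, and it reads from fd 3 to show the "special fd" variant mentioned above; with plain stdin you would drop the -u 3 and the 3 before the redirection. It needs bash (read -d and process substitution are not POSIX sh).

```bash
# Scratch directory with awkward names (all invented for this demo).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/with"$'\n'"newline"

files=()
# -d '' makes read stop at each null byte; -u 3 reads from fd 3,
# so stdin stays free for anything that runs inside the loop.
while IFS= read -r -d '' -u 3 file; do
    files+=("$file")
done 3< <(find "$demo_dir" -type f -print0)

echo "collected ${#files[@]} names"
printf '<%s>\n' "${files[@]}"
rm -rf "$demo_dir"
```

The process substitution matters: piping find into the loop would run the loop in a subshell, and the array would be empty afterwards.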
If you like your process tree clean and nice to look at, this method allows you to do exec mynonscript "${files[@]}", which removes the bash process from memory, replacing it with the called command. xargs will always remain in memory while the called command runs, even if the command is only going to run once.
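A quick way to convince yourself that exec really replaces the shell, sketched with sh -c standing in for the called command: the PID printed before the exec and the PID printed by the replacement program are the same.

```bash
# The inner bash prints its PID, then execs sh, which prints its own PID.
# Because exec replaces the process instead of forking, both lines match.
pids=$(bash -c 'echo $$; exec sh -c "echo \$\$"')
first=$(printf '%s\n' "$pids" | sed -n 1p)
second=$(printf '%s\n' "$pids" | sed -n 2p)
echo "before exec: $first, after exec: $second"
```

Without the exec, sh would be a child of bash and the two numbers would differ.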
What speaks against the native bash method is this:
$ time { printf '%s\0' -- {1..1302581} | xargs -x0n 2076858 -s 2076858 /bin/true; }
real 0m2.014s
user 0m2.008s
sys 0m0.172s
$ time {
args=()
while IFS= read -rd '' arg; do
args+=( "$arg" )
done < <(printf '%s\0' -- $(echo {1..1302581}))
/bin/true "${args[@]}"
}
bash: /bin/true: Argument list too long
real 107m51.876s
user 107m38.532s
sys 0m7.940s
bash is not optimized for array handling.
man xargs:
-n max-args
Use at most max-args arguments per command line. Fewer than
max-args arguments will be used if the size (see the -s option)
is exceeded, unless the -x option is given, in which case xargs
will exit.
-s max-chars
Use at most max-chars characters per command line, including
the command and initial-arguments and the terminating nulls at
the ends of the argument strings. The largest allowed value is
system-dependent, and is calculated as the argument length limit
for exec, less the size of your environment, less 2048 bytes of
headroom. If this value is more than 128KiB, 128Kib is used as
the default value; otherwise, the default value is the maximum.
1KiB is 1024 bytes.
-x
Exit if the size (see the -s option) is exceeded.
IFS=", newline,"). But is there a need to execute the script over all the filenames? If not, consider using find itself to execute the script for each file. – njsg Jan 19 '14 at 23:30