4

I need to pass an array of filenames to a command, preserving proper quoting. So far, so good. Unfortunately the command is actually a sub-command that is, in turn, invoked by another command. Concretely, the command is:

git filter-branch --index-filter \
    'git rm -rf --cached ‹file1› ‹file2›…' \
    HEAD

For simplicity, I’m going to replace this in the following by a simpler command that exhibits the same problem:

printf '%s\n' 'cmd file1 file2…'

Now I’ve got an array files=('a b' c). My desired result is that the above command prints in a single line, and individually quotes every token after cmd as necessary (e.g. when there’s a space).

It works if I manually expand and quote the file names:

$ printf '%s\n' 'cmd '\''a b'\'' c'
→ cmd 'a b' c

(Alternatively I could mix single and double quotes to achieve the same result.)

But it no longer works if I am trying to pass an array:

  1. $ (set -x; printf '%s\n' "cmd '${files[@]}'")
    + printf '%s\n' 'cmd '\''a b' 'c'\'''
    → cmd 'a b
    c'
    
  2. $ (set -x; printf '%s\n' 'cmd '\'"${files[@]}"\')
    + printf '%s\n' 'cmd '\''a b' 'c'\'''
    → cmd 'a b
    c'
    
  3. $ (set -x; printf '%s\n' 'cmd '"${files[@]}")
    + printf '%s\n' 'cmd a b' c
    → cmd a b
    c
    

I’m not surprised (3) doesn’t work (and it’s only included for completeness). Based on the output of set -x, the shell correctly quotes the individual array elements in (1) and (2) and it even puts escaped quotes around the whole thing. But then it breaks apart the individually quoted items. Is there a way to prevent this?


Incidentally, Shellcheck (SC2145) suggests replacing the [@] part by [*] in the above. This obviously breaks for filenames with spaces.
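
To make the splitting visible, here is a minimal sketch that counts how many arguments a command actually receives. The helper name `count_args` is made up for illustration, and the positional parameters stand in for the `files` array so that the snippet runs in any POSIX-style shell.

```shell
# Hypothetical helper: report how many separate arguments were received.
count_args() { echo "$#"; }

set -- 'a b' c                      # stand-in for files=('a b' c)

split=$(count_args "cmd '$@'")      # "$@" yields one word per element
joined=$(count_args "cmd '$*'")     # "$*" joins the elements into one word
joined_text=$(printf '%s\n' "cmd '$*'")

echo "with \$@: $split argument(s)"
echo "with \$*: $joined argument(s): $joined_text"
```

With `$@` the command receives two arguments; with `$*` it receives one, but the boundary between `a b` and `c` is lost inside the joined string.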

Jeff Schaller
  • Defining gitCmd=( git filter-branch --index-filter 'git rm -rf --cached [MAGIC]' HEAD ) and doing an array expansion "${gitCmd[@]}" doesn't work? It preserves the quoted expressions as defined. You can see the part within the quotes is not broken and preserved – Inian May 09 '19 at 16:30

4 Answers

5
  1. Instead of an array, use set -- file1 file2 ... to fill the parameter list, then use bash parameter transformation with the Quote operator:

    set -- 'a "b' c "d 'e" "f 'g "'"h' ; (set -x; printf 'cmd %s\n' "${*@Q}")
    

    Output:

    + printf 'cmd %s\n' ''\''a "b'\'' '\''c'\'' '\''d '\''\'\'''\''e'\'' '\''f '\''\'\'''\''g "h'\'''
    cmd 'a "b' 'c' 'd '\''e' 'f '\''g "h'
    

    Or, if we remove the set -x; part, the output becomes:

    cmd 'a "b' 'c' 'd '\''e' 'f '\''g "h'
    
  2. A comment from LL3 suggests a better way that doesn't require set -- ...:

    export x; n=(a "b 'c"); x="${n[@]@Q}"
    ( n=($x); printf 'cmd %s\n' "${n[*]}"; )
    

    Simple version:

    n=(a "b 'c"); echo "cmd ${n[@]@Q}"
    

    Output:

    cmd 'a' 'b '\''c'
    
  3. Yet another method is to use bash parameter transformation with the Assignment operator (which also needs an eval):

    export x;n=(a b 'c d');x="${n[@]@A}"; (eval "$x";printf '%s\n' "${n[@]}")
    

    Output showing what printf sees:

    a
    b
    c d
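
    As a rough sketch of how the Quote operator could be applied to the original problem, the following builds the sub-command as one string and then evaluates it. It assumes bash 4.4 or later; `eval` only stands in for whatever evaluates the string later, and the sample file names are made up. As noted elsewhere in this thread, `@Q` may emit `$'...'` for some inputs, which a plain `/bin/sh` might not understand.

```shell
# Requires bash 4.4 or later for the ${...@Q} transformation.
files=('a b' c "d 'e")

# "${files[*]@Q}" quotes every element and joins them with spaces,
# so the whole sub-command stays a single string.
filter="printf '<%s>\n' ${files[*]@Q}"

# eval stands in for the program that later evaluates the string.
result=$(eval "$filter")
printf '%s\n' "$filter" "$result"
```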
    
milahu
agc
  • 1
    Uh that's cool! I didn't know of that parameter expansion. I think you could then use it from the array itself and expand it directly on the command ? Something like cmd "${files[*]@Q}" (thus without passing through set -- etc.) – LL3 May 10 '19 at 00:53
  • @LL3 That's a good idea. See revised answer. – agc May 10 '19 at 01:13
  • My real use case is a parameter array rather than a regular array anyway, so I wouldn’t need set -- either way, but having these solutions is brilliant. – Konrad Rudolph May 10 '19 at 08:22
  • Actually when trying to use this with my actual command I’m getting a “bad substitution” error: git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch'"${*@D}" HEAD — any idea why? This is on GNU bash 4.4.19(1) – Konrad Rudolph May 10 '19 at 14:48
  • @LL3 D was a typo, same result with Q. As mentioned previously, my “array variable” is actually the (function’s) argument array, $@. I can of course assign it to an array variable and use that but I thought it should work directly. I did try "${@@Q}", which didn’t work: It still splits the printf command line. – Konrad Rudolph May 10 '19 at 17:04
  • 1
    @KonradRudolph, As a diagnostic please try: set -- a b 'c d' ; (echo ${@@Q}); it should print 'a' 'b' 'c d'. – agc May 10 '19 at 17:12
  • @agc That works. But it seems that putting additional quotes around ${@@Q} is simply disregarded. – Konrad Rudolph May 10 '19 at 17:46
  • 1
    "Instead of an array, (which can't be passed to a subshell)" -- Of course you can pass an array to a subshell, something like a=('a b' c); (printf "cmd "; printf "%s " "${a[@]@Q}"; printf "\n") works just fine and prints cmd 'a b' 'c', which seems to be what is required. You can't pass an array through the environment to another process, though. But I'm not sure you need to do that here, whatever the argument to --index-filter is going to be, it can be made a single string in the main shell. – ilkkachu May 10 '19 at 21:41
  • @ilkkachu, Your comment might make a good answer. And I might be confused about the distinction between a subshell and another process; if you know of some question which explains the distinction, please post the URL. – agc May 10 '19 at 22:14
  • 1
    bonus points: add quotes only when needed. ${n[@]@Q} will always add single quotes – milahu Apr 19 '22 at 16:47
3

git filter-branch runs /bin/sh /usr/lib/git-core/git-filter-branch and that script evaluates the argument of --index-filter using eval.

So that argument is evaluated as /bin/sh code.

On most systems, /bin/sh will be more or less an interpreter of the POSIX sh language, though in a few like Solaris 10 and older, it could still be the ancient Bourne sh language instead.

When it comes to quoting syntax, it makes little difference though.

In any case, none of the ksh/bash/zsh extended quoting operators like $'...' can be used. What that means is that you can't use GNU/bash/zsh/ksh printf %q or mksh/bash ${var@Q} operator, or the xtrace tracing to generate the quoting as those resort to $'...' in some cases. They also use some forms of quoting that are not localisation-safe (like \).

One builtin quoting operator you could use is zsh's qq parameter expansion flag as it uses single quotes:

files=(foo 'a b c' $'a\nb\nc' --foo-- "a'b")
git filter-branch --index-filter "git rm -rf --cached -- ${${(@qq)files}}" HEAD

To see how zsh quotes those:

$ printf '<%s>\n' "${${(@qq)files}}"
<'foo' 'a b c' 'a
b
c' '--foo--' 'a'\''b'>

With bash/ksh/yash/zsh, you could do that same quoting using a function like:

shquote() {
  LC_ALL=C awk -v q=\' '
    BEGIN{
      for (i=1; i<ARGC; i++) {
        gsub(q, q "\\" q q, ARGV[i])
        printf "%s ", q ARGV[i] q
      }
      print ""
    }' "$@"
}

And then:

git filter-branch --index-filter "git rm -rf --cached -- $(shquote "${files[@]}")" HEAD
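
If awk is not wanted, the same single-quote-only quoting can be sketched with POSIX parameter expansion alone. The helper name `sq` is hypothetical, and `sh -c` merely stands in for the `eval` performed by the filter-branch script.

```shell
# Hypothetical helper (the name sq is not from the thread): print each
# argument wrapped in single quotes, with every embedded ' rewritten as '\''.
# POSIX sh has no local variables, so arg and out are global here.
sq() {
  for arg in "$@"; do
    out=
    while :; do
      case $arg in
        *\'*)
          out=$out${arg%%\'*}"'\\''"   # text before the first ' plus '\''
          arg=${arg#*\'}               # continue after that '
          ;;
        *)
          out=$out$arg
          break
          ;;
      esac
    done
    printf "'%s' " "$out"
  done
}

set -- "a'b" 'c d' '--x--'       # sample file names
quoted=$(sq "$@")

# sh -c stands in for the eval inside git-filter-branch.
back=$(sh -c "printf '<%s>' $quoted")
printf '%s\n' "$quoted" "$back"
```

The second line printed should show the three original names between angle brackets, which suggests the quoted string survives one round of shell parsing.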
2

In Zsh there are several options for quoting. The best are the (q+) or (q-) expansion flags documented in zshall(1). These add fewer unnecessary characters:

$ cmd=(ssh localhost "echo hi > t")

$ newcmd=(sh -c "${${(q@)cmd}}"); echo "${${(q@)newcmd}}"
sh -c ssh\ localhost\ echo\\\ hi\\\ \\\>\\\ t

$ newcmd=(sh -c "${${(qq@)cmd}}"); echo "${${(qq@)newcmd}}"
'sh' '-c' ''\''ssh'\'' '\''localhost'\'' '\''echo hi > t'\'''

$ newcmd=(sh -c "${${(q-@)cmd}}"); echo "${${(q-@)newcmd}}"
sh -c 'ssh localhost '\''echo hi > t'\'

$ newcmd=(sh -c "${${(qqqq@)cmd}}"); echo "${${(qqqq@)newcmd}}"
$'sh' $'-c' $'$\'ssh\' $\'localhost\' $\'echo hi > t\''

As for the syntax of "${${(q@)cmd}}", the q (or qq, q- etc.) causes escaping or quoting to be applied. The @ causes this escaping to be applied to each element of the array cmd. The outer ${...} seems to be equivalent to ${(j: :)...}, i.e. joining with spaces. The double quotes are needed so that the result is not split again.

Unfortunately all of the quoting mechanisms in Zsh and Bash are exponential in quote depth for some inputs.

Here is an example showing the growth rate for the various quote expansion operators (code below):

q: (1) 6; (2) 14; (3) 24; (4) 42; (5) 76; (6) 142; (7) 272; (8) 530; 
qq: (1) 5; (2) 15; (3) 43; (4) 125; (5) 369; (6) 1099; (7) 3287; (8) 9849; 
qqq: (1) 5; (2) 13; (3) 25; (4) 45; (5) 81; (6) 149; (7) 281; (8) 541; 
qqqq: (1) 6; (2) 14; (3) 24; (4) 39; (5) 64; (6) 109; (7) 194; (8) 359; 
q-: (1) 5; (2) 15; (3) 39; (4) 97; (5) 237; (6) 575; (7) 1391; (8) 3361; 
q+: (1) 6; (2) 16; (3) 40; (4) 98; (5) 238; (6) 576; (7) 1392; (8) 3362; 

Strangely qqqq grows the most slowly, even though it doesn't start to fall behind until the 4th level of nesting.
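
The growth is easy to reproduce outside Zsh as well. The sketch below is not the experiment above: it starts from a single `'` character and applies plain POSIX single-quote quoting (each `'` becomes `'\''`) four times using sed, recording the length after each level.

```shell
s="'"          # worst case for single-quote quoting: the quote itself
lengths=
for level in 1 2 3 4; do
  # Rewrite every ' as '\'' and wrap the result in a new pair of quotes.
  s="'$(printf '%s\n' "$s" | sed "s/'/'\\\\''/g")'"
  lengths="$lengths${#s} "
done
echo "$lengths"   # prints: 6 23 76 237
```

Each `'` turns into four characters that contain three quotes, so the number of quotes roughly triples at every level.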

Tcl is a great language that has nested quote operators with linear growth properties (see item 6 under man tcl).

Here is the code for the experiments. I used $'\t' as the initial string because it gives different lengths for q+ and q-.

f (){
  flag=$1
  echo -n "$flag: "
  str=$'\t'
  for i in $(seq 1 10); do
    eval 'str=\"${${('$flag'@)str}}\"'
    N=$(echo -n $str | wc -c)
    echo -n "($i) $N; "
  done
  echo
}
f q
f qq
f qqq
f qqqq
f q-
f q+
Metamorphic
0
$ foo=(1 2 '3 4' 4 5)
$ printf "'%s'\n" "${foo[@]}"
'1'
'2'
'3 4'
'4'
'5'
$ subcommand() { printf "'%s'\n" "$@"; }
$ subcommand "${foo[@]}"
'1'
'2'
'3 4'
'4'
'5'

So let's adapt this to your specific use-case:

git filter-branch --index-filter \
    'git rm -rf --cached file1 file2 […]' \
    HEAD

In your case we need to be a little more creative though, and break things up into smaller pieces.

git filter-branch --index-filter  \
    'git rm -rf --cached [MAGIC]' \
    HEAD

The file list we're creating is where you need the "magic" to happen. The rest is all static, yes? And since you're scripting this, you don't need it to be on three lines, which simplifies things:

git filter-branch --index-filter 'git rm -rf --cached [MAGIC]' HEAD

So:

prefix="git filter-branch --index-filter 'git rm -rf --cached "
postfix="' HEAD"
magic="$(printf '"%s" ' "${files[@]}")"

And then we execute the assembled string with eval, so that the shell actually parses the quotes embedded in it:

eval "${prefix}${magic}${postfix}"

This assembles your command, albeit using "s to enclose your filenames rather than 's, since the filter-branch command is in 's already.
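
One point worth checking with any approach that stores a command in a string: quote characters inside a variable are not parsed again when the variable is merely expanded. The following sketch, using a made-up `count_args` helper, compares plain expansion with `eval` in a POSIX-style shell such as bash or dash.

```shell
# Hypothetical helper: report how many separate arguments were received.
count_args() { echo "$#"; }

cmd="count_args 'a b' c"

plain=$($cmd)         # unquoted expansion: split on whitespace only
evald=$(eval "$cmd")  # eval re-parses the string, so 'a b' is one word
echo "plain: $plain, eval: $evald"   # prints: plain: 3, eval: 2
```

With plain expansion the quote characters are passed through literally as part of the words `'a` and `b'`, which is why an assembled command string generally has to go through `eval` or `sh -c`.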

DopeGhoti
  • Unfortunately this doesn’t help me … In the printf code, I need all items in one line, properly quoted. Consider my actual use-case: your command would break the git filter-branch call. – Konrad Rudolph May 09 '19 at 16:05
  • So don't use \n, that was just for demonstrative purposes. e. g. printf "'%s' " "$@". – DopeGhoti May 09 '19 at 16:05
  • Again, consider my actual use-case. I’m using \n in the printf example because this emulates what the git filter-branch command is doing. The issue isn’t the newline, it’s that shell quoting doesn’t work inside a string. – Konrad Rudolph May 09 '19 at 16:06
  • So considered; see expanded discussion. – DopeGhoti May 09 '19 at 16:18
  • This is a nice enough hack and I think it does work with spaces — but unlike normal shell quoting it’s not generally valid. For instance, it fails if a filename contains a double-quote (crazy, but valid). I don’t think this issue can be worked around by using printf. – Konrad Rudolph May 09 '19 at 16:51