19

I read that I should quote variables in bash, e.g. "$foo" instead of $foo. However, while writing a script, I found a case where it works without quotes but not with them:

wget_options='--mirror --no-host-directories'
local_root="$1" # ./testdir received from command line
remote_root="$2" # ftp://XXX received from command line
relative_path="$3" # /XXX received from command line

This one works:

wget $wget_options --directory_prefix="$local_root" "$remote_root$relative_path"

This one does not (note the double quotes around $wget_options):

wget "$wget_options" --directory_prefix="$local_root" "$remote_root$relative_path"
  • What is the reason for this?

  • Is the first line the good version; or should I suspect that there is a hidden error somewhere that causes this behavior?

  • In general, where do I find good documentation to understand how bash and its quoting work? While writing this script, I felt that I was working by trial and error instead of understanding the rules.

muru
z32a7ul

  • Your question is answered here: http://mywiki.wooledge.org/BashFAQ/050 – glenn jackman Aug 26 '17 at 12:44
  • Go to the source for the rules: the bash manual. Pay close attention to section 3.5 "Shell Expansions", especially word splitting and filename expansion -- these 2 factors are what you use quotes to control. – glenn jackman Aug 26 '17 at 12:49
  • I think it helps to understand how command line arguments work at a low level. When a program is executed, it receives arguments as a list of lists of characters (close enough). Each inner list is what we call an "argument." Most programs depend on logical separation between args. Here, you see that wget doesn't know what --mirror --no-host-directories means (as one argument), but it handles it when it's split into two arguments. Very few programs treat spaces and quotes specially once they are inside the argument vector. The problem is that bash, and other shells, are meant to be > – HTNW Aug 27 '17 at 01:56
  • used by humans. It'd be annoying to manually define the boundaries between arguments, so shells split on whitespace to turn a line (a list of characters) into an argument vector (a list of lists of characters). Variable expansion is one of the first expansions bash does, so you can imagine that $a is exactly equivalent to directly writing its contents. Now the issue is evident: a="-a -b"; cmd "$a" expands to cmd "-a -b", but cmd probably doesn't know what that means. cmd $a expands to cmd -a -b, which probably does work. – HTNW Aug 27 '17 at 02:04
  • l̶i̶t̶t̶l̶e̶ ̶k̶n̶o̶w̶n̶ ̶b̶a̶s̶h̶ ̶f̶a̶c̶t̶ (edit: see this included in another answer already!) variable assignment doesn't require quotes; e.g., a=$b is fine, no need to use a="$b", unless the RHS is an expression; and even then, the form $(...) acts as quotes, e.g., this is fine: a=$(...), no need to do a="$(...)". – michael Aug 27 '17 at 02:55
  • @HTNW, that's an oversimplification that can end up being misleading. The tokenisation done by the shell is very different from the split+glob operator applied to expansions. cmd a b is very different from a='a b'; cmd $a. Parameter expansion is not some sort of macro expansion like it was in the Thompson shell or it is with C macros or shell aliases. Think for instance of cases like a='a; reboot' or a='$(uname)' or a='a\ b', or a='a:b' IFS=:. – Stéphane Chazelas Aug 28 '17 at 10:47

3 Answers

34

The most robust way to code that is to use an array:

wget_options=(
    --mirror 
    --no-host-directories
    --directory_prefix="$1"
)
wget "${wget_options[@]}" "$2/$3"
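
One practical benefit of the array form is that options can be appended one at a time. The following is only a sketch: the `quiet` switch is a hypothetical variable that is not part of the original script, and the elements are printed rather than passed to wget.

```bash
#!/usr/bin/env bash
# Hypothetical switch, not part of the original script.
quiet=yes

wget_options=(--mirror --no-host-directories)

# Append an element only when it is wanted. Nothing is added otherwise,
# so no empty argument ever reaches the command.
if [ "$quiet" = yes ]; then
    wget_options+=(--quiet)
fi

# Print one element per line instead of running wget.
printf '<%s>\n' "${wget_options[@]}"
option_count=${#wget_options[@]}
```

With `quiet=yes` the array holds three elements; with any other value it holds two, and the quoted expansion passes exactly that many arguments.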
glenn jackman
  • This is the right answer. Reference – l0b0 Aug 27 '17 at 09:34
  • It's a good answer, so I upvoted it, but Kusalananda's helped me more to understand why my code was wrong, and I can accept only one. – z32a7ul Aug 27 '17 at 12:23
  • I was running into a world of trouble until someone on the rsync list showed me this construct. It is particularly helpful if some of the elements might be empty strings. This makes empty strings disappear. Some commands like cp and rsync will do unexpected things if your command expands to something like rsync '' rest of parameters. This is great for building a command piece by piece conditionally and then just running it once in one place. – Joe Sep 02 '17 at 06:06
28

Basically, you should double quote variable expansions to protect them from word splitting (and filename generation). However, in your example,

wget_options='--mirror --no-host-directories'
wget $wget_options --directory_prefix="$local_root" "$remote_root$relative_path"

word splitting is exactly what you want.

With "$wget_options" (quoted), wget doesn't know what to do with the single argument --mirror --no-host-directories and complains

wget: unknown option -- mirror --no-host-directories

For wget to see the two options --mirror and --no-host-directories as separate, word splitting has to occur.
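
To see the difference without involving wget, it may help to count the arguments a command receives. This is a minimal sketch; `count_args` is a hypothetical helper introduced only for illustration.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() {
    echo "$#"
}

wget_options='--mirror --no-host-directories'

unquoted=$(count_args $wget_options)    # word splitting yields two arguments
quoted=$(count_args "$wget_options")    # quoting keeps the value as one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default `$IFS`, this prints `unquoted: 2, quoted: 1`.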

There are more robust ways of doing this. If you are using bash or any other shell that supports arrays as bash does, see glenn jackman's answer. Gilles' answer additionally describes an alternative solution for plainer shells such as the standard /bin/sh. Both essentially store each option as a separate element in an array.

Related question with good answers: Why does my shell script choke on whitespace or other special characters?


Double quoting variable expansions is a good rule of thumb. Do that. Then be aware of the very few cases where you shouldn't do that. These will present themselves to you through diagnostic messages, such as the above error message.

There are also a few cases where you don't need to quote variable expansions. But it's easier to continue using double quotes anyway as it doesn't make much difference. One such case is

variable=$other_variable

Another one is

case $variable in
    ...) ... ;;
esac
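
A short sketch that checks both cases with a value containing consecutive spaces and a wildcard (the value itself is made up for the demonstration):

```bash
#!/usr/bin/env bash
other_variable='two  words *'

# Assignment: the right-hand side is not split or globbed, even unquoted.
variable=$other_variable

# case: the word after "case" is not split or globbed either.
case $variable in
    'two  words *') matched=yes ;;
    *)              matched=no ;;
esac

echo "matched: $matched"
```

The value survives the unquoted assignment unchanged, and the `case` statement matches it as a single string.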
Kusalananda
  • Before using that split+glob operator, one may need to make sure that $IFS contains the right value. Here you need to split on space and the text happens not to contain any tab or newline, so the default value of $IFS would do, but if that code is to be used in a function that may be called in a context where $IFS could have been modified, you'd want to set $IFS beforehand (and possibly restore it afterwards or use a local scope for it if the rest of the code assumes an unmodified $IFS) – Stéphane Chazelas Aug 28 '17 at 10:20
16

You're trying to store a list of strings in a string variable. It doesn't fit. No matter how you access the variable, something is broken.

wget_options='--mirror --no-host-directories' sets the variable wget_options to a string that contains a space. At this point, there is no way to know whether the space is supposed to be part of an option, or a separator between options.

When you access the variable with a quoted substitution wget "$wget_options", the value of the variable is used as a string. This means that it's passed as a single parameter to wget, so it's a single option. This breaks in your case because you intended it to mean multiple options.

When you use an unquoted substitution wget $wget_options, the value of the string variable undergoes an expansion process nicknamed “split+glob”:

  1. Take the value of the variable and split it into whitespace-delimited parts (assuming you have not modified the $IFS variable). This results in an intermediate list of strings.
  2. For each element of the intermediate list, if it is a wildcard pattern that matches one or more files, replace that element by the list of matching files.

This happens to work in your example, because the splitting process turns the space into a separator, but doesn't work in general since an option could contain spaces and wildcard characters.
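
The two steps can be observed separately. The sketch below is an illustration under assumptions: `show_args` is a hypothetical helper that brackets each argument it receives, and the glob step runs inside a temporary directory created just for the demonstration.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument in angle brackets.
show_args() { printf '<%s>' "$@"; echo; }

# Step 1, splitting: the result depends on the current value of IFS.
options='--mirror --no-host-directories'
default_split=$(show_args $options)            # split on whitespace
colon_split=$(IFS=:; show_args $options)       # no colon in the value, so no split

# Step 2, globbing: an unquoted wildcard is replaced by matching file names.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"
pattern='*.txt'
globbed=$(cd "$workdir" && show_args $pattern)     # expands to the two files
literal=$(cd "$workdir" && show_args "$pattern")   # stays a literal string
rm -rf "$workdir"

printf '%s\n' "$default_split" "$colon_split" "$globbed" "$literal"
```

The same unquoted expansion therefore produces different argument lists depending on `$IFS` and on which files happen to exist in the current directory.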

In ksh, bash, yash and zsh, you can use an array variable. An array in shell terminology is a list of strings, so there is no loss of information. To make an array variable, put parentheses around the array elements when assigning the value to the variable. To access all the elements of the array, use "${VARIABLE[@]}" — this is a generalization of "$@", which forms a list from the elements of the array. Note that you need the double quotes here too, otherwise each element undergoes split+glob.

wget_options=(--mirror --no-host-directories --user-agent="I can haz spaces")
wget "${wget_options[@]}" …
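
As a rough check that the double quotes matter here, the following sketch counts the arguments produced by the quoted and unquoted array expansions. It uses `set --` in a subshell purely as a counting device.

```bash
#!/usr/bin/env bash
wget_options=(--mirror --no-host-directories --user-agent="I can haz spaces")

# Quoted: one argument per array element.
quoted_count=$(set -- "${wget_options[@]}"; echo "$#")

# Unquoted: each element undergoes split+glob, so the user agent falls apart.
unquoted_count=$(set -- ${wget_options[@]}; echo "$#")

echo "quoted: $quoted_count, unquoted: $unquoted_count"
```

With the default `$IFS`, the quoted form yields 3 arguments and the unquoted form yields 6.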

In plain sh, there are no array variables. If you don't mind losing the positional arguments, you can use them to store one list of strings.

set -- --mirror --no-host-directories --user-agent="I can haz spaces"
wget "$@" …
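
If the script's own positional arguments are still needed, one possible workaround is to do this inside a function, since a function has its own positional parameters. This is a sketch, not part of the original answer: `build_options` is a hypothetical name, and it prints the list instead of calling wget.

```sh
#!/bin/sh
# build_options is a hypothetical function used only for illustration.
build_options() {
    # Inside a function, "set --" replaces the function's own positional
    # parameters and leaves the caller's untouched.
    set -- --mirror --no-host-directories
    # Append an element by re-setting the list with the old contents first.
    set -- "$@" --user-agent="I can haz spaces"
    option_count=$#
    last_option=$3
    printf '<%s>\n' "$@"
}

set -- original-argument     # stands in for the script's real arguments
build_options
caller_arg=$1                # still "original-argument" after the call
```

The option containing spaces stays a single element, and the caller's `$1` is unchanged after the function returns.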

For more information, see Why does my shell script choke on whitespace or other special characters?