In Bash, when specifying command line arguments to a command, what characters are required to be escaped?
Are they limited to the metacharacters of Bash: space, tab,
|, &, ;, (, ), <, and >?
The following characters have special meaning to the shell itself in some contexts and may need to be escaped in arguments:
| Character | Unicode | Name | Usage |
|---|---|---|---|
| `` ` `` | U+0060 Grave Accent | Backtick | Command substitution |
| `~` | U+007E | Tilde | Tilde expansion |
| `!` | U+0021 | Exclamation mark | History expansion |
| `#` | U+0023 Number Sign | Hash | Comments |
| `$` | U+0024 | Dollar sign | Parameter expansion |
| `&` | U+0026 | Ampersand | Background commands |
| `*` | U+002A | Asterisk | Filename expansion and globbing |
| `(` | U+0028 | Left Parenthesis | Subshells |
| `)` | U+0029 | Right Parenthesis | Subshells |
| | U+0009 | Tab (⇥) | Word splitting (whitespace) |
| `{` | U+007B Left Curly Bracket | Left brace | Brace expansion |
| `[` | U+005B | Left Square Bracket | Filename expansion and globbing |
| `\|` | U+007C Vertical Line | Vertical bar | Pipelines |
| `\` | U+005C Reverse Solidus | Backslash | Escape character |
| `;` | U+003B | Semicolon | Separating commands |
| `'` | U+0027 Apostrophe | Single quote | String quoting |
| `"` | U+0022 Quotation Mark | Double quote | String quoting with interpolation |
| ↩ | U+000A Line Feed | Newline | Line break |
| `<` | U+003C | Less than | Input redirection |
| `>` | U+003E | Greater than | Output redirection |
| `?` | U+003F | Question mark | Filename expansion and globbing |
| | U+0020 | Space | Word splitting¹ (whitespace) |
Some of those characters are used for more things and in more places than the single usage listed for them.
There are a few corner cases that are explicitly optional:

- `!` can be disabled with `set +H`, which is the default in non-interactive shells.
- `{` can be disabled with `set +B`.
- `*` and `?` can be disabled with `set -f` or `set -o noglob`.
- `=` Equals sign (U+003D) also needs to be escaped if `set -k` or `set -o keyword` is enabled.

Escaping a newline requires quoting; backslashes won't do the job. Any other characters listed in IFS will need similar handling. You don't need to escape `]` or `}`, but you do need to escape `)` because it's an operator.
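As a rough illustration of two of the points above, the sketch below shows `set -f` turning a glob into a literal word, and a backslash failing to preserve a newline where quoting succeeds. The scratch directory and all variable names are made up for the example.

```shell
# Sketch only: the scratch directory and variable names are hypothetical.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b.txt"

set -- "$tmpdir"/*.txt   # globbing on: expands to both file names
with_glob=$#

set -f                   # same effect as: set -o noglob
set -- "$tmpdir"/*.txt   # now a single literal word ending in *.txt
without_glob=$#
literal=$1
set +f                   # turn globbing back on

# Backslash-newline is a line continuation and disappears;
# only quoting keeps the newline inside the argument.
joined=$(printf '%s' a\
b)
kept=$(printf '%s' 'a
b')

rm -rf "$tmpdir"
```

After this runs, `with_glob` should be 2 and `without_glob` 1, while `joined` holds `ab` and `kept` still contains the newline.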
Some of these characters have tighter limits on when they truly need escaping than others. For example, `a#b` is ok, but `a #b` is a comment, while `>` would need escaping in both contexts. It doesn't hurt to escape them all conservatively anyway, and it's easier than remembering the fine distinctions.
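The `#` versus `>` distinction can be checked with a few `set --` calls, which simply record how the shell split the words. This is a minimal sketch for a script context (where comments are always recognised); the variable names are arbitrary.

```shell
set -- a#b
n_mid=$#        # one word: '#' in the middle of a word is ordinary
w_mid=$1

set -- a #b
n_comment=$#    # still one word: ' #b' was discarded as a comment

set -- a \#b
n_escaped=$#    # two words: the escaped '#b' is a real argument
w_escaped=$2

# '>' is an operator anywhere in a word; unescaped, a>b would
# redirect output to a file named b instead of passing "a>b".
set -- a\>b
gt=$1
```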
If your command name itself is a shell keyword (`if`, `for`, `do`) then you'll need to escape or quote it too. The only interesting one of those is `in`, because it's not obvious that it's always a keyword. You don't need to do that for keywords used in arguments, only when you've (foolishly!) named a command after one of them. Shell operators (`(`, `&`, etc.) always need quoting wherever they are.
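To see the keyword case in action, the sketch below creates a throwaway executable literally named `if` and calls it with the name quoted or escaped. The directory, script contents, and variable names are all invented for the illustration, and it assumes the temporary directory allows executables.

```shell
# Sketch only: creates a throwaway executable literally named "if".
bindir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo "ran if with: $*"' > "$bindir/if"
chmod +x "$bindir/if"
old_path=$PATH
PATH=$bindir:$PATH

quoted=$('if' then fi)      # quoting stops "if" being parsed as a keyword
escaped=$(\if do done)      # a backslash works the same way
as_args=$(echo if for do)   # keywords in argument position need nothing

PATH=$old_path
rm -rf "$bindir"
```

Note that `then`, `fi`, `do` and `done` are passed through untouched, since they are only recognised as keywords in command position.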
¹ Stéphane has noted that any other single-byte blank character from your locale also needs escaping. In most common, sensible locales, at least those based on C or UTF-8, it's only the whitespace characters above. In some ISO-8859-1 locales, including those on Solaris, the BSDs, and OS X, U+00A0 no-break space is considered blank (I think incorrectly). If you're dealing with an arbitrary unknown locale, it could include just about anything, including letters, so good luck.
Conceivably, a single byte considered blank could appear within a multi-byte character that wasn't blank, and you'd have no way to escape that other than putting the whole thing in quotes. This isn't a theoretical concern: in an ISO-8859-1 locale from above, that A0 byte which is considered a blank can appear within multibyte characters like UTF-8 encoded "à" (C3 A0). To handle those characters safely you would need to quote them "à". This behaviour depends on the locale configuration in the environment running the script, not the one where you wrote it.
I think this behaviour is broken multiple ways, but we have to play the hand we're dealt. If you're working with any non-self-synchronising multibyte character set, the safest thing would be to quote everything. If you're in UTF-8 or C, you're safe (for the moment).
! when csh history expansion is enabled, typically not in scripts. [ ! -f a ] or find . ! -name... are fine. That's covered by your tighter limits section but maybe worth mentioning explicitly.
– Stéphane Chazelas
Mar 20 '16 at 07:30
hash[foo"]"]=, ${var-foo"}"}, [[ "!" = b ]], [[ a = "]]" ]], the regexp operators for [[ x =~ ".+[" ]]. Other keywords than { (if, while, for...) would need to be quoted so they're not recognised as such...
– Stéphane Chazelas
Mar 20 '16 at 07:48
]), so I'm not listing them. I don't think any keyword needs quoting in argument position.
– Michael Homer
Mar 20 '16 at 08:02
*?[+@ (pathname expansions), job control %, the builtins : and ., and the control characters that make up white space: In the POSIX locale, white space consists of one or more <blank> (<space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters. (A filename with any of those characters needs to be quoted).
–
Mar 20 '16 at 21:59
touch -- -awe) needs ls ./-awe at the very least. You may not call that "quoting", but it is: escaping troubling characters. The builtins : and . need quoting as argv[0] if an alias exists (alias .='source nothing.sh'); then \. will actually execute the builtin (not the alias). Maybe you are right about job control. But I hope that pathname expansions will not raise any complaint.
–
Mar 20 '16 at 23:02
[[:blank:]] (which in the C locale is TAB and space), not [[:space:]], the FF, CR, VT... don't need quoting (except maybe for CR on some Microsoft ports of bash). In UTF-8, all the non-ascii characters are multi-byte and so fall into that current bug of bash. But for instance on latin1 locales on Solaris, 0xa0, the non-breaking space is a [[:blank:]], so needs quoting (even though the whole point of that character should be that it doesn't break...)
– Stéphane Chazelas
Mar 21 '16 at 07:36
LC_ALL=en_GB.ISO8859-1 bash -c $'printf "%s," a\xa0b' outputs a,b, there.
– Stéphane Chazelas
Mar 21 '16 at 07:39
xargs, and would probably apply to the grammar of awk or bc for instance. I had said at the time I would bring it up on the Austin Group mailing list, but never got round to doing it. I'll try and give it a go.
– Stéphane Chazelas
Mar 21 '16 at 09:11
à in echo Voilà | iconv -f utf-8 in a script, as if called in a latin1 locale on those systems, the 0xa0 byte in that à character would be taken as a token separator. IOW, the script is parsed based on the locale of the user, not its author's, which sounds wrong to me.
– Stéphane Chazelas
Mar 21 '16 at 09:23
isblank). Those locales seem broken to me.
– Michael Homer
Mar 21 '16 at 09:32
C. In fact it is written to include any your locale. A locale may include anything it sees fit in the blank category, of which you give an example. In the broader view of what a language may interpret as blanks, the Unicode white space list serves as the most extreme example. That's why I presented it.
–
Mar 21 '16 at 19:30
bash -c $'printf "%s," a\xa0t' you are obviously asking for a byte (\x..), which bash correctly gives. Even in a utf8 locale, this: LC_ALL=en_US.utf8 bash -c $'printf "%s," a\xa0t' will produce a byte with value 0xA0 (which renders as a broken character, which it is in a utf-8 locale). The additional language effects that byte might have depend on the locale description. Which looks broken on the Solaris you describe.
–
Mar 21 '16 at 19:30
[:blank:] in Unicode is [\p{Zs}\t]. A \p{Zs} (or \p{Space_Separator}) is : a whitespace character that is invisible, but does take up space. Similar to this list from EM to HAIR space.
–
Mar 21 '16 at 19:50
a or " itself is a blank, but then you can't expect much to work with that. See the discussions there were around CVE-2014-0475 at the time.
– Stéphane Chazelas
Mar 21 '16 at 21:14
bash -c $'printf "%s," a\xa0t', it's my shell, not bash that expands the $'\xa0', bash sees a nbsp character in between the a and t which it treats as a token delimiter when called in a latin1 locale on Solaris/OS/X...
– Stéphane Chazelas
Mar 21 '16 at 21:16
In GNU Parallel this is tested and used extensively:

    $a =~ s/[\002-\011\013-\032\\\#\?\`\(\)\{\}\[\]\^\*\<\=\>\~\|\; \"\!\$\&\'\202-\377]/\\$&/go;
    # quote newline as '\n'
    $a =~ s/[\n]/'\n'/go;

It is tested in bash, dash, ash, ksh, zsh, and fish. Some of the characters do not need quoting in some (versions) of the shells, but the above works in all tested shells.

If you simply want a string quoted, you can pipe it into `parallel --shellquote`:

    printf "&*\t*!" | parallel --shellquote
parallel chose to solve the greater problem of knowing all characters that need escaping across a wide variety of shells rather than the much lesser problem of just wrapping everything with ' and escaping ' into '\'', which would also work on all of the shells mentioned here. But cool that they have a battle-tested list.
– mtraceur
Jul 19 '23 at 18:40
' works in every shell listed in this answer, and every Bourne-style shell since the original Bourne shell. Maybe in older fish versions it didn't, since fish isn't a Bourne-like? (But I installed fish just to check before writing my comment here, and it does work now.)
– mtraceur
Jul 21 '23 at 19:54
--shellquote also support other shells which are neither listed here nor Bourne-like, such as csh/tcsh? If so, then that totally makes sense, because iirc the csh-like shells had a different interpretation of '-quoting.
– mtraceur
Jul 21 '23 at 22:57
rc is weird, and csh is pretty fucked up, but they are supported.
– Ole Tange
Jul 22 '23 at 01:03
For a lightweight escaping solution in Perl, I follow the single-quote principle: a Bash string in single quotes can contain any character except the single quote itself.

My code:
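    # Characters that are special to Bash. The \$ is escaped so that Perl
    # does not interpolate $& into the character class.
    my $bash_reserved_characters_re = qr([ !"#\$&'()*;<>?\[\\`{|~\t\n]);
    while (<>) {
        if (/$bash_reserved_characters_re/) {
            # Wrap in single quotes; each embedded ' becomes '"'"'
            # (the /r flag requires Perl 5.14 or later).
            my $quoted = s/'/'"'"'/gr;
            print "'$quoted'";
        } else {
            print $_;
        }
    }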
Example run 1:

    $ echo -n "abc" | perl escape_bash_special_chars.pl
    abc

Example run 2:

    $ echo "abc" | perl escape_bash_special_chars.pl
    'abc
    '

Example run 3:

    $ echo -n 'ab^c' | perl escape_bash_special_chars.pl
    ab^c

Example run 4:

    $ echo -n 'ab~c' | perl escape_bash_special_chars.pl
    'ab~c'

Example run 5:

    $ echo -n "ab'c" | perl escape_bash_special_chars.pl
    'ab'"'"'c'
    $ echo 'ab'"'"'c'
    ab'c
'), then the only character you need to escape is the single-quote itself, which you replace with `'\''`. So `whatever$"*(>|&; whatever` just becomes `'whatever$"*(>|&; whatever'` and you're done, and `whatever$"*(>|&; 'whatever` (note the nested `'`) becomes `'whatever$"*(>|&; '\''whatever'`. Simple. Universal.
– mtraceur
Jul 19 '23 at 18:14
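The single-quote rule from the comment above can be sketched as a small POSIX shell function. The name `shell_quote` is hypothetical, and using `sed` is just one way to do the replacement; treat it as an illustration of the idea rather than a hardened implementation.

```shell
# Hypothetical helper: wrap $1 in single quotes, replacing each embedded
# ' with '\'' (close the quote, add an escaped quote, reopen the quote).
shell_quote() {
    printf '%s\n' "$1" |
        sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Example input taken from the comment: it contains a nested single quote.
input='whatever$"*(>|&; '\''whatever'
quoted=$(shell_quote "$input")

# Feeding the quoted form back through the shell should recover the input.
eval "set -- $quoted"
roundtrip=$1
```

Here `quoted` should come out as `'whatever$"*(>|&; '\''whatever'`, matching the form given in the comment, and `roundtrip` should equal the original `input`.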