In Bash, when specifying command line arguments to a command, what characters are required to be escaped?
Are they limited to the metacharacters of Bash: space, tab, `|`, `&`, `;`, `(`, `)`, `<`, and `>`?
The following characters have special meaning to the shell itself in some contexts and may need to be escaped in arguments:
| Character | Unicode | Name | Usage |
|---|---|---|---|
| `` ` `` | U+0060 (Grave Accent) | Backtick | Command substitution |
| `~` | U+007E | Tilde | Tilde expansion |
| `!` | U+0021 | Exclamation mark | History expansion |
| `#` | U+0023 (Number Sign) | Hash | Comments |
| `$` | U+0024 | Dollar sign | Parameter expansion |
| `&` | U+0026 | Ampersand | Background commands |
| `*` | U+002A | Asterisk | Filename expansion and globbing |
| `(` | U+0028 | Left Parenthesis | Subshells |
| `)` | U+0029 | Right Parenthesis | Subshells |
|   | U+0009 | Tab (⇥) | Word splitting (whitespace) |
| `{` | U+007B (Left Curly Bracket) | Left brace | Brace expansion |
| `[` | U+005B | Left Square Bracket | Filename expansion and globbing |
| `\|` | U+007C (Vertical Line) | Vertical bar | Pipelines |
| `\` | U+005C (Reverse Solidus) | Backslash | Escape character |
| `;` | U+003B | Semicolon | Separating commands |
| `'` | U+0027 (Apostrophe) | Single quote | String quoting |
| `"` | U+0022 (Quotation Mark) | Double quote | String quoting with interpolation |
| ↩ | U+000A (Line Feed) | Newline | Line break |
| `<` | U+003C | Less than | Input redirection |
| `>` | U+003E | Greater than | Output redirection |
| `?` | U+003F | Question mark | Filename expansion and globbing |
|   | U+0020 | Space | Word splitting¹ (whitespace) |
Some of those characters are used for more things and in more places than the one I linked.
There are a few corner cases that are explicitly optional:
- `!` can be disabled with `set +H`, which is the default in non-interactive shells.
- `{` can be disabled with `set +B`.
- `*` and `?` can be disabled with `set -f` or `set -o noglob`.
- `=` Equals sign (U+003D) also needs to be escaped if `set -k` or `set -o keyword` is enabled.

Escaping a newline requires quoting; a backslash won't do the job, because a backslash followed by a newline is treated as a line continuation and removed. Any other characters listed in IFS will need similar handling. You don't need to escape `]` or `}`, but you do need to escape `)` because it's an operator.
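The newline and bracket cases above can be checked with a small sketch. The helper `show_args` is a hypothetical name introduced here for illustration; it only prints how many arguments arrived and what they were.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <>.
show_args() {
    printf '%d:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

# Backslash-newline is a line continuation: both characters vanish,
# so this passes the single argument "ab".
show_args a\
b

# Quoting keeps the newline inside the single argument.
show_args 'a
b'

# ] and } are ordinary in argument position, but ) has to be escaped.
show_args a] b} c\)
```

The first call should report one argument `ab`, the second one argument containing a line break, and the third three arguments.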
Some of these characters have tighter limits on when they truly need escaping than others. For example, `a#b` is ok, but `a #b` is a comment, while `>` would need escaping in both contexts. It doesn't hurt to escape them all conservatively anyway, and it's easier than remembering the fine distinctions.
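A short sketch of that distinction, using only the `printf` builtin so each argument shows up on its own line inside angle brackets:

```bash
#!/usr/bin/env bash
printf '<%s>\n' a#b      # '#' inside a word is literal: <a#b>
printf '<%s>\n' a #b     # ' #b' starts a comment, so only <a> is printed
printf '<%s>\n' a \#b    # an escaped '#' arrives as its own argument
printf '<%s>\n' a\>b     # '>' is an operator everywhere, so it is escaped
```

Without the backslash in the last line, the shell would redirect output to a file named `b` instead of passing `a>b` as an argument.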
If your command name itself is a shell keyword (`if`, `for`, `do`) then you'll need to escape or quote it too. The only interesting one of those is `in`, because it's not obvious that it's always a keyword. You don't need to do that for keywords used in arguments, only when you've (foolishly!) named a command after one of them. Shell operators (`(`, `&`, etc) always need quoting wherever they are.
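If remembering the list is not appealing, Bash can produce an escaped form itself through the `%q` format of its `printf` builtin. The following is a minimal sketch, not part of the answer above; the sample string and the variable names are arbitrary choices made for illustration.

```bash
#!/usr/bin/env bash
# Arbitrary sample argument containing many of the characters listed above.
arg=$'a b\t*?[x] $HOME `id` !{1,2} #c\nsecond line'

# %q emits the argument in a form that Bash can read back as a single word.
printf -v escaped '%q' "$arg"
printf 'escaped form: %s\n' "$escaped"

# Round trip: let the shell re-parse the escaped text, then compare.
eval "roundtrip=$escaped"
if [[ $roundtrip == "$arg" ]]; then
    echo "round trip OK"
fi
```

The exact escaped text can differ between Bash versions, which is why the sketch checks the round trip rather than a fixed string.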
¹ Stéphane has noted that any other single-byte blank character from your locale also needs escaping. In most common, sensible locales, at least those based on C or UTF-8, it's only the whitespace characters above. On some systems, including Solaris, the BSDs, and OS X, ISO-8859-1 locales consider U+00A0 no-break space to be a blank (I think incorrectly). If you're dealing with an arbitrary unknown locale, it could include just about anything, including letters, so good luck.
Conceivably, a single byte considered blank could appear within a multi-byte character that wasn't blank, and you'd have no way to escape that other than putting the whole thing in quotes. This isn't a theoretical concern: in an ISO-8859-1 locale from above, that `A0` byte which is considered a blank can appear within multibyte characters like UTF-8 encoded "à" (`C3 A0`). To handle those characters safely you would need to quote them: `"à"`. This behaviour depends on the locale configuration in the environment running the script, not the one where you wrote it.
I think this behaviour is broken in multiple ways, but we have to play the hand we're dealt. If you're working with any non-self-synchronising multibyte character set, the safest thing would be to quote everything. If you're in UTF-8 or C, you're safe (for the moment).
`!` when csh history expansion is enabled, typically not in scripts. `[ ! -f a ]` or `find . ! -name...` are fine. That's covered by your tighter limits section but maybe worth mentioning explicitly. – Stéphane Chazelas Mar 20 '16 at 07:30
`hash[foo"]"]=`, `${var-foo"}"}`, `[[ "!" = b ]]`, `[[ a = "]]" ]]`, the regexp operators for `[[ x =~ ".+[" ]]`. Other keywords than `{` (`if`, `while`, `for`...) would need to be quoted so they're not recognised as such... – Stéphane Chazelas Mar 20 '16 at 07:48
`]`), so I'm not listing them. I don't think any keyword needs quoting in argument position. – Michael Homer Mar 20 '16 at 08:02
`*?[+@` (pathname expansions), job control `%`, the builtins `:` and `.`, and the control characters that make up white space: In the POSIX locale, white space consists of one or more <blank> ( <space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters. (A filename with any of those characters needs to be quoted). – Mar 20 '16 at 21:59
`touch -- -awe`) needs `ls ./-awe` at the very least. You may not call that "quoting", but it is: `escaping troubling characters`. The builtins `:` and `.` need quoting as argv[0] if an alias exists (`alias .='source nothing.sh'`); then `\.` will actually execute the builtin (not the alias). Maybe you are right about job control. But I hope that pathname expansions will not raise any complaint. – Mar 20 '16 at 23:02
`[[:blank:]]` (which in the C locale is TAB and space), not `[[:space:]]`, the FF, CR, VT... don't need quoting (except maybe for CR on some Microsoft ports of bash). In UTF-8, all the non-ascii characters are multi-byte and so fall into that current bug of bash. But for instance on latin1 locales on Solaris, 0xa0, the non-breaking space is a `[[:blank:]]`, so needs quoting (even though the whole point of that character should be that it doesn't break...) – Stéphane Chazelas Mar 21 '16 at 07:36
`LC_ALL=en_GB.ISO8859-1 bash -c $'printf "%s," a\xa0b'` outputs `a,b,` there. – Stéphane Chazelas Mar 21 '16 at 07:39
`xargs`, and would probably apply to the grammar of `awk` or `bc` for instance. I had said at the time I would bring it up on the Austin Group mailing list, but never got round to doing it. I'll try and give it a go. – Stéphane Chazelas Mar 21 '16 at 09:11
`à` in `echo Voilà | iconv -f utf-8` in a script, as if called in a latin1 locale on those systems, the 0xa0 byte in that `à` character would be taken as a token separator. IOW, the script is parsed based on the locale of the user, not its author's, which sounds wrong to me. – Stéphane Chazelas Mar 21 '16 at 09:23
`isblank`). Those locales seem broken to me. – Michael Homer Mar 21 '16 at 09:32
`C`. In fact it is written to include any `your locale`. A locale may include anything it sees fit in the blank category, of which you give an example. In the broader view of what a language may interpret as blanks, the Unicode white space list serves as the most extreme example. That's why I presented it. – Mar 21 '16 at 19:30
`bash -c $'printf "%s," a\xa0t'` you are obviously asking for a byte (`\x..`), which bash correctly gives. Even in a utf8 locale, this: `LC_ALL=en_US.utf8 bash -c $'printf "%s," a\xa0t'` will produce a byte with value `0xA0` (which renders as a broken character, which it is in a utf-8 locale). The additional language effects that byte might have depend on the locale description. Which looks broken on the Solaris you describe. – Mar 21 '16 at 19:30
`[:blank:]` in Unicode is `[\p{Zs}\t]`. A `\p{Zs}` (or `\p{Space_Separator}`) is: a whitespace character that is invisible, but does take up space. Similar to this list from `EM` to `HAIR` space. – Mar 21 '16 at 19:50
`a` or `"` itself is a blank, but then you can't expect much to work with that. See the discussions there were around CVE-2014-0475 at the time. – Stéphane Chazelas Mar 21 '16 at 21:14
`bash -c $'printf "%s," a\xa0t'`, it's my shell, not `bash`, that expands the `$'\xa0'`; bash sees a nbsp character in between the a and t which it treats as a token delimiter when called in a latin1 locale on Solaris/OS X... – Stéphane Chazelas Mar 21 '16 at 21:16
In GNU Parallel this is tested and used extensively:
$a =~ s/[\002-\011\013-\032\\\#\?\`\(\)\{\}\[\]\^\*\<\=\>\~\|\; \"\!\$\&\'\202-\377]/\\$&/go;
# quote newline as '\n'
$a =~ s/[\n]/'\n'/go;
It is tested in `bash`, `dash`, `ash`, `ksh`, `zsh`, and `fish`. Some of the characters do not need quoting in some (versions) of the shells, but the above works in all tested shells.
If you simply want a string quoted, you can pipe it into `parallel --shellquote`:
printf "&*\t*!" | parallel --shellquote
`parallel` chose to solve the greater problem of knowing all characters that need escaping across a wide variety of shells rather than the much lesser problem of just wrapping everything with `'` and escaping `'` into `'\''`, which would also work on all of the shells mentioned here. But cool that they have a battle-tested list. – mtraceur Jul 19 '23 at 18:40
`'` works in every shell listed in this answer, and every Bourne-style shell since the original Bourne shell. Maybe in older `fish` versions it didn't, since `fish` isn't a Bourne-like? (But I installed `fish` just to check before writing my comment here, and it does work now.) – mtraceur Jul 21 '23 at 19:54
`--shellquote` also support other shells which are neither listed here nor Bourne-like, such as `csh`/`tcsh`? If so, then that totally makes sense, because iirc the csh-like shells had a different interpretation of `'`-quoting. – mtraceur Jul 21 '23 at 22:57
`rc` is weird, and `csh` is pretty fucked up, but they are supported. – Ole Tange Jul 22 '23 at 01:03
For a lightweight escaping solution in Perl, I follow the single-quote principle: a Bash string in single quotes can contain any character except the single quote itself.
My code:
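# Characters that force quoting. The $ is backslash-escaped so that Perl
# does not interpolate its special variable $& into the pattern.
my $bash_reserved_characters_re = qr([ !"#\$&'()*;<>?\[\\`{|~\t\n]);
while (<>) {
    if (/$bash_reserved_characters_re/) {
        # Wrap in single quotes; each embedded ' becomes '"'"'.
        my $quoted = s/'/'"'"'/gr;
        print "'$quoted'";
    } else {
        print $_;
    }
}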
Example run 1:
$ echo -n "abc" | perl escape_bash_special_chars.pl
abc
Example run 2:
$ echo "abc" | perl escape_bash_special_chars.pl
'abc
'
Example run 3:
$ echo -n 'ab^c' | perl escape_bash_special_chars.pl
ab^c
Example run 4:
$ echo -n 'ab~c' | perl escape_bash_special_chars.pl
'ab~c'
Example run 5:
$ echo -n "ab'c" | perl escape_bash_special_chars.pl
'ab'"'"'c'
$ echo 'ab'"'"'c'
ab'c
`'`), then the only character you need to escape is the single-quote itself, which you replace with `'\''`. So `whatever$"*(>|&; whatever` just becomes `'whatever$"*(>|&; whatever'` and you're done, and `whatever$"*(>|&; 'whatever` (note the nested `'`) becomes `'whatever$"*(>|&; '\''whatever'`. Simple. Universal. – mtraceur Jul 19 '23 at 18:14
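The single-quote rule from the comment above can be sketched in Bash itself. The function name `shell_quote` and its local variables are assumptions made for this illustration; this is not code from the thread, and it relies on Bash's `${var//pattern/replacement}` expansion rather than anything portable to every shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap one string in single quotes so that a
# Bourne-style shell reads it back as a single literal word.
shell_quote() {
    local q="'"          # the only character that needs special handling
    local esc="'\\''"    # close quote, escaped quote, reopen quote: '\''
    printf "'%s'" "${1//$q/$esc}"
}

# The two strings used in the comment above.
shell_quote 'whatever$"*(>|&; whatever'; echo
shell_quote "whatever\$\"*(>|&; 'whatever"; echo
```

The two calls should print `'whatever$"*(>|&; whatever'` and `'whatever$"*(>|&; '\''whatever'`, matching the forms given in the comment.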