
In Bash, when specifying command line arguments to a command, what characters are required to be escaped?

Are they limited to the metacharacters of Bash: space, tab, |, &, ;, (, ), <, and >?

ilkkachu
Tim
  • Don't forget (possible) filename globbing with * and ? – Jeff Schaller Mar 20 '16 at 03:41
  • Thanks. Could you exhaustively list the kinds of characters which need to be escaped in cmd line args? – Tim Mar 20 '16 at 03:48
  • The list is good to have, but the most important thing to understand about quoting, is: Everything between single quotes is passed literally and without word splitting. No exceptions. (This means there is no way whatsoever to embed a single quote within single quotes, by the way, but that's easy to work around.) – Wildcard Mar 20 '16 at 08:19
  • To illustrate @Wildcard's point: you wrap the entire argument with single quotes ('), then the only character you need to escape is the single-quote itself, which you replace with '\''. So whatever$"*(>|&; whatever just becomes 'whatever$"*(>|&; whatever' and you're done, and whatever$"*(>|&; 'whatever (note the nested ') becomes 'whatever$"*(>|&; '\''whatever'. Simple. Universal. – mtraceur Jul 19 '23 at 18:14
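
The single-quote rule described in the comments above can be sketched as a small POSIX shell function. This is only an illustration: quote_arg is a hypothetical helper name, not something Bash provides, and the sketch assumes the argument contains no NUL bytes (shell variables cannot hold them anyway).

```shell
# quote_arg is a hypothetical helper name, not part of Bash.
# The body runs in a subshell ( ... ) so its variables stay private.
quote_arg() (
    s=$1
    out=
    while :; do
        case $s in
            *\'*)
                head=${s%%\'*}        # text before the first '
                out="$out$head'\\''"  # append it, then the escaped quote '\''
                s=${s#*\'}            # continue after that '
                ;;
            *) break ;;
        esac
    done
    printf "'%s'\n" "$out$s"
)

# The second example from the comment above:
quote_arg "whatever\$\"*(>|&; 'whatever"
# prints 'whatever$"*(>|&; '\''whatever'
```

The output can be pasted into a command line or passed to eval, and the shell will hand the original string to the command as one argument.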

3 Answers


The following characters have special meaning to the shell itself in some contexts and may need to be escaped in arguments:

Character  Unicode  Name                             Usage
`          U+0060   Grave Accent (backtick)          Command substitution
~          U+007E   Tilde                            Tilde expansion
!          U+0021   Exclamation mark                 History expansion
#          U+0023   Number sign (hash)               Comments
$          U+0024   Dollar sign                      Parameter expansion
&          U+0026   Ampersand                        Background commands
*          U+002A   Asterisk                         Filename expansion and globbing
(          U+0028   Left Parenthesis                 Subshells
)          U+0029   Right Parenthesis                Subshells
(tab)      U+0009   Tab                              Word splitting (whitespace)
{          U+007B   Left Curly Bracket (left brace)  Brace expansion
[          U+005B   Left Square Bracket              Filename expansion and globbing
|          U+007C   Vertical Line (vertical bar)     Pipelines
\          U+005C   Reverse Solidus (backslash)      Escape character
;          U+003B   Semicolon                        Separating commands
'          U+0027   Apostrophe (single quote)        String quoting
"          U+0022   Quotation Mark (double quote)    String quoting with interpolation
(newline)  U+000A   Line Feed (newline)              Line break
<          U+003C   Less than                        Input redirection
>          U+003E   Greater than                     Output redirection
?          U+003F   Question mark                    Filename expansion and globbing
(space)    U+0020   Space                            Word splitting¹ (whitespace)

Some of those characters are used for more things and in more places than the one listed above.
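
As a quick illustration, here is a minimal sketch of the three usual ways to take away a character's special meaning: a backslash, single quotes, or double quotes. The show function is a hypothetical helper that prints each argument it receives between angle brackets.

```shell
# show is a hypothetical helper: it prints every argument on its own line in <>.
show() { printf '<%s>\n' "$@"; }

# Backslash, single quotes, double quotes, and an escaped space.
show a\&b 'c|d' "e;f" g\ h
# prints:
# <a&b>
# <c|d>
# <e;f>
# <g h>
```

Without the escaping, & would background the command, | would start a pipeline, ; would end the command, and the space would split g and h into two arguments.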


There are a few corner cases that are explicitly optional:


Escaping a newline requires quoting: a backslash followed by a newline is treated as a line continuation and both characters are removed, so backslashes won't do the job. Any other characters listed in IFS will need similar handling. You don't need to escape ] or }, but you do need to escape ) because it's an operator.
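
A small sketch of the newline case, using hypothetical variable names joined and kept: the backslash form loses the newline, the quoted form keeps it.

```shell
# Backslash-newline is a line continuation: the shell removes both characters.
joined=$(printf '%s' foo\
bar)

# A newline inside quotes stays part of the argument.
kept=$(printf '%s' 'foo
bar')

printf '%s\n' "$joined"   # prints foobar
printf '%s\n' "$kept"     # prints foo and bar on separate lines
```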

Some of these characters have tighter limits on when they truly need escaping than others. For example, a#b is ok, but a #b is a comment, while > would need escaping in both contexts. It doesn't hurt to escape them all conservatively anyway, and it's easier than remembering the fine distinctions.
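
The # example can be checked with a short sketch. count_args is a hypothetical helper that reports how many arguments it was given; the behaviour shown assumes a script (or an interactive shell with comments enabled, which is the usual default).

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

count_args a#b     # prints 1: # inside a word is an ordinary character
count_args a #b    # prints 1: " #b" starts a comment, so only "a" is passed
count_args a \#b   # prints 2: the escaped # begins a normal word
```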

If your command name itself is a shell keyword (if, for, do) then you'll need to escape or quote it too. The only interesting one of those is in, because it's not obvious that it's always a keyword. You don't need to do that for keywords used in arguments, only when you've (foolishly!) named a command after one of them. Shell operators ((, &, etc) always need quoting wherever they are.


¹ Stéphane has noted that any other single-byte blank character from your locale also needs escaping. In most common, sensible locales, at least those based on C or UTF-8, it's only the whitespace characters above. On some systems, including Solaris, the BSDs, and OS X, ISO-8859-1 locales consider U+00A0 no-break space a blank (I think incorrectly). If you're dealing with an arbitrary unknown locale, it could include just about anything, including letters, so good luck.

Conceivably, a single byte considered blank could appear within a multi-byte character that wasn't blank, and you'd have no way to escape that other than putting the whole thing in quotes. This isn't a theoretical concern: in one of the ISO-8859-1 locales mentioned above, the A0 byte that is considered a blank can appear within multibyte characters like UTF-8 encoded "à" (C3 A0). To handle those characters safely you would need to quote them, as in "à". This behaviour depends on the locale configuration in the environment running the script, not the one where you wrote it.

I think this behaviour is broken in multiple ways, but we have to play the hand we're dealt. If you're working with any non-self-synchronising multibyte character set, the safest thing would be to quote everything. If you're in UTF-8 or C, you're safe (for the moment).

Michael Homer
  • Other blanks in your locale would need escaping as well (except currently the multi-byte one because of a bug) – Stéphane Chazelas Mar 20 '16 at 07:25
  • You only need to escape ! when csh history expansion is enabled, typically not in scripts. [ ! -f a ] or find . ! -name... are fine. That's covered by your tighter limits section but maybe worth mentioning explicitly. – Stéphane Chazelas Mar 20 '16 at 07:30
  • Note that there are contexts where other characters need quoting like: hash[foo"]"]=, ${var-foo"}"}, [[ "!" = b ]], [[ a = "]]" ]], the regexp operators for [[ x =~ ".+[" ]]. Other keywords than { (if, while, for...) would need to be quoted so they're not recognised as such... – Stéphane Chazelas Mar 20 '16 at 07:48
  • To the extent that those are command-line arguments at all, the interpretation is up to the command in question (just like ]), so I'm not listing them. I don't think any keyword needs quoting in argument position. – Michael Homer Mar 20 '16 at 08:02
  • If you mean in arguments other than the first (zeroth)? Then yes. – Stéphane Chazelas Mar 20 '16 at 08:18
  • I assumed the question was about arguments "to a command" in the shell-syntax sense, but I suppose you're right that argv[0] is strictly an argument too. If Tim wants to edit to clarify that point I'll update the list, but otherwise I'll let the assumption stand. – Michael Homer Mar 20 '16 at 08:25
  • Thanks. I didn't explicitly mention the zeroth argument, neither did I explicitly realize that. But I think Stephane is correct, and I agree. – Tim Mar 20 '16 at 14:56
  • A dash - (used for options), *?[+@ (pathname expansions), job control %, the builtins : and ., and the control characters that make up white space. In the POSIX locale, white space consists of one or more <blank> (<space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters. (A filename with any of those characters needs to be quoted.) –  Mar 20 '16 at 21:59
  • Quoting builtins, dashes, or % doesn't do anything. – Michael Homer Mar 20 '16 at 22:06
  • A dash in a filename (touch -- -awe) needs ls ./-awe at the very least. You may not call that "quoting", but it is: escaping troubling characters. The builtins : and . need quoting as argv[0] if an alias exists (alias .='source nothing.sh'); then \. will actually execute the builtin (not the alias). Maybe you are right about job control. But I hope that pathname expansions will not raise any complaint. –  Mar 20 '16 at 23:02
  • @StéphaneChazelas White space in Unicode is hardly only space and tab. It includes, at least, hex 0x09, 0x0A, 0x0B, 0x0C, 0x0D, and (which may raise some debate, but it is a single-byte whitespace in cp-1252 at least) 0x85 (Horizontal Ellipsis). –  Mar 20 '16 at 23:29
  • @BinaryZebra, we're talking of [[:blank:]] (which in the C locale is TAB and space), not [[:space:]], the FF, CR, VT... don't need quoting (except maybe for CR on some Microsoft ports of bash). In UTF-8, all the non-ascii characters are multi-byte and so fall into that current bug of bash. But for instance on latin1 locales on Solaris, 0xa0, the non-breaking space is a [[:blank:]], so needs quoting (even though the whole point of that character should be that it doesn't break...) – Stéphane Chazelas Mar 21 '16 at 07:36
  • @MichaelHomer, on Solaris in iso-8859-1 locales, 0xa0 is a blank. LC_ALL=en_GB.ISO8859-1 bash -c $'printf "%s," a\xa0b' outputs a,b, there. – Stéphane Chazelas Mar 21 '16 at 07:39
  • @StéphaneChazelas: Not just there - it seems to be the case on BSDs and OS X too. That seems clearly wrong. – Michael Homer Mar 21 '16 at 08:06
  • @MichaelHomer, I agree it's not desirable. But that seems to be what POSIX requires. Same applies to xargs, and would probably apply to the grammar of awk or bc for instance. I had said at the time I would bring it up to the austin group mailing list, but never got round to doing it. I'll try and give it a go. – Stéphane Chazelas Mar 21 '16 at 09:11
  • You hinted at it already, but that means you need to quote the à in echo Voilà | iconv -f utf-8 in a script, because if it is called in a latin1 locale on those systems, the 0xa0 byte in that à character would be taken as a token separator. IOW, the script is parsed based on the locale of the user, not its author's, which sounds wrong to me. – Stéphane Chazelas Mar 21 '16 at 09:23
  • POSIX doesn't seem to require that a non-breaking space be blank, though, and I don't think it's reasonably "used to separate words within a line of text" (which is the phrasing from C99 for isblank). Those locales seem broken to me. – Michael Homer Mar 21 '16 at 09:32
  • @StéphaneChazelas What you wrote was "Other blanks in your locale". That is not limited in any way to C. In fact it is written to include any locale of yours. A locale may include anything it sees fit in the blank category, of which you give an example. In the broader view of what a language may interpret as blanks, the Unicode white space list serves as the most extreme example. That's why I presented it. –  Mar 21 '16 at 19:30
  • @StéphaneChazelas In bash -c $'printf "%s," a\xa0t' you are obviously asking for a byte (\x..), which bash correctly gives. Even in a utf8 locale, this: LC_ALL=en_US.utf8 bash -c $'printf "%s," a\xa0t' will produce a byte with value 0xA0 (which renders as a broken character, which it is in a utf-8 locale). The additional language effects that byte might have depend on the locale description. Which looks broken on the Solaris you describe. –  Mar 21 '16 at 19:30
  • @BinaryZebra: "blank" has a specific meaning of word separators (see e.g. ISO C99, incorporated by reference into POSIX), which form feeds &c don't meet. An uncontrolled locale could include such characters as blanks, incorrectly, but that's far from the most extreme case - everything could be a blank in that locale. Which, really, is the problem with the POSIX tokenisation requirement. – Michael Homer Mar 21 '16 at 19:36
  • @StéphaneChazelas According to this: A [:blank:] in Unicode is [\p{Zs}\t]. A \p{Zs} (or \p{Space_Separator}) is : a whitespace character that is invisible, but does take up space. Similar to this list from EM to HAIR space. –  Mar 21 '16 at 19:50
  • @MichaelHomer The problem is that you try to look at the issue only from the programming POV (point of view). Yes, it is desirable to have a clear list of [[:blank:]] characters, which the default "C" locale provides very well (it means only 0x20 and 0x09). But if you are to embrace languages and language definitions, the list may change (and in fact does drift), with all the security issues you may want to add to such change. Expecting a locale of en_US.utf8 (or most others) to strictly fit the (very limited) view of only 0x20 0x09 for [[:blank:]] is simply wrong. –  Mar 21 '16 at 19:57
  • @BinaryZebra, nobody said en_US.utf8 blanks only had 0x9 and 0x20, just that its only single-byte blanks were 0x9 and 0x20, as the only single-byte characters in UTF-8 are the ASCII ones and no other character in ASCII fits the definition of blank. Of course, you could construct a malicious locale where a or " itself is a blank, but then you can't expect much to work with that. See the discussions there were around CVE-2014-0475 at the time. – Stéphane Chazelas Mar 21 '16 at 21:14
  • @BinaryZebra, in bash -c $'printf "%s," a\xa0t', it's my shell, not bash, that expands the $'\xa0'; bash sees a nbsp character in between the a and t, which it treats as a token delimiter when called in a latin1 locale on Solaris/OS X... – Stéphane Chazelas Mar 21 '16 at 21:16
  • @StéphaneChazelas This dual talk about bytes and code points is wrong. A character is whichever encoding it is, whether it is one byte, two bytes or 10 bytes it is deeply irrelevant (as long as it is a valid character in whichever encoding is used). A nbsp is a character. That it happens to be a single byte 0xA0 in iso-8859-1 should be of no real importance. That, as is today, the only one byte encoded blanks in utf-8 are 0x09 and 0x20 must have no real meaning. That may change tomorrow. As soon as some other encoding is named (or used) the characters have been actually "converted". –  Mar 21 '16 at 22:34
  • Bash uses isblank at the byte level; bytes are deeply relevant. In any case, comments are not discussion forums, so let's stick to relevant and material improvements to the answer. – Michael Homer Mar 21 '16 at 23:52

In GNU Parallel this is tested and used extensively:

$a =~ s/[\002-\011\013-\032\\\#\?\`\(\)\{\}\[\]\^\*\<\=\>\~\|\; \"\!\$\&\'\202-\377]/\\$&/go;
# quote newline as '\n'
$a =~ s/[\n]/'\n'/go;

It is tested in bash, dash, ash, ksh, zsh, and fish. Some of the characters do not need quoting in some (versions of) the shells, but the above works in all tested shells.

If you simply want a string quoted, you can pipe it into parallel --shellquote:

printf "&*\t*!" | parallel --shellquote
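
If GNU Parallel is not installed, Bash's own printf builtin has a %q format that produces shell-quoted output for Bash. The sketch below is an alternative, not what GNU Parallel does internally, and its output may differ from parallel --shellquote; bash_quote is a hypothetical helper name, and the sketch assumes bash is on the PATH.

```shell
# bash_quote is a hypothetical helper: it asks Bash to quote one argument
# with the %q format of its printf builtin.
bash_quote() {
    bash -c 'printf "%q\n" "$1"' _ "$1"
}

bash_quote 'a b&*'   # prints a\ b\&\*
```

The %q output is meant to be read back by Bash; other shells may not understand every form it can produce.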
Ole Tange
  • How have I not heard of parallel before... – Tom Feb 26 '18 at 00:21
  • @TomH It will be appreciated if you can spend 5 minutes thinking of how we could have reached you. – Ole Tange Feb 26 '18 at 08:45
  • I think it's a progression problem. most people don't need or understand parallel until they have progressed through some complexity stages. By which time they have come across xargs, nohup and stuff like that. Also I don't see many people using parallel to solve problems in stack exchange or when I google for solutions to bash problems – Tom Feb 27 '18 at 02:49
  • Wild to me that parallel chose to solve the greater problem of knowing all characters that need escaping across a wide variety of shells rather than the much lesser problem of just wrapping everything with ' and escaping ' into '\'', which would also work on all of the shells mentioned here. But cool that they have a battle-tested list. – mtraceur Jul 19 '23 at 18:40
  • @mtraceur It is because in some shells ' does not work. Newer versions of GNU Parallel use ' for some shells. – Ole Tange Jul 21 '23 at 00:40
  • @OleTange ' works in every shell listed in this answer, and every Bourne-style shell since the original Bourne shell. Maybe in older fish versions it didn't, since fish isn't a Bourne-like? (But I installed fish just to check before writing my comment here, and it does work now.) – mtraceur Jul 21 '23 at 19:54
  • @OleTange Does GNU Parallel's --shellquote also support other shells which are neither listed here nor Bourne-like, such as csh/tcsh? If so, then that totally makes sense, because iirc the csh-like shells had a different interpretation of '-quoting. – mtraceur Jul 21 '23 at 22:57
  • @mtraceur That is exactly the issue. rc is weird, and csh is pretty fucked up, but they are supported. – Ole Tange Jul 22 '23 at 01:03

For a lightweight escaping solution in Perl, I follow the principle of single quotes. A Bash string in single quotes can contain any character except the single quote itself.

My code:

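use strict;
use warnings;

# Characters that are special to Bash. The $ is written as \$ so that Perl
# does not interpolate its $& match variable into the pattern.
my $bash_reserved_characters_re = qr([ !"#\$&'()*;<>?\[\\`{|~\t\n]);

while (<>) {
    if (/$bash_reserved_characters_re/) {
        # Close the quote, add a double-quoted ', then reopen the quote.
        my $quoted = s/'/'"'"'/gr;
        print "'$quoted'";
    } else {
        print $_;
    }
}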

Example run 1:

$ echo -n "abc" | perl escape_bash_special_chars.pl
abc

Example run 2:

$ echo "abc" | perl escape_bash_special_chars.pl
'abc
'

Example run 3:

$ echo -n 'ab^c' | perl escape_bash_special_chars.pl
ab^c

Example run 4:

$ echo -n 'ab~c' | perl escape_bash_special_chars.pl
'ab~c'

Example run 5:

$ echo -n "ab'c" | perl escape_bash_special_chars.pl
'ab'"'"'c'

$ echo 'ab'"'"'c'
ab'c
  • Yes, valid point that. My view is that most people will land on this page, because they have a problem to solve. Not because this makes an interesting academic debate. That's why I'd like to offer solutions and discuss the merits of them, even while being slightly off-topic. – Jari Turkia Jan 23 '18 at 10:49
  • My code is just an implementation of Michael Homer's answer. I didn't intend to bring any more information than he did. – Jari Turkia Jan 23 '18 at 11:43