There's no good reason why
[[ $a = a|b ]]
should report an error instead of testing whether $a is the a|b string, while [[ $a =~ a|b ]] doesn't return an error.
The only reason is that | is generally (outside and inside [[ ... ]]) a special character. In the position right after [[ $a =, bash expects a normal WORD, the same type of token as the arguments or the targets of redirections in a normal shell command line (except that, since bash 4.1, it's parsed as if the extglob option were enabled).
(By WORD here, I mean a word in a hypothetical shell grammar like the one described by the POSIX specification, that is, something that the shell would parse as one token in a simple shell command line, not other definitions of words such as the English one (a sequence of letters) or a sequence of non-spacing characters. foo"bar baz" and $(echo x y) are two such WORDs.)
In a normal shell command line:
echo a|b
is echo a piped to b. a|b is not a WORD; it's three tokens: an a WORD, a | token and a b WORD.
When used in [[ $a = a|b ]], bash expects a WORD, which it gets (a), but then finds an unexpected | token, which causes the error.
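To see the parse error without aborting the calling script, the sketch below hands the offending line to a child bash. It assumes bash 4.1 or newer, and the two helper functions are hypothetical names, only there to show what you'd write instead: an extglob alternation, or a quoted literal.

```bash
#!/usr/bin/env bash
# Sketch only; assumes bash >= 4.1. Older versions need the shopt line
# below for @(...) to parse; it is harmless on newer ones.
shopt -s extglob

# The unquoted | ends the WORD after "=", so the child bash rejects the
# whole [[ ... ]] command as a syntax error (non-zero exit status).
if bash -c '[[ $a = a|b ]]' 2>/dev/null; then
  echo "unexpectedly parsed"
else
  echo "syntax error, as expected"
fi

# Hypothetical helpers showing what can be written instead:
is_a_or_b() {
  [[ $1 = @(a|b) ]]     # extglob alternation: exactly "a" or "b"
}
is_literal_a_bar_b() {
  [[ $1 = 'a|b' ]]      # quoted: only the 3-character string a|b
}

is_a_or_b b && echo "b matches @(a|b)"
is_literal_a_bar_b 'a|b' && echo "'a|b' matches the quoted pattern"
```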
Interestingly, bash doesn't complain in:
[[ $a = a||b ]]
because it's now an a token followed by a || token followed by b, so it's parsed the same way as:
[[ $a = a || b ]]
which tests whether $a is a or whether the b string is non-empty.
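A small demonstration that the a||b form really is an OR of two tests (a sketch; the function name is made up): it succeeds whatever $a contains, because the lone b operand is a non-empty string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: [[ $a = a||b ]] is parsed as [[ $a = a || b ]],
# i.e. "$a matches a" OR "the string b is non-empty", which is always true.
looks_like_a_or_b() {
  local a=$1
  [[ $a = a||b ]]
}

for s in a b 'a||b' xyz ''; do
  if looks_like_a_or_b "$s"; then
    printf '%s -> true\n' "${s:-<empty>}"
  else
    printf '%s -> false\n' "${s:-<empty>}"
  fi
done
```

Every iteration prints true, including for the empty string.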
Now, in:
[[ $a =~ a|b ]]
bash can't have the same parsing rule. Having the same parsing rule would mean that the above would give an error and that one would need to quote that | to make a|b a single WORD. But, since bash 3.2, if you do:
[[ $a =~ 'a|b' ]]
that's no longer matching against the a|b regexp but against the a\|b regexp. That is, shell quoting has the side effect of removing the special meaning of regexp operators. It's a deliberate feature, meant to make the behaviour similar to that of [[ $a = "?" ]] (where the quoted ? only matches a literal question mark), but wildcard patterns (used in [[ $a = pattern ]]) are shell WORDs (used in globs for instance), while regexps are not.
So bash has to treat all the extended regexp operators that are otherwise special shell characters, like |, ( and ), differently when parsing the argument of the =~ operator.
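A quick way to compare the three forms side by side. This is a sketch assuming bash 3.2 or newer without compat31, on a system using the GNU regex library; the function names and the re variable are hypothetical.

```bash
#!/usr/bin/env bash
# Quoting part of the =~ operand makes those characters literal in the regexp.
re_unquoted() { [[ $1 =~ a|b ]]; }     # regexp a|b: contains "a" or "b"
re_quoted()   { [[ $1 =~ 'a|b' ]]; }   # regexp a\|b: contains "a|b"
re_var() {
  local re='a|b'                       # hypothetical variable holding the regexp
  [[ $1 =~ $re ]]                      # unquoted expansion: regexp a|b again
}

for s in a 'a|b' c; do
  printf '%-4s unquoted=%s quoted=%s var=%s\n' "$s" \
    "$(re_unquoted "$s" && echo y || echo n)" \
    "$(re_quoted "$s" && echo y || echo n)" \
    "$(re_var "$s" && echo y || echo n)"
done
```

With these definitions, a matches the unquoted and variable forms but not the quoted one, while a|b matches all three.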
Still, note that while
[[ $a =~ (ab)*c ]]
now works,
[[ $a =~ [)}] ]]
doesn't. You need either of:
[[ $a =~ [\)}] ]]
[[ $a =~ [')'}] ]]
which in previous versions of bash would incorrectly also match on backslash. That one was fixed, but
[[ $a =~ [^]')'] ]]
for instance still does not match on backslash like it should, because bash fails to realise that the ) is within a bracket expression, so it escapes that ), resulting in a [^]\)] regexp that matches any character but ], \ and ).
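One way to sidestep those bracket-expression pitfalls is to keep the regexp in a variable, so the [[ parser never sees the ) or the quotes. A minimal sketch, assuming bash 3.2 or newer; the variable and function names are hypothetical.

```bash
#!/usr/bin/env bash
# The regexps live in variables, so bash's special parsing of the =~
# operand (and its escaping of quoted characters) never applies to them.
close_re='[)}]'     # ERE: a ")" or a "}"
other_re='[^])]'    # ERE: any character except "]" and ")"

has_close() { [[ $1 =~ $close_re ]]; }
has_other() { [[ $1 =~ $other_re ]]; }

has_close ')' && echo ') matches [)}]'
has_other '\' && echo 'backslash matches [^])]'
has_other ')' || echo ') does not match [^])]'
```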
ksh93 has much worse bugs on that front.
In zsh, a normal shell WORD is expected there, and quoting regexp operators doesn't affect their meaning:
[[ $a =~ 'a|b' ]]
is matching against the a|b regexp.
That means the =~ operator can also be added to the [/test command:
[ "$a" '=~' 'a|b' ]
test "$a" '=~' 'a|b'
(this also works in yash. The =~ needs to be quoted in zsh, as =something is a special shell operator there.)
bash 3.1 used to behave like zsh. It changed in 3.2, presumably to align with ksh93 (even though bash was the shell that first came up with [[ =~ ]]), but you can still do BASH_COMPAT=31 or shopt -s compat31 to revert to the previous behaviour (except that while [[ $a =~ a|b ]] would return an error in bash 3.1, it doesn't anymore in bash -O compat31 with newer versions of bash).
Hope this clarifies why I said the rules were confusing and why using:
[[ $a =~ $var ]]
helps, including with portability to other shells.
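Wrapped in a function, the idiom might look like the sketch below. ere_groups is a hypothetical name; the regexp is always passed as data, so none of the quoting or parsing rules above apply to it, and bash puts the capture groups in its BASH_REMATCH array.

```bash
#!/usr/bin/env bash
# Hypothetical helper: match a string against an ERE held in a variable.
# Returns 1 if there is no match; otherwise prints each capture group
# (BASH_REMATCH[1], [2], ...) on its own line.
ere_groups() {
  local string=$1 regexp=$2 i
  [[ $string =~ $regexp ]] || return 1
  for ((i = 1; i < ${#BASH_REMATCH[@]}; i++)); do
    printf '%s\n' "${BASH_REMATCH[i]}"
  done
}

# ( ) * | and a ) inside brackets, with no shell escaping needed.
re='^(ab)*c([)}|])$'
ere_groups 'ababc|' "$re"
```

Here the call prints ab (the last repetition of the first group) and then |.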
| is special) is on by default in the right-hand side of [[ $var = $pattern ]]. It would be interesting to isolate the versions and shopt option configurations where this behavior is seen -- if it's only those where extglob is on, either by default or explicit configuration, well, there we are. – Charles Duffy Jul 28 '17 at 02:32
pattern='a|b' and then expand $pattern unquoted on the RHS. – Charles Duffy Jul 28 '17 at 02:37