My question comes from "How does storing the regular expression in a shell variable avoid problems with quoting characters that are special to the shell?".

  1. Why is there an error:

    $ [[ $a = a|b ]]  
    bash: syntax error in conditional expression: unexpected token `|'
    bash: syntax error near `|b'
    

    Inside [[ ... ]] the second operand of = is expected to be a globbing pattern.

    Is a|b not a valid globbing pattern? Can you point out which syntax rule it violates?

  2. A comment below points out that | is interpreted as a pipe.

    Yet changing = (glob pattern) to =~ (regex pattern) makes | work:

    $ [[ $a =~ a|b ]]
    

    I learned from Learning Bash (p. 180) in my previous post that | is recognized as a pipe at the very beginning of interpretation, before any other step (including parsing the conditional expressions in the examples above). So how can | be recognized as a regex operator with =~, rather than rejected as an invalid pipe as it is with =? That makes me think that the syntax error in part 1 doesn't mean that | is interpreted as a pipe.

    Each line that the shell reads from the standard input or a script is called a pipeline; it contains one or more commands separated by zero or more pipe characters (|). For each pipeline it reads, the shell breaks it up into commands, sets up the I/O for the pipeline, then does the following for each command (Figure 7-1):

Thanks.

Tim
  • Note that in some versions of bash, extglob parsing (where | is special) is on by default in the right-hand side of [[ $var = $pattern ]]. It would be interesting to isolate the versions and shopt option configurations where this behavior is seen -- if it's only those where extglob is on, either by default or explicit configuration, well, there we are. – Charles Duffy Jul 28 '17 at 02:32
  • BTW, if you wanted to somewhat more comprehensively rule out the case of the pipe character interfering with a prior stage of parsing (which I agree isn't happening, but it's not as obvious to the reader as it could be), you'd use pattern='a|b' and then expand $pattern unquoted on the RHS. – Charles Duffy Jul 28 '17 at 02:37
  • @CharlesDuffy, that was the point being made in the Q&A which this question is a follow-up to. – Stéphane Chazelas Jul 28 '17 at 11:23
  • Ahh -- the context makes sense; and your answer here is outstanding. Thank you on both counts. – Charles Duffy Jul 28 '17 at 15:07
  • Tim, did any of the answers below answer your question? Please consider accepting one if so. Thank you! – Jeff Schaller Jul 30 '17 at 12:10

3 Answers

There's no good reason why

[[ $a = a|b ]]

should report an error instead of testing whether $a is the a|b string, while [[ $a =~ a|b ]] doesn't return an error.

The only reason is that | is generally (outside and inside [[ ... ]]) a special character. In that [[ $a = position, bash expects a type of token that is a normal WORD like the arguments or the targets of redirections in a normal shell command line (but as if the extglob option had been enabled since bash 4.1).

(By WORD here, I refer to a word in a hypothetical shell grammar like the one described by the POSIX specification, that is, something that the shell would parse as one token in a simple shell command line, not other definitions of a word such as the English one of a sequence of letters, or a sequence of non-spacing characters. foo"bar baz" and $(echo x y) are two such WORDs.)

In a normal shell command line:

echo a|b

is echo a piped to b. a|b is not a WORD; it's three tokens: an a WORD token, a | token and a b WORD token.

When used in [[ $a = a|b ]], bash expects a WORD which it gets (a), but then finds an unexpected | token which causes the error.
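
A minimal sketch of that difference, assuming a bash binary is on PATH. Each test runs in a child bash -c so that the syntax error cannot abort the calling script; the variable names (unquoted_status and so on) are only illustrative.

```shell
# Unquoted: bash reads the WORD a, then hits an unexpected | token -> syntax error.
bash -c 'a=a; [[ $a = a|b ]]' 2>/dev/null
unquoted_status=$?

# Quoted: a|b is now a single WORD, compared as a literal string.
bash -c 'a="a|b"; [[ $a = "a|b" ]]'
quoted_status=$?

# Because the quoted pattern is literal, a plain "a" does not match it.
bash -c 'a=a; [[ $a = "a|b" ]]'
literal_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status literal=$literal_status"
```

A non-zero unquoted_status here comes from the parser rejecting the command, not from the comparison being false.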

Interestingly, bash doesn't complain in:

[[ $a = a||b ]]

because it's now an a token followed by a || token followed by a b token, so it's parsed the same way as:

[[ $a = a || b ]]

which tests whether $a is a or whether the b string is non-empty.
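
A small check of that reading, again run through a child bash -c (assumed to be on PATH); the status variable names are made up for the example.

```shell
# $a is neither a nor b, but "b" on the right of || is a non-empty string,
# so the whole conditional is true.
bash -c 'a=x; [[ $a = a||b ]]'
nonempty_status=$?

# With an empty string after ||, both sides are false.
bash -c 'a=x; [[ $a = a||"" ]]'
empty_status=$?

echo "nonempty=$nonempty_status empty=$empty_status"
```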

Now, in:

[[ $a =~ a|b ]]

bash can't have the same parsing rule. Having the same parsing rule would mean that the above would give an error and that one would need to quote that | to ensure a|b is a single WORD. But, since bash 3.2, if you do:

[[ $a =~ 'a|b' ]]

That's no longer matching against the a|b regexp but against the a\|b regexp. That is, shell quoting has the side effect of removing the special meaning of regexp operators. It's a feature, meant to make the behaviour similar to that of [[ $a = "?" ]], but wildcard patterns (used in [[ $a = pattern ]]) are shell WORDs (used in globs for instance), while regexps are not.

So bash has to treat all the extended regexp operators that are otherwise normally special shell characters like |, (, ) differently when parsing an argument of the =~ operator.
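
The effect of quoting on =~ can be sketched as follows, assuming bash 3.2 or later at its default compatibility level and available as bash on PATH. The status variable names are illustrative only.

```shell
# Unquoted: | is the ERE alternation operator, so "b" matches a|b.
bash -c '[[ b =~ a|b ]]'
unquoted_status=$?

# Quoted: the | is taken literally, so "b" no longer matches...
bash -c '[[ b =~ "a|b" ]]'
quoted_status=$?

# ...but a string that contains the three characters a|b does.
bash -c '[[ "xa|by" =~ "a|b" ]]'
literal_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status literal=$literal_status"
```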

Still, note that while

 [[ $a =~ (ab)*c ]]

now works,

 [[ $a =~ [)}] ]]

doesn't. You need:

 [[ $a =~ [\)}] ]]
 [[ $a =~ [')'}] ]]

Which in previous versions of bash would incorrectly match on backslash. That one was fixed, but

 [[ $a =~ [^]')'] ]]

does not match on backslash as it should, because bash fails to realise that ) is within the brackets and so escapes it, resulting in a [^]\)] regexp that matches any character but ], \ and ).

ksh93 has much worse bugs on that front.

In zsh, it's a normal shell word that is expected and quoting regexp operators doesn't affect the meaning of regexp operators.

[[ $a =~ 'a|b' ]]

matches against the a|b regexp.

That means the =~ can also be added to the [/test command:

[ "$a" '=~' 'a|b' ]
test "$a" '=~' 'a|b'

(This also works in yash. The =~ needs to be quoted in zsh, as =something is a special shell operator there.)

bash 3.1 used to behave like zsh. It changed in 3.2, presumably to align with ksh93 (even though bash was the shell that first came up with [[ =~ ]]), but you can still do BASH_COMPAT=31 or shopt -s compat31 to revert to the previous behaviour (except that while [[ $a =~ a|b ]] would return an error in bash 3.1, it doesn't anymore in bash -O compat31 with newer versions of bash).

Hope it clarifies why I said the rules were confusing and why using:

[[ $a =~ $var ]]

helps, including with portability to other shells.
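
A minimal sketch of the variable approach, assuming bash is on PATH; re and the status variable names are hypothetical. Since the regexp is stored in a variable, the shell never has to tokenize | or ), and the unquoted $re expansion is still treated as a regexp.

```shell
# Alternation stored in a variable: "b" matches a|b.
bash -c 're="a|b"; a=b; [[ $a =~ $re ]]'
alt_status=$?

# A bracket expression containing ) needs no escaping tricks either.
bash -c 're="[)}]"; a=")"; [[ $a =~ $re ]]'
bracket_status=$?

# A string containing neither ) nor } does not match.
bash -c 're="[)}]"; a=x; [[ $a =~ $re ]]'
nomatch_status=$?

echo "alt=$alt_status bracket=$bracket_status nomatch=$nomatch_status"
```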

  • zsh is also reporting an error on [[ $a = a|b ]]. –  Jun 02 '18 at 09:04
  • @isaac, yes, that's the point I'm making here. a|b is not a shell WORD here, it's the a, | and b tokens. Like echo a|b doesn't output a|b or doesn't expand an a|b glob, you need to quote that | as it's a special shell character that is invalid in that context. [[ $a = (a|b) ]] would work like echo (a|b) would work as (a|b) is a zsh wildcard operator. – Stéphane Chazelas Jun 02 '18 at 12:44
  • The wording and explanation on your answer only name bash. That is not the whole truth. –  Jun 02 '18 at 13:38

Standard globs ("filename expansion") are: *, ?, and [ ... ]. | is not a valid glob operator in standard (non-extglob) settings.

Try:

shopt -s extglob
[[ a = @(a|b) ]] && echo matched
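
According to the other answer in this thread, bash 4.1 and later parse the right-hand side of = inside [[ ... ]] as if extglob were enabled. A quick check of that claim, assuming such a bash is available as bash on PATH; child shells are used so that no shopt state leaks in, and the status variable names are illustrative.

```shell
# No shopt -s extglob here: inside [[ ]] the pattern is handled as extglob anyway.
bash -c '[[ b = @(a|b) ]]'
match_status=$?

# A string that is neither a nor b does not match the pattern.
bash -c '[[ c = @(a|b) ]]'
nomatch_status=$?

echo "match=$match_status nomatch=$nomatch_status"
```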
Jeff Schaller
  • Thanks. But why isn't | interpreted literally? Why is there a syntax error? – Tim Jul 27 '17 at 19:01
  • It wasn't quoted. – Jeff Schaller Jul 27 '17 at 19:02
  • In standard settings, | isn't a glob operator, so isn't | interpreted literally without being quoted? So why is there a syntax error? – Tim Jul 27 '17 at 19:04
  • | is a control operator; it's never treated as a literal character in the same way that a letter or number is. – chepner Jul 27 '17 at 19:26
  • Because in that mode the shell didn't expect a pipe redirect character in the middle of a not-yet-closed [[]]. [[ $a = a isn't a valid command whose output can be piped to another process (at least that's what the shell thought you were trying to do). – Jason C Jul 27 '17 at 19:29
  • Unfortunately, in [[ a =~ a|b ]] the a|b cannot be quoted. This is a good case where the regex should be put into a variable (which can and should be quoted) like re='a|b'; [[ a =~ $re ]] .... – DocSalvager Aug 14 '17 at 16:40

If you want a regex match the test would be:

[[ "$a" =~ a|b ]]
Deathgrip
  • @Tim You should be opening new questions, not continuously editing your current question. – gardenhead Jul 27 '17 at 23:27
  • @gardenhead: My update is to clarify my questions, instead of changing them, in case that you miss it. The second part I added is to show one comment's pipe explanation about my original question (why the syntax error) is not correct. – Tim Jul 27 '17 at 23:37