
I'm having a really hard time understanding this behaviour:

stackExchange@test:~$ if [[ "two words" =~ \bwords ]]; then echo hi; fi; #(I'd expect this one to work)
stackExchange@test:~$ if [[ "two words" =~ \\bwords ]]; then echo hi; fi; #(or at least this one...)
stackExchange@test:~$ if [[ "two words" =~ \\\bwords ]]; then echo hi; fi;
stackExchange@test:~$ if [[ "two words" =~ \\\\bwords ]]; then echo hi; fi;
stackExchange@test:~$ put_in_a_variable=\\bwords
stackExchange@test:~$ if [[ "two words" =~ $put_in_a_variable ]]; then echo hi; fi;
hi
stackExchange@test:~$

I understand that my variable contains \bwords and that this gets expanded in the pattern part of the conditional expression, but I really cannot understand why it seems impossible to get the same behaviour using inline shell escaping.
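
One way to confirm what the variable really holds is sketched below (assuming bash 3.2 or later; the printf line is only there for illustration):

```bash
#!/usr/bin/env bash
# Unquoted assignment: the shell removes one level of backslash quoting,
# so the variable ends up holding the seven characters \bwords.
put_in_a_variable=\\bwords
printf '%s\n' "$put_in_a_variable"   # prints: \bwords

# Expanded unquoted on the right-hand side of =~, the value is handed to
# the regex library unchanged, so \b is seen as a word-boundary operator
# (on systems whose regex library supports it).
if [[ "two words" =~ $put_in_a_variable ]]; then echo hi; fi
```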

I'd rather not do something like the following, which feels too weird:

if [[ "two words" =~ $(echo \\bwords) ]]; then echo hi; fi

Thanks,
Francesco

Taz

1 Answer


The effect of a backslash in the regular expression part of [[ str =~ rex ]] is to quote the following character (exactly like putting it in single quotes), and in bash, since version 3.2, a quoted character is matched literally (1). Since b is not special, \b just turns into b, whereas '\', "\\" or \\ all turn into \\ so as to match a literal backslash:

[[ abwords =~ \bwords ]] && echo "<$BASH_REMATCH>"
<bwords>
[[ 'a\bwords' =~ \\bwords ]] && echo "<$BASH_REMATCH>"
<\bwords>
# conversely, '|' is just like \|
[[ 'a|words' =~ a'|'words ]] && echo "<$BASH_REMATCH>"
<a|words>
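
To tie this back to the question, here is each inline spelling from it, paired with subject strings that the resulting regex does or does not match. It is a minimal sketch assuming bash 3.2 or later with compat31 off; show is a hypothetical helper that only reports the outcome of the preceding test.

```bash
#!/usr/bin/env bash
# show: hypothetical helper; $1 is the exit status of the preceding [[ ]].
show() {
  if (( $1 == 0 )); then echo "match <$BASH_REMATCH>"; else echo "no match"; fi
}

# \bwords: \b is a quoted "b", so the regex is plain "bwords"
[[ 'two words' =~ \bwords ]]; show $?     # no match
[[ 'abwords' =~ \bwords ]]; show $?       # match <bwords>

# \\bwords and \\\bwords: a quoted backslash (handed to the regex library
# as \\, i.e. a literal backslash) followed by "bwords"
[[ 'two words' =~ \\bwords ]]; show $?    # no match
[[ 'a\bwords' =~ \\bwords ]]; show $?     # match <\bwords>
[[ 'a\bwords' =~ \\\bwords ]]; show $?    # match <\bwords>

# \\\\bwords: two quoted backslashes, so two literal backslashes are needed
[[ 'a\bwords' =~ \\\\bwords ]]; show $?   # no match
[[ 'a\\bwords' =~ \\\\bwords ]]; show $?  # match <\\bwords>
```

None of the inline forms ever produces a backslash that the regex library treats as an operator, which is why no amount of extra backslashes yields \b.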

Your idea of putting the regex in a variable is fine. An alternative would be to use a wrapper function:

rematch() [[ $1 =~ $2 ]]

if rematch 'two words' '\bwords\b'; then echo "<$BASH_REMATCH>"; fi
<words>
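
The wrapper and the variable approach can be combined and tried on a few inputs. This is a minimal sketch assuming a system whose regex library understands \b (glibc does; other libraries may not); the loop and the sample strings are only illustrative.

```bash
#!/usr/bin/env bash
# The regex reaches [[ =~ ]] through the unquoted $2, so its backslashes
# are passed to the regex library untouched.
rematch() [[ $1 =~ $2 ]]

# Single quotes keep the backslashes, so re holds exactly \bwords\b
re='\bwords\b'

for s in 'two words' 'swords' 'two wordsmiths'; do
  if rematch "$s" "$re"; then
    echo "$s: <$BASH_REMATCH>"
  else
    echo "$s: no match"
  fi
done
```

Where \b is honoured, only "two words" should match: in "swords" and "wordsmiths" the characters next to "words" are themselves word characters, so there is no boundary there.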

In any case, even with those work-arounds in place, \b is a non-standard extended regexp operator (borrowed from perl), so whether it works depends on whether the system's regexp library supports it. Depending on the system, you may have more luck with alternative spellings of the word-boundary operators, such as \< / \> or [[:<:]] / [[:>:]].
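
To find out which of these spellings the local regexp library honours, a small probe can be run. This is a sketch under the assumption that a spelling "works" when it finds words in "two words" but not inside "swords"; probe is a hypothetical helper.

```bash
#!/usr/bin/env bash
# probe: hypothetical helper. $1 and $2 are the start/end boundary
# operators to test. An invalid regex makes [[ =~ ]] return 2, which also
# counts as unsupported.
probe() {
  local re="${1}words${2}"
  [[ 'two words' =~ $re ]] && ! [[ 'swords' =~ $re ]]
}

probe '\b' '\b'           && echo '\b ... \b works here'
probe '\<' '\>'           && echo '\< ... \> works here'
probe '[[:<:]]' '[[:>:]]' && echo '[[:<:]] ... [[:>:]] works here'
```

On a glibc-based Linux system you would typically see the first two lines; [[:<:]] and [[:>:]] are the spellings documented in the BSD and macOS re_format(7) page.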


(1): as documented in its manual:

Any part of the pattern may be quoted to force the quoted portion to be matched as a string

Note that in the shell, quoted characters are actually marked specially, so any later processing by the parser can base decisions on whether a part of a string was quoted or unquoted.
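
The effect of that marking is easy to see with a character that is special in a regex, such as the dot. A minimal sketch assuming bash 3.2 or later with compat31 off; dot is just an illustrative variable name.

```bash
#!/usr/bin/env bash
# Quoted parts of the pattern are matched literally; unquoted parts stay regex.
dot='.'

[[ 'abc' =~ a.c ]]      && echo 'a.c:      . matches any character'
[[ 'abc' =~ a'.'c ]]    || echo "a'.'c:    quoted dot does not match b"
[[ 'a.c' =~ a'.'c ]]    && echo "a'.'c:    quoted dot matches a literal ."
[[ 'abc' =~ a${dot}c ]] && echo 'a${dot}c: unquoted expansion stays a regex'
[[ 'abc' =~ a"$dot"c ]] || echo 'a"$dot"c: quoted expansion is literal'
```

The same rule applies to expansions: only an unquoted $var keeps its regex meaning, which is why the variable trick in the question works.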

  • thanks for your answer. So, if I understood correctly, since a backslash makes bash treat the next character as if it were in single quotes (triggering a literal match), there is no way whatsoever to get the \b regex operator inline... just by "design" :\ – Taz Oct 29 '20 at 23:40
  • I for one cannot figure out any way to use \b inline. You can do various tricks (e.g. bs='\'; [[ k =~ ${bs}bk ]] && echo yeah), but I find them even worse than that rematch wrapper function ;-) –  Oct 30 '20 at 00:00
  • totally agree ;) – Taz Oct 30 '20 at 00:15
  • @Taz, in bash, you can always do shopt -s compat31; [[ k =~ '\bk' ]], or switch to zsh, which doesn't have that misfeature and has set -o rematchpcre, where \b is guaranteed to be available (unlike in default EREs) – Stéphane Chazelas Oct 07 '23 at 09:00
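
Both bash tricks from the comments can be sketched as follows, assuming bash 3.2 or later on a system whose regex library supports \b (glibc does). The bs variable is the commenter's; wrapping compat31 in a subshell is an addition here so the option does not stay enabled.

```bash
#!/usr/bin/env bash
# Store one backslash in a variable and splice it in unquoted, so the
# regex library receives a real backslash in front of "b".
bs='\'
if [[ 'two words' =~ ${bs}bwords ]]; then echo "<$BASH_REMATCH>"; fi

# compat31 restores the bash 3.1 behaviour in which quoting the right-hand
# side of =~ has no special effect; the subshell keeps the option local.
(
  shopt -s compat31
  if [[ 'two words' =~ '\bwords' ]]; then echo "compat31: <$BASH_REMATCH>"; fi
)
```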