
AFAIK the \b metasequence indicating "word boundaries" is not supported in bash:

if [[ $foo =~ .*\bWORD\b.* ]]; then

Are there reasons why this is not supported?

Imagine I write a patch/pull-request for bash, are there reasons why implementing \b is not possible (apart from reasons like "we don't like this feature")?

AdminBee
guettli
  • Are you sure the issue is not just that \bWORD\b is subject to quote removal in the [[...]] context? Have you tried setting re='\bWORD\b' then [[ $foo =~ $re ]] for example? – steeldriver Mar 18 '24 at 12:25
  • The actual mandatory syntax of regular expressions is small so effectively everything other than ., *, (), and concatenation is an extension that differs between implementations. – davolfman Mar 18 '24 at 18:12

2 Answers


As per the bash man page, the =~ operator seems to be using the POSIX regular expression functions:

When you use =~, the string to the right of the operator is considered a POSIX extended regular expression pattern and matched accordingly (using the POSIX regcomp and regexec interfaces usually described in regex(3)).

The \b metasequence comes from Perl regular expression syntax and is not part of POSIX, so Bash can only support it where the system's regex library happens to accept it as an extension. Guaranteeing it in Bash itself would likely mean changing the library, which usually has major side effects.

It is improbable that this will be done purely to accommodate this syntax element.
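
If you need something that stays within POSIX ERE, a word boundary can be approximated with anchors and bracket expressions. This is only a sketch: has_word is a made-up helper name, "word character" is assumed to mean alphanumeric or underscore, and the word is inserted into the regex unescaped, so it must not contain regex metacharacters.

```shell
#!/usr/bin/env bash
# has_word is a hypothetical helper, not something Bash provides.
# It emulates \bWORD\b using only POSIX ERE constructs.
has_word() {
    local text=$1 word=$2
    # Assumption: a word character is alphanumeric or underscore.
    # The word must be preceded by start-of-string or a non-word character,
    # and followed by end-of-string or a non-word character.
    local re="(^|[^[:alnum:]_])${word}(\$|[^[:alnum:]_])"
    [[ $text =~ $re ]]
}

has_word "a WORD here" WORD && echo "match"
has_word "WORDLESS" WORD || echo "no match"
```

Unlike a true zero-width \b, this pattern consumes the neighbouring character, so BASH_REMATCH[0] may include a leading or trailing non-word character.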

AdminBee

First check if Bash even has an RE implementation of its own, or if it just uses one from the system libraries.

But yeah, \b is from Perl regexes, and in general, they also contain piles of other extensions that aren't available in standard regexes. Though GNU systems seem to support \s for whitespace and \w for word chars, but not \d for digits. I don't know why they've decided to pick those odd ones, but in general, having all Perl RE features would likely make the RE engine much more complicated, and even though Perl fans might love it, many of the standard tools' authors might not want that. And then if you start adding this and that, but not all, deciding where to draw the line becomes an issue.

In any case, word borders are non-standard to begin with. On a few systems, \< and \> should work for the left and right borders, while on FreeBSD and Mac, you need [[:<:]] and [[:>:]].
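
Since the syntax differs per platform, one option is to probe at runtime which pair the local library accepts. A rough sketch, where word_re is a hypothetical helper name and the two candidate pairs are just the ones mentioned above:

```shell
#!/usr/bin/env bash
# word_re is a hypothetical helper: it prints a regex for the given word,
# wrapped in whichever word-boundary syntax the local regex library accepts.
word_re() {
    local word=$1 pair open close probe
    # Candidates: \< \> first, then the BSD-style [[:<:]] [[:>:]].
    for pair in '\< \>' '[[:<:]] [[:>:]]'; do
        open=${pair% *}
        close=${pair#* }
        probe="${open}x${close}"
        # Accept a pair only if it matches a lone word and rejects an
        # embedded one. An unsupported syntax makes [[ =~ ]] fail here.
        if [[ x =~ $probe ]] && ! [[ xy =~ $probe ]]; then
            printf '%s\n' "${open}${word}${close}"
            return 0
        fi
    done
    return 1   # neither syntax is supported on this system
}

if re=$(word_re WORD); then
    [[ "a WORD here" =~ $re ]] && echo "match"
fi
```

The probe tests behaviour rather than checking the OS name, so it should not need updating if a platform gains support for the other syntax.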

As it happens, as @steeldriver comments, \b also seems to work on GNU, at least as I tested. It's just that in Bash you need to store the RE in a variable first, to keep the special characters from being mangled by the shell's parsing:

$ re='\bWORD\b'; if [[ WORD =~ $re ]]; then echo y; else echo n; fi
y
$ re='\<WORD\>'; if [[ WORD =~ $re ]]; then echo y; else echo n; fi
y
$ re='\bWORD\b'; if [[ WORDLESS =~ $re ]]; then echo y; else echo n; fi
n
$ re='\<WORD\>'; if [[ WORDLESS =~ $re ]]; then echo y; else echo n; fi
n
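
To see what goes wrong without the variable: as far as I can tell, an unquoted \b inside [[ ... ]] is treated by the shell as a quoted, literal b, so the regex library never sees a backslash at all. A small sketch (inline_match is just a hypothetical wrapper name for the demo):

```shell
#!/usr/bin/env bash
# inline_match is a hypothetical wrapper around the inline form of the regex.
# The shell handles the backslashes first, so the pattern that reaches the
# regex library is effectively "bWORDb", not \bWORD\b.
inline_match() {
    [[ $1 =~ \bWORD\b ]]
}

if inline_match bWORDb; then echo "inline: matched bWORDb"; fi
if inline_match WORD; then echo "inline: matched WORD"; else echo "inline: no match for WORD"; fi
```

That is why the variable form behaves differently: the single-quoted assignment keeps the backslashes, and the unquoted $re is then handed to the regex library as it is.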
ilkkachu