-1

The idea is to do a rudimentary check that the input looks like a URL:

$ ns='abc.def.com'
$ reg_expr="\N*\.(\D{2}|\D{3})$"
$ echo $reg_expr
\N*\.(\D{2}|\D{3})$
$ [[ $ns =~ "$reg_expr" ]] && echo "ok" || echo "no"
no

However, the match always fails, even though online regex checkers accept the same pattern.

https://regex101.com/r/vXxv1w/1

Why does this happen?

preetam
  • 117

2 Answers

1

That

\N*\.(\D{2}|\D{3})$

is a perl regexp. bash's [[ =~ ]] operator takes a POSIX extended regexp, not a perl regexp.

To use a perl-style regexp, use zsh and its rematchpcre option:

set -o rematchpcre
[[ $ns =~ '\N*\.(\D{2}|\D{3})$' ]]

Now, that regexp doesn't make much sense.

  • \N is meant to match any character other than newline, but unless you set the s flag (for example with (?s)), . won't match a newline anyway, so you could replace \N with a plain . here.
  • Having <anything>* at the start or end of a regexp is pointless as it matches on 0 or more of <anything>, so it matches on nothing as well. [[ $ns =~ '\.(\D{2}|\D{3})$' ]] is functionally equivalent.
  • \D matches on any character other than decimal digits which are either 0123456789 or something like 0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯෦෧෨෩෪෫෬෭෮෯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꧰꧱꧲꧳꧴꧵꧶꧷꧸꧹꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789 depending on whether the matching is done based on Unicode character properties or not (not by default in zsh unless you add (*UCP)). So that also matches on . characters which makes your check fall apart.
  • \D{2}|\D{3} can be written \D{2,3}
  • $ in perl regexps matches on either the end of the subject or before a newline character at the end of the subject. To match at the end of the subject, you use \z instead.
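
The point about \D also matching . can be checked in bash itself by using ERE stand-ins for the perl classes. This is only a sketch: [^[:digit:]] is assumed to be a close enough equivalent of \D, and the check helper and the sample string x.a.b are made up for illustration.

```bash
#!/usr/bin/env bash
# loose:  "any non-digit", the ERE counterpart of \D (also matches a dot)
# strict: "any character that is neither a digit nor a dot"
loose='\.[^[:digit:]]{2,3}$'
strict='\.[^[:digit:].]{2,3}$'

# check REGEX STRING: prints "ok" if STRING matches REGEX, "no" otherwise.
# (hypothetical helper, not part of the original question)
check() {
  local re=$1
  if [[ $2 =~ $re ]]; then echo ok; else echo no; fi
}

# "x.a.b" has no 2-3 letter suffix, yet the loose pattern accepts it,
# because ".a.b" is a dot followed by three non-digits ("a", ".", "b").
loose_result=$(check "$loose" 'x.a.b')
strict_result=$(check "$strict" 'x.a.b')
echo "loose: $loose_result, strict: $strict_result"   # loose: ok, strict: no
```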

So assuming you want to match on strings that end with . followed by 2 to 3 characters other than digits and ., with PCRE, you'd need:

# zsh
set -o rematchpcre
[[ $ns =~ '\.[^\d.]{2,3}\z' ]]

Or with ERE:

# zsh/bash -O compat31
[[ $ns =~ '\.[^[:digit:].]{2,3}$' ]]

Or:

# zsh/bash/ksh93
regex='\.[^[:digit:].]{2,3}$'
[[ $ns =~ $regex ]]
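
For reuse in a script, the variable form can be wrapped in a small function. The following is a minimal sketch for bash; the function name has_short_suffix and the sample host names are assumptions made for the example.

```bash
#!/usr/bin/env bash
# has_short_suffix STRING
# Succeeds (exit status 0) if STRING ends with a dot followed by
# 2 or 3 characters that are neither digits nor dots.
# (hypothetical helper name, chosen for this example)
has_short_suffix() {
  local regex='\.[^[:digit:].]{2,3}$'
  # The variable is deliberately left unquoted so bash treats it as a regex.
  [[ $1 =~ $regex ]]
}

for ns in abc.def.com abc.def.c0m abc.def.info; do
  if has_short_suffix "$ns"; then
    echo "$ns: ok"
  else
    echo "$ns: no"
  fi
done
# abc.def.com: ok
# abc.def.c0m: no   (suffix contains a digit)
# abc.def.info: no  (suffix is 4 characters long)
```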

Bear in mind that what [[:digit:]] matches varies with the locale and system. Use [^0123456789.] to match on any character other than . and those specific decimal digit characters.

\D, [^\d.] and [^[:digit:].] all also match on newline characters. If you wanted to make sure strings that contain a newline character don't match, you'd need regex='^.*\.[^\d\n.]{2,3}\z' (or regex='^\N*\.[^\d\n.]{2,3}\z' to make it more explicit) in perl-style RE and regex=$'^[^\n]*\.[^[:digit:].\n]{2,3}$' in ERE.
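
The difference the newline-aware ERE makes can be seen with a two-line value. This is a sketch for bash; the variable names and the sample string are invented for the example, and the dot is written as \\. inside $'...' so that the pattern reliably receives a literal backslash.

```bash
#!/usr/bin/env bash
simple='\.[^[:digit:].]{2,3}$'
# Anchored at the start, and newline excluded everywhere in the pattern.
strict=$'^[^\n]*\\.[^[:digit:].\n]{2,3}$'

multi=$'abc\ndef.com'   # contains an embedded newline
single='abc.def.com'

# The simple pattern only looks at the end of the string, so it accepts
# the two-line value; the strict one rejects it.
if [[ $multi =~ $simple ]]; then simple_multi=ok; else simple_multi=no; fi
if [[ $multi =~ $strict ]]; then strict_multi=ok; else strict_multi=no; fi
if [[ $single =~ $strict ]]; then strict_single=ok; else strict_single=no; fi

echo "simple/multi=$simple_multi strict/multi=$strict_multi strict/single=$strict_single"
# simple/multi=ok strict/multi=no strict/single=ok
```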

0

In the man page for bash, there are several paragraphs describing the operators that are valid within the [[ expression ]] test. The one covering regular expression matching says:

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly ...

(skipping a couple of sentences)

If any part of the pattern is quoted, the quoted portion is matched literally. This means every character in the quoted portion matches itself, instead of having any special pattern matching meaning. If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched literally.

To summarize, inside [[ expression ]], don't put quotes around the regular expression, and don't put quotes around a variable that contains the regular expression. Even though you've specified =~ as the comparison operator, quoting the regular expression makes every quoted character match itself, so the comparison becomes a search for that literal text rather than a regex match.
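
The effect of quoting can be seen with a short experiment. This sketch assumes bash 3.2 or later with default settings (no compat31); the pattern and the test strings are made up for the example.

```bash
#!/usr/bin/env bash
regex='^a.c$'

# Unquoted expansion: treated as an ERE, so "." matches the "b".
if [[ abc =~ $regex ]]; then unquoted=ok; else unquoted=no; fi

# Quoted expansion: every character is literal, so "abc" does not
# contain the text ^a.c$ and the match fails.
if [[ abc =~ "$regex" ]]; then quoted=ok; else quoted=no; fi

# The quoted form does match a string that contains the literal text.
if [[ 'x^a.c$y' =~ "$regex" ]]; then literal=ok; else literal=no; fi

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
# unquoted=ok quoted=no literal=ok
```

This is the same thing that happens on the [[ $ns =~ "$reg_expr" ]] line in the question: the quotes turn the whole pattern into literal text.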

It would also be good to follow the references through the man pages to see what regex syntax is supported. You may need to use [[:digit:]] and [^[:digit:]] instead of \d and \D. (I could be mistaken so check it for yourself)

Sotto Voce
  • 4,131