
What is the regex pattern for checking for a single digit number within a range? I am trying the following pattern which seems to work when tested on https://regex101.com/.

pattern: \b([0-5])\b

Expected result:

input : 2 output : ok

input : 5 output : ok

input : 6 output : no

input : 22 output : no

test$ ch=2
test$ [[ $ch =~ \b([0-5])\b ]] && echo "ok" || echo "no"
no
test$ ch=6
test$ [[ $ch =~ \b([0-5])\b ]] && echo "ok" || echo "no"
no
test$ ch=62
test$ [[ $ch =~ \b([0-5])\b ]] && echo "ok" || echo "no"
no
test$ ch=0
test$ [[ $ch =~ \b([0-5])\b ]] && echo "ok" || echo "no"
no
test$

I have tried doubling the backslashes as well:

test$ ch=2
test$ [[ $ch =~ \\b[0-5]\\b ]] && echo "ok" || echo "no"
no
test$ [[ $ch =~ \\b([0-5])\\b ]] && echo "ok" || echo "no"
no

In my case, bash always prints 'no'. Why is it behaving like this?

preetam
  • 117

2 Answers


To verify that $ch is any one of the ASCII digits 0, 1, 2, 3, 4, or 5, use:

  • Portable (sh syntax):

    case $ch in 
      ([012345]) echo OK;;
      (*) echo not OK;;
    esac
    
  • Korn-style alternative:

    if [[ $ch = [012345] ]]; then
      echo OK
    else
      echo not OK
    fi
    

Do not use ranges such as [0-5] for input validation as (depending on system and locale) that tends to include many other characters that happen to sort between 0 and 5 beside 012345 such as ٠١٢٣٤۰۱۲۳۴߀߁߂߃߄०१२३४০১২৩৪੦੧੨੩੪૦૧૨૩૪୦୧୨୩୪௦௧௨௩௪౦౧౨౩౪౸౹౺౻౼౽౾೦೧೨೩೪൦൧൨൩൪෦෧෨෩෪๐๑๒๓๔໐໑໒໓໔༠༡༢༣༤༪༫༬༭༳၀၁၂၃၄႐႑႒႓႔፩፪፫፬០១២៣៤៰៱៲៳៴᠐᠑᠒᠓᠔᥆᥇᥈᥉᥊᧐᧑᧒᧓᧔᧚᪀᪁᪂᪃᪄᪐᪑᪒᪓᪔᭐᭑᭒᭓᭔᮰᮱᮲᮳᮴᱀᱁᱂᱃᱄᱐᱑᱒᱓᱔⁰⁴₀₁₂₃₄⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅛⅜⅟↉①②③④⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳⑴⑵⑶⑷⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇⒈⒉⒊⒋⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛⓪⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴⓵⓶⓷⓸⓾⓿❶❷❸❹❿➀➁➂➃➉➊➋➌➍➓〇〡〢〣〤㉈㉉㉊㉋㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㋀㋁㋂㋃㋉㋊㋋㍘㍙㍚㍛㍜㍢㍣㍤㍥㍦㍧㍨㍩㍪㍫㍬㍭㍮㍯㍰㏠㏡㏢㏣㏩㏪㏫㏬㏭㏮㏯㏰㏱㏲㏳㏴㏵㏶㏷㏸㏹㏺㏻㏼㏽㏾꘠꘡꘢꘣꘤꣐꣑꣒꣓꣔꤀꤁꤂꤃꤄꧐꧑꧒꧓꧔꧰꧱꧲꧳꧴꩐꩑꩒꩓꩔꯰꯱꯲꯳꯴01234

You could also use regex [[ $ch =~ ^[012345]$ ]] but that has little advantage over using case or [[...]]'s =.
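As a rough sketch, the anchored regex can be wrapped in a small function and run against the inputs from the question. The name check_digit is a hypothetical helper chosen for illustration, not something from the answer above.

```bash
#!/usr/bin/env bash
# check_digit is a hypothetical helper name used only for this sketch.
check_digit() {
  # The regex is left unquoted on the right-hand side of =~.
  # ^ and $ anchor the match to the whole string, so "22" is rejected.
  if [[ $1 =~ ^[012345]$ ]]; then
    echo ok
  else
    echo no
  fi
}

# Inputs taken from the question's expected results.
for ch in 2 5 6 22; do
  printf 'input: %s output: %s\n' "$ch" "$(check_digit "$ch")"
done
```

With the anchors in place, 2 and 5 report ok while 6 and 22 report no, which is the behaviour the question asks for.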

It could be useful to match on any integer decimal representation of a number in between 0 and 5 including -0, 0004, +5 which you could do with:

[[ $ch =~ ^(-0+|\+?0*[012345])$ ]]

Which is slightly shorter than the Korn-style:

[[ $ch = @(-+(0)|?(+)*(0)[012345]) ]]

And likely easier to read by people familiar with regexps.
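A minimal sketch of the regex variant, assuming bash and storing the pattern in a variable so that shell quoting does not interfere. The function name is_int_0_to_5 is hypothetical.

```bash
#!/usr/bin/env bash
# is_int_0_to_5 is a hypothetical helper name used only for this sketch.
is_int_0_to_5() {
  # Either "-" followed by one or more zeros, or an optional "+",
  # any number of leading zeros, and one final digit from 0 to 5.
  local re='^(-0+|\+?0*[012345])$'
  [[ $1 =~ $re ]]
}

for ch in -0 0004 +5 6 -1 22; do
  if is_int_0_to_5 "$ch"; then
    printf '%s: ok\n' "$ch"
  else
    printf '%s: no\n' "$ch"
  fi
done
```

Here -0, 0004 and +5 are accepted, while 6, -1 and 22 are rejected.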

Never use arithmetic operators of the [[...]] construct (as in [[ $ch -ge 0 && $ch -le 5 ]]) nor ((...)) (as in (( ch >= 0 && ch <= 5 ))) for input validation as those introduce arbitrary command execution vulnerabilities. [ "$ch" -ge 0 ] && [ "$ch" -le 5 ] doesn't have the problem in bash but would output errors upon incorrect numbers and would allow blanks around the numbers.
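One way to keep a numeric comparison while avoiding the problem is to validate the string with a pattern first, so that only plain ASCII digit strings ever reach the comparison. The sketch below assumes that approach; safe_in_range, its lo and hi parameters, and the 9-digit length cap are all illustrative choices, not part of the answer above.

```bash
#!/usr/bin/env bash
# safe_in_range is a hypothetical helper: safe_in_range VALUE LO HI
safe_in_range() {
  local n=$1 lo=$2 hi=$3
  case $n in
    # Reject the empty string and anything containing a non-digit,
    # so text such as 'a[$(cmd)]' never reaches a numeric test.
    ('' | *[!0123456789]*) return 1;;
  esac
  # Assumed cap on length to stay well inside the integer range.
  [ "${#n}" -le 9 ] || return 1
  # The [ builtin compares decimal integers without evaluating
  # its operands as arithmetic expressions.
  [ "$n" -ge "$lo" ] && [ "$n" -le "$hi" ]
}

for ch in 3 6 ' 3' 'a[$(echo injected)]'; do
  if safe_in_range "$ch" 0 5; then
    printf '%s: ok\n' "$ch"
  else
    printf '%s: no\n' "$ch"
  fi
done
```

The last test value is rejected by the case pattern, so the embedded command substitution is never evaluated.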

\b([0-5])\b is a perl regexp (the default at https://regex101.com), that matches on any one of the 012345 characters preceded and followed by a word boundary, that is provided it's neither preceded nor followed by a word character, word characters being alphanumeric ones and underscores. So for instance it would match in 123.5 because there's a 5 in there that is preceded by . which is not a word character and not followed by anything.

bash's =~ uses POSIX extended regular expressions, not perl regexps and the behaviour for \b in POSIX ERE is unspecified.

As https://regex101.com doesn't currently offer POSIX ERE as a choice of regex flavour, you shouldn't use it to validate regexps used in bash's [[ =~ ]] operator.

There are systems in which the extended regular expression matcher used by bash supports \b as an extension over the standard, but in [[ $ch =~ \b[0-5]\b ]], bash treats \b as a quoted b, the same as if you had written [[ $ch =~ 'b'[0-5]'b' ]] and doesn't pass the backslash to the regex engine.
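The literal-b treatment can be observed directly. In this small sketch (bash 3.2 or later assumed; matches_inline is a hypothetical name), the inline pattern matches the string b2b but not a lone 2:

```bash
#!/usr/bin/env bash
# matches_inline is a hypothetical helper name used only for this sketch.
matches_inline() {
  # With the regex written inline, bash treats \b as a quoted "b",
  # so this is effectively the regex b[012345]b.
  [[ $1 =~ \b[012345]\b ]]
}

for ch in 2 b2b; do
  if matches_inline "$ch"; then
    printf '%s: match\n' "$ch"
  else
    printf '%s: no match\n' "$ch"
  fi
done
```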

You can work around that by using:

regex='\b[012345]\b' # with the [0-5] also fixed to [012345]
[[ $ch =~ $regex ]]

Where the backslash will be passed to the regex matcher¹, but that will only work on systems that support that \b extension.

Doing it with standard ERE syntax would look like:

[[ $ch =~ (^|[^[:alnum:]_])[012345]([^[:alnum:]_]|$) ]]
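As a sketch of how that standard-ERE pattern behaves, the function below stores it in a variable and tries a few inputs, including the 123.5 example mentioned earlier. The name has_isolated_digit is hypothetical.

```bash
#!/usr/bin/env bash
# has_isolated_digit is a hypothetical helper name used only for this sketch.
has_isolated_digit() {
  # A digit 0-5 that is at the start or after a non-word character,
  # and at the end or before a non-word character.
  local re='(^|[^[:alnum:]_])[012345]([^[:alnum:]_]|$)'
  [[ $1 =~ $re ]]
}

for ch in 2 22 123.5 a_5 'x 3 y'; do
  if has_isolated_digit "$ch"; then
    printf '%s: match\n' "$ch"
  else
    printf '%s: no match\n' "$ch"
  fi
done
```

Note that, like the \b version, this finds an isolated digit anywhere in the string, so 123.5 and "x 3 y" match while 22 and a_5 do not.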

To use perl-style regexps, you could switch to zsh, which has a rematchpcre option to use PCRE (PCRE1 for now, PCRE2 in the next version) in its own =~ operator.

set -o rematchpcre
[[ $ch =~ '\b[0-5]\b' ]]

Would work there (and zsh doesn't have that misfeature of bash whereby shell quoting is treated as regexp escaping which also allows it to use other regexp engines).

zsh also has a glob operator to match ranges of decimal integer numbers so there [[ $ch = <0-5> ]] would match on 000, 01, 3... And [[ $ch = (-<0-0>|(+|)<0-5>) ]] would do the same as [[ $ch =~ '^(-0+|\+?0*[012345])$' ]] (note the quotes around the regex as a difference with bash 3.2+).


¹ See bash regexp matching fails in [[ ]] and How does storing the regular expression in a shell variable avoid problems with quoting characters that are special to the shell? for details and there for the history of builtin regex matching in Korn-like shells.


Since your expression is subject to bash expansions the \b is being expanded to simply b before the regex is being checked (try ch=b4b). Also it seems unnecessary to use a capture group.

You can put your expression in a variable first like so:

exp='\b[0-5]\b'
[[ $ch =~ $exp ]] 

However for something like this I would much prefer using arithmetic operators:

[[ $ch -gt 0 && $ch -lt 6 ]]

or

((ch>0&&ch<6))
jesse_b
  • 37,005
  • But I also want to check for if input is number as well as range in one regex. – preetam Oct 07 '23 at 01:05
  • @preetam if ch is not a number it will fail those checks just like it would the regex test – jesse_b Oct 07 '23 at 01:06
  • try this: ch=44;nums=6;[[ $ch -gt 0 && $ch < $nums ]] && echo "ok" || echo "no" Doesn't work. – preetam Oct 07 '23 at 01:53
  • Depending on the system and locale, regexp [0-5] may match dozens of different characters that happen to sort in-between 0 and 5. Such as ², ³ or decimal digits in other scripts. Use [012345] if you only want to match on the arabic digits – Stéphane Chazelas Oct 07 '23 at 06:53
  • Never use [[...]]'s arithmetic operators nor ((...)) for input validation as that introduces command injection vulnerabilities. For instance with ch='a[$(reboot)]', that would reboot. – Stéphane Chazelas Oct 07 '23 at 06:57