Matching numbers with regex in case statement

Question

I want to check whether an argument to a shell script is a whole number (i.e., a non-negative integer: 0, 1, 2, 3, …, 17, …, 42, …, etc, but not 3.1416 or −5) expressed in decimal (so nothing like 0x11 or 0x2A). How can I write a case statement using regex as condition (to match numbers)? I tried a few different ways I came up with (e.g., [0-9]+ or ^[0-9][0-9]*$); none of them works. Like in the following example, valid numbers are falling through the numeric regex that's intended to catch them and are matching the * wildcard.

i=1
let arg_n=$#+1

while (( $i < $arg_n )); do
    case ${!i} in
    [0-9]+)
        n=${!i}
        ;;
    *)
        echo 'Invalid argument!'
        ;;
    esac
    let i=$i+1
done

Output:

$ ./cmd.sh 64
Invalid argument!

This variable indirection works just fine. I have more cases in the real script, and they work. I'm trying to match any occurrence of real numbers in the program arguments, so 0 or 999 should match. Otherwise, if there is some invalid argument like '-x', or letters instead of numbers, the program should match *; at least, that's what I thought. — siery, Mar 21 '18 at 19:35
@John1024, numbers aren't valid names for variables, but they're quite valid for the names of the positional parameters, and ${!i} works fine for those. e.g. set -- aa bb cc; i=2; echo ${!i} prints bb — ilkkachu, Mar 21 '18 at 22:45
that said, the easier way to loop over the arguments to the script would be to just use for val in "$@"; do ... and use $val in the loop — ilkkachu, Mar 21 '18 at 22:46
Read https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y — Gilles 'SO- stop being evil', Mar 21 '18 at 22:56
@John1024, and when it's run, i contains 1, so ${!i} is the same as $1: it expands to the value of the first argument, be it 64 or abc or whatever. What they have is just a convoluted way of looping over the positional parameters / command line arguments. — ilkkachu, Mar 21 '18 at 23:14
The syntax to loop over the positional parameters is for i do something with "$i"; done — Stéphane Chazelas, Mar 22 '18 at 07:17
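The comments above suggest simpler ways to walk the arguments than the i/let counter in the question. Here is a minimal bash sketch that compares them; the function name show_args is made up for illustration. Inside a function, $#, ${!i} and "$@" refer to the function's own arguments, so all three loops visit the same values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each argument three ways.
show_args() {
    # 1. Indirection, as in the question: ${!i} expands to the i-th positional parameter.
    local i
    for (( i = 1; i <= $#; i++ )); do
        printf 'indirect: %s\n' "${!i}"
    done

    # 2. Iterate over "$@" directly (quoted, so arguments containing spaces stay intact).
    local val
    for val in "$@"; do
        printf 'list: %s\n' "$val"
    done

    # 3. POSIX shorthand: "for name do" loops over the positional parameters.
    for val do
        printf 'short: %s\n' "$val"
    done
}

show_args aa 'b b' cc
```

The second and third forms avoid the counter entirely, which is why the comments recommend them.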

score 12 · Accepted Answer · answered Mar 21 '18 at 19:41

case does not use regexes, it uses patterns

For "1 or more digits", do this:

shopt -s extglob
...
    case ${!i} in
        +([[:digit:]]) )
            n=${!i}
            ;;
    ...
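A self-contained bash sketch of the same extglob pattern, with the loop filled in so it can be run as is (the function name check_args is illustrative). One caveat: shopt -s extglob has to take effect before bash parses the case statement, so it goes on its own line near the top of the script rather than inside the same function or compound command as the pattern.

```bash
#!/usr/bin/env bash
# Enable extended globs before any code using +(...) is parsed.
shopt -s extglob

# Hypothetical helper: reports each argument as a number or invalid.
check_args() {
    local arg
    for arg do
        case $arg in
            +([[:digit:]]) )   # one or more digits, nothing else
                echo "number: $arg"
                ;;
            * )                # empty, sign, dot, letters, 0x prefix...
                echo "invalid: $arg"
                ;;
        esac
    done
}

check_args 64 0 3.1416 -5 0x2A ''
```

Note that +(...) means "one or more", so the empty string is rejected without a separate test.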

If you want to use regular expressions, use the =~ operator within [[...]]:

if [[ ${!i} =~ ^[[:digit:]]+$ ]]; then
    n=${!i}
else
    echo "Invalid"
fi
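A fuller bash sketch of the =~ approach. Keeping the regex in a variable (here the made-up name re) and leaving $re unquoted on the right of =~ is a common bash idiom: quoted parts of the right-hand side are matched as literal text, not as a regex. The helper name is_whole_number is also hypothetical.

```bash
#!/usr/bin/env bash
# ERE: start of string, one or more digits, end of string.
re='^[[:digit:]]+$'

# Hypothetical helper: succeeds (status 0) if $1 is a whole number.
is_whole_number() {
    [[ $1 =~ $re ]]    # $re must stay unquoted to be treated as a regex
}

for arg in 42 007 -5 3.1416 abc ''; do
    if is_whole_number "$arg"; then
        echo "number: '$arg'"
    else
        echo "invalid: '$arg'"
    fi
done
```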

glenn jackman

Since I couldn't use regexes in a case statement, or similarly advanced pattern syntax for repetition (such as matching several digits in a number argument), I simply spelled out the cases explicitly, with lots of pipe characters, and put a backslash at the end of each line for formatting. – Pysis May 16 '20 at 19:42

G-Man Says 'Reinstate Monica' · Answer 2 · 2018-03-22T18:26:43.053

As glenn says, “case does not use regexes, it uses patterns”. As bash(1) says,

case word in [ [(] pattern [ | pattern ] ... ) list ;; ] ... esac
A case command first expands word, and tries to match it against each pattern in turn, using the same matching rules as for pathname expansion (see Pathname Expansion below).

Similarly, the POSIX specification says,

… each pattern … shall be compared against the expansion of word, according to the rules described in Pattern Matching Notation …

So the patterns are pathname expansion patterns, a.k.a. wildcards, a.k.a. globs, as in ls -l -- *.sh or rm -- *.bak.
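Read as a glob, the question's [0-9]+ means exactly one digit followed by a literal +, which is why 64 fell through to *. (Likewise, ^ and $ have no special meaning outside brackets in a glob, so ^[0-9][0-9]*$ could only match strings that literally start with ^ and end with $.) A small bash sketch, with the illustrative name glob_test, makes this visible:

```bash
#!/usr/bin/env bash
# Hypothetical helper: shows which strings match the question's glob [0-9]+
glob_test() {
    case $1 in
        [0-9]+) echo "match: $1" ;;     # one digit, then a literal '+'
        *)      echo "no match: $1" ;;
    esac
}

glob_test 64     # prints "no match: 64"
glob_test 6      # prints "no match: 6"
glob_test 6+     # prints "match: 6+"
```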

Sure, shopt -s extglob and [[ … =~ … ]] are the neatest thing since sliced bread, but they aren’t POSIX, and it can be useful to know how to use the original tools. For years, programmers checked, for example, whether a string was a number by checking whether it was not not a number. You’ve defined a number to be a string that consists (entirely) of one or more digits. So a string is not a number if it is null, or if it contains a character that is not a digit. We can test these conditions with a case statement as follows:

case "$1" in
    ("")
        # null
           ︙
        ;;
    (*[!0-9]*)
        # contains non-numeric character(s)
           ︙
        ;;
    (*)
        # is a whole number (non-negative integer)
           ︙
esac

where [!0-9] is the old-timey shell way of saying [^0-9], which, of course, means any character other than a digit. ([!…] and [^…] both work in bash. [!…] is required to work by POSIX; the result of [^…] is unspecified.) If you don’t care which kind of non-number a string is, you can combine the non-number patterns:

case "$1" in
    ("" | *[!0-9]*)
        # not a number
           ︙
        ;;
    (*)
        # is a number
           ︙
esac
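Here is the first of these case statements as a runnable sketch, with the ︙ placeholders replaced by echo lines so it can be tried directly. The name classify_int and the output words are illustrative, not part of the answer. It uses only POSIX pattern matching, so it should behave the same under sh, dash, bash or ksh.

```sh
#!/bin/sh
# Hypothetical helper: prints "null", "non-number" or "number" for $1.
classify_int() {
    case $1 in
        ("")
            echo null ;;          # empty string
        (*[!0-9]*)
            echo non-number ;;    # contains at least one non-digit
        (*)
            echo number ;;        # one or more digits, nothing else
    esac
}

for arg in 42 0 '' -5 3.1416 0x2A; do
    printf '%s -> %s\n' "'$arg'" "$(classify_int "$arg")"
done
```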

As an exercise, here’s a case statement to handle any kind of real number — to be precise, a string of one or more digits, with optionally a period (.) somewhere, and optionally a minus sign (-) at the beginning.

case "$1" in
    (*[!-.0-9]*)
        # contains non-numeric character(s)
        ;;
    (*?-*)
        # contains '-' somewhere other than the first position
        ;;
    (*.*.*)
        # contains multiple decimal points
        ;;
    (*)
        case "$1" in
            (*[0-9]*)
                # is a real number
                ;;
            (*)
                # not a number
        esac
esac

I added the case-within-a-case to verify that the string does, indeed, contain at least one digit. That wasn’t necessary in the integer example because I tested whether the string was null; a test which I have removed from this statement. Without the second case, a single - or a single . — or even -. — would qualify as a number. Of course we could add patterns to handle those exceptions, but that can get complex. (For example, I almost posted this answer without realizing that -. was one of the exceptions.) I believe that the above approach is more flexible and robust.

Of course the non-number patterns can be combined here, too: (*[!-.0-9]* | *?-* | *.*.*).
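A runnable sketch of the real-number version with the non-number patterns combined, exercising the edge cases discussed above (a lone -, a lone ., and -.). The function name classify_real and its output strings are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "real" if $1 is one or more digits with an
# optional single '.' and an optional leading '-', else "not a number".
classify_real() {
    case $1 in
        (*[!-.0-9]* | *?-* | *.*.*)
            echo "not a number" ;;    # bad char, misplaced '-', or two '.'
        (*)
            case $1 in
                (*[0-9]*) echo real ;;            # has at least one digit
                (*)       echo "not a number" ;;  # "", "-", ".", "-."
            esac ;;
    esac
}

for arg in 42 -5 3.1416 -.5 5. - . -. 1-2 1.2.3 ''; do
    printf '%-8s %s\n' "'$arg'" "$(classify_real "$arg")"
done
```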

Note that on many systems and locales [0-9] matches more than [0123456789]. Generally, you can't rely on ranges outside of the C locale. [[:digit:]] should be OK though. [0123456789] is the safest. — Stéphane Chazelas, May 15 '19 at 05:23
@StéphaneChazelas: But isn’t there a risk that [[:digit:]] will also match more than [0123456789], like Eastern Arabic / Hindi digits (٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩), Japanese (Kanji) digits (e.g., 零 / 〇, 一, 二, 三, etc.), N’Ko digits (߀, ߁, ߂, ߃, etc.), and others that I haven’t even heard of? — G-Man Says 'Reinstate Monica', May 15 '19 at 16:30
Fun fact: I composed the above comment in Microsoft Word, where I listed the Hindi and N’Ko digits in ascending (LTR) order, but when I pasted them into Internet Explorer, they switched into RTL order, even though I had LTR commas and spaces between them. Is text direction ignored for punctuation? — G-Man Says 'Reinstate Monica', May 15 '19 at 16:52
[[:digit:]] is more or less required to match on 0123456789 only, as it's meant to match what C isdigit() matches, and that's 0123456789 only. See http://austingroupbugs.net/view.php?id=1078. In practice, I've not come across standard utilities whose [[:digit:]] matches anything else, but I've seen many where [0-9] matches hundreds of characters (including some Eastern Arabic ones (0-8 generally)). — Stéphane Chazelas, May 15 '19 at 18:49
The more I look at this, the more my head hurts. The POSIX specification for isdigit() says “The isdigit() and isdigit_l() functions shall test whether *c* is a character of class digit in the current locale, …”. I just don’t grok the point in having locales if ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩ aren’t going to be treated as digits in an Eastern Arabic / Hindi locale. — G-Man Says 'Reinstate Monica', May 29 '19 at 03:48
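To compare the three bracket spellings from this comment thread on your own system, here is a small bash sketch (digit_class_report is a made-up name). For plain ASCII input all three agree. What it reports for a non-ASCII digit such as ٣ depends on the shell, the C library and the current locale, so no output is claimed for that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for one string, reports whether each bracket spelling
# ([0-9], [[:digit:]], [0123456789]) accepts it as "all digits".
# Empty strings are rejected up front, as in the answer's first case.
digit_class_report() {
    local range=no class=no explicit=no
    if [ -n "$1" ]; then
        case $1 in (*[!0-9]*) ;; (*) range=yes ;; esac
        case $1 in (*[![:digit:]]*) ;; (*) class=yes ;; esac
        case $1 in (*[!0123456789]*) ;; (*) explicit=yes ;; esac
    fi
    echo "range=$range class=$class explicit=$explicit"
}

digit_class_report 42        # plain ASCII digits: all three accept
digit_class_report 3.1416    # '.' is not a digit under any spelling
digit_class_report '٣'       # Eastern Arabic three: locale- and system-dependent
```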

Stéphane Chazelas · Answer 3 · 2022-09-07T15:08:01.650

To match numbers with regexp in case statements, you'd need a shell whose wildcards support regexps. I only know of ksh93 with those.

With ksh93 globs, you can do ~(E)^[0-9]+$ or ~(E:^[0-9]+$) to use an Extended regexp in a glob pattern, or ~(P)^\d+$ to use a perl-like regexp (also G for basic regexp, X for augmented regexp, V for SysV regexp).

So:

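#! /bin/ksh93 -
# usage: minimal placeholder helper so the example runs; adapt the message
usage() {
  echo >&2 "Usage: ${0##*/} whole-number..."
  exit 1
}

for i do
  case $i in
    (~(E)^[0-9]+$)
      n=$i;;
    (*)
      echo >&2 'Invalid argument!'
      usage
  esac
done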