Testing if a string is a number
You don't need regular expressions for that. Use a case statement to match the string against wildcard patterns: they're less powerful than regex, but sufficient here. See Why does my regular expression work in X but not in Y? if you need a summary of how wildcard patterns (globs) differ from regex syntax. This works in any sh implementation (even pre-POSIX Bourne).
case $var in
'' | *[!0123456789]*) echo >&2 "This is not a non-negative integer."; exit 2;;
[!0]*) echo >&2 "This is a positive integer. I like it.";;
0*[!0]*) echo >&2 "This is a positive integer written with a leading zero. I don't like it."; exit 2;;
*) echo >&2 "This number is zero. I don't like it."; exit 2;;
esac
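To use the same test in an if or a loop instead of exiting the script, the patterns can be wrapped in a function that returns a status. This is a minimal sketch: the name is_positive_integer is hypothetical. Like the case statement above, it only uses wildcard patterns, so it should run in any sh, and it rejects zero and numbers written with a leading zero.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) only for a positive integer
# written without leading zeros. Same patterns as the case statement above.
is_positive_integer() {
  case $1 in
    '' | *[!0123456789]*) return 1;;  # empty, or contains a non-digit
    [!0]*) return 0;;                 # all digits, first one nonzero
    *) return 1;;                     # zero, or a leading zero
  esac
}

for var in 42 007 0 -3 ''; do
  if is_positive_integer "$var"; then
    echo "'$var': accepted"
  else
    echo "'$var': rejected"
  fi
done
```

Only 42 is accepted here. Returning a status rather than calling exit lets the caller decide what to do with a bad value.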
Shell portability
Any Unix system has an implementation of sh. Any non-antique Unix or POSIX system has an sh implementation that (mostly) follows the POSIX specification. It's usually in /bin/sh, but there are a few commercial unices where /bin/sh is an antique Bourne shell and the modern POSIX sh is in /usr/posix/bin/sh or some such.
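Use #!/usr/bin/env sh as a shebang line for practical portability if #!/bin/sh doesn't cut it for you. This runs the first sh found in $PATH, so it helps on systems where the directory containing the POSIX sh comes before /bin in $PATH.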
[[ … ]] is not available in POSIX sh. It's available in ksh93, mksh, bash and zsh, but not in dash (a popular /bin/sh on Linux) or BusyBox (a popular /bin/sh on embedded Linux). Portable sh doesn't have regex matching built in, only wildcard matching. You can use grep, awk or sed to get regex matching on a POSIX system.
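For example, the regex ^[1-9][0-9]*$ can be checked from plain sh by handing the string to grep. This is a sketch under a few assumptions: the function name is_positive_integer_re is hypothetical; -E selects extended regex syntax, -q suppresses output and -x requires the whole line to match (so ^ and $ are not needed). Because grep works line by line, a value containing a newline is rejected first; otherwise a value like "12", newline, "abc" would pass.

```shell
#!/bin/sh
# Hypothetical helper: regex check for a positive integer using only POSIX tools.
# grep -E: extended regex syntax; -q: no output, only a status; -x: whole-line match.
is_positive_integer_re() {
  case $1 in
    *'
'*) return 1;;  # grep matches line by line, so refuse multi-line input
  esac
  printf '%s\n' "$1" | grep -Eqx '[1-9][0-9]*'
}

for var in 100 0100 12a; do
  if is_positive_integer_re "$var"; then echo "$var: match"; else echo "$var: no match"; fi
done
```

printf '%s\n' is used rather than echo because some echo implementations treat a leading -n or backslash sequences in the value specially.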
Quoting the regex for =~
Ksh93, bash and zsh have a regex matching operator =~ in [[ … ]] conditional expressions. They have slightly different quoting rules.
In bash ≥3.2, regex characters only have their special effect on the right of the =~ operator if they're unquoted. So [[ 100 =~ ^[1-9][0-9]*$ ]] is true but [[ 100 =~ '^[1-9][0-9]*$' ]] is false ([[ $x =~ '^[1-9][0-9]*$' ]] only matches strings that contain the literal text ^[1-9][0-9]*$ as a substring).
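The difference can be seen in a short bash script. This assumes a bash recent enough to have the quoting behaviour described above (3.2 or later); it will not run under plain sh, since [[ … ]] is missing there.

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2; not runnable with plain sh.

# Unquoted right-hand side: ^, [, ], * and $ act as regex operators.
if [[ 100 =~ ^[1-9][0-9]*$ ]]; then echo "unquoted: match"; else echo "unquoted: no match"; fi

# Quoted right-hand side: matched as a literal string, anywhere in the left side.
if [[ 100 =~ '^[1-9][0-9]*$' ]]; then echo "quoted: match"; else echo "quoted: no match"; fi
if [[ 'abc^[1-9][0-9]*$def' =~ '^[1-9][0-9]*$' ]]; then echo "literal substring: match"; fi
```

Run under bash, this prints "unquoted: match", "quoted: no match" and "literal substring: match".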
In ksh 93u, the effect of quoting a character in a regex depends on the character: characters that are also wildcard characters must not be quoted, but characters that aren't can be in single or double quotes (but not preceded by a backslash). So [[ 100 =~ ^[1-9][0-9]*$ ]] is true, and so is [[ 100 =~ '^'[1-9][0-9]*'$' ]], but [[ 100 =~ '^[1-9][0-9]*$' ]] is false (it matches anything with the substring [1-9][0-9]*) and [[ 100 =~ ^[1-9][0-9]*\$ ]] is also false (it matches any string starting with a nonzero digit, then zero or more digits, then a literal $).
In zsh, a regex character keeps its special meaning whether it is quoted or not. This means that to include a character literally, you need two levels of quoting, e.g. \\* or '\*' to match an asterisk. So both [[ 100 =~ ^[1-9][0-9]*$ ]] and [[ 100 =~ '^[1-9][0-9]*$' ]] are true.
I think putting the regex in a variable is the most reliable way to avoid depending on the shell's idiosyncrasies.
regex='…'  # Use extended regular expression syntax here, with '\'' if you need a literal apostrophe
if [[ $string =~ $regex ]]; then …; fi  # leave $regex unquoted: in bash, a quoted right-hand side is matched literally
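A concrete instance of that pattern, written for bash. The regex is the positive-integer test from the quoting examples above; the second variable shows the '\'' idiom for a literal apostrophe. The variable names and test strings are illustrative.

```shell
#!/usr/bin/env bash
# Assumes bash. The regex lives in a variable and is expanded UNQUOTED on the
# right of =~, so its characters keep their regex meaning.
regex='^[1-9][0-9]*$'
for string in 100 0100 abc ''; do
  if [[ $string =~ $regex ]]; then
    echo "'$string' matches"
  else
    echo "'$string' does not match"
  fi
done

# Inside single quotes, '\'' ends the quote, adds an escaped apostrophe, and reopens it.
apostrophe_regex='^it'\''s$'
[[ "it's" =~ $apostrophe_regex ]] && echo "apostrophe: match"
```

Only 100 matches the first regex, and "it's" matches the second.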
Ranges in regexp/wildcard bracket expressions
What ranges like [0-9] match depends on the implementation and locale. In general you can't expect [0-9] to match 0123456789 only (though you should be able to assume it matches at least those). If it's important to match 0123456789 only, avoid ranges and name the characters individually.
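For example, a portable regex check for a positive integer can name every digit instead of writing [1-9][0-9]*. The sketch below uses awk, one of the tools mentioned above for regex matching in plain sh. Assumptions: the function name is_positive_integer_awk and the environment variable name candidate are hypothetical; the value is passed through ENVIRON rather than as an awk argument so that awk does not interpret it as an option or a variable assignment.

```shell
#!/bin/sh
# Hypothetical helper: positive integer without leading zeros, with each digit
# named explicitly so the result does not depend on how the locale treats ranges.
# In awk, ^ and $ anchor the start and end of the whole string, not of each line.
is_positive_integer_awk() {
  candidate=$1 awk 'BEGIN { if (ENVIRON["candidate"] ~ /^[123456789][0123456789]*$/) exit 0; exit 1 }'
}

for var in 100 0100 12a; do
  if is_positive_integer_awk "$var"; then echo "$var: match"; else echo "$var: no match"; fi
done
```

Since the whole program runs in a BEGIN block that always exits, awk never reads standard input.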
… =~ or [[ … ]]. Plain sh doesn't. If you want portability to “reasonable shells”, target plain sh. If you want to use ksh+ features, target a specific one among ksh, bash or zsh. It's impractical to deploy scripts written in the intersection of ksh, bash and zsh because they have to be invoked differently on each platform. – Gilles 'SO- stop being evil' Jun 05 '20 at 15:00
… [[. I started to test on the machine I was using and stopped there. What would be a good way to be more portable for that test? I'm at least interested in BSD's /bin/sh and the Debian/Ubuntu one (which is dash if I'm not again mistaken). – AProgrammer Jun 05 '20 at 15:11
… grep. – Jim L. Jun 05 '20 at 15:25
… echo "$var" | grep -E '^[1-9][0-9]*\$' > /dev/null but I'd not be surprised if there is a better way, I'm not that knowledgeable about shell scripting. – AProgrammer Jun 05 '20 at 15:26
… if in a shell script. – AProgrammer Jun 05 '20 at 15:27
… echo "$var" | grep -qx '[0-9][0-9]*' – Jim L. Jun 05 '20 at 15:28