
That should be easy: just use [[ "$var" =~ '^[1-9][0-9]*$' ]]. But I don't get the behavior I expect, except with zsh. I don't control the machine where the script will run, so portability across reasonable shells (the legacy Solaris Bourne shell is not reasonable) is an issue. Here are some tests:

% zsh --version
zsh 4.3.10 (x86_64-redhat-linux-gnu)
% zsh -c "[[ 100 =~ '^[1-9][0-9]*\$' ]] && echo OK"
OK
% sh --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
% sh -c "[[ 100 =~ '^[1-9][0-9]*\$' ]] && echo OK" 
% bash --version
GNU bash, version 4.2.53(1)-release (x86_64-unknown-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
% bash -c "[[ 100 =~ '^[1-9][0-9]*\$' ]] && echo OK"
% ksh --version
  version         sh (AT&T Research) 93u+ 2012-08-01
% ksh -c "[[ 100 =~ '^[1-9][0-9]*\$' ]] && echo OK"
% 

I seem to be missing something. What?

Jeff Schaller
AProgrammer
  • 3
    For the bash case, possibly this is relevant? – steeldriver Jun 05 '20 at 14:54
  • 3
    Note that only ATT ksh, bash and zsh have =~ or [[ … ]]. Plain sh doesn't. If you want portability to “reasonable shells”, target plain sh. If you want to use ksh+ features, target a specific one among ksh, bash or zsh. It's impractical to deploy scripts written in the intersection of ksh, bash and zsh because they have to be invoked differently on each platform. – Gilles 'SO- stop being evil' Jun 05 '20 at 15:00
  • There may be a better one on the site, but for matching numbers: https://unix.stackexchange.com/a/369268/117549 – Jeff Schaller Jun 05 '20 at 15:10
  • @Gilles'SO-stopbeingevil', it seems I was mistaken about the portability of [[. I started testing on the machine I was using and stopped there. What would be a good way to make that test more portable? I'm at least interested in BSD's /bin/sh and the Debian/Ubuntu one (which is dash, if I'm not mistaken again). – AProgrammer Jun 05 '20 at 15:11
  • Does it have to be shell syntax? This would be a simple task for grep. – Jim L. Jun 05 '20 at 15:25
  • @ilkkachu testing if a variable is a number. With Gilles remark, I'm at echo "$var" | grep -E '^[1-9][0-9]*\$' > /dev/null but I'd not be surprised if there is a better way, I'm not that knowledgeable about shell scripting. – AProgrammer Jun 05 '20 at 15:26
  • @JimL. it is to use as the condition of an if in a shell script. – AProgrammer Jun 05 '20 at 15:27
  • echo "$var" | grep -qx '[0-9][0-9]*' – Jim L. Jun 05 '20 at 15:28

1 Answer


Testing if a string is a number

You don't need regular expressions for that. Use a case statement to match the string against wildcard patterns: they're less powerful than regex, but sufficient here. See Why does my regular expression work in X but not in Y? if you need a summary of how wildcard patterns (globs) differ from regex syntax. This works in any sh implementation (even pre-POSIX Bourne).

case $var in
  '' | *[!0123456789]*) echo >&2 "This is not a non-negative integer."; exit 2;;
  [!0]*) echo >&2 "This is a positive integer. I like it.";;
  0*[!0]*) echo >&2 "This is a positive integer written with a leading zero. I don't like it."; exit 2;;
  *) echo >&2 "This number is zero. I don't like it."; exit 2;;
esac
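
Since the goal is a condition for an if, the same wildcard patterns can be wrapped in a function that only sets an exit status. This is a minimal sketch for plain sh; the name is_positive_integer and the sample value of var are made up for illustration.

```shell
# Succeed (exit status 0) if $1 is a positive decimal integer written
# without a leading zero; fail (exit status 1) otherwise.
# The function name is an assumption of this example, not a standard utility.
is_positive_integer() {
  case $1 in
    '' | *[!0123456789]*) return 1 ;;  # empty, or contains a non-digit
    0*) return 1 ;;                    # zero, or written with a leading zero
    *) return 0 ;;                     # digits only, first digit is 1-9
  esac
}

var=100  # sample value for the demonstration
if is_positive_integer "$var"; then
  echo "positive integer"
fi
```

The function prints nothing itself, so the caller decides how to report a bad value.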

Shell portability

Any Unix system has an implementation of sh. Any non-antique Unix or POSIX system has an sh implementation that (mostly) follows the POSIX specification. It's usually in /bin/sh, but there are a few commercial unices where /bin/sh is an antique Bourne shell and the modern POSIX sh is in /usr/posix/bin/sh or some such.

Use #!/usr/bin/env sh as a shebang line for practical portability if #!/bin/sh doesn't cut it for you.

[[ … ]] is not available in POSIX sh. It's available in ksh93, mksh, bash and zsh, but not in dash (a popular /bin/sh on Linux) or BusyBox (a popular /bin/sh on embedded Linux). Portable sh doesn't have regex matching built in, only wildcard matching. You can use grep, awk or sed to get regex matching on a POSIX system.
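
If you do want a regex in plain sh, something along these lines should work with a POSIX grep. It is only a sketch: the helper name matches_ere is made up, and rejecting values that contain a newline is my own choice, made because grep tests each line separately.

```shell
# matches_ere STRING REGEX
# Succeed if the whole STRING matches the extended regular expression REGEX.
# The helper name is an assumption of this example.
matches_ere() {
  case $1 in
    *'
'*) return 1 ;;  # grep works line by line, so refuse multi-line values
  esac
  # -E: extended regex, -q: no output, -x: the whole line must match,
  # -e: take the next argument as the pattern even if it starts with "-"
  printf '%s\n' "$1" | grep -Eqx -e "$2"
}

var=100  # sample value for the demonstration
if matches_ere "$var" '[1-9][0-9]*'; then
  echo OK
fi
```

Because of -x, the regex needs no ^ and $ anchors. This starts an extra process for each test, which the case approach avoids.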

Quoting the regex for =~

Ksh93, bash and zsh have a regex matching operator =~ in [[ … ]] conditional expressions. They have slightly different quoting rules.

In bash ≥3.2, regex characters only have their special effect on the right of the =~ operator if they're unquoted. So [[ 100 =~ ^[1-9][0-9]*$ ]] is true but [[ 100 =~ '^[1-9][0-9]*$' ]] is false: [[ $x =~ '^[1-9][0-9]*$' ]] only matches strings that contain the literal text ^[1-9][0-9]*$ as a substring.

In ksh 93u, the effect of quoting a character in a regex depends on the character: characters that are also wildcard characters must not be quoted, but characters that aren't can be in single or double quotes (but not preceded by a backslash). So [[ 100 =~ ^[1-9][0-9]*$ ]] is true, and so is [[ 100 =~ '^'[1-9][0-9]*'$' ]] but [[ 100 =~ '^[1-9][0-9]*$' ]] is false (it matches anything with the substring [1-9][0-9]*) and [[ 100 =~ ^[1-9][0-9]*\$ ]] is also false (it matches any string starting with a nonzero digit, then more digits and a $).

In zsh, any regex character can be quoted or not. Note that this means that to include a character literally, you need two levels of quoting, e.g. \\* or '\*' to match an asterisk. So both [[ 100 =~ ^[1-9][0-9]*$ ]] and [[ 100 =~ '^[1-9][0-9]*$' ]] are true.

I think putting the regex in a variable is the most reliable way not to depend on the shell's idiosyncrasies.

regex='…' # Use extended regular expression syntax here, with '\'' if you need a literal apostrophe
if [[ $string =~ $regex ]]; …
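
Filled in for the regex from the question, that could look like the sketch below. It needs a shell that has [[ … ]] and =~ (it is written with bash in mind), so it is not for plain sh; the function name is made up for illustration.

```shell
# Requires bash, ksh93 or zsh: plain sh has neither [[ … ]] nor =~.
regex='^[1-9][0-9]*$'

# The function name is an assumption of this example.
is_positive_integer() {
  # $regex is deliberately left unquoted on the right of =~
  # (in bash, quoting it would turn it into a literal string).
  [[ $1 =~ $regex ]]
}
```

It can then be used as the condition of an if, e.g. if is_positive_integer "$var"; then …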

Ranges in regexp/wildcard bracket expressions

What ranges like [0-9] match depends on the implementation and locale. In general you can't expect it to match on 0123456789 only (though you should be able to assume it will match on at least those). If it's important you match on 0123456789 only, avoid ranges and name the characters individually.
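
As an illustration of naming the characters individually, here is a sketch of a grep-based test for plain sh that contains no ranges at all. The variable and function names are made up, and refusing values that contain a newline is my own addition, because grep tests each line separately.

```shell
# Bracket expressions that list every digit instead of using a range.
digit='[0123456789]'
nonzero_digit='[123456789]'

# Succeed if $1 is a positive integer made only of the digits 0123456789,
# without a leading zero. The names here are assumptions of this example.
is_ascii_positive_integer() {
  case $1 in
    *'
'*) return 1 ;;  # grep works line by line, so refuse multi-line values
  esac
  # -x makes the whole line match: one nonzero digit, then any digits
  printf '%s\n' "$1" | grep -Eqx -e "$nonzero_digit$digit*"
}
```

The same two bracket expressions can be used in a case pattern or in a regex stored in a variable.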