#!/bin/sh in the syntax of the shell is a comment. However, that #! tells the kernel, when executing that file, that the interpreter stored at the /bin/sh path should be used to interpret the file, and that the interpreter should be executed with the path of the script as argument.
is_integer () compound-command is the POSIX sh syntax to define a function.
{
...
}
is a compound command called a command group. Its only purpose is to group commands, here to make it the body of the function. Here, it's superfluous as its content is only one compound command; however, using the { ... } command group as the body of every function is common practice and makes for more readable code, so it is generally recommended. The same function could have been written:
is_integer () case "${1#[+-]}" in
  (*[!0123456789]*) return 1 ;;
  ('') return 1 ;;
  (*) return 0 ;;
esac
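For comparison, here is a sketch of the same function with the conventional { ... } command group as its body, followed by a few calls. The sample arguments in the loop are illustrative only and are not taken from the original script.

```shell
# Same logic as above, with the body wrapped in a { ... } command group.
is_integer () {
  case "${1#[+-]}" in
    (*[!0123456789]*) return 1 ;;  # something other than a decimal digit
    ('') return 1 ;;               # nothing left once the sign is stripped
    (*) return 0 ;;
  esac
}

# Illustrative inputs: three that should pass, three that should not.
for arg in 42 -7 +3 4x2 - ''; do
  if is_integer "$arg"; then
    echo "accepted: [$arg]"
  else
    echo "rejected: [$arg]"
  fi
done
```

With those inputs, 42, -7 and +3 are accepted, while 4x2, a lone - and the empty string are rejected.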
case something in (pattern1 | pattern2) ...;; (pattern3)... ; esac
is a case/esac construct (which makes up a compound command) that matches something in turn against each pattern and, upon the first match, executes the corresponding code.
Here something is ${1#[-+]}. That's a parameter expansion, which applies the ${param#pattern} operator to the 1 parameter, which is the first argument to the function. That operator strips the shortest string that matches the pattern from the start of the contents of the parameter. [-+] is a wildcard pattern (not a regexp) that matches on either the - or the + character. So ${1#[-+]} expands to the value of the first argument stripped of a leading sign. If the first argument was -2, that becomes 2. If it was -, it becomes the empty string. If it was 2, it stays 2.
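A small sketch of that expansion on its own may help. strip_sign is a hypothetical helper introduced only to display the result. The brackets in the output make the empty string visible.

```shell
# strip_sign is a hypothetical helper, used only to show what ${1#[-+]} yields.
strip_sign() {
  printf '[%s]\n' "${1#[-+]}"
}

strip_sign -2    # prints [2]
strip_sign +7    # prints [7]
strip_sign -     # prints []
strip_sign 2     # prints [2]
strip_sign --2   # prints [-2]: only one leading sign character is removed
```

The last call shows that # removes a single match of the pattern from the start, so a doubled sign still leaves a - behind, which the digit check then rejects.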
You'll notice "${1#[+-]}" is quoted. Generally, you need to quote parameter expansions as otherwise they're subject to split+glob. Here, it's one of the very few contexts where that wouldn't happen, so strictly speaking those quotes are superfluous (but they do no harm and are still good practice).
Then that value is matched against some patterns.
*[!0123456789]* is * (any number of characters, though most shells will also accept bytes that do not form valid characters) followed by [!0123456789] (any character that is neither 0 nor 1 ... nor 9) followed by any number of characters (* again). So it will match on any string that contains a character (or non-character in most shells) that is not a decimal digit.
If there's a match, return 1 is executed, which causes the function to return with exit status 1 which, like any number other than 0, means false / failure.
'' is one way to represent the empty string. The empty string is also not a valid number, but it wouldn't have been matched by the previous pattern.
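To see why the empty string needs its own pattern, here is a sketch that isolates the first pattern. has_non_digit is a hypothetical helper and the sample strings are illustrative.

```shell
# has_non_digit is a hypothetical helper: it succeeds only if its argument
# contains something that is not one of the ten ASCII decimal digits.
has_non_digit() {
  case "$1" in
    (*[!0123456789]*) return 0 ;;
    (*) return 1 ;;
  esac
}

for s in 123 12a3 ' 7' ''; do
  if has_non_digit "$s"; then
    echo "[$s] contains a non-digit"
  else
    echo "[$s] contains no non-digit"
  fi
done
```

The empty string contains no non-digit, so it slips past the first pattern and must be rejected separately.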
Then * matches anything. So the return 0 would be run for any string that didn't match any of the previous patterns. It's superfluous here as the case statement is the last command in that function, and a case statement returns success / true if no command was run within.
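That exit status rule can be checked with a minimal sketch. only_x is a hypothetical function whose case statement has no catch-all pattern.

```shell
# only_x is a hypothetical function: its case has a single pattern, so any
# other argument runs no command at all inside the case statement.
only_x() case "$1" in
  (x) false ;;
esac

if only_x x; then echo "x: success"; else echo "x: failure"; fi   # x: failure
if only_x y; then echo "y: success"; else echo "y: failure"; fi   # y: success
```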
So here, that function definition could be shortened to:
is_integer() case ${1#[-+]} in
  ('' | *[!0123456789]*) false
esac
Though that doesn't make it more legible.
In any case, that code is right to use [0123456789]. Especially for input validation (and it's critical to validate input when it's used in shell arithmetic expressions, see Security Implications of using unsanitized data in Shell Arithmetic evaluation), [0-9] or [[:digit:]] should not be used, especially if your sh implementation is bash, as [0-9] may match on any character (or possibly multi-character collation element) that sorts between 0 and 9, and [[:digit:]] on some BSDs will match on the digits of any decimal numeral system, not only the 0123456789 English ones, even in English locales.
For instance, on a GNU system, in a typical US English locale (which these days tends to use UTF-8 as its charset), in bash, [0-9]
would also match on
,
,
and hundreds of other characters). And on FreeBSD, in that same locale, [[:digit:]]
would match on hundreds of different characters (including
).
If you let through
for instance during input validation, you're not closing the paths to those arbitrary code injection vulnerabilities. In ksh
and on GNU systems,
is a valid variable name (and that's the case for many other characters matched by [0-9]
). If that variable is set (in the environment for instance) and contains a[0$(reboot>&2)]
, then:
is_integer "$1" || exit
echo "$(( $1 + 1 ))"
in ksh will cause a reboot if is_integer
fails to reject that
input.
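The validate-then-evaluate pattern can be sketched as follows. add_one is a hypothetical caller, and the payload uses a harmless echo in place of reboot. Whether an unvalidated value like that would actually be evaluated depends on the shell. The point here is only that it never reaches the arithmetic expansion.

```shell
is_integer() case "${1#[-+]}" in
  ('' | *[!0123456789]*) false
esac

# add_one is a hypothetical caller: validate first, evaluate second.
add_one() {
  is_integer "$1" || { echo "rejected: $1" >&2; return 1; }
  echo "$(( $1 + 1 ))"
}

add_one 41    # prints 42

# Harmless stand-in for a malicious value: it is rejected before $(( ... )).
add_one 'a[0$(echo injected >&2)]' || echo "call failed as intended"
```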
To use a regular expression to do the matching, you'd need expr or awk, though few shells have those commands builtin, so it would be less efficient. Some [ implementations, like the [ builtin of zsh or yash, can also do regexp matching. And some shells also have a [[ ... ]] conditional expression construct that can do regexp matching, but none of those are in standard sh, and they come with their own problems when it comes to input validation.
While the * shell wildcard in most sh implementations will match on sequences of bytes even if some of them don't form valid characters (and the same goes for [!0123456789]), the .* or [^0123456789] regexp equivalents often don't.
Here, it may not be a problem as long as that matching is positive. Doing a negative matching like:
# regexp STRING ERE: succeeds if STRING matches the extended regexp ERE.
regexp() {
  awk -- 'BEGIN {exit !(ARGV[1] ~ ARGV[2])}' "$@"
}
is_integer() {
  ! regexp "${1#[-+]}" '^(.*[^0123456789].*)?$'
}
as a direct translation of that case statement would be wrong, as it would fail to reject input that contains sequences of bytes not forming valid characters, but
is_number() {
  regexp "$1" '^[-+]?[0123456789]+$'
}
should be fine as it would reject any input containing sequences of bytes not forming valid characters.
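Putting the positive-matching variant together, a self-contained sketch could look like this. It assumes a POSIX awk is available on PATH, and the sample inputs are illustrative.

```shell
# regexp STRING ERE: succeeds if STRING matches ERE. Both are passed through
# ARGV, so neither ends up being parsed as awk code.
regexp() {
  awk -- 'BEGIN {exit !(ARGV[1] ~ ARGV[2])}' "$@"
}

# Positive match: an optional sign followed by one or more ASCII digits.
is_number() {
  regexp "$1" '^[-+]?[0123456789]+$'
}

for s in 42 -7 +3 4x2 - '' '1 2'; do
  if is_number "$s"; then
    echo "accepted: [$s]"
  else
    echo "rejected: [$s]"
  fi
done
```

Each call forks an awk process, which is the efficiency cost mentioned earlier compared with the case-based version.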