When comparing a variable value to something (I'll take arithmetic comparison so that you can't use the "x$VAR" == "xyes" trick), how do I protect against the case when the variable contains whitespace and/or shell syntax?

$ (b="-z foo"; if [[ "$b" -eq 5 ]]; then echo foo; fi)
+(:50): b='-z foo'
+(:50): [[ -z foo -eq 5 ]]
-bash: [[: -z foo: syntax error in expression (error token is "foo")

Tried: single and double brackets, arithmetic evaluation ((())), quoting and no quoting, "${b##-}", even "${b@Q}"; checked https://mywiki.wooledge.org/BashPitfalls .

2 Answers

When a value needs to be a decimal integer and I don't want my user to be presented with a confusing error followed by the script aborting, I test the variable for the presence of non-digit characters and handle it with my own error messages:

[[ -z ${val} ]] && {
  echo "$0: Oops, \"${val}\" is empty" >&2
  exit 1
}
[[ ${val} =~ [^[:digit:]] ]] && {
  echo "$0: Oops, non-numeric characters found in \"${val}\"" >&2
  exit 1
}
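As a minimal sketch, the two checks above can be folded into a helper that is called before any arithmetic comparison. The names `is_uint` and `check_eq5` are hypothetical, and the `10#` prefix is an extra assumption on my part: it forces base 10 so that a value such as `08` is not read as octal.

```bash
#!/usr/bin/env bash
# is_uint (hypothetical name): succeeds only for a non-empty string
# that contains nothing but decimal digits.
is_uint() {
  [[ -n $1 && ! $1 =~ [^[:digit:]] ]]
}

# check_eq5 (hypothetical name): validate first, compare afterwards.
check_eq5() {
  local val=$1
  if ! is_uint "$val"; then
    echo "invalid"
  elif (( 10#$val == 5 )); then   # 10# forces decimal interpretation
    echo "equal"
  else
    echo "different"
  fi
}

check_eq5 '-z foo'   # rejected before it reaches (( ... ))
check_eq5 5
check_eq5 7
```

Because the value is rejected before it reaches `(( ... ))`, the arithmetic evaluator only ever sees strings of digits.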

If the value is a string where whitespace is undesirable, I'll use a similar technique to check for whitespace:

[[ ${val} =~ [[:blank:]] ]] && {
  echo "$0: Oops, \"${val}\" contains space or tab characters" >&2
  exit 1
}

If you want to also test for newline, carriage return, form feed, and vertical tab, use the [:space:] character class.

This is the kind of thing I do in my bash v4.x scripts. I don't have tips to offer for other shells.

Sotto Voce
  • If the bash script author is worried about POSIX character classes matching non-ASCII characters, the solution I suggest is to include export LC_ALL=C (or export LC_CTYPE=C) early in the script. – Sotto Voce Jul 10 '22 at 08:41

In the [[ ... ]] ksh construct (copied by a few shells including bash, though with variations), operands of arithmetic operators are interpreted as arithmetic expressions. If it is not sanitised, externally supplied input used there introduces arbitrary command execution vulnerabilities.

Same applies to (( a == b )) (also from ksh).
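To make the risk concrete, here is a harmless sketch for bash. The variable name `x` and the marker text are made up for illustration; the embedded command only prints to stderr, but it could be any command.

```bash
#!/usr/bin/env bash
# demo_injection (hypothetical name): the operand looks like an array
# element reference whose subscript contains a command substitution.
demo_injection() {
  local b='x[$(echo INJECTED >&2)0]'
  # [[ ... -eq ... ]] evaluates $b as an arithmetic expression; expanding
  # the subscript runs the command substitution.
  [[ $b -eq 5 ]]
}

demo_injection
```

Running it prints the marker on stderr even though the script never explicitly executes the contents of `b`.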

To avoid that, you need to decide what input you want to accept, and then either sanitise by hand, rejecting anything that doesn't match what you want, or use a command other than [[ ... ]] that only accepts constants in the format you want to allow.

If you want to allow only decimal integer constants with an optional leading sign, and possibly surrounding whitespace, in bash and a few other shells, you can use the [ aka test builtin which only accepts decimal integers.

[ "$b" -eq 5 ]

If $b is not a decimal integer constant, [ will fail with an error and return with a status of 2.
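A small sketch of how the three possible exit statuses could be told apart in bash. The helper name `eq5_status` is hypothetical, and the error message from `[` is discarded here only to keep the output short.

```bash
#!/usr/bin/env bash
# eq5_status (hypothetical name): report how [ "$1" -eq 5 ] ended.
eq5_status() {
  [ "$1" -eq 5 ] 2>/dev/null
  case $? in
    0) echo "equal" ;;
    1) echo "different" ;;
    *) echo "invalid" ;;   # status 2: not a decimal integer
  esac
}

eq5_status 5
eq5_status 7
eq5_status '-z foo'
```

In a real script you would probably keep the error message, or replace it with your own, rather than silence it.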

To allow all the arithmetic expression constants that POSIX specifies (such as 12, -1, 010, 0x5), you could resort to dash where bare variables inside arithmetic expressions are only accepted if they contain constants or the empty string (optionally surrounded by whitespace).

VAL="$b" dash -c '[ "$((VAL))" -eq 5 ]'

(again with a status of 2 if $b doesn't contain a valid constant).

To allow more numbers such as 1.123 (possibly 1,123 in some locales and some awk implementations), 50e-1 and possibly more, you can use awk:

awk -- 'BEGIN{exit !(ARGV[1] == 5)}' "$b"

There, if $b is not recognised as a number, then a string comparison will be performed which will return false. With GNU awk you can tell whether the string is recognised as a number by checking if typeof(ARGV[1]) returns strnum.
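As a sketch of that GNU awk check, the wrapper below separates "not a number" from "a number other than 5". It assumes a gawk recent enough to provide typeof(), and the function name `num_eq5` and the choice of exit status 2 for non-numbers are mine, not gawk conventions.

```bash
#!/usr/bin/env bash
# num_eq5 (hypothetical name): 0 if $1 is numerically 5, 1 if it is some
# other number, 2 if gawk does not recognise it as a number at all.
num_eq5() {
  gawk -- 'BEGIN {
    if (typeof(ARGV[1]) != "strnum") exit 2
    exit !(ARGV[1] == 5)
  }' "$1"
}

if command -v gawk >/dev/null 2>&1; then
  num_eq5 '50e-1'; echo "50e-1 -> $?"
  num_eq5 '-z foo'; echo "-z foo -> $?"
fi
```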

Beware of range and precision as well. 4.9999999999999999 or 18446744073709551621 (2^64 + 5) may be considered the same as 5 depending on whom you ask.
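Both effects can be observed with a short sketch, assuming an awk that uses double-precision floating point (the default in common implementations) and a bash build whose arithmetic uses 64-bit integers:

```bash
#!/usr/bin/env bash
# awk: 4.9999999999999999 is closer to 5 than a double can distinguish.
if awk -- 'BEGIN{exit !(ARGV[1] == 5)}' 4.9999999999999999; then
  echo "awk treats 4.9999999999999999 as 5"
fi

# bash: integer constants wrap modulo 2^64 inside (( ... )).
if (( 18446744073709551621 == 5 )); then
  echo "bash treats 18446744073709551621 as 5"
fi
```

If that matters for your input, reject values with too many digits before comparing.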