According to man gawk, the strtonum() function can convert a string into a number:
strtonum(str)
Examine str, and return its numeric value. If str begins with a leading 0, treat it as an octal number. If str begins with a leading 0x or 0X, treat it as a hexadecimal number. Otherwise, assume it is a decimal number.
So if the string begins with a leading 0, the number is treated as octal, while if it begins with 0x, it is treated as hexadecimal.
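The prefix rule in the man page excerpt is the same convention C uses for integer constants. Purely as a point of comparison (this is not how gawk implements strtonum()), POSIX printf evaluates the argument of a %d conversion as a C integer constant, so plain sh shows the same three cases:

```shell
# Compare the prefix rule with POSIX printf, which reads %d arguments as
# C integer constants: leading 0 -> octal, leading 0x/0X -> hexadecimal.
for s in 0123 0x123 123; do
  printf '%s -> %d\n' "$s" "$s"
done
```

This prints 0123 -> 83, 0x123 -> 291 and 123 -> 123. Only integer strings are covered by this comparison.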
I've run these commands to check my understanding of the function:
$ awk 'END { print strtonum("0123") }' <<<''
83
$ awk 'END { print strtonum("0x123") }' <<<''
291
The string "0123" is correctly treated as containing an octal number and converted into the decimal number 83.
Similarly, the string "0x123" is correctly treated as containing a hexadecimal number and converted into the decimal number 291.
Now, here's what happens if I run the same commands, but move the numeric strings from the program text into the input data:
$ awk 'END { print strtonum($1) }' <<<'0123'
123
$ awk 'END { print strtonum($1) }' <<<'0x123'
291
I understand the second result, which is identical to the previous one, but I don't understand the first. Why does gawk now treat 0123 as a decimal number, even though it begins with the leading 0 that characterizes octal numbers?
I suspect it has something to do with the strnum attribute, because for some reason[1] gawk gives this attribute to 0123 but not to 0x123:
$ awk 'END { print typeof($1) }' <<<'0123'
strnum
$ awk 'END { print typeof($1) }' <<<'0x123'
string
[1] It may be due to a variation between awk implementations:
To clarify, only strings that are coming from a few sources (here quoting the POSIX spec): [...] are to be considered a numeric string if their value happens to be numerical (allowing leading and trailing blanks, with variations between implementations in support for hex, octal, inf, nan...).
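The rule quoted in the footnote can be approximated with an extended regular expression. The sketch below uses a hypothetical helper, looks_numeric, which is not part of any awk: it accepts optional leading and trailing blanks, an optional sign, and a decimal floating-point number with an optional exponent. It deliberately ignores the implementation-specific cases (hex, octal, inf, nan) and locale-dependent decimal points:

```shell
# Hypothetical helper: approximates the POSIX "looks like a number" test for
# user input, i.e. [blanks][sign]decimal-float[exponent][blanks].
looks_numeric() {
  printf '%s\n' "$1" |
    grep -Eq '^[[:blank:]]*[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?[[:blank:]]*$'
}

for s in 0123 0x123 ' 12 ' 1e3 abc; do
  if looks_numeric "$s"; then
    echo "[$s] numeric string"
  else
    echo "[$s] string"
  fi
done
```

Under this approximation 0123 qualifies as a numeric string (its leading 0 has no special meaning in a decimal number), while 0x123 is rejected because the x cannot appear in a decimal number.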
I'm using gawk version 4.2.62, and the output of awk -V is:
GNU Awk 4.2.62, API: 2.0 (GNU MPFR 3.1.4, GNU MP 6.1.0)
strtonum() [...] looks for numbers first [...] is to use a dummy string concatenation and force the numeric string to become a literal string:
$ awk '{ print strtonum($1 "") }' <<<'0123'
Could you clarify which rule in the link of the user manual explains why 0x123 doesn't look like a number? Because it looks like a number to me; at least that's how I would write 291 in hexadecimal in an awk program text. Is it because of the alphabetical character x? – user938271 Feb 26 '19 at 14:13
[...] awk specification: a number is a possibly empty sequence of spaces, followed by an optional “+” or “-”, followed by digits forming a floating-point decimal number. “x” isn’t allowed in a number, so it’s a string. – Stephen Kitt Feb 26 '19 at 14:29
gawk treats 0x10 as the number 16 (as required (by mistake) by some older version of the POSIX spec, but now only allowed) – Stéphane Chazelas Feb 26 '19 at 14:44
split() in awk will create "dual-nature" values which are both strings and numbers. This does not happen with other functions:
awk 'BEGIN{s="0 1 2"; split(s, a); z = substr(s, 1, 1); print a[1], z, (a[1] == z), a[1] ? "yes" : "no", z ? "yes" : "no"}'
Same thing with $1 as with a[1]. As I already mentioned in a couple of comments / answers, this is not properly specified in the standard, but only vaguely alluded to. It probably deserves its own Q&A. – Feb 26 '19 at 14:45