According to $ man gawk, the strtonum() function can convert a string into a number:
strtonum(str) Examine str, and return its numeric value. If str begins with a leading 0, treat it as an octal number. If str begins with a leading 0x or 0X, treat it as a hexadecimal number. Otherwise, assume it is a decimal number.
So a string that begins with a leading 0 is treated as octal, and one that begins with 0x is treated as hexadecimal.
I've run these commands to check my understanding of the function:
$ awk 'END { print strtonum("0123") }' <<<''
83
$ awk 'END { print strtonum("0x123") }' <<<''
291
The string "0123" is correctly treated as containing an octal number and converted into the decimal number 83.
Similarly, the string "0x123" is correctly treated as containing a hexadecimal number and converted into the decimal number 291.
Now, here's what happens if I run the same commands, but moving the numerical strings from the program text to the input data:
$ awk 'END { print strtonum($1) }' <<<'0123'
123
$ awk 'END { print strtonum($1) }' <<<'0x123'
291
I understand the second result, which is identical to the one from the previous commands, but I don't understand the first one. Why does gawk now treat 0123 as a decimal number, even though it begins with a leading 0, which characterizes octal numbers?
I suspect it has something to do with the strnum attribute, because for some reason [1], gawk gives this attribute to 0123 but not to 0x123:
$ awk 'END { print typeof($1) }' <<<'0123'
strnum
$ awk 'END { print typeof($1) }' <<<'0x123'
string
[1] It may be due to a variation between awk implementations:
To clarify, only strings that are coming from a few sources (here quoting the POSIX spec): [...] are to be considered a numeric string if their value happens to be numerical (allowing leading and trailing blanks, with variations between implementations in support for hex, octal, inf, nan...).
I'm using gawk version 4.2.62, and the output of $ awk -V is:
GNU Awk 4.2.62, API: 2.0 (GNU MPFR 3.1.4, GNU MP 6.1.0)
[...] strtonum() looks for numbers first is to use a dummy string concatenation and force the numeric string to become a literal string: $ awk '{ print strtonum($1 "") }' <<<'0123'. Could you clarify which rule in the link of the user manual explains why 0x123 doesn't look like a number? Because it looks like a number to me; at least that's how I would write 291 in hexadecimal in an awk program text. Is it because of the alphabetical character x? – user938271 Feb 26 '19 at 14:13
[...] awk specification: a number is a possibly empty sequence of spaces, followed by “+” or “-”, followed by digits forming a floating-point decimal number. “x” isn’t allowed in a number, so it’s a string. – Stephen Kitt Feb 26 '19 at 14:29
[...] gawk treats 0x10 as a 16 number (as required (by mistake) by some older version of the POSIX spec, but now only allowed) – Stéphane Chazelas Feb 26 '19 at 14:44
[...] split() in awk will create "dual-nature" values which are both strings and numbers. This does not happen with other functions: awk 'BEGIN{s="0 1 2"; split(s, a); z = substr(s, 1, 1); print a[1], z, (a[1] == z), a[1] ? "yes" : "no", z ? "yes" : "no"}'. Same thing with $1 as with a[1]. As I already mentioned in a couple of comments / answers, this is not properly specified in the standard, but only vaguely alluded to. It probably deserves its own Q&A. – Feb 26 '19 at 14:45