
I have been experimenting with hex numbers in AWK (gawk), but sometimes when I print them using e.g. printf, they are printed with some LSBs masked out, like in the following example:

awk 'BEGIN { x=0xffffffffbb6002e0; printf("%x\n", x); }'
ffffffffbb600000

Why do I experience this behaviour and how can I correct it?

I'm using gawk on Debian Buster 10.

Shuzheng
  • Javascript has exactly the same behavior, for the same reason: the number is a 64-bit float, even though it looks like an integer. – Ken Shirriff May 11 '21 at 18:06

4 Answers


Numbers in AWK are floating-point numbers by default, and your value exceeds the precision available. 0xffffffffbb6002e0 ends up represented as 0 10000111110 1111111111111111111111111111111101110110110000000000 in IEEE-754 binary64 (double-precision) format, which represents the integer value 0xffffffffbb600000. Note the change in the low 12 bits, rounded to zero.

The smallest positive integer to get any rounding error when converted to double is 2^53 + 1. The larger the number, the larger the gap between values a double can represent. (Steps of 2, then 4, then 8, etc.; that's why the low hex digits of your number round to zero.)
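To see the first gap directly, here is a minimal sketch that should work in any awk using IEEE-754 doubles (no gawk extensions). The function name `show_gap` is just an illustrative choice.

```shell
# show_gap: print 2^53, 2^53+1 and 2^53+2 as awk stores them (as doubles)
show_gap() {
    awk 'BEGIN {
        base = 2 ^ 53
        printf "%.0f\n", base        # exactly representable
        printf "%.0f\n", base + 1    # not representable, rounds back to 2^53
        printf "%.0f\n", base + 2    # the next representable double
    }'
}

show_gap
```

The second line comes out identical to the first, because 2^53 + 1 falls between two representable values and is rounded to one of them.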


With GAWK, if it’s built with MPFR and GMP (which is the case in Debian), you can force arbitrary precision instead with the -M option:

$ awk -M 'BEGIN { x=0xffffffffbb6002e0; printf("%x\n", x); }'
ffffffffbb6002e0

For calculations, this will default to the same 53 bits of precision as available with IEEE-754 doubles, but the PREC variable can be used to control that. See the manual linked above for extensive details.

There is a difference in handling for large integers and floating-point values requiring more than the default precision, which can result in surprising behaviour; large integers are parsed correctly with -M and its default settings (only subsequent calculations are affected by PREC), whereas floating-point values are stored with the precision defined at the time they are parsed, which means PREC needs to be set appropriately beforehand:

# Default settings, integer value too large to be exactly represented by a binary64
$ awk 'BEGIN { v=1234567890123456789; printf "%.20f\n", v }'
1234567890123456768.00000000000000000000
# Forced arbitrary precision, same integer value stored exactly without rounding
$ awk -M 'BEGIN { v=1234567890123456789; printf "%.20f\n", v }'
1234567890123456789.00000000000000000000
# Default settings, floating-point value requiring too much precision
$ awk 'BEGIN { v=123456789.0123456789; printf "%.20f\n", v }'
123456789.01234567165374755859
# Forced arbitrary precision, floating-point parsing doesn’t change
$ awk -M 'BEGIN { v=123456789.0123456789; printf "%.20f\n", v }'
123456789.01234567165374755859
# Forced arbitrary precision, PREC set in the BEGIN block, no difference
$ awk -M 'BEGIN { PREC=94; v=123456789.0123456789; printf "%.20f\n", v }'
123456789.01234567165374755859
# Forced arbitrary precision, PREC set initially
$ awk -M -vPREC=94 'BEGIN { v=123456789.0123456789; printf "%.20f\n", v }'
123456789.01234567890000000000

When reading input values, AWK only recognises decimal values as numbers; to handle non-decimal values (octal or hexadecimal), fields should be processed using GAWK’s strtonum function.

Peter Cordes
Stephen Kitt

To convert a string (that looks like a number) into a number in awk:

  1. It could be assigned to a variable as a program constant.
  2. The function strtonum() could convert the text.
  3. Awk could be called with the option -n (now deprecated).

Once converted to a number, most awk implementations (gawk, mawk, nawk, bawk) store it as a 64-bit floating-point value. Such a value holds only 53 bits of mantissa; any additional bits are rounded away. That allows for 53/4 ≈ 13 hexadecimal digits (well, technically, 1 as the integer part and 13 digits after the point).

The hexadecimal you used 0xffffffffbb6002e0 is this in binary:

bc <<<"obase=2;ibase=16;FFFFFFFFBB6002E0"
1111111111111111111111111111111110111011011000000000001011100000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<== up to here 53 bits.
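As a rough cross-check of where the 53 bits end, the sketch below computes the spacing between adjacent doubles at a given magnitude. It uses only portable awk and takes the value in decimal. `ulp_of` is a made-up helper name, and 18446744072558215904 is 0xffffffffbb6002e0 written in decimal.

```shell
# ulp_of N: print the gap between adjacent doubles at the magnitude of N (N >= 1)
ulp_of() {
    awk -v x="$1" 'BEGIN {
        x += 0                          # force numeric; awk stores it as a double
        e = 0
        while (2 ^ (e + 1) <= x) e++    # e = floor(log2(x)) for x >= 1
        printf "%.0f\n", 2 ^ (e - 52)   # 53-bit mantissa: spacing is 2^(e-52)
    }'
}

ulp_of 18446744072558215904
```

For this value the spacing is 2048, so the low 11 bits cannot be stored. The original low bits (0x2e0 = 736) are less than half of that, which is why the value rounds down to ...bb600000.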

All fractional numbers and most integers in awk are stored as floating-point numbers. The only other option with GNU awk is to use arbitrary precision, via the -M option. With that option, all integers are immediately represented with as many digits as needed and as the computer's memory allows.

$ awk -M 'BEGIN{print 3^4^5}'
373391848741020043532959754184866588225409776783734007750636931722079040617265251229993688938803977220468765065431475158108727054592160858581351336982809187314191748594262580938807019951956404285571818041046681288797402925517668012340617298396574731619152386723046235125934896058590588284654793540505936202376547807442730582144527058988756251452817793413352141920744623027518729185432862375737063985485319476416926263819972887006907013899256524297198527698749274196276811060702333710356481

That allows your integer to be used without any problem, as long as it is only used in calculations with other integers (no division):

$ awk -M 'BEGIN{x=strtonum(0xffffffffbb6002e0); y=x+234; z=x/77; printf("%x\n%x\n%f\n",x,y,z)}'
ffffffffbb6002e0
ffffffffbb6003ca
239568104838418400.000000

The correct result from x/77 should be 239568104838418388.36363636363636363636 according to bc.

If you need numbers with a fractional part that require more than 53 bits (which is the precision retained even with -M), you need to make the variable PREC bigger than 53 as needed:

$ awk -M -vPREC=200 'BEGIN{x=strtonum(0xffffffffbb6002e0); y=x+234; z=x/77; printf("%x\n%x\n%f\n",x,y,z)}'
ffffffffbb6002e0
ffffffffbb6003ca
239568104838418388.363636

Hope that this helps.


Code for all claims:

Using the shell for portability and using the %a that is closer to the internal representation of floats, 53 bits is 13 digits:

$ dash -c 'printf "%a\n" 0x1.12345678901234567890123'
0x1.1234567890123p+0

Other shells (and some awk) might use an 80 bit number with 64 bit mantissa which could use up to 16 digits:

ksh -c 'printf "%a\n" 0x1.12345678901234567890123'
0x1.1234567890123456000000000000p+0

Awk is limited in what it can accept as a hexadecimal program constant (an assignment such as x=0x1fffffffffffff):

$ awk 'BEGIN { x=0x1fffffffffffff ; y=0x3fffffffffffff; printf("%18s %16x\n%18s %16x\n", x, x+0,y,y+0); }'
  9007199254740991   1fffffffffffff
 18014398509481984   40000000000000

$ mawk -vx=$(printf '%d\n' 0xffffffff) 'BEGIN{y=x*2;printf("%18s %16x\n%18s %16x\n", x, x+0,y,y+0); }'
        4294967295         7fffffff
       8.58993e+09         7fffffff

$ bawk 'BEGIN { x=2147483647 ; y=x*2+1; printf("%18s %16x\n%18s %16x\n", x, x+0,y,y+0); }'
        2147483647         7fffffff
        4294967295         80000000

And, input from a file and/or the user can not accept hexadecimal numbers unless the -n option (which is already deprecated) or the function strtonum() (recommended) is used:

$ awk '{x=$1; printf "%s %x\n",x,x}' <<<0x123
0x123 0

$ awk -n '{x=$1; printf "%s %x\n",x,x}' <<<0x123
0x123 123

$ awk -n '{x=strtonum($1); printf "%s %x\n",$1,x}' <<<0x123
0x123 123

On the first input awk only reads the first 0 and rejects everything after the x because it looks like a word. It works correctly on the other two cases.
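For awks that have neither -n nor strtonum(), the conversion can be done by hand. The following is only a sketch under that assumption: `hex_to_dec` and `hex2num` are hypothetical names, and the result is still an ordinary double, so anything above 53 bits gets rounded just the same.

```shell
# hex_to_dec: read one hex string per line (optional 0x prefix) and print
# the original text followed by its decimal value. Portable awk only.
hex_to_dec() {
    awk '
    function hex2num(s,    i, c, d, n) {
        s = tolower(s)
        sub(/^0x/, "", s)                      # drop an optional 0x prefix
        n = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            d = index("0123456789abcdef", c) - 1
            if (d < 0) return -1               # not a hex digit: signal an error
            n = n * 16 + d
        }
        return n
    }
    NF { printf "%s %.0f\n", $1, hex2num($1) }'
}

printf '0x123\nff\n' | hex_to_dec
```

With gawk, strtonum() remains the simpler choice; this is only meant for implementations that lack it.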

So, we must use a decimal number to simplify things for awk. If your printf is limited, use bc:

$ val=$(printf "%d" 0x1234567890)
$ awk -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
78187493520 1234567890

$ val=$(bc <<<'ibase=16;1234567890')
$ awk -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
78187493520 1234567890

But still, awk is limited:

$ val=$(bc <<<'ibase=16; 12345678901234')
$ awk -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
5124095575331380 12345678901234

$ val=$(bc <<<'ibase=16; 123456789012345')
$ awk -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
81985529205302085 123456789012340

Here it cuts the last 5, as it could not be represented in a float of 53 bits.
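One way to detect this before it bites is to round-trip the decimal text through awk and compare. This is a small sketch, not something from the answers above. `fits_double` is a made-up name, and it assumes a canonical non-negative decimal integer (no sign, no leading zeros).

```shell
# fits_double N: exit status 0 if decimal integer N is stored exactly by awk,
# non-zero if converting it to a double changes its value.
fits_double() {
    awk -v s="$1" 'BEGIN {
        # Re-print the numeric value and compare it, as a string, with the input
        ok = (sprintf("%.0f", s + 0) == (s ""))
        exit !ok
    }'
}

fits_double 5124095575331380  && echo "5124095575331380 is exact"
fits_double 81985529205302085 || echo "81985529205302085 would be rounded"
```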

The ability to process large numbers improves if the bignum (-M) option for arbitrary precision is used, but only for integers:

$ val=$(bc <<<'ibase=16; 12345678901234567890123456789')
$ awk    -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
5907679980460342222050878921467785 5.90768e+33

$ awk -M -vx="$val" 'BEGIN{printf "%s %x\n", x,x}'
5907679980460342222050878921467785 12345678901234567890123456789

If you actually need to work with big numbers and long decimals, you need to also change the PREC used (53 by default).

$ awk -M -vx='12345678901234567890123456789' 'BEGIN{printf "%s \n%f\n", x,x/100}'
12345678901234567890123456789 
123456789012345678152597504.000000

$ awk -M -vPREC=500 -vx='12345678901234567890123456789' 'BEGIN{printf "%s \n%f\n", x,x/100}'
12345678901234567890123456789 
123456789012345678901234567.890000


The way I deal with the different precision levels of gawk, mawk134, and mawk2, is by writing a wrapper function to encapsulate sub-shell gawk execution. So whenever any function detects the input is higher than the precision of its current environment, it'll call itself via this wrapper to have gawk -M in sub-shell execute it, and return the result using getline (encapsulated away via the wrapper, which also trims out last single trailing \n).

Say if I wanna do prime factoring of 2^190 - 1. I quote them and pass it as a string into my functions, so the sub-shell still can see it all instead of having precision pre-trimmed, thus nullifying the point of the subshell.

As part of the wrapper, I also make a best guess estimate of what PREC I need to declare for the sub-shell, then add some fixed padding on top of that just to be sure.
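A much-simplified shell-level sketch of that idea follows. It is not the actual wrapper described above, which lives inside awk and uses getline. The names `have_bignum` and `big_div`, and the PREC estimate of 4 bits per decimal digit plus 64 bits of padding, are all assumptions made for illustration. The operands are passed as strings via -v, so nothing is rounded before gawk -M sees them.

```shell
# have_bignum: true only if gawk exists and -M really gives exact big integers
have_bignum() {
    [ "$(gawk -M 'BEGIN { print 2^64 + 1 }' 2>/dev/null)" = "18446744073709551617" ]
}

# big_div A B: print A / B with six decimals, computed by gawk -M.
# PREC is a crude guess: about 4 bits per decimal digit, plus fixed padding.
big_div() {
    digits=$(( ${#1} + ${#2} ))
    prec=$(( digits * 4 + 64 ))
    gawk -M -v PREC="$prec" -v a="$1" -v b="$2" 'BEGIN { printf "%f\n", a / b }'
}

if have_bignum; then
    big_div 12345678901234567890123456789 100
else
    echo "gawk -M not available; skipping" >&2
fi
```

The `have_bignum` guard matters because a gawk built without MPFR only warns about -M and carries on with plain doubles.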

peterh

@user232326 : mawk does NOT have fewer digits or less precision compared to any other non-bignum awk

echo '0x1' | mawk '{ __ = (_+=_^=_<_)^_^_+_
                      _ = $___
                      do { print _
                            _ = (_) "F" } while(--__) }' | 

mawk '$++NF = +$_' CONVFMT='%.f'

0x1 1
0x1F 31
0x1FF 511
0x1FFF 8191
0x1FFFF 131071

0x1FFFFF 2097151
0x1FFFFFF 33554431
0x1FFFFFFF 536870911
0x1FFFFFFFF 8589934591
0x1FFFFFFFFF 137438953471

0x1FFFFFFFFFF 2199023255551
0x1FFFFFFFFFFF 35184372088831
0x1FFFFFFFFFFFF 562949953421311
0x1FFFFFFFFFFFFF 9007199254740991 <-- same 53-bit cutoff as everyone else

0x1FFFFFFFFFFFFFF 144115188075855872
0x1FFFFFFFFFFFFFFF 2305843009213693952
0x1FFFFFFFFFFFFFFFF 36893488147419103232
0x1FFFFFFFFFFFFFFFFF 590295810358705651712

As for gawk w/ GMP, the same syntax works just as well ::

gawk -nMb '$++NF = +$_'

gawk -nMb '$+=' # bare-minimum for decoding only


0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FF

4039625758913875912589359586083743995055512833714435504016293178
4405818923584863616496501764403641829610897451152372524367649448
9381136513688601904830603539007885967091262451146877471879870651
3349507204798008446034599027330327469520229761792309521308822705
7315045381303609469864426332260759440498904912981902392916737085
4280258562184832057118685702200441579025725972570741637827408855
7539268782324108022139590742950464807732169699793894037705738050
4622016541609039033907105888525262156446377158664154337098178225
8208724188074965854412482977694064579867966694295026692915370058
0664809825619018524194481701382449528831

0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FF

1363435169524269911828730305917228578060330202495290865539790982
7371516648113991190203302931565024840299466147451962815913873385
2522148007951372470634694349884863287355274123692326228153506025
6449621064516989547745964858178261655700151613192377357876272375
3156697498965000125995000331697960011316222573511904381270377443
4456738875664045570001855928254411693905730491114844096087034918
0110069521894617937431758347281426018625770132400320548130722516
6581169572950590530092280535153064143971968989561240329571796530
4034671009378453485380650812724095212618212396062157234960481467
1312496126201400644568812644451589364365644177632186208286933790
9829018821635355377770812129054951732940645330118952100067845292
9499538615211430772648425799413833175610717429354109678520925936
9213741889835945449962987646277279520936823276212321257829515103
3833942601963332189216012837778886594561378459489506510978807905
1381170571094365120915372807398095428422973784056054752315872220
6176340894514475543363534324174989951767803426763248896782575695
6047046953931595142033232854792107764474920475260981720944316709
8739461779304244458289383760901691933781958789000516340686979313
5446799567286170096351223188223641209884190975068669526951363494
8064661926449661593498469096732000102551481986625059721859097023
1726688811730072732699831526450745038544550742727648714254733590
7774351873409504567020373570218528016291856798481655030939274271
4787653513477621616294217160057179651190358795284704549316135872
5782302794393563446379952423431203733351023596207851972269134924
1473866429560425952173227274453753743028488443693826611022885537
5995608665402239484573975444565201963942537401503451945098413849
2184683355191045373463587431269003903982824275385711729868082132
6682795880292533526378698693688996358845682733278495368170564525
9456366887736325297172769244159702608462232371649579128861166308
3146179523134793830196042118066256429686413912354295908154824425
6130341819910904936906533980771755459818443263092971274710031970
7573512975965883846474277774713629817982323051193651451248156736
2996831962992996635761366324620420095941245617301824773213245469
3201871694784767029791611132886255360021651781010829953339166910
0115409479385913245176484047267776268252482696842335688711201114
8647555167444815602858442639151579849209445956292545371253991252
9247285581648184047805290035018987112075523528720001019297013315
3178377044945635942923139339040177840832567544772268363635082205
8618823457749648313208184464474111