Losing precision with dc

Question

I want to use dc to handle some base 16 numbers with hexadecimal points but I'm running into precision problems. For example, below I'm multiplying F423F.FD by 100, both hex. The expected answer is F423FFD, instead it is giving F423FFA.E1, close but not accurate enough even after rounding.

$ dc
16 d i o F423F.FD 100 * p
F423FFA.E1

I read that dc was an unlimited precision calculator, and this is not a large number by any means. Is there something I'm doing wrong?

Thank for your answers. Given the issues with dc, I bit the bullet and wrote my own parser for real numbers in other bases. If anyone is interested in the code, I can post it here.

Stephen Kitt · Answer 1 · 2018-07-17T21:47:06.710

8

Expressed as decimal (using dc to convert), this corresponds to 999999.98 (rounded down) × 256, i.e. 255999994.88, which is F423FFA.E1 in hexadecimal.

So the difference comes from dc’s rounding behaviour: instead of calculating 256 × (999999 + 253 ÷ 256), which would give 255999997, it rounds 253 ÷ 256 down and multiplies the result.

dc is an arbitrary precision calculator, which means it can calculate to any precision you want, but you have to tell it what that is. By default, its precision is 0, meaning division produces integer values only, and multiplication uses the number of digits in the input. To set the precision, use k (and bear in mind that precision is always expressed in decimal digits, regardless of the input or output radix):

10 k
16 d i o
F423FFD 100 / p
F423F.FD0000000
100 * p
F423FFD.000000000

(8 digits’ precision would be sufficient since that’s what you need to represent 1 ÷ 256 in decimal.)

edited Jul 17 '18 at 21:47

answered Jul 17 '18 at 16:51

Stephen Kitt

434,908

1

That would seem to be a completely unexpected result for an "arbitrary precision" calculator? – Yimin Rong Jul 17 '18 at 19:17
This is the correct answer, as it uses dc as intended without resorting to « add a bunch of 0s to fix it » (note that all of this information could be found in the man page). – D. Ben Knoble Jul 17 '18 at 21:21
3

It still loses precision when k is set: 10 k 16 d i o F423F.FD p → F423F.FA, so I would have to scale up all the numbers prior to using them in dc. Basically amounting to pre-parsing them anyway. – Yimin Rong Jul 17 '18 at 22:00
2

@Yimin yes, unfortunately dc scales its input using only the number of digits, which seems like a bug to me (since the number of digits is calculated using the input radix, but applied to the decimal value). – Stephen Kitt Jul 18 '18 at 07:33
In other words, dc always does internal computations in decimal? That's not what I expected. – dhag Jul 18 '18 at 14:34
1

@dhag that’s what POSIX specifies (for bc, which dc is based on): “Internal computations shall be conducted as if in decimal, regardless of the input and output bases, to the specified number of decimal digits.” – Stephen Kitt Jul 18 '18 at 14:58
1

It is really a problem of how a constant is being parsed. Try 20 k 16 d i o 0.3 1 / p (which prints .19999999999999999). Understand that the operation is just dividing 0.2 by 1 (which in theory should not change the value). While 20 k 16 d i o 0.3000 1 / p (correctly) prints .30000000000000000. (Cont.) – Jul 19 '18 at 11:16
(Cont.) A similar issue happens with bc bc <<<"scale=20;obase=16;ibase=16;0.3/1" while bc <<<"scale=20;obase=16;ibase=16;0.3000/1" works correctly. – Jul 19 '18 at 11:25
Note that ksh (not bash nor zsh) does the "right thing" with hexadecimal constants. Example: ksh -c 'a=0x0.fd; b=0x100;echo "$a" "$((a*b))"' prints 0x0.fd and 253. So, this works: ksh -c 'a=0xF423F.FD; b=0x100; printf "%x\n" "$((a*b))"' – Jul 19 '18 at 11:38
1

@Isaac yes, scaling being applied to decimal digits really means that fractional non-base-10 input is useless. Try 20 k 16 d i o 0.3 p for kicks... – Stephen Kitt Jul 19 '18 at 11:45
Hmm ... Not really useless @StephenKitt, it is just a non-intuitive fact that makes this happen. The solution is not difficult by the way. I'll write an answer to expand on the detail. – Jul 19 '18 at 11:55
@Isaac perhaps not useless but very unintuitive (especially since, as per POSIX, scaling is applied by truncating — so 0.3 in hex, which is 0.1875 in decimal, gets truncated to 0.1, which is represented as 0.1 in hex again thanks to more truncation). – Stephen Kitt Jul 19 '18 at 11:58
The solution I'll propose does not conflict with POSIX (imVHo), the math could be still done in decimal with no problem (and exact results). Working on it. .... – Jul 19 '18 at 12:03

score 6 · Accepted Answer · edited Jul 17 '18 at 17:09

6

Note that just printing the original number shows it is rounded:

$ dc <<<'16 d i o F423F.FD p'
F423F.FA

You can get around it by adding lots of trailing zeroes for more precision:

$ dc <<<'16 d i o F423F.FD000000 100 * p'
F423FFD.0000000

edited Jul 17 '18 at 17:09

slm

369,824

answered Jul 17 '18 at 17:03

meuh

51,383

Thanks. I think it will end up taking more code to massage the numbers for dc to use then to just write a parser directly! (Input may or may not have a decimal, and can be in other bases, so the amount of padding varies.) – Yimin Rong Jul 17 '18 at 19:21
2

I'll mark this as the accepted answer. The people responsible for maintaining dc replied: To properly handle non-decimal fractional digits would require a completely different model than the decimal-scale model used by dc and bc (as dictated by POSIX for bc, and by historical tradition for both)., so technically it could be fixed in dc, but that would probably break bc, so classified as WONTFIX. – Yimin Rong Jul 18 '18 at 12:05

score 1 · Answer 3 · 2018-07-19T19:35:20.890

The Issue

The problem is the way in which dc (and bc) understand numeric constants.
For example, the value (in hex) 0.3 (divided by 1) gets transformed into a value close to 0.2

$ dc <<<"20k 16 d i o 0.3 1 / p"
.199999999999999999999999999

In fact, the plain constant 0.3 also gets changed:

$ dc <<<"20 k 16 d i o     0.3     p"
.1

It seems that it is in an odd way, but it is not (more later).
Adding more zeros makes the answer approach the correct value:

$ dc <<<"20 k 16 d i o     0.30     p"
.2E

$ dc <<<"20 k 16 d i o     0.300     p"
.2FD

$ dc <<<"20 k 16 d i o     0.3000     p"
.3000

The last value is exact and will stay exact no matter how may more zeros are added.

$ dc <<<"20 k 16 d i o     0.30000000     p"
.3000000

The problem is also present in bc:

$ bc <<< "scale=20; obase=16; ibase=16;    0.3 / 1"
.19999999999999999

$ bc <<< "scale=20; obase=16; ibase=16;    0.30 / 1"
.2E147AE147AE147AE

$ bc <<< "scale=20; obase=16; ibase=16;    0.300 / 1"
.2FDF3B645A1CAC083

$ bc <<< "scale=20; obase=16; ibase=16;    0.3000 / 1"
.30000000000000000

One digit per bit?

The very non intuitive fact for floating point numbers is that the number of digits required (after the dot) is equal to the number of binary bits (also after the dot). A binary number 0.101 is exactly equal to 0.625 in decimal. The binary number 0.0001110001 is (exactly) equal to 0.1103515625 (ten decimal digits)

$ bc <<<'scale=30;obase=10;ibase=2; 0.101/1; 0.0001110001/1'; echo ".1234567890"
.625000000000000000000000000000
.110351562500000000000000000000
.1234567890

Also, for a floating point number like 2^(-10), which in binary has only one (set) bit:

$ bc <<<"scale=20; a=2^-10; obase=2;a; obase=10; a"
.0000000001000000000000000000000000000000000000000000000000000000000
.00097656250000000000

Has the same number of binary digits .0000000001 (10) as decimal digits.0009765625 (10). That may not be the case in other bases but base 10 is the internal representation of numbers in both dc and bc and therefore is the only base that we really need to care about.

The math proof is at the end of this answer.

bc scale

The number of digits after the dot could be counted with the built-in function scale() form bc:

$ bc <<<'obase=16;ibase=16; a=0.FD; scale(a); a; a*100'
2
.FA
FA.E1

As shown, 2 digits is insufficient to represent the constant 0.FD.

And, also, just counting the number of characters used after the dot is a very incorrect way to report (and use) the scale of the number. The scale of a number (in any base) should calculate the number of bits needed.

Binary digits in a hex float.

As it is known, each hex digit use 4 bits. Therefore, each hex digit after the decimal dot require 4 binary digits, which due to the (odd?) fact above also require 4 decimal digits.

Therefore, a number like 0.FD will require 8 decimal digits to be represented correctly:

$ bc <<<'obase=10;ibase=16;a=0.FD000000; scale(a);a;a*100'
8
.98828125
253.00000000

Add zeros

The math is straightforward (for hex numbers):

Count the number of hex digits (h) after the dot.
Multiply h by 4.
Add h×4 - h = h × (4-1) = h × 3 = 3×h zeros.

In shell code (for sh):

a=F423F.FD
h=${a##*.}
h=${#h}
a=$a$(printf '%0*d' $((3*h)) 0)
echo "$a"

echo "obase=16;ibase=16;$a*100" | bc

echo "20 k 16 d i o $a 100 * p" | dc

Which will print (correctly both in dc and bc):

$  sh ./script
F423F.FD000000
F423FFD.0000000
F423FFD.0000000

Internally, bc (or dc) could make the number of digits required match the number calculated above (3*h) to convert hex floats to the internal decimal representation. Or some other function for other bases (assuming the number of digits is finite in relation to base 10 (internal of bc and dc) in such other base). Like 2ⁱ (2,4,8,16,...) and 5,10.

posix

The posix specification states that (for bc, which dc is based on):

Internal computations shall be conducted as if in decimal, regardless of the input and output bases, to the specified number of decimal digits.

But "… the specified number of decimal digits." could be understood to be " … the needed number of decimal digits to represent the numeric constant" (as described above) without affecting the "decimal internal computations"

Because:

bc <<<'scale=50;obase=16;ibase=16; a=0.FD; a+1'
1.FA

bc is not really using 50 ("the specified number of decimal digits") as set above.

Only if divided it is converted (still incorrectly as it uses an scale of 2 to read the constant 0.FD before expanding it to 50 digits):

$ bc <<<'scale=50;obase=16;ibase=16; a=0.FD/1; a'
.FAE147AE147AE147AE147AE147AE147AE147AE147A

However, this is exact:

$ bc <<<'scale=50;obase=16;ibase=16; a=0.FD000000/1; a'
.FD0000000000000000000000000000000000000000

Again, reading numeric strings (constants) should use the correct number of bits.

Math proof

In two steps:

A binary fraction can be written as a/2ⁿ

A binary fraction is a finite sum of negative powers of two.

For example:

= 0.00110101101 = 
= 0. 0     0      1     1      0      1     0      1      1     0       1

= 0 + 0×2^-1 + 0×2^-2 + 1×2^-3 + 1×2^-4 + 0×2^-5 + 1×2^-6 + 0×2^-7 + 1×2^-8 + 1×2^-9 + 0×2^-10 + 1×2^-11

= 2^-3 + 2^-4 + 2^-6 + 2^-8 + 2^-9 + 2^-11 = (with zeros removed)

In a binary fraction of n bits, the last bit has a value of 2^-n, or 1/2ⁿ. In this example: 2^-11 or 1/2¹¹.

= 1/2³ + 1/2⁴ + 1/2⁶ + 1/2⁸ + 1/2⁹ + 1/2¹¹ = (with inverse)

In general, the denominator could become 2ⁿ with a positive numerator exponent of two. All terms can then be combined into a single value a/2ⁿ. For this example:

= 2⁸/2¹¹ + 2⁷/2¹¹ + 2⁵/2¹¹ + 2³/2¹¹ + 2²/2¹¹ + 1/2¹¹ = (expressed with 2¹¹)

= (2⁸ + 2⁷ + 2⁵ + 2³ + 2² + 1 ) / 2¹¹ = (extracting common factor)

= (256 + 128 + 32 + 8 + 4 + 1) / 2¹¹ = (converted to value)

= 429 / 2¹¹

Every Binary Fraction Can Be Expressed As b/10ⁿ

Multiply a/2ⁿ by 5ⁿ /5ⁿ, getting (a×5ⁿ)/(2ⁿ×5ⁿ) = (a×5ⁿ)/10ⁿ = b/10ⁿ, where b = a×5ⁿ. It has n digits.

For the example, we have:

(429·5¹¹)/10¹¹ = 20947265625 / 10¹¹ = 0.20947265625

It has been shown that every binary fraction is a decimal fraction with the same number of digits.