What is the rationale for the bash shell not warning you of arithmetic overflow etc.?

Question

There are limits set for the arithmetic evaluation capabilities of the bash shell. The manual is succinct about this aspect of shell arithmetic but states:

Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language.
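
The asymmetry the manual describes can be seen directly. What follows is a minimal sketch, assuming bash with 64-bit arithmetic; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes bash arithmetic is 64 bits wide (true on common Linux systems).
max=$(( 2**63 - 1 ))

# Overflow: no diagnostic, the value silently wraps around.
wrapped=$(( max + 1 ))
echo "max + 1 = $wrapped"

# Division by zero: bash prints an error and the (( )) command fails.
if (( 1 / 0 )) 2>/dev/null; then
  div_status="not trapped"
else
  div_status="trapped"
fi
echo "1 / 0 is $div_status"
```

The division is placed inside a `(( ))` command so that its failure shows up as an ordinary exit status that the script can test.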

Which fixed-width integer this refers to depends on the data type used (the specifics of why are beyond the scope of this question), but the limit values are expressed in /usr/include/limits.h in this fashion:

#  if __WORDSIZE == 64
#   define ULONG_MAX     18446744073709551615UL
#  ifdef __USE_ISOC99
#  define LLONG_MAX       9223372036854775807LL
#  define ULLONG_MAX    18446744073709551615ULL

And once you know that, you can confirm it like so:

# getconf -a | grep 'long'
LONG_BIT                           64
ULONG_MAX                          18446744073709551615

This is a 64-bit integer, and this translates directly to the shell in the context of arithmetic evaluation:

# echo $(((2**63)-1)); echo $((2**63)); echo $(((2**63)+1)); echo $((2**64))
9223372036854775807        //the practical usable limit for your everyday use
-9223372036854775808       //you're that much "away" from 2^64
-9223372036854775807     
0
# echo $((9223372036854775808+9223372036854775807))
-1

So between 2⁶³ and 2⁶⁴-1, you get negative integers showing you how far off from ULONG_MAX you are¹. When the evaluation reaches that limit and overflows, in whatever order that happens, you get no warning and that part of the evaluation is reset to 0, which may yield some unusual behavior with something like right-associative exponentiation, for instance:

echo $((6**6**6))                      0   // 6^46656 overflows to 0
echo $((6**6**6**6))                   1   // 6^(6^46656) = 6^0 = 1
echo $((6**6**6**6**6))                6   // 6^(6^(6^46656)) = 6^(6^0) = 6^1
echo $((6**6**6**6**6**6))         46656   // 6^(6^(6^(6^46656))) = 6^6
echo $((6**6**6**6**6**6**6))          0   // = 6^6^6^1 = 0
...
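
One way to see why the chain starts at 0, sketched under the assumption that bash keeps only the low 64 bits of each result: 6^n contains the factor 2^n, so once n reaches 64 every retained bit is zero, and 46656 is far past that point.

```shell
#!/usr/bin/env bash
# Assumes 64-bit wrapping arithmetic, as in the examples above.
p63=$(( 6**63 ))       # 2^63 * 3^63: only the top (sign) bit survives
p64=$(( 6**64 ))       # divisible by 2^64: every retained bit is zero
pbig=$(( 6**46656 ))   # same reason, with a much larger exponent
echo "6**63    = $p63"
echo "6**64    = $p64"
echo "6**46656 = $pbig"
```

The zero is therefore an ordinary truncated result, and the later values in the chain (1, 6, 46656) are simply 6 raised to whatever the previous step produced.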

Using sh -c 'command' doesn't change anything, so I have to assume this is normal and compliant output. Now that I think I have a basic but concrete understanding of the arithmetic range and limit, and of what it means in the shell for expression evaluation, I thought I could quickly peek at what data types other software on Linux uses. I used some bash sources I had to complement the input of this command:

{ shopt -s globstar; for i in /path/to/source_bash-4.2/include/**/*.h /usr/include/**/*.h; do grep -HE '\b(([UL])|(UL)|())LONG|\bFLOAT|\bDOUBLE|\bINT' "$i"; done; } | grep -iE 'bash.*max'

bash-4.2/include/typemax.h:#    define LLONG_MAX   TYPE_MAXIMUM(long long int)
bash-4.2/include/typemax.h:#    define ULLONG_MAX  TYPE_MAXIMUM(unsigned long long int)
bash-4.2/include/typemax.h:#    define INT_MAX     TYPE_MAXIMUM(int)

There's more output, including the #if statements, and I could search for a command like awk too. I notice the regular expression I used doesn't catch anything about the arbitrary-precision tools I have, such as bc and dc.

Questions

What is the rationale for not warning you (like awk does when evaluating 2^1024) when your arithmetic evaluation overflows? Why are the negative integers between 2⁶³ and 2⁶⁴-1 exposed to the end user when they're evaluating something?
I have read somewhere that some flavors of UNIX can interactively change ULONG_MAX. Has anyone heard of this?
If someone arbitrarily changes the value of the unsigned integer maximum in limits.h, then recompiles bash, what can we expect will happen?

Note

1. I wanted to illustrate more clearly what I saw, as it is very simple empirical stuff. What I noticed is that:

(a) Any evaluation that gives <= 2^63-1 is correct.
(b) Any evaluation that gives >= 2^63 up to 2^64 gives a negative integer:
- The range of that integer is x to y, where x = -9223372036854775808 and y = 0.

Considering this, an evaluation like (b) can be expressed as 2^63-1 plus something within x..y. For instance, if we're literally asked to evaluate (2^63-1) + 100 002 (it could be any number smaller than in (a)), we get -9223372036854675807. I'm just stating the obvious, I guess, but this also means that the two following expressions:

(2^63-1) + 100 002, and
(2^63-1) + (LLONG_MAX - {what the shell gives us for (2^63-1) + 100 002, which is -9223372036854675807}), or, using positive values:
- (2^63-1) + (9223372036854775807 - 9223372036854675807 = 100 000)
- = 9223372036854775807 + 100 000

are very close indeed. The second expression is "2" apart from (2^63-1) + 100 002, i.e. what we're evaluating. This is what I mean by "you get negative integers showing you how far off from 2^64 you are". With those negative integers and knowledge of the limits, you cannot finish the evaluation within the x..y range in the bash shell, but you can elsewhere: the data is usable up to 2^64 in that sense (I could add it up on paper or use it in bc). Beyond that, however, the behavior is similar to that of 6^6^6, as the limit is reached as described in the Q...

My guess is that the rationale boils down to "the shell is not the right tool for math". It's not designed for it and does not attempt to deal with it gracefully as you show. Hell, most shells don't even deal with floats! — terdon, Feb 27 '14 at 16:06
@terdon Although the way the shell deals with numbers in this case is exactly the same as every high level language I've ever heard of. Integer types are a fixed size and can overflow. — goldilocks, Feb 27 '14 at 16:54
@terdon Indeed, as I researched this since the 6^6^6 timing Q I came to realize that. I also guessed the reason why I couldn't find much content was because this had to do with C, or even C99. As I'm neither a developer nor an IT person, I have to come to terms with all the knowledge which backgrounds these assumptions. Surely someone who requires arbitrary precision knows about data type but obviously I'm not that person :) (but I did notice awk's behavior @ 2^53+1 i.e. float double; just is precision and internal vs. printing etc. is beyond me!). — , Feb 28 '14 at 00:49
If you want to work with big numbers in the shell, use bc, e.g.: num=$(echo 6^6^6 | bc). Unfortunately, bc puts in line breaks, so you have to num=$(echo $num | sed 's/\\\s//g') afterward; if you do it in a pipe, there are actual newline characters, which are awkward with sed, although num=$(echo 6^6^3 | bc | perl -pne 's/\\\s//g') works. In either case you now have an integer which can be used, e.g., num2=$(echo "$num * 2" | bc). — goldilocks, Feb 28 '14 at 14:42
...Someone here pointed out you can disable this line break feature of bc by setting BC_LINE_LENGTH=0. — goldilocks, Feb 28 '14 at 14:52
@goldilocks I thought those breaks were just "visual presentation" in my terminal emulator loll. After some hands on playing around, I opted for a GMP wrapper for my everyday math in the shell - also happens to not have those line breaks by default :) — , Feb 28 '14 at 20:18
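
For reference, the bc approach from the comments above can be sketched as follows. This assumes bc is installed; `big` is a hypothetical helper name, and stripping backslashes and newlines with tr is just one way to join the continuation lines that bc emits for long results.

```shell
#!/usr/bin/env bash
# big EXPR: evaluate EXPR with bc and join any continuation lines.
# Hypothetical helper; requires bc to be installed.
big() {
  echo "$1" | bc | tr -d '\\\n'
}

if command -v bc >/dev/null 2>&1; then
  echo "bash: 2**64 = $(( 2**64 ))"    # wraps around in 64-bit arithmetic
  echo "bc:   2^64  = $(big '2^64')"   # exact, arbitrary precision
  echo "bc:   6^6^3 = $(big '6^6^3')"  # long result, joined onto one line
fi
```

The result is a plain string of digits, so it can be fed back into bc for further work but not into `$(( ))` once it exceeds the 64-bit range.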

goldilocks · Accepted Answer · 2014-02-27T23:20:56.480

So between 2^63 and 2^64-1, you get negative integers showing you how far off from ULONG_MAX you are.

No. How do you figure that? By your own example, the max is:

> max=$((2**63 - 1)); echo $max
9223372036854775807

If "overflow" meant "you get negative integers showing you how far off from ULONG_MAX you are", then if we add one to that, shouldn't we get -1? But instead:

> echo $(($max + 1))
-9223372036854775808

Perhaps you mean this is a number you can add to $max to get a negative difference, since:

> echo $(($max + 1 + $max))
-1

But this does not in fact continue to hold true:

> echo $(($max + 2 + $max))
0

This is because the system uses two's complement to implement signed integers.¹ The value resulting from an overflow is NOT an attempt to provide you with a difference, a negative difference, etc. It is literally the result of truncating a value to a limited number of bits, then having it interpreted as a two's complement signed integer. For example, the reason $(($max + 1 + $max)) comes out as -1 is because the highest value in two's complement is all bits set except the highest bit (which indicates negative); adding these together basically means carrying all the bits to the left so you end up with (if the size were 16-bits, and not 64):

11111111 11111110

The high (sign) bit is now set because it carried over in the addition. If you add one more (00000000 00000001) to that, you then have all bits set, which in two's complement is -1.
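
The 16-bit illustration can be replayed in the shell by truncating by hand. This is a sketch only: `to_int16` is a hypothetical helper that keeps the low 16 bits of a value and then reads them as a two's complement number.

```shell
#!/usr/bin/env bash
# to_int16 N: keep the low 16 bits of N, then read them as two's complement.
# Hypothetical helper for illustration.
to_int16() {
  local v=$(( $1 & 0xFFFF ))                # truncate to 16 bits
  (( v >= 0x8000 )) && (( v -= 0x10000 ))   # sign bit set: value is negative
  echo "$v"
}

max16=$(( 0x7FFF ))                  # 32767: all bits set except the sign bit
to_int16 $(( max16 + 1 ))            # -32768
to_int16 $(( max16 + max16 ))        # -2  (bits 11111111 11111110)
to_int16 $(( max16 + 1 + max16 ))    # -1  (all bits set)
to_int16 $(( max16 + 2 + max16 ))    # 0   (the carry out of the top bit is dropped)
```

The four results mirror the 64-bit ones shown earlier, only with a narrower word.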

I think that partially answers the second half of your first question -- "Why are the negative integers...exposed to the end user?". First, because that is the correct value according to the rules of 64-bit two's complement numbers. This is the conventional practice of most (other) general purpose high level programming languages (I cannot think of one that does not do this), so bash is adhering to convention. Which is also the answer to the first part of the first question -- "What's the rationale?": this is the norm in the specification of programming languages.
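
Since bash follows this convention and stays silent, a script that cares has to check before it adds. A minimal sketch, assuming 64-bit arithmetic; `checked_add` is a hypothetical helper, not a bash builtin.

```shell
#!/usr/bin/env bash
# checked_add A B: print A+B, or fail if the sum would not fit in 64 bits.
checked_add() {
  local a=$1 b=$2
  local max=$(( ~(1 << 63) ))   # largest signed 64-bit value
  local min=$(( 1 << 63 ))      # smallest signed 64-bit value
  # Compare against the remaining headroom instead of computing a+b first.
  if (( (b > 0 && a > max - b) || (b < 0 && a < min - b) )); then
    echo "checked_add: $a + $b overflows" >&2
    return 1
  fi
  echo $(( a + b ))
}

checked_add 40 2
checked_add 9223372036854775807 1 || echo "refused"
```

The test is done on `max - b` and `min - b`, which cannot themselves overflow for the sign of `b` being checked.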

WRT the 2nd question, I have not heard of systems which interactively change ULONG_MAX.

If someone arbitrarily changes the value of the unsigned integer maximum in limits.h, then recompiles bash, what can we expect will happen?

It would not make any difference to how the arithmetic comes out, because this is not an arbitrary value that is used to configure the system -- it's a convenience value that stores an immutable constant reflecting the hardware. By analogy, you could redefine c to be 55 mph, but the speed of light will still be 186,000 miles per second. c is not a number used to configure the universe -- it's a deduction about the nature of the universe.

ULONG_MAX is exactly the same. It is deduced/calculated based on the nature of N-bit numbers. Changing it in limits.h would be a very bad idea if that constant is used somewhere assuming it is supposed to represent the reality of the system.

And you cannot change the reality imposed by your hardware.
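
To illustrate that these limits follow from the bit width rather than from a header file, they can be recomputed in the shell itself. A sketch, assuming 64-bit bash arithmetic and bash's builtin printf; the variable names are illustrative.

```shell
#!/usr/bin/env bash
bits=64                              # assumed width of bash integers
smin=$(( 1 << (bits - 1) ))          # only the sign bit set
smax=$(( ~smin ))                    # every bit except the sign bit
umax=$(printf '%u' "$(( ~0 ))")      # all bits set, read as unsigned
echo "signed min:   $smin"
echo "signed max:   $smax"
echo "unsigned max: $umax"
```

None of these lines reads limits.h, yet on a 64-bit system the values match the constants quoted in the question.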

1. I don't think that this (the means of integer representation) is actually guaranteed by bash, since it depends on the underlying C library and standard C does not guarantee that. However, this is what is used on most normal modern computers.

I'm very thankful! Coming to terms with the elephant in the room and thinking. Yes in the first part it's mostly about words. I have updated my Q to show what I had meant. I'll research why two's complement describes some of what I saw and your answer is invaluable in understanding that! As far as the UNIX Q is concerned I must have misread something about ARG_MAX with AIX here. Cheers! — , Feb 28 '14 at 06:16
In fact you can use two's complement to determine the value if you are sure you are in the range > 2 * $max, as you describe. My points are 1) that's not the purpose, 2) make sure you understand if you want to do that, 3) it's not very useful because of the very limited applicability, 4) as per the footnote it's not actually guaranteed that the system does use two's complement. In short, trying to exploit that in program code would be considered a very poor practice. There are "big number" libraries/modules (for shells under POSIX, bc) -- use those if you need to. — goldilocks, Feb 28 '14 at 13:00
It's only recently I watched something which leveraged the two's complement to implement an ALU with a 4-bit binary adder with fast carry IC; there was even a comparison with one's complement (to see how off it was). Your explanation was instrumental in me being able to name and connect what I saw here with what was discussed in those videos, increasing the chance I might really grasp all the implications down the line once it all sinks in. Thanks again for that! Cheers! — , Nov 23 '18 at 20:20
