There are limits set for the arithmetic evaluation capabilities of the bash
shell. The manual is succinct about this aspect of shell arithmetic but states:
Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language.
Which fixed-width integer this refers to comes down to which data type is used (the specifics of why are beyond the scope of this question), but the limit values are expressed in /usr/include/limits.h in this fashion:
# if __WORDSIZE == 64
# define ULONG_MAX 18446744073709551615UL
# ifdef __USE_ISOC99
# define LLONG_MAX 9223372036854775807LL
# define ULLONG_MAX 18446744073709551615ULL
Once you know that, you can confirm it like so:
# getconf -a | grep 'long'
LONG_BIT 64
ULONG_MAX 18446744073709551615
This is a 64-bit integer, and this translates directly into the shell in the context of arithmetic evaluation:
# echo $(((2**63)-1)); echo $((2**63)); echo $(((2**63)+1)); echo $((2**64))
9223372036854775807 //the practical usable limit for your everyday use
-9223372036854775808 //you're that much "away" from 2^64
-9223372036854775807
0
# echo $((9223372036854775808+9223372036854775807))
-1
So between 2^63 and 2^64-1, you get negative integers showing you how far off from ULONG_MAX you are (see Note 1). When the evaluation reaches that limit and overflows, in whatever order that happens, you get no warning and that part of the evaluation can end up as 0, which may yield some unusual behavior with something like right-associative exponentiation, for instance:
echo $((6**6**6)) 0 // 6^46656 overflows to 0
echo $((6**6**6**6)) 1 // 6^(6^46656) = 6^0 = 1
echo $((6**6**6**6**6)) 6 // 6^(6^(6^46656)) = 6^(6^0) = 6^1
echo $((6**6**6**6**6**6)) 46656 // 6^(6^(6^(6^46656))) = 6^6
echo $((6**6**6**6**6**6**6)) 0 // = 6^6^6^1 = 0
...
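One hedged way to read the zeros above: if the arithmetic wraps modulo 2^64 (which is what the earlier outputs suggest on this 64-bit build; C itself leaves signed overflow undefined, so this is an assumption about the platform), then any power of 6 with an exponent of 64 or more is a multiple of 2^64 and collapses to 0, because 6^n = 2^n * 3^n. The sketch below probes that boundary; the variable names are illustrative only.

```bash
# Assumes a bash whose arithmetic is 64-bit and wraps on overflow.
below=$((6**63))     # 2^63 * 3^63: only the top (sign) bit survives the wrap
at=$((6**64))        # 2^64 * 3^64: every bit is shifted out
nested=$((6**6**6))  # right-associative: 6**(6**6) = 6**46656

echo "6**63   = $below"   # -9223372036854775808 under the assumption above
echo "6**64   = $at"      # 0
echo "6**6**6 = $nested"  # 0
```

Under that reading, nothing is being "reset": the true result is simply a multiple of 2^64, so its low 64 bits are all zero.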
Using sh -c 'command' doesn't change anything, so I have to assume this is normal and compliant output. Now that I think I have a basic but concrete understanding of the arithmetic range and limit, and of what it means in the shell for expression evaluation, I thought I could quickly peek at what data types the other software in Linux uses. I used some bash sources I had to complement the input of this command:
{ shopt -s globstar; for i in /path/to/source_bash-4.2/include/**/*.h /usr/include/**/*.h; do grep -HE '\b(([UL])|(UL)|())LONG|\bFLOAT|\bDOUBLE|\bINT' "$i"; done; } | grep -iE 'bash.*max'
bash-4.2/include/typemax.h:# define LLONG_MAX TYPE_MAXIMUM(long long int)
bash-4.2/include/typemax.h:# define ULLONG_MAX TYPE_MAXIMUM(unsigned long long int)
bash-4.2/include/typemax.h:# define INT_MAX TYPE_MAXIMUM(int)
There's more output with the if statements, and I can search for a command like awk too, etc. I notice the regular expression I used doesn't catch anything about the arbitrary-precision tools I have, such as bc and dc.
Questions
- What is the rationale for not warning you (like awk does when evaluating 2^1024) when your arithmetic evaluation overflows? Why are the negative integers between 2^63 and 2^64-1 exposed to the end user when he's evaluating something?
- I have read somewhere that some flavor of UNIX can interactively change ULONG_MAX? Has anyone heard of this?
- If someone arbitrarily changes the value of the unsigned integer maximum in limits.h, then recompiles bash, what can we expect will happen?
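Since the shell itself stays silent, a script that wants a warning has to test the operands before adding them. The following is a minimal sketch, assuming the 64-bit signed range shown earlier; `checked_add` is a hypothetical helper name, not a bash builtin, and it only covers addition.

```bash
# checked_add A B: print A+B, or return 1 if the sum would leave
# the signed 64-bit range (assumed here; not queried from the system).
checked_add() {
  local a=$1 b=$2
  local max=9223372036854775807
  local min=$(( -max - 1 ))
  # Compare against max - b / min - b so the check itself cannot overflow.
  if (( b > 0 && a > max - b )) || (( b < 0 && a < min - b )); then
    echo "checked_add: $a + $b overflows" >&2
    return 1
  fi
  echo $(( a + b ))
}

checked_add 2 3                                    # prints 5
checked_add 9223372036854775807 1 || echo refused  # warns on stderr, prints "refused"
```

Note that the operands must already be representable: a literal such as 9223372036854775808 has wrapped by the time the function sees it, as the earlier example that evaluates to -1 shows.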
Note
1. I wanted to illustrate more clearly what I saw, as it is very simple empirical stuff. What I noticed is that:
- (a) Any evaluation that gives <= 2^63-1 is correct.
- (b) Any evaluation that gives >= 2^63 up to 2^64 gives a negative integer:
  - The range of that integer is x to y, with x = -9223372036854775808 and y = 0.
Considering this, an evaluation like (b) can be expressed as 2^63-1 plus something within x..y. For instance, if we're asked to evaluate (2^63-1) + 100 002 (it could be any number smaller than in (a)), we get -9223372036854675807. I'm just stating the obvious, I guess, but this also means that the two following expressions:
- (2^63-1) + 100 002, and
- (2^63-1) + (LLONG_MAX - {what the shell gives us for (2^63-1) + 100 002, which is -9223372036854675807}); using positive values we have:
  - (2^63-1) + (9223372036854775807 - 9223372036854675807 = 100 000)
  - = 9223372036854775807 + 100 000
are very close indeed. The second expression is "2" apart from (2^63-1) + 100 002, i.e. what we're evaluating. This is what I mean by "you get negative integers showing you how far off from 2^64 you are". With those negative integers and knowledge of the limits, you cannot finish the evaluation within the x..y range in the bash shell, but you can elsewhere: the data is usable up to 2^64 in that sense (I could add it up on paper or use it in bc). Beyond that, however, the behavior is similar to that of 6^6^6 as the limit is reached, as described below in the Q...
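The bookkeeping in this note can be sketched in the shell itself. This assumes that bash's printf builtin reinterprets a negative argument to `%u` as an unsigned 64-bit value; that is the behavior I would expect on a typical build, but treat it as an assumption rather than a documented guarantee.

```bash
# Evaluate (2^63-1) + 100002, which wraps to a negative number.
wrapped=$(( 9223372036854775807 + 100002 ))

# Assumption: %u maps the negative value back into 2^63 .. 2^64-1.
unsigned=$(printf '%u' "$wrapped")

echo "wrapped:  $wrapped"    # -9223372036854675807
echo "unsigned: $unsigned"   # 9223372036854875809
```

The unsigned form is the sum the evaluation was aiming for, which is the sense in which the data stays usable up to 2^64.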
[...] bc, e.g.: num=$(echo 6^6^6 | bc). Unfortunately, bc puts in line breaks, so you have to run num=$(echo $num | sed 's/\\\s//g') afterward; if you do it in a pipe, there are actual newline characters, which are awkward with sed, although num=$(echo 6^6^3 | bc | perl -pne 's/\\\s//g') works. In either case you now have an integer which can be used, e.g., num2=$(echo "$num * 2" | bc). – goldilocks Feb 28 '14 at 14:42
[...] bc by setting BC_LINE_LENGTH=0. – goldilocks Feb 28 '14 at 14:52
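The workflow in these comments can be sketched as follows, assuming a bc binary is installed. As far as I know GNU bc honours BC_LINE_LENGTH=0, and the tr step is there to cover implementations that still wrap long numbers. `big_pow` is a hypothetical helper name.

```bash
# big_pow BASE EXP: print BASE^EXP on a single line using bc.
big_pow() {
  # tr deletes backslashes and newlines, i.e. bc's line continuations.
  echo "$1^$2" | BC_LINE_LENGTH=0 bc | tr -d '\\\n'
  echo   # restore the single trailing newline
}

n=$(big_pow 2 64)
echo "2^64     = $n"                       # 18446744073709551616
echo "2 * 2^64 = $(echo "$n * 2" | bc)"    # 36893488147419103232
```

The result is an ordinary string of digits, so it can be fed back to bc for further arithmetic but not to `$(( ))`, which would wrap it again.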