To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
In this formula, $RANDOM is a random value in the range 0-32767. From the bash manual:
RANDOM Each time this parameter is referenced, a random integer between 0 and 32767 is generated.
It helps to visualize how this maps to possible values (result = range of $RANDOM values that produce it):
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
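So in your formula, 0, 1, and 2 are each twice as likely as 4 or 5, because each of them is covered by two full ranges of 3600 values while 4 and 5 get only one. 3 is also slightly more likely than 4 or 5, because it picks up an extra partial range of 368 values (32400-32767). Hence your result with 0, 1, 2 as winners and 4, 5 as losers.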
When changing to 9*3600, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
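Since $RANDOM has only 32768 possible values, both mappings above can be checked exhaustively instead of by sampling. A minimal sketch, assuming bash; `tally` is a hypothetical helper name, not part of your script:

```bash
#!/usr/bin/env bash
# tally MAX DIV: for every possible $RANDOM value v (0..32767),
# count how often the formula v % MAX / DIV yields each result.
# (Hypothetical helper, for illustration only.)
tally() {
    local max=$1 div=$2 v r
    local -a counts=()
    for ((v = 0; v < 32768; v++)); do
        r=$(( v % max / div ))
        counts[r]=$(( ${counts[r]:-0} + 1 ))
    done
    # print "result count", one pair per line
    for r in "${!counts[@]}"; do
        echo "$r ${counts[r]}"
    done
}

tally $((6*3600)) 3600
echo "---"
tally $((9*3600)) 3600
```

For 6*3600 this gives 7200 hits each for 0, 1, 2, then 3968 for 3, and 3600 each for 4 and 5. For 9*3600 it gives 3968 for 0 and 3600 for each of 1-8.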
To fix the modulo bias, you should first simplify the formula: if you only want 0-5, then the modulus is 6, not 3600 or some even larger number. This simplification alone reduces your bias by a lot, because only the last two values are surplus (32766 maps to 0 and 32767 maps to 1), giving a tiny bias to those two numbers.
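How small the remaining bias is with a plain modulus of 6 can be worked out directly. A short sketch of the arithmetic (the variable names are only illustrative):

```bash
range=32768                   # number of possible $RANDOM values (0..32767)
max=6
cycles=$(( range / max ))     # complete passes through 0..5
leftover=$(( range % max ))   # surplus values that start one extra, partial pass
echo "each result gets at least $cycles hits; the first $leftover results get one more"
```

With these numbers, 0 and 1 are each produced by 5462 values and 2-5 by 5461 values.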
To get rid of the bias altogether, you need to re-roll, for example whenever $RANDOM is lower than 32768 % 6 (which is 2, so the values 0 and 1 are discarded). This eliminates the states that do not map evenly onto the available random range.
max=6
for f in {1..100000}
do
    r=$RANDOM
    # re-roll while r is one of the (32768 % max) surplus values, here 0 and 1
    while [ "$r" -lt $((32768 % max)) ]; do r=$RANDOM; done
    echo $((r % max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
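The re-roll loop above could also be wrapped in a reusable function. This is only a sketch under a few assumptions: `rand_below` is a made-up name, it requires bash, and the argument must be between 1 and 32768, because $RANDOM cannot produce more distinct values than that.

```bash
# rand_below N: print an integer in 0..N-1 without modulo bias.
# Hypothetical helper; assumes 1 <= N <= 32768.
rand_below() {
    local n=$1 r
    local skip=$(( 32768 % n ))   # number of surplus values to reject
    r=$RANDOM
    # After rejecting 0..skip-1, the remaining 32768-skip values
    # form a contiguous run whose length is a multiple of n.
    while (( r < skip )); do
        r=$RANDOM
    done
    echo $(( r % n ))
}

rand_below 6
```

For max=6 only 2 of the 32768 values are rejected, so the loop almost never runs more than once.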
The alternative would be to use a different random source that does not have noticeable bias (one with orders of magnitude more than just 32768 possible values). But implementing re-roll logic anyway doesn't hurt (even if it will likely never trigger).
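As one possible illustration of a larger source, 32-bit values could be read from /dev/urandom with od and still be combined with the same re-roll idea. This sketch assumes that /dev/urandom and od are available on your system; `rand32` and `rand_below_big` are hypothetical names.

```bash
# rand32: print one unsigned 32-bit integer read from /dev/urandom.
# Assumes /dev/urandom exists and od supports -An -N4 -tu4.
rand32() {
    od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
}

# rand_below_big N: unbiased integer in 0..N-1, for 1 <= N <= 4294967296.
rand_below_big() {
    local n=$1 r
    local range=4294967296          # 2^32 possible values
    local skip=$(( range % n ))     # surplus values to reject
    r=$(rand32)
    while (( r < skip )); do
        r=$(rand32)
    done
    echo $(( r % n ))
}

rand_below_big 6
```

With 2^32 possible values and max=6, only 4 values are surplus, so a re-roll is extremely rare, but keeping the loop costs nothing.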
$RANDOM variable does not use a good PRNG internally. – forest Jul 07 '19 at 01:19