Testing random

Sun Jun 7 11:51:20 EDT 2015

Chris Angelico wrote:

> On Sun, Jun 7, 2015 at 8:40 PM, Thomas 'PointedEars' Lahn
> <PointedEars at web.de> wrote:
>> Cecil Westerhof wrote:
>>> I wrote a very simple function to test random:
>>>     def test_random(length, multiplier = 10000):
>>>         number_list = length * [0]
>>>         for i in range(length * multiplier):
>>>             number_list[random.randint(0, length - 1)] += 1
>>>         minimum = min(number_list)
>>>         maximum = max(number_list)
>>>         return (minimum, maximum, minimum / maximum)
>>
>> As there is no guarantee that every number will occur randomly, using a
>> dictionary at first should be more efficient than a list:
> 
> Hmm, I'm not sure that's actually so. His code is aiming to get
> 'multiplier' values in each box; for any serious multiplier (he starts
> with 10 in the main code), you can be fairly confident that every
> number will come up at least once. 

The wording shows a common misconception: that random distribution would 
mean that it is guaranteed or more probable that every element of the set 
will occur at least once.  It is another common misconception that 
increasing the number of trials would increase the probability of that 
happening.  But that is not so.

The law of large numbers only says that as you increase the number of 
trials, that the relative frequency *approaches* the probability for each 
value of the probability variable, or IOW “the average of the results 
obtained from a large number of trials should be close to the expected 
value, and will *tend to become closer* as more trials are performed.” 
(<http://en.wikipedia.org/wiki/Law_of_large_numbers>; emphasis by me)

> […] a true RNG could legitimately produce nothing but 7s for the entire 
> run, it's just extremely unlikely.

That reasoning is precisely a result of the misconception described above.  
Because people think that every value must occur, they do not think it 
possible that (much) repetition could occur with a (pseudo-)random 
generator, and when they want to mince words they say “(extremely) unlikely” 
instead.  For example, when people see

  6 5 8 7 9 3 7 8 4 7 5 6 8 8 1 2 8 3 5 7 5 4 1 2 4 8 8 7 5 1

and are told that this is random sequence, they find it hard to believe.  
They think something like: “Look at all those repeated 8, and “7, 5” 
occurs twice.  4 occurs more often than 2, and there are much more 5s than 
1s.  That cannot be possibly be random!”

Yes, it *can*.  I have just produced it with

#------------------------------------------------------------------
import random
print(" ".join([str(random.randint(1, 9)) for i in range(0, 30)]))'
#------------------------------------------------------------------

But if you think this *Mersenne Twister* PRNG which “generates numbers with 
nearly uniform distribution and a large period” is flawed, use a proper die 
as RNG, and a sheet of paper to record the outcomes, and do the experiment.  
The outcome is not going to be different.

If the distribution is even and the game is fair, every outcome *always* has 
the same probability of occurring.  As a result, every sequence of outcomes 
of equal length *always* has the same probability of occurring, and the 
probability for a particular sequence of equal length does _not_ increase or 
decrease based on the number of occurrences of previous outcomes.  Those are 
*independent* events.

See also: 
<http://www.teacherlink.org/content/math/interactive/probability/numbersense/misconceptions/home.html>, 
in particular the section “Representativeness”

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.