Testing random

Steven D'Aprano steve at pearwood.info
Sun Jun 7 08:52:51 EDT 2015


On Sun, 7 Jun 2015 04:27 pm, Cecil Westerhof wrote:

> I wrote a very simple function to test random:
>     def test_random(length, multiplier = 10000):
>         number_list = length * [0]
>         for i in range(length * multiplier):
>             number_list[random.randint(0, length - 1)] += 1
>         minimum = min(number_list)
>         maximum = max(number_list)
>         return (minimum, maximum, minimum / maximum)

Putting aside the timing aspects, your frequency calculations are not done
in a very Pythonic manner. A better way might be:


from collections import Counter
from random import randint

def test_random(length, multiplier = 10000):
    freq = Counter(
        randint(0, length - 1) for i in range(length * multiplier)
        )
    minimum = min(freq[i] for i in range(length))
    maximum = max(freq.values())
    return (minimum, maximum, minimum / maximum)


Note carefully the way I calculate the minimum value. That way, if by chance
some number has a frequency of zero, the minimum returned will be zero. For
the maximum, that doesn't matter, since any missing numbers will not be the
most frequent number.

Better than returning the min and max, you should return the counter object
itself. Before doing so, I make sure that any missing numbers are given a
value of zero.


def test_random(size, samples=1000000):
    """Return a sample of `samples` random numbers from 0 to size-1."""
    freq = Counter(randint(0, size-1) for i in range(samples))
    # Avoid missing numbers. (Likely for small samples.)
    freq.subtract(dict.fromkeys(range(size), 0))
    return freq


Now once you have the frequencies, you can look at many interesting
statistics, and verify that you have a valid sample.

py> f = test_random(100)
py> sum(f.values()) == 1000000  # Total number of samples is right?
True
py> sorted(f) == list(range(100))  # Any missing numbers?
True
py> max(f) - min(f)  # statistical range
99
py> min(f.values())  # Lowest frequency?
9775
py> max(f.values())  # And the highest?
10313

The ratio of those two frequencies isn't very interesting, but we can
calculate it:

py> min(f.values())/max(f.values())
0.9478328323475226

For a uniformly distributed population, that ratio will be 1.0 exactly, but
for any finite sample, it could be any number between 0 and 1. We expect
that if we have a large random sample, the ratio should approach 1, but
moderate deviations from that are expected. In fact, we should be surprised
if the ratio is exactly 1, and suspect shenanigans if it is.

We can extract the mode, and its frequency:

py> f.most_common(1)  # mode
[(85, 10313)]

In Python 3.4 or better, we can look at some more statistics:

py> import statistics
py> statistics.median(f.elements())
50.0
py> statistics.mean(f.elements())
49.531949
py> statistics.stdev(f.elements())
28.873178122988367

And verify that the mode is correct:

py> statistics.mode(f.elements())
85

We can see what happens if the data is fiddled with:

py> g = f.copy()
py> g[42] += g[23]
py> g[23] = 0
py> statistics.median(g.elements())
50.0
py> statistics.mean(g.elements())
49.7263
py> statistics.stdev(g.elements())
28.757647388343212
py> g.most_common(3)
[(42, 20258), (85, 10313), (56, 10208)]

The median is unchanged, the mean shifts slightly higher, and the standard
deviation increases. But as you can see, these are not particularly
powerful tests of randomness: only the mode shows an obvious deviation from
uniformity.


> But what are the ranges you
> would expect with a good random function. Then it could be used to
> test a random function.


Python's pseudo-random number generator is based on the Mersenne Twister.
This is one of the most high-quality and extensively tested PRNGs
available, with a period of 2**19937-1 before it repeats. Its randomness
properties are very good indeed. (Although of course that isn't to say that
there aren't bugs in the Python implementation.)

Mersenne Twister is is not suitable for cryptographic work, but apart from
that, it is pretty much as indistinguishable from random as any
deterministic computer program can be.



-- 
Steven




More information about the Python-list mailing list