Testing random

Jonas Wielicki jonas at wielicki.name
Sun Jun 7 06:41:53 EDT 2015


On 07.06.2015 08:27, Cecil Westerhof wrote:
> I wrote a very simple function to test random:
>     import random
> 
>     def test_random(length, multiplier=10000):
>         number_list = length * [0]
>         for i in range(length * multiplier):
>             number_list[random.randint(0, length - 1)] += 1
>         minimum = min(number_list)
>         maximum = max(number_list)
>         return (minimum, maximum, minimum / maximum)
> 
> When running:
>     for i in range(1, 7):
>         print(test_random(100, 10 ** i))
> I get:
>     (3, 17, 0.17647058823529413)
>     (73, 127, 0.5748031496062992)
>     (915, 1086, 0.8425414364640884)
>     (9723, 10195, 0.9537027954879843)
>     (99348, 100620, 0.9873583780560524)
>     (997198, 1002496, 0.9947151908835546)
> 
> and when running:
>     for i in range(1, 7):
>         print(test_random(1000, 10 ** i))
> I get:
>     (2, 20, 0.1)
>     (69, 138, 0.5)
>     (908, 1098, 0.8269581056466302)
>     (9684, 10268, 0.9431242695753799)
>     (99046, 100979, 0.9808574059953059)
>     (996923, 1003598, 0.9933489305478886)
> 
> It shows that when the first parameter increases the deviation
> increases and when the second parameter increases the deviation
> decreases. Exactly what you would expect. But what ranges would you
> expect from a good random function?

That really depends on the number of samples. Apparently, a good RNG
would come close to 1.0 for the ratio as the sample count grows.
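How close can be estimated, at least roughly. Each bucket count is
approximately Binomial(length * multiplier, 1/length), and the largest
deviation among `length` buckets is on the order of
sigma * sqrt(2 * ln(length)) -- an extreme-value heuristic, not an exact
result. A sketch under those assumptions:

```python
import math

def estimated_ratio(length, multiplier):
    """Rough estimate of the min/max ratio for a uniform RNG.

    Each bucket count has mean `multiplier` and standard deviation
    sqrt(multiplier * (1 - 1/length)); the extreme deviation over
    `length` buckets is taken as sigma * sqrt(2 * ln(length)),
    a heuristic from extreme-value statistics.
    """
    mu = multiplier
    sigma = math.sqrt(multiplier * (1 - 1 / length))
    extreme = sigma * math.sqrt(2 * math.log(length))
    return (mu - extreme) / (mu + extreme)

# Compare with the observed values quoted above:
print(estimated_ratio(100, 10000))   # observed was 0.9537...
print(estimated_ratio(1000, 10000))  # observed was 0.9431...
```

For (100, 10000) this predicts about 0.94 against the observed 0.95, so
the heuristic lands in the right ballpark, which is all it is good for.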

> Then it could be used to test a random function.

This is an interesting test (more interesting to me than it looks at
first sight, and certainly better than what I had come up with), but
unfortunately, there is more to testing randomness.

The test clearly suggests that random functions should have a min/max
ratio of about 1.0. Think of a "random" function like this:

    def fake_random(minv, maxv, _cache={}):
        # Cycles through minv..maxv (inclusive), one value per call,
        # keeping a separate iterator per (minv, maxv) pair.
        try:
            return next(_cache[minv, maxv])
        except (StopIteration, KeyError):
            # First call for this pair, or the previous cycle is
            # exhausted: start a fresh cycle.
            iterator = iter(range(minv, maxv + 1))
            _cache[minv, maxv] = iterator
            return next(iterator)

(If that is hard to read, I agree. It returns the sequence from minv to
maxv (inclusive) over and over again for equal minv and maxv. Don’t
pass anything to _cache :), just call it like random.randint.)

This gives a "perfect" ratio of 1.0, but is not very random. This kind
of function would probably be ruled out by a compression or correlation
test. If you want to dig deeper into the algorithms for random testing,
the dieharder test suite [1] is probably worth a look.
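As a toy illustration of the compression idea (zlib as a crude stand-in
for a real test suite; the 10000-byte sample size and the thresholds are
ad hoc, not anything dieharder does):

```python
import os
import zlib

def compressibility(data):
    """Ratio of compressed to original size; ~1.0 means incompressible."""
    return len(zlib.compress(data, 9)) / len(data)

# The cycling fake_random output is pure repetition and compresses
# extremely well...
fake_bytes = bytes(range(100)) * 100   # 10000 bytes of 0..99, repeated
# ...while genuinely random bytes barely compress at all.
real_bytes = os.urandom(10000)

print(compressibility(fake_bytes))  # far below 1.0
print(compressibility(real_bytes))  # close to (or slightly above) 1.0
```

A low ratio does not prove non-randomness on its own, but a sequence a
general-purpose compressor can shrink this much clearly has structure.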

It all boils down to the use case. For some use cases, ``fake_random``
might be good enough (unit testing would be such a case: it is
certainly uniformly distributed and allows full coverage of the tested
range); for others it would fail catastrophically (don’t generate your
cryptographic keys with this!).

cheers,
Jonas

   [1]: http://www.phy.duke.edu/~rgb/General/dieharder.php

