[Tutor] A question on randomness

Paul Sidorsky paulsid@shaw.ca
Thu, 11 Jul 2002 14:19:39 -0600


Andrea Valle wrote:

> But it seems to me that the code works as it often produces looping patterns
> (at least, they seem so to me): if the range is  0, 5, something like
> 12312354542222 etc.
> I know pseudo-random generation is a complex matter. Is there a way to
> "improve" it? (I mean, to obtain a "good sequence" like 1432543122140133
> etc.)

I'm not an expert by any means but I have taken some University-level
statistics, and to be quite honest I can't see any difference in those
two examples.  

>From what I can tell, you appear to be asking for avoiding long runs
(like the 4 twos in a row) and clusters of numbers (like 123123). 
However, there's nothing out of the ordinary about getting repeating
numbers or patterns of numbers.  If anything it probably means the
random number generator is doing its job.  You can't complain if you
know the dice are fair but you don't like the outcome.  :-)

Anything observed over the short term is virtually meaningless in
analysing an RNG.  Proper statistical analysis would require many, many
trials.  I'm pretty sure the Python people did all this stuff when they
first implemented the RNG, and I expect Python's generator is one of the
fairest available in a common language.

Anyhow, here are some totally unscientific tests:

Frequency-wise the RNG looks to be fair.  In a million trials the
distribution came out quite close:

>>> freqdict = {}
>>> for i in range(5):
	freqdict[i] = 0
>>> import random
>>> for i in range(1000000):
	freqdict[random.randrange(5)] += 1	
>>> freqdict
{0: 200112, 1: 200110, 2: 199790, 3: 199432, 4: 200556}

In a small sample there's nothing too out of the ordinary patternwise
that I can see.  There's only one run of 4.  The odds of any run of 4
are only 1 in 125.  Far far far more improbable things happen to all of
us every day and we don't even notice.  :-)

>>> for i in range(100):
	print random.randrange(5),

1 3 4 4 2 3 3 2 2 2 3 4 3 0 2 3 4 2 3 1 1 3 0 0 1 1 4 4 3 3 4 0 4 1 1 1
0 3 0 4 0 1 2 0 3 2 0 0 2 1 4 3 2 4 4 1 1 1 1 2 4 4 1 2 0 4 3 3 0 2 1 3
1 4 0 1 1 4 3 4 2 4 0 1 1 0 2 2 4 3 1 2 2 4 1 3 4 2 4 4

Notice that if I try to "prevent duplicates" by using a longer range, we
still get runs and clusters of numbers:

>>> for i in range(100):
	print random.randrange(1000000) % 5,
	
3 1 4 0 1 0 4 2 2 4 4 4 2 1 4 0 1 1 1 4 4 3 4 3 4 2 3 0 0 0 1 3 3 2 4 0
3 1 4 3 4 0 1 0 2 4 4 3 0 1 2 0 4 1 2 1 1 4 1 3 4 0 0 0 2 2 2 1 3 3 4 1
3 0 3 4 1 1 4 0 3 3 1 0 0 0 1 0 0 4 0 1 0 2 0 3 2 2 4 3

If you truly want to place a limit on a run length, you'll have to
manage it yourself, kind of like this:

_lastnum = -1;
_count = 0;

def myrandom():
    while 1:
        x = random.randrange(5)
        if x != _lastnum:
            _lastnum = x
            _count = 1
            break
        if _count < 2:
            _count += 1
            break

This (if it works; it's untested) will guarantee you won't get the same
number more than twice in a row.  However, I wouldn't be surprised if
this significantly affects the randomness of the generator.

I hope I've understood your problem accurately.  If not, feel free to
ask questions based on my examples.

-- 
======================================================================
Paul Sidorsky                                          Calgary, Canada
paulsid@shaw.ca                        http://members.shaw.ca/paulsid/