Does shuffle() produce uniform result ?

Mark Dickinson dickinsm at gmail.com
Fri Aug 24 21:30:14 EDT 2007


On Aug 24, 8:54 am, Hrvoje Niksic <hnik... at xemacs.org> wrote:
> tooru honda <tooru_ho... at fast-mail.org> writes:
> > I have read the source code of the built-in random module,
> > random.py.  After also reading Wiki article on Knuth Shuffle
> > algorithm, I wonder if the shuffle method implemented in random.py
> > produces results with modulo bias.
>
> It doesn't have modulo bias because it doesn't use modulo to produce a
> random index; it multiplies the floating point value with the desired
> range.  I'm not sure if that method produces any measurable bias.

It produces exactly the same level of bias as the 'modulo bias'
obtained by reducing a random integer in [0, 2**53). For example,
suppose we're trying to produce an integer x in the range 0 through 6
inclusive.  If n is a random variable whose values are uniformly
distributed across range(0, 2**53) then:

x = n % 7

will produce 0, 1, 2 and 3 with probability (2**53//7+1)/2**53, and 4,
5 and 6 with probability (2**53//7)/2**53, while

x = floor((n/2**53)*7)

will produce 0, 1, 3 and 5 with probability (2**53//7+1)/2**53, and 2,
4 and 6 with probability (2**53//7)/2*53.

Either way, you'd have a very hard time detecting such a tiny bias.
At the other end of the scale, if you're trying to produce a value in
[0, 2**53-2] (for example) then it looks worse:  with either method,
one of the values occurs exactly twice as often as all of the others.
But since there are now so many values, you'd again have problems
detecting any bias.

Steven Holden wrote:
> Frankly I don't think you need to worry.

What he said.

Mark




More information about the Python-list mailing list