Implementaion of random.shuffle

Josiah Carlson josiah.carlson at sbcglobal.net
Tue Jul 17 12:13:25 EDT 2007


shabda raaj wrote:
> The code for shuffle is
> 
>          if random is None:
>              random = self.random
>          for i in reversed(xrange(1, len(x))):
>              # pick an element in x[:i+1] with which to exchange x[i]
>              j = int(random() * (i+1))
>              x[i], x[j] = x[j], x[i]
> 
> With
>              j = int(random() * (i+1))
> being the only call to random().
> As long as (i + 1) is not large enough so that j is generated with
> equal probability in range [0, i] we would get all permutations with
> equal probability. With MT with period of 2**19937 (i+1) would have to
> be really large before this the case.

Be careful of assuming too much about permutations and randomness.  In 
particular, you are assuming too much about what mersenne twister 
guarantees, what that implies regarding random permutations, and what a 
random permutation really is.

In particular, if we merely enumerate all possible permutations of n 
elements, there are n! total permutations.  Now, because the generation 
of a permutation *is fundamentally indistinguishable from choosing a 
number in the range [0...n!)*, then we can rely on the knowledge and 
research regarding the generation of such numbers.  In particular, 
thanks to birthday collisions, you only need to generate O(sqrt(n!)) 
random permutations to have a 50% chance of having come upon a duplicate 
permutation.  For example:

 >>> a = range(10)
 >>> b = {}
 >>> import random
 >>> while 1:
...     x = a[:]
...     random.shuffle(x)
...     x = tuple(x)
...     if x not in b:
...             b[x] = None
...     else:
...             break
...
 >>> len(b)
3605
 >>> x
(4, 3, 9, 8, 0, 2, 1, 7, 6, 5)
 >>> x in b
True
 >>>

I only generated 3605 permutations of 10 items before I got a collision. 
  According to your claims, I should have gone through all 3628800 
possible permutations before that happened.  Note that I could repeat 
this exercise with any decent pseudo random number generator, or actual 
random information generated from any good random source.  As long as 
the PRNG doesn't suck, it should offer some variant of the above.


> Anyway, is there some test harness we can run to test the robustness
> of shuffle? We can run that test harness for large values and see at
> what point all permutations are not possible or come with unequal
> probability.

Shuffle works as intended.  Your assumptions about it's functionality 
are incorrect.  Testing will only prove that your assumptions are wrong.


  - Josiah



More information about the Python-list mailing list