Implementaion of random.shuffle

Tue Jul 17 02:44:05 EDT 2007

On Jul 16, 8:29 pm, Steve Holden <st... at holdenweb.com> wrote:
> Hrvoje Niksic wrote:
> > Steve Holden <st... at holdenweb.com> writes:
>
> >> So it would appear that the developers chose the Knuth algorithm
> >> (with a slight variation) for *their* implementation. Now you have
> >> to ask yourself whether your surmise is genuinely correct (in which
> >> case the documentation may contain a bug) or whether the
> >> documentation is indeed correct and you are in error.
>
> > That is a good question.  The random module uses the Mersenne twister,
> > which has a repetition period of 2**19937.  The number of n-sized
> > permutations of a list with n elements is n!, while each shuffle
> > requires n calls to the PRNG.  This means that to be able to generate
> > all permutations, the PRNG must have a period of at least n! * n.  In
> > the case of MT, it means that, regarding the period, you are safe for
> > lists with around 2079 elements.  shuffle's documentation may have
> > been written before the random module was converted to use the MT.
>
> > 2**19937 being a really huge number, it's impossible to exhaust the
> > Mersenne twister by running it in sequence.  However, there is also
> > the question of the spread of the first shuffle.  Ideally we'd want
> > any shuffle, including the first one, to be able to produce any of the
> > n! permutations.  To achieve that, the initial state of the PRNG must
> > be able to support at least n! different outcomes, which means that
> > the PRNG must be seeded by at least log2(n!) bits of randomness from
> > an outside source.  For reference, Linux's /dev/random stops blocking
> > when 64 bits of randomness are available from the entropy pool, which
> > means that, in the worst case, shuffling more than 20 elements cannot
> > represent all permutations in the first shuffle!
>
> Thanks for this thoughtful analysis. I believe we can therefore leave
> the documentation (which almost certainly *was* written before the
> adoption of MT) alone for now.
>
> regards
>   Steve
> --
> Steve Holden        +1 571 484 6266   +1 800 494 3119
> Holden Web LLC/Ltd          http://www.holdenweb.com
> Skype: holdenweb      http://del.icio.us/steve.holden
> --------------- Asciimercial ------------------
> Get on the web: Blog, lens and tag the Internet
> Many services currently offer free registration
> ----------- Thank You for Reading -------------

The code for shuffle is

         if random is None:
             random = self.random
         for i in reversed(xrange(1, len(x))):
             # pick an element in x[:i+1] with which to exchange x[i]
             j = int(random() * (i+1))
             x[i], x[j] = x[j], x[i]

With
             j = int(random() * (i+1))
being the only call to random().
As long as (i + 1) is not large enough so that j is generated with
equal probability in range [0, i] we would get all permutations with
equal probability. With MT with period of 2**19937 (i+1) would have to
be really large before this the case.

Anyway, is there some test harness we can run to test the robustness
of shuffle? We can run that test harness for large values and see at
what point all permutations are not possible or come with unequal
probability.