Implementaion of random.shuffle

Hrvoje Niksic hniksic at xemacs.org
Mon Jul 16 10:55:53 EDT 2007


Steve Holden <steve at holdenweb.com> writes:

> So it would appear that the developers chose the Knuth algorithm
> (with a slight variation) for *their* implementation. Now you have
> to ask yourself whether your surmise is genuinely correct (in which
> case the documentation may contain a bug) or whether the
> documentation is indeed correct and you are in error.

That is a good question.  The random module uses the Mersenne twister,
which has a repetition period of 2**19937.  The number of n-sized
permutations of a list with n elements is n!, while each shuffle
requires n calls to the PRNG.  This means that to be able to generate
all permutations, the PRNG must have a period of at least n! * n.  In
the case of MT, it means that, regarding the period, you are safe for
lists with around 2079 elements.  shuffle's documentation may have
been written before the random module was converted to use the MT.

2**19937 being a really huge number, it's impossible to exhaust the
Mersenne twister by running it in sequence.  However, there is also
the question of the spread of the first shuffle.  Ideally we'd want
any shuffle, including the first one, to be able to produce any of the
n! permutations.  To achieve that, the initial state of the PRNG must
be able to support at least n! different outcomes, which means that
the PRNG must be seeded by at least log2(n!) bits of randomness from
an outside source.  For reference, Linux's /dev/random stops blocking
when 64 bits of randomness are available from the entropy pool, which
means that, in the worst case, shuffling more than 20 elements cannot
represent all permutations in the first shuffle!



More information about the Python-list mailing list