Is there such a beast as a "perfect" shuffle? :)

Sun May 12 03:00:27 EDT 2002

> http://www.faqts.com/knowledge_base/view.phtml/aid/4406/fid/546

[Christos Georgiou]
> This seems to be what I want, as long as it works.  Thanks, tim_one.

How will you know whether it works <0.1 wink>?  While the std
random.shuffle() can only get at a miniscule percentage of the possible
permutations of a deck of cards, random.random()'s period is about 7e12, and
that's enough for an awful lot of permutations.  Do you have a test in mind
that can distinguish them from a truly random sampling?  If not, you're
probably making life needlessly difficult.

[Christopher Browne]
> ...
> The nonintuitive part is that the random range shrinks as you move
> along.  If you leave the range open, there's a bit of a bias.  It's a
> significant bias, if there are only 3 cards.  It's not nearly so
> significant with 52.
>
> You could do far worse than running through the 52 locations and, for
> each one, swap it with a randomly selected location.  That will
> certainly shuffle things about pretty decently.

[back to Christos]
> ...
> The algorithm I always[1] used for shuffling a sequence / array was the
> one you describe in your last paragraph, and I am not that sure about
> the effectiveness of the 'non-intuitive' modification, but I have not a
> reason to doubt you, apart from my programmer's stubbornness :).

Christopher is right, and if you haven't detected any bias from the
algorithm you've been using, it really doesn't matter which random number
generator you use <wink>.

Simple proof:  suppose at each of 52 positions, you swap the current
position with any of the 52.  Then you make 52 choices 52 times, so there
are 52**52 possible ways the algorithm can go.  Since 52! doesn't divide
52**52 evenly, the algorithm necessarily favors some permutations over
others.

In Christopher's variation (which the std random.shuffle implements), at the
first position you make one of 52 choices, at the second position one of 51
choices, and so on.  There are 52! possible ways the algorithm can go, and
52! does divide into that evenly.  That doesn't prove it's a correct
algorithm (see, e.g., Knuth for a proof -- it's easy), but it does show that
it's not obviously broken.

Play with this:

def shuffle(N):
    from random import randrange
    countfirst = [0] * N
    count = 0
    while 1:
        count += 1
        cards = range(N)
        # Bad algorithm!
        for i in range(N):
            j = randrange(N)
            cards[i], cards[j] = cards[j], cards[i]
        first = cards[0]
        countfirst[first] += 1
        if count % N == 0:
            print countfirst

shuffle(52)

If you pass 1 or 2, there's no bias evident in which card ends up in the
first position after shuffling.  In fact, the algorithm *is* fair for N=1
and N=2.  But for N > 2 the bias is obvious from inspection of the output,
and, contrary to Christopher's claim, it's a major bias at N=52 too.  Look
at, e.g., how often the original 2nd card ends up in the first position,
compared to how often the original last card ends up first:  it's harder for
the last card to end up first.  This isn't because of a bias in
random.randrange:  the *algorithm* simply gives the second card more chances
than the last to end up first.  OTOH, every card has an equal chance of
ending up in the last position.  The breakage is subtle that way, but it's
real and highly significant.

worry-about-the-70th-digit-after-the-1st-one-is-repaired-ly y'rs  - tim