Is there such a beast as a "perfect" shuffle? :)

Mon May 13 03:43:45 EDT 2002

On Sun, 12 May 2002 03:00:27 -0400, rumours say that Tim Peters
<tim.one at comcast.net> might have written:

>> http://www.faqts.com/knowledge_base/view.phtml/aid/4406/fid/546
>
>[Christos Georgiou]
>> This seems to be what I want, as long as it works.  Thanks, tim_one.
>
>How will you know whether it works <0.1 wink>?  While the std
>random.shuffle() can only get at a miniscule percentage of the possible
>permutations of a deck of cards, random.random()'s period is about 7e12, and
>that's enough for an awful lot of permutations.  Do you have a test in mind
>that can distinguish them from a truly random sampling?  If not, you're
>probably making life needlessly difficult.

Well, I know it works because you say it does <wink>.
Seriously now, you surely got a point.  And having tested quite
thoroughly (I think) during the weekend, it seems indeed that I am
splitting hairs.

>[Christopher Browne]
>> ...
>> The nonintuitive part is that the random range shrinks as you move
>> along.  If you leave the range open, there's a bit of a bias.  It's a
>> significant bias, if there are only 3 cards.  It's not nearly so
>> significant with 52.
>>
>> You could do far worse than running through the 52 locations and, for
>> each one, swap it with a randomly selected location.  That will
>> certainly shuffle things about pretty decently.
>
>[back to Christos]
>> ...
>> The algorithm I always[1] used for shuffling a sequence / array was the
>> one you describe in your last paragraph, and I am not that sure about
>> the effectiveness of the 'non-intuitive' modification, but I have not a
>> reason to doubt you, apart from my programmer's stubbornness :).
>
>Christopher is right, and if you haven't detected any bias from the
>algorithm you've been using, it really doesn't matter which random number
>generator you use <wink>.

Ouch!  :)

>Simple proof:  suppose at each of 52 positions, you swap the current
>position with any of the 52.  Then you make 52 choices 52 times, so there
>are 52**52 possible ways the algorithm can go.  Since 52! doesn't divide
>52**52 evenly, the algorithm necessarily favors some permutations over
>others.

>In Christopher's variation (which the std random.shuffle implements), at the
>first position you make one of 52 choices, at the second position one of 51
>choices, and so on.  There are 52! possible ways the algorithm can go, and
>52! does divide into that evenly.  That doesn't prove it's a correct
>algorithm (see, e.g., Knuth for a proof -- it's easy), but it does show that
>it's not obviously broken.
>
>Play with this:
>
>def shuffle(N):
>    from random import randrange
>    countfirst = [0] * N
>    count = 0
>    while 1:
>        count += 1
>        cards = range(N)
>        # Bad algorithm!
>        for i in range(N):
>            j = randrange(N)
>            cards[i], cards[j] = cards[j], cards[i]
>        first = cards[0]
>        countfirst[first] += 1
>        if count % N == 0:
>            print countfirst
>
>shuffle(52)
>
>If you pass 1 or 2, there's no bias evident in which card ends up in the
>first position after shuffling.  In fact, the algorithm *is* fair for N=1
>and N=2.  But for N > 2 the bias is obvious from inspection of the output,
>and, contrary to Christopher's claim, it's a major bias at N=52 too.  Look
>at, e.g., how often the original 2nd card ends up in the first position,
>compared to how often the original last card ends up first:  it's harder for
>the last card to end up first.  This isn't because of a bias in
>random.randrange:  the *algorithm* simply gives the second card more chances
>than the last to end up first.  OTOH, every card has an equal chance of
>ending up in the last position.  The breakage is subtle that way, but it's
>real and highly significant.

Yep, you surely made your point by logic and by code; I agree that
Christopher is right, after checking the output of your program with
both algorithms.  I will forget about the 70th digit and go on with the
13th, it's fine.

Anyway, late thanks to all posting in this thread, especially:
Christopher for offering an improvement to my algorithm
Tim for taking me by the hand
Terry for doing some homework for me
-- 
TZOTZIOY, I speak England very best,
Real email address: 'dHpvdEBzaWwtdGVjLmdy\n'.decode('base64')