random shuffles

Sat Jul 22 08:03:24 EDT 2006

   From: "danielx" <danielwong at berkeley.edu>
   Date: 22 Jul 2006 01:43:30 -0700

   Boris Borcic wrote:
   > does
   >
   > x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
   >
   > pick a random shuffle of x with uniform distribution ?

...

   Let e be the element which was in the first position to begin with.
   What are its chances of being there after being "sorted"? Well, in
   order for it to still be there, it would have to survive N rounds of
   selection. In each selection, it has 1/2 chance of surviving. That
   means its total chance of survival is 1/(2**N), which is much less than
   1/N. QED!

This proof makes sense to me if the sorting algorithm makes a random
decision every time it swaps.

Couldn't we easily get an n*log(n) shuffle by performing a _single_
mapping of each element in the collection to a pair made up of a
random key and its value, and then sorting based on the keys and
stripping them out? i.e., in O(n) time, turn

   "2 clubs", "2 diamonds", "2 hearts", "2 spades", "3 clubs"

into something like

   [0.395, "2 clubs"], [0.158, "2 diamonds"], [0.432, "2 hearts"], [0.192, "2 spades"], [0.266, "3 clubs"]

and then in O(n*log(n)) time, sort it into

   [0.158, "2 diamonds"], [0.192, "2 spades"], [0.266, "3 clubs"], [0.395, "2 clubs"], [0.432, "2 hearts"]

and then strip out the keys again in O(n) time?

Or perhaps this has been discussed already and I missed that part? I
just joined the list... apologies if this is a repeat.

   You can accomplish this by doing what I will call "random selection".
   It would be like linear selection, except you wouldn't bother checking
   the head against every other element. When you want to figure out what
   to put at the head, just pick at random! I suspect this is what Python
   does in random.shuffle, since it is rather an easy to see it would
   result in something uniform (I swear I haven't looked at the source
   code for random.shuffle :P).

But, after making the linear selection, don't you still need to use
O(log(n)) time to find and remove the item from the collection? I
don't know much about Python's internal data structures, but if
there's an array in there, you need O(n) time to move up everything
after the deleted item; if there's a list, you need O(n) time to find
the element you picked at random; if there's a balanced tree, you
could find it and delete it in O(log(n)) time...

Help me review my undergraduate data structures here -- is there some
way to pick each item _exactly_once_ in O(n) time?

Dave Wonnacott