random shuffles

danielx danielwong at berkeley.edu
Sat Jul 22 15:02:30 EDT 2006


David G. Wonnacott wrote:
> From: "danielx" <danielwong at berkeley.edu>
>    Date: 22 Jul 2006 01:43:30 -0700
>
>    Boris Borcic wrote:
>    > does
>    >
>    > x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>    >
>    > pick a random shuffle of x with uniform distribution ?
>
> ...
>
>    Let e be the element which was in the first position to begin with.
>    What are its chances of being there after being "sorted"? Well, in
>    order for it to still be there, it would have to survive N rounds of
>    selection. In each selection, it has 1/2 chance of surviving. That
>    means its total chance of survival is 1/(2**N), which is much less than
>    1/N. QED!
>
>
> This proof makes sense to me if the sorting algorithm makes a random
> decision every time it swaps.
>
> Couldn't we easily get an n*log(n) shuffle by performing a _single_
> mapping of each element in the collection to a pair made up of a
> random key and its value, and then sorting based on the keys and
> stripping them out? i.e., in O(n) time, turn
>
>    "2 clubs", "2 diamonds", "2 hearts", "2 spades", "3 clubs"
>
> into something like
>
>    [0.395, "2 clubs"], [0.158, "2 diamonds"], [0.432, "2 hearts"], [0.192, "2 spades"], [0.266, "3 clubs"]
>
> and then in O(n*log(n)) time, sort it into
>
>    [0.158, "2 diamonds"], [0.192, "2 spades"], [0.266, "3 clubs"], [0.395, "2 clubs"], [0.432, "2 hearts"]
>
> and then strip out the keys again in O(n) time?
>
> Or perhaps this has been discussed already and I missed that part? I
> just joined the list... apologies if this is a repeat.
>

Yes, that would work beautifully. You could use the key argument of
list.sort. What we are talking about though, is using the cmp argument.
I.e. will list.sort(cmp=lambda:???) be able to give you a shuffle? I
think most of us think this depends on what sort algorithm Python uses.

>
>
>    You can accomplish this by doing what I will call "random selection".
>    It would be like linear selection, except you wouldn't bother checking
>    the head against every other element. When you want to figure out what
>    to put at the head, just pick at random! I suspect this is what Python
>    does in random.shuffle, since it is rather an easy to see it would
>    result in something uniform (I swear I haven't looked at the source
>    code for random.shuffle :P).
>
>
> But, after making the linear selection, don't you still need to use
> O(log(n)) time to find and remove the item from the collection? I
> don't know much about Python's internal data structures, but if
> there's an array in there, you need O(n) time to move up everything
> after the deleted item; if there's a list, you need O(n) time to find
> the element you picked at random; if there's a balanced tree, you
> could find it and delete it in O(log(n)) time...

I'm almost sure there's a C array back there as well (well, not
techinically an array, but something from malloc), because indexing a
Python list is supposed to be "fast". It would take constant time to
put something at the head. Just swap with the thing that's already
there. Since you have to do this swap for each position, it takes time
proportional to N.

When I originally came up with this idea, I was thinking you would not
choose a new head among previously chosen heads. But if you do allow
that, I think it will still be uniform. So my original idea was
something like this:

  1  2  3  4
# ^ head pos
stage 0: we need to choose something to put in pos 0. We can choose
anything to go there.

(swap)
  3  2  1  4
#    ^ head pos
stage 1: we need to choose something to put in pos 1. We can choose
among things in positions greater than or equal to 1 ie, we may choose
among 2 1 4.

etc.

This is what I meant when I said this would be like linear selection,
because once something is in its correct position, it doesn't get
moved. But you might also be able to achieve a uniform sort if in
stages 1 and up, you are still allowed to choose anything you want to
be the head. I'll let someone else figure it out :P.

>
> Help me review my undergraduate data structures here -- is there some
> way to pick each item _exactly_once_ in O(n) time?

I think I answered this in the first segment of this post. Let me know
if I don't seem clear.

> 
> Dave Wonnacott




More information about the Python-list mailing list