fastest method to choose a random element

caca at mailinator.com caca at mailinator.com
Sat Jan 5 20:36:06 EST 2008


On Jan 5, 9:50 pm, Paul Hankin <paul.han... at gmail.com> wrote:
> On Jan 5, 5:12 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>
>
>
> > On Jan 5, 4:14 pm, c... at mailinator.com wrote:
>
> > > On Jan 5, 5:07 pm, c... at mailinator.com wrote:
>
> > > > Hello, Paul and Arnaud.
> > > > While I think about your answers: do you think there is any way to
> > > > avoid shuffle?
> > > > It may take unnecessary long on a long list most of whose elements
> > > > have the property.
>
> > You could generate a random shuffle of range(len(seq)) lazily, and use
> > that to iterate over your sequence.
>
> > import random
> > import itertools
>
> > def randxrange(n):
> >     "Generate the numbers 0 to n-1 in a random order"
> >     x = range(n)
> >     for i in xrange(n):
> >         r = random.randrange(n - i)
> >         yield x[r]
> >         x[r] = x[n - i - 1]
>
> > def shuffled(seq):
> >     "Generate the elements of seq in a random order"
> >     return (seq[i] for i in randxrange(len(seq)))
>
> > def pick_random(seq, prop):
> >     return itertools.ifilter(prop, shuffled(seq)).next()
>
> At the risk of filling this thread with my posts, here's a much
> simplified and faster version of this code that uses no extra storage.
>
> import random
>
> def pick_random(seq, prop):
>     L = len(seq)
>     for i in xrange(L):
>         r = random.randrange(L - i)
>         if prop(seq[r]):
>             return seq[r]
>         seq[r] = seq[L - i - 1]
>
> --
> Paul Hankin

This one is good. Someone commented that you destroy the list, but
that can be fixed:

 def pick_random(seq, prop):
     L = len(seq)
     for i in xrange(L):
         r = random.randrange(L - i)
         if prop(seq[r]):
             return seq[r]
         seq[r], seq[L - i - 1]= seq[L - i - 1],seq[r]

just pushing elements without the property to the end of the list
(list is mutated, true, but shuffle will do that too). In each
iteration of the for loop, there is only one random element, one check
of the property, and rebinding of elements without altering the lenght
of the list. This is optimal and has all the functionality.

Two more comments:
for buzcor: deleting an element in the middle of a list is costly
for martin: that is certainly enough for me

Thanks everybody!



More information about the Python-list mailing list