fastest method to choose a random element

bukzor workitharder at gmail.com
Sun Jan 6 17:23:50 EST 2008


On Jan 5, 5:36 pm, c... at mailinator.com wrote:
> On Jan 5, 9:50 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>
>
>
> > On Jan 5, 5:12 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>
> > > On Jan 5, 4:14 pm, c... at mailinator.com wrote:
>
> > > > On Jan 5, 5:07 pm, c... at mailinator.com wrote:
>
> > > > > Hello, Paul and Arnaud.
> > > > > While I think about your answers: do you think there is any way to
> > > > > avoid shuffle?
> > > > > It may take unnecessary long on a long list most of whose elements
> > > > > have the property.
>
> > > You could generate a random shuffle of range(len(seq)) lazily, and use
> > > that to iterate over your sequence.
>
> > > import random
> > > import itertools
>
> > > def randxrange(n):
> > >     "Generate the numbers 0 to n-1 in a random order"
> > >     x = range(n)
> > >     for i in xrange(n):
> > >         r = random.randrange(n - i)
> > >         yield x[r]
> > >         x[r] = x[n - i - 1]
>
> > > def shuffled(seq):
> > >     "Generate the elements of seq in a random order"
> > >     return (seq[i] for i in randxrange(len(seq)))
>
> > > def pick_random(seq, prop):
> > >     return itertools.ifilter(prop, shuffled(seq)).next()
>
> > At the risk of filling this thread with my posts, here's a much
> > simplified and faster version of this code that uses no extra storage.
>
> > import random
>
> > def pick_random(seq, prop):
> >     L = len(seq)
> >     for i in xrange(L):
> >         r = random.randrange(L - i)
> >         if prop(seq[r]):
> >             return seq[r]
> >         seq[r] = seq[L - i - 1]
>
> > --
> > Paul Hankin
>
> This one is good. Someone commented that you destroy the list, but
> that can be fixed:
>
>  def pick_random(seq, prop):
>      L = len(seq)
>      for i in xrange(L):
>          r = random.randrange(L - i)
>          if prop(seq[r]):
>              return seq[r]
>          seq[r], seq[L - i - 1]= seq[L - i - 1],seq[r]
>
> just pushing elements without the property to the end of the list
> (list is mutated, true, but shuffle will do that too). In each
> iteration of the for loop, there is only one random element, one check
> of the property, and rebinding of elements without altering the lenght
> of the list. This is optimal and has all the functionality.
>
> Two more comments:
> for buzcor: deleting an element in the middle of a list is costly
> for martin: that is certainly enough for me
>
> Thanks everybody!

Just for fun, I profiled my answer versus the final answer over 300
seconds. I wrapped some parts of the functions in trivial functions so
they would show up in the profiling. I found that my answer was 50%
slower, but not because of the delete, but mostly the len() inside the
loop, and the bool(seq) at the start of each loop. I was able to
rewrite mine to only be 14 seconds slower (the time to make the copy
at the beginning), but don't believe that's useful enough to post.


   ncalls  tottime  percall  cumtime  percall
filename:lineno(function)
  1516221   48.545    0.000  158.850    0.000 picked.py:
14(pick_random_bukzor)
  3066421   19.359    0.000   53.815    0.000 picked.py:8(randrange2)
  3066421   16.852    0.000   24.751    0.000 picked.py:9(len2)
  1516221   14.712    0.000   14.712    0.000 picked.py:12(make_copy)
  3066421    9.767    0.000    9.767    0.000 picked.py:10(notempty)
  1550200    7.260    0.000    7.260    0.000 picked.py:7(delete)

  1516221   31.574    0.000  104.384    0.000 picked.py:
27(pick_random)
  3063737   19.556    0.000   54.617    0.000 picked.py:25(randrange3)
  1516221    8.485    0.000   12.582    0.000 picked.py:26(len3)
  1547516    5.611    0.000    5.611    0.000 picked.py:
24(do_transform)

def delete(list, index): del list[index]
def randrange2(X): return randrange(X)
def len2(seq): return len(seq)
def notempty(seq): if seq: return True
def make_copy(l): return list(l)
def pick_random_bukzor(seq, prop=bool):
    temp = make_copy(seq)
    while notempty(seq):
        i = randrange2(len2(temp))
        if prop(temp[i]): return temp[i]
        else: delete(temp, i)

def do_transform(seq, L, r, i): seq[r], seq[L - i - 1]= seq[L - i -
1],seq[r]
def randrange3(X): return randrange(X)
def len3(seq): return len(seq)
def pick_random(seq, prop=bool):
    L = len3(seq)
    for i in xrange(L):
        r = randrange3(L - i)
        if prop(seq[r]): return seq[r]
        do_transform(seq, L, r, i)



More information about the Python-list mailing list