selecting a random item from a set

Alexander Schmolck a.schmolck at gmx.net
Tue Jan 6 23:23:02 EST 2004


[sorry for the late reply the server here had been down a couple of days and
then I was busy]

"Tim Peters" <tim.one at comcast.net> writes:

> [Tim, asks which algorithms require random selection, as opposed
>  to arbitrary selection, from a set]
> 
> [Alexander Schmolck]
> > Feature selection schemes, for example?
> 
> Eh?  That's like "numerical methods" <wink/frown>.  IOW, too vague.

Well, pretty much any scheme where the feature space is to large to explore
exhaustively, IOW from forward-backward feature selection to markov chain
monte carlo. If that's still to vague, I'd be happy to provide more detail.

> You can do it in O(1) space now at essentially full C speed via, e.g.,
> 
>     i = random.randrange(len(some_set))
>     random_elt = itertools.islice(some_set, i, None).next()
>     some_set.remove(random_elt)
> 
> That tricks it into finding the starting point with a C loop, and then the
> iterator is discarded after extracting its first element.

Yes, this is a much better (more readable and efficient)

 
> But (of course) that still takes average time proportional to the # of
> elements in the set.  Well, that wishes away a detail:  this can actually be
> arbitrarily expensive, as there's no bound on how sparse a Python dict can
> get, and the C dict iteration code has to visit every slot, occupied or not,
> to find "the next" occupied slot.  

Sure.
 
> If the sets are small, you're not going to beat random.choice(tuple(s)).

Yep, I had a vague feeling this might be the case and because it's also simple
and clear, so that's in fact what I used (almost -- I did list(s), which ought
to be only marginally slower I would think).

> If the sets are large, low-level optimization of an approach with the wrong
> O() behavior is just a monumentally bad idea. 

Unlikely, for the current application they should be < 100 elements. So I
guess I shall stick to the tuple-cast for the time being. 

Thanks -- this has been quite informative (and the other suggestions you made
might still come in handy at some point).

'as






More information about the Python-list mailing list