selecting a random item from a set

Alexander Schmolck a.schmolck at gmx.net
Tue Dec 30 22:04:28 EST 2003


"Tim Peters" <tim.one at comcast.net> writes:

> [Alexander Schmolck]
> > Quite a few algortihms prominently feature choosing/removing a single
> > random element from a set at a time.
> 
> Really?  

Feature selection schemes, for example?

> > On first sight I can't think of anything better to achieve this with
> > `sets.Set` than something along the lines of:
> >
> >   for dummy, randomElement in zip(range(randrange(len(s)+1)), s): pass
> 
> It's too obscure for my tastes, 

And buggy (a misplaced paren).

> but something like that would certainly work. Probably not. I'm not
> convinced there's a real need, and given the implementation of sets there's
> no way to do it essentially more efficient than the way you've already
> figured out

> (it can be done using O(1) space instead, 

s/zip/izip/ s/range/xrange/ ?

> but it still takes O(len(s)) time -- the implementation of sets is
> lumpy, so the O(1)-time indexing trick used by random.choice() can't apply).

The complexity might be the same, but you could certainly do it with a lot
less overhead (with C implmented sets, at least). But if really not that many
people need it, that's of course not enough of a justification.

> If you think this is important enough to be worth the bother <wink>, you
> could define a hybrid data structure combining a set and a list, keeping
> them in synch.  random.choice() can then be applied to the list part to get
> a random element in constant time.  The only tricky part is deletion of an
> element.  You need another dict under the covers to map each key to its
> index in the list.  When it comes time to delete a key, you *overwrite* the
> list position it occupied with the last key in the list (and update that
> key's index), then reduce the length of the list by 1.  That keeps deletion
> constant-time (expected-case) too.

Yep, such a scheme might be interesting. I currently haven't profiled any of
the (unfinished) code yet or thought hard about it, so I don't know the
relative importance of large constant factors vs. complexity.

I'm not entirely bitten by the premature optimization bug however -- I know
that most of the time in an existing implementation of what I want to do in
another language is spent doing these set operations and takes minutes to
hours to come up with its outputs.

'as





More information about the Python-list mailing list