selecting a random item from a set
Alexander Schmolck
a.schmolck at gmx.net
Tue Dec 30 22:04:28 EST 2003
"Tim Peters" <tim.one at comcast.net> writes:
> [Alexander Schmolck]
> > Quite a few algortihms prominently feature choosing/removing a single
> > random element from a set at a time.
>
> Really?
Feature selection schemes, for example?
> > On first sight I can't think of anything better to achieve this with
> > `sets.Set` than something along the lines of:
> >
> > for dummy, randomElement in zip(range(randrange(len(s)+1)), s): pass
>
> It's too obscure for my tastes,
And buggy (a misplaced paren).
> but something like that would certainly work. Probably not. I'm not
> convinced there's a real need, and given the implementation of sets there's
> no way to do it essentially more efficient than the way you've already
> figured out
> (it can be done using O(1) space instead,
s/zip/izip/ s/range/xrange/ ?
> but it still takes O(len(s)) time -- the implementation of sets is
> lumpy, so the O(1)-time indexing trick used by random.choice() can't apply).
The complexity might be the same, but you could certainly do it with a lot
less overhead (with C implmented sets, at least). But if really not that many
people need it, that's of course not enough of a justification.
> If you think this is important enough to be worth the bother <wink>, you
> could define a hybrid data structure combining a set and a list, keeping
> them in synch. random.choice() can then be applied to the list part to get
> a random element in constant time. The only tricky part is deletion of an
> element. You need another dict under the covers to map each key to its
> index in the list. When it comes time to delete a key, you *overwrite* the
> list position it occupied with the last key in the list (and update that
> key's index), then reduce the length of the list by 1. That keeps deletion
> constant-time (expected-case) too.
Yep, such a scheme might be interesting. I currently haven't profiled any of
the (unfinished) code yet or thought hard about it, so I don't know the
relative importance of large constant factors vs. complexity.
I'm not entirely bitten by the premature optimization bug however -- I know
that most of the time in an existing implementation of what I want to do in
another language is spent doing these set operations and takes minutes to
hours to come up with its outputs.
'as
More information about the Python-list
mailing list