fastest way to find the intersection of n lists of sets

Prateek surekap at gmail.com
Sun Apr 29 19:34:27 EDT 2007


>      For the above example, it's worth sorting lists_of_sets by the
> length of the sets, and doing the short ones first.

Thanks. I thought so - I'm doing just that using a simple Decorate-
Sort-Undecorate idiom.

>      How big are the sets?  If they're small, but you have a lot of
> them, you may be better off with a bit-set representation, then
> using AND operations for intersection.  If they're huge (tens of millions
> of entries), you might be better off doing sorts and merges on the
> sets.

I have either 2 or 3 sets (never more) which can be arbitrarily large.
Most of them are small (between 0 and few elements - say less that 5).
A few will be megamonstrous ( > 100,000 items)

>      When you ask questions like this, it's helpful to give some
> background.  We don't know whether this is a homework assignment, or
> some massive application that's slow and you need to fix it, even
> if it requires heavy implementation effort.
>
Its definitely not a homework assignment - its part of a commercial
database query engine. Heavy implementation effort is no problem.

Prateek





More information about the Python-list mailing list