fastest way to find the intersection of n lists of sets

Paul Rubin http
Sun Apr 29 21:12:14 EDT 2007


Prateek <surekap at gmail.com> writes:
> The big set does stay around for a while - I've implemented an LRU
> based caching algorithm on the code that does the I/O. Since the db is
> transactioned, I keep one copy in the current transaction cache (which
> is a simple dictionary) and one in the main read cache (LRU based)
> (which obviously survives across transactions). Since the largest sets
> also tend to be the most frequently used, this basically solves my I/O
> caching issue.

The best approach varies from instance to instance.  Some big
databases often will do stuff like statistically sample the sets from
a given query, try a few different strategies on the samples in order
to figure out which one works best on them, then use the apparent best
strategy on the full sets.



More information about the Python-list mailing list