fastest way to find the intersection of n lists of sets

Sun Apr 29 21:12:14 EDT 2007

Prateek <surekap at gmail.com> writes:
> The big set does stay around for a while - I've implemented an LRU
> based caching algorithm on the code that does the I/O. Since the db is
> transactioned, I keep one copy in the current transaction cache (which
> is a simple dictionary) and one in the main read cache (LRU based)
> (which obviously survives across transactions). Since the largest sets
> also tend to be the most frequently used, this basically solves my I/O
> caching issue.

The best approach varies from instance to instance.  Some big
databases often will do stuff like statistically sample the sets from
a given query, try a few different strategies on the samples in order
to figure out which one works best on them, then use the apparent best
strategy on the full sets.