fastest way to find the intersection of n lists of sets

Prateek surekap at gmail.com
Mon Apr 30 04:19:34 EDT 2007


On Apr 30, 12:37 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [Prateek]
>
> > The reason why I'm casting to a list first is because I found that
> > creating a long list which I convert to a set in a single operation is
> > faster (although probably less memory efficient - which I can deal
> > with) than doing all the unions.
>
> That would be a surprising result because set-to-set operations do
> not have to recompute hash values.  Also, underneath-the-hood,
> both approaches share the exact same implementation for inserting
> new values one the hash value is known.
>
> If you've seen an unfavorable speed comparison, then you most likely
> had code that built new intermediate sets between step:
>
>    common = s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9
>
> Instead, it is faster to build-up a single result set:
>
>     common = set()
>     for s in s1, s2, s3, s4, s5, s6, s7, s8, s9:
>         common |= s
>
> Raymond Hettinger

Thanks Raymond,

This was my old code:
self.lv is a dictionary which retrieves data from the disk or cache
v_r = reduce(operator.or_, [self.lv[x.id] for x in v], set())

This code ran faster:
v_r = reduce(operator.add, [list(self.lv.get(x.id, [])) for x in v],
[])
v_r = set(v_r)

I was doing 3 of these and then intersecting them.

Now, I'm doing...
v_r = set()
_efs = frozenset()
for y in [self.lv.get(x.id, _efs) for x in v]:
    v_r |= y

Since the number of sets is always 2 or 3, I just do the intersections
explicitly like so:
if len(list_of_unioned_sets) == 3:
    result = list_of_unioned_sets[0]
    result &= list_of_unioned_sets[1]
    result &= list_of_unioned_sets[2]
elif len(list_of_unioned_sets) == 2:
    result = list_of_unioned_sets[0]
    result &= list_of_unioned_sets[1]
else:
   # Do something else...

Sorry for the relatively non-descript variable names.
Prateek




More information about the Python-list mailing list