fastest way to find the intersection of n lists of sets
Prateek
surekap at gmail.com
Mon Apr 30 04:19:34 EDT 2007
On Apr 30, 12:37 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [Prateek]
>
> > The reason why I'm casting to a list first is because I found that
> > creating a long list which I convert to a set in a single operation is
> > faster (although probably less memory efficient - which I can deal
> > with) than doing all the unions.
>
> That would be a surprising result because set-to-set operations do
> not have to recompute hash values. Also, underneath-the-hood,
> both approaches share the exact same implementation for inserting
> new values one the hash value is known.
>
> If you've seen an unfavorable speed comparison, then you most likely
> had code that built new intermediate sets between step:
>
> common = s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9
>
> Instead, it is faster to build-up a single result set:
>
> common = set()
> for s in s1, s2, s3, s4, s5, s6, s7, s8, s9:
> common |= s
>
> Raymond Hettinger
Thanks Raymond,
This was my old code:
self.lv is a dictionary which retrieves data from the disk or cache
v_r = reduce(operator.or_, [self.lv[x.id] for x in v], set())
This code ran faster:
v_r = reduce(operator.add, [list(self.lv.get(x.id, [])) for x in v],
[])
v_r = set(v_r)
I was doing 3 of these and then intersecting them.
Now, I'm doing...
v_r = set()
_efs = frozenset()
for y in [self.lv.get(x.id, _efs) for x in v]:
v_r |= y
Since the number of sets is always 2 or 3, I just do the intersections
explicitly like so:
if len(list_of_unioned_sets) == 3:
result = list_of_unioned_sets[0]
result &= list_of_unioned_sets[1]
result &= list_of_unioned_sets[2]
elif len(list_of_unioned_sets) == 2:
result = list_of_unioned_sets[0]
result &= list_of_unioned_sets[1]
else:
# Do something else...
Sorry for the relatively non-descript variable names.
Prateek
More information about the Python-list
mailing list