clarification

Alex Martelli aleax at mac.com
Sat Aug 18 12:26:10 EDT 2007


samwyse <dejanews at email.com> wrote:
   ...
> Finally, does anyone familar with P3K know how best to do the reduction
> without using 'reduce'?  Right now, sets don't support the 'add' and 
> 'multiply' operators, so 'sum' and (the currently ficticious) 'product'
> won't work at all; while 'any' and 'all' don't work as one might hope.
> Are there an 'intersection' and 'union' built-ins anywhere?

For intersection and union of a sequence of sets, I'd use:

def union_all(sos):
    result = set()
    for s in sos: result.update(s)
    return result

def intersect_all(sos):
    it = iter(sos)
    result = set(it.next())
    for s in it: result.intersection_update(s)
    return result

The latter will raise an exception if sos is empty -- I don't think the
"intersection of no sets at all" has a single natural interpretation
(while the "union of no sets at all" appears to be naturally interpreted
as an empty set)... if you disagree, just wrap a try/except around the
initialization of result, and return whatever in the except clause.

Of course, hoisting the unbound method out of the loops can afford the
usual small optimization.  But my point is that, in Python, these
operations (like, say, the concatenation of a sequence of lists, etc)
are best performed "in place" via loops calling mutator methods such as
update and intersection_update (or a list's extend, etc), rather than
"functionally" (building and tossing away many intermediate results).
E.g., consider:

brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
100000 loops, best of 3: 18.8 usec per loop

brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'    
10000 loops, best of 3: 87.2 usec per loop

Even in a case as tiny as this one, "reduce" is taking over 4 times
longer than the loop with the in-place mutator -- and it only gets
worse, as we're talking about O(N squared) vs O(N) performance!  Indeed,
this is part of what makes reduce an "attractive nuisance"...;-).  [[And
so is sum, if used OTHERWISE than for the documented purpose, computing
"the sum of a sequence of numbers": a loop with r.extend is similarly
faster, to concatenate a sequence of lists, when compared to sum(sol,
[])...!!!]]


Alex



More information about the Python-list mailing list