clarification

Alex Martelli aleax at mac.com
Sun Aug 19 12:03:53 EDT 2007


samwyse <dejanews at email.com> wrote:
   ...
> > brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> > range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
> > 100000 loops, best of 3: 18.8 usec per loop
> > 
> > brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> > range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'    
> > 10000 loops, best of 3: 87.2 usec per loop
> > 
> > Even in a case as tiny as this one, "reduce" is taking over 4 times
> > longer than the loop with the in-place mutator -- and it only gets
> > worse, as we're talking about O(N squared) vs O(N) performance!  Indeed,
> > this is part of what makes reduce an "attractive nuisance"...;-).  [[And

The set-union case, just like the list-catenation case, is O(N squared)
(when approached in a functional way) because the intermediate result
often _grows_ [whenever a new set or list operand adds items], and thus
a new temporary value must be allocated, and the K results-so-far
"copied over" (as part of constructing the new temp value) from the
previous temporary value; the total amount of copying is thus on the
order of sum(range(N)), which grows quadratically in N.  The in-place
approach avoids that fate by a strategy of proportional over-allocation
(used by both sets and lists) that makes in-place operations such as
.update(X) and .extend(X) amortized O(K), where K is len(X).
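
To make the difference concrete, here is a small sketch (my own
illustration, not code from the quoted exchange) of the two shapes of
computation being timed above: the functional form allocates a
brand-new set at every step, copying everything accumulated so far,
while the in-place form grows a single result set.

from functools import reduce  # reduce is a builtin in Python 2

sos = [set(range(x, x+4)) for x in range(0, 100, 3)]

# functional: each set.union call builds a new temporary set, copying
# the whole result-so-far before adding the next operand's items
r_functional = reduce(set.union, sos, set())

# in-place: one result set; each .update(x) is amortized O(len(x))
r_inplace = set()
for x in sos:
    r_inplace.update(x)

assert r_functional == r_inplace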

In the set-intersection case, the intermediate result _shrinks_ rather
than growing, so the amount of data "copied over" is a diminishing
quantity at each step, and so the analysis showing quadratic behavior
for the functional approach does not hold; behavior may be roughly
linear, influenced in minor ways by accidental regularities in the sets'
structure and order (especially likely for short sequences of small
sets, as in your case).  Using a slightly longer sequence of slightly
larger sets, with little structure to each, e.g.:

import random
random.seed(12345)  # set seed to ensure total repeatability
los=[set(random.sample(range(1000), 990)) for x in range(200)]

at the end of the setup (the intersection of these 200 sets happens to
contain 132 items), I measure (as usual on my 18-months-old Macbook Pro
laptop):

stmt = 'reduce(set.intersection,los)'
best of 3: 1.66e+04 usec per loop
stmt = 'intersect_all(los)'
best of 3: 1.66e+04 usec per loop

and occasionally 1.65 or 1.67 instead of 1.66 for either or both,
whether with 10,000 or 100,000 loops.  (I'm not sure whether your
observation that the reduce-based approach becomes faster with more
loops is due to anomalies in Windows' scheduler or to the accidental
regularities mentioned above; my timings are probably more regular
because I have two cores, one of which likely ends up dedicated to
whatever task I'm benchmarking while the other runs all the
"background" stuff.)
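
The intersect_all used in the timings above isn't quoted in this
message; presumably it is the in-place counterpart from earlier in the
thread.  Its exact definition is my guess, but something along these
lines:

def intersect_all(los):
    # hypothetical reconstruction: copy the first set once, then shrink
    # it in place with intersection_update, so no growing temporaries
    # are ever built
    it = iter(los)
    result = set(next(it))
    for s in it:
        result.intersection_update(s)
    return result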

> turn indicates that both implementations actually work about same and
> your "O(n squared)" argument is irrelevant.

It's indeed irrelevant when the behavior _isn't_ quadratic (as in the
case of intersections) -- but unfortunately it _is_ needlessly quadratic
in most interesting cases involving containers (catenation of sequences,
union of sets, merging of dictionaries, merging of priority-queues,
...), because in those cases the intermediate temporary values tend to
grow, as I tried to explain in more detail above.
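
Merging dictionaries, for example, shows exactly the same pattern
(again just a sketch of mine, not code from the thread): the functional
form copies the whole result-so-far at every step, the in-place form
does not.

from functools import reduce  # builtin in Python 2

dicts = [{i: i, i + 1: i + 1} for i in range(1000)]

def merge(a, b):
    # builds a brand-new dict each call, copying all items so far
    c = dict(a)
    c.update(b)
    return c

# functional: quadratic total copying as the intermediate dict grows
merged_functional = reduce(merge, dicts, {})

# in-place: one result dict; each .update(d) is amortized O(len(d))
merged_inplace = {}
for d in dicts:
    merged_inplace.update(d)

assert merged_functional == merged_inplace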


Alex


