[Python-ideas] set.add(x) slower than if x in set:set.add(x)

Mon Sep 14 16:19:06 CEST 2009

Interesting discussion here!  I did a little analysis:

let a = cost of "item in set"
let b = cost of set.add
let c = cost of sadd = set.add

if I do x+y additions to my set where x is number of items already in
the set and y is the number not in the set, using my original approach
I get a cost of:

   a.(x+y) + b.y     # we always test for set membership, but only add new items

for the "if item not in set: set.add(item)" approach, or

   a.(x+y) + c.y,  if I substitute sadd for set.add

If I always add the item to the set without checking first, the cost is

  b.(x+y)   (or c.(x+y) using sadd)

assuming that set.add() takes close to the same time whether the item
is in the set or not (in reality I suppose that it needs to play with
a few pointers if the item is not in the set and may need to rebalance
the tree, if it is a red-black tree or something similar -- what is
it, actually?)

That means that break-even should happen when:

a.(x+y) +b.y = b.(x+y)

ax + ay + by = bx + by

ay = (b-a)x

y = ((b-a)/a)x

plugging in the numbers from earlier in this thread, we have:

a = .276 uS
b = .489 uS
c = .298uS

(b-a)/a = .772
(c-a)/a = .08

so the approach "if item not in set: set.add(item)" wins when the
number of items not added to the set (thus failing the "if" statement)
 >= 23% of those that are added.  Setting sadd = set.add beforehand
increases that break-even point to about 92% (almost no advantage).

Lesson learned: If you know (or can reasonably test) your sample data,
you can choose the better method.  If your data is unknown (random
from your viewpoint), you could assume that about half the items to be
added are already in the set, which is < 92%, so there is no point in
doing the "if item in set" test beforehand.

YMMV

On Mon, Sep 14, 2009 at 9:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Yuvgoog Greenle wrote:
>> So this pattern is a valid python optimization? Funky...
>>
>> Sadly, there's no way around it unless the interpreter somehow did it
>> magically for you.
>
> The interpreter has no way of knowing a priori that the branch won't be
> taken 999,999 times out of 1,000,000.
>
> Think about what you are actually comparing here:
>
> Average speed of 1 million calls to s.add(x)
>
> Average speed of 1 million calls to "x in s" + one call to s.add(x)
>
> Now think about the fact that s.add(x) includes a containment test plus
> function call and name lookup overhead even in the case that the item is
> already present in the set.
>
> All overhead included:
> $ python -m timeit -s "s = set()" "s.add(1)"
> 1000000 loops, best of 3: 0.197 usec per loop
>
> Lose the attribute lookup:
> $ python -m timeit -s "s = set()" -s "sadd = s.add" "sadd(1)"
> 10000000 loops, best of 3: 0.146 usec per loop
>
> Skip the function call altogether most of the time:
> $ python -m timeit -s "s = set()" "if 1 not in s: s.add(1)"
> 10000000 loops, best of 3: 0.101 usec per loop
>
> Just do the containment test:
> $ python -m timeit -s "s = set([1])" "1 not in s"
> 10000000 loops, best of 3: 0.1 usec per loop
>
> Now then, lets also look at the absolute numbers we're discussing here
> (on my machine, anyway). Is the fastest version twice as fast as the
> slowest version? Yes it is. But that difference is only 97
> *nano*seconds. And relative to the recommended approach of caching the
> attribute looking, we're only saving 45 nanoseconds.
>
> And in more realistic use cases where some items are already in the set
> and some aren't, the "I'll check first" implementation can become a
> pessimisation instead of an optimisation. To emphasise the point, we'll
> go to the other extreme where the item is added to the set every time:
>
> All the overhead:
> $ python -m timeit -s "s = set()" "s.add(1)" "s.clear()"
> 1000000 loops, best of 3: 0.374 usec per loop
>
> The "optimised" approach:
> $ python -m timeit -s "s = set()" "if 1 not in s: s.add(1)" "s.clear()"
> 1000000 loops, best of 3: 0.444 usec per loop
>
> Oops, looks like the approach that saves us 45 nanoseconds when the item
> is already in the set may cost us up to *70* nanoseconds when it turns
> out we need to add the item after all.
>
> Caching attribute lookups before time critical loops is a good
> optimisation technique that most experienced Python programmers learn.
> Pre-checking a condition that a called function is just going to check
> again and bail out quickly in less than 50% of cases? Usually a bad idea
> - the extra checks made when the function is invoked anyway will often
> cancel out any gains you make from avoiding the function call overhead.
>
> That said, as with any micro-optimisation: time it on a range of typical
> and end-case data sets. If you avoid the function call overhead often
> enough, doing a pre-check may be a net win for some inner loops.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

-- 
Gerald Britton