[New-bugs-announce] [issue8685] set(range(100000)).difference(set()) is slow

Andrew Bennetts report at bugs.python.org
Tue May 11 11:04:19 CEST 2010


New submission from Andrew Bennetts <spiv at users.sourceforge.net>:

set.difference(s), when s is also a set, basically does::

    res = set()
    for elem in self:
        if elem not in other:
            res.add(elem)

This is wasteful when len(self) is much greater than len(other):

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = set()" "sd(empty)"
100 loops, best of 3: 12.8 msec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 1.18 usec per loop

Here's a patch that compares the lengths of self and other before that loop, and if len(self) is greater, swaps them.  The new timeit results are:

$ python -m timeit -s "s = set(range(100000)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 0.289 usec per loop
$ python -m timeit -s "s = set(range(10)); sd = s.difference; empty = set()" "sd(empty)"
1000000 loops, best of 3: 0.294 usec per loop

----------
components: Interpreter Core
files: set-difference-speedup.diff
keywords: patch
messages: 105489
nosy: spiv
priority: normal
severity: normal
status: open
title: set(range(100000)).difference(set()) is slow
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file17292/set-difference-speedup.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8685>
_______________________________________


More information about the New-bugs-announce mailing list