[issue8425] a -= b should be fast if a is a small set and b is a large set
Alexander Belopolsky
report at bugs.python.org
Mon May 10 19:44:37 CEST 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
The problem is apparently due to the fact that small_set -= large_set
iterates over the large set removing entries from small_set while more
efficient small_set - large_set builds a new set iterating over a
small_set and adding items that are not in the large set. Since
lookups are O(1), it is more efficient to iterate over a small set
while looking up a large set than the other way around. I am
attaching a minimalist patch which gives up in-place update for s1 -=
s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2.
The code can be further optimized by replicating set_difference logic
in set_isub instead of relying on fall back behavior, but the big
question here with whether it is acceptable to give up preserving set
identity in in-place subtract to gain performance.
----------
keywords: +patch
Added file: http://bugs.python.org/file17282/issue8425.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8425>
_______________________________________
-------------- next part --------------
Index: Objects/setobject.c
===================================================================
--- Objects/setobject.c (revision 81048)
+++ Objects/setobject.c (working copy)
@@ -1612,7 +1612,10 @@
static PyObject *
set_isub(PySetObject *so, PyObject *other)
{
- if (!PyAnySet_Check(other)) {
+ if (!PyAnySet_Check(other) ||
+ /* Fall back to s = s - o if len(s) < len(o) to
+ avoid interation over a large set. */
+ PySet_GET_SIZE(so) < PySet_GET_SIZE(other)) {
Py_INCREF(Py_NotImplemented);
return Py_NotImplemented;
}
More information about the Python-bugs-list
mailing list