[issue8425] a -= b should be fast if a is a small set and b is a large set

Alexander Belopolsky report at bugs.python.org
Mon May 10 19:44:37 CEST 2010


Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

The problem is apparently due to the fact that small_set -= large_set
iterates over the large set removing entries from small_set while more
efficient small_set - large_set builds a new set iterating over a
small_set and adding items that are not in the large set.  Since
lookups are O(1), it is more efficient to iterate over a small set
while looking up a large set than the other way around.  I am
attaching a minimalist patch which gives up in-place update for s1 -=
s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2.
 The code can be further optimized by replicating set_difference logic
in set_isub instead of relying on fall back behavior, but the big
question here  with whether it is acceptable to give up preserving set
identity in in-place subtract to gain performance.

----------
keywords: +patch
Added file: http://bugs.python.org/file17282/issue8425.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8425>
_______________________________________
-------------- next part --------------
Index: Objects/setobject.c
===================================================================
--- Objects/setobject.c	(revision 81048)
+++ Objects/setobject.c	(working copy)
@@ -1612,7 +1612,10 @@
 static PyObject *
 set_isub(PySetObject *so, PyObject *other)
 {
-    if (!PyAnySet_Check(other)) {
+    if (!PyAnySet_Check(other) ||
+	/* Fall back to s = s - o if len(s) < len(o) to
+	   avoid interation over a large set. */
+	PySet_GET_SIZE(so) < PySet_GET_SIZE(other)) {
         Py_INCREF(Py_NotImplemented);
         return Py_NotImplemented;
     }


More information about the Python-bugs-list mailing list