[Python-Dev] Comparison speed

Tue, 15 May 2001 08:44:23 +0200

> Applying that to the above leaves you with nothing but
> 
>    if (v == w && op == Py_EQ) /* then return Py_True */
> 
> [...] So I don't see much future in that.

Is this really exactly what Python would guarantee? I'm surprised that
x==x would always be true, but x!=x might be true also. In a type where
x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
I had expected that you'd reduced it to

  if (v == w && op == Py_EQ) /* then return Py_True */
  if (v == w && op == Py_NE) /* then return Py_False */

The one application where this may help is list_contains, in
particular when searching a list of interned strings.

> You got an 8% speedup for one type by tricking the switch stmt into
> appearing 3 calls earlier.  What if the implementation were smarter,
> and did it for *all* relevant types even a call or two before that?

Please have a look at the patch below. Since I made a CVS update since
yesterday, I had to readjust the baseline results:

0.790
0.780
0.770
0.780
0.780
0.790
0.780
0.790
0.790
0.790

The patch moves the case "equal types, supporting cmp" to somewhat
earlier, just after the attempt to do richcompare. Now I get

0.760
0.770
0.750
0.770
0.750
0.750
0.760
0.760
0.760
0.760

So while there is some saving, this is not as good as implementing
richcompare.

> I don't see any reason "in principle" that compares couldn't be much
> faster, and via the usual gimmicks: bigger, smarter functions that
> remember what they've already determined so don't need to figure it
> out over and over again, and fast paths to favor common cases at the
> expense of comparisons from Mars.

I agree "in principle" :-) However, you cannot move the case "equal
types, implementing tp_compare" before the case "one of them
implements tp_richcompare" without changing the semantics. 

The change here is what you'd do when you have both richcmp and
oldcomp; Python clearly mandates using richcmp. In case this is not
obvious (it wasn't to me): UserList will complain about using the
deprecated __cmp__, and dictionaries will iterate over their elements
differently.

Given that richcomp has to be tried first, this patch does the "common
case" at the earliest possible time, and with no overhead, except for
PyErr_Occurred call.

So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT
TP_RICHCOMPARE (sorry for shouting). If you think the extra work for
type implementors is not acceptable, we can offer a convenience
function that everybody implementing tp_compare can put into
tp_richcompare. For strings, I would still special-case
tp_richcompare: when tracing calls to string_richcompare, I found that
most calls with Py_EQ can be decided by checking that the string
lengths are not equal. This is all "bigger, faster functions" put to
work.

Regards,
Martin

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -u -r2.131 object.c

--- object.c	2001/05/11 03:36:45	2.131
+++ object.c	2001/05/15 06:16:53
@@ -477,16 +477,6 @@
 	if (PyInstance_Check(w))
 		return (*w->ob_type->tp_compare)(v, w);
 
-	/* If the types are equal, don't bother with coercions etc. */
-	if (v->ob_type == w->ob_type) {
-		if ((f = v->ob_type->tp_compare) == NULL)
-			return 2;
-		c = (*f)(v, w);
-		if (PyErr_Occurred())
-			return -2;
-		return c < 0 ? -1 : c > 0 ? 1 : 0;
-	}
-
 	/* Try coercion; if it fails, give up */
 	c = PyNumber_CoerceEx(&v, &w);
 	if (c < 0)
@@ -590,15 +580,21 @@
    -1 if v < w;
     0 if v == w;
     1 if v > w;
+   If the object implements a tp_compare function, it returns
+   whatever this function returns (whether with an exception or not).
 */
 static int
 do_cmp(PyObject *v, PyObject *w)
 {
 	int c;
+	cmpfunc f;
 
 	c = try_rich_to_3way_compare(v, w);
 	if (c < 2)
 		return c;
+	if (v->ob_type == w->ob_type
+	    && (f = v->ob_type->tp_compare) != NULL)
+		return (*f)(v, w);
 	c = try_3way_compare(v, w);
 	if (c < 2)
 		return c;
@@ -760,16 +756,9 @@
 }
 
 static PyObject *
-try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+convert_3way_to_object(int op, int c)
 {
-	int c;
 	PyObject *result;
-
-	c = try_3way_compare(v, w);
-	if (c >= 2)
-		c = default_3way_compare(v, w);
-	if (c <= -2)
-		return NULL;
 	switch (op) {
 	case Py_LT: c = c <  0; break;
 	case Py_LE: c = c <= 0; break;
@@ -782,16 +771,46 @@
 	Py_INCREF(result);
 	return result;
 }
+	
 
 static PyObject *
+try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+{
+	int c;
+
+	c = try_3way_compare(v, w);
+	if (c >= 2)
+		c = default_3way_compare(v, w);
+	if (c <= -2)
+		return NULL;
+	return convert_3way_to_object(op, c);
+}
+
+static PyObject *
 do_richcmp(PyObject *v, PyObject *w, int op)
 {
 	PyObject *res;
+	cmpfunc f;
 
+
 	res = try_rich_compare(v, w, op);
 	if (res != Py_NotImplemented)
 		return res;
 	Py_DECREF(res);
+
+	/* If the types are equal, don't bother with coercions etc. 
+	   Instances are special-cased in try_3way_compare, since
+	   a result of 2 does *not* mean one value being greater
+	   than the other. */
+	if (v->ob_type == w->ob_type
+	    && !PyInstance_Check(v)
+	    && (f = v->ob_type->tp_compare) != NULL) {
+		int c;
+		c = (*f)(v, w);
+		if (PyErr_Occurred())
+			return NULL;
+		return convert_3way_to_object(op, c);
+	}
 
 	return try_3way_to_rich_compare(v, w, op);
 }