[Python-Dev] Comparison speed

Tim Peters tim.one@home.com
Tue, 15 May 2001 05:26:32 -0400


[Martin v. Loewis]
> Is this really exactly what Python would guarantee? I'm surprised that
> x==x would always be true, but x!=x might be true also. In a type where
> x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> I had expected that you'd reduced it to
>
>   if (v == w && op == Py_EQ) /* then return Py_True */
>   if (v == w && op == Py_NE) /* then return Py_False */

I agree that would be more analogous to what PyObject_Compare() does.

I'm not sure either makes sense for rich comparisons; for example, under
IEEE-754 rules, a NaN must compare not-equal to everything, including
itself(!), and richcmps are the only hope Python users have of modeling that.
Doing those pointer checks before giving richcmps a chance would kill that
hope.  Can we agree to drop this one until somebody produces stats saying
it's important?  I have no reason to suspect that it is.
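
To make the objection concrete, here's a minimal sketch (mine, not
Martin's actual patch, and untested) of such an identity shortcut
sitting in front of the richcmp dispatch:

        /* Identity shortcut for Py_EQ/Py_NE, tried before the type's
           tp_richcompare gets a look at the operands.  Any type that
           wants NaN-like objects (x != x) is out of luck:  the answer
           is forced right here. */
        if (v == w) {
                if (op == Py_EQ) {
                        Py_INCREF(Py_True);
                        return Py_True;
                }
                if (op == Py_NE) {
                        Py_INCREF(Py_False);
                        return Py_False;
                }
        }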

> The one application where this may help is list_contains, in
> particular when searching a list of interned strings.

string_compare() could special-case pointer equality too, although I suspect
doing so would be a net loss.
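
For reference, the special case in question amounts to one extra test
at the top of string_compare() -- a sketch only, nothing I've measured:

        static int
        string_compare(PyStringObject *a, PyStringObject *b)
        {
                if (a == b)
                        return 0;       /* same object => trivially equal */
                /* ... existing first-char/length/memcmp logic unchanged ... */
        }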

> Please have a look at the patch below.

I will, but not tonight anymore -- it's been a very long day.

> ...
> I agree "in principle" :-) However, you cannot move the case "equal
> types, implementing tp_compare" before the case "one of them
> implements tp_richcompare" without changing the semantics.

Of course.  But except for instance objects, answering "does the type
implement tp_richcompare?" is one lousy pointer check, and the answer will
usually be-- provided we don't start stuffing code into *every* object's
tp_richcompare slot! --"no, so I can go to tp_compare immediately".
Coercions and richcmps are the oddball cases today.

> The change here is what you'd do when you have both richcmp and
> oldcomp; Python clearly mandates using richcmp.

Yes, except you don't usually have both today and reality is exploitable
<wink>.

> In case this is not obvious (it wasn't to me): UserList will complain
> about using the deprecated __cmp__,

Sounds like a bug to me; if cmp is deprecated, that's also news to me.

> and dictionaries will iterate over their elements differently.

dicts didn't have a tp_richcompare slot before I added it last week; I added
it because dicts can do a much faster and more-general job on Py_EQ and Py_NE
than dict cmp (but on nothing else).  I originally took away the tp_compare
slot for dicts and lived to regret it -- it has both now.
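
Roughly, the shape of it (a sketch, not the committed code; dict_equal
here is just a stand-in name for the key-by-key equality helper):

        static PyObject *
        dict_richcompare(PyObject *v, PyObject *w, int op)
        {
                int eq;
                PyObject *res;

                if (op != Py_EQ && op != Py_NE) {
                        /* Orderings still go through tp_compare. */
                        Py_INCREF(Py_NotImplemented);
                        return Py_NotImplemented;
                }
                eq = dict_equal((PyDictObject *)v, (PyDictObject *)w);
                if (eq < 0)
                        return NULL;    /* comparing some value pair failed */
                if (op == Py_NE)
                        eq = !eq;
                res = eq ? Py_True : Py_False;
                Py_INCREF(res);
                return res;
        }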

> Given that richcomp has to be tried first, this patch does the "common
> case" at the earliest possible time, and with no overhead, except for
> PyErr_Occurred call.

The earliest *reasonable* time would be after a short block of new pointer
checks while still inside PyObject_RichCompare():  I believe the usual case
today is that the objects are of the same type, the type doesn't have a
tp_richcompare slot, but does have a tp_compare slot.  This covers at least
ints, floats, longs and strings, where the overhead of a single function call
is most often larger than the time it actually takes to compare the darned
things.  It's not important to, e.g., get to a dict comparison quickly,
because comparing dicts is darned expensive even after we find the dict
comparison routine.  Ditto comparing instances or matrices etc.  Optimizing
for richcmps is optimizing the less important thing.
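
Concretely, something along these lines (an untested sketch, with the
declarations left informal) while still inside PyObject_RichCompare():

        /* Usual case:  same type, no tp_richcompare, but an old-style
           tp_compare -- ints, floats, longs and strings all land here.
           Call tp_compare directly and map its -1/0/+1 result onto the
           requested operator, skipping the coercion/richcmp machinery. */
        if (v->ob_type == w->ob_type
            && !PyInstance_Check(v)     /* per the instance-object caveat */
            && v->ob_type->tp_richcompare == NULL
            && v->ob_type->tp_compare != NULL) {
                int c = (*v->ob_type->tp_compare)(v, w);
                PyObject *res;
                /* -1 may mean "less than" or "error"; check for the latter */
                if (c == -1 && PyErr_Occurred())
                        return NULL;
                switch (op) {
                case Py_LT: c = c <  0; break;
                case Py_LE: c = c <= 0; break;
                case Py_EQ: c = c == 0; break;
                case Py_NE: c = c != 0; break;
                case Py_GT: c = c >  0; break;
                case Py_GE: c = c >= 0; break;
                }
                res = c ? Py_True : Py_False;
                Py_INCREF(res);
                return res;
        }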

BTW, tuples have a richcompare slot today and it's unclear that's a good
idea.  They do the same kind of Py_EQ/Py_NE "length check" you like for
strings, and I'd be surprised if that didn't cost more than it saves.  Unlike
strings, whenever I compare tuples they *always* have the same size (e.g.,
think of all the decorator pattern ways tuples are used to augment sorts).

OK, across a full run of the test suite, tuplerichcompare() was called about
162000 times, all but about 50 times with Py_EQ or Py_NE.  The number of
times this code block at the start bore fruit:

	if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) {
		/* Shortcut: if the lengths differ, the tuples differ */
		PyObject *res;
		if (op == Py_EQ)
			res = Py_False;
		else
			res = Py_True;
		Py_INCREF(res);
		return res;
	}

was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code
just burned cycles.  I want to move toward optimizations that save more than
they cost <0.7 wink>.

> ...
> For strings, I would still special-case tp_richcompare: when tracing
> calls to string_richcompare, I found that most calls with Py_EQ can
> be decided by checking that the string lengths are not equal.

I expect you'd also find that the current string_compare() usually decides
they're not equal on the first character comparison (which *it*
special-cases).  So special-casing on length isn't a clear win over what's
already done.  But, if it is, bravo!  Special-case the snot out of it without
calling *any* string functions (merely calling string_richcompare likely
costs a good deal more than comparing the lengths).
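
For example (sketch only, poking directly at the PyStringObject fields
and assuming both operands are already known to be strings):

        if (op == Py_EQ || op == Py_NE) {
                PyStringObject *a = (PyStringObject *)v;
                PyStringObject *b = (PyStringObject *)w;
                /* Different lengths, or different first bytes, settle
                   Py_EQ/Py_NE without any function call at all. */
                if (a->ob_size != b->ob_size
                    || (a->ob_size > 0 && a->ob_sval[0] != b->ob_sval[0])) {
                        PyObject *res = (op == Py_NE) ? Py_True : Py_False;
                        Py_INCREF(res);
                        return res;
                }
        }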

more-measuring-less-guessing-ly y'rs  - tim