how to convert code that uses cmp to python3

Steven D'Aprano steve at pearwood.info
Fri Apr 8 07:56:39 EDT 2016


On Fri, 8 Apr 2016 05:35 pm, Antoon Pardon wrote:

> On 08-04-16 at 00:21, Chris Angelico wrote:
>> On Fri, Apr 8, 2016 at 6:56 AM, Antoon Pardon
>> <antoon.pardon at rece.vub.ac.be> wrote:
>>> That solution will mean I will have to do about 100% more comparisons
>>> than previously.
>> Try it regardless. You'll probably find that performance is fine.
>> Don't prematurely optimize!
>>
>> ChrisA
> 
> But it was already working and optimized. The python3 approach forces
> me to make changes to working code and make the performance worse.


What exactly is the problem here? Is it just that the built-in "cmp"
function is gone? Then define your own:

def cmp(a, b):
    """Return negative if a<b, zero if a==b, positive if a>b."""
    return (b < a) - (a < b)


That's pretty much how it works in terms of Python operators. It may be very
slightly different in some corner cases, but you may not notice unless
you're using some weird objects.
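
As a quick illustration (my own sketch, not anything from the 2.7 sources),
here is the replacement used with ordinary values, with sorting via
functools.cmp_to_key, and with two "weird" cases where neither a<b nor b<a
holds. The helper by_length_then_alpha and the sample data are made up for
the example.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Return negative if a<b, zero if a==b, positive if a>b."""
    return (b < a) - (a < b)


# Ordinary values behave as you would expect.
print(cmp(1, 2), cmp(2, 2), cmp(3, 2))  # -1 0 1


# Hypothetical old-style comparison function: shorter strings first,
# ties broken alphabetically.
def by_length_then_alpha(x, y):
    return cmp(len(x), len(y)) or cmp(x, y)


# Python 3's sorted() has no cmp= argument; cmp_to_key adapts the function.
words = ["pear", "fig", "apple", "kiwi"]
result = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(result)  # ['fig', 'kiwi', 'pear', 'apple']

# Corner cases: both (b < a) and (a < b) are False, so this cmp returns 0
# even though the operands are not equal.
nan = float("nan")
print(cmp(nan, 1.0), nan == 1.0)   # 0 False
print(cmp({1}, {2}), {1} == {2})   # 0 False
```

So a zero from this cmp means "neither is less than the other", which is
not always the same thing as "equal".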

If that's not good enough, you can copy the code from the built-in cmp from
the 2.7 release and make a C extension. Or just duplicate the built-in cmp
semantics even more closely. To do that, you have to look at the C code.

In Python 2.7, the built-in cmp is a thin wrapper around PyObject_Cmp.

https://hg.python.org/cpython/file/2.7/Python/bltinmodule.c

PyObject_Cmp does some error-checking, then calls PyObject_Compare:

https://hg.python.org/cpython/file/2.7/Objects/abstract.c

PyObject_Compare does some error-checking, then it checks for object
identity (as an optimization), then calls do_cmp:

https://hg.python.org/cpython/file/2.7/Objects/object.c

do_cmp has a bunch of logic to decide whether to use rich comparisons or the
legacy __cmp__ method, and then calls one of a bunch of functions. If you
care, I recommend that you read them yourself, because I'm not fluent with
C. But as near as I can tell, the logic is basically:

(1) if both arguments a and b are the same type, and they define __cmp__, 
    then return the result of type(a).__cmp__(a, b);

(2) otherwise try in this order a == b, a < b, a > b, and return
    the appropriate value for the first which succeeds (if any);

(3) otherwise (there are no comparison operators defined at all),
    if a and b are the same type, return 
    cmp(address of a, address of b) (yes, this is insane);

(4) if they're different types, there are a bunch of arbitrary rules 
    that decide which object comes first, all subject to change 
    without warning:

- None is smaller than everything;
- numbers are smaller than other things;
- otherwise compare type names;
- if the type names happen to be the same, or if the numeric values
  are incompatible, then compare the object addresses.


I've simplified a lot: there's extra handling for classic classes, and
warnings if __cmp__ doesn't return -1, 0 or 1, and lots of error checking.
In a nutshell, cmp in Python 2 is a tangled mess. I'm not surprised the
devs wanted to get rid of it.


-- 
Steven



