Guido rethinking removal of cmp from sort method

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Mar 30 22:13:53 EDT 2011


On Wed, 30 Mar 2011 11:06:20 +0200, Antoon Pardon wrote:

> As far as I can see, key will only produce significant speedups, if
> comparing items can then be completly done internally in the python
> engine without referencing user python code.

Incorrect. You don't even need megabytes of data to see significant 
differences. How about a mere 1000 short strings?


[steve at wow-wow ~]$ python2.6
Python 2.6.6 (r266:84292, Dec 21 2010, 18:12:50)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import shuffle
>>> data = ['a'*n for n in range(1000)]
>>> shuffle(data)
>>> from timeit import Timer
>>>
>>> t_key = Timer('sorted(data, key=lambda a: len(a))',
... 'from __main__ import data')
>>> t_cmp = Timer('sorted(data, cmp=lambda a,b: cmp(len(a), len(b)))',
... 'from __main__ import data')
>>>
>>> min(t_key.repeat(number=1000, repeat=5))
0.89357517051696777
>>> min(t_cmp.repeat(number=1000, repeat=5))
7.6032949066162109


That's almost ten times slower.

Of course, the right way to do that specific sort is:

>>> t_len = Timer('sorted(data, key=len)', 'from __main__ import data')
>>> min(t_len.repeat(number=1000, repeat=5))
0.64559602737426758

which is even easier and faster. But even comparing a pure Python key 
function to the cmp function, it's obvious that cmp is nearly always 
slower.

Frankly, trying to argue that cmp is faster, or nearly as fast, is a 
losing proposition. In my opinion, the only strategy that has even a 
faint glimmer of hope is to find a convincing use-case where speed does 
not matter.

Or, an alternative approach would be for one of the cmp-supporters to 
take the code for Python's sort routine, and implement your own sort-with-
cmp (in C, of course, a pure Python solution will likely be unusable) and 
offer it as a download. For anyone who knows how to do C extensions, this 
shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
alone function that can be imported. 

If you get lots of community interest in this, that is a good sign that 
the solution is useful and practical, and then you can push to have it 
included in the standard library or even as a built-in.

And if not, well, at least you will be able to continue using cmp in your 
own code.



-- 
Steven



More information about the Python-list mailing list