Guido rethinking removal of cmp from sort method

Antoon Pardon Antoon.Pardon at rece.vub.ac.be
Mon Mar 28 08:39:04 EDT 2011


On Fri, Mar 25, 2011 at 10:06:59PM +0000, Steven D'Aprano wrote:
> On Fri, 25 Mar 2011 10:21:35 +0100, Antoon Pardon wrote:
> 
> > On Thu, Mar 24, 2011 at 11:49:53PM +0000, Steven D'Aprano wrote:
> >> On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:
> >> 
> >> > However since that seems to be a problem for you I will be more
> >> > detailed. The original poster didn't ask for cases in which cmp was
> >> > necessary, he asked for cases in which not using cmp was cumbersome.
> >> 
> >> I'm the original poster, and that's not what I said. I said:
> >> 
> >> "If anyone has any use-cases for sorting with a comparison function
> >> that either can't be written using a key function, or that perform
> >> really badly when done so, this would be a good time to speak up."
> >> 
> >> You'll notice that I said nothing about whether writing the code was
> >> easy or cumbersome, and nothing about readability.
> > 
> > Well fine. I should have realised the question was just a pretense and
> > that there really never was any intention to consider the reactions,
> > because the answer is already fixed. Of course a key function can always
> > be written, it may just need a specific class to implement the specific
> > order. Likewise there is no reason to expect the order-functions to
> > preform worse when implemented in a class, rather than in a function.
> 
> The reason Guido is considering re-introducing cmp is that somebody at 
> Google approached him with a use-case where a key-based sort did not 
> work. The use-case was that the user had masses of data, too much data 
> for the added overhead of Decorate-Sort-Undecorate (which is what key 
> does), but didn't care if it took a day or two to sort.

The question is: Why do you need a use case for this? Should the decision
to remove the cmp-argument, rest only on the fact whether or not current
users are processing such masses of data without considering the possibility
of future users processing such masses of data?

Isn't it obvious that if you force the users into using more memory, that
sooner or later, there will be a user for which this demand of extra memory
will be a problem?

The question is also, whether this really is a use case. Why don't you
advise this person to buy a computer with more memory? One could simply
see this as a problem of too little resources instead of a problem for
python.

IMO, the dev-team can look at this in two different ways. (1) If the
machine has enough resource to contain enormous lists of data, then
it should be possible to sort these data with only little extra memory.
or (2) Sorting will take extra memory proportional to the length of the
list you will try to sort. If your computer doesn't have this extra
memory, your computer just lacks the resource to sort this large
data with python.

It is this design decision that should guide whether or not to have
a cmp-argument. Not the fact whether or not there are now users with
a data set that still fits into a python program on their computer
but for which this data set is too large for python to sort on that
computer.

> So there is at least one use-case for preferring slowly sorting with a 
> comparison function over key-based sorting. I asked if there any others. 
> It seems not.

How about the possibility of cases where it isn't a choice between slowly
sorting with a comparison function over a faster key-based sorting but
where the choice is over sorting with a comparison function over a slower
and more memory consuming key-based sorting?

I tried to sort lists of 10000 elemens. Each element a tuple two items that
was to be sorted first according to the first item in ascending order, then
according to the second item in descending order. This was done on python 2.6
so I had to write my own cmp_to_key function.

I then sorted these lists of 10000 elements a 1000 times with the following
methods.

1) a cmp-function
2) key = cmp_to_key
3) key = wrapper class around cmp-function
4) key = class specifically written to implement the ordering.

This was done, once with number tuples and and once with string tuples.
The result in seconds

  numbers   strings

1   63.8      74.3
2  155.3     169.6
3  156.1     170.2
4   96.2     104.4

Now you can of course ignore those numbers because they don't come from a real
world example and thus consider such numbers unimportant until someone in the
future has a case similar enough with the above and wonders why sorting his
data is so slow.



More information about the Python-list mailing list