[Python-ideas] a sorting protocol dunder method?

Steven D'Aprano steve at pearwood.info
Mon Dec 4 08:16:14 EST 2017


On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote:

> I think you're arguing against this for the wrong reason. Chris was talking
> about custom classes having the *option* of making them sortable by
> providing a key method in the class definition.

I never imagined that it would be a required method. Of course it is 
optional. But by adding interpreter support, we're blessing something 
which I think is a misguided design choice as the One Obvious Way.

We're taking something which belongs in the report generator or 
collection, the knowledge of how to sort a collection of unordered 
values, and baking it into the values themselves. (Effectively making 
them ordered!)

That's the wrong design. Your report needs to know about your values, 
your values shouldn't need to know how the report is formatted.

Its like saying that you want an Email object, and a SMTP_Server object, 
but to make it more convenient for the SMTP_Server object, we should 
give the Email objects themselves a method that knows how to talk to 
port 25 and send themselves. Then the SMTP_Server just calls 
email.send() on each method. How convenient.

The same applies here. Sure, it is convenient to just call 
bank_accounts.sort() (to re-use the example given by Carl) and it 
magically works, but as soon as your report changes and you want the 
bank accounts sorted according to their account name instead of balance, 
you have to either provide a key function, or change the __key__ method. 
Obviously changing the __key__ method will break any other reports that 
rely on it, so you end up using a key function anyway.

I would mind this less if it isn't blessed by the interpreter. There are 
lots of classes which are excessively coupled to other things. I've 
written a few of them myself, so I understand the temptation.

Sometimes that design might even be justified under "Practicality beats 
purity". But I don't think this is one of those cases: I don't see this 
as important enough or common enough to build it into the language as an 
actual dunder method.

If people like Chris' idea, just add a sortkey() method to your class, 
and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will 
Just Work. It is explicit, backwards compatible and doesn't need to wait 
for Python 3.8 or whenever this (mis)feature gets (hypothetically) 
added.


> On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> 
> > Am I imagining the performance benefits?
> >
> 
> Maybe. Looking strictly at O(-) cost, there's no difference between a key
> function and comparison operators. Sure it might potentially only make O(n)
> calls to the key function and O(n log n) calls to compare the keys vs. O(n
> log n) calls to the comparator functions but that might not actually be
> faster.

It is unlikely that calling a key function followed by key comparisons 
would be faster than just calling the key comparisons. Using a key 
function is effectively the old DSU idiom:

    values = [(key(x), x) for x in values]  # O(n)
    values.sort()                           # O(n log n)
    values = [t[1] for t in values]         # O(n)

so you make two extra passes through the list. The only way that could 
be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__ 
that it saves more than the cost of those two extra passes (plus the 
overhead from dealing with the extra tuples).

You might be thinking of the old Python 1 and early Python 2 cmp 
argument to sort. The comparator function can end up calling x.__lt__ up 
to O(n**2) times if I remember correctly, so it is quite expensive.


> There certainly are cases where implementing a key function would
> be quite slow.

The biggest problem with a key function is that it trades off memory for 
time. If you are constrained by memory, but don't care how slow your 
sort is, an old-school comparison function might suit you better. But 
for those cases, functools.cmp_to_key may help.



-- 
Steve


More information about the Python-ideas mailing list