[Python-ideas] a sorting protocol dunder method?
Steven D'Aprano
steve at pearwood.info
Mon Dec 4 08:16:14 EST 2017
On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote:
> I think you're arguing against this for the wrong reason. Chris was talking
> about custom classes having the *option* of making them sortable by
> providing a key method in the class definition.
I never imagined that it would be a required method. Of course it is
optional. But by adding interpreter support, we're blessing something
which I think is a misguided design choice as the One Obvious Way.
We're taking something which belongs in the report generator or
collection, the knowledge of how to sort a collection of unordered
values, and baking it into the values themselves. (Effectively making
them ordered!)
That's the wrong design. Your report needs to know about your values,
your values shouldn't need to know how the report is formatted.
Its like saying that you want an Email object, and a SMTP_Server object,
but to make it more convenient for the SMTP_Server object, we should
give the Email objects themselves a method that knows how to talk to
port 25 and send themselves. Then the SMTP_Server just calls
email.send() on each method. How convenient.
The same applies here. Sure, it is convenient to just call
bank_accounts.sort() (to re-use the example given by Carl) and it
magically works, but as soon as your report changes and you want the
bank accounts sorted according to their account name instead of balance,
you have to either provide a key function, or change the __key__ method.
Obviously changing the __key__ method will break any other reports that
rely on it, so you end up using a key function anyway.
I would mind this less if it isn't blessed by the interpreter. There are
lots of classes which are excessively coupled to other things. I've
written a few of them myself, so I understand the temptation.
Sometimes that design might even be justified under "Practicality beats
purity". But I don't think this is one of those cases: I don't see this
as important enough or common enough to build it into the language as an
actual dunder method.
If people like Chris' idea, just add a sortkey() method to your class,
and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will
Just Work. It is explicit, backwards compatible and doesn't need to wait
for Python 3.8 or whenever this (mis)feature gets (hypothetically)
added.
> On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
> > Am I imagining the performance benefits?
> >
>
> Maybe. Looking strictly at O(-) cost, there's no difference between a key
> function and comparison operators. Sure it might potentially only make O(n)
> calls to the key function and O(n log n) calls to compare the keys vs. O(n
> log n) calls to the comparator functions but that might not actually be
> faster.
It is unlikely that calling a key function followed by key comparisons
would be faster than just calling the key comparisons. Using a key
function is effectively the old DSU idiom:
values = [(key(x), x) for x in values] # O(n)
values.sort() # O(n log n)
values = [t[1] for t in values] # O(n)
so you make two extra passes through the list. The only way that could
be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__
that it saves more than the cost of those two extra passes (plus the
overhead from dealing with the extra tuples).
You might be thinking of the old Python 1 and early Python 2 cmp
argument to sort. The comparator function can end up calling x.__lt__ up
to O(n**2) times if I remember correctly, so it is quite expensive.
> There certainly are cases where implementing a key function would
> be quite slow.
The biggest problem with a key function is that it trades off memory for
time. If you are constrained by memory, but don't care how slow your
sort is, an old-school comparison function might suit you better. But
for those cases, functools.cmp_to_key may help.
--
Steve
More information about the Python-ideas
mailing list