[Python-ideas] a sorting protocol dunder method?

Steven D'Aprano steve at pearwood.info
Sun Dec 3 19:34:20 EST 2017


On Sun, Dec 03, 2017 at 03:06:02PM -0800, Chris Barker wrote:

> Recent python has moved toward a "key" function for customized sorting:
> 
> list.sort(key=key_fun)
> 
> key is also used (according to
> https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
> 
> min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
> 
> with this fairly broad use, it seems it's becoming a fairly universal
> protocol for ordering.

Be careful: there are two different concepts here, which are only 
loosely related:

- ordering values, that is, whether or not we can say that x < y

- sorting values in a collection.

By default, we sort by the inherent order of the values. But if the 
values have no inherent order (they are unordered), we can sort 
unordered items in a collection by providing an appropriate key 
function. Hence why I say they are loosely related.

For example, we can sort the normally unordered complex numbers by 
providing a key function:

py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: (z.real, z.imag))
[1j, (1+8j), (3-2j), (5+2j)]

But conceptually, I'm imposing an order on an otherwise unordered data 
type. Complex numbers inherently have no order: it makes no sense to say 
that 1+8j is less than 3-2j. But since the items in the list have to be 
in *some* one-dimensional order, I can choose whichever order makes 
sense for *this* collection:

py> sorted([1+8j, 0+1j, 5+2j, 3-2j], key=lambda z: abs(z))
[1j, (3-2j), (5+2j), (1+8j)]

Another collection of the same values might be ordered differently. It 
doesn't make sense to put that functionality into the complex numbers 
themselves: complex numbers are unordered, and any order we impose on 
them comes from the collection, not the individual numbers.


> However, if you are writing a custom class, and want to make it "sortable",
> you need to define (some of) the total comparison operators, which
> presumably are then called O(n logn) number of times for comparisons when
> sorting.
> 
> Or provide a sort key function when you actually do the sorting, which
> requires some inside knowledge of the objects you are sorting.

This is conflating the two distinct concepts: the comparison operators 
apply to the values in the collection; the key function applies to the 
collection itself (although it does need to have inside knowledge of the 
items in the collection).


> But what if there was a sort key magic method:
> 
>  __key__ or __sort_key__ (or whatever)
> 
> that would be called by the sorting functions if:
> 
> no key function was specified
> 
> and
> 
> it exists
> 
> It seems this would provide a easy way to make custom classes sortable that
> would be nicer for end users (not writing key functions), and possibly more
> performant in the "usual" case.

I'm not sure how you can talk about performance of the __key__ method
without knowing the implementation :-)

If this __key__ method is called like __lt__, then the big O() behaviour 
will be the same, worst case, O(n log n). If it is called like the key 
function, then the big O() behaviour will be the same as the key 
function now. Either way, you're just changing the name of the function 
called, not how it is called.


> In fact, it's striking me that there may well be classes that are defining
> the comparison magic methods not because they want the objects to "work"
> with the comparison operators, but because that want them to work with sort
> and min, and max, and...

It is conceivable, I suppose, but if I were designing an unordered data 
type (like complex) I wouldn't implement ordering operators to allow 
sorting, I'd provide a separate key function.

But that assumes that there's only ever one way to order a collection of 
unordered items. I don't think that's a safe assumption. Again, look at 
complex above, there are at least three obvious ways:

- order by magnitude;
- order by real part first, then imaginary part;
- order by the absolute values of the real and imaginary parts 
  (so that 1+2j and -1-2j sort together).

I don't think it makes sense to bake into an unodered data type a single 
way of ordering. If there was such a natural order, then the data type 
wouldn't be unordered and you should just define the comparison 
operators.


> hmm, perhaps a __key__ method could even be used by the comparison
> operators, though that could result in pretty weird results when comparing
> two different types.

If the comparison operators fell back to calling __key__ when __lt__ etc 
aren't defined, that would effectively force unordered types like 
complex to be ordered.


> So: has this already been brought up and rejected?
> 
> Am I imagining the performance benefits?

Probably.

> Is sorting-related functionally too special-case to deserve a protocol?

Yes.



-- 
Steve


More information about the Python-ideas mailing list