[Python-ideas] a sorting protocol dunder method?

Barry Scott barry at barrys-emacs.org
Wed Dec 6 14:47:22 EST 2017



> On 5 Dec 2017, at 01:06, Chris Barker <chris.barker at noaa.gov> wrote:
> 
> wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-)
> 
> This could be a good idea -- just putting it here for the record as it's mentioned elsewhere.
> 
> I can't think of a way to profile this easily -- we know that having a key function can be helpful, but that doesn't take into account the extra method lookup -- maybe a key function that involves a method lookup??
> 
> If it defines both, it isn't clear which will be used for sorting.
> Should __lt__ take priority, or __key__? Whichever we choose, somebody
> is going to be upset and confused by the choice.
> 
> __sort_key__ would take priority -- that is a no brainer, it's the sort key, it's used for sorting. And __lt__ giving a different result is no more surprising, and probably less surprising, than total ordering being violated in  any other way.

If by no brainer you mean the performance of __sort-key__ is always better of __lt__ then I will wask for a proof in the form of benchmarks with enough use-case coverage.

> [I got very similar results as Barry with his version: about 5X faster with the key function]
> 
> def outer_key(item):
>     return item.key()
> 
> so we get a function lookup each time it's used.
> 
> However, I'm confused by the results -- essentially NO Change. That extra method lookup is coming essentially for free. And this example is using a tuple as the key, so not the very cheapest possible to sort key.
> 
> Did I make a mistake? is that lookup somehow cached?
> 
> In [36]: run sort_key_test.py
> 10000
> key       0.012529s  10000 calls
> outer_key 0.012139s  10000 calls
> lt        0.048057s 119877 calls
> 
> each run gives different results, but the lt method is always on order of 5X slower for this size list. Sometimes out_key is faster, mostly a bit slower, than key.
> 
> Also, I tried making a "simpler" __lt__ method:
> 
> return (self.value1, self.value2) < (other.value1, other.value2)
> 
> but it was bit slower than the previous one -- interesting.

This is more expensive to execute then my version for 2 reasons.
1) my __lt__ did not need to create any tuples.
2) my __lt__ can exit after only looking at the value1's

> 
> Then I tried a simpler (but probably common) simple attribute sort:
> 
>     def __lt__(self, other):
>         global lt_calls
>         lt_calls += 1
> 
>         return self.value1 < other.value1
> 
>     def key(self):
>         global key_calls
>         key_calls += 1
> 
>         return self.value1
> 
> And that results in about a 6X speedup
> 
> In [42]: run sort_key_test.py
> 10000
> key       0.005157s  10000 calls
> outer_key 0.007000s  10000 calls
> lt        0.041454s 119877 calls
> time ratio: 5.922036784741144
> 
> 
> And, interestingly (t me!) there is even a performance gain for only  a 10 item list! (1.5X or so, but still)

My guess is that this is because the __lt__ test on simple types is very fast in python.

Barry


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20171206/6c5352d6/attachment.html>


More information about the Python-ideas mailing list