on sorting things

Chris Angelico rosuav at gmail.com
Thu Dec 19 16:23:44 EST 2019


On Fri, Dec 20, 2019 at 8:06 AM Eli the Bearded <*@eli.users.panix.com> wrote:
>
> In comp.lang.python, Peter Otten  <__peter__ at web.de> wrote:
> > Eli the Bearded wrote:
> >> But what caught my eye most, as someone relatively new to Python but
> >> with long experience in C in Perl, is sorting doesn't take a
>
> s/C in /C and/
>
> Ugh.
>
> >> *comparison* function, it takes a *key generator* function, and that
> >> function is supposed to transform the thing into something that the
> >> native comparison knows how to compare.
> >>
> >> This seems a strange choice, and I'm wondering if someone can explain
> >> the benefits of doing it that way to me.
> >
> > Python 2 started with a comparison function and then grew a key function.
> > With a key function you still have to compare items, you are just breaking
> > the comparison into two steps:
>
> [snip]
>
> Thanks for that good explanation. The benchmark comparison makes it
> very thorough.
>
> In my mind I gravitate towards the complicated sorts of sort that can be
> quickly compared for some sorts of keys and not as quickly for others.
>
> Consider a sort that first compares file size and if the same number of
> bytes, then compares file checksum. Any decently scaled real world
> implementation would memoize the checksum for speed, but only work it out
> for files that do not have a unique file size. The key method requires
> it worked out in advance for everything.
>
> But I see the key method handles the memoization under the hood for you,
> so those simpler, more common sorts of sort get an easy to see benefit.
>

I guess that's a strange situation that might actually need this kind
of optimization, but if you really do have that situation, you can
make a magical key that behaves the way you want.

class SizeChecksum:
    def __init__(self, fn):
        self.size = os.stat(fn).st_size
        self._checksum = None
    @property
    def checksum(self):
        if self._checksum is not None: return self._checksum
        ...
    def __lt__(self, other):
        if self.size != other.size: return self.size < other.size
        return self.checksum < other.checksum

I can't remember exactly which comparison operators list.sort() uses,
so you'd want to add another function here for an equality check, or
maybe <=, or whichever is used. In any case, what matters is that the
weird situations can still be handled, albeit by largely falling back
on comparison semantics. But for the vast majority of situations, all
you need is a key function that returns a number, a string, or a tuple
of numbers and strings.

ChrisA


More information about the Python-list mailing list