[Python-Dev] decorate-sort-undecorate

Thu Oct 16 10:25:54 EDT 2003

[Just]
> On the sf tracker item you [Raymond] write:
>
>     def sort(self, cmp=None, key=None, reverse=None):
>         if cmp is not None and key is not None:
>             cmp = cmpwrapper(cmp)
>         if key is not None:
>             self[:] = [sortwrapper(key(x), x) for x in self]
>         if reverse is not None:
>             self.reverse()
>         self.sort(cmp)
>         if key is not None:
>             self[:] = [x.getvalue() for x in self]
>         if reverse is not None:
>             self.reverse()
>
> Is there consensus at all about the necessity of that first reverse
> call? To me it's not immediately obvious that the reverse option
> should maintain the _original_ stable order. In my particular
> application I would actually want reverse to do just that: reverse
> the result of the sort. Easy enough to work around of course: I could
> do the reverse myself after the sort. But it does feel odd: sort()
> now _has_ a reverse feature, but I can't use it...

"reverse" here is being used in the sense of "flip the sense of the cmp()
result", so that instead of using cmp(x, y), it (conceptually) uses the
negation of cmp(x, y).  This swaps "less than" with "greater than" outcomes,
but leaves "equal" outcomes alone.  In that sense, Raymond's is a clever and
correct implementation.  I don't know that it helps Alex's use case, though
(multi-key sort where some keys want ascending and others descending; those
are still tricky to write directly in one bite, although the reverse
argument makes them easy to do by chaining sorts one key at a time).

> (Also: how does timsort perform when fed a (partially) sorted list
> compared to a reversed sorted list?

I'll need a concrete example to figure out exactly what that's intended to
mean.  The algorithm is equally happy with descending runs as with ascending
runs, although the former need a little time to transform them to ascending
runs, and the all-equal case counts as an ascending run.

> If there's a significant difference there, than that first reverse
> call may actually hurt performance in some cases. Not that I care
> much about that...)

Say we're doing [1, 2, 3].sort(reverse=True).  Raymond first reverses it:

    [3, 2, 1]

In one pass, using two compares (N-1 for a list of length N), the algorithm
recognizes that the whole thing is a single descending run.  It then
reverses it in one pass (swapping elements starting at both ends and moving
toward the middle):

   [1, 2, 3]

and it's done.  Raymond then reverses it again:

   [3, 2, 1]

So there are 3 reversals in all.  Reversals are cheap, since they just swap
pointers in a tight little C loop, and never call back into Python.