[Python-Dev] Sorting

Tim Peters tim.one@comcast.net
Mon, 22 Jul 2002 00:01:48 -0400


Just FYI.  I ripped out the complications I added to the mergesort variant
that tried to speed many-equal-keys cases, and worked on its core competency
(intelligent merging) instead.  There's a reason <wink>:  this kick started
while investigating ways to speed Zope B-Tree operations when they're used
as sets, equal keys are impossible in that context, but intelligent merging
can really help.  So whatever the fate of this sort, some of the code will
live on in Zope's B-Tree routines.

The result is that non-trivial cases of near-order got a nice boost, while
~sort got even slower again.  I added a new test +sort, which replaces the
last 10 values of a sorted array with random values.  samplesort has a
special case for this, limited to a maximum of 15 trailing out-of-order
entries.  timsort has no special case for this but does it significantly
faster than the samplesort hack anyway, has no limit on how many such
trailing entries it can exploit, and couldn't care less whether such entries
are at the front or the end of the array; I expect it would be (just) a
little slower if they were in the middle.  As shown below, timsort does a
+sort almost as fast as for a wholly-sorted array.  Ditto now for 3sort too,
which perturbs order by doing 3 random exchanges in a sorted array.

It's become a very interesting sort implementation, handling more kinds of
near-order at demonstrably supernatural speed than anything else I'm aware
of.  ~sort isn't an example of near-order.  Quite the contrary, it has a
number of inversions quadratic in N, and N/4 runs; the only reason ~sort
goes faster than *sort now is-- believe it or not --a surprising benefit
from a memory optimization.

Key:
    *sort: random data
    \sort: descending data
    /sort: ascending data
    3sort: ascending, then 3 random exchanges
    +sort: ascending, then 10 random at the end
    ~sort: many duplicates
    =sort: all equal
    !sort: worst case scenario

C:\Code\python\PCbuild>python -O sortperf.py 15 20 1
samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.18   0.02   0.01   0.14   0.01   0.07   0.01   0.17
16   65536   0.24   0.02   0.02   0.22   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.13
19  524288   2.53   0.18   0.17   2.30   0.24   0.74   0.17   2.47
20 1048576   5.47   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.01   0.01   0.14   0.01   0.02
16   65536   0.23   0.02   0.02   0.03   0.02   0.21   0.03   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.46   0.04   0.09
18  262144   1.16   0.09   0.09   0.12   0.09   1.01   0.08   0.18
19  524288   2.53   0.18   0.17   0.18   0.18   2.20   0.17   0.36
20 1048576   5.48   0.36   0.35   0.36   0.37   4.78   0.35   0.73