Sorting (was RE: A better self)

Tim Peters tim.one at comcast.net
Sun Aug 4 00:06:35 EDT 2002


[Terry Reedy]
> Few of us appreciate the days that Tim Peters spent writing a portable
> sort program that is significantly better than standard quicksort for
> several classes of input (such as a sorted list with new items
> appended to the end).

The samplesort hybrid that went into Python 1.5.2 is actually significantly
faster than Python's best previous quicksort on all inputs ever tested.
See?  I'm more underappreciated than even you realize <wink>.  quicksort
isn't really a good way to sort when comparisons are very expensive compared
to the cost of swapping elements (as is true in Python).

> He is currently working on another that may be even better.  (Three
> horn toots for Tim.)

That's checked in now (yay!), and will be "the sort" in Python 2.3.
Extensive information about it can be found in the patch report and its
attachments:

     http://www.python.org/sf/587076

Finn Bock was so encouraged by the numbers that he already recoded it in
Java for Jython (and did a wonderful porting job -- made me jealous I had to
code it in C!).  Jython *was* using Python 1.5's best-ever quicksort
algorithm, not 1.5.2's much hairier samplesort hybrid.  To get a feel for
the improvement from 1.5's quicksort (a hybrid of median-of-3 quicksort and
insertion sort) to this one, I'll include the Jython numbers Finn posted to
Python-Dev.

This is output from Python's Lib/test/sortperf.py when run with arguments
"15 19 1".  *sort is trying random data, and all the others have *some* kind
of structure in the data.  For example, /sort is an already-sorted array,
\sort a reverse-sorted array, %sort takes a sorted array and replaces a
randomly-selected 1% of its entries with new random values, and !sort was
hand-crafted as a bad case for 1.5's quicksort algorithm (BTW, there are no
*known* bad cases for the samplesort hybrid, although in theory some must
exist; the new 2.3 sort is N log N worst case, so has no bad cases even in
theory).  For the meaning of the other columns, see the comments in
sortperf.py.

The entries in the table are in seconds, and each line is testing lists with
2**i elements.  "timsort" is the name I gave to the new algorithm in a burst
of modesty <wink>:

"""
quicksort/insertionsort:

 i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   0.66   0.50   0.30   0.29   0.40   0.42   0.32   0.28   2.53
16   1.36   1.10   0.67   0.67   0.94   1.05   0.75   0.60   6.12
17   3.12   2.38   1.47   1.47   1.88   2.12   1.62   1.28  15.43
18   6.52   5.14   3.22   3.22   4.52   5.56   3.35   2.73  36.04
19  14.32  11.07   6.99   6.99   8.71  11.72   7.33   5.86  87.80

timsort:

 i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   0.44   0.05   0.03   0.03   0.04   0.06   0.17   0.02   0.06
16   0.82   0.08   0.06   0.07   0.07   0.11   0.32   0.05   0.11
17   1.76   0.18   0.13   0.13   0.13   0.23   0.64   0.11   0.22
18   3.87   0.34   0.26   0.29   0.27   0.49   1.29   0.21   0.45
19   8.91   0.70   0.53   0.54   0.54   1.07   2.62   0.43   0.90
"""

The C implementations are quite a bit faster than this, but I have to hand
it to Java:  if someone ported this algorithm to Python, you might not wait
for the results <wink>.  There are lots of timings from different platforms
in the patch report, comparing the C implementations of the samplesort
hybrid and the new algorithm.

highly-ordered-ly y'rs  - tim





More information about the Python-list mailing list