[Python-Dev] Sorting

Tim Peters tim.one@comcast.net
Sat, 27 Jul 2002 04:02:48 -0400


    http://www.python.org/sf/587076

has collected timings on 5 boxes so far.  I also noted that msort() gets a
32% speedup on my box when sorting a 1.33-million line snapshot of the
Python-Dev archive.  This is a puzzler to account for, since you wouldn't
think there's significant pre-existing lexicographic order in a file like
that.  McIlroy noted similar results from experiments on text, PostScript
and C source files in his adaptive mergesort (which is why I tried sorting
Python-Dev to begin with), but didn't offer a hypothesis.

Performance across platforms is a hoot so far, with Neal's box even seeing a
~6% speedup on *sort.

Skip's Pentium III acts most like my Pentium III, which shouldn't be
surprising.  Ours are the only reports where !sort is faster than *sort for
samplesort, and also where ~sort under samplesort is faster than ~sort under
timsort.

~sort (only 4 distinct values, repeated N/4 times) remains the most puzzling
of the tests by far.  Relative to its performance under samplesort,

    sf userid    ~sort speedup under timsort (negative means slower)
    ---------    ---------------------------------------------------
    montanaro    -23%
    tim_one      - 6%
    jacobs99     +18%
    lemburg      +25%
    nascheme     +30%

Maybe it's a big win for AMD boxes, and a mixed bag for Intel boxes.  Or
maybe it's a win for newer boxes, and a loss for older boxes.  Or maybe it's
a bigger win the higher the clock rate (it hurt the most on the slowest box,
and helped the most on the fastest).  Since it ends up doing a sequence of
perfectly balanced merges from start to finish, I thought perhaps it has to
do with OS and/or chip intelligence in read-ahead cache optimizations -- but
*sort also ends up doing a sequence of perfectly balanced merges, and
doesn't behave at all like ~sort across boxes.  ~sort does exercise the
galloping code much more than other tests (*sort rarely gets into galloping
mode; ~sort never gets out of galloping mode), so maybe it really has most
to do with cache design.

Whatever, it's starting to look like a no-brainer -- except for the
extremely mixed ~sort results, the numbers so far are great.