[Python-Dev] RE: companies data for sorting comparisons

Tim Peters tim.one@comcast.net
Sun, 28 Jul 2002 13:52:33 -0400


Turns out there was one comparison per merge step I wasn't extracting
maximum value from.  Changing the code to suck all I can out of it doesn't
make a measurable difference on sortperf results, except for a tiny
improvement on ~sort on my box, but makes a difference on the Exchange case
of Kevin's data.  Here using

    values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]

as the list to sort, and times are again in milliseconds:

Sorting on field 'Address' -- 6589 records
    via  sort:  41.24  41.39  41.41  41.42  86.71
    via msort:  42.90  43.01  43.07  43.15  43.75

Sorting on field 'Company' -- 6635 records
    via  sort:  40.24  40.34  40.42  40.43  42.58
    via msort:  30.42  30.45  30.58  30.66  30.66

Sorting on field 'Exchange' -- 6579 records
    via  sort:  59.64  59.70  59.71  59.72  59.81
    via msort:  27.06  27.11  27.19  27.29  27.54

Sorting on field 'NumberOfEmployees' -- 6531 records
    via  sort:  47.61  47.65  47.73  47.75  47.76
    via msort:  48.55  48.57  48.61  48.73  48.92

Sorting on field 'Phone' -- 6589 records
    via  sort:  48.00  48.03  48.32  48.32  48.39
    via msort:  49.60  49.64  49.68  49.79  49.85

Sorting on field 'Profile' -- 6635 records
    via  sort:  58.63  58.70  58.80  58.85  58.92
    via msort:   8.47   8.48   8.51   8.59   8.68

Sorting on field 'Symbol' -- 6635 records
    via  sort:  39.93  40.13  40.16  40.28  41.37
    via msort:   6.20   6.23   6.23   6.43   6.98

Sorting on field 'Web' -- 6632 records
    via  sort:  46.75  46.77  46.86  46.87  47.05
    via msort:  36.44  36.66  36.69  36.69  36.96

'Profile' is slower than the rest for samplesort because the strings it's
comparing are Yahoo URLs with a long common prefix -- the compares just take
longer in that case.  I'm not sure why 'Exchange' takes so long for
samplesort (it's a case with lots of duplicate primary keys, but the
distribution is highly skewed, not uniform as in ~sort).  In all cases now,
msort is a major-to-killer win, or a small (but real) loss.

I'll upload a new patch and new timsort.txt next.  Then I'm taking a week
off!  No, I wish it were for fun <wink/sigh>.