A Sort Optimization Technique: decorate-sort-dedecorate

Tim Peters tim.peters at gmail.com
Tue Aug 29 15:25:18 EDT 2006


[Joachim Durchholz]
>>> Wikipedia says it's going from 2NlogN to N. If a sort is massively
>>> dominated by the comparison, that could give a speedup of up to 100%
>>> (approximately - dropping the logN factor is almost irrelevant, what
>>> counts is losing that factor of 2).

[Gabriel Genellina]
>> In fact it's the other way - losing a factor of 2 is irrelevant,
>> O(2N)=O(N). The logN factor is crucial here.

[Joachim Durchholz]
> That's just a question of what you're interested in.
>
> If it's asymptotic behavior, then the O(logN) factor is a difference.
>
> If it's practical speed, a constant factor of 2 is far more relevant
> than any O(logN) factor.

Nope.  Even if you're thinking of base 10 logarithms, log(N, 10) > 2
for every N > 100.  Base 2 logarithms are actually most appropriate
here, and log(N, 2) > 2 for every N > 4.  So even if the "2" made
sense here (it doesn't -- see next paragraph), the log(N) term
dominates it for even relatively tiny values of N.

Now it so happens that current versions of Python (and Perl) use merge
sorts, where the worst-case number of comparisons is roughly N*log(N,
2), not Wikipedia's 2*N*log(N, 2) (BTW, the Wikipedia article
neglected to specify the base of its logarithms, but it's clearly
intended to be 2).  So that factor of 2 doesn't even exist in current
reality:  the factor of log(N, 2) is the /whole/ worst-case story
here, and, e.g., is near 10 when N is near 1000.  A factor of 10 is
nothing to sneeze at.

OTOH, current versions of Python (and Perl) also happen to use
"adaptive" merge sorts, which can do as few as N-1 comparisons in the
best case (the input is already sorted, or reverse-sorted).  Now I'm
not sure why anyone would sort a list already known to be sorted, but
if you do, DSU is a waste of time.  In any other case, it probably
wins, and usually wins big, and solely by saving a factor of (up to)
log(N, 2) key computations.

> (I'm not on the list, so I won't see responses unless specifically CC'd.)

Done.



More information about the Python-list mailing list