[Python-Dev] Sorting
Tim Peters
tim.one@comcast.net
Mon, 22 Jul 2002 13:28:11 -0400
[Barry Warsaw]
> Except that in
>
> http://mail.python.org/pipermail/python-dev/2002-July/026837.html
>
> Tim says:
>
> "Back on Earth, among Python users the most frequent complaint
> I've heard is that list.sort() isn't stable."
Yes, and because the current samplesort falls back to a stable sort when
lists are small, almost everyone who cares about this and tries to guess
about stability by trying small examples comes to the wrong conclusion.
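Stability is easy to check mechanically: tag each element with its original index, sort on the key alone, and see whether equal keys kept their original order. A small sketch (the `is_stable_sort` helper is illustrative, not part of any stdlib; note that in modern CPython, `sorted()` and `list.sort()` are guaranteed stable, so this prints True):

```python
import random

def is_stable_sort(sort_fn, n=1000, distinct=4):
    # Records of (key, original_index); sort on the key only.
    data = [(random.randrange(distinct), i) for i in range(n)]
    out = sort_fn(data)
    # A stable sort must keep original indices increasing within equal keys.
    for a, b in zip(out, out[1:]):
        if a[0] == b[0] and a[1] > b[1]:
            return False
    return True

print(is_stable_sort(lambda d: sorted(d, key=lambda t: t[0])))
# prints True
```

A small n here would pass even for an unstable sort that special-cases short lists, which is exactly the trap described above.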
> and here
>
> http://mail.python.org/pipermail/python-dev/2002-July/026854.html
>
> Tim seems <wink> to be arguing against stable sort as being the
> default due to code bloat.
I'm arguing there against having two highly complex and long-winded sorting
algorithms in the core. Pick one. In favor of samplesort:
+ It can be much faster in very-many-equal-elements cases (note that
~sort lists have only 4 distinct values, each repeated N/4 times
and spread uniformly across the whole list).
+ While it requires some extra memory, that lives on the stack and
is O(log N). As a result, it can never raise MemoryError unless
a comparison function does.
+ It's never had a bug reported against it (so is stable in a different
sense <wink>).
In favor of timsort:
+ It's stable.
+ The code is more uniform and so potentially easier to grok, and
because it has no random component is easier to predict (e.g., it's
certain that it has no quadratic-time cases).
+ It's incredibly faster in the face of many more kinds of mild
disorder, which I believe are very common in the real world. As
obvious examples, you add an increment of new data to an already-
sorted file, or paste together several sorted files. timsort
screams in those cases, but they may as well be random to
samplesort, and the difference in runtime can easily exceed a
factor of 10. A factor of 10 is a rare and wonderful thing in
algorithm development.
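Those mild-disorder cases are easy to reproduce. A rough benchmark sketch (illustrative only; absolute timings depend on the machine and Python version, and modern CPython's sort is timsort, so all three cases are handled well today):

```python
import random
import timeit

N = 100_000
# Already-sorted data with a small increment of new items appended.
sorted_plus_tail = list(range(N)) + [random.randrange(N) for _ in range(10)]
# Several sorted runs pasted together.
pasted_runs = (sorted(random.sample(range(N), N // 2))
               + sorted(random.sample(range(N), N // 2)))
# Fully random, for comparison.
shuffled = list(range(N))
random.shuffle(shuffled)

for name, data in [("sorted+tail", sorted_plus_tail),
                   ("pasted runs", pasted_runs),
                   ("random", shuffled)]:
    t = timeit.timeit(lambda: sorted(data), number=10)
    print(f"{name:12s} {t:.3f}s")
```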
Against timsort:
+ It can require O(N) temp storage, although the constant is small
compared to object sizes. That means it can raise MemoryError
even if a comparison function never does.
+ Very-many-equal-elements cases can be much slower, but that's partly
because it *is* stable, and preserving the order of equal elements
is exactly what makes stability hard to achieve in a fast sort
(samplesort can't be made stable efficiently).
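Preserving the order of equal elements is also what makes multi-key sorting by successive passes work. A sketch relying on the stability guarantee that modern CPython's list.sort() provides (the names here are made-up sample data):

```python
people = [("smith", "bob"), ("jones", "ann"), ("smith", "ann")]
people.sort(key=lambda p: p[1])  # secondary key first: first name
people.sort(key=lambda p: p[0])  # primary key last; stability keeps
                                 # the first-name order within each last name
print(people)
# prints [('jones', 'ann'), ('smith', 'ann'), ('smith', 'bob')]
```

With an unstable sort, the second pass could scramble the ("smith", ...) entries, and the first pass would have been wasted.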
> As Tim's Official Sysadmin, I'm only good at channeling him on one
> subject, albeit probably one he'd deem most important to his life:
> lunch. So I'm not sure if he's arguing for or against stable sort
> being the default. ;)
All else being equal, a stable sort is a better choice. Alas, all else
isn't equal. If Python had no sort method now, I'd pick timsort with scant
hesitation. Speaking of which, is it time for lunch yet <wink>?