[issue34561] Replace list sorting merge_collapse()?

Tim Peters report at bugs.python.org
Wed Sep 5 01:55:10 EDT 2018


Tim Peters <tim at python.org> added the comment:

A new version of the file models a version of the `powersort` merge ordering too.  It clearly dominates timsort and 2-merge in all cases tried, for this notion of "cost".

Against it, its code is much more complex, and the algorithm is very far from obvious.  The paper "motivates" it but doesn't really explain it in enough detail to make most of it clear (but refers to other papers where pieces of it are developed).

Curious:  unless a run is the last in an array, powersort never merges it immediately upon finding it.  Instead its start and length are used to compute a "power", in turn used to decide whether to merge some previous number of runs.  A dollar to anyone who can figure out what the heck the "power" computation is really doing in less than a full day without reading the paper ;-)  Then two dollars if you can prove that the "never equal!" assert holds.

It would require timing lots of C code on various cases to see whether the possible delay in merging new runs really matters.  It's unclear to me a priori, because it buys something too (less memory copying overall).

In any case, just switching to 2-merge would be easy, and that alone is enough to squash the contrived "bad cases" for the current approach.  `powersort` would too, but may also actually help a bit in ordinary cases.  Or not.

----------
Added file: https://bugs.python.org/file47785/runstack.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34561>
_______________________________________


More information about the Python-bugs-list mailing list