[issue35010] sort by partially reversed key tuple

Cyker Way report at bugs.python.org
Thu Oct 18 09:36:44 EDT 2018


Cyker Way <cykerway at gmail.com> added the comment:

Thank you very much for the great discussion here, especially Tim's great threads in *python-ideas* that give neat and insightful answers to this problem in different ways:

-   <https://mail.python.org/pipermail/python-ideas/2016-October/043045.html>

-   <https://mail.python.org/pipermail/python-ideas/2016-October/043126.html>

Since this topic is closed, future discussions probably should go to other python forums. But it might be good to draw some conclusions here for future reference.

First of all, either single-pass sort with a vector key or multi-pass sort with a scalar key may work better, depending on the input. However, in most cases, using multi-pass sort for such problem is the right way to go in the current python implementation. The multi-pass sort algorithm typically runs 2x faster or so than a single-pass sort algorithm. This is likely due to constants rather than asymptotic complexity. But when measured in real time, multi-pass sort algorithm clearly wins in most cases.

If your input is so special that it aligns much better with single-pass sort algorithms (overwhelming the constant advantage of multi-pass sort algorithm), you may use a single-pass sort algorithm. But there are actually different ways of implementing so. The algorithm posted in Tim's second thread on python-ideas is in fact different from mine in this bug thread, where Tim used a wrapper class for the keys and I used a wrapper class for the scalars. Since there are `n` keys but can be as many as `n * m` scalars, my method would be using more wrapper objects. So I expected it to run slower than Tim's. To my surprise, sometimes it works better. The reason is later found to be easy to understand: Wrapper objects are created only when a column needs to be reversed. The number of columns to be reversed is actually a factor controlling the running time. To give a fair evaluation, I created 500000 random rows with 8 columns and control the number of columns to be reversed. The evaluation shows my algorithm could have running time 80%-120% that of Tim's (excluding the special case where no column needs to be reversed). This shows a new direction of optimizing such algorithms.

Finally, this issue was rejected because the added benefits were deemed not enough to complicate the implementation. Considering the current multi-pass sort algorithm has a huge advantage and is easy to implement in python, this decision is fair. Users who care less about performance may write a key adapter in their own code if they want to stick with builtin sort functions. Users who do care about performance can use single-pass sort techniques mentioned in this issue in case multi-pass sort doesn't work well with their data.

----------
Added file: https://bugs.python.org/file47880/performance-2.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35010>
_______________________________________


More information about the Python-bugs-list mailing list