[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

David Mertz mertz at gnosis.cx
Sun Mar 5 12:51:57 EST 2017


On Sat, Mar 4, 2017 at 11:19 PM, Elliot Gorokhovsky <
elliot.gorokhovsky at gmail.com> wrote:

> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
>

Thanks for the hard work, this looks very promising.


> While listsort.txt talks about benchmarking different kinds of structured
> lists, instead of just random lists, the results here would hold in those
> cases just as well, because this makes individual comparisons cheaper,
> instead of reducing the number of comparisons based on structure).
>

This point seems wrong on its face.  The main point of Timsort is to avoid
comparisons in mostly-sorted lists.  In makes a lot of difference whether
the initial data is random or mostly-sorted.

I presume the type homogeneity check takes a fixed O(N) time to perform.
In the random case, the sort will take O(N * log N) comparisons.  As a
multiplier, naive comparisons are clearly more expensive than type check.
But whether that type checking overhead is worthwhile obviously depends on
the number of comparisons, which might be O(N) if the data is sorted.

Real world data tends to be mostly-sorted.  So it would be useful for your
benchmarks to include:

A) Performance on completely sorted data
  i) Of homogenous type
  ii) Of mixed type
B) Performance on mostly-sorted data
  i) Of homogenous type
  ii) Of mixed type

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170305/f9ae8b56/attachment.html>


More information about the Python-ideas mailing list