[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

David Mertz mertz at gnosis.cx
Sun Mar 5 22:18:55 EST 2017


In my experience and/or estimation... well two things:

A. Mixtures of floats/ints seem a lot more common than the 10% being thrown
around.  I don't even consider that bad practice currently (it might become
bad practice if Elliot's patch is used, since keeping elements float-only
would allow much faster sorting).

B. I think a very large percentage of lists are heterogeneous.  But most of
the time when they are, it's not because there are several different
numeric types but rather because a list collects some sort of custom
objects.  Maybe those even all share some ancestor, but the point is they
are user/library defined classes that won't have a fast comparison like
ints or floats do.

B (part 2): I haven't actually read his patch, but I'm assuming that
Elliot's approach can start scanning the list, and as soon as it find a
complex/custom object at index 0 ignore the rest of the scan.  So for that
case, there is very little harm in his linear scan (over one item).

On Sun, Mar 5, 2017 at 7:09 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Mon, Mar 6, 2017 at 2:03 PM, Elliot Gorokhovsky
> <elliot.gorokhovsky at gmail.com> wrote:
> > On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico <rosuav at gmail.com> wrote:
> >>
> >>
> >> I would be rather curious to know how frequently a list consists of
> >> "numbers", but a mix of ints and floats. Does it happen a
> >> lot in real-world code?
> >>
> >
> > This is of course undecidable to verify statically, so we can't just
> crawl
> > PyPI... however, I would argue that using mixed float-int lists is
> > dangerous, and is more dangerous in Python 3 than in Python 2. So
> hopefully
> > this is not very common. However, even if 10% (surely a vast
> overestimate)
> > of sort calls are to mixed int-float lists, my patch would still yield a
> > significant savings on average.
>
> I agree that it's dangerous, but it is still common for programmers
> and Python alike to treat 10 as functionally identical to 10.0 -
> although as to being more dangerous in Py3, that's much of a muchness
> (for instance, the single-slash division operator in Py2 can
> potentially truncate, but in Py3 it's always going to give you a
> float). But, fair point. I very much doubt it's as high as 10%, so
> yeah, that would be advantageous.
>
> Also, the performance hit is so small, and even that is in the very
> worst case (a homogeneous list with one different type at the end). I
> like changes that make stuff run faster.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170305/aedb856f/attachment-0001.html>


More information about the Python-ideas mailing list