[Python-Dev] sum(...) limitation

Nick Coghlan ncoghlan at gmail.com
Tue Aug 12 02:19:06 CEST 2014


On 12 Aug 2014 03:03, "Chris Barker - NOAA Federal" <chris.barker at noaa.gov>
wrote:
>
> My confusion is still this:
>
> Repeated summation of strings has been optimized in cpython even
> though it's not the recommended way to solve that problem.

The quadratic behaviour of repeated str summation is a subtle, silent
error. It *is* controversial that CPython silently optimises some cases of
it away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

> So why not special case optimize sum() for strings? We are already
> special-case strings to raise an exception.
>
> It seems pretty pedantic to say: we cod make this work well, but we'd
> rather chide you for not knowing the "proper" way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour. We *want* people
to learn that distinction, not sweep it under the rug. That's the other
reason the implicit optimisation is controversial - it hides an important
difference in algorithmic complexity from users.

> Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Other containers don't tend to be concatenated like that in the first
place, so there's no such check pushing other iterables towards
itertools.chain.

Regards,
Nick.

>
> -Chris
>
>
>
>
> > Although that's not the whole story: in
> > practice even numerical sums get split into multiple functions because
> > floating point addition isn't associative, and so needs careful
> > treatment to preserve accuracy.  At that point I'm strongly +1 on
> > abandoning attempts to "rationalize" summation.
> >
> > I'm not sure how I'd feel about raising an exception if you try to sum
> > any iterable containing misbehaved types like float.  But not only
> > would that be a Python 4 effort due to backward incompatibility, but
> > it sorta contradicts the main argument of proponents ("any type
> > implementing __add__ should be sum()-able").
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140812/47f50f25/attachment.html>


More information about the Python-Dev mailing list