[Python-Dev] sum(...) limitation

David Wilson dw+python-dev at hmmz.org
Sat Aug 2 22:35:13 CEST 2014


On Sat, Aug 02, 2014 at 05:39:12PM +1000, Steven D'Aprano wrote:

> Repeated list and str concatenation both have quadratic O(N**2)
> performance, but people frequently build up strings with + and rarely
> do the same for lists. String concatenation with + is an attractive
> nuisance for many people, including some who actually know better but
> nevertheless do it. Also, for reasons I don't understand, many people
> dislike or cannot remember to use ''.join.

join() isn't preferable in cases where it damages readability while
simultaneously providing zero or negative performance benefit, such as
when concatenating a few short strings, e.g. while adding a prefix to a
filename.

Although it's true that join() is automatically the safer option, and
especially when dealing with user supplied data, the net harm caused by
teaching rote and ceremony seems far less desirable compared to fixing a
trivial slowdown in a script, if that slowdown ever became apparent.

Another (twisted) interpretation is that since the quadratic behaviour
is a CPython implementation detail, and there are alternatives where
__add__ is constant time, encouraging users to code against
implementation details becomes undesirable. In our twisty world, __add__
becomes *preferable* since the resulting programs more closely resemble
pseudo-code.

    $ cat t.py
    a = 'this '
    b = 'is a string'
    c = 'as we can tell'

    def x():
        return a + b + c

    def y():
        return ''.join([a, b, c])

    $ python -m timeit -s 'import t' 't.x()'
    1000000 loops, best of 3: 0.477 usec per loop

    $ python -m timeit -s 'import t' 't.y()'
    1000000 loops, best of 3: 0.695 usec per loop


David


More information about the Python-Dev mailing list