Steve D'Aprano, you're the "master". What's wrong with this concatenation statement?

Steven D'Aprano steve at pearwood.info
Thu May 12 11:06:18 EDT 2016


On Thu, 12 May 2016 07:36 pm, Ned Batchelder wrote:

> The CPython optimization depends on the string having only a single
> reference.  A seemingly unrelated change to the code can change the
> performance significantly:
> 
>     In [1]: %%timeit
>        ...: s = ""
>        ...: for x in xrange(100000):
>        ...:   s = s + str(x)
>        ...:
>     10 loops, best of 3: 33.5 ms per loop
> 
>     In [2]: %%timeit
>        ...: s = t = ""
>        ...: for x in xrange(100000):
>        ...:   s = t = s + str(x)
>        ...:
>     1 loop, best of 3: 1.57 s per loop


Nice demonstration!

But it is actually even worse than that. The optimization depends on memory
allocation details, which means that some CPython interpreters cannot
benefit from it, depending on the operating system and version.

Consequently, reliance on it can lead, and has led, to embarrassments like
this performance bug, which only affected *some* Windows users. In 2009,
Chris Withers asked for help debugging a problem where Python's httplib was
hundreds of times slower than other tools, like wget and Internet Explorer:

https://mail.python.org/pipermail/python-dev/2009-August/091125.html

A few weeks later, Simon Cross realised the problem was probably the
quadratic behaviour of repeated string addition:

https://mail.python.org/pipermail/python-dev/2009-September/091582.html

leading to this quote from Antoine Pitrou:

"Given differences between platforms in realloc() performance, it might be
the reason why it goes unnoticed under Linux but degenerates under
Windows."

https://mail.python.org/pipermail/python-dev/2009-September/091583.html

and Guido's comment:

"Also agreed that this is an embarrassment."

https://mail.python.org/pipermail/python-dev/2009-September/091592.html

So beware of relying on the CPython string concatenation optimization in
production code!
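
For comparison, here is a rough sketch of the portable alternative: collect
the pieces in a list and join them once at the end. It is written for
Python 3 (range rather than xrange), and the function names are made up for
illustration. Joining does not depend on reference counts or on how the
platform's realloc() behaves.

```python
def build_with_join(n):
    """Collect the pieces in a list, then join them in a single pass."""
    parts = []
    for x in range(n):
        parts.append(str(x))
    # One join at the end: the result is built once, not recopied per step.
    return "".join(parts)


def build_with_concat(n):
    """Repeated concatenation: only fast when the CPython optimization
    applies (a single reference to s, and a cooperative realloc)."""
    s = ""
    for x in range(n):
        s = s + str(x)
    return s


if __name__ == "__main__":
    # Both functions build the same string; only the cost model differs.
    print(build_with_join(10) == build_with_concat(10))
```

Timings are deliberately left out, since they vary by interpreter and
platform, which is rather the point.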


Here's the tracker issue that added the optimization in the first place:

http://bugs.python.org/issue980695

The feature was controversial at the time (and remains slightly so):

https://mail.python.org/pipermail/python-dev/2004-August/046686.html


My opinion is that it is great for interactive use at the Python prompt, but
I would never use it in code I cared about.



-- 
Steven



