String concatenation performance with +=

Sammo sammo2828 at gmail.com
Sat Feb 14 00:44:12 EST 2009


Okay, this is what I have tried for string concatenation:

1. Using += implemented using simple operations (12 ms)
2. Using += implemented inside a class (4000+ ms)
3. Using "".join implemented using simple operations (4000+ ms)
4. Using "".join implemented inside a class (4000+ ms)

On Feb 14, 3:12 pm, Benjamin Peterson <benja... at python.org> wrote:
> Sammo <sammo2828 <at> gmail.com> writes:
>
> > String concatenation has been optimized since 2.3, so using += should
> > be fairly fast.
>
> This is implementation dependent and shouldn't be relied upon.
>
> > Note that I need to do something to mydata INSIDE the loop, so please
> > don't tell me to append moredata to a list and then use "".join after
> > the loop.
>
> Then why not just mutate the list and then call "".join?

AFAIK, using list mutation and "".join only improves performance if
the "".join is executed once, outside of the loop. In fact, in Python
2.5.2, using "".join inside the loop is much slower than my original
test, which concatenates using +=.

My original test with simple operations took 12 ms to run:

import time
t = time.time()
mydata = ""
moredata = "A"*4096
for i in range(1000):
    mydata += moredata # 12.352 ms
    # do some stuff to mydata
    # ...
print "%0.3f ms"%(1000*(time.time() - t))

The new code, modified to mutate the list and then call "".join inside
the loop, takes 4417 ms to run. This is much slower!

import time
t = time.time()
mydata = []
moredata = "A"*4096
for i in range(1000):
    mydata.append(moredata)
    mydata = ["".join(mydata)]
    # do some stuff to mydata
    # ...
print "%0.3f ms"%(1000*(time.time() - t))
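
A rough count of how much copying the join-inside-the-loop version does
may help explain the gap. This is only a back-of-the-envelope sketch: it
assumes every "".join copies the whole accumulated string, and it ignores
allocator and interpreter overhead. The function names are made up for
illustration.

```python
CHUNK = 4096        # len(moredata) in the tests above
ITERATIONS = 1000   # loop count in the tests above


def chars_copied_join_every_iteration(chunk, iterations):
    # On iteration i the join builds a fresh string of chunk * i characters,
    # so the total is chunk * (1 + 2 + ... + iterations).
    return sum(chunk * i for i in range(1, iterations + 1))


def chars_copied_join_once(chunk, iterations):
    # A single join after the loop copies each chunk exactly once.
    return chunk * iterations


per_iteration = chars_copied_join_every_iteration(CHUNK, ITERATIONS)
once = chars_copied_join_once(CHUNK, ITERATIONS)
print("join inside loop: %d characters copied" % per_iteration)
print("join after loop:  %d characters copied" % once)
```

Under these assumptions the loop copies 2050048000 characters in total,
versus 4096000 for a single join after the loop, a factor of about 500.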

The same list mutation and "".join approach, implemented in a class,
took 4434 ms to run, which is again much slower than the original test.
Note that it is about the same speed as using += implemented in a
class.

import time
moredata = "A"*4096
class StringConcatTest:
    def __init__(self):
        self.mydata = []

    def feed(self, moredata):
        self.mydata.append(moredata)
        self.mydata = ["".join(self.mydata)]
        # do some stuff to self.mydata
        # ...

test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print "%0.3f ms"%(1000*(time.time() - t))
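
The += class variant referred to above is not reproduced in this
message. A minimal sketch of what such a test presumably looks like
follows; the class name PlusEqualsConcatTest and the run() helper are
hypothetical, and the code is written to run on both Python 2 and 3.

```python
import time

moredata = "A" * 4096


class PlusEqualsConcatTest(object):  # hypothetical name, not from the post
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        # Concatenate directly onto the instance attribute.
        self.mydata += moredata
        # do some stuff to self.mydata
        # ...


def run(iterations=1000):
    # Feed `iterations` chunks and return the test object and elapsed ms.
    test = PlusEqualsConcatTest()
    t = time.time()
    for i in range(iterations):
        test.feed(moredata)
    elapsed_ms = 1000 * (time.time() - t)
    return test, elapsed_ms


if __name__ == "__main__":
    test, elapsed_ms = run()
    print("%0.3f ms" % elapsed_ms)
```

Timings will vary with the interpreter version and the machine, so the
numbers quoted in this message should not be expected to reproduce
exactly.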


> > Why is the second test so much slower?
>
> Probably several reasons:
>
> 1. Function call overhead is quite large compared to these simple operations.
> 2. You are resolving attribute names.

The main problem I need help with is how to improve the performance of
the code implemented in a class. It is currently 350x slower than the
first test using simple operations, so surely there's got to be a way.
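
One thing that might be worth trying is to do the += on a local variable
instead of on the attribute. This is an untested sketch, not a confirmed
fix. It assumes that the += speedup in CPython only applies when nothing
else refers to the string, and that the instance attribute counts as
such a reference. The class name LocalConcatTest is hypothetical.

```python
class LocalConcatTest(object):  # hypothetical name, not from the post
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        # Move the string into a local and drop the attribute's reference.
        # Assumption: with no other references, += can extend in place.
        mydata = self.mydata
        self.mydata = None
        mydata += moredata
        # do some stuff to mydata
        # ...
        # Store the result back on the instance.
        self.mydata = mydata
```

Note that if the processing step raises an exception, self.mydata is
left as None, so real code would want a try/finally around it. Like the
+= optimization itself, any gain here is implementation dependent.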


