String concatenation performance with +=

Nick Craig-Wood nick at craig-wood.com
Sat Feb 14 11:32:00 EST 2009


Sammo <sammo2828 at gmail.com> wrote:
>  String concatenation has been optimized since 2.3, so using += should
>  be fairly fast.
> 
>  In my first test, I tried concatenating a 4096 byte string 1000 times
>  in the following code, and the result was indeed very fast (12.352 ms
>  on my machine).
> 
>  import time
>  t = time.time()
>  mydata = ""
>  moredata = "A"*4096
>  for i in range(1000):
>      mydata += moredata # 12.352 ms
>  print "%0.3f ms"%(1000*(time.time() - t))
> 
>  However, I got a different result in my second test, which is
>  implemented in a class with a feed() method. This test took 4653.522
>  ms on my machine, which is 350x slower than the previous test!
> 
>  class StringConcatTest:
>      def __init__(self):
>          self.mydata = ""
> 
>      def feed(self, moredata):
>          self.mydata += moredata # 4653.522 ms
> 
>  test = StringConcatTest()
>  t = time.time()
>  for i in range(1000):
>      test.feed(moredata)
>  print "%0.3f ms"%(1000*(time.time() - t))
> 
>  Note that I need to do something to mydata INSIDE the loop, so please
>  don't tell me to append moredata to a list and then use "".join after
>  the loop.
> 
>  Why is the second test so much slower?

The optimized += depends on there being no other references to the
string.  Strings are immutable in Python, so an append must normally
return a new string.  However, the += operation was optimized to do an
in-place append if and only if there are no other references to the
string.

You can see this demonstrated here

$ python -m timeit -s 'a="x"' 'a+="x"'
1000000 loops, best of 3: 0.231 usec per loop

$ python -m timeit -s 'a="x"; b=a' 's = a; a+="x"'
100000 loops, best of 3: 30.1 usec per loop
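The reason the second case cannot be done in place can be sketched in a few lines. This is a minimal illustration written for Python 3 (the snippets in this thread are Python 2); the variable names are only for the example. It shows the semantics, not the timing: while a second name refers to the string, += has to leave that second name's value untouched.

```python
# Why an in-place append is only safe when nothing else refers to the string.
a = "x"
b = a        # b is a second reference to the same string object
a += "y"     # a is rebound to a new value; the string b refers to must not change

print(a)     # the appended result
print(b)     # still the original string
```

If b did not exist, nobody could observe the old value, which is what makes the in-place shortcut legal for the interpreter.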

In your case the instance attribute holds the extra reference while
the += runs, like this

$ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 'a.a+="x"'
100000 loops, best of 3: 30.7 usec per loop

Knowing that, this optimization suggests itself

$ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 's = a.a; a.a = None; s += "x"; a.a = s'
1000000 loops, best of 3: 0.954 usec per loop

Or in your example

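import time

class StringConcatTest:
    def __init__(self):
        self.mydata = ""
    def feed(self, moredata):
        #self.mydata += moredata
        s = self.mydata      # take a local reference to the string
        del self.mydata      # drop the reference held by the instance
        s += moredata        # can append in place if s is the only reference left
        self.mydata = s      # store the result back on the instance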

moredata = "A"*4096
test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)

print "%0.3f ms"%(1000*(time.time() - t))

Before the change it took 3748.012 ms on my PC; afterwards it took
52.737 ms.

However that isn't a perfect solution - what if something else held
another reference to self.mydata?
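To make that caveat concrete, here is a small sketch (Python 3 syntax; the name alias is hypothetical and only stands in for "something else" holding a reference). The result is still correct when an outside reference exists; what is lost is the speed-up, because the string can no longer be extended in place and has to be copied.

```python
class StringConcatTest:
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        s = self.mydata
        del self.mydata      # drop the reference held by the instance
        s += moredata        # in place only if s is the sole reference
        self.mydata = s

test = StringConcatTest()
test.feed("abc")
alias = test.mydata          # an outside reference to the current string
test.feed("def")             # alias still refers to the old string object

print(test.mydata)
print(alias)
```

So the trick is safe to use, but it only pays off while the instance is the sole owner of the string.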

You really want a mutable string for this use.  array.array
is a possibility

$ python -m timeit -s 'import array' -s 'a = array.array("c")' 'a.extend("x")'
100000 loops, best of 3: 2.01 usec per loop

There are many other possibilities though, like the mmap module.
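As one more illustration of the mutable-buffer idea, here is a sketch using the built-in bytearray type. This is my substitution, not something suggested above: it assumes Python 2.6 or later (the code is written in Python 3 syntax) and that the data can be handled as bytes. The class name BufferConcatTest is hypothetical.

```python
class BufferConcatTest:
    def __init__(self):
        self.mydata = bytearray()    # mutable byte buffer

    def feed(self, moredata):
        # extend() grows the existing buffer; no new object is created,
        # so extra references to self.mydata do not force a copy.
        self.mydata.extend(moredata)

moredata = b"A" * 4096
test = BufferConcatTest()
alias = test.mydata                  # another reference to the same buffer
for i in range(1000):
    test.feed(moredata)

print(len(test.mydata))
```

Unlike the del trick, this does not depend on reference counts, but every holder of a reference sees the buffer change, so anything that needs a stable snapshot should take a copy with bytes(test.mydata).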
-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


