String concatenation performance with +=
Nick Craig-Wood
nick at craig-wood.com
Sat Feb 14 11:32:00 EST 2009
Sammo <sammo2828 at gmail.com> wrote:
> String concatenation has been optimized since 2.4, so using += should
> be fairly fast.
>
> In my first test, I tried concatenating a 4096-byte string 1000 times
> in the following code, and the result was indeed very fast (12.352 ms
> on my machine).
>
> import time
> t = time.time()
> mydata = ""
> moredata = "A"*4096
> for i in range(1000):
>     mydata += moredata # 12.352 ms
> print("%0.3f ms" % (1000*(time.time() - t)))
>
> However, I got a different result in my second test, which is
> implemented in a class with a feed() method. This test took 4653.522
> ms on my machine, which is more than 350x slower than the previous test!
>
> class StringConcatTest:
>     def __init__(self):
>         self.mydata = ""
>
>     def feed(self, moredata):
>         self.mydata += moredata # 4653.522 ms
>
> test = StringConcatTest()
> t = time.time()
> for i in range(1000):
>     test.feed(moredata)
> print("%0.3f ms" % (1000*(time.time() - t)))
>
> Note that I need to do something to mydata INSIDE the loop, so please
> don't tell me to append moredata to a list and then use "".join after
> the loop.
>
> Why is the second test so much slower?
The optimised += depends on there being no other references to the
string. Strings are immutable in Python, so concatenation normally has
to build a new string. However, the += operation was optimised (in
CPython) to do an in-place append if and only if there are no other
references to the string.
You can see this demonstrated here:
$ python -m timeit -s 'a="x"' 'a+="x"'
1000000 loops, best of 3: 0.231 usec per loop
$ python -m timeit -s 'a="x"; b=a' 's = a; a+="x"'
100000 loops, best of 3: 30.1 usec per loop
Storing the string on a class instance, as you do, keeps exactly that kind of extra reference:
$ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 'a.a+="x"'
100000 loops, best of 3: 30.7 usec per loop
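The extra reference held by the instance attribute can be observed directly with CPython's `sys.getrefcount`. A minimal sketch, assuming Python 3 on CPython (reference counts are an implementation detail, so only the change is compared, not the absolute values); `Holder` and `extra_refs_from_attribute` are hypothetical names:

```python
import sys

class Holder:
    """Hypothetical stand-in for StringConcatTest."""
    pass

def extra_refs_from_attribute():
    # Build a fresh, non-interned string at runtime (literals may be shared).
    s = "".join(["A"] * 4096)
    before = sys.getrefcount(s)   # local name (plus getrefcount's own argument)
    h = Holder()
    h.data = s                    # the instance now holds another strong reference
    after = sys.getrefcount(s)
    return after - before

print(extra_refs_from_attribute())  # 1 on CPython
```

That one extra reference is enough to make `h.data += ...` fall back to copying the whole string.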
Given that, this optimization suggests itself:
$ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 's = a.a; a.a = None; s += "x"; a.a = s'
1000000 loops, best of 3: 0.954 usec per loop
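The same four cases can also be timed from inside Python with the `timeit` module. A rough sketch, assuming Python 3 on CPython (implementations without reference counting may not have this optimisation at all); the class `A` mirrors the timeit lines above, the function names are hypothetical, and the timings will depend on machine and version:

```python
import timeit

class A:
    pass

def local_append(n):
    a = ""
    for _ in range(n):
        a += "x"          # only the local name refers to a: in-place append possible
    return a

def local_with_extra_ref(n):
    a = ""
    for _ in range(n):
        s = a             # a second reference to the same string...
        a += "x"          # ...forces a full copy on every +=
    return a

def attribute_append(n):
    obj = A()
    obj.a = ""
    for _ in range(n):
        obj.a += "x"      # the instance keeps a reference during +=, so it copies
    return obj.a

def detach_then_append(n):
    obj = A()
    obj.a = ""
    for _ in range(n):
        s = obj.a
        obj.a = None      # drop the attribute's reference
        s += "x"          # s is now the only reference: in-place append possible
        obj.a = s
    return obj.a

if __name__ == "__main__":
    n = 20_000
    for f in (local_append, local_with_extra_ref, attribute_append, detach_then_append):
        secs = timeit.timeit(lambda: f(n), number=1)
        print(f"{f.__name__:22s} {secs * 1000:8.2f} ms")
```

Only the relative ordering is meaningful: the local and detach-then-append cases should stay roughly linear in `n`, while the other two copy the growing string on each iteration.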
Or, applied to your example:
import time

class StringConcatTest:
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        # was: self.mydata += moredata
        s = self.mydata
        del self.mydata    # drop the instance's reference so s is the only one
        s += moredata      # now eligible for the in-place append
        self.mydata = s

moredata = "A"*4096
test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print("%0.3f ms" % (1000*(time.time() - t)))
Before the change it took 3748.012 ms on my PC; afterwards it took 52.737 ms.
However, that isn't a perfect solution: if anything else holds a
reference to self.mydata, the in-place append can't happen and every
+= copies the whole string again.
What you really want for this use is a mutable string. array.array
is one possibility:
$ python -m timeit -s 'import array' -s 'a = array.array("c")' 'a.extend("x")'
100000 loops, best of 3: 2.01 usec per loop
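In Python 3 the `"c"` typecode for `array.array` no longer exists; the built-in `bytearray` is the usual mutable byte buffer. A sketch of the original `StringConcatTest` rebuilt around it, assuming the data is bytes (text would need encoding first); `feed` and `mydata` follow the original post, while `snapshot` is a hypothetical helper:

```python
import time

class StringConcatTest:
    def __init__(self):
        self.mydata = bytearray()   # mutable buffer: no copy-on-append

    def feed(self, moredata):
        # bytearray += extends in place (amortised O(len(moredata))),
        # no matter how many other references to self.mydata exist.
        self.mydata += moredata
        # ...work on self.mydata inside the loop here...

    def snapshot(self):
        # Hypothetical helper: an immutable copy when one is needed.
        return bytes(self.mydata)

if __name__ == "__main__":
    moredata = b"A" * 4096
    test = StringConcatTest()
    t = time.time()
    for i in range(1000):
        test.feed(moredata)
    print("%0.3f ms" % (1000 * (time.time() - t)))
    print(len(test.mydata))  # 4096000
```

Because the buffer is mutated in place, any other reference to `self.mydata` sees the appended data too. That is the trade-off against the immutable-string approach.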
There are many other possibilities, though, such as the mmap module.
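As a rough illustration of the mmap route: an anonymous map (file descriptor `-1`) gives a fixed-size mutable buffer. This sketch assumes Python 3 and that the maximum total size is known up front (1000 chunks of 4096 bytes, as in the original test). It preallocates and sidesteps `resize()`, whose support for anonymous maps varies by platform. The class name `MmapConcat` is hypothetical:

```python
import mmap

class MmapConcat:
    """Hypothetical append-only buffer backed by an anonymous memory map."""

    def __init__(self, capacity):
        self.buf = mmap.mmap(-1, capacity)   # anonymous, zero-filled, fixed size

    def feed(self, moredata):
        # write() copies moredata at the current position and advances it;
        # it raises ValueError if the data would run past the end of the map.
        self.buf.write(moredata)

    def data(self):
        # Only the bytes written so far (slicing returns a bytes copy).
        return self.buf[:self.buf.tell()]

    def close(self):
        self.buf.close()

if __name__ == "__main__":
    moredata = b"A" * 4096
    c = MmapConcat(1000 * len(moredata))
    for i in range(1000):
        c.feed(moredata)
    print(c.buf.tell())  # 4096000
    c.close()
```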
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick
More information about the Python-list mailing list