Speed of string += string

Carl Banks imbosol-1050185957 at aerojockey.com
Sat Apr 12 18:35:37 EDT 2003


Paul Moore wrote:
> Mads Orbesen Troest <mads at troest.NEVERMORE.dk> writes:
> 
>> Given that strings are immutable, is it exceedingly slow (and memory 
>> spiking) to do a lot of "string += string" operations in one's code? IOW, 
>> does each and every += evaluation lead to a new concatenated copy of the 
>> previous string (now freed for garbage collection) and the new string, so 
>> that the longer the string to which stuff is appended is, the longer times 
>> each += operation takes?
> 
> I don't know the underlying details too well (and I'm not sure they
> are all that important) but yes, string += string does construct a new
> string each time, and hence is relatively expensive.
> 
> The normal idiom for building a string in chunks is to create a list
> of the chunks, and then join them once at the end. So you do:
> 
>    chunks = []
>    while whatever():
>        chunks.append(next_bit_of_string())
>    result = ''.join(chunks)
> 
> You could also use StringIO. However, all this does is basically what
> I describe above, plus extra complexity to handle all the methods you
> don't need. So you're better sticking with the above idiom.


According to my benchmark, cStringIO is a bit faster than joining a
list of strings when appending one character at a time (as opposed to,
say, one line at a time, where it appears to be slightly slower).
Also, it certainly uses less memory than a list of strings, so I'd say
sometimes it is better to use cStringIO.

Surprisingly, += doesn't appear to be a major time waster for small
(meaning about 10000 bytes) strings.  I'm guessing it's because the
malloc my system (Debian Linux) is optimized for small memory
allocations.

    from cStringIO import StringIO
    from time import time
    import sys
    
    def benchmark(n,m):
        c = 'c'*m
        
        start = time()
        s = ''        
        for i in xrange(n):
                s += c
        end = time()
        print "+=:         %.3f seconds" % (end-start)

        start = time()
        l = []
        for i in xrange(n):
                l.append(c)
        end = time()
        s = ''.join(l)
        print "list:       %.3f seconds" % (end-start)

        start = time()
        f = StringIO()
        for i in xrange(n):
                f.write(c)
        s = f.getvalue()
        end = time()
        print "cStringIO:  %.3f seconds" % (end-start)
    
    benchmark(int(sys.argv[1]),int(sys.argv[2]))


On my system, with n=10000, m=1:
+=:         0.041 seconds
list:       0.023 seconds
cStringIO:  0.019 seconds

With n=100000, m=1:
+=:         8.002 seconds
list:       0.239 seconds
cStringIO:  0.187 seconds

With n=10000, m=40:
+=:         19.681 seconds
list:       0.022 seconds
cStringIO:  0.026 seconds

I'm running Python 2.1.3 on Linux.


-- 
CARL BANKS






More information about the Python-list mailing list