faster way for adding many strings

Alex Martelli aleax at aleax.it
Tue Sep 17 06:05:21 EDT 2002


Ulli Stein wrote:

> What is the faster way: appending many strings to a list and then joining
> them, or writing to
> StringIO and then getvalue()?

The best way to answer such questions is to try!

For example, put this in timit.py:

import time, cStringIO, array

manystrings = [str(x) for x in xrange(10000)]
repeater = [None] * 100


def directly(manystrings):
    return ''.join(manystrings)

def withappend(manystrings):
    alist = []
    for x in manystrings: alist.append(x)
    return ''.join(alist)

def withwritelines(manystrings):
    aux = cStringIO.StringIO()
    aux.writelines(manystrings)
    return aux.getvalue()

def withwrite(manystrings):
    aux = cStringIO.StringIO()
    for x in manystrings: aux.write(x)
    return aux.getvalue()

def witharray(manystrings):
    aux = array.array('c')
    for x in manystrings:
        aux += array.array('c', x)
    return aux.tostring()

def timit(func):
    start = time.clock()
    for x in repeater:
        func(manystrings)
    stend = time.clock()
    return '%.2f %s' % (stend-start, func.__name__)

for func in directly, withappend, withwritelines, withwrite, witharray:
    print timit(func)


Now run it:

[alex@lancelot examples]$ python -O timit.py
0.21 directly
1.43 withappend
0.26 withwritelines
1.11 withwrite
19.51 witharray
[alex@lancelot examples]$ python -O timit.py
0.22 directly
1.48 withappend
0.22 withwritelines
1.15 withwrite
19.67 witharray

From this we see that, on my machine, if I already have the
strings to join, joining them directly is at least as fast as
cStringIO's writelines; but if I have to append them as they
come, then cStringIO's write is a bit faster than appending to
a list and joining (about 1.1 versus 1.4 seconds here).  There
is nothing much in it one way or another, but the difference
may perhaps be meaningful in a hot spot of a program.  The array
solution, as coded above, is clearly out of the running (but
might be interesting if you had some estimate of the total
size and could size the result array directly, rather than
having to use +=; I'll leave that to you).
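
The pre-sized variant left as an exercise above might look roughly
like the following.  This is only a sketch in present-day Python 3,
where array.array('c') no longer exists, so a bytearray stands in for
the array.  The name with_presized_buffer and the restriction to ASCII
text are assumptions of this sketch, not part of the original script,
and it has not been timed against the figures above.

```python
def with_presized_buffer(strings):
    """Join strings by copying them into a buffer allocated once, up front."""
    # Assumption: all input is ASCII, so one character is one byte.
    chunks = [s.encode('ascii') for s in strings]
    total = sum(len(c) for c in chunks)

    buf = bytearray(total)  # sized once; no growth while filling
    pos = 0
    for c in chunks:
        end = pos + len(c)
        buf[pos:end] = c  # same-length slice assignment, buffer size unchanged
        pos = end
    return buf.decode('ascii')
```

Computing the total size needs an extra pass over the data, so whether
this beats a plain ''.join is something to measure rather than assume.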

Having a way to measure means you can try directly on the
platforms and Python versions that matter to you.  Such timings
often change between releases and/or platforms, after all!
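
To repeat the measurement on a current interpreter, the script needs a
few changes: cStringIO is gone in Python 3 and time.clock was removed
in Python 3.8.  The following is a hedged port, assuming io.StringIO
as the stand-in for cStringIO and the standard timeit module as the
timer.  It leaves out the array variant, and its numbers will not be
comparable with the 2002 figures above.

```python
import io
import timeit

manystrings = [str(x) for x in range(10000)]

def directly(strings):
    return ''.join(strings)

def withappend(strings):
    alist = []
    for x in strings:
        alist.append(x)
    return ''.join(alist)

def withwritelines(strings):
    aux = io.StringIO()
    aux.writelines(strings)
    return aux.getvalue()

def withwrite(strings):
    aux = io.StringIO()
    for x in strings:
        aux.write(x)
    return aux.getvalue()

def timit(func, repeat=100):
    # timeit.timeit calls the callable `repeat` times and
    # returns the total elapsed time in seconds.
    elapsed = timeit.timeit(lambda: func(manystrings), number=repeat)
    return '%.2f %s' % (elapsed, func.__name__)

if __name__ == '__main__':
    for func in directly, withappend, withwritelines, withwrite:
        print(timit(func))
```

Run it a few times, as above, since individual runs vary.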


Alex



