Python speed-up

Jeremy Fincher tweedgeezer at hotmail.com
Thu Sep 23 12:11:38 EDT 2004


Gerrit <gerrit at nl.linux.org> wrote in message news:<mailman.3739.1095881975.5135.python-list at python.org>...
> Why isn't cStringIO faster than concatenating strings?

It is, when you're doing it repetitively.  Using cStringIO or str.join
is O(n) in the number of strings, whereas using += repetitively is
O(n**2) in the number of strings.

> Using python2.4:
> 
> $ python timeit.py -s 'from cStringIO import StringIO; s=StringIO()' "s.write('foo')"
> 1000000 loops, best of 3: 0.555 usec per loop
> 
> $ python timeit.py -s 's=""' "s+='foo'"
> 1000000 loops, best of 3: 0.383 usec per loop
> 
> $ python timeit.py -s 'L=[]' "L.append('foo')"
> 1000000 loops, best of 3: 0.32 usec per loop
> 
> $ python timeit.py -s 'from StringIO import StringIO; s=StringIO()' "s.write('foo')"
> 100000 loops, best of 3: 3.19 usec per loop

Don't let the "100000 loops" confuse you, in this case you're only
appending *one* string, and the significantly lower constant time of
str.__iadd__ (aka +=) shows more than the lower complexity of the
other methods.

> I was about to advise cStringIO as being much faster than concatenating
> strings, but just before I was about to send the e-mail, I spied with my
> little eye that the numbers were swapped - concatenating strings is
> actually *faster* than using cStringIO! What's happening?

Perhaps this will be more elucidating:

$ python ~/timeit.py 's = ""' 'for i in xrange(1):' '  s += "foo"'
100000 loops, best of 3: 4.38 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio =
StringIO()' 'for i in xrange(1):' '  sio.write("foo")' 's =
sio.getvalue()'
100000 loops, best of 3: 9.77 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(10):' '  s += "foo"'
10000 loops, best of 3: 15.8 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio =
StringIO()' 'for i in xrange(10):' '  sio.write("foo")' 's =
sio.getvalue()'
10000 loops, best of 3: 36.6 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(100):' '  s += "foo"'
1000 loops, best of 3: 154 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio =
StringIO()' 'for i in xrange(100):' '  sio.write("foo")' 's =
sio.getvalue()'
1000 loops, best of 3: 264 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(1000):' '  s += "foo"'
100 loops, best of 3: 2.23e+03 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio =
StringIO()' 'for i in xrange(1000):' '  sio.write("foo")' 's =
sio.getvalue()'
100 loops, best of 3: 2.62e+03 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(10000):' '  s += "foo"'
10 loops, best of 3: 7.14e+05 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio =
StringIO()' 'for i in xrange(10000):' '  sio.write("foo")' 's =
sio.getvalue()'
10 loops, best of 3: 2.66e+04 usec per loop

Do be sure not to use -s before the 's = ""' argument or before the
'sio = StringIO()' argument, otherwise you'll prove your point well,
but not fairly :)

Jeremy



More information about the Python-list mailing list