String += time trials. (was: Re: Speeding up: s += "string")

Francis Avila francisgavila at yahoo.com
Fri May 9 16:30:18 EDT 2003


"Erik Max Francis" <max at alcyone.com> wrote in message
news:3EBAD05C.64CE7E2F at alcyone.com...
> As Lulu pointed out, the better solution is to build a list -- because
> appending to a list is amortized O(1) -- and then joining the string all
> in one go.

Is it the best solution?
(all tests done on a Pentium 133 (i586), 32mb, Debian 3.0r1, Linux 2.4.18)
(Quick observation: list operations seem to have gotten slower between 2.1
and 2.2, especially extend.)

Unless I'm missing something, it seems that by the time things get slow
enough to use list joining rather than string catenation, you're better off
using cStringIO anyway.

Feature requests, based on this:
*StringIO could be a drop-in replacement for strings, if += were overloaded
to mean append-to-buffer and str(<*StringIO object>) returned <*StringIO
object>.getvalue() instead of repr().  Thus one can start out using strings,
and once the number of catenations is found to be a bottleneck, just change
the first s = '' to s = cStringIO.StringIO().  No other changes would be
necessary.


[ Oh, and for the below, 's/Loop/List/'. I don't know what I was thinking. ]

$ python2.1 stringcattrials.py 1 10 100 1000 10000 100000
----------------------------------------
1 catenations per trial:
String catenation took   0.000119
Loop extension took      0.000170
Loop appending took      0.000144
cStringIO writing took   0.000141
StringIO writing took    0.000390
----------------------------------------
10 catenations per trial:
String catenation took   0.000237
Loop extension took      0.000578
Loop appending took      0.000461
cStringIO writing took   0.000363
StringIO writing took    0.001774
----------------------------------------
100 catenations per trial:
String catenation took   0.001938
Loop extension took      0.004385
Loop appending took      0.003403
cStringIO writing took   0.002447
StringIO writing took    0.015654
----------------------------------------
1000 catenations per trial:
String catenation took   0.021864
Loop extension took      0.043683
Loop appending took      0.032654
cStringIO writing took   0.045036
StringIO writing took    0.156615
----------------------------------------
10000 catenations per trial:
String catenation took   1.469739
Loop extension took      0.455751
Loop appending took      0.332568
cStringIO writing took   0.238696
StringIO writing took    1.577647
----------------------------------------
100000 catenations per trial:
String catenation took   166.015264
Loop extension took      4.530110
Loop appending took      3.125364
cStringIO writing took   2.413575
StringIO writing took    15.582441


$ python2.2 stringcattrials.py 1 10 100 1000 10000 100000
----------------------------------------
1 catenations per trial:
String catenation took   0.000153
Loop extension took      0.000243
Loop appending took      0.000169
cStringIO writing took   0.000190
StringIO writing took    0.000491
----------------------------------------
10 catenations per trial:
String catenation took   0.000275
Loop extension took      0.000909
Loop appending took      0.000459
cStringIO writing took   0.000422
StringIO writing took    0.002277
----------------------------------------
100 catenations per trial:
String catenation took   0.001982
Loop extension took      0.007511
Loop appending took      0.003010
cStringIO writing took   0.002826
StringIO writing took    0.018513
----------------------------------------
1000 catenations per trial:
String catenation took   0.022112
Loop extension took      0.096759
Loop appending took      0.032322
cStringIO writing took   0.025085
StringIO writing took    0.205307
----------------------------------------
10000 catenations per trial:
String catenation took   1.494656
Loop extension took      0.789164
Loop appending took      0.318447
cStringIO writing took   0.259390
StringIO writing took    2.023437
----------------------------------------
100000 catenations per trial:
String catenation took   170.029309
Loop extension took      8.083905
Loop appending took      3.116776
cStringIO writing took   2.746015
StringIO writing took    19.161821


$ cat stringcattrials.py
#! /usr/bin/env python

import os, sys
from time import time
import StringIO, cStringIO
import sys

def trials(numcats=1000):
    """Do the trials, performing numcats catenations per trial"""
    loop = tuple(range(numcats))

    s = ''
    start = time()
    for i in loop:
        s += '0'
    stop = time()

    print "String catenation took \t %3.6f" % (stop-start)


    s = []
    start = time()
    for i in loop:
        s += '0'
    ''.join(s)
    stop = time()

    print "Loop extension took \t %3.6f" % (stop-start)


    s = []
    start = time()
    for i in loop:
        s.append('0')
    ''.join(s)
    stop = time()

    print "Loop appending took \t %3.6f" % (stop-start)


    s = cStringIO.StringIO()
    start = time()
    for i in loop:
        s.write('0')
    s.getvalue()
    stop = time()

    print "cStringIO writing took \t %3.6f" % (stop-start)


    s = []
    start = time()
    for i in loop:
        s += '0'
    ''.join(s)
    stop = time()

    print "Loop extension took \t %3.6f" % (stop-start)


    s = []
    start = time()
    for i in loop:
        s.append('0')
    ''.join(s)
    stop = time()

    print "Loop appending took \t %3.6f" % (stop-start)


    s = cStringIO.StringIO()
    start = time()
    for i in loop:
        s.write('0')
    s.getvalue()
    stop = time()

    print "cStringIO writing took \t %3.6f" % (stop-start)


    s = StringIO.StringIO()
    start = time()
    for i in loop:
        s.write('0')
    s.getvalue()
    stop = time()

    print "StringIO writing took \t %3.6f" % (stop-start)

if __name__ == '__main__':
    nlist = sys.argv[1:]                # list of loop repeat counts
    if nlist:
        for i in nlist:
            print '-'*40
            print i, "catenations per trial:"
            trials(int(i))
    else:
        trials()





More information about the Python-list mailing list