Fastest technique for string concatenation

Will Hall wrsh07 at gmail.com
Tue Oct 5 15:39:10 EDT 2010


On Oct 3, 8:19 am, Roy Smith <r... at panix.com> wrote:
> My local news feed seems to have lost the early part of this thread, so
> I'm afraid I don't know who I'm quoting here:
>
> > My understanding is that appending to a list and then joining
> > this list when done is the fastest technique for string
> > concatenation. Is this true?
>
> > The 3 string concatenation techniques I can think of are:
>
> > - append to list, join
> > - string 'addition' (s = s + char)
> > - cStringIO
>
> There is a fourth technique, and that is to avoid concatenation in the
> first place.   One possibility is to use the common append/join pattern
> mentioned above:
>
> vector = []
> while (stuff happens):
>    vector.append(whatever)
> my_string = ''.join(vector)
>
> But, it sometimes (often?) turns out that you don't really need
> my_string.  It may just be a convenient way to pass the data on to the
> next processing step.  If you can arrange your code so the next step can
> take the vector directly, you can avoid creating my_string at all.
>
> For example, if all you're going to do is write the string out to a file
> or network socket, you could use vectored i/o, with something like
> python-writev (http://pypi.python.org/pypi/python-writev/1.1).  If
> you're going to iterate over the string character by character, you
> could write an iterator which does that without the intermediate copy.  
> Something along the lines of:
>
>     def each(self):
>         for s in self.vector:
>             for c in s:
>                 yield c
>
> Depending on the amount of data you're dealing with, this could be a
> significant improvement over doing the join().

Okay. I've never responded to one of these before, so please correct
me if I'm making any large blunders.  I recently read Guido's essay
"Python Patterns -- An Optimization Anecdote", and I was wondering
whether a method similar to the one he suggests would work here.

My suggestion:
import array
def arrayConcat(source):
    # 'c' holds single characters, so source must be a string or a list of chars
    return array.array('c', source).tostring()

Am I missing something, or will this work?
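
For anyone who wants to try the idea, here is a rough sketch in Python 3
terms, where the 'c' typecode and tostring() are no longer available. It
assumes the input is a list of integer byte values (as in Guido's essay) and
uses array.array('B', ...).tobytes() instead. The function names and the
sample data are made up for illustration, and the timings will vary by
machine, so treat it as a starting point rather than a benchmark.

```python
import array
import timeit


def array_concat(codes):
    # Pack a list of ints (0..255) into an unsigned-byte array,
    # then dump the whole buffer as a bytes object in one step.
    return array.array('B', codes).tobytes()


def join_concat(codes):
    # Append/join pattern: collect one-character strings, join once at the end.
    parts = []
    for code in codes:
        parts.append(chr(code))
    return ''.join(parts)


# Hypothetical sample data, only here to have something to measure.
codes = [ord(ch) for ch in "hello, world"] * 1000

# Both approaches should describe the same text.
assert array_concat(codes).decode('latin-1') == join_concat(codes)

if __name__ == '__main__':
    for func in (array_concat, join_concat):
        seconds = timeit.timeit(lambda: func(codes), number=100)
        print(func.__name__, seconds)
```

One caveat: an array only accepts single items, so this applies when the
pieces are individual characters or byte values. If the pieces are
multi-character chunks, the append/join version is the one that fits.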

Thanks,
Will


