String concatenation

Peter Hansen peter at engcorp.com
Mon Jun 21 02:02:48 EDT 2004


Jonas Galvez wrote:

> Is it true that joining the string elements of a list is faster than
> concatenating them via the '+' operator?
> 
> "".join(['a', 'b', 'c'])
> 
> vs
> 
> 'a'+'b'+'c'
> 
> If so, can anyone explain why?

Yes, in general.  The '+' version has to build a temporary
string consisting of 'ab' first, then the final string
with 'c' added, while the join can (and probably does) add up
the lengths of all the strings to be joined and build the final
string in one go.

Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
probably on par with the join technique for both performance
and lack of readability.
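
For reference, here is a minimal sketch of the three spellings side by
side.  The variable names are only for illustration; all three produce
the same string, so the choice between them is about speed and
readability, not about the result.

```python
# Three ways to build the same short string from fixed pieces.
parts = ['a', 'b', 'c']

joined = ''.join(parts)                  # one call on a list of strings
added = parts[0] + parts[1] + parts[2]   # chained '+', builds 'ab' first
formatted = '%s%s%s' % tuple(parts)      # %-formatting with a tuple

print(joined, added, formatted)          # prints: abc abc abc
```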

More importantly, though, you should probably not pick
the join approach over the concatenation approach
based on performance.  Concatenation is more readable in the
above case (ignoring the fact that it's a contrived example),
as you're being more explicit about your intentions.

Joining lists is popular because of the terribly bad
performance of += when one gradually builds up a string
in pieces, compared with appending to a list and
then doing a join at the end.

So

   l = []
   l.append('a')
   l.append('b')
   l.append('c')
   s = ''.join(l)

is _much_ faster (therefore better) in real-world cases than

   s = ''
   s += 'a'
   s += 'b'
   s += 'c'

With the latter, picture many more and much longer strings.
Each += creates a new string consisting of the contents of
the two old strings joined together, so the string being built
is copied again and again as it steadily grows.  All that wasted
copying is why it's very bad on memory and performance.
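
To put rough numbers on that, the sketch below counts characters copied
under the simple model described above, where every += builds a
brand-new string.  It is a model, not a measurement: the function names
are made up for illustration, and a real interpreter may sometimes
avoid some of this copying.

```python
def chars_copied_by_concat(pieces):
    """Characters copied if every += creates a completely new string."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)   # size of the newly created string
        total += length        # old contents plus the new piece get copied
    return total


def chars_copied_by_join(pieces):
    """Characters copied if each piece is copied exactly once."""
    return sum(len(piece) for piece in pieces)


pieces = ['x' * 10] * 1000
print(chars_copied_by_concat(pieces))   # 5005000
print(chars_copied_by_join(pieces))     # 10000
```

For n pieces of equal size the first count grows with n squared, the
second only with n, which is why the gap widens as the string gets
longer.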

The list approach doesn't copy the strings at all, but just
holds references to them in a list (which does grow in a
similar but much more efficient manner).  The join figures
out the sizes of all of the strings and allocates enough
space to do only a single copy from each.
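
The two-pass idea can be sketched in pure Python as follows.  This is
only an illustration of the strategy, not the interpreter's actual
code; join_sketch is a hypothetical helper, and it works on bytes
objects so that a preallocated bytearray can stand in for the final
buffer.

```python
def join_sketch(pieces):
    """Model of a two-pass join over a list of bytes objects."""
    total = sum(len(p) for p in pieces)   # pass 1: add up all the sizes
    buf = bytearray(total)                # one allocation of the final size
    pos = 0
    for p in pieces:                      # pass 2: copy each piece once
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(buf)


print(join_sketch([b'a', b'b', b'c']))    # b'abc'
```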

Again though, other than the += versus .append() case, you should
probably not pick ''.join() over + since readability will
suffer more than your performance will improve.
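
If in doubt about whether it matters in a given case, it is easy to
measure.  The following is a minimal sketch using the standard timeit
module; the function names, the piece count and the repeat count are
arbitrary choices for illustration.  The numbers depend heavily on the
interpreter and version, and some interpreters can sometimes extend the
string in place, so the gap may be smaller than the description above
suggests.

```python
import timeit


def build_with_join(n):
    """Collect n pieces in a list, then join once at the end."""
    parts = []
    for _ in range(n):
        parts.append('x')
    return ''.join(parts)


def build_with_concat(n):
    """Grow the string one piece at a time with +=."""
    s = ''
    for _ in range(n):
        s += 'x'
    return s


n = 10000
for func in (build_with_join, build_with_concat):
    seconds = timeit.timeit(lambda: func(n), number=20)
    print(func.__name__, seconds)
```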

-Peter


