Efficient string concatenation methods

Bengt Richter bokr at oz.net
Mon May 3 00:06:30 EDT 2004


On 2 May 2004 18:52:27 -0700, python at rcn.com (Raymond Hettinger) wrote:
[...]
>Nice work.  Beautifully presented.
>
>A few thoughts:
>
>* The relative speed of list.append() and list comprehensions versus
>other approaches will improve dramatically in Py2.4.  IOW, the
>rankings are version specific.  Also, append works faster if you
>precompute the attribute lookup before the loop:  build =
>str_list.append.
>
>* The underlying issue has little to do with the immutability of
>strings.  The underlying C code is fearless and would mutate the heck
>out of it if that is what it took without being visible through the
>API.  The real issue is how much space to allocate for the final
>string.  When iterating, as you do, there is little forewarning about
>how many more entries to expect or the length of each.  If too little
>space is allocated, then room for a larger string needs to be created
>and the previous result recopied (giving the O(n**2) behavior you're
>trying to get rid of).
Perhaps a capacity keyword arg for the str constructor?, E.g.,
s = str(capacity=1000)
s => ''
s += 'abc' # just mutates the capacious object if ref count is one.
...etc

>
>* For arrays, the advantage of mutability does not play out through
>the fromstring() method (which suffers the previously mentioned
>allocation issues).  The way to capitalize is to make a long array all
>at once and then make slice assignments into it to add the strings. 
>This is essentially what str.join() does under the hood (it loops over
>the input once and adds up the string lengths, allocates all necessary
>space, and re-loops making slice assignments) -- this is how it gets
>its O(n) behavior.  For arrays, I suspect the overhead of tracking the
>slice pointer would overwhelm the benefits for small values of n.  An
>interesting comparison is to time "ans=[]; build=ans.append; for i in
>xrange(n): build(i)"  versus "ans=[None]*n: for i in xrange(n):
>ans[i]=i".
>
ans = list(capacity=whatever) # ?

Similarly for arrays.

Regards,
Bengt Richter



More information about the Python-list mailing list