Speeding up: s += "string"

Alex Martelli aleax at aleax.it
Tue Apr 15 04:28:36 EDT 2003


Lulu of the Lotus-Eaters wrote:

> The topic of the O() problems of the "s += 'string'" operation has been
> discussed in several recent threads.  The aspect that seems not to have
> really come up is "why not FIX it?"
> 
> This particular slowness is not merely academic.  It's bitten me.  In
> one of his posts, I think the Martellibot even said it bit him (in the
> mists of past time). 

Yes (not in Python, the first time, but yes).

> Probably most Python programmers fell into this
> trap at least once.  I got past it, and now usually use the
> "lst.append(string); "".join(lst)" solution.  But the idea of making a
> string longer by... well, adding more to it... seems really obvious.
> Almost the "one obvious way to do it."
> 
> It seems to me that the behavior CAN be improved.  If strings were to
> use a preallocation strategy--similar to what lists do--MANY string

...then such strings would use more memory than they need to.

> appends could avoid a memory copy on the original string.  The intial
> value could stay where it was, and the extra characters could live in
> the already-available contiguous space.  Obviously, the utilized length
> would need to be stored somewhere.  Once in a while, of course, the
> preallocation would fill up; in that case you'd need to do a costly
> strcopy()... but this could be avoided MOST of the time, ISTM.

If you want s1+=s2 to be amortized O(1) for very long s1 and short s2,
you need s1's spare capacity to increase _exponentially_ when it does
need to increase.  Can be quite costly in terms of memory.


> I quite confess to not *really* understanding optimization issues.
> Doing the preallocation right could be complicated.  It's not like you
> want to preallocate a megabyte for every one-char string.  I suppose
> you'd want to determine how often a particular string already underwent
> appends, and what size strings were typically appended.  And since
> strings are immutable, it's not the self-same *objects* that underwent
> the change, but the underlying contiguous memory areas.  Still... it
> doesn't seem undoable.

Once you did it, it would be nearly impossible to retract it -- thus,
it would likely NOT be "undoable".  Which isn't what you MEAN, I think;
you probably mean it's not "unfeasible", or something like that:-).

If I wanted to pursue this kind of things, I think I'd try putting
a Boost Python wrapper on top of STL's "rope" implementation of
strings (note: STL and the Standard C++ Library are DIFFERENT things).

This would give mutable strings that allow, not just s1+=s2, but all 
sort of modifications, with very different performance characteristics
from what one normally thinks of.  Measuring performance of text
processing with those vs normal Python strings might be interesting...


Alex





More information about the Python-list mailing list