PATCH: Speed up direct string concatenation by 20+%!

Carl Friedrich Bolz cfbolz at gmx.de
Fri Sep 29 14:54:59 EDT 2006


Larry Hastings wrote:
[snip]
> The core concept: adding two strings together no longer returns a pure
> "string" object.  Instead, it returns a "string concatenation" object
> which holds references to the two strings but does not actually
> concatenate them... yet.  The strings are concatenated only when
> someone requests the string's value, at which point it allocates all
> the space it needs and renders the concatenated string all at once.
[snip]
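Sounds cool. If I read the description right, a toy Python model of the
idea could look roughly like this (the names are mine, and the real patch
of course does all of this at the C level inside the str type, so take it
purely as an illustration):

def value(s):
    # Force a (possibly lazy) string into a real str.
    if isinstance(s, LazyConcat):
        return s.render()
    return s

class LazyConcat(object):
    """Holds references to the two halves; builds the result on demand."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.rendered = None

    def render(self):
        if self.rendered is None:
            # Only now is the space allocated and the data copied.
            self.rendered = value(self.left) + value(self.right)
            self.left = self.right = None   # the halves can be freed
        return self.rendered

s = LazyConcat("spam", "eggs")   # no copying happens yet
print value(s)                   # "spameggs" is only built at this point

The hard part this toy leaves out is making such an object behave like a
real str everywhere else, which I guess is where most of the work in the
patch went.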

If I understand it correctly, you can write even meaner (and probably
even more useless in practice) benchmarks where you save some of the
intermediate results of the concatenation, thus also exploiting the
better space behaviour:


all = []

s = ""

for i in range(1000):
    s = s + (str(i) + " ") * 1000
    all.append(s)


This should take around 2 GB of RAM (so maybe you shouldn't even run it
:-) ) on an unpatched CPython, but a lot less with your patch.
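
The 2 GB figure is just the sum of the lengths of all the intermediate
strings that the list all keeps alive. A quick back-of-the-envelope
check (assuming one byte per character, as in a 2.x str):

total = 0
length_of_s = 0
for i in range(1000):
    length_of_s += (len(str(i)) + 1) * 1000   # chunk appended in step i
    total += length_of_s                      # all[i] is a full copy of s
print total   # roughly 1.9e9 characters, i.e. about 2 GB

With the patch, as long as nobody asks for the value of any of the
strings in all, only the leaf strings (about 4 MB in total) plus the
small concatenation objects should have to stay around.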

<sidenote>
This is exactly the sort of experiment that is extremely easy to do with
PyPy. In fact, some other PyPyers and I wrote a very similar
optimization, which can be compiled in if wanted (or not). The whole
implementation took roughly 100 lines of code, of which 65 are in a
separate file that doesn't touch anything else, the rest being minimally
invasive changes. We also implemented a similar optimization for string
slicing (only if the slice has a substantial length and a step of 1);
a rough sketch of that idea follows below.
</sidenote>
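
For the curious, the slicing variant is conceptually just a "view"
object. A rough sketch (threshold and names made up by me for the
example; the real thing of course lives inside PyPy's string
implementation):

MIN_SLICE_LENGTH = 100   # made-up cutoff: only long slices are worth it

class LazySlice(object):
    """A view into base instead of a copy (only for step-1 slices)."""
    def __init__(self, base, start, stop):
        self.base, self.start, self.stop = base, start, stop

    def render(self):
        # The characters are copied only when somebody really needs them.
        return self.base[self.start:self.stop]

def lazy_getslice(s, start, stop):
    if stop - start >= MIN_SLICE_LENGTH:
        return LazySlice(s, start, stop)   # share the original string
    return s[start:stop]                   # short slices: copy as usual

The obvious downside is that a small slice object can keep the whole
original string alive longer than necessary.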

Cheers,

Carl Friedrich Bolz



