[Python-Dev] Usage of += on strings in loops in stdlib

Maciej Fijalkowski fijall at gmail.com
Wed Feb 13 09:12:24 CET 2013


On Wed, Feb 13, 2013 at 10:02 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> I added a _PyUnicodeWriter internal API to optimize str%args and
> str.format(args). It uses a buffer which is overallocated, so it's
> basically like CPython str += str optimization. I still don't know how
> efficient it is on Windows, since realloc() is slow on Windows (at
> least on old Windows versions).
>
> We should add an official and public API to concatenate strings. I
> know that PyPy already has its own API. Example:
>
> writer = UnicodeWriter()
> for item in data:
>     writer += item   # I guess that it's faster than writer.append(item)
> return str(writer) # or writer.getvalue() ?
>
> I don't care about the exact implementation of UnicodeWriter, it just
> has to be as fast as or faster than ''.join(data).
>
> I don't remember if _PyUnicodeWriter is faster than StringIO or
> slower. I created an issue for that:
> http://bugs.python.org/issue15612
>
> Victor

It's in __pypy__.builders (StringBuilder and UnicodeBuilder). The API
does not really matter, as long as there is a way to preallocate a
certain size (which I don't think StringIO offers, for example).
bytearray comes close but has a relatively inconvenient API, and any
pure-Python bytearray wrapper will not be fast on CPython.
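
For comparison, here is a rough sketch of the alternatives discussed
above, using only what CPython's standard library provides. ListWriter
is a hypothetical stand-in for the proposed UnicodeWriter (no such
public class exists), and the helper names are made up for
illustration. None of these variants preallocates: bytearray(n) creates
n zero bytes rather than reserving capacity, which is part of why its
API is awkward for this purpose.

```python
import io


def build_with_join(items):
    # Baseline: concatenate all pieces in a single call.
    return "".join(items)


def build_with_stringio(items):
    # io.StringIO accumulates text; there is no size hint to pass in.
    buf = io.StringIO()
    for item in items:
        buf.write(item)
    return buf.getvalue()


def build_with_bytearray(items, encoding="utf-8"):
    # bytearray is mutable, so += extends it in place, but each piece
    # has to be encoded first and the result decoded at the end.
    buf = bytearray()
    for item in items:
        buf += item.encode(encoding)
    return buf.decode(encoding)


class ListWriter:
    """Hypothetical stand-in for the proposed UnicodeWriter."""

    def __init__(self):
        self._parts = []

    def __iadd__(self, item):
        # Supports "writer += item" as in the quoted example.
        self._parts.append(item)
        return self

    def __str__(self):
        return "".join(self._parts)


def build_with_writer(items):
    writer = ListWriter()
    for item in items:
        writer += item
    return str(writer)


data = ["spam", "eggs", "ham"]
results = [
    build_with_join(data),
    build_with_stringio(data),
    build_with_bytearray(data),
    build_with_writer(data),
]
```

All four produce the same string; which is fastest depends on the
interpreter and would need to be measured, for example with timeit.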

