PATCH: Speed up direct string concatenation by 20+%!

Larry Hastings larry at hastings.org
Fri Sep 29 12:15:22 EDT 2006


Fredrik Lundh wrote:
> >> what's in "s" when that loop is done?
> > It's equivalent to " 'a' * 10000000 ".  (I shan't post it here.)
> but what *is* it ?  an ordinary PyString object with a flattened buffer,
> or something else ?

At the exact moment that the loop is done, it's a
PyStringConcatenationObject * which points to a deep one-sided tree of
more PyStringConcatenationObject * objects.  Its ob_sval is NULL, which
means that the first time someone asks for its value (via the macro
PyString_AS_STRING()) it will be computed.  When it's computed, the
interpreter will allocate a buffer of 10000001 bytes and walk the tree,
filling the buffer with ten million 'a's followed by a terminating zero
byte.  It'll then drop its references to all its children.
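
To make the lazy evaluation above concrete, here is a minimal standalone
sketch. It is not the code from the patch: the LazyStr struct and the
lazy_* functions are hypothetical names invented for illustration, and the
hand-rolled refcount field only stands in for the interpreter's reference
counting. Concatenation just records the two children; the buffer is
allocated and filled the first time the value is requested, after which the
children are released. Because a loop of the form "s = s + 'a'" builds a
tree that is deep on the left, the sketch loops down the left side and
recurses only into right children, so the C stack stays shallow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lazily concatenated string object. */
typedef struct LazyStr {
    char *value;           /* flattened bytes, or NULL until first requested */
    size_t length;         /* total length, known without flattening */
    struct LazyStr *left;  /* children; both NULL for a leaf or once flattened */
    struct LazyStr *right;
    int refcount;          /* simplified reference count */
} LazyStr;

static LazyStr *lazy_alloc(size_t length) {
    LazyStr *s = calloc(1, sizeof *s);
    if (s == NULL) abort();
    s->length = length;
    s->refcount = 1;
    return s;
}

/* Build a leaf whose value is already flat. */
LazyStr *lazy_from_cstr(const char *text) {
    LazyStr *s = lazy_alloc(strlen(text));
    s->value = malloc(s->length + 1);
    if (s->value == NULL) abort();
    memcpy(s->value, text, s->length + 1);
    return s;
}

/* O(1) concatenation: remember the children, copy nothing yet. */
LazyStr *lazy_concat(LazyStr *left, LazyStr *right) {
    LazyStr *s = lazy_alloc(left->length + right->length);
    s->left = left;
    s->right = right;
    left->refcount++;
    right->refcount++;
    return s;
}

/* Drop one reference; loop down the left side so deep trees are safe. */
void lazy_decref(LazyStr *s) {
    while (s != NULL && --s->refcount == 0) {
        LazyStr *left = s->left;
        lazy_decref(s->right);
        free(s->value);
        free(s);
        s = left;
    }
}

/* Copy the bytes of s so that they end at `end`, filling right to left. */
static void lazy_fill(const LazyStr *s, char *end) {
    while (s->value == NULL) {
        assert(s->left != NULL && s->right != NULL);
        lazy_fill(s->right, end);
        end -= s->right->length;
        s = s->left;
    }
    memcpy(end - s->length, s->value, s->length);
}

/* Flatten on first use, then release the children. */
const char *lazy_as_string(LazyStr *s) {
    if (s->value == NULL) {
        char *buf = malloc(s->length + 1);
        if (buf == NULL) abort();
        lazy_fill(s, buf + s->length);
        buf[s->length] = '\0';
        s->value = buf;
        lazy_decref(s->left);
        lazy_decref(s->right);
        s->left = s->right = NULL;
    }
    return s->value;
}
```

In this sketch, repeatedly calling lazy_concat(s, a) and then calling
lazy_as_string() once performs a single allocation and a single pass over
the tree, instead of one allocation and copy per concatenation.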

The PyStringConcatenationObject struct is a child of PyStringObject,
and external users can ignore the difference as long as they go through
the macros in stringobject.h (e.g. calling PyString_AS_STRING(), rather
than casting to PyStringObject * and reading ob_sval directly).
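
The difference between going through an accessor and reading the field
directly can be sketched in isolation. This is an assumption-laden
illustration, not the contents of stringobject.h: DemoStr, demo_render()
and DEMO_AS_STRING() are made-up names, and the "pending" field merely
stands in for whatever deferred work the real object carries. The point is
only that the accessor can notice a NULL buffer and compute it, while a
direct field read cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object whose buffer may not exist yet. */
typedef struct DemoStr {
    char *sval;           /* NULL while the value is still pending */
    const char *pending;  /* stand-in for the deferred value */
} DemoStr;

/* Compute the buffer on demand and cache it in the object. */
static char *demo_render(DemoStr *s) {
    assert(s->pending != NULL);
    size_t n = strlen(s->pending);
    s->sval = malloc(n + 1);
    if (s->sval == NULL) abort();
    memcpy(s->sval, s->pending, n + 1);
    return s->sval;
}

/* Accessor: safe whether or not the buffer has been computed. */
#define DEMO_AS_STRING(op) \
    ((op)->sval != NULL ? (op)->sval : demo_render(op))
```

Code that reads s->sval before anyone has called DEMO_AS_STRING(s) sees
NULL; code that always uses the accessor never notices the deferral.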

Sorry for misunderstanding the nature of your question the first time,


/larry/



