[issue14744] Use _PyUnicodeWriter API in str.format() internals

STINNER Victor report at bugs.python.org
Thu May 24 11:57:30 CEST 2012


STINNER Victor <victor.stinner at gmail.com> added the comment:

>> For Python 3.3, the _PyUnicodeWriter API is faster than the Py_UCS4 buffer API and the PyAccu API in almost all cases, with a speedup between 30% and 100%. But there are some cases where the _PyUnicodeWriter API is slower:
>
> Perhaps most of these problems can be solved if, instead of the boolean
> flag (overallocate/no overallocate), we use a Py_ssize_t parameter that
> indicates how much to overallocate (the length of the
> suffix in the format).

There is not only a flag (flags.overallocate): there is also
min_length, which is used by, and helps, str%args and str.format(args).

My patch contains a lot of "tricks" to limit overallocation, e.g.
not overallocating when writing the last part of the output.
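To make the roles of the overallocate flag, min_length and the "last part" trick concrete, here is a toy Python model that only counts buffer (re)allocations. It is a sketch under stated assumptions, not CPython's C implementation: the class ToyWriter, the helper render(), the 25% growth factor and the final shrink in finish() are all hypothetical.

```python
# Toy model of a string writer, counting buffer (re)allocations.
# Hypothetical: ToyWriter, render() and the 25% growth factor are
# assumptions for illustration; this is not CPython's _PyUnicodeWriter.

class ToyWriter:
    def __init__(self, min_length=0):
        self.parts = []
        self.length = 0               # characters written so far
        self.capacity = 0             # characters allocated
        self.overallocate = True      # add 25% headroom when growing
        self.min_length = min_length  # never allocate less than this
        self.allocs = 0

    def write(self, text):
        needed = self.length + len(text)
        if needed > self.capacity:
            new_capacity = needed
            if self.overallocate:
                new_capacity += needed // 4
            self.capacity = max(new_capacity, self.min_length)
            self.allocs += 1
        self.parts.append(text)
        self.length = needed

    def finish(self):
        if self.capacity != self.length:
            self.capacity = self.length  # shrinking costs one more realloc
            self.allocs += 1
        return "".join(self.parts)


def render(chunks, min_length=0, last_part_trick=True):
    writer = ToyWriter(min_length)
    for i, chunk in enumerate(chunks):
        if last_part_trick and i == len(chunks) - 1:
            writer.overallocate = False  # last part: ask for the exact size
        writer.write(chunk)
    return writer.finish(), writer.allocs
```

With chunks ["id=", "12345678"], render(..., last_part_trick=False) returns ("id=12345678", 3), the default render(...) returns ("id=12345678", 2), and if min_length happened to equal the final length (11) it returns ("id=12345678", 1): no resize is needed at the end.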

Computing the exact size of the buffer gives the best performance
because it avoids a resize in _PyUnicodeWriter_Finish(). For example, I
tried modifying PyUnicode_Format() to parse the format string twice:
first to compute the size of the output buffer, then to write the
characters. In my experience, parsing the format string twice is more
expensive than reallocating the buffer (PyUnicode_READ is expensive),
especially for short and simple format strings.
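The trade-off between the two strategies can be sketched with a toy formatter that supports only "{}" placeholders. It counts format-string positions examined (a rough stand-in for PyUnicode_READ calls) and buffer (re)allocations; it does not measure time, so it shows where each strategy pays, not which one wins. The names scan(), format_two_pass() and format_one_pass() and the 25% growth factor are hypothetical.

```python
# Toy formatter supporting only "{}" placeholders, filled in order with str(arg).
# "reads" counts format-string positions examined (a stand-in for
# PyUnicode_READ calls); "allocs" counts buffer (re)allocations.
# All names here are hypothetical.

def scan(fmt, args, emit):
    """Walk fmt once, passing each output chunk to emit(); return reads."""
    reads = 0
    i = literal_start = arg_index = 0
    while i < len(fmt):
        reads += 1
        if fmt.startswith("{}", i):
            emit(fmt[literal_start:i])      # literal text before "{}"
            emit(str(args[arg_index]))      # the formatted argument
            arg_index += 1
            i += 2
            literal_start = i
        else:
            i += 1
    emit(fmt[literal_start:])               # trailing literal (suffix)
    return reads


def format_two_pass(fmt, args):
    total = 0
    def measure(text):
        nonlocal total
        total += len(text)
    reads = scan(fmt, args, measure)        # pass 1: exact output size
    out = []
    reads += scan(fmt, args, out.append)    # pass 2: write characters
    result = "".join(out)
    assert len(result) == total
    return result, reads, 1                 # one exact-size allocation


def format_one_pass(fmt, args):
    out = []
    length = capacity = allocs = 0
    def emit(text):
        nonlocal length, capacity, allocs
        needed = length + len(text)
        if needed > capacity:
            capacity = needed + needed // 4  # assumed 25% overallocation
            allocs += 1
        length = needed
        out.append(text)
    reads = scan(fmt, args, emit)
    if capacity != length:
        allocs += 1                          # shrink to exact size at the end
    return "".join(out), reads, allocs
```

For "x={} y={}" with (1, 22), format_one_pass returns ("x=1 y=22", 7, 5) and format_two_pass returns ("x=1 y=22", 14, 1): the two-pass version doubles the reads to save reallocations, which the report found to be a bad deal for short formats.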

I tried different methods to allocate the buffer of _PyUnicodeWriter:
changing the overallocation factor (0%, 25%, 50%, 100%), overallocating
by only 100 extra characters, etc. But I failed to find anything
better than the proposed patch.

At least I can say that always disabling overallocation slows down
many cases: when there is a suffix after an argument, or when there
is more than one argument.
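Why disabling overallocation hurts output built from many pieces can be illustrated by counting (re)allocations for different overallocation percentages. This is a toy sketch: count_resizes() is a hypothetical helper that ignores copy cost and assumes a final shrink to the exact size; it does not model the "+100 characters" variant or real timings.

```python
# Count buffer (re)allocations when appending chunks with a given
# overallocation percentage. Hypothetical helper; it ignores copy cost
# and assumes a final shrink to the exact size, like a resize in a
# Finish() step.

def count_resizes(chunk_lengths, overalloc_percent):
    length = capacity = resizes = 0
    for n in chunk_lengths:
        needed = length + n
        if needed > capacity:
            capacity = needed + needed * overalloc_percent // 100
            resizes += 1
        length = needed
    if capacity != length:
        resizes += 1  # shrink the overallocated buffer to the exact size
    return resizes


for percent in (0, 25, 50, 100):
    print(percent, count_resizes([10] * 20, percent), count_resizes([200], percent))
```

For twenty 10-character pieces, 0%, 25%, 50% and 100% give 20, 10, 7 and 5 allocations, while a single 200-character piece needs 1 allocation with 0% and 2 with any overallocation. No overallocation only wins when the whole output arrives in one piece.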

Feel free to experiment with other methods to estimate the size of the output buffer.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14744>
_______________________________________

