[Python-Dev] [Python-checkins] cpython: Implement PEP 393.

Sat Oct 1 15:26:01 CEST 2011

Am 29.09.2011 01:21, schrieb Eric V. Smith:
> Is there some reason str.format had such major surgery done to it?

Yes: I couldn't figure out how to do it any other way. The formatting
code had a few basic assumptions which now break (unless you keep using
the legacy API). Primarily, the assumption is that there is a notion of
a "STRINGLIB_CHAR" which is the element of a string representation. With
PEP 393, no such type exists anymore - it depends on the individual
object what the element type for the representation is.

In other cases, I worked around that by compiling the stringlib three
times, for Py_UCS1, Py_UCS2, and Py_UCS4. For one, this gives
considerable code bloat, which I didn't like for the formatting code
(as that is already a considerable amount of code). More importantly,
this approach wouldn't have worked well, anyway, since the formatting
combines multiple Unicode objects (especially with the OutputString
buffer), and different inputs may have different representations. On
top of that, OutputString needs widening support, starting out with
a narrow string, and widening step-by-step as input strings are more
wide than the current output (or not, if the input strings are all
ASCII).

It would have been possible to keep the basic structure by doing
all formatting in Py_UCS4. This would cost a significant memory and
runtime overhead.

> In addition, there are outstanding patches that are now broken.

I'm sorry about that. Try applying them to the new files, though - patch
may still be able to figure out how to integrate them, as the
algorithms and function structure hasn't changed.

> I'd prefer it return to how it used to be, and just the minimum changes
> required for PEP 393 be made to it.

Please try for yourself. On string_format.h, I think there is zero
chance, unless you want to compromise and efficiency (in addition to
the already-present compromise on code cleanliness, due the the fact
that the code is more general than it needs to be).

On formatter.h, it may actually be possible to restore what it was - in
particular if you can make a guarantee that all number formatting always
outputs ASCII-strings only (which I'm not so sure about, as the
thousands separator could be any character, in principle). Without that
guarantee, it may indeed be reasonable to compile formatter.h in
Py_UCS4, since the resulting strings will be small, so the overhead is
probably negligible.

Regards,
Martin