[Python-Dev] PEP 460 reboot

Nick Coghlan ncoghlan at gmail.com
Tue Jan 14 13:46:25 CET 2014


On 14 January 2014 19:54, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Guido van Rossum writes:
>  > And that is precisely my point. When you're using a format string,
>  > all of the format string (not just the part between { and }) had
>  > better use ASCII or an ASCII superset. And this (rightly)
>  > constrains the output to an ASCII superset as well.
>
> Except that if you interpolate something like Shift JIS, much of the
> ASCII really isn't ASCII.  That's a general issue, of course, if you
> do something that requires iterated format strings, but it's far more
> likely to appear to work most of the time with those encodings.
>
> Of course you can say "if it hurts, don't do that", but ....

Right, that's the danger I was worried about, but the problem is that
there's at least *some* minimum level of ASCII compatibility that
needs to be assumed in order to define an interpolation format at all
(this is the point I originally missed). For printf-style formatting,
it's % along with the various formatting characters and other syntax
(like digits, parentheses, variable names and "."); with the format
method it's braces, brackets, colons, variable names, etc. The
mini-language parser has to assume an encoding in order to interpret
the format string, and that's *all* done assuming an ASCII compatible
format string (which must make life interesting if you try to use an
ASCII incompatible coding cookie for your source code - I'm actually
not sure what the full implications of that *are* for bytes literals
in Python 3).
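
To make that assumption concrete, here is a small Python 3 sketch (the
sample strings and the helper name are illustrations picked for this
example, not anything from the thread). It checks whether the "%s"
marker is still findable as raw ASCII bytes after encoding, and shows
that Shift JIS data can contain bytes that only look like ASCII:

```python
# Bytes a printf-style parser scans for; both are plain ASCII code points.
MARKER = b"%s"

def marker_survives(encoding):
    """Return True if "%s" is still findable as raw ASCII bytes after encoding."""
    return MARKER in "name=%s".encode(encoding)

# ASCII compatible encodings keep the marker intact.
print(marker_survives("utf-8"))      # True
print(marker_survives("shift_jis"))  # True

# UTF-16 interleaves NUL bytes, so the ASCII marker is no longer there.
print(marker_survives("utf-16-le"))  # False

# Shift JIS data can still contain bytes that only look like ASCII:
# the second byte of KATAKANA LETTER SO (U+30BD) is 0x5C, the ASCII backslash.
so_bytes = "\u30bd".encode("shift_jis")
print(so_bytes)                      # b'\x83\\'
```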

The one remaining way I could potentially see a formatb method working
is along the lines of what Glenn (I think) suggested: just like struct
definitions, the formatb specifier would have to consist *solely* of
substitution fields. However, that's getting awfully close to being
just an alternate spelling for the struct module or bytes.join at that
point, which hardly makes for a compelling case to add two new methods
to a builtin type.
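
As a rough illustration of why that looks redundant (the record layout
below is made up for the example): a specifier consisting solely of
substitution fields carries no literal text, so it says nothing that a
struct format or a plain join doesn't already say.

```python
import struct

# Hypothetical record: a 2-byte big-endian length followed by a 4-byte tag.
length, tag = 4, b"DATA"

# A struct format is already "nothing but substitution fields".
packed = struct.pack(">H4s", length, tag)

# The same bytes, spelled with bytes.join and int.to_bytes.
joined = b"".join([length.to_bytes(2, "big"), tag])

print(packed)            # b'\x00\x04DATA'
print(packed == joined)  # True
```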

Given that one of the concepts with the Python 3 transition was to
take certain problematic constructs (like ASCII compatible
interpolation directly to binary without a separate encoding step)
away and decide whether or not we were happy to live without them, I
think this one has shown enough staying power to justify finally
bringing it back in Python 3.5 (especially given the gain in lowering the
barrier to porting Python 2 code that makes heavy use of interpolation
to ASCII compatible binary formats).

It's certainly a decision that has its downsides, with the potential
impact on users of ASCII incompatible encodings (mostly in Asia) being
the main one, but I think the increased convenience in working with
ASCII compatible binary protocols and file formats is worth the cost.
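
For a sense of the convenience at stake, here is a sketch of the kind
of code this would enable. It assumes printf-style interpolation on
bytes behaves the way Python 3.5 and later implement it, so it needs
3.5 or newer to run; the HTTP-like header and the helper name are
invented for the example.

```python
def build_response(body):
    """Frame a bytes payload with an ASCII header (hypothetical helper)."""
    # %d formats the integer as ASCII digits; %b inserts bytes unchanged.
    return b"Content-Length: %d\r\n\r\n%b" % (len(body), body)

# The payload is opaque binary data; only the framing is ASCII.
response = build_response(b"\x00\xffhi")
print(response)  # b'Content-Length: 4\r\n\r\n\x00\xffhi'
```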

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

