[Python-Dev] PEP 460 reboot

Mon Jan 13 22:54:56 CET 2014

Nick Coghlan wrote:
> By allowing format characters that *do* assume ASCII, the entire
> construct is rendered unsafe - you have to look inside the format
> string to determine if it is assuming ASCII compatibility or not, thus
> the entire construct must be deemed as assuming ASCII compatibility at
> the level of static semantic analysis.

I don't see how any of the currently proposed formatting
operations make a data-dependent ASCII assumption.

When you write b"%d" % x, you're
not assuming that x is ASCII, you're assuming that it's
an *integer*. The %d conversion of an integer is defined
to produce only ASCII characters, and it works on any
integer, so there's no data-dependent assumption there.
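
A minimal sketch of this point, assuming Python 3.5 or later, where
%-formatting on bytes is available. The sample values and the
is_ascii helper are made up for illustration only.

```python
# Assumption: Python 3.5+ (bytes %-formatting). Sample integers are arbitrary.
samples = [0, -1, 42, 10**30, -(2**64)]

def is_ascii(data):
    # True when every byte falls in the 7-bit ASCII range.
    return all(byte < 128 for byte in data)

# %d accepts any integer and its output depends only on the value's digits.
formatted = [b"%d" % x for x in samples]
all_ascii = all(is_ascii(chunk) for chunk in formatted)

print(formatted[2], all_ascii)
```

Whatever integer is passed, the result consists of an optional minus
sign and decimal digits, so the check holds without inspecting the data.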

Something that *would* involve such an assumption would
be if b"%s" % 'hello' were defined to encode 'hello' as
ASCII. But Guido has proposed not doing that, and instead
interpolating ascii('hello'). Since ascii() is defined to
return only ASCII characters, and works on any string,
there is again no data-dependent assumption.
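
The behaviour of ascii() itself can be checked directly. This is only
a sketch of the builtin, not of any bytes-formatting implementation,
and the sample strings are arbitrary.

```python
# Arbitrary sample strings, including non-ASCII text and a lone surrogate.
samples = ["hello", "caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001f600", "\ud800"]

# ascii() escapes every non-ASCII character, so it accepts any str.
reprs = [ascii(s) for s in samples]

# Encoding the result as ASCII therefore cannot fail, whatever the input.
encoded = [r.encode("ascii") for r in reprs]

print(reprs[0], reprs[1])
```

Note that the result includes the quotes: ascii('hello') is the
seven-character string 'hello', quotes and all, not the bare word.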

My preference would be for b"%s" % 'hello' to raise an
exception, but that would still be data-independent: it
would fail for every str, whatever its contents.

As for having to look inside the format string to know
what types are expected, that's no different from any
other formatting operation. All it means is that static
type analysis in Python is hard, but we already knew
that.

> Allowing these ASCII assuming format codes in the core bytes
> interpolation introduces *exactly* the same problem as is present in
> the Python 2 text model: code that *appears* to support arbitrary
> binary data, but is in fact assuming ASCII compatibility.

Can you provide an example of code using Guido's
currently approved formatting semantics that would
fail when given arbitrary binary data? I don't see
how it can happen.

-- 
Greg