[Python-Dev] PEP 461 updates

Thu Jan 16 11:11:04 CET 2014

On 16 Jan 2014 11:45, "Carl Meyer" <carl at oddbird.net> wrote:
>
> Hi Ethan,
>
> I haven't chimed into this discussion, but the direction it's headed
> recently seems right to me. Thanks for putting together a PEP. Some
> comments on it:
>
> On 01/15/2014 05:13 PM, Ethan Furman wrote:
> > ============================
> > Abstract
> > ========
> >
> > This PEP proposes adding the % and {} formatting operations from str to
> > bytes [1].
>
> I think the PEP could really use a rationale section summarizing _why_
> these formatting operations are being added to bytes; namely that they
> are useful when working with various ASCIIish-but-not-properly-text
> network protocols and file formats, and in particular when porting code
> dealing with such formats/protocols from Python 2.
>
> Also I think it would be useful to have a section summarizing the
> primary objections that have been raised, and why those objections have
> been overruled (presuming the PEP is accepted). For instance: the main
> objection, AIUI, has been that the bytes type is for pure bytes-handling
> with no assumptions about encoding, and thus we should not add features
> to it that assume ASCIIness, and that may be attractive nuisances for
> people writing bytes-handling code that should not assume ASCIIness but
> will once they use the feature.

Close, but not quite - the concern was that this was a feature that didn't
*inherently* imply a restriction to ASCII-compatible data, but only did so
when the numeric formatting codes were used. This made it a source of
value-dependent compatibility errors based on the format string, akin to the
kind of value-dependent errors seen when implicitly encoding arbitrary text
as ASCII.
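
As a rough sketch of that failure mode (the helper name to_wire and the
sample strings are made up for illustration, and the helper only stands in
for any implicit str -> bytes step that assumes ASCII):

```python
def to_wire(text):
    # Hypothetical helper: stands in for an implicit ASCII encoding step.
    return text.encode('ascii')

samples = ['GET /index.html', 'GET /caf\u00e9']
results = {}
for sample in samples:
    try:
        # Succeeds only when every character happens to be ASCII.
        results[sample] = to_wire(sample)
    except UnicodeEncodeError as exc:
        # Same code path, different *value*: the error only shows up
        # once non-ASCII data arrives.
        results[sample] = exc
```

The code is identical for both inputs; whether it fails depends purely on
the data passed in, which is what makes this kind of defect latent.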

Guido's successful counter was to point out that the parsing of the format
string itself assumes ASCII-compatible data, thus placing at least the
mod-formatting operation in the same category as the existing operations
that are only valid for sufficiently ASCII-compatible data.
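
A small sketch of that point, assuming an interpreter where bytes
%-interpolation exists (Python 3.5 or later); the header template is just an
illustrative value:

```python
template = 'Content-Length: %d\r\n'

# The same template in an ASCII-compatible and a non-ASCII-compatible encoding.
ascii_template = template.encode('ascii')
utf16_template = template.encode('utf-16-le')

# The format code is only recognisable as the ASCII bytes 0x25 0x64.
marker = b'%d'
found_in_ascii = marker in ascii_template
found_in_utf16 = marker in utf16_template  # bytes there are b'%\x00d\x00'

# Interpolation works on the ASCII-compatible template (Python 3.5+ assumed).
formatted = ascii_template % 42
```

In the UTF-16 template the '%' and 'd' are separated by NUL bytes, so a
parser scanning for ASCII format codes has nothing sensible to work with.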

Current discussions suggest to me that the argument against implicit
encoding operations that introduce latent data-driven defects may still
apply to bytes.format though, so I've reverted to being -1 on that.

Cheers,
Nick.

> And the refutation: that the bytes type
> already provides some operations that assume ASCIIness, and these new
> formatting features are no more of an attractive nuisance than those;
> since the syntax of the formatting mini-languages itself
> assumes ASCIIness, there is not likely to be any temptation to use them
> with binary data that is not ASCII-compatible.
>
> Although it can be hard to arrive at accurate and agreed-on summaries of
> the discussion, recording such summaries in the PEP is important; it may
> help save our future selves and colleagues from having to revisit all
> these same discussions and megathreads.
>
> > Overriding Principles
> > =====================
> >
> > In order to avoid the problems of auto-conversion and value-generated
> > exceptions, all object checking will be done via isinstance, not by values
> > contained in a Unicode representation.  In other words::
> >
> >   - duck-typing to allow/reject entry into a byte-stream
> >   - no value generated errors
>
> This seems self-contradictory; "isinstance" is type-checking, which is
> the opposite of duck-typing. A duck-typing implementation would not use
> isinstance, it would call / check for the existence of a certain magic
> method instead.
>
> I think it might also be good to expand (very) slightly on what "the
> problems of auto-conversion and value-generated exceptions" are; that
> is, that the benefit of Python 3's model is that encoding is explicit,
> not implicit, making it harder to unwittingly write code that works as
> long as all data is ASCII, but fails as soon as someone feeds in
> non-ASCII text data.
>
> Not everyone who reads this PEP will be steeped in years of discussion
> about the relative merits of the Python 2 vs 3 models; it doesn't hurt
> to spell out a few assumptions.
>
>
> > Proposed semantics for bytes formatting
> > =======================================
> >
> > %-interpolation
> > ---------------
> >
> > All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
> > will be supported, and will work as they do for str, including the
> > padding, justification and other related modifiers, except locale.
> >
> > Example::
> >
> >    >>> b'%4x' % 10
> >    b'   a'
> >
> > %c will insert a single byte, either from an int in range(256), or from
> > a bytes argument of length 1.
> >
> > Example:
> >
> >     >>> b'%c' % 48
> >     b'0'
> >
> >     >>> b'%c' % b'a'
> >     b'a'
> >
> > %s is restricted in what it will accept::
> >
> >   - input type supports Py_buffer?
> >     use it to collect the necessary bytes
> >
> >   - input type is something else?
> >     use its __bytes__ method; if there isn't one, raise an exception [2]
> >
> > Examples:
> >
> >     >>> b'%s' % b'abc'
> >     b'abc'
> >
> >     >>> b'%s' % 3.14
> >     Traceback (most recent call last):
> >     ...
> >     TypeError: 3.14 has no __bytes__ method
> >
> >     >>> b'%s' % 'hello world!'
> >     Traceback (most recent call last):
> >     ...
> >     TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
> >
> > .. note::
> >
> >    Because the str type does not have a __bytes__ method, attempts to
> >    directly use 'a string' as a bytes interpolation value will raise an
> >    exception.  To use 'string' values, they must be encoded or otherwise
> >    transformed into a bytes sequence::
> >
> >       'a string'.encode('latin-1')
> >
> > format
> > ------
> >
> > The format mini language codes, where they correspond with the
> > %-interpolation codes, will be used as-is, with three exceptions::
> >
> >   - !s is not supported, as {} can mean the default for both str and
> >     bytes, in both Py2 and Py3.
> >   - !b is supported, and new Py3k code can use it to be explicit.
> >   - no other __format__ method will be called.
> >
> > Numeric Format Codes
> > --------------------
> >
> > To properly handle int and float subclasses, int(), index(), and float()
> > will be called on the objects intended for (d, i, u), (b, o, x, X), and
> > (e, E, f, F, g, G).
> >
> > Unsupported codes
> > -----------------
> >
> > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are
> > not supported.
> >
> > !r and !a are not supported.
> >
> > The n integer and float format code is not supported.
> >
> >
> > Open Questions
> > ==============
> >
> > Currently non-numeric objects go through::
> >
> >   - Py_buffer
> >   - __bytes__
> >   - failure
> >
> > Do we want to add a __format_bytes__ method in there?
> >
> >   - Guaranteed to produce only ascii (as in b'10', not b'\x0a')
> >   - Makes more sense than using __bytes__ to produce ascii output
> >   - What if an object has both __bytes__ and __format_bytes__?
> >
> > Do we need to support all the numeric format codes?  The floating point
> > exponential formats seem less appropriate, for example.
> >
> >
> > Proposed variations
> > ===================
> >
> > It was suggested to let %s accept numbers, but since numbers have their own
> > format codes this idea was discarded.
> >
> > It has been suggested to use %b for bytes instead of %s.
> >
> >   - Rejected as %b does not exist in Python 2.x %-interpolation, which is
> >     why we are using %s.
> >
> > It has been proposed to automatically use .encode('ascii','strict') for str
> > arguments to %s.
> >
> >   - Rejected as this would lead to intermittent failures.  Better to have
> >     the operation always fail so the trouble-spot can be correctly fixed.
> >
> > It has been proposed to have %s return the ascii-encoded repr when the value
> > is a str  (b'%s' % 'abc'  --> b"'abc'").
> >
> >   - Rejected as this would lead to hard to debug failures far from the
> >     problem site.  Better to have the operation always fail so the
> >     trouble-spot can be easily fixed.
> >
> >
> > Footnotes
> > =========
> >
> > .. [1] string.Template is not under consideration.
> > .. [2] TypeError, ValueError, or UnicodeEncodeError?
>
> TypeError seems right to me. Definitely not UnicodeEncodeError - refusal
> to implicitly encode is not at all the same thing as an encoding error.
>
> Carl