[Python-Dev] PEP 460 reboot

R. David Murray rdmurray at bitdance.com
Mon Jan 13 18:42:36 CET 2014


On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 12 Jan 2014 18:11:47 -0800
> Guido van Rossum <guido at python.org> wrote:
> > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > > On 01/12/2014 04:47 PM, Guido van Rossum wrote:
> > >> %s seems the trickiest: I think with a bytes argument it should just
> > >> insert those bytes (and the padding modifiers should work too), and
> > >> for other types it should probably work like %a, so that it works as
> > >> expected for numeric values, and with a string argument it will return
> > >> the ascii()-variant of its repr(). Examples:
> > >>
> > >> b'%s' % 42 == b'42'
> > >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x'
> > >> enclosed in single quotes)
> > >
> > > I'm not sure about the quotes.  Would anyone ever actually want those in the
> > > byte stream?
> > 
> > Perhaps not, but it's a hint that you should probably think about an
> > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of
> > it as payback time. :-)
> 
> What is the use case for embedding a quoted ASCII-encoded representation
> in a byte stream?

There is no use case in the sense you are asking, just like there is no
real use case for '%s' % b'x' producing "b'x'".  But the real use case
is exactly the same: to let you know your code is screwed up without
actually blowing up with an encoding exception.

For the record, I like Guido's logic and proposal.  I don't understand
Nick's objection, since I don't see the difference between the situation
here where a string gets interpolated into bytes as 'xxx' and the
corresponding situation where bytes gets interpolated into a string
as b'xxx'.  Why struggle to keep bytes interpolation "pure" if string
interpolation isn't?
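
A minimal sketch of the two situations being compared.  The str side is
existing behaviour; the bytes side is only emulated here with
ascii(...).encode('ascii'), on the assumption that this is what the
proposed b'%s' would produce for a str argument.

```python
# Existing behaviour: bytes interpolated into a str come out as their repr.
text_result = '%s' % b'xxx'

# Emulation of the proposed behaviour (an assumption, not a real API):
# a str interpolated into bytes would come out as the ASCII form of its repr.
bytes_result = ascii('xxx').encode('ascii')

print(text_result)   # b'xxx'  (a str of six characters)
print(bytes_result)  # b"'xxx'"  (five bytes, quotes included)
```

In both directions the stray quotes (and the b prefix on the str side)
are the visible hint that an encode or decode step is missing.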

Guido's proposal makes the language more symmetric, and thus more
consistent and less surprising.  Exactly the hallmarks of Python's design
sense, IMO.  (Big surprise, right? :)

Of course, this point of view *is* based on the idea that when you are
doing interpolation using %/.format, you are in fact primarily concerned
with ASCII-compatible byte streams.  This is a Practicality sort of
argument.  It is, after all, by far the most common use case when
doing interpolation[*].

If you wanted to do a purist version of this symmetry, you'd have bytes(x)
calling __bytes__ if it was defined and falling back to calling a
__brepr__ otherwise.

But what would __brepr__ implement?  The variety of format codes in
the struct module argues that there is no "one obvious" binary
repr for most types.  (Those that have one would implement __bytes__.)
And what would be the __brepr__ of an arbitrary 'object'?
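
To illustrate the struct point with a small sketch: the same integer has
several equally plausible binary forms, depending on the byte order and
width chosen.  The variable names are only for illustration.

```python
import struct

n = 1

# Three of the many ways struct can render the same integer as bytes.
little32 = struct.pack('<i', n)  # 4-byte int, little-endian
big32 = struct.pack('>i', n)     # 4-byte int, big-endian
little64 = struct.pack('<q', n)  # 8-byte int, little-endian

print(little32)  # b'\x01\x00\x00\x00'
print(big32)     # b'\x00\x00\x00\x01'
print(little64)  # b'\x01\x00\x00\x00\x00\x00\x00\x00'
```

None of these is more "obvious" than the others, which is the difficulty
a pure-bytes __brepr__ would run into.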

Faced with the impracticality of defining __brepr__ usefully in any "pure
bytes" form, it seems sensible to admit that the most useful __brepr__
is the ascii() encoding of the __repr__.  Which naturally produces 'xxx'
as the __brepr__ of a string.
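
A rough sketch of that fallback chain, written as ordinary functions.
Both brepr() and to_bytes() are hypothetical names used only for
illustration (there is no __brepr__ protocol in Python), and Packet is a
made-up class.

```python
def brepr(obj):
    """Hypothetical __brepr__: the ASCII-encoded form of repr(obj)."""
    return ascii(obj).encode('ascii')


def to_bytes(obj):
    """Hypothetical 'purist' conversion: use __bytes__ if the type
    defines it, otherwise fall back to brepr()."""
    if isinstance(obj, bytes):
        return obj  # bytes pass through unchanged
    method = getattr(type(obj), '__bytes__', None)
    if method is not None:
        return method(obj)
    return brepr(obj)


class Packet:
    """Made-up type that does have one obvious binary form."""
    def __bytes__(self):
        return b'\x00\x01'


print(to_bytes(Packet()))  # b'\x00\x01'
print(to_bytes(42))        # b'42'
print(to_bytes('xxx'))     # b"'xxx'"
```

The last line is the case under discussion: a string falls through to
the repr-based fallback and so arrives in the byte stream with its
quotes attached.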

This does cause things to get a little un-pretty when you are operating
at the Python prompt:

    >>> b'%s' % object
    b'"<class \\\'object\\\'>"'

But then again that is most likely not what you meant to do, so
it becomes a big red flag...just like b'xxx' is a small red flag when
you accidentally interpolate undecoded bytes into a string.

--David

PS: When I first read Guido's remark that the result of interpolating a
string should be 'xxx', I went Wah?  I had to reason my way through to
it as above, but to him it was just the natural answer.  Guido isn't
always right, but this kind of automatic language design consistency
is one reason he's the BDFL.

[*] I still think that you mostly want to design your library so that
you are handling the text parts as text and the bytes parts as bytes,
and encoding/gluing them as appropriate at the IO boundary.  But if Guido
says his real code would benefit by being able to interpolate ASCII into
bytes at certain points, I'll believe him.

