[Python-Dev] PEP 460 reboot

Mon Jan 13 19:40:16 CET 2014

On Mon, Jan 13, 2014 at 12:31 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Mon, 13 Jan 2014 08:36:05 -0800
> Ethan Furman <ethan at stoneleaf.us> wrote:
>
> > On 01/13/2014 08:09 AM, Antoine Pitrou wrote:
> > > On Mon, 13 Jan 2014 07:59:10 -0800
> > > Guido van Rossum <guido at python.org> wrote:
> > >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > >>> What is the use case for embedding a quoted ASCII-encoded representation
> > >>> in a byte stream?
> > >>
> > >> It doesn't crash but produces undesired output (always, not only when
> > >> the data is non-ASCII) that gives the developer a hint to think about
> > >> encoding to bytes.
> > >
> > > But why is it better to give a hint by producing undesired output (which
> > > may actually go unnoticed for some time and produce issues down the
> > > road), rather than simply by raising TypeError?
> >
> > You mean crash all the time?  I'd be fine with that for both the str case
> > and the bytes case.  But it's probably too late to change the str case,
> > and the bytes case should mirror what str does.
>
> Let me add something else: str and bytes don't have to be symmetrical.
> In Python 2, str and unicode were symmetrical, they allowed exactly the
> same operations and were composable.
> In Python 3, str and bytes are different beasts; they have different
> operations *and* different semantics (for example, bytes interoperates
> with bytearray and memoryview, while str doesn't).
>

This is also why the int type doesn't have a __bytes__ method (setting aside
that bytes() accepts an integer argument): it's universally agreed what
str(10) should return, but who knows what you want when you ask for the
bytes of 10 (e.g. base-2, ASCII, UTF-16, etc.)?
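
A minimal sketch of that ambiguity, using only built-ins (the variable names
are purely illustrative): the text form of 10 has one obvious answer, while
"the bytes of 10" has several defensible ones that all differ.

```python
n = 10

# Text has one agreed-upon answer.
as_text = str(n)                                # '10'

# Bytes has several plausible answers, and none is obviously "the" one.
zero_filled = bytes(n)                          # ten NUL bytes
ascii_digits = str(n).encode("ascii")           # decimal digits as ASCII
utf16_digits = str(n).encode("utf-16-le")       # decimal digits as UTF-16
raw_byte = n.to_bytes(1, "big")                 # the value as a single byte
base2_digits = format(n, "b").encode("ascii")   # binary digits as ASCII
```

Note that bytes(10) is the "integer passed to bytes()" case mentioned above:
it produces a zero-filled buffer of that length, not a rendering of the
number.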

>
> So bytes formatting really needn't (and shouldn't, IMO) mirror str
> formatting.
>

I think one of the things about Guido's proposal that bugs me is that it
breaks the mental model of the .format() method from str in terms of how
the mini-language works. For str.format() you have the conversion and the
format spec (e.g. "{!r}" and "{:d}", respectively). You apply the
conversion by calling the appropriate built-in, e.g. 'r' calls repr(). The
format spec semantically gets passed with the object to format() which
calls the object's __format__() method: ``format(number, 'd')``.
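
To make the two hooks concrete, here is a small sketch. The Celsius class is
hypothetical and exists only to show which method each part of the
mini-language ends up calling.

```python
class Celsius:
    """Hypothetical type used only to show which hook each part triggers."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Celsius({!r})".format(self.degrees)

    def __format__(self, format_spec):
        # Delegate the spec to the wrapped float, then add a unit suffix.
        return format(self.degrees, format_spec) + " C"


reading = Celsius(21.456)

# The conversion ('!r') applies the matching built-in, here repr().
via_conversion = "{!r}".format(reading)

# The format spec (':.1f') is handed to format(), which calls __format__().
via_spec = "{:.1f}".format(reading)
```

Here via_conversion equals repr(reading) and via_spec equals
format(reading, '.1f').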

Now Guido's suggestion has two parts that affect the mini-language for
.format(). One is that for bytes.format() the default conversion is bytes()
instead of str(), which is fine (probably want to add 'b' as a conversion
value as well to be consistent). But the other bit is that the format spec
goes from semantically meaning ``format(thing, format_spec)`` to
``format(thing, format_spec).encode('ascii', 'strict')`` for at least
numbers. That implicitness bugs me, as I have always thought of format specs
as just leading to a call to format(). I think I can live with it, though, as
long as it is **consistently** applied across the board for bytes.format():
every use of a format spec leads to calling ``format(thing,
format_spec).encode('ascii', 'strict')`` no matter what type 'thing' is,
and it is clearly documented that this is done to ease porting and
handle the common case.
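
As a rough sketch of that consistent rule: bytes.format() is only a proposal
at this point, so format_field_as_bytes below is a hypothetical helper
standing in for what a single field with a format spec would do.

```python
def format_field_as_bytes(thing, format_spec):
    """Hypothetical helper: one bytes.format() field under the proposed rule."""
    # Same steps for every type: format as text, then encode strictly as ASCII.
    return format(thing, format_spec).encode("ascii", "strict")


fields = [
    format_field_as_bytes(42, "d"),
    format_field_as_bytes(255, "04x"),
    format_field_as_bytes(3.14159, ".2f"),
    format_field_as_bytes("image/jpeg", "s"),
]
```

The point of the sketch is that the helper never looks at the type of
'thing'; numbers and text go through exactly the same two steps.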

This even gives people in-place ASCII encoding for strings: they can always
use '{:s}' with text when they port their code to run under both Python 2
and 3. So you should be able to do
``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. If you
want a different encoding such as latin-1, then you need to encode explicitly
and not rely on the mini-language to do tricks for you.
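
A sketch of how that would play out for text, again under the assumption
that a '{:s}' field means "format, then encode as strict ASCII". The
ascii_field helper and the X-Filename header are made up for illustration.

```python
def ascii_field(text):
    """Hypothetical stand-in for a '{:s}' field in the proposed bytes.format()."""
    return format(text, "s").encode("ascii", "strict")


# Pure-ASCII text goes through untouched.
header = b"Content-Type: " + ascii_field("image/jpeg")

# Non-ASCII text fails loudly instead of being silently transcoded.
name = "caf\u00e9.jpg"
try:
    ascii_field(name)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Anything other than ASCII has to be spelled out by the caller.
explicit = b"X-Filename: " + name.encode("latin-1")
```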

IOW I want to treat the format mini-language as a language and thus not
have any special-casing or massive shifts in meaning between str.format()
and bytes.format() so my mental model doesn't have to contort based on
whether it's str or bytes. My preference is not to have any, but if Guido is
going to say PBP here then I want absolute consistency across the board in how
bytes.format() tweaks things.

As for %s for the % operator calling ascii(), I think that will be a
porting nightmare of finding out why your bytes suddenly stopped being
formatted properly and then having to crawl through all of your code for
that one use of %s which is getting bytes in. By raising a TypeError you
will very easily detect where your screw-up occurred thanks to the
traceback; doing otherwise feels too much like implicit type conversion, and
you can ask any JavaScript developer how that can be a bad thing.
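
A sketch contrasting the two behaviours. Both functions are hypothetical and
only model the two options under discussion; neither is an existing API.

```python
def interpolate_with_ascii(prefix, value):
    """Hypothetical: model %s falling back to ascii(value), never raising."""
    return prefix + ascii(value).encode("ascii")


def interpolate_strict(prefix, value):
    """Hypothetical: model %s accepting only bytes-like values."""
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError(
            "expected a bytes-like object, got {}".format(type(value).__name__)
        )
    return prefix + bytes(value)


# The ascii() route "works", but the quotes end up in the output.
silent = interpolate_with_ascii(b"Content-Type: ", "image/jpeg")

# The strict route fails at the exact call that passed the wrong type.
try:
    interpolate_strict(b"Content-Type: ", "image/jpeg")
    error = None
except TypeError as exc:
    error = str(exc)
```

With the first function the stray quotes only show up wherever the bytes are
eventually consumed; with the second, the traceback points straight at the
offending call.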

-Brett

>
> (the only reason I used "%s" in PEP 460 is to allow a migration path
> from 2.x bytes-formatting to 3.x bytes-formatting; in a really "pure"
> proposal it would have been called something else)
>
> Regards
>
> Antoine.