[Python-Dev] PEP 461 - Adding % and {} formatting to bytes

Wed Jan 15 17:50:33 CET 2014

On Wed, Jan 15, 2014 at 10:52 AM, Eric V. Smith <eric at trueblade.com> wrote:

> On 1/15/2014 9:45 AM, Brett Cannon wrote:
>
> > That's too vague; % interpolation does not support other format
> > operators in the same way as str.format() does. % interpolation has
> > specific code to support %d, etc. But str.format() gets supported for
> > {:d} not from special code but because e.g. int.__format__('d') works.
> > So you can't say "bytes.format() supports {:d} just like %d works with
> > string interpolation" since the mechanisms are fundamentally different.
> >
> > This is why I have argued that if you specify it as "if there is a
> > format spec specified, then the return value from calling __format__()
> > will have str.encode('ascii', 'strict') called on it" you get the
> > support for the various number-specific format specs for free. It also
> > means if you pass in a string that you just want the strict ASCII bytes
> > of then you can get it with {:s}.
> >
> > I also think that a 'b' conversion should be added to bytes.format(). This
> > doesn't have the same issue as %b if you make {} implicitly mean {!b} in
> > Python 3.5 as {} will mean what is the most accurate for bytes.format()
> > in either version. It also allows for explicit support where you know
> > you only want a byte and allows {!s} to mean you only want a string (and
> > thus throw an error otherwise).
> >
> > And all of this means that much like %s only taking bytes, the only way
> > for bytes.format() to accept a non-byte argument is for some format spec
> > to be specified to trigger the .encode('ascii', 'strict') call.
>
> Agreed. With %-formatting, you can start with the format strings and
> then decide what you want to do with the passed in objects. But with
> .format, it's the other way around: you have to look at the passed in
> objects being formatted, and then decide what the format specifier means
> to that type.
>
> So, for .format, you could say "hey, that object's an int, and I happen
> to know how to format ints, outside of calling its .__format__". Or you
> could even call its __format__ because you know that it will only be
> ASCII. But to take this approach, you're going to have to hard-code the
> types. And subclasses are probably out, since there you don't know what
> the subclass's __format__ will return. It could be non-ASCII.
>
> >>> class Int(int):
> ...   def __format__(self, fmt):
> ...     return u'foo'
> ...
> >>> '{}'.format(Int(3))
> 'foo'
>
> So basically I think we'll have to hard-code the types that .format()
> will support, and never call __format__, or only call __format__ if we
> know that it's an exact type where we know that __format__ will return
> strict ASCII.
>
> Either that, or we're back to encoding the result of __format__ and
> accepting that sometimes it might throw errors, depending on the values
> being passed into format().
>

I say accept that an error might get thrown, as there is already precedent:
specifying a format spec that an object's __format__() method doesn't
recognize raises an exception::

  >>> '{:s}'.format(1)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError: Unknown format code 's' for object of type 'int'

IOW I'm actively trying to avoid type-restricting the semantics for
bytes.format() for a consistent, clear mental model. Remembering that "any
format spec leads to calling .encode('ascii', 'strict') on the result" is
simple compared to "ASCII bytes will be returned for ints and floats when
passed in, otherwise all other types follow these rules".
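
To make that rule concrete, here is a minimal sketch in pure Python. The
helper name format_field() is hypothetical and is not part of the PEP or of
any implementation; it only assumes the semantics described above (a format
spec triggers __format__() followed by a strict ASCII encode, and without a
spec only bytes are accepted).

```python
def format_field(value, spec=''):
    """Hypothetical helper: render one replacement field as bytes.

    Assumed semantics, as described in this thread (not an actual API):
    any format spec means "call __format__() and strictly encode the
    result as ASCII"; no spec means "only bytes-like values are accepted".
    """
    if spec:
        # No type checks at all: whatever __format__() returns gets encoded,
        # and a non-ASCII result raises UnicodeEncodeError.
        return format(value, spec).encode('ascii', 'strict')
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError('a format spec is required for non-bytes arguments')


class Int(int):
    # A subclass whose __format__() returns non-ASCII text.
    def __format__(self, fmt):
        return 'caf\u00e9'


print(format_field(42, 'd'))     # b'42'
print(format_field('abc', 's'))  # b'abc'
try:
    format_field(Int(3), 'd')
except UnicodeEncodeError as exc:
    print('error surfaced:', type(exc).__name__)
```

The subclass case is not special-cased: it simply fails loudly at encode
time, the same way an unrecognized format code fails today.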

As the zen says:

  Errors should never pass silently.
  Special cases aren't special enough to break the rules.

-Brett