[Python-Dev] PEP 461 updates

Fri Jan 17 06:46:15 CET 2014

On 17 January 2014 11:51, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 01/16/2014 05:32 PM, Greg wrote:
>>
>>
>> I don't think it matters whether the internal details of that
>> debate make sense to the rest of us. The main thing is that
>> a consensus seems to have been reached on bytes formatting
>> being basically a good thing.
>
>
> And a good thing, too, on both counts!  :)
>
> A few folks have suggested not implementing .format() on bytes;  I've been
> resistant, but then I remembered that format is also a function.
>
> http://docs.python.org/3/library/functions.html?highlight=ascii#format
> ======================================================================
> format(value[, format_spec])
>
>     Convert a value to a “formatted” representation, as controlled by
> format_spec. The interpretation of format_spec will depend on the type of
> the value argument, however there is a standard formatting syntax that is
> used by most built-in types: Format Specification Mini-Language.
>
>     The default format_spec is an empty string which usually gives the same
> effect as calling str(value).
>
>     A call to format(value, format_spec) is translated to
> type(value).__format__(format_spec) which bypasses the instance dictionary
> when searching for the value’s __format__() method. A TypeError exception is
> raised if the method is not found or if either the format_spec or the return
> value are not strings.
> ======================================================================
>
> Given that, I can relent on .format and just go with .__mod__ .  A low-level
> service for a low-level protocol, what?  ;)

Exactly - while I'm a fan of the new extensible formatting system and
strongly prefer it to printf-style formatting for text, it also has a
whole lot of complexity that is hard to translate to the binary
domain, including the format() builtin and __format__ methods.
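
As a rough illustration of why that machinery is tied to text, here is a
minimal sketch of the __format__ protocol described in the quoted docs.
The Celsius and Raw classes are hypothetical names invented for this
example; only format() and str.format() are real.

```python
class Celsius:
    """Hypothetical type, used only to illustrate the __format__ protocol."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Delegate to float's formatting, then add a unit suffix.
        # The return value has to be a str.
        return format(self.degrees, format_spec) + " C"


reading = Celsius(21.456)
text = format(reading, ".1f")         # dispatches to Celsius.__format__
also_text = "{:.1f}".format(reading)  # str.format() uses the same hook

# format() looks the method up on the type, so an attribute set on the
# instance is bypassed.
reading.__format__ = lambda spec: "ignored"
still_text = format(reading, ".1f")


class Raw:
    """Hypothetical type whose __format__ wrongly returns bytes."""

    def __format__(self, format_spec):
        return b"raw"


try:
    format(Raw(), "")
    rejected = False
except TypeError:
    # The protocol is defined in terms of str, so bytes are refused.
    rejected = True
```

Every step of the protocol produces or consumes str, so carrying it over
to bytes would mean defining a parallel hook rather than reusing this one.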

Since the relevant use cases appear to be already covered adequately
by printf-style formatting, attempting to translate the flexible text
formatting system as well just becomes additional complexity we don't
need.

I like Stephen Turnbull's suggestion of using "binary formats with
ASCII segments" to distinguish the kind of formats we're talking about
from ASCII compatible text encodings, and I think Python 3.5 will end
up with a suite of solutions that suitably covers all use cases, just
by bringing back printf-style formatting directly to bytes:

* format(), str.format(), str.format_map(): a rich extensible text
formatting system, including date interpolation support
* str.__mod__: retained primarily for backwards compatibility, may
occasionally be used as a text formatting optimisation tool (since the
inflexibility means it will likely always be marginally faster than
the rich formatting system for the cases that it covers)
* bytes.__mod__, bytearray.__mod__: restored in Python 3.5 to simplify
production of data in variable length binary formats that contain
ASCII segments
* the struct module: rich (but not extensible) formatting system for
fixed length binary formats
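
The four options above might look like this side by side. This is only a
sketch: the values and format strings are made up for illustration, and
the bytes and bytearray lines assume an interpreter where the PEP 461
behaviour is available (Python 3.5 or later).

```python
import struct
from datetime import date

# 1. Rich, extensible text formatting, including date interpolation.
rich = "{:>8.2f}|{name}".format(3.14159, name="pi")
stamp = format(date(2014, 1, 17), "%Y-%m-%d")

# 2. printf-style formatting on str.
legacy = "%05d %s" % (42, "units")

# 3. printf-style formatting on bytes and bytearray: a variable length
#    binary format that contains ASCII segments.
header = b"Content-Length: %d\r\n" % 1234
chunk_size = bytearray(b"%x\r\n") % 255  # the result is a bytearray

# 4. struct: a fixed length binary format (big-endian unsigned short
#    followed by a big-endian unsigned int, with no padding).
record = struct.pack(">HI", 1, 65536)
```

Only the first option goes through __format__; the other three work from
a fixed set of conversion codes.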

In Python 2, the "binary format with ASCII segments" use case was
intermingled with general purpose text formatting on the str type,
which I think is the main reason it has taken us so long to convince
ourselves it is something that is genuinely worth bringing back in a
more limited form in Python 3, rather than just being something we
wanted back because we were used to having it in Python 2.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia