[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

Sun Feb 23 22:25:45 CET 2014

On 02/23/2014 03:30 AM, Victor Stinner wrote:
>
> First, this is a warning in reST syntax:
>
> System Message: WARNING/2 (pep-0461.txt, line 53)

Yup, fixed that.

>> This area of programming is characterized by a mixture of binary data and
>> ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
>> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
>> writing new wire format code, and in porting Python 2 wire format code.
>
> You may give some examples here: HTTP (Latin1 headers, binary body),
> SMTP, FTP, etc.
>
>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
>> ``%g``, etc.) will be supported, and will work as they do for str, including
>> the padding, justification and other related modifiers.
>
> IMO you should give the exhaustive list here and we should only
> support one formatter for integers: %d. Python 2 supports "%d", "%u"
> and "%i" with "%u" marked as obsolete. Python 3.5 should not
> reintroduce obsolete formatters. If you want to use the same code base
> for Python 2.6, 2.7 and 3.5: modify your code to only use %d. Same
> rule apply for 2to3 tool: modify your source code to be compatible
> with Python 3.

A link is provided to the exhaustive list.  Including it verbatim here detracts from the overall readability.

I agree that having only one decimal format code would be nice, or even two if the second one did something different, 
and that three seems completely over the top -- unfortunately, Python 3.4 still supports all three (%d, %i, and %u). 
Not supporting two of them would just lead to frustration.  There is also no reason to exclude %o or %x and make the 
programmer reach for oct() and hex().  We're trying to simplify %-interpolation, not garner exclamations of "What were 
they thinking?!?"  ;)
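
To make the point about the integer codes concrete, here is a minimal sketch of how they behave. It assumes an interpreter where bytes %-interpolation is available (Python 3.5 or later); the variable names are illustrative only.

```python
# Assumes Python 3.5+, where %-interpolation on bytes is available.
# %d, %i and %u all produce the same decimal output.
decimal_forms = [b'%d' % 42, b'%i' % 42, b'%u' % 42]

# %x and %o avoid a detour through hex()/oct() plus an explicit encode.
hex_form = b'%x' % 255
octal_form = b'%o' % 8

# Padding, justification and precision work as they do for str.
padded = b'%05d' % 42
left_justified = b'%-5d|' % 42
fixed_point = b'%.2f' % 3.14159

print(decimal_forms, hex_form, octal_form, padded, left_justified, fixed_point)
```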

>> ``%s`` is restricted in what it will accept::
>>
>>    - input type supports ``Py_buffer`` [6]_?
>>      use it to collect the necessary bytes
>>
>>    - input type is something else?
>>      use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
>> ``TypeError``
>
> Hum, you may mention that bytes(n: int) creates a bytes string of n
> null bytes, but b'%s' % 123 will raise an error because
> int.__bytes__() is not defined. Just to be more explicit.

I added a line stating that %s does not accept numbers, but I'm not sure how bytes(n: int) is relevant?
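
For reference, a short sketch of the %s rules quoted above, again assuming Python 3.5 or later.  The Packet class is hypothetical and exists only to show the __bytes__ path; it also shows the bytes(n: int) behaviour Victor mentions next to the %s failure for the same int.

```python
class Packet:
    """Hypothetical class, used only to illustrate the __bytes__ path."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'PKT:' + self.payload


# An object supporting the buffer protocol is used directly.
from_buffer = b'%s' % memoryview(b'abc')

# Anything else must provide __bytes__.
from_dunder = b'%s' % Packet(b'xyz')

# bytes(3) builds three null bytes ...
zeros = bytes(3)

# ... but an int has no __bytes__ for %s to use, so interpolation fails.
try:
    b'%s' % 123
except TypeError as exc:
    int_error = exc
else:
    int_error = None

print(from_buffer, from_dunder, zeros, type(int_error).__name__)
```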

>> ``%a`` will call :func:``ascii()`` on the interpolated value's
>> :func:``repr()``.
>> This is intended as a debugging aid, rather than something that should be
>> used
>> in production.  Non-ascii values will be encoded to either ``\xnn`` or
>> ``\unnnn``
>> representation.
>
> (You forgot "/Uhhhhhhhh" representation (it's an antislah, but I don't
> see the key on my Mac keyboard?).)

Hard to forget what you don't know.  ;)  Will ascii() ever emit an antislash representation?
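
On the escape forms themselves, a quick check of what ascii() produces may help; this is only a sketch, and the %a line assumes Python 3.5 or later.  The sample characters are arbitrary choices.

```python
# ascii() picks the escape form from the size of the code point.
latin1 = ascii('\xe9')         # code point below 0x100
bmp = ascii('\u20ac')          # code point below 0x10000
astral = ascii('\U0001f600')   # code point above 0xFFFF

# %a interpolates the same ASCII-only repr into a bytes object.
interpolated = b'%a' % '\u20ac'
number = b'%a' % 123

print(latin1, bmp, astral, interpolated, number)
```

So the eight-digit form uses a backslash as well: a backslash followed by ``U`` and eight hex digits.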

> What is the use case of this *new* formatter? How do you use it?

An aid to debugging -- need to see what's what at that moment?  Toss it into %a.  It is not intended for production 
code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.

> print(b'%a" % 123) may emit a BytesWarning and may lead to bugs.

Why would it emit a BytesWarning?

> I would like to help you to implement the PEP. IMO we should share as
> much code as possible with PyUnicodeObject. Something using the
> stringlib and maybe a new PyBytesWriter API which would have an API
> close to PyUnicodeWriter API. We should also try to share code between
> PyBytes_Format() and PyBytes_FromFormat().

Thanks.  I'll holler when I get that far.  :)

--
~Ethan~