[Python-ideas] Fixing the Python 3 bytes constructor

Fri Mar 28 12:59:33 CET 2014

On 28 March 2014 21:22, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 28 Mar 2014 20:27:33 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> One of the current annoyances with the bytes type in Python 3 is the
>> way the constructor handles integers:
>>
>> >>> bytes(3)
>> b'\x00\x00\x00'
>>
>> It would be far more consistent with the behaviour of other bytes
>> interfaces if the result of that call was instead b'\x03'.
>
> Which other bytes interfaces are you talking about?

The ones where a length 1 bytes object and the corresponding integer
are interchangeable. For example, containment testing:

>>> b"a" in b"abc"
True
>>> 97 in b"abc"
True

Compare:

>>> len(bytes(b"a"))
1
>>> len(bytes(97))
97

That's the inconsistency that elevates the current constructor
behaviour from weird to actively wrong for me - it doesn't match the
way other bytes interfaces have evolved over the course of the Python
3 series.

>> However, during a conversation today, a possible solution occurred to
>> me: a "bytes.chr" class method, that served as an alternate
>> constructor. That idea results in the following 3 part proposal:
>>
>> 1. Add "bytes.chr" such that "bytes.chr(x)" is equivalent to the PEP
>> 461 defined "b'%c' % x"
>
> You mean bytes.chr(x) is equivalent to bytes([x]). The intent is
> slightly more obvious indeed, so I'd be inclined to be +0, but Python
> isn't really a worse language if it doesn't have that alternative
> spelling.
>
>> Anyway, what do people think? Does anyone actually *like* the way the
>> bytes constructor in Python 3 currently handles integers and want to
>> keep it forever?
>
> I don't like it, but I also don't think it's enough of a nuisance to be
> deprecated.
> (the indexing behaviour of bytes objects is far more annoying)

Oops, I forgot to explain the context where this idea came up: I was
trying to figure out how to iterate over a bytes object or wrap an
indexing operation to get a length 1 byte sequence rather than an
integer.

Currently: you probably have to muck about with a lambda or a comprehension.
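
For reference, a rough sketch of what those workarounds tend to look
like today. The variable names are purely illustrative, and there may
well be neater spellings I'm not thinking of:

```python
data = b"abc"

# Indexing and iteration both produce integers, not bytes objects
assert data[0] == 97

# Comprehension: wrap each integer in a one-element list
chunks = [bytes([i]) for i in data]

# The same thing with map and a lambda
chunks_via_map = list(map(lambda i: bytes([i]), data))

# For a single position, slicing (rather than indexing) already
# gives back a length 1 bytes object
first = data[0:1]
```

All of these work, but none of them says "give me a single byte" as
directly as a named alternate constructor would.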

With this change (using Steven D'Aprano's suggested name):

    for x in map(bytes.byte, data):
        # x is a length 1 bytes object, not an int

    x = bytes.byte(data[0]) # ditto

bytes.byte could actually apply the same policy as some other APIs and
also accept ASCII text code points in addition to length 1 bytes
objects and integers below 256.
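
To make that policy concrete, here is a minimal sketch written as a
plain module-level function, since a method can't be added to the
builtin bytes type from Python code. The name "byte", the exact error
messages and the decision to accept bytearray are my assumptions for
illustration, not a settled design:

```python
def byte(value):
    """Hypothetical stand-in for the proposed bytes.byte constructor."""
    if isinstance(value, int):
        # bytes([n]) raises ValueError unless 0 <= n < 256
        return bytes([value])
    if isinstance(value, (bytes, bytearray)):
        if len(value) != 1:
            raise ValueError("expected a length 1 bytes object")
        return bytes(value)
    if isinstance(value, str):
        if len(value) != 1:
            raise ValueError("expected a single character")
        # Non-ASCII code points raise UnicodeEncodeError,
        # which is a subclass of ValueError
        return value.encode("ascii")
    raise TypeError("expected an int, bytes or str, not %s"
                    % type(value).__name__)


# Iteration and indexing then give length 1 bytes objects
pieces = list(map(byte, b"abc"))
first = byte(b"abc"[0])
```

With this sketch, byte(97), byte(b"a") and byte("a") all produce
b"a", while out of range integers, longer sequences and non-ASCII
text are rejected with ValueError.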

Since changing the iteration and indexing behaviour of bytes and
bytearray within the Python 3 series isn't feasible, this idea is
about making the current behaviour easier to deal with.

And yes, this is definitely going to need a PEP :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia