[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

Fri Jun 10 13:40:53 EDT 2016

On 9 June 2016 at 19:21, Barry Warsaw <barry at python.org> wrote:
> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
>
>>Deprecation of current "zero-initialised sequence" behaviour
>>------------------------------------------------------------
>>
>>Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>>argument and interpret it as meaning to create a zero-initialised sequence of
>>the given size::
>>
>>     >>> bytes(3)
>>     b'\x00\x00\x00'
>>     >>> bytearray(3)
>>     bytearray(b'\x00\x00\x00')
>>
>>This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>>entirely in Python 3.7.
>>
>>No other changes are proposed to the existing constructors.
>
> Does it need to be *actually* removed?  That does break existing code for not
> a lot of benefit.  Yes, the default constructor is a little wonky, but with
> the addition of the new constructors, and the fact that you're not proposing
> to eventually change the default constructor, removal seems unnecessary.
> Besides, once it's removed, what would `bytes(3)` actually do?  The PEP
> doesn't say.

Raise TypeError, presumably. However, I agree this isn't worth the
hassle of breaking working code, especially since truly ludicrous
values will fail promptly with MemoryError - it's only a particular
range of values that fit within the limits of the machine, but also
push it into heavy swapping that are a potential problem.

> Also, since you're proposing to add `bytes.byte(3)` have you considered also
> adding an optional count argument?  E.g. `bytes.byte(3, count=7)` would yield
> b'\x03\x03\x03\x03\x03\x03\x03'.  That seems like it could be useful.

The purpose of bytes.byte() in the PEP is to provide a way to
roundtrip ord() calls with binary inputs, since the current spelling
is pretty unintuitive:

    >>> ord("A")
    65
    >>> chr(ord("A"))
    'A'
    >>> ord(b"A")
    65
    >>> bytes([ord(b"A")])
    b'A'

That said, perhaps it would make more sense for the corresponding
round-trip to be:

    >>> bchr(ord("A"))
    b'A'

With the "b" prefix on "chr" reflecting the "b" prefix on the output.
This also inverts the chr/unichr pairing that existed in Python 2
(replacing it with bchr/chr), and is hence very friendly to
compatibility modules like six and future (future.builtins already
provides a chr that behaves like the Python 3 one, and bchr would be
much easier to add to that than a new bytes object method).

In terms of an efficient memory-preallocation interface, the
equivalent NumPy operation to request a pre-filled array is
"ndarray.full":
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html
(there's also an inplace mutation operation, "fill")

For bytes and bytearray though, that has an unfortunate name collision
with "zfill", which refers to zero-padding numeric values for fixed
width display.

If the PEP just added bchr() to complement chr(), and [bytes,
bytearray].zeros() as a more discoverable alternative to passing
integers to the default constructor, I think that would be a decent
step forward, and the question of pre-initialising with arbitrary
values can be deferred for now (and perhaps left to NumPy
indefinitely)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia