[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

Tue Jun 7 16:28:13 EDT 2016

Minor changes: updated version numbers, added punctuation.

The current text seems to take into account Guido's last comments.

Thoughts before asking for acceptance?

PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.6
Post-History: 2014-03-30 2014-08-15 2014-08-16

Abstract
========

During the initial development of the Python 3 language specification, 
the core ``bytes`` type for arbitrary binary data started as the mutable 
type that is now referred to as ``bytearray``. Other aspects of 
operating in the binary domain in Python have also evolved over the 
course of the Python 3 series.

This PEP proposes four small adjustments to the APIs of the ``bytes``, 
``bytearray`` and ``memoryview`` types to make it easier to operate 
entirely in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators

Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an 
integer argument and interpret it as meaning to create a 
zero-initialised sequence of the given size::

     >>> bytes(3)
     b'\x00\x00\x00'
     >>> bytearray(3)
     bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, and remove 
it entirely in Python 3.7.

No other changes are proposed to the existing constructors.
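
As a rough illustration of what the deprecation period could look like to
calling code, the sketch below wraps the existing constructor in a
hypothetical ``bytes_compat`` helper. The helper name and the warning text
are assumptions made for this example only; the PEP does not specify them.

```python
import warnings


def bytes_compat(arg=b""):
    """Hypothetical wrapper: warn when an integer is used as a size."""
    if isinstance(arg, int):
        # Assumed wording; the PEP does not define the message text.
        warnings.warn(
            "passing an integer to bytes() is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return bytes(arg)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = bytes_compat(3)

print(result)                # b'\x00\x00\x00'
print(caught[0].category)    # <class 'DeprecationWarning'>
```

Non-integer arguments pass straight through to ``bytes`` without a warning.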

Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of 
an explicit ``zeros`` alternative constructor as a class method on both 
``bytes`` and ``bytearray``::

     >>> bytes.zeros(3)
     b'\x00\x00\x00'
     >>> bytearray.zeros(3)
     bytearray(b'\x00\x00\x00')

These methods will behave exactly as the current constructors do when 
passed a single integer.

The specific choice of ``zeros`` as the alternative constructor name is 
taken from the corresponding initialisation function in NumPy (although, 
as these are 1-dimensional sequence types rather than N-dimensional 
matrices, the constructors take a length as input rather than a shape 
tuple).
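
Until such a constructor exists, the described behaviour can be
approximated with today's API. The following is a minimal sketch, assuming
a hypothetical module-level ``zeros`` helper that takes the target type as
its first argument; it is not the proposed implementation, which may
allocate the memory differently.

```python
import operator


def zeros(cls, length):
    """Hypothetical stand-in for the proposed ``cls.zeros(length)``."""
    length = operator.index(length)   # reject floats and other non-integers
    if length < 0:
        # The current constructors also reject negative sizes.
        raise ValueError("negative count")
    return cls(b"\x00") * length


print(zeros(bytes, 3))        # b'\x00\x00\x00'
print(zeros(bytearray, 3))    # bytearray(b'\x00\x00\x00')
```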

Addition of explicit "single byte" constructors
-----------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes 
the addition of an explicit ``byte`` alternative constructor as a class 
method on both ``bytes`` and ``bytearray``::

     >>> bytes.byte(3)
     b'\x03'
     >>> bytearray.byte(3)
     bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

     >>> bytes.byte(512)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     ValueError: bytes must be in range(0, 256)

     >>> bytes.byte(1.0)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly 
note that ``bytes.byte`` is the inverse operation for binary data, while 
``chr`` is the inverse operation for text data.

Behaviourally, ``bytes.byte(x)`` will be equivalent to the current 
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is 
expected to be easier to discover and easier to read (especially when 
used in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with 
higher-order functions like ``map``.
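
The equivalence with ``bytes([x])`` and the interaction with ``ord`` and
``map`` can be sketched using only the existing API. The ``byte`` helper
below is a hypothetical stand-in written for this example (it takes the
target type as its first argument), not the proposed class method itself.

```python
import operator
from functools import partial


def byte(cls, value):
    """Hypothetical stand-in for the proposed ``cls.byte(value)``."""
    # operator.index() rejects floats; the list constructor checks the range.
    return cls([operator.index(value)])


to_byte = partial(byte, bytes)

print(to_byte(3))                    # b'\x03'
print(ord(to_byte(65)))              # 65
print(list(map(to_byte, b"abc")))    # [b'a', b'b', b'c']
```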

Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain 
an optimised ``iterbytes`` method that produces length 1 ``bytes`` 
objects rather than integers::

     for x in data.iterbytes():
         # x is a length 1 bytes object, rather than an integer
         ...

The method can be used with arbitrary buffer exporting objects by 
wrapping them in a ``memoryview`` instance first::

     for x in memoryview(data).iterbytes():
         # x is a length 1 bytes object, rather than an integer
         ...

For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::

     memview.tobytes() == b''.join(memview.iterbytes())

This allows the raw bytes of the memory view to be iterated over without 
needing to make a copy, regardless of the defined shape and format.
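
The intended semantics can be approximated in pure Python. The generator
below is a sketch only: ``iterbytes`` here is a hypothetical free function
rather than the proposed method, it relies on ``memoryview.cast`` and so
only handles C-contiguous buffers, and it makes no claim about how an
optimised implementation would work.

```python
from array import array


def iterbytes(data):
    """Hypothetical stand-in for the proposed ``iterbytes`` method."""
    view = memoryview(data)
    if view.nbytes == 0:
        return
    flat = view.cast("B")    # 1-D view of the raw bytes, no bulk copy
    for i in range(len(flat)):
        yield bytes(flat[i:i + 1])    # each item is a length 1 bytes object


print(list(iterbytes(b"abc")))    # [b'a', b'b', b'c']

words = memoryview(array("H", [1, 2]))
print(b"".join(iterbytes(words)) == words.tobytes())    # True
```

The second call shows the ``tobytes()`` identity holding for a view whose
items are wider than one byte.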

The main advantage this method offers over the ``map(bytes.byte, data)`` 
approach is that it is guaranteed *not* to fail midstream with a 
``ValueError`` or ``TypeError``. By contrast, when using the ``map`` 
based approach, the type and value of the individual items in the 
iterable are only checked as they are retrieved and passed through the 
``bytes.byte`` constructor.
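
The midstream failure mode is easy to reproduce with today's API. In this
sketch, ``to_byte`` is a hypothetical stand-in for ``bytes.byte`` spelled
as ``bytes([value])``, and the input list is made up for illustration.

```python
def to_byte(value):
    """Stand-in for the proposed ``bytes.byte``, using the current API."""
    return bytes([value])


produced = []
try:
    # The invalid item (300) is only detected when map() reaches it.
    for item in map(to_byte, [65, 66, 300, 67]):
        produced.append(item)
except ValueError as exc:
    print("failed after", produced, "with:", exc)
```

Two items have already been yielded by the time the error is raised, so
any consumer has to be prepared for a partially processed stream.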

Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

     >>> b'\x00' * 3
     b'\x00\x00\x00'
     >>> bytearray(b'\x00') * 3
     bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was 
originally designed, and the decision was made to add explicit support 
for it in the type constructor. The immutable ``bytes`` type then 
inherited that feature when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing 
the spelling, because users sometimes find the current behaviour of the 
binary sequence constructors surprising. In particular, there's a reasonable 
case to be made that ``bytes(x)`` (where ``x`` is an integer) should 
behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both 
behaviours as separate class methods avoids that ambiguity.
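
The ambiguity is visible with the existing constructors. The short example
below uses only current behaviour; the variable names are illustrative.

```python
x = 3
print(bytes(x))      # b'\x00\x00\x00'  (x zero bytes)
print(bytes([x]))    # b'\x03'          (one byte with value x)

# Indexing a bytes object yields an integer, so this creates
# 97 zero bytes rather than b'a'.
first = b"abc"[0]
print(len(bytes(first)))    # 97
```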

References
==========

.. [1] Initial March 2014 discussion thread on python-ideas
    (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
    (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
    (http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
    (http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
    (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)

Copyright
=========

This document has been placed in the public domain.