Title: Minor API improvements for binary sequences
Author: Nick Coghlan <ncoghlan at gmail.com>
Post-History: 2014-03-30, 2014-08-15, 2014-08-16
During the initial development of the Python 3 language specification, the core bytes type for arbitrary binary data started as the mutable type that is now referred to as bytearray. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series.
This PEP proposes four small adjustments to the APIs of the bytes, bytearray and memoryview types to make it easier to operate entirely in the binary domain:
- Deprecate passing single integer values to bytes and bytearray
- Add bytes.zeros and bytearray.zeros alternative constructors
- Add bytes.byte and bytearray.byte alternative constructors
- Add bytes.iterbytes, bytearray.iterbytes and memoryview.iterbytes alternative iterators
Currently, the bytes and bytearray constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size:
    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.5, and remove it entirely in Python 3.6.
No other changes are proposed to the existing constructors.
To replace the deprecated behaviour, this PEP proposes the addition of an explicit zeros alternative constructor as a class method on both bytes and bytearray:
    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')
It will behave just as the current constructors behave when passed a single integer.
The specific choice of zeros as the alternative constructor name is taken from the corresponding initialisation function in NumPy (although, as these are 1-dimensional sequence types rather than N-dimensional matrices, the constructors take a length as input rather than a shape tuple).
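Since bytes.zeros and bytearray.zeros are only proposed here, the intended behaviour can be approximated with a small helper. This is a minimal sketch: the module-level function name zeros is hypothetical, and it assumes the proposed methods would accept any non-negative integer just as the existing constructors do.

```python
import operator


def zeros(cls, count):
    """Hypothetical stand-in for the proposed ``cls.zeros(count)``."""
    # operator.index raises TypeError for floats and other non-integers.
    count = operator.index(count)
    if count < 0:
        raise ValueError("negative count")
    # Sequence repetition avoids relying on the integer-argument constructor.
    return cls(b"\x00") * count


print(zeros(bytes, 3))      # b'\x00\x00\x00'
print(zeros(bytearray, 3))  # bytearray(b'\x00\x00\x00')
```

The helper takes the target type as its first argument so that one function covers both bytes and bytearray, which is roughly what a class method would provide.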
As binary counterparts to the text chr function, this PEP proposes the addition of an explicit byte alternative constructor as a class method on both bytes and bytearray:
    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')
These methods will only accept integers in the range 0 to 255 (inclusive):
    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)
    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer
The documentation of the ord builtin will be updated to explicitly note that bytes.byte is the inverse operation for binary data, while chr is the inverse operation for text data.
Behaviourally, bytes.byte(x) will be equivalent to the current bytes([x]) (and similarly for bytearray). The new spelling is expected to be easier to discover and easier to read (especially when used in conjunction with indexing operations on binary sequence types).
As a separate method, the new spelling will also work better with higher order functions like map.
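The relationship with ord and the interaction with map can be illustrated with existing Python. This is only a sketch: the helper name byte is hypothetical and stands in for the proposed bytes.byte, implemented via the equivalent bytes([x]) spelling described above.

```python
def byte(value):
    """Hypothetical stand-in for the proposed ``bytes.byte(value)``."""
    return bytes([value])


# Round trip with ord, mirroring the chr/ord pairing for text.
assert ord(byte(65)) == 65
assert ord(chr(65)) == 65

# A named callable composes directly with map; no lambda is needed.
chunks = list(map(byte, [72, 105]))
print(chunks)  # [b'H', b'i']
```

Without a named callable, the same map call would need a wrapper such as lambda x: bytes([x]).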
This PEP proposes that bytes, bytearray and memoryview gain an optimised iterbytes method that produces length 1 bytes objects rather than integers:
    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
The method can be used with arbitrary buffer exporting objects by wrapping them in a memoryview instance first:
    for x in memoryview(data).iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
For memoryview, the semantics of iterbytes() are defined such that:
    memview.tobytes() == b''.join(memview.iterbytes())
This allows the raw bytes of the memory view to be iterated over without needing to make a copy, regardless of the defined shape and format.
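The memoryview semantics can be approximated today with a generator. This is a sketch under stated assumptions: the function name iterbytes is a hypothetical stand-in for the proposed method, it relies on memoryview.cast (which requires a C-contiguous buffer), and it is not the optimised implementation the PEP has in mind.

```python
from array import array


def iterbytes(data):
    """Hypothetical stand-in for the proposed ``iterbytes`` method."""
    # Casting to unsigned bytes flattens the view, whatever its
    # original format and shape, without copying the buffer.
    raw = memoryview(data).cast("B")
    for i in range(len(raw)):
        # Converting a one-byte slice copies only that single byte.
        yield bytes(raw[i:i + 1])


print(list(iterbytes(b"ab")))  # [b'a', b'b']

# The defining property holds for multi-byte item formats as well.
values = array("H", [1, 2])
assert b"".join(iterbytes(values)) == memoryview(values).tobytes()
```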
The main advantage this method offers over the map(bytes.byte, data) approach is that it is guaranteed not to fail midstream with a ValueError or TypeError. By contrast, when using the map-based approach, the type and value of the individual items in the iterable are only checked as they are retrieved and passed through the bytes.byte constructor.
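The midstream failure mode of the map-based approach can be reproduced with existing Python. In this sketch the helper name byte is hypothetical and stands in for the proposed bytes.byte; the input list is made up for illustration.

```python
def byte(value):
    """Hypothetical stand-in for the proposed ``bytes.byte(value)``."""
    return bytes([value])


items = [65, 66, 300]  # the last value is outside range(0, 256)
converted = []
try:
    for chunk in map(byte, items):
        converted.append(chunk)
except ValueError as exc:
    # The first two items were already processed before the error surfaced.
    print("failed after", converted, "->", exc)
```

Because map is lazy, the invalid item is only detected once iteration reaches it, after earlier items have already been consumed.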
Zero-initialised sequences can be created via sequence repetition:
    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')
However, this was also the case when the bytearray type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable bytes type then inherited that feature when it was introduced in PEP 3137.
This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that bytes(x) (where x is an integer) should behave like the bytes.byte(x) proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.
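The ambiguity is easiest to see when an indexed item is passed back to the constructor. The following sketch uses only the existing constructor behaviour described earlier in this PEP; the sample data is made up for illustration.

```python
# An integer argument is interpreted as a length...
assert bytes(3) == b"\x00\x00\x00"
# ...while a one-element iterable of integers is interpreted as contents.
assert bytes([3]) == b"\x03"

# Indexing a binary sequence yields an integer, so passing the result
# back to the constructor creates that many zero bytes rather than
# reproducing the original byte.
data = b"\x02\x03"
assert bytes(data[0]) == b"\x00\x00"
# Slicing keeps the value in the binary domain.
assert bytes(data[0:1]) == b"\x02"
```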
- Initial March 2014 discussion thread on python-ideas (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
- Guido's initial feedback in that thread (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
- Issue proposing moving zero-initialised sequences to a dedicated API (http://bugs.python.org/issue20895)
- Issue proposing to use calloc() for zero-initialised binary sequences (http://bugs.python.org/issue21644)
- August 2014 discussion thread on python-dev (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
This document has been placed in the public domain.