PEP 467 -- Minor API improvements for binary sequences

PEP: 467
Title: Minor API improvements for binary sequences
Author: Nick Coghlan <ncoghlan at gmail.com>, Ethan Furman <ethan at stoneleaf.us>
Status: Draft
Type: Standards Track
Created: 2014-03-30
Python-Version: 3.8
Post-History: 2014-03-30, 2014-08-15, 2014-08-16, 2016-06-07, 2016-09-01

Abstract

During the initial development of the Python 3 language specification, the core bytes type for arbitrary binary data started as the mutable type that is now referred to as bytearray. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series.

This PEP proposes five small adjustments to the APIs of the bytes and bytearray types to make it easier to operate entirely in the binary domain:

  • Deprecate passing single integer values to bytes and bytearray
  • Add bytes.fromsize and bytearray.fromsize alternative constructors
  • Add bytes.fromord and bytearray.fromord alternative constructors
  • Add bytes.getbyte and bytearray.getbyte byte retrieval methods
  • Add bytes.iterbytes and bytearray.iterbytes alternative iterators

It also proposes one new built-in function:

  • bchr

Proposals

Deprecation of current "zero-initialised sequence" behaviour without removal

Currently, the bytes and bytearray constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size:

>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.

No other changes are proposed to the existing constructors.

Addition of explicit "count and byte initialised sequence" constructors

To replace the deprecated behaviour, this PEP proposes the addition of an explicit fromsize alternative constructor as a class method on both bytes and bytearray whose first argument is the count, and whose second argument is the fill byte to use (defaults to \x00):

>>> bytes.fromsize(3)
b'\x00\x00\x00'
>>> bytearray.fromsize(3)
bytearray(b'\x00\x00\x00')
>>> bytes.fromsize(5, b'\x0a')
b'\n\n\n\n\n'
>>> bytearray.fromsize(5, b'\x0a')
bytearray(b'\n\n\n\n\n')

fromsize will behave just as the current constructors behave when passed a single integer, while allowing for non-zero fill values when needed.
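
Since fromsize does not exist yet, the intended behaviour can be approximated today with sequence repetition. The following is only a sketch: the free function fromsize and its argument checks are assumptions made for illustration, not the proposed implementation.

```python
# Hypothetical stand-in for the proposed bytes.fromsize / bytearray.fromsize.
# The helper name, signature, and error messages are illustrative assumptions.
def fromsize(cls, count, fill=b"\x00"):
    """Return an instance of cls holding `count` copies of the byte `fill`."""
    if count < 0:
        # Mirrors the existing constructors, which reject negative sizes.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill value must be exactly one byte long.
        raise ValueError("fill must be a single byte")
    return cls(fill * count)


zeros = fromsize(bytes, 3)                   # same result as bytes(3)
newlines = fromsize(bytearray, 5, b"\x0a")   # non-zero fill value
print(zeros)
print(newlines)
```

Taking the target type as the first argument lets one helper stand in for both proposed class methods.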

Addition of "bchr" function and explicit "single byte" constructors

As binary counterparts to the text chr function, this PEP proposes the addition of a bchr function and an explicit fromord alternative constructor as a class method on both bytes and bytearray:

>>> bchr(ord("A"))
b'A'
>>> bchr(ord(b"A"))
b'A'
>>> bytes.fromord(65)
b'A'
>>> bytearray.fromord(65)
bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive):

>>> bytes.fromord(512)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: integer must be in range(0, 256)

>>> bytes.fromord(1.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer

While this does create some duplication, there are valid reasons for it:

  • the bchr builtin recreates the ord/chr/unichr trio from Python 2 under a different naming scheme
  • the class method is mainly for the bytearray.fromord case, with bytes.fromord added for consistency

The documentation of the ord builtin will be updated to explicitly note that bchr is the primary inverse operation for binary data, while chr is the inverse operation for text data, and that bytes.fromord and bytearray.fromord also exist.

Behaviourally, bytes.fromord(x) will be equivalent to the current bytes([x]) (and similarly for bytearray). The new spelling is expected to be easier to discover and easier to read (especially when used in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher order functions like map.
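
As a rough illustration of the equivalence described above, the proposed callables can be emulated with the existing list-based spelling. The module-level names bchr and bytearray_fromord below are stand-ins assumed for this sketch; the real proposal adds a builtin and class methods.

```python
# Hypothetical emulation of the proposed bchr builtin and fromord constructors.
def bchr(i):
    # bytes([i]) already rejects values outside range(0, 256) with ValueError
    # and non-integers such as 1.0 with TypeError.
    return bytes([i])


def bytearray_fromord(i):
    # Stand-in for the proposed bytearray.fromord class method.
    return bytearray([i])


print(bchr(ord("A")))
print(bytearray_fromord(65))

# A named callable composes directly with higher order functions such as map.
message = b"".join(map(bchr, [72, 105]))
print(message)
```

Without a named callable, the map call would need a lambda wrapping bytes([x]), which is the readability cost the new spelling is meant to remove.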

Addition of "getbyte" method to retrieve a single byte

This PEP proposes that bytes and bytearray gain a getbyte method that always returns a length 1 bytes object:

>>> b'abc'.getbyte(0)
b'a'

If the requested index is out of range, IndexError is raised:

>>> b'abc'.getbyte(9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index out of range
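
For comparison with what is available today, the sketch below contrasts plain indexing and slicing with a helper that approximates getbyte. The free function getbyte is an assumption for illustration; the proposal itself is a method on the types.

```python
import operator


# Hypothetical stand-in for the proposed getbyte method.
def getbyte(data, index):
    # operator.index rejects slices and floats, so only true integers pass.
    i = operator.index(index)
    # data[i] raises IndexError for a missing position, unlike slicing.
    return bytes([data[i]])


data = b"abc"
as_int = data[0]          # indexing yields the integer 97
as_slice = data[9:10]     # slicing past the end silently yields b''
as_byte = getbyte(data, 0)
print(as_int, as_slice, as_byte)
```

Slicing with data[i:i+1] is the usual workaround, but it hides out-of-range indexes instead of raising, which is why the helper indexes first.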

Addition of optimised iterator methods that produce bytes objects

This PEP proposes that bytes and bytearray gain an optimised iterbytes method that produces length 1 bytes objects rather than integers:

for x in data.iterbytes():
    # x is a length 1 bytes object, rather than an integer
    ...

For example:

>>> tuple(b"ABC".iterbytes())
(b'A', b'B', b'C')
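
A pure-Python approximation of this iterator is shown below as a sketch. The generator function iterbytes is an assumed stand-in; the proposed method would be implemented in the types themselves and is expected to be faster.

```python
# Hypothetical stand-in for the proposed iterbytes method.
def iterbytes(data):
    for i in range(len(data)):
        # Slicing keeps the result a byte sequence; bytes() normalises
        # bytearray slices so every item is an immutable bytes object.
        yield bytes(data[i:i + 1])


plain = list(b"ABC")                # default iteration yields integers
chunks = tuple(iterbytes(b"ABC"))   # the sketch yields length 1 bytes objects
print(plain)
print(chunks)
```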

Design discussion

Why not rely on sequence repetition to create zero-initialised sequences?

Zero-initialised sequences can be created via sequence repetition:

>>> b'\x00' * 3
b'\x00\x00\x00'
>>> bytearray(b'\x00') * 3
bytearray(b'\x00\x00\x00')

However, this was also the case when the bytearray type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable bytes type then inherited that feature when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that bytes(x) (where x is an integer) should behave like the bytes.fromord(x) proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.
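
The kind of surprise referred to here can be reproduced with current Python. The snippet uses only existing behaviour; the variable names are illustrative.

```python
data = b"abc"

first = data[0]             # indexing a bytes object yields an int
surprise = bytes(first)     # interpreted as a size: 97 zero bytes
single = bytes([first])     # the current spelling for "one byte with this value"

print(first)
print(len(surprise))
print(single)
```

A reader who expects bytes(data[0]) to round-trip to b'a' instead gets a zero-filled sequence whose length is the byte's value.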

Open Questions

Do we add iterbytes to memoryview, or modify memoryview.cast() to accept 's' as a single-byte interpretation? Or do we ignore memoryview for now and add it later?

References

[1] Initial March 2014 discussion thread on python-ideas (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
[2] Guido's initial feedback in that thread (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
[3] Issue proposing moving zero-initialised sequences to a dedicated API (http://bugs.python.org/issue20895)
[4] Issue proposing to use calloc() for zero-initialised binary sequences (http://bugs.python.org/issue21644)
[5] August 2014 discussion thread on python-dev (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
[6] June 2016 discussion thread on python-dev (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)
Source: https://github.com/python/peps/blob/master/pep-0467.txt