[Python-checkins] peps: PEP 467: descope dramatically based on Guido's feedback
nick.coghlan
python-checkins at python.org
Thu Apr 3 14:33:47 CEST 2014
http://hg.python.org/peps/rev/435fa0278b73
changeset: 5452:435fa0278b73
user: Nick Coghlan <ncoghlan at gmail.com>
date: Thu Apr 03 22:33:36 2014 +1000
summary:
PEP 467: descope dramatically based on Guido's feedback
files:
pep-0467.txt | 303 ++++++++++++--------------------------
1 files changed, 95 insertions(+), 208 deletions(-)
diff --git a/pep-0467.txt b/pep-0467.txt
--- a/pep-0467.txt
+++ b/pep-0467.txt
@@ -22,28 +22,35 @@
This PEP proposes a number of small adjustments to the APIs of the ``bytes``
and ``bytearray`` types to make their behaviour more internally consistent
-and to make it easier to operate entirely in the binary domain.
+and to make it easier to operate entirely in the binary domain, as well as
+changes to their documentation to make it easier to grasp their dual roles
+as containers of "arbitrary binary data" and "binary data with ASCII
+compatible segments".
Background
==========
-Over the course of Python 3's evolution, a number of adjustments have been
-made to the core ``bytes`` and ``bytearray`` types as additional practical
-experience was gained with using them in code beyond the Python 3 standard
-library and test suite. However, to date, these changes have been made
-on a relatively ad hoc tactical basis as specific issues were identified,
-rather than as part of a systematic review of the APIs of these types. This
-approach has allowed inconsistencies to creep into the API design as to which
-input types are accepted by different methods. Additional inconsistencies
-linger from an earlier pre-release design where there was *no* separate
+To simplify the task of writing the Python 3 documentation, the ``bytes``
+and ``bytearray`` types were documented primarily in terms of the way they
+differed from the Unicode based Python 3 ``str`` type. Even when I
+`heavily revised the sequence documentation
+<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
+simplifying shortcut.
+
+However, it turns out that this approach to the documentation of these types
+has a problem: it doesn't adequately introduce users to their hybrid nature,
+where they can be manipulated *either* as a "sequence of integers" type,
+*or* as ``str``-like types that assume ASCII compatible data.
+
+In addition to the documentation issues, there are some lingering design
+quirks from an earlier pre-release design where there was *no* separate
``bytearray`` type, and instead the core ``bytes`` type was mutable (with
-no immutable counterpart), as well as from the origins of these types in
-the text-like behaviour of the Python 2 ``str`` type.
+no immutable counterpart).
-This PEP aims to provide the missing systematic review, with the goal of
-ensuring that wherever feasible (given backwards compatibility constraints)
-these current inconsistencies are addressed for the Python 3.5 release.
+Finally, additional experience with using the existing Python 3 binary
+sequence types in real world applications has suggested it would be
+beneficial to make it easier to convert integers to length 1 bytes objects.
Proposals
@@ -55,10 +62,13 @@
factors:
* removing remnants of the original design of ``bytes`` as a mutable type
-* more consistently accepting length 1 ``bytes`` objects as input where an
- integer between ``0`` and ``255`` inclusive is expected, and vice-versa
-* allowing users to easily convert integer output to a length 1 ``bytes``
+* allowing users to easily convert integer values to a length 1 ``bytes``
object
+* consistently applying the following analogies to the type API designs
+ and documentation:
+
+ * ``bytes``: tuple of integers, with additional str-like methods
+ * ``bytearray``: list of integers, with additional str-like methods
Alternate Constructors
@@ -83,95 +93,69 @@
b'\x00\x00\x00'
This PEP proposes that the current handling of integers in the bytes and
-bytearray constructors by deprecated in Python 3.5 and removed in Python
-3.6, being replaced by two more type appropriate alternate constructors
-provided as class methods. The initial python-ideas thread [ideas-thread1]_
-that spawned this PEP was specifically aimed at deprecating this constructor
-behaviour.
+bytearray constructors be deprecated in Python 3.5 and targeted for
+removal in Python 3.7, being replaced by two more explicit alternate
+constructors provided as class methods. The initial python-ideas thread
+[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
+this constructor behaviour.
-For ``bytes``, a ``byte`` constructor is proposed that converts integers
-(as indicated by ``operator.index``) in the appropriate range to a ``bytes``
-object, converts objects that support the buffer API to bytes, and also
-passes through length 1 byte strings unchanged::
+Firstly, a ``byte`` constructor is proposed that converts integers
+in the range 0 to 255 (inclusive) to a ``bytes`` object::
>>> bytes.byte(3)
b'\x03'
- >>> bytes.byte(bytearray(bytes([3])))
- b'\x03'
- >>> bytes.byte(memoryview(bytes([3])))
- b'\x03'
- >>> bytes.byte(bytes([3]))
- b'\x03'
+ >>> bytearray.byte(3)
+ bytearray(b'\x03')
>>> bytes.byte(512)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
- >>> bytes.byte(b"ab")
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- TypeError: bytes.byte() expected a byte, but buffer of length 2 found
One specific use case for this alternate constructor is to easily convert
the result of indexing operations on ``bytes`` and other binary sequences
from an integer to a ``bytes`` object. The documentation for this API
should note that its counterpart for the reverse conversion is ``ord()``.
+The ``ord()`` documentation will also be updated to note that while
+``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
+``bytearray.byte`` are the counterparts for binary input.
-For ``bytearray``, a ``from_len`` constructor is proposed that preallocates
-the buffer filled with a particular value (default to ``0``) as a direct
+Secondly, a ``zeros`` constructor is proposed that serves as a direct
replacement for the current constructor behaviour, rather than having to use
sequence repetition to achieve the same effect in a less intuitive way::
- >>> bytearray.from_len(3)
+ >>> bytes.zeros(3)
+ b'\x00\x00\x00'
+ >>> bytearray.zeros(3)
bytearray(b'\x00\x00\x00')
- >>> bytearray.from_len(3, 6)
- bytearray(b'\x06\x06\x06')
-This part of the proposal was covered by an existing issue
-[empty-buffer-issue]_ and a variety of names have been proposed
-(``empty_buffer``, ``zeros``, ``zeroes``, ``allnull``, ``fill``). The
-specific name currently proposed was chosen by analogy with
-``dict.fromkeys()`` and ``itertools.chain.from_iter()`` to be completely
-explicit that it is an alternate constructor rather than an in-place
-mutation, as well as how it differs from the standard constructor.
+The chosen name here is taken from the corresponding initialisation function
+in NumPy (although, as these are sequence types rather than N-dimensional
+matrices, the constructors take a length as input rather than a shape tuple).
-
-Open questions
-^^^^^^^^^^^^^^
-
-* Should ``bytearray.byte()`` also be added? Or is
- ``bytearray(bytes.byte(x))`` sufficient for that case?
-* Should ``bytes.from_len()`` also be added? Or is sequence repetition
- sufficient for that case?
-* Should ``bytearray.from_len()`` use a different name?
-* Should ``bytes.byte()`` raise ``TypeError`` or ``ValueError`` for binary
- sequences with more than one element? The ``TypeError`` currently proposed
- is copied (with slightly improved wording) from the behaviour of ``ord()``
- with sequences containing more than one code point, while ``ValueError``
- would be more consistent with the existing handling of out-of-range
- integer values.
-* ``bytes.byte()`` is defined above as accepting length 1 binary sequences
- as individual bytes, but this is currently inconsistent with the main
- ``bytes`` constructor::
-
- >>> bytes([b"a", b"b", b"c"])
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- TypeError: 'bytes' object cannot be interpreted as an integer
-
- Should the ``bytes`` constructor be changed to accept iterables of length 1
- bytes objects in addition to iterables of integers? If so, should it
- allow a mixture of the two in a single iterable?
+While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
+useful duo amongst the new constructors, ``bytes.zeros`` and
+``bytearray.byte`` are provided in order to maintain API consistency between
+the two types.
Iteration
---------
-Iteration over ``bytes`` objects and other binary sequences produces
-integers. Rather than proposing a new method that would need to be added
-not only to ``bytes``, ``bytearray`` and ``memoryview``, but potentially
-to third party types as well, this PEP proposes that iteration to produce
-length 1 ``bytes`` objects instead be handled by combining ``map`` with
-the new ``bytes.byte()`` alternate constructor proposed above::
+While iteration over ``bytes`` objects and other binary sequences produces
+integers, it is sometimes desirable to iterate over length 1 bytes objects
+instead.
+
+To handle this situation more obviously (and more efficiently) than would be
+the case with the ``map(bytes.byte, data)`` construct enabled by the above
+constructor changes, this PEP proposes the addition of a new ``iterbytes``
+method to ``bytes``, ``bytearray`` and ``memoryview``::
+
+ for x in data.iterbytes():
+ # x is a length 1 ``bytes`` object, rather than an integer
+
+Third party types and arbitrary containers of integers that lack the new
+method can still be handled by combining ``map`` with the new
+``bytes.byte()`` alternate constructor proposed above::
for x in map(bytes.byte, data):
# x is a length 1 ``bytes`` object, rather than an integer
@@ -179,139 +163,42 @@
# 0 to 255 inclusive
-Consistent support for different input types
---------------------------------------------
+Open questions
+^^^^^^^^^^^^^^
-The ``bytes`` and ``bytearray`` methods inspired by the Python 2 ``str``
-type generally expect to operate on binary subsequences: other objects
-implementing the buffer API. By contrast, the mutating APIs added to
-the ``bytearray`` interface expect to operate on individual elements:
-integer in the range 0 to 255 (inclusive).
+* The fallback case above suggests that this could perhaps be better handled
+ as an ``iterbytes(data)`` *builtin*, that used ``data.__iterbytes__()``
+ if defined, but otherwise fell back to ``map(bytes.byte, data)``::
-In Python 3.3, the binary search operations (``in``, ``count()``,
-``find()``, ``index()``, ``rfind()`` and ``rindex()``) were updated to
-accept integers in the range 0 to 255 (inclusive) as their first argument,
-in addition to the existing support for binary subsequences.
+ for x in iterbytes(data):
+ # x is a length 1 ``bytes`` object, rather than an integer
+ # This works with *any* container of integers in the range
+ # 0 to 255 inclusive
-This results in behaviour like the following in Python 3.3+::
- >>> data = bytes([1, 2, 3, 4])
- >>> 3 in data
- True
- >>> b"\x03" in data
- True
- >>> data.count(3)
- 1
- >>> data.count(b"\x03")
- 1
+Documentation clarifications
+----------------------------
- >>> data.replace(3, 4)
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- TypeError: expected bytes, bytearray or buffer compatible object
- >>> data.replace(b"\x03", b"\x04")
- b'\x01\x02\x04\x04'
+In an attempt to clarify the `documentation
+<https://docs.python.org/dev/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__
+of the ``bytes`` and ``bytearray`` types, the following changes are
+proposed:
- >>> mutable = bytearray(data)
- >>> mutable
- bytearray(b'\x01\x02\x03\x04')
- >>> mutable.append(b"\x05")
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- TypeError: an integer is required
- >>> mutable.append(5)
- >>> mutable
- bytearray(b'\x01\x02\x03\x04\x05')
+* the documentation of the *sequence* behaviour of each type is moved to
+ the section for that individual type. These sections will be updated to
+ explicitly make the ``tuple of integers`` and ``list of integers``
+ analogies, as well as to make it clear that these parts of the API work
+ with arbitrary binary data
+* the current "Bytes and bytearray operations" section will be updated to
+ "Handling binary data with ASCII compatible segments", and will explicitly
+ list *all* of the methods that are included.
+* clarify that due to their origins in the API of the immutable ``str``
+ type, even the ``bytearray`` versions of these methods do *not* operate
+ in place, but instead create a new object.
-
-This PEP proposes extending the behaviour of accepting integers as being
-equivalent to the corresponding length 1 binary sequence to several other
-``bytes`` and ``bytearray`` methods that currently expect a ``bytes``
-object for certain parameters. In essence, if a value is an acceptable
-input to the new ``bytes.byte`` constructor defined above, then it would
-be acceptable in the roles defined here (in addition to any other already
-supported inputs):
-
-* ``startswith()`` prefix(es)
-* ``endswith()`` suffix(es)
-
-* ``center()`` fill character
-* ``ljust()`` fill character
-* ``rjust()`` fill character
-
-* ``strip()`` character to strip
-* ``lstrip()`` character to strip
-* ``rstrip()`` character to strip
-
-* ``partition()`` separator argument
-* ``rpartition()`` separator argument
-
-* ``split()`` separator argument
-* ``rsplit()`` separator argument
-
-* ``replace()`` old value and new value
-
-In addition to the consistency motive, this approach also makes it easier
-to work with the indexing behaviour , as the result of an indexing operation
-can more easily be fed back in to other methods.
-
-For ``bytearray``, some additional changes are proposed to the current
-integer based operations to ensure they remain consistent with the proposed
-constructor changes::
-
-* ``append()``: updated to be consistent with ``bytes.byte()``
-* ``remove()``: updated to be consistent with ``bytes.byte()``
-* ``+=``: updated to be consistent with ``bytes()`` changes (if any)
-* ``extend()``: updated to be consistent with ``bytes()`` changes (if any)
-
-The general principle behind these changes is to restore the flexible
-"element-or-subsequence" behaviour seen in the ``str`` API, even though
-Python 3 actually represents subsequences and individual elements as
-distinct types in the binary domain.
-
-
-Acknowledgement of surprising behaviour of some ``bytearray`` methods
----------------------------------------------------------------------
-
-Several of the ``bytes`` and ``bytearray`` methods have their origins in the
-Python 2 ``str`` API. As ``str`` is an immutable type, all of these
-operations are defined as returning a *new* instance, rather than operating
-in place. This contrasts with methods on other mutable types like ``list``,
-where ``list.sort()`` and ``list.reverse()`` operate in-place and return
-``None``, rather than creating a new object.
-
-Backwards compatibility constraints make it impractical to change this
-behaviour at this point, but it may be appropriate to explicitly call out
-this quirk in the documentation for the ``bytearray`` type. It affects the
-following methods that could reasonably be expected to operate in-place on
-a mutable type:
-
-* ``center()``
-* ``ljust()``
-* ``rjust()``
-* ``strip()``
-* ``lstrip()``
-* ``rstrip()``
-* ``replace()``
-* ``lower()``
-* ``upper()``
-* ``swapcase()``
-* ``title()``
-* ``capitalize()``
-* ``translate()``
-* ``expandtabs()``
-* ``zfill()``
-
-Note that the following ``bytearray`` operations *do* operate in place, as
-they're part of the mutable sequence API in ``bytearray``, rather than being
-inspired by the immutable Python 2 ``str`` API:
-
-* ``+=``
-* ``append()``
-* ``extend()``
-* ``reverse()``
-* ``remove()``
-* ``pop()``
+A patch for at least this part of the proposal will be prepared before
+submitting the PEP for approval, as writing out these docs completely may
+suggest additional opportunities for API consistency improvements.
References
--
Repository URL: http://hg.python.org/peps
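
Editorial sketch (not part of the changeset): the ``byte`` and ``zeros``
constructors and the ``iterbytes`` fallback described in the diff above can
be approximated on an existing Python 3 interpreter with plain functions.
The helper names ``byte``, ``zeros`` and ``iterbytes`` below, and the ``cls``
parameter, are hypothetical stand-ins chosen for illustration; they are not
the class methods the PEP proposes, and the real implementation may differ.

```python
def byte(value, cls=bytes):
    """Rough stand-in for the proposed bytes.byte / bytearray.byte.

    Wrapping the integer in a list makes the existing constructor treat it
    as an element rather than a length; values outside range(0, 256)
    raise ValueError, matching the behaviour shown in the PEP text.
    """
    return cls([value])


def zeros(length, cls=bytes):
    """Rough stand-in for the proposed bytes.zeros / bytearray.zeros.

    Uses sequence repetition, so it does not rely on the integer
    constructor behaviour that the PEP proposes to deprecate.
    """
    return cls(b"\x00") * length


def iterbytes(data):
    """Yield length 1 bytes objects from any iterable of ints in 0..255.

    This is the map-based fallback from the PEP, not the proposed method.
    """
    return map(byte, data)


if __name__ == "__main__":
    print(byte(3))                 # length 1 bytes object
    print(zeros(3, bytearray))     # zero-filled bytearray of length 3
    print(list(iterbytes(b"abc"))) # list of length 1 bytes objects
```

Keeping the target type as a parameter mirrors the PEP's decision to offer
both constructors on both types for the sake of API consistency.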