[Python-Dev] PEP 467: last round (?)

Fri Sep 2 13:54:06 EDT 2016

Some quick comments below, a few more later:

On Thu, Sep 1, 2016 at 10:36 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> One more iteration. PEPs repo not updated yet. Changes are renaming of
> methods to be ``fromsize()`` and ``fromord()``, and moving ``memoryview``
> to an Open Questions section.
>
>
> PEP: 467
> Title: Minor API improvements for binary sequences
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan <ncoghlan at gmail.com>, Ethan Furman <ethan at stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-03-30
> Python-Version: 3.6
> Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01
>
>
> Abstract
> ========
>
> During the initial development of the Python 3 language specification, the
> core ``bytes`` type for arbitrary binary data started as the mutable type
> that is now referred to as ``bytearray``. Other aspects of operating in
> the binary domain in Python have also evolved over the course of the Python
> 3 series.
>
> This PEP proposes five small adjustments to the APIs of the ``bytes`` and
> ``bytearray`` types to make it easier to operate entirely in the binary
> domain:
>
> * Deprecate passing single integer values to ``bytes`` and ``bytearray``
> * Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
> * Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
> * Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators

I wonder if from_something with an underscore is more consistent (according
to a quick search perhaps yes).

What about bytes.getchar and iterchars? A 'byte' in Python 3 seems to be an
integer. (I would still like a .chars property that gives a sequence view
with __getitem__ and __len__, so that the getchar and iterchars methods are
not needed.)
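
To make the .chars idea concrete, here is a rough sketch of such a view as a
standalone class. The name CharsView is hypothetical, and nothing like it is
part of the PEP draft or of bytes today; it only assumes that indexing should
yield length-1 bytes objects.

```python
from collections.abc import Sequence


class CharsView(Sequence):
    """Hypothetical read-only view yielding length-1 bytes objects."""

    def __init__(self, data):
        self._data = data  # bytes or bytearray; only a reference is kept

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing the view gives another view over the sliced data.
            return CharsView(self._data[index])
        # Normalise negative indexes and raise IndexError when out of range.
        i = range(len(self._data))[index]
        return bytes(self._data[i:i + 1])


chars = CharsView(b"ABC")
print(chars[0], chars[-1], len(chars), list(chars))
```

Iteration comes for free from the Sequence mixin, so a separate iterchars
method would not be needed under this design.
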

chrb seems more in line than bchr with how some bytes variants are named
elsewhere, for instance in os (os.getcwdb, os.environb).

Do we really need chrb? Better to introduce from_int or from_ord also in
str and recommend that over chr?

-- Koos (mobile)

>
> Proposals
> =========
>
> Deprecation of current "zero-initialised sequence" behaviour without removal
> ----------------------------------------------------------------------------
>
> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
>
>     >>> bytes(3)
>     b'\x00\x00\x00'
>     >>> bytearray(3)
>     bytearray(b'\x00\x00\x00')
>
> This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> it in place for at least as long as Python 2.7 is supported, possibly
> indefinitely.
>
> No other changes are proposed to the existing constructors.
>
>
> Addition of explicit "count and byte initialised sequence" constructors
> -----------------------------------------------------------------------
>
> To replace the deprecated behaviour, this PEP proposes the addition of an
> explicit ``fromsize`` alternative constructor as a class method on both
> ``bytes`` and ``bytearray`` whose first argument is the count, and whose
> second argument is the fill byte to use (defaults to ``\x00``)::
>
>     >>> bytes.fromsize(3)
>     b'\x00\x00\x00'
>     >>> bytearray.fromsize(3)
>     bytearray(b'\x00\x00\x00')
>     >>> bytes.fromsize(5, b'\x0a')
>     b'\x0a\x0a\x0a\x0a\x0a'
>     >>> bytearray.fromsize(5, b'\x0a')
>     bytearray(b'\x0a\x0a\x0a\x0a\x0a')
>
> ``fromsize`` will behave just as the current constructors behave when passed
> a single integer, while allowing for non-zero fill values when needed.
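
For comparison, a minimal sketch of how fromsize could be emulated with
today's API. The free function and its error messages are my assumptions, not
something the draft specifies; in particular the draft does not say what
happens when the fill is not exactly one byte.

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for the proposed ``cls.fromsize(count, fill)``."""
    if count < 0:
        # Mirrors bytes(-1), which also raises ValueError.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill must be a single byte.
        raise ValueError("fill must be a single byte")
    # Sequence repetition builds the filled result for either type.
    return cls(fill) * count


print(fromsize(bytes, 3))
print(fromsize(bytearray, 5, b"\x0a"))
```
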
>
>
> Addition of "bchr" function and explicit "single byte" constructors
> -------------------------------------------------------------------
>
> As binary counterparts to the text ``chr`` function, this PEP proposes
> the addition of a ``bchr`` function and an explicit ``fromord`` alternative
> constructor as a class method on both ``bytes`` and ``bytearray``::
>
>     >>> bchr(ord("A"))
>     b'A'
>     >>> bchr(ord(b"A"))
>     b'A'
>     >>> bytes.fromord(65)
>     b'A'
>     >>> bytearray.fromord(65)
>     bytearray(b'A')
>
> These methods will only accept integers in the range 0 to 255 (inclusive)::
>
>     >>> bytes.fromord(512)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: integer must be in range(0, 256)
>
>     >>> bytes.fromord(1.0)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: 'float' object cannot be interpreted as an integer
>
> While this does create some duplication, there are valid reasons for it:
>
> * the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python
>   2 under a different naming scheme
> * the class method is mainly for the ``bytearray.fromord`` case, with
>   ``bytes.fromord`` added for consistency
>
> The documentation of the ``ord`` builtin will be updated to explicitly note
> that ``bchr`` is the primary inverse operation for binary data, while
> ``chr`` is the inverse operation for text data, and that ``bytes.fromord``
> and ``bytearray.fromord`` also exist.
>
> Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
> ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
> expected to be easier to discover and easier to read (especially when used
> in conjunction with indexing operations on binary sequence types).
>
> As a separate method, the new spelling will also work better with higher
> order functions like ``map``.
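
A small sketch of the equivalence and of the map use case, using a plain
function as a stand-in since fromord does not exist yet. The function name and
the use of functools.partial are mine, purely for illustration.

```python
from functools import partial


def fromord(cls, value):
    """Hypothetical stand-in for the proposed ``cls.fromord(value)``."""
    # Wrapping the integer in a list reuses the existing range and type checks.
    return cls([value])


# A callable can be handed to map directly, unlike the ``bytes([x])`` spelling.
as_bytes = list(map(partial(fromord, bytes), [72, 105]))
print(as_bytes)
```

With the real class method the map call would presumably shrink to
map(bytes.fromord, values).
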
>
>
> Addition of "getbyte" method to retrieve a single byte
> ------------------------------------------------------
>
> This PEP proposes that ``bytes`` and ``bytearray`` gain the method
> ``getbyte`` which will always return ``bytes``::
>
>     >>> b'abc'.getbyte(0)
>     b'a'
>
> If an index is asked for that doesn't exist, ``IndexError`` is raised::
>
>     >>> b'abc'.getbyte(9)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     IndexError: index out of range
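
For reference, a sketch of what getbyte would save people from writing today.
The helper is hypothetical. The point is that the usual slicing workaround
silently returns an empty result for a missing index, whereas the proposed
method raises.

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed ``data.getbyte(index)``."""
    # Plain indexing returns an int and raises IndexError for a missing index.
    value = data[index]
    # Always return bytes, even when data is a bytearray.
    return bytes([value])


print(getbyte(b"abc", 0))
print(b"abc"[9:10])  # the slicing workaround gives b'' instead of an error
```
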
>
>
> Addition of optimised iterator methods that produce ``bytes`` objects
> ---------------------------------------------------------------------
>
> This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
> ``iterbytes`` method that produces length 1 ``bytes`` objects rather than
> integers::
>
>     for x in data.iterbytes():
>         # x is a length 1 ``bytes`` object, rather than an integer
>
> For example::
>
>     >>> tuple(b"ABC".iterbytes())
>     (b'A', b'B', b'C')
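
The same behaviour can be approximated today with a generator over length-1
slices. This is only a sketch with a hypothetical free function; it says
nothing about how the optimised method would actually be implemented.

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    for i in range(len(data)):
        # A length-1 slice keeps the result in the binary domain;
        # bytes() normalises the bytearray case.
        yield bytes(data[i:i + 1])


print(tuple(iterbytes(b"ABC")))
```
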
>
>
> Design discussion
> =================
>
> Why not rely on sequence repetition to create zero-initialised sequences?
> -------------------------------------------------------------------------
>
> Zero-initialised sequences can be created via sequence repetition::
>
>     >>> b'\x00' * 3
>     b'\x00\x00\x00'
>     >>> bytearray(b'\x00') * 3
>     bytearray(b'\x00\x00\x00')
>
> However, this was also the case when the ``bytearray`` type was originally
> designed, and the decision was made to add explicit support for it in the
> type constructor. The immutable ``bytes`` type then inherited that feature
> when it was introduced in PEP 3137.
>
> This PEP isn't revisiting that original design decision, just changing the
> spelling as users sometimes find the current behaviour of the binary
> sequence constructors surprising. In particular, there's a reasonable case
> to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like
> the ``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as
> separate class methods avoids that ambiguity.
>
>
> Open Questions
> ==============
>
> Do we add ``iterbytes`` to ``memoryview``, or modify
> ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or
> do we ignore ``memoryview`` for now and add it later?
>
>
> References
> ==========
>
> .. [1] Initial March 2014 discussion thread on python-ideas
>    (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
> .. [2] Guido's initial feedback in that thread
>    (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
> .. [3] Issue proposing moving zero-initialised sequences to a dedicated API
>    (http://bugs.python.org/issue20895)
> .. [4] Issue proposing to use calloc() for zero-initialised binary sequences
>    (http://bugs.python.org/issue21644)
> .. [5] August 2014 discussion thread on python-dev
>    (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
> .. [6] June 2016 discussion thread on python-dev
>    (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +