[Python-Dev] bytes-like objects

Mon Oct 6 06:34:47 CEST 2014

On 6 October 2014 10:15, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> I wrote:
>
>>> But you can't
>>> create an object that supports the buffer protocol by implementing
>>> Python methods.
>
>
> Another thing is that an object implementing the buffer
> interface doesn't have to look anything at all like a
> bytes object from Python, so calling it "bytes-like"
> could be rather confusing.

"bytes-like object" is already used that way - e.g. memoryview is a
bytes-like object, as is array.array. Even numpy.ndarray, PIL images,
etc can all be accessed as bytes-like objects.
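
As a rough illustration of that point (standard library types only;
the `samples` name is just something made up for this sketch), each of
the following can be wrapped in a memoryview and read back as the same
raw bytes:

```python
import array

# Three standard-library types that export the buffer interface.
samples = [
    b"abc",
    bytearray(b"abc"),
    array.array("B", [97, 98, 99]),  # unsigned bytes holding "a", "b", "c"
]

for obj in samples:
    view = memoryview(obj)
    # tobytes() copies the underlying memory into a real bytes object.
    print(type(obj).__name__, view.tobytes())
```

The array instance looks nothing like a bytes object from Python code,
yet memoryview exposes the same underlying data.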

The term itself isn't new - we've been using it to progressively
eliminate awkward docs references to "objects that support the buffer
protocol" for years. David's post was just to say that the process of
adopting it is now largely complete, as the exception messages that
mentioned the buffer protocol have also been updated.

As near as I can figure out, some of the reasons it appears to
work better than relying on the "buffer" term include:

1. Anyone learning Python 3 will know "bytes", and "A bytes-like
object is one that can be treated like a bytes object by adapting it
with memoryview" is a relatively simple step to take (although we
could likely be more explicit about that in the glossary entry).
2. When reporting errors, it conveys more clearly that passing a bytes
object will work, since the assumption that "bytes instances are
bytes-like objects" is both obvious and correct. It also isn't too hard to
figure out that "str instances are not bytes-like objects". With those
two figured out as category anchors, it becomes easier to start
grouping other types by comparing them with the two archetypal
examples. By contrast, it is not *at all* obvious why bytes supports the
buffer protocol, while str does not.
3. "buffer" is a completely new term for most users, and one that
refers to an implementation detail of memoryview, more so than
something developers actually need to care about. Using it directly in
error messages and documentation makes the abstraction leak in a
way that raises unnecessary barriers to entry.
4. As a term, "buffer" in Python 3 also suffers from meaning something
*different* from what the buffer builtin refers to in Python 2 (the
fact str implemented the buffer protocol in Python 2 doesn't help).
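
The two "category anchors" from point 2 can be checked directly. This
is only a sketch: `is_bytes_like` is a hypothetical helper written for
this example, not something the standard library provides, and it
assumes "memoryview() accepts it" as the working definition.

```python
def is_bytes_like(obj):
    """Hypothetical helper: True if memoryview() accepts obj."""
    try:
        # The with block releases the view again straight away.
        with memoryview(obj):
            return True
    except TypeError:
        # Raised for objects that do not export a buffer, such as str.
        return False


print(is_bytes_like(b"spam"))             # bytes: the obvious inclusion
print(is_bytes_like("spam"))              # str: the obvious exclusion
print(is_bytes_like(bytearray(b"spam")))  # grouped with bytes
```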

In many ways, it's similar to "file-like object" - those vary wildly in
how much of the file API they support, based on underlying technical
details (like whether it's backed by a file descriptor or not, whether
it's a binary or text stream, whether it's seekable, etc). However, by
saying "file-like object", we make the interface easy to learn to use
productively, as there are some obvious immediate inclusions (like the
actual file objects returned by open()) and exclusions (like strings
and bytes objects). Navigating the grey area can then be postponed
until later (and for many users, it will never be necessary to
navigate it at all).

If anything, bytes-like object is *better* defined than file-like
object, since the one thing that is expected to work for *any*
bytes-like object is "raw_data = memoryview(obj)". Everything beyond
that is negotiable (including the rest of the bytes API, and whether
or not the object is immutable).
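
A small sketch of what "negotiable" means in practice, using only
standard library types (`describe` is a hypothetical helper invented
for this example): the memoryview call succeeds for all three objects,
while mutability and the rest of the bytes API differ.

```python
import array


def describe(obj):
    # The one operation expected to work for any bytes-like object.
    raw_data = memoryview(obj)
    return {
        "type": type(obj).__name__,
        "readonly": raw_data.readonly,    # immutability varies by type
        "itemsize": raw_data.itemsize,    # items need not be single bytes
        "has_bytes_api": hasattr(obj, "startswith"),
    }


for obj in (b"abc", bytearray(b"abc"), array.array("i", [1, 2, 3])):
    print(describe(obj))
```

Here bytes reports a read-only view, bytearray a writable one, and the
array has no startswith method at all.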

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia