[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

INADA Naoki songofacandy at gmail.com
Wed Oct 12 00:08:05 EDT 2016


Hi.

Although there was no reply to my previous post on the Python-ideas ML,
I'm now sure that bytes.frombuffer() is worthwhile.

Let me describe why I think it's important.


Background
==========

Since Python 3.4, bytearray has been a good choice for an I/O buffer, thanks to
#19087 [1].
In fact, asyncio often uses bytearray as an I/O buffer.

When a bytearray is used as a read buffer, we can parse received data directly
in the bytearray and then consume it.  For example, reading until '\r\n' is
easier than with io.BytesIO().

Sample code:

    def read_line(buf: bytearray) -> bytes:
        try:
            n = buf.index(b'\r\n')
        except ValueError:
            return b''

        line = bytes(buf)[:n]  # bytearray -> bytes -> bytes
        del buf[:n+2]
        return line


    buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')

    while True:
        line = read_line(buf)
        if not line:
            break
        print(line)

As you can see, a redundant temporary bytes object is created: bytes(buf)
copies the whole buffer before the slice copies the line.
This is not ideal for performance or memory efficiency.

Since code like this typically lives in lower-level code (e.g. asyncio),
performance and efficiency are important.

[1] https://bugs.python.org/issue19087

(Off topic: bytearray is nice as a write buffer too:
written = s.send(buf); del buf[:written])


Memoryview problem
==================

To avoid the redundant copy in `line = bytes(buf)[:n]`, the current solution
is to use memoryview.

The first code I wrote was: `line = bytes(memoryview(buf)[:n])`.

On CPython, it works fine, because reference counting releases the temporary
memoryview immediately.  But `del buf[:n+2]` on the next line may fail
on other Python implementations.  Resizing a bytearray is prohibited while a
memoryview of it is alive.

So the correct code is:

    with memoryview(buf) as m:
        line = bytes(m[:n])
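
To make the failure mode concrete, here is a minimal sketch that keeps a
memoryview alive on purpose, which is roughly what can happen on an
implementation without reference counting.  The variable name
`resize_blocked` is only for illustration.

```python
buf = bytearray(b'foo\r\nbar\r\n')
n = buf.index(b'\r\n')

# While a memoryview is alive, the bytearray is "exported" and
# any attempt to change its size is refused with BufferError.
m = memoryview(buf)
try:
    del buf[:n+2]
    resize_blocked = False
except BufferError:
    resize_blocked = True
finally:
    m.release()

# With the context manager, the view is released before the resize,
# so the same deletion succeeds.
with memoryview(buf) as m:
    line = bytes(m[:n])
del buf[:n+2]
```

After this runs, `line` holds the first line and `buf` holds only the
remaining data.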

The problems with the memoryview approach are:

* Overhead: creating a temporary memoryview, plus __enter__ and __exit__. (see below)

* It isn't the "one obvious way": developers, including me, may forget to
  use the context manager.
  And since it works on CPython, the mistake is hard to spot.


Quick benchmark:

(temporary bytes)
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'
....................
Median +- std dev: 652 ns +- 19 ns

(temporary memoryview without "with")
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'
....................
Median +- std dev: 886 ns +- 26 ns

(temporary memoryview with "with")
$ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
with memoryview(buf) as m:
    bytes(m[:3])
'
....................
Median +- std dev: 1.11 us +- 0.03 us


Proposed solution
=================

I propose adding one more constructor to bytes:

    # when length=-1 (default), use until end of *byteslike*.
    bytes.frombuffer(byteslike, length=-1, offset=0)

With this API,

    with memoryview(buf) as m:
        line = bytes(m[:n])

becomes

    line = bytes.frombuffer(buf, n)
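
For illustration only, the proposed behavior can be approximated in pure
Python today.  The helper `frombuffer` below is a hypothetical stand-in, not
an existing API; it assumes *byteslike* is a one-dimensional buffer of bytes
(such as bytes or bytearray), and that a negative length means "until the end
of the buffer".  It still pays the memoryview overhead, which is what a
built-in constructor would avoid.

```python
def frombuffer(byteslike, length=-1, offset=0) -> bytes:
    """Hypothetical pure-Python stand-in for the proposed bytes.frombuffer()."""
    with memoryview(byteslike) as m:
        # Assumption: a negative length means "until the end of the buffer".
        end = len(m) if length < 0 else offset + length
        return bytes(m[offset:end])


def read_line(buf: bytearray) -> bytes:
    try:
        n = buf.index(b'\r\n')
    except ValueError:
        return b''

    line = frombuffer(buf, n)  # one copy, no temporary bytes of the whole buffer
    del buf[:n+2]              # safe: the memoryview is already released
    return line


buf = bytearray(b'foo\r\nbar\r\nbaz\r\n')
first = read_line(buf)
```

Here `first` is the first line without its terminator, and `buf` keeps the
two remaining lines.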


Thanks,

-- 
INADA Naoki  <songofacandy at gmail.com>

