[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

Nick Coghlan ncoghlan at gmail.com
Wed Oct 12 01:07:49 EDT 2016


I don't think it makes sense to add any more ideas to PEP 467. That
needed to be a PEP because it proposed breaking backwards
compatibility in a couple of areas, and because of the complex history
of Python 3's "bytes-as-tuple-of-ints" and Python 2's "bytes-as-str"
semantics.

Other enhancements to the binary data handling APIs in Python 3 can be
considered on their own merits.

On 12 October 2016 at 14:08, INADA Naoki <songofacandy at gmail.com> wrote:
> Memoryview problem
> =================
>
> To avoid the redundant copy in `line = bytes(buf)[:n]`, the current
> solution is to use memoryview.
>
> The first code I wrote was: `line = bytes(memoryview(buf)[:n])`.
>
> On CPython, it works fine.  But `del buf[:n+2]` on the next line may
> fail on other Python implementations.  Resizing a bytearray is
> prohibited while a memoryview of it is alive.
>
> So the correct code is:
>
> with memoryview(buf) as m:
>     line = bytes(m[:n])
>
> The problems with the memoryview approach are:
>
> * Overhead: creating a temporary memoryview, __enter__, and __exit__. (see below)
>
> * It isn't "one obvious way": developers, including me, may forget to
>   use the context manager.  And since the shorter form works on CPython,
>   the mistake is hard to spot.

To add to the confusion, there's also
https://docs.python.org/3/library/stdtypes.html#memoryview.tobytes
giving:

    line = memoryview(buf)[:n].tobytes()

However, folks *do* need to learn that many mutable data types will
lock themselves against modification while you have a live memory view
on them, so it's important to release views promptly and reliably when
we don't need them any more.
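
As a minimal sketch of that locking behaviour (checked against CPython
only; the variable names are purely illustrative), resizing a bytearray
fails while a view on it is live and succeeds once the view has been
released:

```python
buf = bytearray(b"foo\r\nbar\r\n")

view = memoryview(buf)
try:
    # Resizing the bytearray while a view is still exported is refused.
    del buf[:5]
except BufferError:
    resize_blocked = True
else:
    resize_blocked = False
finally:
    # Releasing the view explicitly unlocks the bytearray again.
    view.release()

# With no live views left, the same resize goes through.
del buf[:5]
```

The explicit release() (or the equivalent "with" block) is what makes
this reliable on implementations that don't free the temporary view
immediately.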

> Quick benchmark:
>
> (temporary bytes)
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(buf)[:3]'
> ....................
> Median +- std dev: 652 ns +- 19 ns
>
> (temporary memoryview without "with")
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- 'bytes(memoryview(buf)[:3])'
> ....................
> Median +- std dev: 886 ns +- 26 ns
>
> (temporary memoryview with "with")
> $ python3 -m perf timeit -s 'buf = bytearray(b"foo\r\nbar\r\nbaz\r\n")' -- '
> with memoryview(buf) as m:
>     bytes(m[:3])
> '
> ....................
> Median +- std dev: 1.11 us +- 0.03 us

This is normal though, as memory views trade lower O(N) costs (reduced
data copying) for higher O(1) setup costs (creating and managing the
view, indirection for data access).
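
A rough way to see that trade-off for yourself is to time both
spellings against a small and a large buffer. This is only a sketch:
the function names, buffer sizes and repeat count are arbitrary
choices of mine, and the numbers will vary by machine and Python
version.

```python
import timeit


def copy_then_slice(buf, n):
    # Copies the whole buffer first, then slices the copy: O(len(buf)).
    return bytes(buf)[:n]


def slice_view(buf, n):
    # Pays a fixed setup cost for the view, then copies only n bytes.
    with memoryview(buf) as m:
        return bytes(m[:n])


def compare(size, n=3, number=200):
    """Return (copy_then_slice seconds, slice_view seconds) for one buffer size."""
    buf = bytearray(b"x" * size)
    assert copy_then_slice(buf, n) == slice_view(buf, n)
    full_copy = timeit.timeit(lambda: copy_then_slice(buf, n), number=number)
    via_view = timeit.timeit(lambda: slice_view(buf, n), number=number)
    return full_copy, via_view


if __name__ == "__main__":
    for size in (16, 1 << 20):
        full_copy, via_view = compare(size)
        print(f"{size:>8} bytes: copy={full_copy:.6f}s view={via_view:.6f}s")
```

For a 15 byte buffer like the one in the benchmark above, the fixed
setup cost dominates; the view only starts to pay for itself once the
avoided copy is large enough.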

> Proposed solution
> ===============
>
> Adding one more constructor to bytes:
>
>     # when length=-1 (default), use until end of *byteslike*.
>     bytes.frombuffer(byteslike, length=-1, offset=0)
>
> With this API
>
>     with memoryview(buf) as m:
>         line = bytes(m[:n])
>
> becomes
>
>     line = bytes.frombuffer(buf, n)

Does that need to be a method on the builtin rather than a separate
helper function, though? Once you define:

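    def snapshot(buf, length=None, offset=0):
        # length counts bytes from offset; None means "until the end"
        stop = None if length is None else offset + length
        with memoryview(buf) as m:
            return m[offset:stop].tobytes()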

then that can be replaced by a more optimised C implementation without
users needing to care about the internal details.
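
For illustration, here is how such a helper might be used in the
line-splitting case from the original post. This is a sketch only:
pop_line() is a hypothetical name, and snapshot() here treats length
as a byte count starting at offset, which is an assumption about the
intended semantics.

```python
def snapshot(buf, length=None, offset=0):
    # Assumption: length is a byte count from offset; None means "to the end".
    stop = None if length is None else offset + length
    with memoryview(buf) as m:
        return m[offset:stop].tobytes()


def pop_line(buf):
    """Remove and return the first CRLF-terminated line, or None if incomplete."""
    n = buf.find(b"\r\n")
    if n < 0:
        return None
    line = snapshot(buf, n)
    # The view inside snapshot() is already released, so resizing is safe.
    del buf[:n + 2]
    return line


buf = bytearray(b"foo\r\nbar\r\nbaz")
first = pop_line(buf)
second = pop_line(buf)
third = pop_line(buf)  # no complete line left in the buffer
```

The caller never touches memoryview directly, so there is no context
manager to forget.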

That is, getting back to a variant on one of Serhiy's suggestions in
the last PEP 467 discussion, it may make sense for us to offer a
"buffertools" library that's specifically aimed at supporting
efficient buffer manipulation operations that minimise data copying.
The pure Python implementations would work entirely through
memoryview, but we could also have selected C accelerated operations
if that showed a noticeable improvement on asyncio's benchmarks.

Regards,
Nick.

P.S. The length/offset API design is also problematic due to the way
it differs from range() & slice(), but I don't think it makes sense to
get into that kind of detail before discussing the larger question of
adding a new helper module for working efficiently with memory buffers
vs. further widening the method API for the builtin bytes type.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

