[Python-ideas] Marvelous movable memoryview

Nick Coghlan ncoghlan at gmail.com
Fri Jul 27 03:37:10 CEST 2012


On Fri, Jul 27, 2012 at 8:40 AM, Matt Chaput <matt at whoosh.ca> wrote:
> I could avoid copying with memoryviews...
>
>   for i in xrange(0, len(s) - 3):
>       x = memoryview(s)[i:i + 3]
>       myfile.write(x)
>
> ...but this is actually much slower (3x slower in some quick tests). I'm
> guessing it's because of all the object creation (while string slicing
> probably uses fast paths).

memoryview objects are pretty big - they have to store a lot of
pointers and other state that describe their view of the underlying
buffer. Bytes objects, on the other hand, have minimal overhead.

(These numbers are for 3.3, which uses a revamped memoryview implementation)

>>> import sys
>>> x = b"foo"
>>> sys.getsizeof(x)
36
>>> sys.getsizeof(memoryview(x))
184

There's also the execution speed overhead that comes from the
indirection when accessing the contents.
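A quick way to see that indirection cost is a micro-benchmark (a hypothetical sketch, not a rigorous comparison - exact numbers vary by interpreter version and build, so measure on your own setup):

```python
import timeit

# Indexing through a memoryview goes through the buffer-protocol
# indirection on every access, while indexing the bytes object is a
# direct lookup. Both statements do the same logical work.
setup = "data = b'x' * 16; view = memoryview(data)"
t_bytes = timeit.timeit("data[5]", setup=setup, number=1000000)
t_view = timeit.timeit("view[5]", setup=setup, number=1000000)
print("bytes index:", t_bytes, "memoryview index:", t_view)
```

On typical CPython builds the memoryview indexing comes out measurably slower per access, which is why views only win once the savings from avoided copies outweigh the per-access cost.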

Thus, using views instead of copying really only starts to pay off
once you're talking about comparatively large chunks of data:

>>> x *= 1000
>>> sys.getsizeof(x)
3033
>>> sys.getsizeof(memoryview(x))
184

> Shouldn't I be able to do this?
>
>   m = memoryview(s)
>   for i in xrange(0, len(s) - 3):
>       m.start = i
>       m.end = i + 3
>       myfile.write(m)

No, because making memoryviews mutable would be a huge increase in
complexity (and they're complex enough already - only with Stefan
Krah's work in 3.3 have we finally worked most of the kinks out of the
implementation).

What you *can* do with a memoryview, though, is slice it, and the
resulting object will be a memoryview that references a subset of the
original object. This can be done with full slicing flexibility in
3.3, or in a more limited fashion in earlier versions.
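A quick illustration of that (hypothetical variable names, Python 3.3+): a slice of a memoryview is itself a memoryview over the same underlying buffer, with no copy of the data.

```python
# Slicing a memoryview produces another memoryview that references
# the original buffer rather than copying the bytes.
data = bytes(range(10))
m = memoryview(data)
sub = m[2:5]                # a new memoryview, not a bytes copy
print(sub.tobytes())        # b'\x02\x03\x04'
print(sub.obj is data)      # True: the slice still references `data`
```

Note that holding on to `sub` keeps `data` alive (and, for mutable exporters, locked against resizing), which is why the chunked example below releases each view promptly via the with statement.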

For example, processing a data sequence in chunks in 3.3 without
copying and without inadvertently keeping a potentially large data
object alive (and/or locked into immutability) by hanging on to a
buffer reference:

  chunk_len = 512 # For small chunks, copying is likely faster. Measure it!
  with memoryview(data) as m:
      for offset in range(0, len(m), chunk_len):
          with m[offset:offset+chunk_len] as x:
              process_chunk(x)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


