[Numpy-discussion] PyBUF_SIMPLE/PyBUF_FORMAT: casts to unsigned bytes

Stefan Krah stefan-usenet at bytereef.org
Tue Aug 23 08:10:48 EDT 2011


Hello,

PEP 3118 presumably intended that a PyBUF_SIMPLE request should cast the
original buffer's data type to 'B' (unsigned bytes). Here is a one-dimensional
example that currently occurs in Lib/test/test_multiprocessing:

>>> import array, io
>>> a = array.array('i', [1,2,3,4,5])
>>> m = memoryview(a)
>>> m.format
'i'
>>> buf = io.BytesIO(bytearray(5*8))
>>> buf.readinto(m)

buf.readinto() calls PyObject_AsWriteBuffer(), which requests a simple buffer
from the memoryview, thus casting the 'i' data type to the implied type 'B'.

The consumer can see that a cast has occurred because the new buffer's
format field is NULL.
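
As a rough illustration of the cast described above, here is a minimal
sketch that uses only the standard library. It assumes a Python version
in which memoryview has cast() and nbytes; the variable names (b, src, n)
are only for illustration. The NULL format field is visible only at the
C level, so m.cast('B') serves as an explicit Python-level stand-in for
the implicit cast.

```python
import array
import io

a = array.array('i', [1, 2, 3, 4, 5])
m = memoryview(a)
nbytes = m.nbytes                 # len(a) * a.itemsize, platform dependent

# Explicit, Python-level analogue of the implicit cast to unsigned bytes.
b = m.cast('B')
print(b.format, b.itemsize, len(b) == nbytes)

# readinto() only asks for a simple writable buffer, so it sees raw bytes
# and overwrites the five ints with the zero bytes held by src.
src = io.BytesIO(bytes(nbytes))
n = src.readinto(m)
print(n == nbytes, a.tolist())
```

Sizing the source with m.nbytes rather than a hard-coded count keeps the
sketch independent of the platform's size of 'i'.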


This seems fine for the one-dimensional case. Numpy currently also allows
such casts for multidimensional contiguous and non-contiguous arrays.
See below for the examples; I don't want to distract from the main
point of the post, which is this:



I'm seeking a clear specification for the Python documentation that determines
under what circumstances casts to 'B' should succeed. I'll formulate the points
as statements for clarity, but in fact they are also questions:

1) An exporter of a C-contiguous array with ndim <= 1 MUST honor
   a PyBUF_SIMPLE request, setting format, shape and strides to NULL
   and itemsize to 1.

   As a corner case, an array with ndim = 0, format = "L" (or other)
   would also morph into a buffer of unsigned bytes. test_ctypes
   currently makes use of this.

2) An exporter of a C-contiguous buffer with ndim > 1 MUST honor
   a PyBUF_SIMPLE request, setting format, shape, and strides to NULL
   and itemsize to 1.

3) An exporter of a buffer that is not C-contiguous MUST raise BufferError
   in response to a PyBUF_SIMPLE request.
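
The three statements above could be modelled roughly as follows. This is
only a sketch of the proposed rules, not the code of any real exporter:
export_simple() is a hypothetical helper, None stands in for NULL, and
memoryview objects are used as stand-ins for arbitrary exporters.

```python
def export_simple(view):
    """Hypothetical model of rules 1-3 for a PyBUF_SIMPLE request.

    Returns (format, shape, strides, itemsize, length_in_bytes), using
    None where the C struct would hold NULL.
    """
    if not view.c_contiguous:
        # Rule 3: refuse rather than hand out an unusable pointer.
        raise BufferError("PyBUF_SIMPLE request on non-C-contiguous buffer")
    # Rules 1 and 2: any ndim collapses to a flat run of unsigned bytes.
    return (None, None, None, 1, view.nbytes)


one_d = memoryview(bytearray(96))                          # ndim == 1
two_d = memoryview(bytearray(96)).cast('Q', shape=[3, 4])  # ndim == 2
strided = one_d[::2]                                       # not C-contiguous

print(export_simple(one_d))
print(export_simple(two_d))
try:
    export_simple(strided)
except BufferError as exc:
    print("BufferError:", exc)
```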


Why am I looking for such rigid rules? The problem with memoryview is
that it has to act as a re-exporter itself.

For several reasons (performance of chained memoryviews, garbage collection,
early release, etc.) it has been decided that the new memoryview object has
a managed buffer that takes a snapshot of the original exporter's buffer
(see http://bugs.python.org/issue10181).

Now, since getbuffer requests to the memoryview object cannot be redirected
to the original object, strict rules are needed for memory_getbuf().



Could you agree with these rules? Point 2) isn't clear from the PEP itself.
I assumed it because Numpy currently allows it, and it appears harmless.


Stefan Krah


Examples:
=========

Cast a multidimensional contiguous array:
-----------------------------------------

I think itemsize in the result should be 1.

[_testbuffer.ndarray is from http://hg.python.org/features/pep-3118#memoryview]

>>> from _testbuffer import *
>>> from numpy import *
>>> from _testbuffer import ndarray as pyarray
>>>
>>> exporter = ndarray(shape=[3,4], dtype="L")
>>> # Issue a PyBUF_SIMPLE request to 'exporter' and act as a re-exporter:
>>> x = pyarray(exporter, getbuf=PyBUF_SIMPLE)
>>> x.len
96
>>> x.shape
()
>>> x.strides
()
>>> x.format
''
>>> x.itemsize # I think this should be 1, not 8.
8
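
The arithmetic behind the itemsize remark can be checked with a small
stdlib-only sketch. It assumes that the native 'Q' format is 8 bytes wide
and uses a memoryview in place of the NumPy array. The total length is
product(shape) * itemsize, so once shape is dropped the only consistent
reading of a 96-byte buffer without format is 96 items of size 1.

```python
from math import prod

# 3 x 4 array of 8-byte unsigned integers, analogous to the exporter above.
m = memoryview(bytearray(96)).cast('Q', shape=[3, 4])
total = prod(m.shape) * m.itemsize   # 3 * 4 * 8
print(m.nbytes, total)

# After a cast to unsigned bytes the same memory is 96 items of size 1.
flat = m.cast('B')
print(flat.shape, flat.itemsize, flat.nbytes)
```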

Cast a multidimensional non-contiguous array:
---------------------------------------------

This is clearly not right, since y.buf points to a location that the consumer
cannot handle without shape and strides.

>>> nd = ndarray(buffer=bytearray(96), shape=[3,4], dtype="L")
[182658 refs]
>>> exporter = nd[::-1, ::-2]
[182661 refs]
>>> exporter
array([[0, 0],
       [0, 0],
       [0, 0]], dtype=uint64)
[182659 refs]
>>> y = pyarray(exporter, getbuf=PyBUF_SIMPLE)
[182665 refs]
>>> y.len
48
[182666 refs]
>>> y.strides
()
[182666 refs]
>>> y.shape
()
[182666 refs]
>>> y.format
''
[182666 refs]
>>> y.itemsize
8
[182666 refs]
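
To make the problem concrete, the byte offsets of the six elements of
nd[::-1, ::-2] can be computed by hand. This sketch assumes 8-byte items,
C-order strides of (32, 8) for the 3x4 base array, and that buf points at
the first logical element of the sliced view (the usual convention for
negative strides); it does not inspect NumPy itself.

```python
from itertools import product

itemsize = 8
base_shape, base_strides = (3, 4), (32, 8)   # 96-byte C-contiguous block

# View produced by nd[::-1, ::-2]: start at the last row and last column,
# then step backwards by one row and by two columns.
shape, strides = (3, 2), (-32, -16)
start = sum((n - 1) * s for n, s in zip(base_shape, base_strides))

# Byte offset of every element, relative to the start of the base block.
offsets = sorted(
    start + sum(i * s for i, s in zip(index, strides))
    for index in product(*(range(n) for n in shape))
)
length = len(offsets) * itemsize             # the reported len of 48

print(start, offsets)
# A consumer that reads `length` bytes forward from buf would cover
# [start, start + length), which runs past the 96-byte block.
print(start + length > 96)
```

Under these assumptions the elements sit at offsets that are neither
adjacent nor located after buf, which is why the flat view is unusable
without shape and strides.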





