[Python-Dev] Allocation of shape and strides fields in Py_buffer

Nick Coghlan ncoghlan at gmail.com
Tue Dec 9 14:37:11 CET 2008


Antoine Pitrou wrote:
> Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

Note that the PEP is unambiguous as to who owns the pointers in the view
object:
"The exporter is responsible for making sure that any memory pointed to
by buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called. If the exporter wants to be able to change an
object's shape, strides, and/or suboffsets before releasebuffer is
called then it should allocate those arrays when getbuffer is called
(pointing to them in the buffer-info structure provided) and free them
when releasebuffer is called."
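
As a rough illustration of that rule, here is a toy model in plain Python
rather than C. ToyExporter and its getbuffer/releasebuffer methods are
hypothetical names that only mimic the protocol; they are not the real C API.
The point is just that each view gets its own shape array, allocated by the
exporter and released by the exporter:

```python
class ToyExporter:
    """Hypothetical exporter whose shape may change while views exist."""

    def __init__(self, n_items):
        self.shape = [n_items]
        self._exports = []          # shape arrays currently handed out

    def getbuffer(self):
        # Allocate a separate shape array for this view, so later changes
        # to self.shape cannot invalidate it.
        view_shape = list(self.shape)
        self._exports.append(view_shape)
        return view_shape

    def releasebuffer(self, view_shape):
        # The exporter, not the consumer, disposes of what it allocated.
        self._exports = [s for s in self._exports if s is not view_shape]


exporter = ToyExporter(10)
view_shape = exporter.getbuffer()
exporter.shape[0] = 6               # exporter changes shape while exported
print(view_shape, exporter.shape)   # the view's copy is unaffected
exporter.releasebuffer(view_shape)
```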

The problem with memoryview appears to be related to the way it
calculates its own length (since that is the check that is failing when
the view blows up):

>>> from array import array
>>> a = array('i', range(10))
>>> m = memoryview(a)
>>> len(m) # This is the length in bytes, which is WRONG!
40
>>> m2 = memoryview(a)[2:8]
>>> len(m2) # This is correct
6
>>> a2 = array('i', range(6))
>>> m[:] = a    # But this works
>>> m2[:] = a2  # and this does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
>>> len(memoryview(a2)) # Ah, 24 != 6 is our problem!
24

Looks to me like there are a couple of bugs here:

The first is that memoryview is treating the len field in the Py_buffer
struct as the number of objects in the view in a few places instead of
as the total number of bytes being exposed (it is actually the latter,
as defined in PEP 3118).

The second is that the getbuf implementation in array.array is broken.
It is ONLY OK for shape to be null when ndim=0 (i.e. a scalar value). An
array is NOT a scalar value, so the array objects should be setting the
shape pointer to point to a single-item array (where shape[0] is the
length of the array).

memoryview can then be fixed to use shape[0] instead of len to get the
number of objects in the view.
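
A minimal sketch of the relationship between len and shape[0], again in
plain Python. BufferInfo and item_count are hypothetical names that model
the len, itemsize, ndim and shape fields of Py_buffer; they are not part of
any real API. The itemsize of 4 is taken from the session above (10 items,
40 bytes), and treating a scalar as one item is an assumption of this sketch:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BufferInfo:
    """Toy stand-in for the Py_buffer fields discussed here (hypothetical)."""
    len: int                           # total bytes exposed, per PEP 3118
    itemsize: int                      # bytes per item
    ndim: int                          # number of dimensions
    shape: Optional[Tuple[int, ...]]   # None models a NULL shape pointer


def item_count(info: BufferInfo) -> int:
    """Number of items along the first dimension of the view."""
    if info.ndim == 0:
        # A scalar: the only case where a NULL shape is acceptable.
        # Counting it as one item is an assumption of this sketch.
        return 1
    if info.shape is None:
        raise ValueError("shape must be set when ndim > 0")
    return info.shape[0]


# Ten 4-byte ints: len is 40 bytes, but the view holds 10 items.
ten_ints = BufferInfo(len=40, itemsize=4, ndim=1, shape=(10,))
print(item_count(ten_ints))                 # number of items
print(ten_ints.len // ten_ints.itemsize)    # same value for a 1-D contiguous view
```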

memoryview also currently gets the shape wrong on slices:

>>> m.shape
(10,)
>>> m2.shape
(10,)
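
For comparison, the shape one would expect for a one-dimensional slice can
be worked out with slice.indices(), which is standard Python. The function
name sliced_shape is hypothetical, and this is only a sketch of the expected
result, not of how memoryview computes it:

```python
from typing import Tuple


def sliced_shape(shape: Tuple[int, ...], index: slice) -> Tuple[int, ...]:
    """Expected shape after slicing the first dimension (illustrative only)."""
    start, stop, step = index.indices(shape[0])
    # len(range(...)) counts the items the slice selects.
    return (len(range(start, stop, step)),) + tuple(shape[1:])


print(sliced_shape((10,), slice(2, 8)))   # m2 = m[2:8] selects six items
```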


Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

