[Numpy-discussion] Buffer Interface for Python 3.0

Tue Feb 27 20:42:53 EST 2007

On 27/02/07, Travis Oliphant <oliphant at ee.byu.edu> wrote:

> Basically, what we are going to do now is
>
> 1) Return the data-format specification in an extended struct-style string
> 2) Return the shape information in a tuple of lists: (shape, strides)
>
> There are two questions I'm grappling with right now:
>
> 1) Do we propose the inclusion of offsets in the shape information?
> NumPy does not use offsets internally but simply has a pointer to the
> start of the array.

I'm not quite sure I understand what this means. Correct me if I'm
wrong, but within numpy, an array typically lives inside a hunk of
memory allocated with malloc(); the first data element is somewhere
inside that, and the remaining data elements are laid out according to
strides. Is that about right? The array object needs to know the
location of the first element, the strides and sizes, the data type of
each element, and it seems to me it also needs the address of the data
area, so that that can be free()d when the last array using that hunk
of memory is deallocated. In fact it would need a refcounted link to
the array...

Or, if this isn't how it works, how does numpy arrange for the array's
memory to be deleted at the right time? Do numpy arrays keep a
refcounted link to the array that "owns" the memory?
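
To make the ownership question concrete, here is a minimal sketch. It assumes a current NumPy, not necessarily the 2007 release under discussion, and the variable names are purely illustrative. It shows a view holding a reference to the array that owns the memory, with its first element sitting at a byte offset inside the owner's data area.

```python
import numpy as np

# An array that allocates, and therefore owns, its block of memory.
owner = np.arange(12, dtype=np.int64)

# A strided view into the same block, starting two elements in.
view = owner[2::3]

owns = owner.flags.owndata              # the owner is responsible for freeing
view_owns = view.flags.owndata          # the view is not
keeps_owner_alive = view.base is owner  # the view holds a reference to the owner

# Byte offset of the view's first element from the start of the owner's data.
offset = view.ctypes.data - owner.ctypes.data

print(owns, view_owns, keeps_owner_alive, offset, view.strides)
```

In this sketch the view itself stores only a pointer to its first element; the offset is recoverable only by comparing against the owner, which is reachable through `base`.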

How is memory deallocation managed for the buffer protocol? It seems
like what one needs to access the memory is a buffer object plus an
offset (plus the usual strides and whatnot).
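
The "buffer object plus an offset plus strides" description can be sketched with NumPy's `ndarray` constructor, which accepts exactly those pieces. This is only an illustration under the assumption of a current NumPy; the `bytearray` stands in for whatever object exports the memory, and the particular numbers are arbitrary.

```python
import numpy as np

# A plain block of 24 bytes, owned by a bytearray rather than by NumPy.
raw = bytearray(range(24))

# Describe a 2x3 array of bytes living inside that block:
# element [i, j] is at byte 4 + 8*i + 2*j.
a = np.ndarray(shape=(2, 3), dtype=np.uint8, buffer=raw,
               offset=4, strides=(8, 2))

before = a.tolist()

# Writing through the array changes the underlying buffer, so no copy was made.
a[0, 0] = 99
shared = raw[4] == 99

print(before, shared)
```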

> 2) The buffer interface needs to understand the idea of discontiguous
> arrays.  If the shape/stride information is separate from the
> pointer-to-data call, then the user needs to know if that
> pointer-to-data is a "contiguous chunk" or just the beginning of a
> strided memory area (and so should not be treated as a single-segment).
>
> 3) If we support strided memory areas, then we should probably also
> allow some way for PIL-like objects to report their buffer sequence (I'm
> sure this was the origin of the multi-segment buffer protocol to begin
> with).  Or we could just ignore that possibility.  The PIL would have to
> copy memory in order to share its images.

I'm not quite sure I understand what you mean by "contiguous" here.
One interpretation would be that any array that uses every byte
between the first and last is contiguous, and any other is
discontiguous. Another would be that any array that can be described
by strides and an offset is contiguous, as it must live in a
contiguous block of malloc()ed (or mmap()ed or whatever) memory;
discontiguous arrays would then be things like C's
array-of-pointers-to-arrays arrangement, for which each row would be a
single malloc()ed chunk but the chunks might be arranged arbitrarily
in memory.

If the former, I can't see why we would not support them, since they
naturally occur in numpy and are tidily handled by the
(shape,strides,offset) information. If the latter, supporting them is
going to be a real challenge, involving a great deal of indirection...
would the goal be to make them accessible through an interface
resembling numpy's indexing?
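
The two readings of "contiguous" can be contrasted in a short sketch. It assumes a current NumPy, and the names are illustrative only. The first case is a strided view that skips bytes yet is fully described by shape and strides; the second imitates the array-of-pointers-to-arrays layout with separately allocated rows.

```python
import numpy as np

# One allocated block holding a 3x4 array of 4-byte integers.
block = np.arange(12, dtype=np.int32).reshape(3, 4)

# Every other column: does not use every byte between first and last element.
cols = block[:, ::2]
first_sense = cols.flags.c_contiguous or cols.flags.f_contiguous

# Still a single (shape, strides) description over one block of memory.
shape, strides = cols.shape, cols.strides

# Locate cols[2, 1] by hand from the strides, as an element index into block.
flat_index = (2 * strides[0] + 1 * strides[1]) // block.itemsize
same_element = block.ravel()[flat_index] == cols[2, 1]

# Second sense: each row is its own allocation, so no single set of
# strides can describe the whole thing.
rows = [np.arange(4, dtype=np.int32) for _ in range(3)]
separate = all(r.flags.owndata for r in rows)

print(first_sense, shape, strides, flat_index, same_element, separate)
```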

Anne