[Python-Dev] Extended Buffer Interface/Protocol

Carl Banks pythondev at aerojockey.com
Tue Mar 27 00:49:50 CEST 2007


Travis Oliphant wrote:
> Carl Banks wrote:
>> We're done.  Return pointer.
> 
> Thank you for this detailed example.  I will have to parse it in more 
> depth but I think I can see what you are suggesting.
> 
>> First, I'm not sure why getbuffer needs to return a view object. 
> 
> The view object in your case would just be the ImageObject.  

It seems to me that we are using the word "view" very differently.  
Consider this example:

A = zeros((100,100))
B = A.transpose()

In this scenario, A would be the exporter object, I think we both would 
call it that.  When I use the word "view", I'm referring to B.  However, 
you seem to be referring to the object returned by A.getbuffer, right? 
What term have you been using to refer to B?  Obviously, it would help 
the discussion if we could get our terminology straight.

(Frankly, I don't agree with your usage; it doesn't agree with other 
uses of the word "view".  For example, consider the proposed Python 3000 
dictionary views:

D = dict()
V = D.items()

Here, V is the view, and it's analogous to B in the above example.)
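
A minimal sketch of that sense of "view", assuming dictionary views 
behave as they do in Python 3 as eventually released: V is a separate 
object that keeps tracking D instead of copying it.

```python
# Python 3 dictionary view: V tracks D rather than copying it.
D = dict()
V = D.items()

D["a"] = 1           # mutate the dict after the view was created
snapshot = list(V)   # the view reflects the change
```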

I'd suggest the object returned by A.getbuffer should be called the 
"buffer provider" or something like that.

For the sake of discussion, I'm going to avoid the word "view" 
altogether.  I'll call A the exporter, as before.  B I'll refer to as 
the requestor.  The object returned by A.getbuffer is the provider.


> The reason
> I was thinking the function should return "something" is to provide more
> flexibility in what a view object actually is.
>
> I've also been going back and forth between explicitly passing all this 
> information around or placing it in an actual view-object.  In some 
> sense, a view object is a NumPy array in my mind.  But, with the 
> addition of isptr we are actually expanding the memory abstraction of 
> the view object beyond an explicit NumPy array.
>
> In the most common case, I envisioned the view object would just be the 
> object itself in which case it doesn't actually have to be returned. 
> While returning the view object would allow unspecified flexibility in 
> the future, it really adds nothing to the current vision.
>
> We could add a view object separately as an abstract API on top of the 
> buffer interface.

Having thought quite a bit about it, and having written several abortive 
replies, I now understand it and see the importance of it.  getbuffer 
returns the object that you are to call releasebuffer on.  It may or may 
not be the same object as exporter.  Makes sense, is easy to explain.

It's easy to see possible use cases for returning a different object.  A 
hypothetical future incarnation of NumPy might shift the 
responsibility of managing buffers from the NumPy array object to a hidden 
raw buffer object.  In this scenario, the NumPy object is the exporter, 
but the raw buffer object is the provider.

Considering this use case, it's clear that getbuffer should return the 
shape and stride data independently of the provider.  The raw buffer 
object wouldn't have that information; all it does is store a pointer 
and keep a reference count.  Shape and strides are defined by the exporter.
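
The split between exporter and provider might be sketched as follows.  
Everything here is hypothetical: the class names RawBuffer and Array, 
the "exports" counter, and the tuple returned by getbuffer are 
illustration only, not a proposed API.

```python
class RawBuffer:
    """Hypothetical provider: holds memory and counts outstanding exports."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)
        self.exports = 0

    def releasebuffer(self):
        self.exports -= 1


class Array:
    """Hypothetical exporter: owns shape and strides, delegates storage."""

    def __init__(self, shape, itemsize=8):
        self.shape = tuple(shape)
        # Build C-contiguous strides (in bytes), last axis varying fastest.
        strides = []
        step = itemsize
        for n in reversed(self.shape):
            strides.append(step)
            step *= n
        self.strides = tuple(reversed(strides))
        self._raw = RawBuffer(step)  # step is now the total size in bytes

    def getbuffer(self):
        self._raw.exports += 1
        # The provider comes back alongside shape and strides; it is not
        # the source of them and knows nothing about them.
        return self._raw, self.shape, self.strides


A = Array((100, 100))
provider, shape, strides = A.getbuffer()
exports_while_held = provider.exports
provider.releasebuffer()   # release is called on the provider, not on A
```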


>> Second question: what happens if a view wants to re-export the buffer? 
>> Do the views of the buffer ever change?  Example, say you create a 
>> transposed view of a Numpy array.  Now you want a slice of the 
>> transposed array.  What does the transposed view's getbuffer export?
> 
> Basically, you could not alter the internal representation of the object 
> while views which depended on those values were around.
>
> In NumPy, a transposed array actually creates a new NumPy object that 
> refers to the same data but has its own shape and strides arrays.
> 
> With the new buffer protocol, the NumPy array would not be able to alter 
> its shape or strides, or reallocate its data areas, while views were being 
> held by other objects.

But requestors could alter their own copies of the shape and stride data, 
no?  Back to the transpose example: B itself obviously can't use the same 
"strides" array as A uses.  It would have to create its own strides, right?

So, what if someone takes a slice out of B?  When calling B.getbuffer,
does it get B's strides, or A's?

I think it should get B's.  After all, if you're taking a slice of B, 
don't you care about the slicing relative to B's axes?  I can't really 
think of a use case for exporting A's stride data when you take a slice 
of B, and it doesn't seem to simplify memory management, because B has 
to make its own copies anyway.
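
To make the transpose-then-slice case concrete, here is a sketch in 
plain Python.  It assumes 8-byte elements and C-contiguous storage for 
A, and describes each array as an (offset, shape, strides) triple; the 
helper names take_rows and byte_offset are made up for the example.

```python
ITEMSIZE = 8  # assumed element size in bytes

# A: 100x100, C-contiguous.
a_offset, a_shape, a_strides = 0, (100, 100), (100 * ITEMSIZE, ITEMSIZE)

# B = A.transpose(): same memory, axes swapped, so B needs its own tuples.
b_offset, b_shape, b_strides = a_offset, a_shape[::-1], a_strides[::-1]


def take_rows(offset, shape, strides, start, stop):
    """Slice [start:stop] along axis 0 of a strided description."""
    return (offset + start * strides[0],
            (stop - start,) + shape[1:],
            strides)


def byte_offset(offset, strides, index):
    """Byte position of the element at the given index."""
    return offset + sum(i * s for i, s in zip(index, strides))


# C = B[10:20], computed from B's description rather than A's.
c_offset, c_shape, c_strides = take_rows(b_offset, b_shape, b_strides, 10, 20)

# C[0, 3] is B[10, 3], which is A[3, 10]: both resolve to the same byte.
same_element = (byte_offset(c_offset, c_strides, (0, 3))
                == byte_offset(a_offset, a_strides, (3, 10)))
```

The slice inherits B's strides (8, 800); nothing in the computation 
needs A's (800, 8).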


> With the shape and strides information, the format information, and the 
> data buffer itself, there are actually several pieces of memory that may 
> need to be protected because they may be shared with other objects. 
> This makes me wonder if releasebuffer should contain an argument which 
> states whether or not to release the memory, the shape and strides 
> information, the format information, or all of it.

Here's what I think: the lock should only apply to the buffer itself, 
and not to shape and stride data at all.  If the requestor wants to keep 
its own copies of the shape and stride data, it would have to malloc its 
own storage for them.  I expect that this would be very rare.
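
For illustration only, a lock that covers just the buffer can be seen in 
the bytearray and memoryview types of later CPython 3 releases, which 
postdate this message.  The sketch makes no claim about how shape and 
stride ownership was finally specified; it only shows the storage being 
pinned while an export is outstanding.

```python
buf = bytearray(b"abcd")    # exporter
view = memoryview(buf)      # while the view is held, buf's storage is locked
shape = view.shape          # an ordinary tuple, kept after the view is gone

try:
    buf.append(0)           # resizing could reallocate the memory
    resized_while_locked = True
except BufferError:
    resized_while_locked = False

view.release()              # lock dropped
buf.append(0)               # resizing is allowed again
```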

As for the provider, I think that's between it and the exporter.  If the 
exporter and provider know about each other, they shouldn't have any 
problems managing memory together.


> Having such a thing as a view object would actually be nice because it 
> could hold on to a particular view of data with a given set of shape and 
> strides (whose memory it owns itself) and then the exporting object 
> would be free to change its shape/strides information as long as the 
> data did not change.

What I don't understand is why it's important for the provider to retain 
this data.  The requestor only needs the information when it's created; 
it will calculate its own versions of the data, and will not need the 
originals again, so there is no need for the provider to keep them around.

Indeed, in the use case I described of the raw buffer object, the 
provider doesn't even know about the exporter's shape and strides.


>> The reason I ask is: if things like "buf" and "strides" and "shape" 
>> could change when a buffer is re-exported, then it can complicate things 
>> for PIL-like buffers.  (How would you account for offsets in a dimension 
>> that's in a subarray?)
> 
> I'm not sure what you mean, offsets are handled by changing the starting 
> location of the pointer to the buffer.

Yes, well, the single buffer problem isn't as well solved as I 
originally thought, so maybe it's best to focus on that first.

But to answer your question: you can't just change the starting location 
because there are hundreds of buffers.  You'd either have to change the 
starting location of each one of them, which is highly undesirable, or 
to factor in an offset somehow.  (This was, in fact, the point of the 
"derefoff" term in my original suggestion.)

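The two options can be sketched for a PIL-like image whose rows are 
separately allocated.  This message does not restate how "derefoff" was 
defined, so the deref_offset parameter below is an assumed stand-in for 
the idea of an offset applied after following a row pointer; get_pixel 
and row_start are likewise hypothetical names.

```python
# Hypothetical PIL-like image: each row is its own allocation.
WIDTH, HEIGHT = 8, 4
rows = [bytearray(range(r * WIDTH, (r + 1) * WIDTH)) for r in range(HEIGHT)]


def get_pixel(rows, row_start, deref_offset, i, j):
    """Read element (i, j) of a sub-image.

    row_start indexes into the table of row pointers; deref_offset is
    added after following the row pointer, so no row pointer is changed.
    """
    return rows[row_start + i][deref_offset + j]


# Sub-image starting at row 1, column 3 of the original image.
value = get_pixel(rows, row_start=1, deref_offset=3, i=0, j=0)
```

A single stored offset serves every row, which avoids rewriting each of 
the row pointers when a sub-image is exported.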

Anyways, despite the miscommunications so far, I now have a very good 
idea of what's going on.  We definitely need to get terms straight.  I 
agree that getbuffer should return an object.  I think we need to think 
harder about the case when requestors re-export the buffer.  (Maybe it's 
time to whip up some experimental objects?)


Carl Banks

