[Numpy-discussion] Buffer interface PEP

Travis Oliphant oliphant at ee.byu.edu
Tue Mar 27 19:45:45 EDT 2007


Zachary Pincus wrote:
> Hi,
>
>   
> Is this saying that either NULL or a pointer to "B" can be supplied  
> by getbufferproc to indicate to the caller that the array is unsigned  
> bytes? If so, is there a specific reason to put the (minor)  
> complexity of handling this case in the caller's hands, instead of  
> dealing with it internally to getbufferproc? In either case, the  
> wording is a bit unclear, I think.
>   

Yes, the wording could be clearer.  I'm trying to make it easy for 
exporters to change to the new buffer interface.

The main idea I really want to see is that if the caller just passes 
NULL instead of an address, then it means they are assuming the data will 
be "unsigned bytes".  It is up to the exporter to either allow this or 
raise an error. 

The exporter should always be explicit if an argument for returning the 
format is provided (I may have thought differently a few days ago).
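
As a rough illustration of an exporter being explicit about its format, 
here is a sketch using the Python-level memoryview object from later 
Python versions.  This is an assumption for illustration only; it is not 
the C-level getbufferproc call discussed in this thread.

```python
import array

# A bytes object exports plain unsigned bytes, reported as format "B".
raw = memoryview(b"\x01\x02\x03")
raw_format = raw.format
raw_itemsize = raw.itemsize

# An exporter with a richer element type reports that type explicitly.
doubles = memoryview(array.array("d", [1.0, 2.0]))
doubles_format = doubles.format
doubles_itemsize = doubles.itemsize
```

A consumer that only understands unsigned bytes can compare the reported 
format against "B" and refuse anything else.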

> The general question is that there are several other instances where  
> getbufferproc is allowed to return ambiguous information which must  
> be handled on the client side. For example, C-contiguous data can be  
> indicated either by a NULL strides pointer or a pointer to a properly- 
> constructed strides array. 

Here I'm trying to be easy on both the exporter and the consumer.  If the 
data is contiguous, then neither the exporter nor the consumer will likely 
care about the strides.  Allowing this to be NULL is like the current array 
protocol convention, which allows this to be None.  

> Clients that can't handle C-contiguous  
> data (contrived example, I know there is a function to deal with  
> that) would then need to check both for NULL *and* inside the strides  
> array if not null, before properly deciding that the data isn't  
> usable them.
Not really.  A client that cannot deal with strides will simply not pass 
an address to a stride array to the buffer protocol (that argument will 
be NULL).  If the exporter cannot provide memory without stride 
information, then an error will be raised.
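
The negotiation described here can be modelled in a few lines.  This is 
only a toy sketch: ToyExporter, get_buffer and want_strides are 
hypothetical names made up for the example, not part of the proposed C 
interface.

```python
class ToyExporter:
    """Hypothetical exporter used only to model the negotiation."""

    def __init__(self, data, step):
        self.data = data
        self.step = step  # a step of 1 means the data is contiguous

    def get_buffer(self, want_strides):
        # A consumer that cannot handle strides passes want_strides=False
        # (the analogue of passing NULL for the strides argument).
        if self.step != 1 and not want_strides:
            raise BufferError("data is strided but no strides were requested")
        strides = (self.step,) if want_strides else None
        return self.data, strides


contiguous = ToyExporter(b"abcdef", step=1)
strided = ToyExporter(b"abcdef", step=2)

# Contiguous data can be handed out without any stride information.
simple_result = contiguous.get_buffer(want_strides=False)

# Strided data is refused unless the consumer asked for strides.
try:
    strided.get_buffer(want_strides=False)
    refused = False
except BufferError:
    refused = True
```

The consumer never has to inspect a strides array it did not ask for; the 
error is raised on the exporter's side.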

> Similarly, the suboffsets can be either all negative or  
> NULL to indicate the same thing.
>   
I think it's much easier to check if suboffsets is NULL rather than 
checking all the entries to see if they are -1 for the very common case 
(i.e. the NumPy case) of no dereferencing.    Also, if you can't deal 
with suboffsets you would just not provide an address for them.
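
A small sketch of the two checks being compared.  The helper 
has_indirection is a hypothetical name; the last line assumes the 
Python-level memoryview of later Python versions, which reports an empty 
tuple rather than NULL when there are no suboffsets.

```python
def has_indirection(suboffsets):
    """Return True if any dimension needs pointer dereferencing."""
    # Cheap test for the common case: no suboffsets at all.
    if suboffsets is None:
        return False
    # Otherwise every entry has to be scanned; negative means "no deref".
    return any(offset >= 0 for offset in suboffsets)


none_case = has_indirection(None)
all_negative_case = has_indirection((-1, -1))
mixed_case = has_indirection((-1, 0))
bytes_case = has_indirection(memoryview(b"abc").suboffsets)
```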
> Might it be more appropriate to specify only one canonical behavior  
> in these cases? Otherwise clients which don't do all the checks on  
> the data might not properly interoperate with providers which format  
> these values in the alternate manner.
>   
It's important to also be easy to use.  I don't think clients should be 
required to ask for strides and suboffsets if they can't handle them. 
>
> Also, some typos, and places additional clarification could help:
>
>   
>> 253 PYBUF_STRIDES (strides and isptr)
>>     
> Should 'isptr' be 'suboffsets'?
>   

Yes, but I think we are going to take out the multiple locks.
>   
>> 75 of a larger array can be described without copying the data.   T
>>     
> Dangling 'T'.
>   
Thanks.

>   
>> 279 Get the buffer and optional information variables about the  
>> buffer.
>> 280 Return an object-specific view object (which may be simply a
>> 281 borrowed reference to the object itself).
>>     
> This phrasing (and similar phrasing elsewhere) is somewhat opaque to  
> me. What's an "object-specific view object"?
>   
At the moment it's the buffer provider.  It is not defined because it 
could be a different thing for each exporter.   We are still discussing 
this particular point and may drop it.
>   
>> 333 The struct string-syntax is missing some characters to fully
>> 334 implement data-format descriptions already available elsewhere (in
>> 335 ctypes and NumPy for example).  Here are the proposed additions:
>>     
> Is the following table just the additions? If so, it might be good to  
> show the full spec, and flag the specific additions. If not, then the  
> additions should be flagged.
>   

Yes, these are just the additions.  I don't want to reproduce the full 
spec; it is already available elsewhere in the Python docs.

>   
>> 341 't'               bit (number before states how many bits)
>>     
> vs.
>   
>> 372 According to the struct-module, a number can preceed a character
>> 373 code to specify how many of that type there are.  The
>>     
> I'm confused -- could this be phrased more clearly? Does '5t' refer  
> to a field 5-bits wide, or 5-one bit fields? Is 'ttttt' allowed? If  
> so, is it equivalent to or different from '5t'?
>   
Yes, 'ttttt' is equivalent to '5t'.  The distinction between one field 
5 bits wide and five 1-bit fields only arises if you assume there are 
fields at all; the two descriptions are equivalent.  If you want "fields", 
then you have to define names. 
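
The same count-versus-repetition equivalence can be checked with the 
existing struct module.  The 't' code is only proposed and is not 
available in struct, so this sketch uses 'B' (unsigned byte) as a 
stand-in.

```python
import struct

data = bytes([1, 2, 3, 4, 5])

# 'B' stands in for the proposed 't' code, which struct does not provide.
counted = struct.unpack("5B", data)
spelled_out = struct.unpack("BBBBB", data)

same_values = counted == spelled_out
same_size = struct.calcsize("5B") == struct.calcsize("BBBBB")
```

Both format strings describe the same five items and the same number of 
bytes.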

>   
>> 378 Functions should be added to ctypes to create a ctypes object from
>> 379 a struct description, and add long-double, and ucs-2 to ctypes.
>>     
> Very cool.
>
> In general, the logic of the 'locking mechanism' should be described  
> at a high level at some point. It's described in nitty-gritty  
> details, but at least I would have appreciated a bit more of a  
> discussion about the general how and why -- this would be helpful to  
> clients trying to use the locking mechanism properly.
>   

The point of locking is so that the exporter knows when it can 
reallocate its buffer.  Right now, reference counting is the only way to 
do that, but reference counting is not specific enough.  A reference may 
be held by another object that is using the same memory, or it may just 
be another name pointing to exactly the same object.  

In the case of NumPy, NumPy needs to know when the resize method can be 
safely applied.  Currently, it is ambiguous and unclear when a NumPy 
array can re-allocate its own buffer.  Also, in the past exposing the 
array object in Python's memory and then later re-allocating it led to 
problems.
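
To make the idea concrete, here is a sketch of the behaviour that locking 
is meant to allow.  It uses bytearray and memoryview from later Python 
versions as a stand-in; this is an assumption for illustration and is not 
NumPy's resize method.

```python
buf = bytearray(b"abc")
view = memoryview(buf)  # the buffer is now exported to a consumer

# While the export is outstanding, the exporter refuses to reallocate.
try:
    buf.extend(b"d")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Once the consumer releases its view, resizing is safe again.
view.release()
buf.extend(b"d")
final_contents = bytes(buf)
```

The exporter does not have to guess from reference counts; it knows 
exactly when a consumer still depends on the memory.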

I'll try to address this more clearly.

Thanks for your feedback,

-Travis



