[Numpy-discussion] Buffer interface PEP
Travis Oliphant
oliphant at ee.byu.edu
Tue Mar 27 19:45:45 EDT 2007
Zachary Pincus wrote:
> Hi,
>
>
> Is this saying that either NULL or a pointer to "B" can be supplied
> by getbufferproc to indicate to the caller that the array is unsigned
> bytes? If so, is there a specific reason to put the (minor)
> complexity of handling this case in the caller's hands, instead of
> dealing with it internally to getbufferproc? In either case, the
> wording is a bit unclear, I think.
>
Yes, the wording could be clearer. I'm trying to make it easy for
exporters to change to the new buffer interface.
The main idea I really want to see is that if the caller just passes
NULL instead of an address, then it means they are assuming the data will
be "unsigned bytes". It is up to the exporter to either allow this or
raise an error.
The exporter should always be explicit if an argument for returning the
format is provided (I may have thought differently a few days ago).
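As a rough illustration of this convention, here is a small Python model of the exporter's decision. The function name export_format and its arguments are hypothetical and are not part of the proposed C API. The sketch also assumes one particular exporter policy (refuse unless the data really is unsigned bytes); an exporter could equally choose to allow the request.

```python
def export_format(native_format, caller_wants_format):
    """Model an exporter deciding what to report as its data format.

    native_format: struct-style format string of the exporter's data.
    caller_wants_format: False models the caller passing NULL instead of
    an address for the format argument.
    """
    if caller_wants_format:
        # The caller supplied somewhere to put the format, so the
        # exporter is always explicit, even for plain unsigned bytes.
        return native_format
    # The caller passed NULL, so it is assuming unsigned bytes ("B").
    # Assumed policy: refuse when that assumption would be wrong.
    if native_format != "B":
        raise BufferError("caller assumes unsigned bytes, data is %r" % native_format)
    return None


print(export_format("B", True))   # explicit format for a caller that asked
print(export_format("B", False))  # NULL request accepted for byte data
```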
> The general question is that there are several other instances where
> getbufferproc is allowed to return ambiguous information which must
> be handled on the client side. For example, C-contiguous data can be
> indicated either by a NULL strides pointer or a pointer to a properly-
> constructed strides array.
Here, I'm trying to be easy on both the exporter and the consumer. If the
data is contiguous, then neither the exporter nor the consumer will likely
care about the strides. Allowing this to be NULL is like the current array
protocol convention, which allows this to be None.
> Clients that can't handle C-contiguous
> data (contrived example, I know there is a function to deal with
> that) would then need to check both for NULL *and* inside the strides
> array if not null, before properly deciding that the data isn't
> usable by them.
Not really. A client that cannot deal with strides will simply not pass
an address to a stride array to the buffer protocol (that argument will
be NULL). If the exporter cannot provide memory without stride
information, then an error will be raised.
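The strides handshake described here can be modeled in a few lines of Python. This is only a sketch: c_contiguous_strides and export_strides are hypothetical helpers, None stands in for a NULL pointer, and the contiguity test is deliberately simple (it ignores corner cases such as dimensions of length 1).

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def export_strides(shape, itemsize, strides, want_strides):
    """Model the exporter's side of a strides request.

    strides: the exporter's own strides, or None for C-contiguous data.
    want_strides: False models a consumer that passes NULL because it
    cannot handle strided memory.
    """
    expected = c_contiguous_strides(shape, itemsize)
    contiguous = strides is None or tuple(strides) == expected
    if not want_strides:
        if not contiguous:
            # The exporter cannot describe its memory without strides.
            raise BufferError("data is not C-contiguous and strides were not requested")
        return None
    # The consumer asked for strides; None is still allowed for
    # contiguous data, mirroring the NULL convention.
    return None if strides is None else tuple(strides)


print(c_contiguous_strides((2, 3), 4))
print(export_strides((2, 3), 4, (4, 8), True))
```

A consumer that cannot handle strides therefore never has to inspect a strides array: it either gets contiguous memory or an error.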
> Similarly, the suboffsets can be either all negative or
> NULL to indicate the same thing.
>
I think it's much easier to check if suboffsets is NULL rather than
checking all the entries to see if they are -1 for the very common case
(i.e. the NumPy case) of no dereferencing. Also, if you can't deal
with suboffsets you would just not provide an address for them.
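To make the comparison concrete, here is a minimal Python sketch of the check a consumer would need if both spellings were allowed. The helper name needs_dereferencing is hypothetical, and None again stands in for a NULL pointer.

```python
def needs_dereferencing(suboffsets):
    """Return True if any dimension requires a pointer dereference.

    None (modeling NULL) and a sequence of all-negative entries both
    mean "no dereferencing", which is the common NumPy-style case.
    """
    if suboffsets is None:
        # The cheap check: a single comparison.
        return False
    # The expensive check: every entry has to be examined.
    return any(offset >= 0 for offset in suboffsets)


print(needs_dereferencing(None))
print(needs_dereferencing((-1, -1)))
print(needs_dereferencing((-1, 0)))
```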
> Might it be more appropriate to specify only one canonical behavior
> in these cases? Otherwise clients which don't do all the checks on
> the data might not properly interoperate with providers which format
> these values in the alternate manner.
>
It's important to also be easy to use. I don't think clients should be
required to ask for strides and suboffsets if they can't handle them.
>
> Also, some typos, and places additional clarification could help:
>
>
>> 253 PYBUF_STRIDES (strides and isptr)
>>
> Should 'isptr' be 'suboffsets'?
>
Yes, but I think we are going to take out the multiple locks.
>
>> 75 of a larger array can be described without copying the data. T
>>
> Dangling 'T'.
>
Thanks,
>
>> 279 Get the buffer and optional information variables about the
>> buffer.
>> 280 Return an object-specific view object (which may be simply a
>> 281 borrowed reference to the object itself).
>>
> This phrasing (and similar phrasing elsewhere) is somewhat opaque to
> me. What's an "object-specific view object"?
>
At the moment it's the buffer provider. It is not defined because it
could be a different thing for each exporter. We are still discussing
this particular point and may drop it.
>
>> 333 The struct string-syntax is missing some characters to fully
>> 334 implement data-format descriptions already available elsewhere (in
>> 335 ctypes and NumPy for example). Here are the proposed additions:
>>
> Is the following table just the additions? If so, it might be good to
> show the full spec, and flag the specific additions. If not, then the
> additions should be flagged.
>
Yes, these are just the additions. I don't want to reproduce the full
spec, since it is already available elsewhere in the Python docs.
>
>> 341 't' bit (number before states how many bits)
>>
> vs.
>
>> 372 According to the struct-module, a number can precede a character
>> 373 code to specify how many of that type there are. The
>>
> I'm confused -- could this be phrased more clearly? Does '5t' refer
> to a field 5-bits wide, or 5-one bit fields? Is 'ttttt' allowed? If
> so, is it equivalent to or different from '5t'?
>
Yes, 'ttttt' is equivalent to '5t'. The distinction between one field
5 bits wide and five 1-bit fields comes from thinking there are fields
at all; both descriptions are equivalent. If you want "fields", then you
have to define names.
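The same equivalence already holds for repeat counts in the existing struct module. Since 't' is only a proposed code and is not available in struct, the sketch below uses the existing 'B' (unsigned byte) code to show that a count and a repeated character describe the same layout.

```python
import struct

values = (1, 2, 3, 4, 5)

# A repeat count and the same code written out five times are equivalent.
packed_counted = struct.pack("5B", *values)
packed_repeated = struct.pack("BBBBB", *values)

print(packed_counted == packed_repeated)
print(struct.calcsize("5B"), struct.calcsize("BBBBB"))
```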
>
>> 378 Functions should be added to ctypes to create a ctypes object from
>> 379 a struct description, and add long-double, and ucs-2 to ctypes.
>>
> Very cool.
>
> In general, the logic of the 'locking mechanism' should be described
> at a high level at some point. It's described in nitty-gritty
> details, but at least I would have appreciated a bit more of a
> discussion about the general how and why -- this would be helpful to
> clients trying to use the locking mechanism properly.
>
The point of locking is to let the exporter know when it can safely
reallocate its buffer. Right now, reference counting is the only way to
do that, but reference counting is not specific enough. A reference may
exist because another object is using the same memory, or it may simply
be another name pointing to exactly the same object.
In the case of NumPy, NumPy needs to know when the resize method can be
safely applied. Currently, it is ambiguous when a NumPy array can
reallocate its own buffer. Also, in the past, exposing the array object
in Python's memory and then later reallocating it led to problems.
I'll try and address this more clearly.
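For a feel of what such a lock buys the exporter, the sketch below uses bytearray and memoryview from later versions of Python 3. That is an assumption relative to this draft: it shows how a later CPython behaves, not the mechanism proposed here. While a view of the buffer is outstanding, the exporter refuses to reallocate; once the view is released, resizing works again.

```python
buf = bytearray(b"abcd")
view = memoryview(buf)  # an outstanding export of buf's memory

try:
    buf.extend(b"efgh")  # would need to reallocate the buffer
    resize_blocked = False
except BufferError:
    resize_blocked = True

view.release()       # the consumer is done with the memory
buf.extend(b"efgh")  # now the exporter is free to reallocate

print(resize_blocked, bytes(buf))
```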
Thanks for your feedback,
-Travis