[Python-Dev] PEP 3118: Extended buffer protocol (new version)

Carl Banks pythondev at aerojockey.com
Tue Apr 17 08:36:39 CEST 2007


Travis Oliphant wrote:
> Carl Banks wrote:
>> My recommendation is, any flag should turn on some circle in the Venn
>> diagram (it could be a circle I didn't draw--shaped arrays, for
>> example--but it should be *some* circle).
> I don't think your Venn diagram is broad enough and it unnecessarily 
> limits the use of flags to communicate between consumer and exporter.   
> We don't have to ram these flags down that point-of-view for them to be 
> productive.    If you have a specific alternative proposal, or specific 
> criticisms, then I'm very willing to hear them.


Ok, I've thought quite a bit about this, and I have an idea that I think 
will be ok with you, and I'll be able to drop my main objection.  It's 
not a big change, either.  The key is for each flag's name to say 
explicitly whether it allows something or requires it.  But I made a few 
other changes as well.

First of all, let me define how I'm using the word "contiguous": it's a 
single buffer with no gaps.  So, if you were to do this: 
"memset(bufinfo->buf,0,bufinfo->len)", you would not touch any data that 
isn't being exported.
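To make the "no gaps" definition concrete for the simplest case, here is a minimal sketch for a one-dimensional strided buffer.  It assumes a positive stride no smaller than the item size (no overlapping items), and the helper name is hypothetical, not part of the proposal.  It counts the bytes between the first and last items that belong to no item; those are exactly the bytes a memset over the whole span would wrongly touch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes inside the span of a 1-D strided buffer
 * that are not part of any exported item.  The buffer is contiguous in
 * the "no gaps" sense exactly when this returns 0.
 * Assumes n >= 1 and stride >= itemsize > 0 (no overlapping items). */
static ptrdiff_t gap_bytes_1d(ptrdiff_t n, ptrdiff_t stride, ptrdiff_t itemsize)
{
    assert(n >= 1 && itemsize > 0 && stride >= itemsize);
    ptrdiff_t span = (n - 1) * stride + itemsize;  /* first byte to last byte */
    ptrdiff_t exported = n * itemsize;             /* bytes that belong to items */
    return span - exported;
}
```

For example, taking every other element of an array of doubles (stride 16, itemsize 8) for four items leaves three 8-byte holes, so gap_bytes_1d(4, 16, 8) is 24.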

Without further ado, here is my proposal:


------

With no flags, PyObject_GetBuffer will raise an exception if the 
buffer is not direct, contiguous, and one-dimensional.  Here are the 
flags and how they affect that:

Py_BUF_REQUIRE_WRITABLE - Raise exception if the buffer isn't writable.

Py_BUF_REQUIRE_READONLY - Raise exception if the buffer is writable.

Py_BUF_ALLOW_NONCONTIGUOUS - Allow noncontiguous buffers.  (This turns 
on "shape" and "strides".)

Py_BUF_ALLOW_MULTIDIMENSIONAL - Allow multidimensional buffers.  (Also 
turns on "shape" and "strides".)

(Neither of the above two flags implies the other.)

Py_BUF_ALLOW_INDIRECT - Allow indirect buffers.  Implies 
Py_BUF_ALLOW_NONCONTIGUOUS and Py_BUF_ALLOW_MULTIDIMENSIONAL. (Turns on 
"shape", "strides", and "suboffsets".)

Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY or Py_BUF_REQUIRE_ROW_MAJOR - Raise an 
exception if the array isn't a contiguous array in C (row-major) 
format.

Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY or Py_BUF_REQUIRE_COLUMN_MAJOR - 
Raise an exception if the array isn't a contiguous array in Fortran 
(column-major) format.

Py_BUF_ALLOW_NONCONTIGUOUS, Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY, and 
Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY all conflict with each other, 
and an exception should be raised if more than one of them is set.

(I would go with ROW_MAJOR and COLUMN_MAJOR: even though the terms only 
make sense for 2D arrays, I believe the terms are commonly generalized 
to other dimensions.)

Possible pseudo-flags:

     #define Py_BUF_SIMPLE         0
     #define Py_BUF_ALLOW_STRIDED  (Py_BUF_ALLOW_NONCONTIGUOUS | \
                                    Py_BUF_ALLOW_MULTIDIMENSIONAL)
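The implication and conflict rules above are mechanical enough to check in a few lines.  This is only a sketch: the proposal names the flags but does not assign them values, so the bit values below are hypothetical, and normalize_flags is an illustrative name.  Note that because ALLOW_INDIRECT implies ALLOW_NONCONTIGUOUS, combining it with either REQUIRE_*_MAJOR flag is also rejected.

```c
#include <assert.h>

/* Hypothetical bit values; the proposal does not specify them. */
enum {
    Py_BUF_REQUIRE_WRITABLE       = 1 << 0,
    Py_BUF_REQUIRE_READONLY       = 1 << 1,
    Py_BUF_ALLOW_NONCONTIGUOUS    = 1 << 2,
    Py_BUF_ALLOW_MULTIDIMENSIONAL = 1 << 3,
    Py_BUF_ALLOW_INDIRECT         = 1 << 4,
    Py_BUF_REQUIRE_ROW_MAJOR      = 1 << 5,
    Py_BUF_REQUIRE_COLUMN_MAJOR   = 1 << 6
};

#define Py_BUF_SIMPLE         0
#define Py_BUF_ALLOW_STRIDED  (Py_BUF_ALLOW_NONCONTIGUOUS | \
                               Py_BUF_ALLOW_MULTIDIMENSIONAL)

/* Expand implied flags and reject conflicting ones.
 * Returns the normalized flags, or -1 if the combination is invalid. */
static int normalize_flags(int flags)
{
    assert(flags >= 0);  /* flags are a non-negative bit set */

    /* ALLOW_INDIRECT implies ALLOW_NONCONTIGUOUS and ALLOW_MULTIDIMENSIONAL. */
    if (flags & Py_BUF_ALLOW_INDIRECT)
        flags |= Py_BUF_ALLOW_STRIDED;

    /* At most one of these three may be set. */
    int layout = flags & (Py_BUF_ALLOW_NONCONTIGUOUS |
                          Py_BUF_REQUIRE_ROW_MAJOR |
                          Py_BUF_REQUIRE_COLUMN_MAJOR);
    if (layout & (layout - 1))  /* true when more than one bit is set */
        return -1;
    return flags;
}
```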

------

Now, for each flag, there should be an associated function to test the 
condition, given a bufferinfo struct.  (Though I suppose they don't 
necessarily have to map one-to-one, I'll do that here.)

int PyBufferInfo_IsReadonly(struct bufferinfo*);
int PyBufferInfo_IsWritable(struct bufferinfo*);
int PyBufferInfo_IsContiguous(struct bufferinfo*);
int PyBufferInfo_IsMultidimensional(struct bufferinfo*);
int PyBufferInfo_IsIndirect(struct bufferinfo*);
int PyBufferInfo_IsRowMajor(struct bufferinfo*);
int PyBufferInfo_IsColumnMajor(struct bufferinfo*);
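Here is one possible implementation of the two layout predicates, as a sketch.  The field layout of struct bufferinfo is an assumption (ptrdiff_t stands in for Py_ssize_t so it compiles without Python.h), and treating any non-NULL suboffsets as "indirect, hence not contiguous" is a simplification.  Note that both functions depend on itemsize: without it there is nothing to compare the innermost stride against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of struct bufferinfo for this sketch. */
struct bufferinfo {
    void *buf;
    ptrdiff_t len;
    int readonly;
    const char *format;
    ptrdiff_t itemsize;
    int ndim;
    const ptrdiff_t *shape;
    const ptrdiff_t *strides;
    const ptrdiff_t *suboffsets;  /* assumed NULL when the buffer is direct */
};

/* Row-major (C order): the last index varies fastest.  Walking from the
 * last dimension to the first, each stride must equal the byte size of
 * one step in that dimension.  Dimensions of length 1 can have any stride. */
int PyBufferInfo_IsRowMajor(struct bufferinfo *b)
{
    assert(b != NULL);
    if (b->suboffsets != NULL)
        return 0;
    ptrdiff_t expected = b->itemsize;
    for (int i = b->ndim - 1; i >= 0; i--) {
        if (b->shape[i] > 1 && b->strides[i] != expected)
            return 0;
        expected *= b->shape[i];
    }
    return 1;
}

/* Column-major (Fortran order): the first index varies fastest. */
int PyBufferInfo_IsColumnMajor(struct bufferinfo *b)
{
    assert(b != NULL);
    if (b->suboffsets != NULL)
        return 0;
    ptrdiff_t expected = b->itemsize;
    for (int i = 0; i < b->ndim; i++) {
        if (b->shape[i] > 1 && b->strides[i] != expected)
            return 0;
        expected *= b->shape[i];
    }
    return 1;
}
```

A one-dimensional gap-free buffer passes both tests, which matches the intuition that row and column order only differ once there are two or more dimensions.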

The function PyObject_GetBuffer then has a pretty obvious 
implementation.  Here is an excerpt:

     if ((flags & Py_BUF_REQUIRE_READONLY) &&
             !PyBufferInfo_IsReadonly(&bufinfo)) {
         PyErr_SetString(PyExc_BufferError, "buffer not read-only");
         return -1;  /* C-API convention: -1 signals an error */
     }

Pretty straightforward, no?
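Extending that excerpt to every flag gives a sketch like the one below.  To keep the flag logic on its own, it assumes (hypothetically) that the answers of the PyBufferInfo_Is* predicates have already been collected into a small struct; the flag values are also hypothetical, and conflict detection between flags is left out.  The asymmetry is the point: each ALLOW flag lifts one of the three default restrictions (direct, contiguous, one-dimensional), and each REQUIRE flag adds one.

```c
#include <assert.h>
#include <stddef.h>

enum {  /* hypothetical bit values */
    Py_BUF_REQUIRE_WRITABLE       = 1 << 0,
    Py_BUF_REQUIRE_READONLY       = 1 << 1,
    Py_BUF_ALLOW_NONCONTIGUOUS    = 1 << 2,
    Py_BUF_ALLOW_MULTIDIMENSIONAL = 1 << 3,
    Py_BUF_ALLOW_INDIRECT         = 1 << 4,
    Py_BUF_REQUIRE_ROW_MAJOR      = 1 << 5,
    Py_BUF_REQUIRE_COLUMN_MAJOR   = 1 << 6
};
#define Py_BUF_SIMPLE 0

/* Hypothetical: the PyBufferInfo_Is*() answers for one buffer. */
struct buffer_props {
    int readonly, contiguous, multidimensional, indirect, row_major, column_major;
};

static int fail(const char **err, const char *msg)
{
    *err = msg;
    return -1;
}

/* Returns 0 if the buffer satisfies the flags, else -1 with *err set. */
static int check_buffer(int flags, const struct buffer_props *p, const char **err)
{
    assert(p != NULL && err != NULL);
    if (flags & Py_BUF_ALLOW_INDIRECT)  /* implies the other two ALLOWs */
        flags |= Py_BUF_ALLOW_NONCONTIGUOUS | Py_BUF_ALLOW_MULTIDIMENSIONAL;

    /* Default restrictions, each lifted by one ALLOW flag. */
    if (p->indirect && !(flags & Py_BUF_ALLOW_INDIRECT))
        return fail(err, "buffer is indirect");
    if (!p->contiguous && !(flags & Py_BUF_ALLOW_NONCONTIGUOUS))
        return fail(err, "buffer not contiguous");
    if (p->multidimensional && !(flags & Py_BUF_ALLOW_MULTIDIMENSIONAL))
        return fail(err, "buffer is multidimensional");

    /* Extra restrictions, each added by one REQUIRE flag. */
    if ((flags & Py_BUF_REQUIRE_WRITABLE) && p->readonly)
        return fail(err, "buffer not writable");
    if ((flags & Py_BUF_REQUIRE_READONLY) && !p->readonly)
        return fail(err, "buffer not read-only");
    if ((flags & Py_BUF_REQUIRE_ROW_MAJOR) && !p->row_major)
        return fail(err, "buffer not row-major");
    if ((flags & Py_BUF_REQUIRE_COLUMN_MAJOR) && !p->column_major)
        return fail(err, "buffer not column-major");
    return 0;
}
```

Under this reading, a consumer that wants a contiguous 2D C array asks for Py_BUF_ALLOW_MULTIDIMENSIONAL | Py_BUF_REQUIRE_ROW_MAJOR, since REQUIRE_ROW_MAJOR by itself does not lift the one-dimensional default.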

Now, here is a key point: for these functions to work (indeed, for 
PyObject_GetBuffer to work at all), you need enough information in 
bufinfo to figure it out.  The bufferinfo struct should be 
self-contained; you should not need to know what flags were passed to 
PyObject_GetBuffer in order to know exactly what data you're looking at.

Therefore, format must always be supplied by getbuffer.  You cannot tell 
if an array is contiguous without the format string.  (But see below.)

And even if the consumer isn't asking for a contiguous buffer, it has to 
know the item size so it knows what data not to step on.

(This is true even in your own proposal, BTW.  If a consumer asks for a 
non-strided array in your proposal, PyObject_GetBuffer would have to 
know the item size to determine if the array is contiguous.)
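A small sketch of why the item size (and hence the format) is indispensable.  The helper only understands a format consisting of a single native struct-module code, which is a deliberate simplification; both function names are hypothetical.  The same stride gives opposite answers for different formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: item size for a format string that is a single
 * native struct-module code.  Anything else returns 0 ("unknown"). */
static ptrdiff_t itemsize_from_format(const char *format)
{
    assert(format != NULL);
    if (format[0] == '\0' || format[1] != '\0')
        return 0;
    switch (format[0]) {
    case 'b': case 'B': return sizeof(char);
    case 'h': case 'H': return sizeof(short);
    case 'i': case 'I': return sizeof(int);
    case 'l': case 'L': return sizeof(long);
    case 'f':           return sizeof(float);
    case 'd':           return sizeof(double);
    default:            return 0;
    }
}

/* A 1-D buffer with more than one item is gap-free only if its stride
 * equals the item size.  Returns -1 when the format is not understood. */
static int is_contiguous_1d(ptrdiff_t n, ptrdiff_t stride, const char *format)
{
    ptrdiff_t itemsize = itemsize_from_format(format);
    if (itemsize == 0)
        return -1;
    return n <= 1 || stride == itemsize;
}
```

A stride of sizeof(double) bytes is gap-free for format "d" but leaves holes for format "b"; the stride and shape alone cannot settle it.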


------

FAQ:

Q. Why ALLOW_NONCONTIGUOUS and ALLOW_MULTIDIMENSIONAL instead of 
ALLOW_STRIDED and ALLOW_SHAPED?

A. It's more useful to the consumer that way.  With ALLOW_STRIDED and 
ALLOW_SHAPED, there's no way for a consumer to request a general 
one-dimensional array (it can only request a non-strided one-dimensional 
array), and requesting a SHAPED array but not a STRIDED one can only 
return a C-like (row-major) array, although a consumer might reasonably 
want a Fortran-like (column-major) array.  This approach maps more 
directly to the consumer's needs, is more flexible, and still maintains 
the same functionality of ALLOW_SHAPED and ALLOW_STRIDED.
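As an illustration of the "general one-dimensional array" case, here is a minimal sketch of a consumer that passed only ALLOW_NONCONTIGUOUS: it can count on a single dimension but must honor the stride.  The function name and the assumption that the items are C doubles are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum n doubles spaced `stride` bytes apart, starting at buf.
 * memcpy avoids misaligned reads if the stride is not a multiple of
 * the alignment of double. */
static double sum_strided_1d(const char *buf, ptrdiff_t n, ptrdiff_t stride)
{
    assert(buf != NULL && n >= 0);
    double total = 0.0;
    for (ptrdiff_t i = 0; i < n; i++) {
        double x;
        memcpy(&x, buf + i * stride, sizeof x);
        total += x;
    }
    return total;
}
```

Summing every other element of {1, 2, 3, 4, 5, 6} (three items, stride 2 * sizeof(double)) gives 9.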


Q. Why call it ALLOW_INDIRECT instead of ALLOW_OFFSETS?

A. It's just a name, and not too important to me, but I wanted to 
emphasize the consumer's usage, rather than the benefit to the exporter. 
The consumers, after all, are the ones setting the flags.
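What a consumer commits to by setting ALLOW_INDIRECT is following pointers.  Here is a sketch of locating one item, assuming the convention that a non-negative suboffsets[i] means "after applying stride i, treat the result as a pointer, dereference it, and add suboffsets[i]", while a negative entry means no dereference in that dimension.  The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of the item at `indices` in a buffer
 * that may be indirect. */
static void *get_item_pointer(char *buf, int ndim, const ptrdiff_t *strides,
                              const ptrdiff_t *suboffsets, const ptrdiff_t *indices)
{
    assert(buf != NULL);
    char *ptr = buf;
    for (int i = 0; i < ndim; i++) {
        ptr += strides[i] * indices[i];
        if (suboffsets != NULL && suboffsets[i] >= 0)
            ptr = *(char **)ptr + suboffsets[i];  /* follow the pointer */
    }
    return ptr;
}
```

For a 2D array stored as an array of row pointers, the first dimension's stride is sizeof(char *) with suboffset 0, and the second dimension's stride is the item size with suboffset -1.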


Q. Why ALLOW_NONCONTIGUOUS instead of REQUIRE_CONTIGUOUS?

A. Two reasons: 1. Contiguous arrays are "simpler", so it's better to make 
the people who want more complex arrays work harder, and 2. 
ALLOW_NONCONTIGUOUS is closely tied to ALLOW_MULTIDIMENSIONAL.  If the 
negative phrasing is a problem, perhaps a name like ALLOW_DISCONTINUOUS or 
ALLOW_GAPS would be better?


Q. What about Py_BUF_FORMAT?

A. Ok, fine, if it's that important to you.  I think it's totally 
superfluous, but it's not evil.  But consider these things:

1. Require that it never causes an exception to be raised.  It's not the 
exporter's business to tell the consumer how to use its data.

2. Even if you don't supply the format string, you need to supply an 
itemsize in struct bufferinfo, otherwise there is no way for a consumer 
to determine whether the array is contiguous, or to know (in general) 
what data is being exported.  The itemsize must ALWAYS be available.

3. Invert Py_BUF_FORMAT.  Use Py_BUF_DONT_NEED_FORMAT instead.  Make the 
consumer that cares about performance ask for the optimization.  (You 
admit yourself that Py_BUF_FORMAT is part of the least common 
denominator, so invert it.)

I would be -0 on it if all three of these conditions are met.
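On the exporter side, the three conditions amount to very little code.  A minimal sketch follows; the flag value is hypothetical, and the struct is a trimmed-down stand-in holding only the two fields involved.  The itemsize is always filled in, the format is skipped only when the consumer opts out, and the flag never produces an error.

```c
#include <assert.h>
#include <stddef.h>

#define Py_BUF_DONT_NEED_FORMAT (1 << 7)  /* hypothetical value */

/* Hypothetical subset of struct bufferinfo: only the fields used here. */
struct bufferinfo {
    const char *format;
    ptrdiff_t itemsize;
};

/* Exporter for an array of C doubles. */
static int export_double_info(struct bufferinfo *info, int flags)
{
    assert(info != NULL);
    info->itemsize = sizeof(double);  /* condition 2: always supplied */
    /* condition 3: the consumer must ask to skip the format */
    info->format = (flags & Py_BUF_DONT_NEED_FORMAT) ? NULL : "d";
    return 0;  /* condition 1: this flag never causes an error */
}
```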


------

Conclusion:

My main objection, that the flags are confusing because some allow and 
others restrict, would be remedied just by using ALLOW and REQUIRE in 
the constant names.  Even if you still want to go with ALLOW_STRIDED and 
ALLOW_SHAPED, I'd still be -0 as long as the ALLOW is there.

I still think Py_BUF_FORMAT is superfluous, but I can live with it if 
some other things happen.



Carl Banks

