[Numpy-discussion] Latest Array-Interface PEP

Sat Jan 6 22:03:43 EST 2007

Tim Hochberg wrote:
> Christopher Barker wrote:
>
> [SNIP]
>   
>> I think the PEP has far more chances of success if it's seen as a 
>> request from a variety of package developers, not just the numpy crowd 
>> (which, after all, already has numpy).
>>     
> This seems eminently sensible. Getting a few developers from other 
> projects on board would help a lot; it might also reveal some 
> deficiencies in the proposal that we don't see yet.
>   
It would help quite a bit.  Does anyone have suggestions for whom to recruit
to review the proposal?  We should not forget that the NumPy world is
quite diverse as well.

> I've only given the PEP a quick read-through at this point, but here are a
> couple of comments:
>   
Thank you for taking the time to read through it.  I know it takes
precious effort to do all this, which is why it's been so slow in coming
from my end.  It is important to get a lot of discussion on something
like this.  Much of what is in the PEP stems from discussions that have
taken place over the past 10 years, but admittedly some of it does not
(the extended data-format descriptions, for example).

>    1. It seems very numpy-centric. That's not necessarily bad, but I
>       think it would help to have some outsiders look it over -- perhaps
>       they would see things that they need that it doesn't address.
>       Conversely, there may be universal agreement that some parts of it
>       aren't needed, and we can strip the proposal down somewhat.
>   
Yes, this is true.  I took the struct module, NumPy, and c-types as a
guide to what needs to be described about a block of memory.

>    2. It seems pretty complicated. In particular, the PyDataFormatObject
>       seems pretty complicated. This part in particular seems like it
>       might be a hard sell, so I expect this is going to need
>       considerably more motivation. For example:
>   
Yes, the PyDataFormatObject is complicated, but I don't think
unnecessarily so.  I've already stripped away a lot of what's in NumPy
to reduce it.  The real question is how you are going to describe
what an arbitrary chunk of memory represents.  One could restrict the
proposal to primitive types, replace the PyDataFormatObject with an
enumerated type, and simply give up on describing more complicated
structures.

But my question is: why?  Numarray, NumPy, and c-types have already
laid a tremendous amount of groundwork in how we can represent
complicated data structures.  Such structures clearly exist, so why
shouldn't we have some mechanism to describe them?

Once you decide to handle complicated types, you need to replace the
simple enumerated type with something that is "self-recursive" (i.e., so
that fields can have arbitrary data-types).  This lends itself to
some kind of structure design like the PyDataFormatObject.  The only
difference between what I've proposed and the c-types approach is that
c-types overloads Python type objects.  (In other words, the
PyDataFormatObject equivalent in c-types is at its core a PyTypeObject,
while here it is built on PyObject.)
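
For comparison, here is a small sketch of the c-types side using Python's
standard ctypes module.  A data format is declared as a class, so the
description is itself a type object, and a field may be another
structure, which is the self-recursive property described above.  The
names Point and Segment are made up for illustration.

```python
import ctypes

# In ctypes a data format is declared as a class, so the format description
# is itself a Python type object rather than a plain descriptor instance.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double)]

# Self-recursive description: a field's format may be another structure.
class Segment(ctypes.Structure):
    _fields_ = [("start", Point),
                ("end", Point),
                ("id", ctypes.c_int32)]

print(isinstance(Segment, type))              # True
print(ctypes.sizeof(Point))                   # 16
print(Segment.end.offset, Segment.id.offset)  # 16 32
```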

>          1. Why do we need Py_ARRAYOF? Can't we get the same effect just
>             using longer shape and strides arrays?
>   
Yes, this is true for a single data-format in isolation (and it is in fact
exactly what you get in NumPy when you instantiate an array whose data-type
is an array of another primitive data-type).  However, how do you describe
a structure whose second field is an array of a primitive type?  That is
where the ARRAYOF qualifier is needed.  NumPy actually does not do it this
way; it uses a separate subarray field in the data-type object.  After
studying c-types, however, I think the ARRAYOF approach is better.
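
A sketch of the two cases, again using the standard ctypes module as a
stand-in (Row, Block, and Sample are hypothetical names).  An array whose
element is "array of 4 int16" occupies the same bytes as a block with one
extra dimension, whereas an array embedded as the second field of a record
has to be described on the field itself.

```python
import ctypes

# Standalone case: an array whose element type is "array of 4 int16" occupies
# exactly the same memory as a 2-D block of int16 with one extra dimension,
# so longer shape/strides arrays are enough to describe it.
Row = ctypes.c_int16 * 4
Block = Row * 3
print(ctypes.sizeof(Block), ctypes.sizeof(ctypes.c_int16 * 12))  # 24 24

# Embedded case: a record whose second field is an array of a primitive type.
# The outer array's shape/strides cannot describe this, because the array
# lives inside one field, so the field's own format must say "array of".
class Sample(ctypes.Structure):
    _fields_ = [("timestamp", ctypes.c_double),
                ("channels", ctypes.c_int16 * 4),
                ("flags", ctypes.c_uint8)]

field_type = dict(Sample._fields_)["channels"]
print(field_type._type_ is ctypes.c_int16, field_type._length_)  # True 4
print(Sample.channels.offset, Sample.flags.offset)                # 8 16
```

In ctypes the element type and length of the embedded array show up as
the field type's _type_ and _length_ attributes, which is roughly the
information an ARRAYOF qualifier would have to carry.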

>          2. Is there any type besides Py_STRUCTURE that can have names
>             and fields? If so, which, and what do they mean? If not, you
>             should just say that.
>   
Yes, you can add fields to a multi-byte primitive if you want.  This
is similar to thinking of the data-format as a C-like union.  For
example, a data field might be meaningful as a 4-byte integer while its
most-significant and least-significant bytes should also be individually
addressable.
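
A minimal sketch of that union-style view using ctypes.Union (Word and
octets are hypothetical names).  Which byte offset holds the
least-significant byte depends on the machine's byte order, so the sketch
checks sys.byteorder rather than hard-coding it.

```python
import ctypes
import sys

# Two views of the same 4 bytes: one as an integer, one as individual bytes.
class Word(ctypes.Union):
    _fields_ = [("value", ctypes.c_uint32),
                ("octets", ctypes.c_uint8 * 4)]

w = Word()
w.value = 0x11223344

# Both fields start at offset 0, as in a C union.
print(ctypes.sizeof(Word), Word.value.offset, Word.octets.offset)  # 4 0 0

# The offset of the least-significant byte depends on native byte order.
lsb = 0 if sys.byteorder == "little" else 3
msb = 3 - lsb
print(hex(w.octets[lsb]), hex(w.octets[msb]))  # 0x44 0x11
```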

>          3. And on this topic, why a tuple of ([names,..], {field})? Why
>             not simply a list of (name, dfobject, offset, meta) for
>             example? And what's the meta information if it's not PyNone?
>             Just a string? Anything at all?
>   

The list of names provides an ordering, so you can traverse the
structure in field order.  It is not technically necessary, but it makes
it much easier to parse a data-format object in offset order (NumPy
uses it in a few places, for example).

The meta information is a placeholder for field tags and future growth
(somewhat like column headers in a spreadsheet).  It started as a place
to put a "longer" name for a field, or to pass along information about
it (such as its units).
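
To make the ([names, ...], {field}) layout concrete, here is a
hypothetical pure-Python sketch.  It assumes each field entry is a
(format, offset, meta) tuple, uses struct-module format codes as stand-ins
for data-format objects, and treats meta as either None or a dict carrying
a title and units; none of these names come from the PEP.  The names list
fixes the traversal order, which matters because a Python dict of this era
makes no ordering promise.

```python
import struct

# Hypothetical data-format description: ([names...], {name: (fmt, offset, meta)}).
# struct codes stand in for data-format objects; meta is None or a dict.
names = ["id", "pressure", "temperature"]
fields = {
    "temperature": ("<d", 8, {"units": "K"}),
    "id":          ("<i", 0, None),
    "pressure":    ("<f", 4, {"units": "kPa", "title": "Ambient pressure"}),
}
dataformat = (names, fields)

def read_record(buf, dataformat):
    """Decode one record, visiting fields in the order given by the names list."""
    names, fields = dataformat
    record = {}
    for name in names:
        fmt, offset, meta = fields[name]
        (value,) = struct.unpack_from(fmt, buf, offset)
        # Pass per-field metadata (here just units) along with the value.
        record[name] = (value, meta.get("units") if meta else None)
    return record

buf = struct.pack("<ifd", 7, 101.5, 293.15)  # 4 + 4 + 8 bytes, no padding
rec = read_record(buf, dataformat)
print(list(rec))        # ['id', 'pressure', 'temperature']
print(rec["pressure"])  # (101.5, 'kPa')
```

In this sketch a single list of (name, format, offset, meta) tuples would
carry the same information; the separate dict mainly makes lookup of a
field by name direct.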

-Travis