[Numpy-discussion] Latest Array-Interface PEP
Travis Oliphant
oliphant at ee.byu.edu
Sat Jan 6 22:03:43 EST 2007
Tim Hochberg wrote:
> Christopher Barker wrote:
>
> [SNIP]
>
>> I think the PEP has far more chances of success if it's seen as a
>> request from a variety of package developers, not just the numpy crowd
>> (which, after all, already has numpy).
>>
> This seems eminently sensible. Getting a few developers from other
> projects on board would help a lot; it might also reveal some
> deficiencies in the proposal that we don't see yet.
>
It would help quite a bit. Are there any suggestions of who to recruit
to review the proposal? We should not forget that the NumPy world is
quite diverse as well.
> I've only given the PEP a quick read through at this point, but here
> are a couple of comments:
>
Thank you for taking the time to read through it. I know it takes
precious effort to do all this, which is why it's been so slow in coming
from my end. It is important to get a lot of discussion on something
like this. Much of what is in the PEP stems from discussion that has
happened over the past 10 years, but admittedly some of it does not
(the extended data-format descriptions, for example).
> 1. It seems very numpy-centric. That's not necessarily bad, but I
> think it would help to have some outsiders look it over -- perhaps
> they would see things that they need that it doesn't address.
> Conversely, there may be universal opinion that some parts of it
> aren't needed, and we can strip the proposal down somewhat.
>
Yes, this is true. I took the struct module, NumPy, and ctypes as a
guide for "what is needed" to describe data in terms of memory.
> 2. It seems pretty complicated. In particular, the PyDataFormatObject
> seems pretty complicated. This part in particular seems like it
> might be a hard sell, so I expect this is going to need
> considerably more motivation. For example:
>
Yes, the PyDataFormatObject is complicated, but I don't think
unnecessarily so. I've already stripped a lot away from what's in NumPy
to reduce it. The question really is how you are going to describe what
an arbitrary chunk of memory represents. One could restrict it to
primitive types, replace the PyDataFormatObject with an enumerated
type, and just give up on describing more complicated structures.
But my question is: why? Numarray, NumPy, and ctypes have already
laid a tremendous amount of groundwork in how we can represent
complicated data-structures. They clearly exist, so why shouldn't we
have some mechanism to describe them?
Once you decide to handle complicated types, you need to replace the
simple enumerated type with something that is "self-recursive" (i.e. so
you can have fields of arbitrary data-types). This lends itself to some
kind of structure design like the PyDataFormatObject. The only
difference between what I've proposed and the ctypes approach is that
ctypes overloads Python type objects. (In other words, the
PyDataFormatObject equivalent in ctypes is at its core a PyTypeObject,
while here it is built on PyObject.)
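To illustrate what "self-recursive" means here, a minimal sketch in
plain Python follows. The Format class and its attribute names are
hypothetical stand-ins, not the PEP's PyDataFormatObject, and the size
calculation assumes packed fields with no alignment padding.

```python
from dataclasses import dataclass, field


@dataclass
class Format:
    """Hypothetical data-format descriptor (not the PEP's actual API)."""
    kind: str                                   # "primitive" or "structure"
    itemsize: int = 0                           # size in bytes, primitives only
    fields: list = field(default_factory=list)  # (name, Format) pairs

    def size(self):
        # A structure's size is the sum of its fields' sizes; each field
        # is itself a Format, which is what makes the description recursive.
        # Assumption: packed layout, no alignment padding.
        if self.kind == "primitive":
            return self.itemsize
        return sum(sub.size() for _, sub in self.fields)


int32 = Format("primitive", 4)
float64 = Format("primitive", 8)
point = Format("structure", fields=[("x", float64), ("y", float64)])
record = Format("structure", fields=[("id", int32), ("where", point)])

print(record.size())
```

A flat enumerated type cannot express "record", because one of its
fields is itself a structure.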
> 1. Why do we need Py_ARRAYOF? Can't we get the same effect just
> using longer shape and strides arrays?
>
Yes, this is true for a single data-format in isolation (and it is in
fact exactly what you get when you instantiate a NumPy data-type that is
an array of another primitive data-type). However, how do you describe
a structure whose second field is an array of a primitive type? This is
where the ARRAYOF qualifier is needed. NumPy does not actually do it
this way; it uses a separate subarray field in the data-type object.
After studying ctypes, however, I think this approach is better.
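As a rough illustration of the case in question, here is how the
standard ctypes module expresses a structure whose second field is an
array of a primitive type. The Record layout is a made-up example, and
this shows the ctypes analogue rather than the PEP's Py_ARRAYOF itself.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical layout: an int32 id followed by three doubles.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("pos", ctypes.c_double * 3),  # "array of double" is the field's type
    ]


rec = Record(7, (ctypes.c_double * 3)(1.0, 2.0, 3.0))

# The array lives inside the field's type, so it cannot be folded into
# the shape and strides of the enclosing array of records.
print(rec.id, list(rec.pos), Record.pos.size)
```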
> 2. Is there any type besides Py_STRUCTURE that can have names
> and fields? If so, what, and what do they mean? If not, you
> should just say that.
>
Yes, you can add fields to a multi-byte primitive if you want. This
would be similar to thinking about the data-format as a C-like union.
Perhaps the data-field has meaning as a 4-byte integer but the
most-significant and least-significant bytes should also be addressable
individually.
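The union analogy can be sketched with ctypes. The Word class and its
field names are hypothetical, and which index holds the most-significant
byte depends on the platform's byte order, so the sketch checks
sys.byteorder rather than assuming one.

```python
import ctypes
import sys


class Word(ctypes.Union):
    # Both members start at offset 0 and overlay the same 4 bytes.
    _fields_ = [
        ("value", ctypes.c_uint32),
        ("parts", ctypes.c_uint8 * 4),
    ]


w = Word()
w.value = 0x11223344
raw = list(w.parts)

# Pick out the least- and most-significant bytes for this platform.
if sys.byteorder == "little":
    lsb, msb = raw[0], raw[3]
else:
    lsb, msb = raw[3], raw[0]

print(hex(lsb), hex(msb))
```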
> 3. And on this topic, why a tuple of ([names,..], {field})? Why
> not simply a list of (name, dfobject, offset, meta) for
> example? And what's the meta information if it's not PyNone?
> Just a string? Anything at all?
>
The list of names is useful for having an ordered list so you can
traverse the structure in field order. It is technically not necessary
but it makes it a lot easier to parse a data-format object in offset
order (it is used a bit in NumPy, for example).
The meta information is a placeholder for field tags and future growth
(kind of like column headers in a spreadsheet). It started as a place
to put a "longer" name or to pass along information about a field (such
as units).
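A small sketch of the idea in plain Python: an ordered list of names
paired with a dictionary of fields, similar in spirit to NumPy's
dtype.names and dtype.fields. The (format, offset, meta) tuple layout
and the sample values are assumptions for illustration, not the PEP's
exact definition.

```python
# Hypothetical field description: name -> (format, offset, meta).
names = ["id", "pos"]
fields = {
    "pos": ("float64[3]", 4, {"units": "m"}),  # meta carries extra info
    "id": ("int32", 0, None),                  # no meta for this field
}


def walk(names, fields):
    """Return (name, offset) pairs in field order.

    The names list fixes the order, so the caller does not have to sort
    the dictionary entries by offset first.
    """
    return [(name, fields[name][1]) for name in names]


print(walk(names, fields))
```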
-Travis