[Python-Dev] Expose the array interface in Python 2.5?

Travis E. Oliphant oliphant.travis at ieee.org
Fri Mar 17 11:40:38 CET 2006


Nick Coghlan wrote:
> Travis E. Oliphant wrote:
>> Would it be possible to add at least the C-struct array interface to the 
>> Python arrayobject in time for Python 2.5?
> 
> Do you mean simply adding an __array_shape__ attribute that consists of a 
> tuple with the array length, and an __array_type__ attribute set to 'O'?
> 
> Or trying to expose the array object's data?

I was thinking more the __array_struct__ (in particular the C-structure 
that defines it).

> 
> The former seems fairly pointless, and the latter difficult (since it has 
> implications for moving the data store when the array gets resized).

Sure, it's the same problem as exposing through the buffer protocol. 
Since we already have that problem, why try to pretend we don't?
> 
> I've spent a fair bit of time looking at this interface, and while I'm a big
> fan of the basic idea, I'm not convinced that it makes sense to
> include the interface in the core without *also* adopting a common convention
> for multi-dimensional fixed shape indexing (e.g. by introducing a simple
> dimensioned array type as something like array.dimarray).

True, such a thing would be great, but it could also be written in 
Python fairly quickly building on top of the array and serve as a simple 
example.

My big quest is to get PIL, PyVox, WxPython, PyOpenGL, and so forth to 
be able to use the same interface.  Blessing the interface by including 
it in the Python core would help.  I'm also just wanting people in 
py-dev to get the concept of an array interface on their radar, as 
discussions of new bytes types emerge.

Sometimes, there is not enough cross-talk between numpy-discussions and 
pydev.  This is our fault, of course, but we're often swamped (I know I 
am...), and it can take some effort for us "array" people to figure out 
what's going on in the depths of Python sufficiently to comprehend some 
of the discussions here.

> 
> The fact that array.array is a mutable sequence rather than a fixed shape
> array means that it doesn't mesh particularly well with the ideas behind the 
> array interface. numpy arrays can have their shape changed via reshape, but 
> they impose the rule that the total number of elements can't change so that 
> the allocated memory doesn't need to be moved - the standard library's array 
> type has no such limitation.

This is not really a limitation of numpy arrays either.  Check the 
resize method...  But, I understand your point that array.arrays are 
more like lists.  Of course, when they behave that way, their buffer 
interface is presently broken.   So, maybe the array.array is 
sufficiently broken to not be worth "fixing", but what else should be done?

I'm kind of tired of this problem dragging on and on.  The Numeric 
header (essentially what the __array_struct__ exposes) is now basically 
unchanged for over 10 years and yet its direct support by Python is 
still not there.   The Python community has been very helpful over the 
years, but we need more direct discussion with Python developers to help 
things along.  I'm grateful Nick has responded.  If anyone else has any 
interest in these ideas, please sound off.

> 
> Aside from the obvious (the use of Ellipsis and permitting multiple
> dimensions), there are a number of ways in which the semantics of numpy array
> subscripts differ from normal sequence subscripts, and which of these should be
> part of the common multi-dimensional indexing conventions needs to be thrashed
> out in a PEP:

While these are interesting academic issues, the problem with most of 
these comments is that you will get loud voices of disapproval if any of 
these conventions changes significantly from what has become standard 
via Numeric's use over 10 years.

I think no one is up to the task of trying to reconcile Numeric 
behavior with Python-dev opinions of what 'ought' to be, unless the 
basic usage does not change too much.

> 
>    - numpy array slices are views that permit mutation of the original object
>      (slicing a sequence creates a copy of the sliced section)

Not really open for discussion among Numeric Python users as it's been 
debated for years, always coming to the same (keep the current behavior) 
conclusion.
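
For readers less familiar with the view-versus-copy distinction, here is a minimal sketch using only the standard library. It uses a memoryview over a bytearray as a stand-in for a slice that is a view. That stand-in is an assumption made purely for illustration: it is not NumPy, and memoryview postdates this thread.

```python
# Slicing a built-in sequence copies the selected items.
seq = [0, 1, 2, 3]
seq_slice = seq[1:3]
seq_slice[0] = 99            # changes only the copy
seq_unchanged = (seq == [0, 1, 2, 3])

# Slicing a memoryview gives a view onto the same memory
# (used here as a rough stand-in for an array slice that is a view).
buf = bytearray([0, 1, 2, 3])
view = memoryview(buf)[1:3]
view[0] = 99                 # writes through to the underlying bytearray
buf_changed = (buf[1] == 99)

print(seq, list(buf))
```

The first assignment leaves the original list alone, while the second is visible through the original buffer, which is the behavior array users have come to rely on.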
> 
>    - assignment to slices is not allowed to change the shape of a numpy array
>      (assigning to a slice of a normal sequence may change the total length)

People might be open to this idea, as it adds a new feature and doesn't 
significantly change other usages.

> 
>    - deletion of slices is not permitted by numpy arrays
>      (deleting a slice of a sequence changes the total length)
>

Also something people might accept.
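
The two sequence behaviors Nick lists (slice assignment and slice deletion changing the length) can be sketched with the standard library's array type. The fixed-size half of the sketch uses a memoryview as a stand-in for a fixed-shape array; that is an assumption for illustration only and says nothing about how NumPy reports such errors.

```python
import array

# array.array behaves like a mutable sequence: slices can grow or shrink it.
a = array.array("i", [0, 1, 2, 3, 4])
a[1:3] = array.array("i", [10, 20, 30])   # 2 items replaced by 3
len_after_assign = len(a)
del a[0:2]                                # removes 2 items
len_after_delete = len(a)

# A fixed-size block of memory (memoryview as a stand-in) refuses both.
fixed = memoryview(bytearray(5))
rejected = []
try:
    fixed[1:3] = b"abc"                   # length mismatch
except ValueError:
    rejected.append("assign")
try:
    del fixed[0:2]                        # deletion would change the size
except TypeError:
    rejected.append("delete")

print(a.tolist(), rejected)
```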

>    - NewAxis is a novel use of subscript notation

True, but not something we can really change.

> 
>    - there are sophisticated rules to try to align numpy array shapes

You are speaking of broadcasting.  These could of course be discussed, 
but current behavior is "entrenched".

> 
>    - assignment of a sequence to a numpy array section is rather disconcerting,
>      as the checks to determine what should and should not be repeated to fit
>      into the available space are type based

I'm not sure what this means... Please elaborate.


> 
> For something in the standard library, much of the complexity should be
> stripped out, with the clever bits of programmer convenience left for numpy to
> provide. However, deciding which bits to remove and which to keep is a
> non-trivial task.
> 

I agree.  I suppose your itemization above was really to come to this 
conclusion as well.   But, I think a stripped-down array that doesn't 
try to guess what to do with these interfaces is a good start.  In other 
words, I disagree that you need to implement multidimensional indexing 
in order for Python to support the array interface.  All you need is a 
simple object that supports the buffer protocol and has the 
__array_struct__ attribute and has a C-structure very similar to the 
current NumPy array (which is very similar to the old Numeric 
C-structure).
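
To give a feel for how small that C-structure is, here is a rough sketch that mirrors its layout with ctypes and fills it in for a 2x3 block of doubles held by an array.array. The field list follows the array interface description as I understand it. The use of c_ssize_t for the shape and stride entries, the c_void_p for the descr field, and the zero flags value are simplifying assumptions of this sketch, not part of any specification.

```python
import array
import ctypes


class PyArrayInterface(ctypes.Structure):
    """ctypes mirror of the structure behind __array_struct__ (a sketch)."""
    _fields_ = [
        ("two", ctypes.c_int),            # always 2, a sanity check
        ("nd", ctypes.c_int),             # number of dimensions
        ("typekind", ctypes.c_char),      # kind of data, e.g. b"f" for float
        ("itemsize", ctypes.c_int),       # bytes per element
        ("flags", ctypes.c_int),          # how the data should be interpreted
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),    # nd entries
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),  # nd entries, in bytes
        ("data", ctypes.c_void_p),        # pointer to the first element
        ("descr", ctypes.c_void_p),       # NULL here (assumption: no descr)
    ]


# Describe a 2x3 C-ordered array of doubles stored in an array.array.
buf = array.array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
address, _length = buf.buffer_info()
shape = (ctypes.c_ssize_t * 2)(2, 3)
strides = (ctypes.c_ssize_t * 2)(3 * buf.itemsize, buf.itemsize)

iface = PyArrayInterface(
    two=2, nd=2, typekind=b"f", itemsize=buf.itemsize,
    flags=0,                              # flag bits omitted in this sketch
    shape=shape, strides=strides, data=address, descr=None,
)

# A consumer locates element [1][2] from the strides alone.
offset = 1 * iface.strides[0] + 2 * iface.strides[1]
element = ctypes.c_double.from_address(iface.data + offset).value
print(element)
```

Note that nothing here requires multidimensional indexing on the Python side; the consumer only needs the pointer, the shape, and the strides.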

If such a thing were in Python, then NumPy could inherit from it (as 
could other array-like objects), with the big advantage that there is at 
least one common memory model for arrays.  Others could still exist, of 
course, but at least there would be a very useful common one.


> Given that even the bytes type has been deferred to 2.6 to allow further 
> consideration of the appropriate API, my vote is to do the same for an 
> array.dimarray type and allow more time to figure out the appropriate *Python* 
> interface.

I was afraid of that.  But, unless people in pydev actually care to 
discuss these matters, I fear that yet again nothing will be done.  The 
problem is that for most of us array users, it's only community outreach 
and a desire to get people using Python talking the same array language 
that makes us really care about these things.  The NumPy library works 
fine for what we really need it to do, and it's hard to get motivated to 
convince people that haven't used an array-language like IDL or 
MATLAB in the past to understand the reasons for NumPy's behavior.

The big difference with the bytes type is that Numeric has 10 years of 
history behind it.  There is a lot of experience with an appropriate 
array type.  It's not like we just came up with this a few days ago :-)

As the bytes type is developed please keep in mind its use as the 
memory for an N-dimensional array.  Perhaps the bytes object could be a 
default way (or built on a default way) to allocate memory.  A simple 
reference-counted memory object would certainly alleviate the problems 
that the array object currently has with the buffer interface.

In other words, the array object should not malloc its own memory but 
create a memory object which is nothing more than a reference-counted 
pointer to memory.  Surely this has been talked about. Is there a reason 
it has not been implemented?  It would not be that hard.

Even something like that would be a first step.
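
A minimal sketch of the idea in pure Python, with a bytearray standing in for the malloc'ed memory. The names MemoryBlock and SimpleArray are hypothetical and exist only for this illustration; a real implementation would live in C.

```python
class MemoryBlock:
    """Hypothetical memory object: owns one allocation, freed with its last reference."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class SimpleArray:
    """Hypothetical array that never owns memory directly, only a MemoryBlock."""

    def __init__(self, nbytes):
        self.mem = MemoryBlock(nbytes)

    def export(self):
        # The consumer receives a counted reference to the block itself,
        # so the block cannot disappear while the consumer still holds it.
        return self.mem

    def resize(self, nbytes):
        # Allocate a new block and copy; the old block stays alive for
        # anyone who exported it earlier.
        new = MemoryBlock(nbytes)
        n = min(nbytes, len(self.mem.data))
        new.data[:n] = self.mem.data[:n]
        self.mem = new


arr = SimpleArray(4)
arr.mem.data[:] = b"abcd"
held = arr.export()        # e.g. another extension keeps this around
arr.resize(8)              # the array moves on to a new block
print(bytes(held.data), len(arr.mem.data))
```

After the resize, the exported block is still valid memory holding the old contents, so the consumer never ends up with a dangling pointer. The trade-off is that the consumer no longer sees later changes made by the array.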

Thanks for the comments.  I'm glad there is another voice here that 
cares about the issues involved.


-Travis



> 
> Regards,
> Nick.
> 


