[Numpy-discussion] Questions about the array interface.

Chris Barker Chris.Barker at noaa.gov
Wed Apr 6 23:36:36 EDT 2005


Travis Oliphant wrote:

> You should account for the '<' or '>' that might be present in 
> __array_typestr__   (Numeric won't put it there, but scipy.base and 
> numarray will---since they can have byteswapped arrays internally). 

Good point, but a pain. Maybe they should be required, that way I don't 
have to first check for the presence of '<' or '>', then check if they 
have the right value.
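
For what it's worth, here is a rough sketch of the two-step check I mean. The helper names split_typestr and is_native are made up for illustration, and treating a missing marker as native byte order is my assumption, not something the interface spells out:

```python
import sys

# Byte-order marker for the machine this code runs on.
NATIVE = '<' if sys.byteorder == 'little' else '>'

def split_typestr(typestr):
    """Return (byteorder, code); a missing marker is treated as native."""
    if typestr[:1] in ('<', '>'):
        return typestr[0], typestr[1:]
    # Assumption: no marker means the producer uses native byte order.
    return NATIVE, typestr

def is_native(typestr):
    """True if the data can be read without byteswapping."""
    return split_typestr(typestr)[0] == NATIVE
```

If the marker were required, the first branch would be the only one needed.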

> A more generic interface would handle multiple integer types if possible 

I'd like to support doubles as well...

> (but this is a good start...)

Right. I want to get _something_ working, before I try to make it universal!

> I think one idea here is that if __array_strides__ returns None, then 
> C-style contiguousness is assumed.   In fact, I like that idea so much 
> that I just changed the interface.  Thanks for the suggestion.

You're welcome. I like that too.
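
A consumer that wants real stride values anyway could fill them in when it gets None. This is only a sketch: the function names are hypothetical, and the shape and item size are passed in by the caller rather than read from the object:

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous array: the last axis varies fastest."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def effective_strides(obj, shape, itemsize):
    """Use __array_strides__ if given, else assume C-style contiguity."""
    strides = getattr(obj, '__array_strides__', None)
    if strides is None:
        return c_contiguous_strides(shape, itemsize)
    return tuple(strides)
```

For example, a 2x3x4 array of 8-byte items comes out with strides (96, 32, 8).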

> No, they won't always be there for SciPy arrays (currently 4 of them 
> are).  Only record-arrays will provide __array_descr__ for example and 
> __array_offset__ is unnecessary for SciPy arrays.  I actually don't much 
> like the __array_offset__  parameter myself, but Scott convinced me that 
> it could be useful for very complicated array classes.

I can see that it would, but then, we're stuck with checking for all 
these optional attributes. If I don't bother to check for it, one day, 
someone is going to pass a weird array in with an offset, and a strange 
bug will show up.

> e.g.  ndarray.cint  (gives 'iX' on the correct platform).
> For now, I would check (__array_typestr__ == 'i%d' % 
> array.array('i',[0]).itemsize)

I can see that that would work, but it does feel like a hack. Besides, I 
might be doing this in C++ anyway, so it would probably be easier to use 
sizeof().


> But, on most platforms these days an int is 4 bytes, but the above would 
> be just to make sure.

Right. Making that assumption will just lead to weird bugs way down the 
line. Of course, I wouldn't be surprised if wxWidgets and/or Python 
makes that assumption in other places anyway!
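
In Python, the check Travis suggests could be wrapped up roughly like this. The function names are made up for illustration, and note that this sketch strips a leading '<' or '>' without checking that it matches the platform:

```python
import array
import struct

def c_int_typestr():
    """Typestr code for the platform's C int, e.g. 'i4' where int is 4 bytes."""
    return 'i%d' % array.array('i', [0]).itemsize

def is_c_int(typestr):
    """True if typestr names a C int; any byte-order marker is ignored here."""
    return typestr.lstrip('<>') == c_int_typestr()

# struct reports the same size, which is the closest Python gets to sizeof(int).
same_size = struct.calcsize('i') == array.array('i', [0]).itemsize
```

Either way the size comes from the platform instead of being assumed to be 4.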

>> 5) Why is: __array_data__ optional? Isn't that the whole point of this?
> 
> Because the object itself might expose the buffer interface.  We could 
> make __array_data__ required and prefer that it return a buffer object.  

Couldn't it be required, and return a reference to itself if that works?

Maybe I'm just being lazy, but it feels clunky and prone to errors to 
keep having to check if an attribute exists, then use it (or not).

> So, the correct consumer usage for grabbing the data is
> 
> data = getattr(obj, '__array_data__', obj)

Ah! I hadn't noticed the default parameter to getattr(). That makes it 
much easier. Is there an equivalent in C? It doesn't look like it to me, 
but I'm kind of a newbie with the C API.

> int *PyObject_AsReadBuffer*(PyObject *obj, const void **buffer, int 
> *buffer_len)

I'm starting to get this.
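
At the Python level, the same two steps might look something like the sketch below. It assumes a Python new enough to have memoryview, which is not what the C call above uses, and get_buffer is a hypothetical name:

```python
def get_buffer(obj):
    """Return a read view of the array's bytes.

    Falls back to the object itself when __array_data__ is absent,
    on the theory that the object exposes the buffer interface directly.
    """
    data = getattr(obj, '__array_data__', obj)
    # Raises TypeError if neither the attribute nor the object is a buffer.
    return memoryview(data)
```

The getattr() default is what removes the separate "does it exist?" check.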

> Of course this approach has the 32-bit limit until we get this changed 
> in Python.

That's the least of my worries!

>> 6) Should __array_offset__ be optional? I'd rather it were required, 
>> but  default to zero. This way I have to check for it, then use it. 
>> Also, I assume it is an integer number of bytes, is that right?
> 
> A consumer has to check for most of the optional stuff if they want to 
> support all types of arrays.

That's not quite true. I'm happy to support only the simple types of 
arrays (contiguous, single type elements, zero offset), but I have to 
check all that stuff to make sure that I have a simple array. The 
simplest arrays are the most common case, they should be as easy as 
possible to support.

> Again a simple:
> 
> getattr(obj, '__array_offset__', 0)
> 
> works fine.

Not too bad.

Also, what if we find the need for another optional attribute later? Any 
older code won't check for it. Or maybe I'm being paranoid....

>> 7) An alternative to the above: A __simple_ flag, that means the data 
>> is a simple, C array of contiguous data of a single type. The most 
>> common use, and it would be nice to just check that flag and not have 
>> to take all other options into account.

> I think if __array_strides__ returns None (and if an object doesn't
> expose it you can assume it) it is probably good enough.

That and __array_typestr__
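
Put together, the "is this a simple array?" test might be sketched as below. is_simple_array and FakeArray are hypothetical names, the zero-offset test is my addition from point 6, and treating a missing byte-order marker as native is an assumption:

```python
import sys

NATIVE = '<' if sys.byteorder == 'little' else '>'

def is_simple_array(obj, wanted):
    """True if obj looks like a plain C array of the wanted type, e.g. 'i4'."""
    # Missing or None strides means C-style contiguous.
    if getattr(obj, '__array_strides__', None) is not None:
        return False
    if getattr(obj, '__array_offset__', 0) != 0:
        return False
    typestr = getattr(obj, '__array_typestr__', None)
    if typestr is None:
        return False
    if typestr[:1] in ('<', '>'):
        if typestr[0] != NATIVE:
            return False  # byteswapped data is not "simple"
        typestr = typestr[1:]
    return typestr == wanted

class FakeArray:
    """Stand-in producer, only here to try the check out."""
    def __init__(self, typestr, strides=None, offset=0):
        self.__array_typestr__ = typestr
        self.__array_strides__ = strides
        self.__array_offset__ = offset
```

This is deliberately conservative: an array that gives explicit strides is rejected even if those strides happen to describe contiguous data.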

Travis Oliphant wrote:
> 
> At http://numeric.scipy.org/array_interface.py
> 
> you will find the start of a set of helper functions for the array 
> interface that can make it easier to deal with. 

Ah! this may well address my concerns. Good idea.

Thanks for all your work on this Travis.

By the way, a quote from Robin Dunn about this:

"Sweet!"

Thought you might appreciate that.

-Chris





-- 
Christopher Barker, Ph.D.
Oceanographer
                                      		
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov



