[Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)

Sébastien de Menten sdementen at hotmail.com
Wed Apr 6 03:12:32 EDT 2005


Hi,

I follow with great interest the threads around Numeric3/scipy.base.
As Travis suggested (“It would also help if other people who have concerns 
would voice them (I'm very grateful for those who have expressed their 
concerns) so that we can all address them and get on the same page for 
future development.”), I voice my concern :-)

Sometimes it is quite useful to treat data at a higher level than just an 
“array of numbers of some type”. Adding metadata to arrays (I call them 
“augmented arrays”) is a simple way to give an array meaning. I see 
several use cases, such as:
1) attaching a physical unit to array data (see for instance Unum 
   http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see 
   http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very 
   useful for manipulating time series easily.
3) masked arrays as in the MA module of Numeric
4) arrays for interval arithmetic, where one keeps another array with the 
   precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)

The current solution for those situations is nicely summarized by quoting 
Konrad:
“but rather a class written using arrays than a variety of the basic array 
type.
It’s actually pretty straightforward to implement, the most difficult choice 
being the form of the constructor that gives most flexibility in use.”

However, I disagree with the “pretty straightforward to implement”. In fact, 
if one wants to inherit most of the functionality of Numeric, it becomes 
quite cumbersome. Looking at the MA module, I see that it needs to:
1) redefine all methods (__add__, …)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, …)
For other purposes, the same burden may apply.
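
To make the burden concrete, here is a minimal sketch of the three kinds of 
redefinition listed above. It is not taken from MA: UnitList, absolute and 
sort are hypothetical names, and a plain Python list stands in for the 
Numeric array so that the example runs without any dependency.

```python
class UnitList:
    """Hypothetical wrapper: naked data (a plain list here) plus a unit."""

    def __init__(self, data, unit):
        self.data = list(data)
        self.unit = unit

    # 1) every operator method has to be forwarded by hand
    def __add__(self, other):
        return UnitList([a + b for a, b in zip(self.data, other.data)],
                        self.unit)

    def __sub__(self, other):
        return UnitList([a - b for a, b in zip(self.data, other.data)],
                        self.unit)


# 2) every ufunc needs its own wrapper that re-attaches the metadata
def absolute(x):
    return UnitList([abs(v) for v in x.data], x.unit)


# 3) every array function needs its own wrapper as well
def sort(x):
    return UnitList(sorted(x.data), x.unit)


lengths = UnitList([3.0, -1.0, 2.0], "m")
total = sort(absolute(lengths + lengths))
```

Only two operators, one ufunc and one array function are covered here; the 
same pattern would have to be repeated for every other one.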

A general solution to this problem is not straightforward and may be out of 
reach (computationally and/or conceptually).
However, a quite-general-enough elegant solution could solve most practical 
problems.

Looking at threads in this list, I think that there is enough brain power to 
get to something usable in the medium term.

An embryo of an idea would be to add hooks in the machinery to allow an 
object to interact with a ufunc. Currently, this is done by calling 
__array__ to extract a “naked array” (== Numeric.array vs “augmented 
array”), but the result is then always a “naked array”.
In pseudocode, this looks like:

  def ufunc( augmented_array ):
    if not isarray(augmented_array):
      augmented_array = augmented_array.__array__()
    return ufunc.apply(augmented_array)

where I would prefer something like

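  def ufunc( augmented_array ):
    if not isarray(augmented_array):
      # the hook returns the naked array plus a callable that rebuilds
      # an augmented array from a naked result
      augmented_array, constructor = augmented_array.__array_constructor__()
    else:
      constructor = lambda x:x
    return constructor(ufunc.apply(augmented_array))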
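
The second variant can be sketched as runnable Python. This is only an 
illustration under assumptions: UnitArray, make_ufunc and apply_elementwise 
are hypothetical names, a plain list stands in for the naked array, and the 
hook is detected with hasattr rather than isarray.

```python
class UnitArray:
    """Hypothetical augmented array: a list of numbers plus a unit string."""

    def __init__(self, data, unit):
        self.data = list(data)
        self.unit = unit

    def __array_constructor__(self):
        # Return the naked data and a callable that re-attaches the metadata.
        return self.data, lambda naked: UnitArray(naked, self.unit)


def apply_elementwise(func, naked):
    # Stand-in for the ufunc inner loop; it only ever sees naked data.
    return [func(x) for x in naked]


def make_ufunc(func):
    def ufunc(arg):
        if hasattr(arg, "__array_constructor__"):
            naked, constructor = arg.__array_constructor__()
        else:
            # Already naked: the result is returned unchanged.
            naked, constructor = arg, (lambda x: x)
        return constructor(apply_elementwise(func, naked))
    return ufunc


absolute = make_ufunc(abs)
result = absolute(UnitArray([-1.0, 2.0, -3.0], "m"))  # still a UnitArray
plain = absolute([-1, 2])                              # still a plain list
```

Keeping the unit unchanged happens to be right for abs, but it would be wrong 
for something like a square root, so a real protocol would probably have to 
tell the hook which ufunc is being applied.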

For array functions and methods, I have even fewer clues to a solution :-). 
But calling hooks specified by some protocol would be a path:
a)	__array_constructor__
b)	__array_binary_op__ (would be called for __add__, __sub__, …)
c)	__array_rbinary_op__ (would be called for __radd__, __rsub__, …)
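
As a rough sketch of hooks b) and c), assuming a signature that the proposal 
does not specify: each hook receives the operator as a two-argument callable 
plus the other operand. HookedOps is a hypothetical stand-in for the array 
machinery, Masked is a hypothetical augmented array, and plain lists again 
replace Numeric arrays.

```python
import operator


class HookedOps:
    """Stand-in for the array machinery: routes operators to the two hooks."""

    def __add__(self, other):
        return self.__array_binary_op__(operator.add, other)

    def __sub__(self, other):
        return self.__array_binary_op__(operator.sub, other)

    def __radd__(self, other):
        return self.__array_rbinary_op__(operator.add, other)

    def __rsub__(self, other):
        return self.__array_rbinary_op__(operator.sub, other)


class Masked(HookedOps):
    """Hypothetical augmented array: values plus a per-element invalid mask."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)

    def _operand(self, other):
        # Broadcast a scalar to the length of self; pass Masked through.
        if isinstance(other, Masked):
            return other.data, other.mask
        return [other] * len(self.data), [False] * len(self.data)

    def __array_binary_op__(self, op, other):
        data, mask = self._operand(other)
        return Masked([op(a, b) for a, b in zip(self.data, data)],
                      [m or n for m, n in zip(self.mask, mask)])

    def __array_rbinary_op__(self, op, other):
        # Reflected case: self is the right-hand operand.
        data, mask = self._operand(other)
        return Masked([op(b, a) for a, b in zip(self.data, data)],
                      [m or n for m, n in zip(self.mask, mask)])


a = Masked([1, 2, 3], [False, True, False])
b = 10 - a   # goes through __rsub__ and then __array_rbinary_op__
c = a + a    # goes through __add__ and then __array_binary_op__
```

The point of the design is that Masked implements two hooks instead of one 
method per operator; the per-operator methods live once in the machinery.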

If I am missing something and there is an easy way to do this, I’ll be 
pleased to know it.
Otherwise, any feedback on this ability to easily extend array 
functionality by attaching metadata and related behavior would be welcome.

Sebastien





