[Numpy-discussion] metadata and metabehavior for arrays (for scipy.base or Numeric3)
Sébastien de Menten
sdementen at hotmail.com
Wed Apr 6 03:12:32 EDT 2005
Hi,
I follow with great interest the threads around Numeric3/scipy.base.
As Travis suggested ("It would also help if other people who have concerns
would voice them (I'm very grateful for those who have expressed their
concerns) so that we can all address them and get on the same page for
future development."), I voice my concern :-)
Sometimes it is quite useful to treat data at a higher level than just an
array of numbers of some type. Adding metadata to an array (I call the result
an augmented array) is a simple way to give meaning to an array. I see
different use cases, such as:
1) attaching a physical unit to array data (see for instance Unum
http://home.tiscali.be/be052320/Unum.html )
2) description of axes (see
http://sourceforge.net/mailarchive/message.php?msg_id=11051806). Very useful
for manipulating time series easily.
3) masked arrays as in MA module of Numeric
4) arrays for interval arithmetic, where one keeps another array with the
precision of the data
5) record arrays (currently being integrated in scipy.base as a base type)
The current solution for those situations is nicely summarized by quoting
Konrad:
"but rather a class written using arrays than a variety of the basic array
type.
It's actually pretty straightforward to implement, the most difficult choice
being the form of the constructor that gives most flexibility in use."
However, I disagree with the "pretty straightforward to implement". In fact,
if one wants to inherit most of the functionality of Numeric, it becomes
quite cumbersome. Looking at the MA module, I see that it needs to:
1) redefine all methods (__add__, ...)
2) redefine all ufuncs
3) redefine all array functions (like reshape, sort, argmax, ...)
For other purposes, the same burden may apply.
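To make that burden concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: a list stands in for a Numeric array, and UnitArray, absolute and sorted are hypothetical names that exist neither in Numeric nor in MA.

```python
# Hypothetical illustration: a plain list plays the role of a Numeric array,
# and UnitArray is an invented "augmented array" that carries a unit string.


class UnitArray:
    def __init__(self, data, unit):
        self.data = list(data)
        self.unit = unit

    # 1) every operator method has to be written by hand
    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError("unit mismatch")
        summed = [a + b for a, b in zip(self.data, other.data)]
        return UnitArray(summed, self.unit)

    def __neg__(self):
        return UnitArray([-a for a in self.data], self.unit)

    # 3) ... and so does every array function or method (sort, reshape, ...)
    def sorted(self):
        return UnitArray(sorted(self.data), self.unit)


# 2) every ufunc needs its own wrapper, or the unit is lost
def absolute(x):
    return UnitArray([abs(a) for a in x.data], x.unit)


lengths = UnitArray([3.0, -1.0, 2.0], "m")
total = lengths + lengths
ordered = absolute(lengths).sorted()
```

Each of the three categories above needs its own hand-written code, and the list grows with every method, ufunc and function one wants to support.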
A general solution to this problem is not straightforward and may be out of
reach (computationally and/or conceptually).
However, a quite-general-enough elegant solution could solve most practical
problems.
Looking at threads in this list, I think that there is enough brain power to
get to something usable in the medium term.
An embryo of an idea would be to add hooks in the machinery to allow an object
to interact with a ufunc. Currently, this is done by calling __array__ to
extract a naked array (naked array == Numeric.array, as opposed to an
augmented array), but the result is then always a naked array.
In pseudocode, this looks like:
    def ufunc(augmented_array):
        if not isarray(augmented_array):
            augmented_array = augmented_array.__array__()
        return ufunc.apply(augmented_array)
where I would prefer something like
    def ufunc(augmented_array):
        if not isarray(augmented_array):
            augmented_array, constructor = augmented_array.__array_constructor__()
        else:
            constructor = lambda x: x
        return constructor(ufunc.apply(augmented_array))
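A runnable toy version of this idea might look like the sketch below. It is not Numeric code: lists stand in for naked arrays, and make_ufunc, is_naked and LabeledArray are hypothetical helpers introduced only for illustration. Only the __array_constructor__ name comes from the proposal above; its exact return convention (a naked array plus a rebuilding callable) is my assumption.

```python
import math


def is_naked(obj):
    # Assumption: a plain list plays the role of a naked Numeric array.
    return isinstance(obj, list)


def make_ufunc(scalar_func):
    """Build an elementwise function that honours __array_constructor__."""
    def ufunc(x):
        if is_naked(x):
            naked, constructor = x, (lambda result: result)
        else:
            # The hook hands back the raw data plus a way to rebuild the object.
            naked, constructor = x.__array_constructor__()
        return constructor([scalar_func(v) for v in naked])
    return ufunc


class LabeledArray:
    """Hypothetical augmented array: data plus an axis label."""
    def __init__(self, data, label):
        self.data = list(data)
        self.label = label

    def __array_constructor__(self):
        return self.data, (lambda result: LabeledArray(result, self.label))


sqrt = make_ufunc(math.sqrt)
plain = sqrt([4.0, 9.0])                          # stays a naked list
augmented = sqrt(LabeledArray([4.0, 9.0], "time"))  # stays a LabeledArray
```

Note that the constructor here does not know which ufunc was applied, so this form only suits metadata that the operation leaves unchanged (an axis label, say); physical units would need more information passed to the hook.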
For array functions and methods, I have even fewer clues to a solution :-). But
calling hooks specified by some protocol would be a path:
a) __array_constructor__
b) __array_binary_op__ (would be called for __add__, __sub__, ...)
c) __array_rbinary_op__ (would be called for __radd__, __rsub__, ...)
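As a rough sketch of how hooks b) and c) could be wired up, assuming each hook receives the operator as a callable together with the other operand (a signature the proposal does not fix), one could write something like the following. BinaryOpHooks and MaskedList are hypothetical names, and lists again stand in for Numeric arrays.

```python
import operator


class BinaryOpHooks:
    """Route Python's operator methods to the two proposed hooks."""
    def __add__(self, other):
        return self.__array_binary_op__(operator.add, other)

    def __sub__(self, other):
        return self.__array_binary_op__(operator.sub, other)

    def __radd__(self, other):
        return self.__array_rbinary_op__(operator.add, other)

    def __rsub__(self, other):
        return self.__array_rbinary_op__(operator.sub, other)


class MaskedList(BinaryOpHooks):
    """Hypothetical masked array: True in the mask marks an invalid entry."""
    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)

    def _split(self, other):
        if isinstance(other, MaskedList):
            return other.data, other.mask
        # Treat anything else as a scalar broadcast over the data.
        n = len(self.data)
        return [other] * n, [False] * n

    def __array_binary_op__(self, op, other):
        odata, omask = self._split(other)
        data = [op(a, b) for a, b in zip(self.data, odata)]
        mask = [m or n for m, n in zip(self.mask, omask)]
        return MaskedList(data, mask)

    def __array_rbinary_op__(self, op, other):
        # Reflected case: the other operand goes on the left.
        odata, omask = self._split(other)
        data = [op(b, a) for a, b in zip(self.data, odata)]
        mask = [m or n for m, n in zip(self.mask, omask)]
        return MaskedList(data, mask)


x = MaskedList([1, 2, 3], [False, True, False])
doubled = x + x
reflected = 10 - x   # int cannot handle MaskedList, so __rsub__ is used
```

With this arrangement the augmented class writes the metadata logic twice (normal and reflected) instead of once per operator.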
If I am missing a point and there is an easy way to do this, I'll be pleased
to know it.
Otherwise, any feedback is welcome on this ability to easily extend array
functionality by attaching metadata and related behavior.
Sebastien