[Numpy-discussion] Let's blame Java [was ndarray.fill and ma.array.filled]
Tim Hochberg
tim.hochberg at cox.net
Tue Apr 11 16:15:04 EDT 2006
As I understand it, the goal that Sasha is pursuing here is to make
masked arrays and normal arrays interchangeable as much as practical. I
believe that there is reasonable consensus that this is desirable. Sasha
has proposed a compromise solution that adds minimal attributes to
ndarray while allowing a lot of interoperability between ma and ndarray.
However, it has its clunky aspects, as evidenced by the pushback he's
been getting from masked array users.
Here's one example. In the masked array context it seems perfectly
reasonable to pass a fill value to sum. That is:
x.sum(fill=0.0)
But, if you want to preserve interoperability, that means you have to
add fill arguments to all of the ndarray methods and what do you have? A
mess! Particularly if some *other* package comes along that we decide is
important to support in the same manner as ma. Then we have another set
of methods or keyword args that we need to tack on to ndarray. Ugh!
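To make the fill idea concrete, here is a minimal sketch of what x.sum(fill=0.0) would compute. It is written against the numpy.ma API found in later NumPy releases, which is an assumption on my part; the fill keyword on sum is hypothetical and is emulated here by calling filled() first.

```python
import numpy as np

# A masked array whose middle element is masked out.
x = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# Emulate the hypothetical x.sum(fill=0.0): replace masked entries
# with the fill value, then sum the resulting plain ndarray.
total_zero_fill = x.filled(0.0).sum()

# A different fill value changes the result, which is why the caller
# wants to pass it explicitly.
total_ten_fill = x.filled(10.0).sum()
```

Every reduction (sum, prod, mean, ...) would need the same extra keyword to support this spelling directly, which is the mess described above.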
However, I know who, or rather what, to blame for our problems: the
object-oriented hype industry in general and Java in particular <0.1
wink>. Why? Because the root of the problem here is the move from
functions to methods in numpy. I appreciate a nice method as much as the
next person, but they're not always better than the equivalent function
and in this case they're worse.
Let's fantasize for a minute that most of the methods of ndarray
vanished and instead we went back to functions. Just to show that I'm
not a total purist, I'll let the mask attribute stay on both MaskedArray
and ndarray. However, filled bites the dust on *both* MaskedArray and
ndarray just like the rest. How would we deal with sum then? Something
like this:
# ma.py
def filled(x, fill):
    x = x.copy()
    if x.mask is not False:
        x[x.mask] = fill
        x.umask()
    return x

def sum(x, axis, fill=None):
    if fill is not None:
        x = filled(x, fill)
    # I'm blowing off the correct treatment of the fill=None case
    # here because I'm lazy
    return add.reduce(x, axis)

# numpy.py (or __init__ or oldnumeric or something)
def sum(x, axis):
    if x.mask is not False:
        raise ValueError("use ma.sum for masked arrays")
    return add.reduce(x, axis)
[Fixing the fill=None case and dealing correctly with dtype is left as an
exercise for the reader.]
All of a sudden all of the problems we're running into go away. Users
of masked arrays simply use the functions from ma and can use ndarrays
and masked arrays interchangeably. On the other hand, users of
non-masked arrays aren't burdened with the extra interface and if they
accidentally get passed a masked array they quickly find out about it (you
don't want to be accidentally using masked arrays in an application that
doesn't expect them -- that way lies disaster).
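The split described above can be tried out with a runnable sketch. It assumes the numpy.ma API of later NumPy releases (np.ma.filled, np.ma.sum, np.ma.isMaskedArray) rather than the mask-on-ndarray design imagined in this message, and the names ma_sum and plain_sum are hypothetical stand-ins for ma.sum and numpy.sum.

```python
import numpy as np

def ma_sum(x, axis=None, fill=None):
    """Hypothetical ma-side sum: accepts ndarrays and masked arrays alike."""
    if fill is not None:
        # np.ma.filled returns a plain ndarray with masked entries replaced;
        # an ordinary ndarray passes through unchanged.
        return np.add.reduce(np.ma.filled(x, fill), axis=axis)
    # With no fill value, let numpy.ma skip the masked entries.
    return np.ma.sum(x, axis=axis)

def plain_sum(x, axis=None):
    """Hypothetical ndarray-side sum: refuses masked arrays outright."""
    if np.ma.isMaskedArray(x):
        raise ValueError("use ma_sum for masked arrays")
    return np.add.reduce(x, axis=axis)

plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
```

Here ma_sum(plain) and ma_sum(masked) both work, while plain_sum(masked) fails immediately with a ValueError instead of silently ignoring the mask.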
I realize that railing against methods is tilting at windmills, but
somehow I can't help myself ;-|
Regards,
-tim