[Numpy-discussion] Let's blame Java [was ndarray.fill and ma.array.filled]

Tue Apr 11 16:15:04 EDT 2006

As I understand it, the goal that Sasha is pursuing here is to make 
masked arrays and normal arrays interchangeable as much as practical. I 
believe that there is reasonable consensus that this is desirable. Sasha 
has proposed a compromise solution that adds minimal attributes to 
ndarray while allowing a lot of interoperability between ma and ndarray. 
However, it has its clunky aspects, as evidenced by the pushback he's 
been getting from masked array users.

Here's one example. In the masked array context it seems perfectly 
reasonable to pass a fill value to sum. That is:

x.sum(fill=0.0)

But, if you want to preserve interoperability, that means you have to 
add fill arguments to all of the ndarray methods and what do you have? A 
mess! Particularly if some *other* package comes along that we decide is 
important to support in the same manner as ma. Then we have another set 
of methods or keyword args that we need to tack on to ndarray. Ugh!

However, I know who, or rather what, to blame for our problems: the 
object-oriented hype industry in general and Java in particular <0.1 
wink>. Why? Because the root of the problem here is the move from 
functions to methods in numpy. I appreciate a nice method as much as the 
next person, but they're not always better than the equivalent function 
and in this case they're worse.

Let's fantasize for a minute that most of the methods of ndarray 
vanished and instead we went back to functions. Just to show that I'm 
not a total purist, I'll let the mask attribute stay on both MaskedArray 
and ndarray. However, filled bites the dust on *both* MaskedArray and 
ndarray just like the rest. How would we deal with sum then? Something 
like this:

    # ma.py

    def filled(x, fill):
        x = x.copy()
        if x.mask is not False:
            x[x.mask] = fill
        x.umask()
        return x

    def sum(x, axis, fill=None):
        if fill is not None:
            x = filled(x, fill)
        # I'm blowing off the correct treatment of the fill=None case
        # here because I'm lazy
        return add.reduce(x, axis)

    # numpy.py (or __init__ or oldnumeric or something)

    def sum(x, axis):
        if x.mask is not False:
            raise ValueError("use ma.sum for masked arrays")
        return add.reduce(x, axis)

[Fixing the fill=None case and dealing correctly with dtype is left as an 
exercise for the reader.]
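
For anyone who wants to try the idea, here is a minimal runnable sketch of the two-module split. It makes some assumptions that are not part of the proposal above: it uses the existing numpy.ma module, it detects masked arrays with isinstance because ndarray has no mask attribute, and the names ma_filled, ma_sum and plain_sum are hypothetical stand-ins for ma.filled, ma.sum and numpy.sum. The fill=None case simply defers to numpy.ma's own sum.

```python
import numpy as np
import numpy.ma as ma


def ma_filled(x, fill):
    """Return a plain ndarray copy of x with masked entries set to fill."""
    # ma.filled accepts both plain and masked arrays; np.array(..., copy=True)
    # guarantees the caller always gets a fresh copy.
    return np.array(ma.filled(x, fill), copy=True)


def ma_sum(x, axis=0, fill=None):
    """Sum for the hypothetical 'ma' namespace: accepts either array kind."""
    if fill is None:
        # Assumption: with no fill given, fall back to numpy.ma's own sum,
        # which skips masked entries.
        return ma.sum(x, axis=axis)
    return np.add.reduce(ma_filled(x, fill), axis)


def plain_sum(x, axis=0):
    """Sum for the hypothetical plain namespace: rejects masked arrays."""
    if isinstance(x, ma.MaskedArray):
        raise ValueError("use ma_sum for masked arrays")
    return np.add.reduce(x, axis)


a = np.array([1.0, 2.0, 3.0])
m = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

print(ma_sum(a))             # plain array works with the ma function
print(ma_sum(m, fill=0.0))   # masked entry replaced by 0.0 before summing
print(plain_sum(a))          # plain function on a plain array
```

Calling plain_sum(m) raises ValueError, which is the "quickly find out about it" behaviour for code that does not expect masked arrays.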

All of a sudden all of the problems we're running into go away. Users 
of masked arrays simply use the functions from ma and can use ndarrays 
and masked arrays interchangeably. On the other hand, users of 
non-masked arrays aren't burdened with the extra interface and if they 
accidentally get passed a masked array they quickly find out about it (you 
don't want to be accidentally using masked arrays in an application that 
doesn't expect them -- that way lies disaster).
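
To illustrate the kind of disaster meant here, the following small sketch uses the existing numpy.ma module (an assumption; the variable names are only for illustration). It shows how code that coerces its input to a plain ndarray silently loses the mask and returns a different answer instead of failing.

```python
import numpy as np
import numpy.ma as ma

m = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# The masked-aware sum skips the masked middle entry.
masked_total = m.sum()

# A routine that does not expect masked arrays might coerce its input.
# np.asarray returns a base-class ndarray, so the mask is dropped.
plain = np.asarray(m)
plain_total = plain.sum()

print(type(plain) is np.ndarray)   # True: no longer a MaskedArray
print(masked_total, plain_total)   # 4.0 6.0: the two totals disagree
```

Nothing warns the caller that the masked value crept back into the result, which is why an explicit error is the friendlier behaviour.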

I realize that railing against methods is tilting at windmills, but 
somehow I can't help myself ;-|

Regards,

-tim