[Numpy-discussion] Re: ndarray.fill and ma.array.filled

Mon Apr 10 13:37:07 EDT 2006

> > [... longish example snipped ...]
> >
> >>> ma.array([1,1], mask=[0,1]).sum()
>
> 1
So? The result is not `masked`; the missing value has simply been omitted.

MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum()
array(data = [1 1],   mask = [False False], fill_value=999999)

> This is exactly the point of the current discussion: make fill a
> method of ndarray.
Mrf. I'm still not convinced, but I have nothing against it. Along with a 
mask=False_ by default ?

> With the current behavior, how would you achieve masking (no fill) a.sum()?
Er, why would I want to get MA.masked along one axis if only one value is masked?
The current behavior is to mask only if all the values along that axis are 
masked:

MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
array(data = [1 999999],   mask = [False True], fill_value=999999)
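
For reference, the two cases above can be reproduced roughly as follows. This is only a sketch against the current numpy.ma module rather than the MA code of this thread; the variable names are made up, and axis=0 is spelled out because the default axis may differ between versions.

```python
import numpy.ma as ma

# One masked entry per column: every column keeps one valid value.
partly = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 0]])
# Second column fully masked: nothing valid is left to add up there.
fully = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])

partly_sum = partly.sum(axis=0)
fully_sum = fully.sum(axis=0)

# getmaskarray always returns a full boolean array,
# even when no element of the result is masked.
print(ma.getmaskarray(partly_sum))
print(ma.getmaskarray(fully_sum))
```

Only the column whose inputs are all masked comes out masked; a column with at least one valid value is summed over what is left.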

With a.filled(0).sum(), how would you distinguish between the cases (a) at 
least one value is not masked and (b) all values are masked? (OK, by 
querying the mask with something along the lines of a._mask.all(axis), but 
it's longer... Oh well, I'll just have to adapt.)
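
A minimal sketch of that longer route, assuming the current numpy.ma module: it uses the public ma.getmaskarray() instead of the private _mask attribute, and the names totals, all_masked and result are only illustrative.

```python
import numpy.ma as ma

a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])

# Plain sums, with every masked entry counted as 0.
totals = a.filled(0).sum(axis=0)

# True where every value along the axis was masked, i.e. case (b).
all_masked = ma.getmaskarray(a).all(axis=0)

# Re-attach that information to separate case (a) from case (b).
result = ma.array(totals, mask=all_masked)
print(result)
```

The filled sum alone gives 0 for the second column, which cannot be told apart from a genuine zero; the extra mask query is what recovers the distinction.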

> > - this behavior was already in Numeric
>
> That's true, but it makes the result of sum(a) different from
> __builtins__.sum(a).  I believe consistency with the python
> conventions is more important than with legacy Numeric in the long
> run.
>
> Array methods are a very recent addition to ma.  We can still use this
> window of opportunity to get things right before too many people get
> used to the wrong behavior.  (Note that I changed your implementation
> of cumsum and cumprod.)

Good points... We'll just have to put strong warnings everywhere.

> >
> > - The current way reflects how masks are used in GIS or image processing.
>
> Can you elaborate on this? Note that in R na.rm is false by default in sum:
> > sum(c(1,NA))
>
> [1] NA
>
> So it looks like the convention is different in the field of statistics.

MMh. *digs in his old GRASS scripts* 
OK, my bad. I had to fill missing values somehow, or at least check whether 
there were any before processing. I'll double check on that. Please 
temporarily forget that comment.
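
For comparison, the R convention quoted above (na.rm false by default) could be emulated along these lines. This is a sketch only: strict_sum is a hypothetical helper, not part of ma, and it assumes the current numpy.ma module.

```python
import numpy.ma as ma

def strict_sum(a, axis=0):
    """Sum along `axis`, masking every result that involved a masked value."""
    a = ma.asarray(a)
    # True wherever at least one value along the axis is masked.
    any_masked = ma.getmaskarray(a).any(axis=axis)
    return ma.array(a.filled(0).sum(axis=axis), mask=any_masked)

a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [0, 0]])

strict = strict_sum(a)    # second column masked, as sum(c(1,NA)) gives NA
lenient = a.sum(axis=0)   # second column summed over the remaining value
print(strict)
print(lenient)
```

The only difference between the two conventions is whether the result mask is built with any() or with all() along the axis.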

> With the flag approach, making ndarray and ma.array interfaces
> consistent would require adding an extra argument to many methods.
> Instead, I propose to add one method: fill to ndarray.
OK, good point.
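
To illustrate what a consistent interface buys, here is a sketch using the module-level ma.filled() function from the current numpy.ma, which accepts both plain and masked arrays. It is not the ndarray method proposed above, and total is a made-up helper.

```python
import numpy as np
import numpy.ma as ma

def total(x):
    # ma.filled() hands back a plain ndarray unchanged and replaces
    # the masked entries of a masked array by the given value (0 here).
    return ma.filled(x, 0).sum()

plain = np.array([1, 1])
masked = ma.array([1, 1], mask=[0, 1])

print(total(plain))
print(total(masked))
```

The same function body serves both array types, with no extra flag argument on sum().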

On a semantic note:
While digging through the GRASS scripts I mentioned, I realized/remembered 
that masked values are called 'null' there, whether there is no data, a NaN, 
or just some values you want to hide. What about 'null' instead of 
'mask', 'missing', 'na'?