[Numpy-discussion] Re: ndarray.fill and ma.array.filled
Pierre GM
pgmdevlist at mailcan.com
Mon Apr 10 13:37:07 EDT 2006
> > [... longish example snipped ...]
> >
> >>> ma.array([1,1], mask=[0,1]).sum()
>
> 1
So ? The result is not `masked`, the missing value has been omitted.
MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum()
array(data = [1 1], mask = [False False], fill_value=999999)
> This is exactly the point of the current discussion: make fill a
> method of ndarray.
Mrf. I'm still not convinced, but I have nothing against it. Along with a
mask=False_ by default ?
> With the current behavior, how would you achieve masking (no fill) a.sum()?
Er, why would I want to get MA.masked along one axis if one value is masked ?
The current behavior is to mask only if all the values along that axis are
masked:
MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
array(data = [1 999999], mask = [False True], fill_value=999999)
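For readers following along with a current NumPy, here is a minimal sketch of the two behaviors under discussion. It assumes the modern `numpy.ma` module rather than the 2006 `MA` package, and it assumes the sums quoted above run along the first axis, so `axis=0` is passed explicitly. `strict_sum` is a hypothetical helper written for illustration; it is not part of NumPy.

```python
import numpy as np

# Same data and masks as the two examples quoted in this thread.
a = np.ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 0]])
b = np.ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])

# Assumption: the thread's sums are taken along the first axis.
sum_a = a.sum(axis=0)  # each column has one unmasked value
sum_b = b.sum(axis=0)  # the second column is fully masked

print(sum_a)
print(sum_b)


def strict_sum(arr, axis=0):
    """Hypothetical helper: mask the result wherever ANY input is masked."""
    any_masked = np.ma.getmaskarray(arr).any(axis=axis)
    return np.ma.masked_where(any_masked, arr.filled(0).sum(axis=axis))


# Only the second column contains a masked value here.
c = np.ma.array([[1, 1], [1, 1]], mask=[[0, 1], [0, 0]])
strict_c = strict_sum(c)
print(strict_c)
```

The first two sums skip masked entries and mask a result only when a whole column is masked. The helper shows the stricter convention, where a single masked entry masks the result.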
With a.filled(0).sum(), how would you distinguish between the cases (a) at
least one value is not masked and (b) all values are masked ? (OK, by
querying the mask with something along the lines of a._mask.all(axis), but it's
longer... Oh well, I'll just have to adapt)
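The ambiguity raised here can be sketched as follows, assuming the modern `numpy.ma` module. The data are hypothetical and chosen so that both columns give the same filled sum. `np.ma.getmaskarray` is used in place of the private `_mask` attribute.

```python
import numpy as np

# Hypothetical data: column 0 has one unmasked value (a zero),
# column 1 is fully masked.
a = np.ma.array([[0, 3], [0, 4]], mask=[[0, 1], [1, 1]])

# Filling with 0 before summing returns a plain ndarray, so the two
# cases can no longer be told apart from the result alone.
filled_sum = a.filled(0).sum(axis=0)

# Querying the mask separately recovers the distinction.
all_masked = np.ma.getmaskarray(a).all(axis=0)

# The masked sum carries the same information in its own mask.
masked_sum = a.sum(axis=0)

print(filled_sum)
print(all_masked)
print(masked_sum)
```

Both columns have a filled sum of zero, yet only the second has no valid data. That is the information the extra mask query, or the masked result, preserves.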
> > - this behavior was already in Numeric
>
> That's true, but it makes the result of sum(a) different from
> __builtins__.sum(a). I believe consistency with the python
> conventions is more important than with legacy Numeric in the long
> run.
>
> Array methods are a very recent addition to ma. We can still use this
> window of opportunity to get things right before too many people get
> used to the wrong behavior. (Note that I changed your implementation
> of cumsum and cumprod.)
Good points... We'll just have to put strong warnings everywhere.
> >
> > - The current way reflects how masks are used in GIS or image processing.
>
> Can you elaborate on this? Note that in R na.rm is false by default in sum:
> > sum(c(1,NA))
>
> [1] NA
>
> So it looks like the convention is different in the field of statistics.
MMh. *digs in his old GRASS scripts*
OK, my bad. I had to fill missing values somehow, or at least check whether
there were any before processing. I'll double check on that. Please
temporarily forget that comment.
> With the flag approach making ndarray and ma.array interfaces
> consistent would require adding an extra argument to many methods.
> Instead, I propose to add one method: fill to ndarray.
OK, good point.
On a semantic aspect:
While digging these GRASS scripts I mentioned, I realized/remembered that
masked values are called 'null', when there's no data, a NAN, or just when
you want to hide some values. What about 'null' instead of
'mask', 'missing', 'na' ?