[Numpy-discussion] Re: ndarray.fill and ma.array.filled
Sasha
ndarray at mac.com
Mon Apr 10 11:37:00 EDT 2006
On 4/10/06, Pierre GM <pgmdevlist at mailcan.com> wrote:
> > If you sum along a particular dimension and encounter a masked value,
> > the result is masked.
>
> That's not how it currently works (still on 0.9.6):
>
> [... longish example snipped ...]
> >>> ma.array([1,1], mask=[0,1]).sum()
> 1
> and frankly, I'd be quite frustrated if it had to change:
> - `filled` is not an ndarray method, which means that a.filled(0).sum() fails
> if a is not MA. Right now, I can use a.sum() without having to check the
> nature of a first.
This is exactly the point of the current discussion: make filled a
method of ndarray.
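As a rough illustration of what a uniform `filled` buys, current NumPy already has a module-level function, `numpy.ma.filled`, that accepts both kinds of array: it returns a plain ndarray with masked entries replaced, and returns an ordinary ndarray unchanged. The helper name `masked_aware_total` below is hypothetical; this is a sketch of the idea, not the method being proposed in the thread.

```python
import numpy as np

def masked_aware_total(a, fill_value=0):
    # np.ma.filled replaces masked entries of a MaskedArray with fill_value
    # and passes a plain ndarray through unchanged, so the caller never has
    # to check which kind of array it was given.
    return np.ma.filled(a, fill_value).sum()

plain = np.array([1, 1])
with_mask = np.ma.array([1, 1], mask=[0, 1])

print(masked_aware_total(plain))      # 2
print(masked_aware_total(with_mask))  # 1
```

With `filled` available as an ndarray method, the same call could be written as `a.filled(0).sum()` for either type.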
With the current behavior, how would you get a masked result from a.sum() (mask propagated, nothing filled)?
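To make the alternative semantics concrete ("if you sum along a particular dimension and encounter a masked value, the result is masked"), here is a minimal sketch written against current numpy.ma. `propagating_sum` is a hypothetical helper, not an existing numpy.ma function; it sums the raw data and then masks every output element that had at least one masked input.

```python
import numpy as np

def propagating_sum(a, axis=None):
    """Hypothetical helper: a sum whose result is masked wherever any
    contributing element is masked (mask propagation, no filling)."""
    a = np.ma.asarray(a)
    # Sum the underlying data, including values hidden under the mask;
    # those partial results are hidden again by the mask computed below.
    totals = np.ma.getdata(a).sum(axis=axis)
    # An output element is masked if any input element along `axis` is masked.
    result_mask = np.ma.getmaskarray(a).any(axis=axis)
    return np.ma.array(totals, mask=result_mask)

one_d = propagating_sum(np.ma.array([1, 1], mask=[0, 1]))
two_d = propagating_sum(np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [0, 0]]), axis=1)
print(one_d)  # the whole result is masked
print(two_d)  # first row sum is masked, second row sum is 7
```

Writing this by hand for every reduction is exactly the burden the question points at: under the current default, propagation has to be reimplemented, whereas ignoring masked values would only need a fill.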
> - this behavior was already in Numeric
That's true, but it makes the result of sum(a) different from
__builtins__.sum(a). I believe consistency with Python conventions
matters more in the long run than consistency with legacy Numeric.
> [...]
> - The current way reflects how mask are used in GIS or image processing.
>
Can you elaborate on this? Note that in R na.rm is false by default in sum:
> sum(c(1,NA))
[1] NA
So it looks like the convention is different in the field of statistics.
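For comparison, plain NumPy's handling of NaN in current releases follows the same convention as R: `np.sum` lets a missing value propagate, and the separate function `np.nansum` skips it, much like `na.rm=TRUE`. This uses NaN rather than a mask, so it is only an analogy to the MA behavior under discussion.

```python
import numpy as np

values = np.array([1.0, np.nan])

# Like R's sum(c(1,NA)): the missing value propagates to the result.
propagated = np.sum(values)   # nan
# Like R's sum(c(1,NA), na.rm=TRUE): missing values are skipped.
skipped = np.nansum(values)   # 1.0
```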
> > If you would like to ignore masked values, you write
> > a.filled(0).sum() instead of a.sum(). In 1d case, you can also use
> > a.compress().sum().
>
> Once again, Sasha, I'd agree with you if it wasn't a major difference
Array methods are a very recent addition to ma. We can still use this
window of opportunity to get things right before too many people get
used to the wrong behavior. (Note that I changed your implementation
of cumsum and cumprod.)
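The two "ignore masked values" idioms from the quoted text agree in 1-d but not in higher dimensions. The sketch below assumes current numpy.ma, where the method that drops masked entries is spelled `compressed()` (`MaskedArray.compress` in current NumPy takes a condition argument instead); whether the quoted `compress()` referred to the same operation in 0.9.6 is an assumption.

```python
import numpy as np

a = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# Replace masked entries with 0, then sum.
via_fill = a.filled(0).sum()        # 4
# Drop masked entries (the result is always 1-d), then sum.
via_compress = a.compressed().sum() # 4

b = np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [0, 0]])
per_row = b.filled(0).sum(axis=1)   # [1, 7]: row structure is kept
flat = b.compressed()               # [1, 3, 4]: row structure is lost
```

This is why the fill-based spelling generalizes to sums along an axis, while the compress-based one only works for the 1-d case.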
>
> > In other words, what in R you achieve with a
> > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve by an
> > explicit call to "fill". This is not quite the same as na.actions in
> > R, but that is what I had in mind.
>
> I kinda like the idea of a flag, though
With the flag approach, making the ndarray and ma.array interfaces
consistent would require adding an extra argument to many methods.
Instead, I propose adding one method, filled, to ndarray.