[Numpy-discussion] Re: ndarray.fill and ma.array.filled

Sasha ndarray at mac.com
Mon Apr 10 16:06:00 EDT 2006


On 4/10/06, Pierre GM <pgmdevlist at mailcan.com> wrote:
> > > [... longish example snipped ...]
> > >
> > >>> ma.array([1,1], mask=[0,1]).sum()
> >
> > 1
> So ? The result is not `masked`, the missing value has been omitted.
>
I am just making your point with a shorter example.

> [...]
> Mrf. I'm still not convinced, but I have nothing against it. Along with a
> mask=False_ by default ?
>
It looks like there is little opposition here.  I'll submit a patch
soon and unless better names are suggested, it will probably go in.

> > With the current behavior, how would you get a masking (no-fill) a.sum()?
> Er, why would I want to get MA.masked along one axis if one value is masked  ?

Because if you don't know one of the addends you don't know the sum. 
Replacing missing values with zeros is not always the right strategy.
If you know that your data has non-zero mean, for example, you might
want to replace missing values with the mean instead of zero.
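
To make the difference concrete, here is a minimal sketch of the two fill strategies. The array values are made up for illustration, and I am assuming the numpy.ma API as it exists in current NumPy releases (filled, mean), not necessarily the 2006 code.

```python
import numpy.ma as ma

# Hypothetical data: the middle value is missing (masked).
a = ma.array([2.0, 4.0, 6.0], mask=[0, 1, 0])

# Filling with zero is what a sum that skips masked values amounts to.
sum_zero_fill = a.filled(0).sum()

# Filling with the mean of the unmasked values instead.
mean_known = a.mean()  # mean over unmasked entries only
sum_mean_fill = a.filled(mean_known).sum()

print(sum_zero_fill, sum_mean_fill)
```

The two totals differ, which is why I would rather have the caller pick the fill value explicitly.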


> The current behavior is to mask only if all the values along that axis are
> masked:
>
> MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
> array(data = [1 999999],   mask = [False True], fill_value=999999)
>

I did not realize that, but it is really bad. What is the
justification for this?
In R:

> sum(c(NA,NA), na.rm=TRUE)
[1] 0

What does MATLAB do in this case?
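
For comparison, a plain-Python sketch of the same convention, with None standing in for R's NA (my own stand-in, not anything from ma): dropping the missing values leaves an empty sequence, and the built-in sum of an empty sequence is 0.

```python
# None plays the role of a missing value in this sketch.
values = [None, None]

# Drop the missing entries, as na.rm=TRUE does in R.
known = [v for v in values if v is not None]

# The built-in sum of an empty sequence is 0.
total = sum(known)
print(total)
```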


> With a.filled(0).sum(), how would you distinguish between the cases (a) at
> least one value is not masked and (b) all values are masked  ? (OK, by
> querying the mask with something along the lines of a._mask.all(axis), but it's
> longer... Oh well, I'll just have to adapt)
>

Exactly. Explicit is better than implicit. The Zen of Python
<http://www.python.org/dev/peps/pep-0020>.
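
A rough sketch of what the explicit version could look like, using your 2x2 example. I am assuming current numpy.ma here and using the public ma.getmaskarray helper in place of a._mask; the variable names are only for illustration.

```python
import numpy.ma as ma

a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])

# Plain ndarray of column sums, with masked entries counted as zero.
totals = a.filled(0).sum(axis=0)

# Explicit mask queries along the same axis.
mask = ma.getmaskarray(a)
all_masked = mask.all(axis=0)  # case (b): every value in the column is masked
any_masked = mask.any(axis=0)  # at least one addend in the column is unknown

# A "strict" sum: masked wherever any addend was masked.
strict = ma.array(totals, mask=any_masked)

print(totals, all_masked, any_masked)
```

The caller decides which of the two masks to apply to the totals, so neither policy is baked into sum itself.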

> > > - this behavior was already in Numeric
> >
> > That's true, but it makes the result of sum(a) different from
> > __builtins__.sum(a).  I believe consistency with the python
> > conventions is more important than with legacy Numeric in the long
> > run.
> >
> > Array methods are a very recent addition to ma.  We can still use this
> > window of opportunity to get things right before too many people get
> > used to the wrong behavior.  (Note that I changed your implementation
> > of cumsum and cumprod.)
>
> Good points... We'll just have to put strong warnings everywhere.
>
Do you agree with my proposal as long as we have explicit warnings in
the documentation that methods behave differently from legacy
functions?

> [... GIS comment snipped ...]

> > With the flag approach making ndarray and ma.array interfaces
> > consistent would require adding an extra argument to many methods.
> > Instead, I propose to add one method: fill to ndarray.
> OK, good point.
>
>
> On a semantic aspect:
> While digging these GRASS scripts I mentioned, I realized/remembered that
> masked values are called 'null', when there's no data, a NAN, or just when
> you want to hide some values. What about 'null' instead of
> 'mask','missing','na' ?
>

I don't think "null" returning an array of bools will create a lot of
enthusiasm.  It sounds more like ma.masked as in a[i] = ma.masked.
Besides, there is probably a reason why Python uses the name "None"
instead of "Null" - I just don't know what it is :-).



