[Numpy-discussion] Clarifications in numpy.ma module

Tue Dec 30 16:04:36 EST 2014

On Tue, Dec 30, 2014 at 3:29 PM, Alexander Belopolsky <ndarray at mac.com>
wrote:

> On Tue, Dec 30, 2014 at 2:49 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>> Where does it say that operations on masked arrays should not produce
>> NaNs?
>
>
> Masked arrays were invented with the specific goal of avoiding carrying
> NaNs in computations.  Back in the day, NaNs were not available on some
> platforms and had significant performance issues on others.  These days NaN
> support for floating point types is nearly universal, but numpy types are
> not limited to floating point.
>
>
From the numpy.ma docstring:
"Arrays sometimes contain invalid or missing data.  When doing operations
    on such arrays, we wish to suppress invalid values, which is the purpose
    masked arrays fulfill (an example of typical use is given below)."

A few lines down:
"Here, we construct a masked array that suppress all ``NaN`` values.  We
    may now proceed to calculate the mean of the other values"

Note the repeated usage of the term "suppress" in the context of the input
arrays. The phrase "We may now proceed to calculate the mean of the other
values" implies that the mean of a masked array is taken to be the mean of
everything but the masked values. If there are no values remaining, then I
expect it to give me the equivalent of np.mean([]).
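
A minimal sketch of that expectation, assuming "the mean of the other values"
means np.mean() over the unmasked entries only. The helper name
plain_and_masked_mean is hypothetical, and what np.ma returns in the
all-masked and empty cases may differ between numpy versions, so no outputs
are shown:

```python
import warnings

import numpy as np


def plain_and_masked_mean(values, masked_value=0):
    """Return (mean of the unmasked values, np.ma mean) for comparison.

    Hypothetical helper for illustration; not part of numpy.
    """
    arr = np.asarray(values, dtype=float)
    marr = np.ma.masked_values(arr, masked_value)
    with warnings.catch_warnings():
        # np.mean of an empty array emits a RuntimeWarning and returns nan.
        warnings.simplefilter("ignore", RuntimeWarning)
        expected = np.mean(marr.compressed())  # mean of what is left
        actual = marr.mean()                   # what numpy.ma reports
    return expected, actual


for values in ([0, 1, 2], [0, 0], [0], []):
    expected, actual = plain_and_masked_mean(values)
    print(values, "-> np.mean of unmasked:", expected, "| np.ma mean:", actual)
```

Wherever the two columns disagree is the inconsistency under discussion.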

>> Having np.mean([]) return the same thing as np.ma.mean([]) makes
>> complete sense.
>
> Does the following make sense as well?
>
> >>> import numpy
> >>> numpy.ma.masked_values([0, 0], 0).mean()
> masked
> >>> numpy.ma.masked_values([0], 0).mean()
> masked
> >>> numpy.ma.masked_values([], 0).mean()
> * Two warnings *
> masked_array(data = nan,
>              mask = False,
>        fill_value = 0.0)
>
>
No, I would consider the first two to be bugs. And actually, returning a
masked array in the third one is also incorrect in this case. The result
should be a scalar. This is now veering into the same issues raised in the
np.nanmean([]) vs. np.nanmean([np.nan]) discussion.
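
For reference, a short sketch of the two nanmean cases mentioned above. In
both, no non-NaN values remain, so np.nanmean is expected to emit a
RuntimeWarning and return nan; the variable names are only illustrative:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Both calls warn about taking the mean of an empty slice.
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_result = np.nanmean([])            # no values at all
    all_nan_result = np.nanmean([np.nan])    # one value, but it is ignored

print("np.nanmean([]):", empty_result)
print("np.nanmean([np.nan]):", all_nan_result)
```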

Cheers!
Ben Root