[Numpy-discussion] Masked arrays: Rationale for "False convention"

Tue Oct 1 13:23:43 EDT 2013

On Tue, Oct 1, 2013 at 4:29 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Tue, Oct 1, 2013 at 3:57 AM, Ondřej Čertík <ondrej.certik at gmail.com>
> wrote:
>
>> I see, that makes sense. So to remember this, the rule is:
>>
>> "Specify elements that you want to get masked using True in 'mask'".
>
> Yes. This convention dates back at least to the original MA package in
> Numeric; I don't know if Paul Dubois stole it from any previous software.

I see, thanks.

>
> One way to motivate the convention is to think about doing a binary
> operation on masked arrays, which is really the most common kind of thing
> one does with masked arrays. The mask of the result is the logical OR of the
> two operand masks (barring additional masked elements from domain
> violations, 0/0, etc.).

In the other convention, you just use logical AND, so that seems equally
simple, unless I am missing something.
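
A small sketch of the point above, assuming the current numpy.ma behaviour for
addition. The names mask_or and valid_and are only illustrative. It checks that
the result mask is the OR of the operand masks, and that the same information
under a "True means valid" convention is the AND of the inverted masks:

```python
import numpy as np
from numpy import ma

a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
b = ma.array([10, 20, 30, 40], mask=[True, False, False, False])

# Binary operation: an element of the result is masked if it is
# masked in either operand.
c = a + b

# Current convention (True = masked): combine the masks with OR.
mask_or = np.logical_or(ma.getmaskarray(a), ma.getmaskarray(b))

# Hypothetical inverted convention (True = valid): combine with AND.
valid_and = np.logical_and(~ma.getmaskarray(a), ~ma.getmaskarray(b))

print(ma.getmaskarray(c))  # mask of the sum
print(mask_or)             # same as the line above
print(~valid_and)          # same again, by De Morgan's law
```

So the two conventions are equivalent here; one elementwise logical operation
either way.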

> I assume that the convention was decided mostly on
> what was most convenient and efficient for the common internal operations
> for *implementing* the masked arrays and not necessarily matching any
> particular intuitions when putting data *into* the masked arrays.

That makes sense.

On Mon, Sep 30, 2013 at 9:05 PM, Eric Firing <efiring at hawaii.edu> wrote:
> On 2013/09/30 4:57 PM, Ondřej Čertík wrote:
>>
>> But why do I need to invert the mask when I want to see the valid elements:
>>
>> In [1]: from numpy import ma
>>
>> In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
>>
>> In [3]: a
>> Out[3]:
>> masked_array(data = [1 2 -- 4],
>>               mask = [False False  True False],
>>         fill_value = 999999)
>>
>>
>> In [4]: a[~a.mask]
>> Out[4]:
>> masked_array(data = [1 2 4],
>>               mask = [False False False],
>>         fill_value = 999999)
>>
>>
>> I would find it natural to write [4] as a[a.mask]. This is where it gets confusing.
>
> There is no getting around it; each of the two possible conventions has
> its advantages.  But try this instead:
>
> In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
>
> In [3]: a.compressed()
> Out[3]: array([1, 2, 4])
>
>
> I do occasionally need a "goodmask" which is the inverse of a.mask, but
> not very often; and when I do, needing to invert a.mask doesn't bother me.

a.compressed() works for getting data out, but I also use the inverted
mask to assign data in, e.g.:

a[~a.mask] = 1
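
Spelled out as a self-contained sketch (goodmask is just an illustrative name,
borrowed from Eric's wording; using ma.getmaskarray instead of a.mask is my
assumption of a safer variant, since it always returns a full boolean array
even when nothing is masked):

```python
from numpy import ma

a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

# Boolean array that is True where the element is valid (not masked).
goodmask = ~ma.getmaskarray(a)

# Assign to the valid elements only; the masked element is left alone.
a[goodmask] = 1

print(a.compressed())  # the valid elements after the assignment
print(a.mask)          # the mask itself is unchanged
```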

Thanks everybody for the discussion. It sheds some light on the current
convention.

Ondrej