[Numpy-discussion] masked arrays as array indices (is a bad idea)

Ernest Adrogué eadrogue at gmx.net
Mon Sep 21 16:23:42 EDT 2009


21/09/09 @ 14:43 (-0400), thus spake Pierre GM:
> 
> 
> On Sep 21, 2009, at 12:17 PM, Ryan May wrote:
> 
> > 2009/9/21 Ernest Adrogué <eadrogue at gmx.net>
> > Hello there,
> >
> > Given a masked array such as this one:
> >
> > In [19]: x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)
> >
> > In [20]: x
> > Out[20]:
> > masked_array(data = [-- -- 0 -- 2],
> >             mask = [ True  True False  True False],
> >       fill_value = 999999)
> >
> > When you make an assignment in the vein of x[x == 0] = 25
> > the result can be a bit puzzling:
> >
> > In [21]: x[x == 0] = 25
> >
> > In [22]: x
> > Out[22]:
> > masked_array(data = [25 25 25 25 2],
> >             mask = [False False False False False],
> >       fill_value = 999999)
> >
> > Is this the correct result or have I found a bug?
> >
> > I see the same here on 1.4.0.dev7400.  Seems pretty odd to me.  Then  
> > again, it's a bit more complex using masked boolean arrays for  
> > indexing since you have True, False, and masked values.  Anyone have  
> > thoughts on what *should* happen here?  Or is this it?
> 
> Using a masked array in fancy indexing is always a bad idea, as  
> there's no way of guessing the behavior one would want for missing  
> values: should they be evaluated as False ? True ? You should really  
> use the `filled` method to control the behavior.
> 
>  >>> x[(x==0).filled(False)]
> masked_array(data = [0],
>               mask = [False],
>         fill_value = 999999)
>  >>> x[(x==0).filled(True)]
> masked_array(data = [-- -- 0 --],
>               mask = [ True  True False  True],
>         fill_value = 999999)
> 
> P.
> 
> [If you're really interested:
> When testing for equality, a masked array is first filled with 0 (that  
> was the behavior of the first implementation of numpy.ma), tested for  
> equality, and the mask of the result set to the mask of the input.   
> When used in fancy indexing, a masked array is viewed as a standard  
> ndarray by dropping the mask. In the current case, the combination is  
> therefore equivalent to (x.filled(0)==0), which explains why the  
> missing values are treated as True... I agree that the prefilling may  
> not be necessary...]

This explains why x[x == 3] = 4 works "as expected", whereas
x[x == 0] = 4 ruins everything. Basically, any condition that matches
0 will match every masked item as well.
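
The workaround Pierre suggests can be sketched as follows: fill the
masked comparison with False before indexing, so that masked entries
are never selected. This is only a minimal sketch; the name idx is
mine, and the unfilled x[x == 0] case is deliberately left out because
its result may depend on the NumPy version.

```python
import numpy as np

x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)

# Turn the masked comparison into a plain boolean ndarray,
# treating masked entries as "no match".
idx = (x == 0).filled(False)

# Only the unmasked zero is assigned; masked entries stay masked.
x[idx] = 25

print(idx.tolist())
print(x.filled(-1).tolist())
print(np.ma.getmaskarray(x).tolist())
```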

I don't know, but maybe it would be better to raise an exception when
the index is a masked array. The current behaviour seems a bit
confusing to me.
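
Such a check could look roughly like the sketch below. The helper name
require_unmasked_index is hypothetical and not part of numpy.ma; it
only illustrates rejecting an index that still carries masked entries
instead of silently dropping the mask.

```python
import numpy as np

def require_unmasked_index(index):
    """Hypothetical helper: return a plain ndarray index, or raise."""
    # np.ma.is_masked is True when at least one entry is masked.
    if np.ma.is_masked(index):
        raise IndexError("index contains masked values; call .filled() first")
    # No masked entries, so dropping the mask loses nothing.
    return np.ma.getdata(index)

x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)

try:
    x[require_unmasked_index(x == 0)] = 25
    raised = False
except IndexError:
    raised = True  # the ambiguous index is rejected, x is untouched

# An explicitly filled index passes the check.
x[require_unmasked_index((x == 0).filled(False))] = 25
```

Whether the check should reject every masked array or only those with
at least one masked entry is a design choice; the sketch assumes the
latter.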

-- 
Ernest


