[Numpy-discussion] masked arrays as array indices (is a bad idea)
Ernest Adrogué
eadrogue at gmx.net
Mon Sep 21 16:23:42 EDT 2009
21/09/09 @ 14:43 (-0400), thus spake Pierre GM:
>
>
> On Sep 21, 2009, at 12:17 PM, Ryan May wrote:
>
> > 2009/9/21 Ernest Adrogué <eadrogue at gmx.net>
> > Hello there,
> >
> > Given a masked array such as this one:
> >
> > In [19]: x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)
> >
> > In [20]: x
> > Out[20]:
> > masked_array(data = [-- -- 0 -- 2],
> >              mask = [ True True False True False],
> >        fill_value = 999999)
> >
> > When you make an assignment in the vein of x[x == 0] = 25
> > the result can be a bit puzzling:
> >
> > In [21]: x[x == 0] = 25
> >
> > In [22]: x
> > Out[22]:
> > masked_array(data = [25 25 25 25 2],
> >              mask = [False False False False False],
> >        fill_value = 999999)
> >
> > Is this the correct result or have I found a bug?
> >
> > I see the same here on 1.4.0.dev7400. Seems pretty odd to me. Then
> > again, it's a bit more complex using masked boolean arrays for
> > indexing since you have True, False, and masked values. Anyone have
> > thoughts on what *should* happen here? Or is this it?
>
> Using a masked array in fancy indexing is always a bad idea, as
> there's no way of guessing the behavior one would want for missing
> values: should they be evaluated as False ? True ? You should really
> use the `filled` method to control the behavior.
>
> >>> x[(x==0).filled(False)]
> masked_array(data = [0],
>              mask = [False],
>        fill_value = 999999)
> >>> x[(x==0).filled(True)]
> masked_array(data = [-- -- 0 --],
>              mask = [ True True False True],
>        fill_value = 999999)
>
> P.
>
> [If you're really interested:
> When testing for equality, a masked array is first filled with 0 (that
> was the behavior of the first implementation of numpy.ma), tested for
> equality, and the mask of the result set to the mask of the input.
> When used in fancy indexing, a masked array is viewed as a standard
> ndarray by dropping the mask. In the current case, the combination is
> therefore equivalent to (x.filled(0)==0), which explains why the
> missing values are treated as True... I agree that the prefilling may
> not be necessary...]
This explains why x[x == 3] = 4 works "as expected", whereas
x[x == 0] = 4 ruins everything. Basically, any condition that matches
0 will match every masked item as well.
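
To make the mechanism concrete, here is a plain-Python model of the steps Pierre describes above (fill the masked entries with 0, compare, then drop the mask when the result is used as an index). It is only a sketch of the behaviour reported in this thread, not numpy.ma's actual code, and `model_eq_index` is a made-up name:

```python
# Plain-Python model of the behaviour described in this thread.
# `model_eq_index` is a hypothetical name, not part of numpy.

data = [-1, -1, 0, -1, 2]
mask = [True, True, False, True, False]   # True marks a masked (missing) item


def model_eq_index(data, mask, value):
    """Return the boolean index that `x == value` would yield once the mask is dropped."""
    # Step 1: masked entries are pre-filled with 0 before the comparison.
    filled = [0 if m else d for d, m in zip(data, mask)]
    # Step 2: compare element-wise; the mask is ignored when indexing.
    return [v == value for v in filled]


print(model_eq_index(data, mask, 0))  # [True, True, True, True, False]
print(model_eq_index(data, mask, 3))  # [False, False, False, False, False]
```

Comparing against 0 selects the three masked positions as well as the real 0, while comparing against 3 selects nothing, which matches what I see.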
I don't know, but maybe it would be better to raise an exception when
the index is a masked array. The current behaviour seems a bit
confusing to me.
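
As a rough illustration of that idea, a small wrapper could refuse an index that still carries masked values, forcing an explicit choice via `filled`. The name `checked_index` is hypothetical (nothing like it exists in numpy, as far as I know); the sketch relies only on `np.ma.is_masked` and `np.asarray`:

```python
import numpy as np


def checked_index(cond):
    """Hypothetical helper: reject boolean indices that contain masked values."""
    if np.ma.is_masked(cond):
        raise ValueError(
            "index contains masked values; call .filled(True) or "
            ".filled(False) on it first"
        )
    # No masked values left, so converting to a plain ndarray loses nothing.
    return np.asarray(cond, dtype=bool)


x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)

try:
    x[checked_index(x == 0)] = 25      # ambiguous: the condition is itself masked
except ValueError as exc:
    print("refused:", exc)

# Explicit choice: treat missing values as "no match".
x[checked_index((x == 0).filled(False))] = 25
```

With the explicit `filled(False)`, only the real 0 is replaced and the three masked items stay masked.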
--
Ernest