[Numpy-discussion] bug with fill_values in masked arrays?

Fri Mar 21 20:24:40 EDT 2008

On Friday 21 March 2008 12:55:11 Chris Withers wrote:
> Pierre GM wrote:
> > On Wednesday 19 March 2008 19:47:37 Matt Knox wrote:
> >>> 1. why am I not getting my NaN's back?
> >
> > Because they're gone when you create your masked array.
>
> Really? At least one other post has disagreed with that.

Well, yeah, my bad, that depends on whether you use masked_invalid or 
fix_invalid or just build a basic masked array.

Example:
>>> import numpy as np
>>> import numpy.ma as ma
>>> x = np.array([1, np.nan, 3])
>>> # Basic construction: the NaN stays in the data and nothing is masked
>>> y = ma.array(x)
>>> y
masked_array(data = [  1.  NaN   3.],
      mask = False,
      fill_value=1e+20)
>>> # masked_invalid: the NaN is masked but kept in the underlying data
>>> y = ma.masked_invalid(x)
>>> y
masked_array(data = [1.0 -- 3.0],
      mask = [False  True False],
      fill_value=1e+20)
>>> y._data
array([  1.,  NaN,   3.])
>>> # fix_invalid: the NaN is masked and replaced by the fill value
>>> y = ma.fix_invalid(x)
>>> y
masked_array(data = [1.0 -- 3.0],
      mask = [False  True False],
      fill_value=1e+20)
>>> y._data
array([  1.00000000e+00,   1.00000000e+20,   3.00000000e+00])

> And it does seem odd that a value, even if it's a nan, would be
> destroyed...

Having NaNs in an array usually reduces performance: the option we follow w/ 
fix_invalid is to clear the masked array of the NaNs and keep track of 
where they were by setting the mask to True at the appropriate locations. That 
way, you avoid the performance drop of having NaNs in your underlying 
array.
Oh, and NaNs will be transformed to 0 if you use ints...
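
A minimal sketch of that trade-off, assuming a current NumPy (the -999.0 fill 
value is only an illustrative choice, not anything special): fix_invalid 
overwrites the NaN in the underlying data, and filled() can put NaNs back at 
the masked positions if you really want them.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, np.nan, 3.0])

# fix_invalid masks the NaN and overwrites it in a copy of the data.
# -999.0 is an arbitrary, illustrative replacement value.
y = ma.fix_invalid(x, fill_value=-999.0)
print(y.mask)   # where the NaNs were
print(y.data)   # underlying data, no NaN left

# filled() accepts any replacement value, so the NaNs can be
# restored on demand at the masked positions.
restored = y.filled(np.nan)
print(restored)
```

The original x is untouched here because fix_invalid copies by default.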

> > The idea here is to
> > get rid of the nan in your data
>
> No, it's to mask them, otherwise I would have used a normal array, not a
> ma.

Nope, the idea really is to make things as efficient as possible. Now, you 
can still have your nans if you're ready to eat them.

> > to avoid potential problems while keeping
> > track of where the nans were in the first place.
>
> ...like plotting them on a graph, which the current behaviour makes
> unworkable, that you end up doing a myarray.filled(0) to get around it,
> with imperfect results.

Send an example. I don't seem to have this problem:

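# plot is assumed to come from matplotlib (e.g. a pylab session)
from matplotlib.pyplot import plot

x = np.arange(10, dtype=float)
x[5] = np.nan
y = ma.masked_invalid(x)

plot(x, 'ok-')
plot(y, 'sr-')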

> Right, but why, when the masked array is cast back to a list of numbers,
> is the fill_value of the ma not respected?

Because in your particular case, you're inspecting elements one by one, and 
then each masked element comes back as the masked singleton, which is a 
special value. That has nothing to do w/ the filling.
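
To make the distinction concrete, here is a small sketch, assuming a current 
NumPy (the -1.0 fill value is arbitrary): indexing a masked position hands 
back ma.masked, whereas filling the whole array first does use the array's 
fill_value.

```python
import numpy as np
import numpy.ma as ma

y = ma.masked_invalid(np.array([1.0, np.nan, 3.0]))
y.fill_value = -1.0  # arbitrary, illustrative fill value

# Indexing a masked position returns the ma.masked singleton,
# which does not carry the array's fill_value.
elem = y[1]
print(elem is ma.masked)

# To get plain numbers that respect fill_value, fill first, then convert.
as_list = y.filled().tolist()
print(as_list)

# Converting the masked array directly marks masked entries as None.
print(y.tolist())
```

So if you want a list of numbers with the fill value in place, call 
filled() on the array rather than looping over its elements.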

> >>> 2. why is the wrong fill value being used here?
> >>
> >> the second element in the array iteration here is actually the
> >> numpy.ma.masked constant, which always has the same fill value...
>
> ...and that's a bug.

And once again, it's not. numpy.ma.masked is a special value, like numpy.nan 
or numpy.inf.