[Numpy-discussion] how to use masked arrays

Wed May 14 11:30:01 EDT 2008

On Wednesday 14 May 2008 02:18:06 Christopher Burns wrote:
> I'm finding it difficult to tell which methods/operations respect the
> mask and which do not, in masked arrays.

Christopher, 
Unfortunately, there's no tutorial yet. Perhaps you could get one started on 
the scipy wiki? I'm afraid I won't have time to do it myself, but I'd be 
more than happy to fill in the gaps.

To answer some of your questions:
>>>import numpy as np, numpy.ma as ma
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])

* If you want to access the underlying data directly, these two commands are 
(almost) equivalent [1]:
>>>mydata._data
>>>mydata.view(np.ndarray)
Note that you lose the mask information, and that the values stored at the 
masked positions are whatever happens to be there, so they can be bogus.
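
One thing worth seeing in code is that both commands give you a view, not a 
copy. This is only a sketch, assuming a NumPy release that exposes the public 
.data attribute as a counterpart of ._data; the names raw and same are just 
for illustration.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

raw = mydata.view(np.ndarray)   # plain ndarray, the mask is gone
same = mydata.data              # assumed public counterpart of ._data

# Both are views on the same buffer, not copies: writing through
# one of them changes what the masked array holds underneath.
raw[1] = 10
print(type(raw).__name__, mydata[1], same[1])
```

If you need to modify the raw values without touching mydata, take an 
explicit copy first (for example raw.copy()).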

* If you want to get a copy of the underlying data with masked values set 
to "myvalue", use .filled(myvalue). 
>>>mydata.filled(-999)
array([-999,    1, -999,    3, -999,    5])

If you don't use any argument, ".filled" uses the "fill_value" attribute, 
whose value depends on the dtype:
>>>mydata.fill_value
999999
>>>mydata.filled()
array([999999,      1, 999999,      3, 999999,      5])
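
To illustrate the dependence on the dtype, here is a minimal sketch. It 
assumes the fill_value attribute can be assigned directly on an array; the 
arrays ints and floats are made up for the example.

```python
import numpy.ma as ma

ints = ma.array([0, 1, 2], mask=[1, 0, 0])
floats = ma.array([0.0, 1.0, 2.0], mask=[1, 0, 0])

# The default fill value follows the dtype of the array.
print(ints.fill_value, floats.fill_value)

# It can be overridden per array; .filled() then picks it up.
ints.fill_value = -1
print(ints.filled())
```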

Note that the argument of ".filled" is cast to the dtype of mydata. 
>>>mydata.dtype
dtype('int64')
>>>mydata.filled(np.pi)
array([3, 1, 3, 3, 3, 5])
That can be a problem if you wanted to use NaNs as filling values (a bad idea 
in itself):
>>>mydata.filled(np.nan)
array([0, 1, 0, 3, 0, 5])
Here, you don't have the NaNs you expected because NaNs are for floats, not 
integers.
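
If you really do need NaNs in the output, one possible workaround (a sketch, 
not a recommendation) is to convert to a float dtype before filling. The names 
as_float and with_nans are only illustrative.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# Convert to a float dtype first, so that NaN is representable.
# astype returns a masked array that keeps the mask.
as_float = mydata.astype(float)
with_nans = as_float.filled(np.nan)
print(with_nans)
```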

* Because masked arrays inherit from ndarrays, there's also a "fill" method 
available: it acts directly on the ._data part and sets all the values at 
once, masked or not. The mask is preserved.
>>>mydata.fill(-999)
>>>print mydata
[-- -999 -- -999 -- -999]

You could achieve the same result with this command
>>>mydata.flat = -999
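
Since ".fill" and ".filled" are easy to confuse, here is a short side-by-side 
sketch. It assumes the public .data and .mask attributes are available; 
filled_copy is just an illustrative name.

```python
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

filled_copy = mydata.filled(-999)   # new plain ndarray, mydata is untouched
mydata.fill(-999)                   # in place: overwrites every stored value

print(filled_copy)
print(mydata.data)                  # all six stored values were replaced
print(mydata.mask)                  # the mask is the same as before the fill
```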

* Assigning a value to a slice of mydata will modify the mask:
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>mydata[:2] = -999
>>>print mydata
[-999 -999 -- 3 -- 5]
>>>mydata[-2:] = ma.masked
>>>print mydata
[-999 -999 -- 3 -- --]
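
The same two assignments, this time inspecting the mask itself rather than 
the printed array. A small sketch only; it assumes the public .mask and .data 
attributes and the ma.count_masked function.

```python
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

mydata[:2] = -999          # plain values unmask the slice (soft mask)
mydata[-2:] = ma.masked    # ma.masked masks it, stored values are kept

print(mydata.mask)
print(ma.count_masked(mydata))
print(mydata.data[-2:])    # the data under the new masked entries
```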

* If you want to make sure you don't unmask data by mistake with slice 
assignments, use a hard mask: set the ._hardmask attribute to True (it is 
False by default), for instance with the hard_mask keyword at creation:
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0], hard_mask=True)
>>>mydata[:2] = -999
>>>print mydata
[-- -999 -- 3 -- 5]
You can change the value of ._hardmask either directly, or with the 
soften_mask() and harden_mask() methods.
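
A sketch of switching between the two behaviours on the same array. It 
assumes a read-only public .hardmask attribute mirroring ._hardmask; 
hard_result and soft_result are illustrative names.

```python
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0],
                  hard_mask=True)

mydata[:2] = -999            # hard mask: element 0 stays masked
hard_result = mydata.mask.tolist()

mydata.soften_mask()         # back to the default (soft) behaviour
mydata[:2] = -999            # now the assignment unmasks element 0
soft_result = mydata.mask.tolist()

mydata.harden_mask()         # protect the remaining masked entries again
print(hard_result, soft_result, mydata.hardmask)
```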

* 
> Basic methods respect the mask, like mydata.mean(), but np.asarray
> ignores the mask.

Yes, np.asarray(x) is equivalent to np.array(x, copy=False, subok=False). If 
you want to keep the mask, use np.asanyarray, which is equivalent to 
np.array(x, copy=False, subok=True) [2]

>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>print mydata.mean()
3.0
>>>print np.asarray(mydata).mean()
2.5
>>>print np.asanyarray(mydata).mean()
3.0
>>>print np.mean(mydata)
3.0
On the last command, np.mean(mydata) first tries to access the .mean method 
of mydata: if mydata didn't have such a method, it would be equivalent to 
np.asarray(mydata).mean()
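
If you write your own functions and want them to respect the mask, something 
along these lines may help. This is only a sketch; safe_mean is a hypothetical 
helper, not something provided by numpy.

```python
import numpy as np
import numpy.ma as ma

def safe_mean(values):
    """Hypothetical helper: average any array-like without dropping a mask."""
    arr = np.asanyarray(values)   # keeps ndarray subclasses such as MaskedArray
    return arr.mean()

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])
print(safe_mean(mydata), safe_mean([0, 1, 2, 3, 4, 5]))
```

Using np.asarray inside the helper instead would silently average the masked 
values as well.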

Hope it helps, don't hesitate to ask for more details/explanations. Specific 
examples are always easier.
I'm looking forward to your wiki page ;)

P.

[1] Almost: mydata._data is in fact a shortcut to 
mydata.view(mydata._baseclass), where ._baseclass is the class of the 
underlying data. For example
>>>mxdata=ma.array(np.matrix([[1,2,],[3,4,]]),mask=[[1,0],[0,0]])
>>>print mxdata._baseclass
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata._data)
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata.view(np.ndarray))
<type 'numpy.ndarray'>

[2] Note that np.asanyarray returns a masked array in numpy.ma only, not in 
previous implementations.