[Numpy-discussion] how to use masked arrays
Pierre GM
pgmdevlist at gmail.com
Wed May 14 11:30:01 EDT 2008
On Wednesday 14 May 2008 02:18:06 Christopher Burns wrote:
> I'm finding it difficult to tell which methods/operations respect the
> mask and which do not, in masked arrays.
Christopher,
Unfortunately, there's no tutorial yet. Perhaps you could get one started on
the scipy wiki? I'm afraid I won't have time to do it myself, but I'd be
more than happy to fill the gaps.
To answer some of your questions:
>>>import numpy as np, numpy.ma as ma
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
* If you want to access the underlying data directly, these two commands are
(almost) equivalent [1]:
>>>mydata._data
>>>mydata.view(np.ndarray)
Note that you lose the mask information, and that the values that were masked
can be bogus.
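For instance, here is a minimal sketch of both ways of getting at the raw data. It assumes a recent NumPy under Python 3, where the public .data attribute is an alias for ._data:

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# Both give a plain ndarray looking at the same buffer: no copy, no mask.
raw_attr = mydata.data
raw_view = mydata.view(np.ndarray)

print(type(raw_attr), type(raw_view))
print(raw_attr)  # the values sitting under the mask are exposed as-is
```

Since neither call copies anything, writing into raw_attr or raw_view also changes what mydata holds.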
* If you want to get a copy of the underlying data with masked values set
to "myvalue", use .filled(myvalue).
>>>mydata.filled(-999)
array([-999, 1, -999, 3, -999, 5])
If you don't use any argument, ".filled" uses the "fill_value" attribute,
whose value depends on the dtype:
>>>mydata.fill_value
999999
>>>mydata.filled()
array([999999, 1, 999999, 3, 999999, 5])
Note that the argument of ".filled" is cast to the dtype of mydata.
>>>mydata.dtype
dtype('int64')
>>>mydata.filled(np.pi)
array([3, 1, 3, 3, 3, 5])
That can be a problem if you wanted to use NaNs as filling values (a bad idea
in itself):
>>>mydata.filled(np.nan)
array([0, 1, 0, 3, 0, 5])
Here, you don't have the NaNs you expected because NaNs are for floats, not
integers.
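If you really do need NaNs in the output, one way around the casting is to convert to a float dtype before filling. This is only a sketch, assuming a recent NumPy under Python 3; it relies on .astype carrying the mask along:

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# An integer fill value fits the integer dtype and is used unchanged.
filled_int = mydata.filled(-999)

# NaN only exists for floats, so convert first and fill afterwards.
filled_nan = mydata.astype(float).filled(np.nan)

print(filled_int)
print(filled_nan)
```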
* Because masked arrays inherit from ndarrays, there's also a "fill" method
available: this one acts directly on the ._data part, setting all the
values at once. The mask is preserved.
>>>mydata.fill(-999)
>>>print mydata
[-- -999 -- -999 -- -999]
You could achieve the same result with this command
>>>mydata.flat = -999
* Assigning a value to a slice of mydata will modify the mask:
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>mydata[:2] = -999
>>>print mydata
[-999 -999 -- 3 -- 5]
>>>mydata[-2:] = ma.masked
>>>print mydata
[-999 -999 -- 3 -- --]
* If you want to make sure you don't unmask data by mistake with slice
assignments, set the ._hardmask attribute to True (it is False by default),
for example through the hard_mask argument at creation:
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0], hard_mask=True)
>>>mydata[:2] = -999
>>>print mydata
[-- -999 -- 3 -- 5]
You can change the value of ._hardmask either directly, or with the
soften_mask() and harden_mask() methods
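A short sketch of the two methods in action, assuming a recent NumPy under Python 3. The same slice assignment is done once with the mask hardened and once with it softened:

```python
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# Hard mask: the assignment cannot unmask the first entry.
mydata.harden_mask()
mydata[:2] = -999
mask_while_hard = mydata.mask.tolist()

# Soft mask again: the same assignment now unmasks it.
mydata.soften_mask()
mydata[:2] = -999
mask_while_soft = mydata.mask.tolist()

print(mask_while_hard)
print(mask_while_soft)
```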
*
> Basic methods respect the mask, like mydata.mean(), but np.asarray
> ignores the mask.
Yes, np.asarray(x) is equivalent to np.array(x, copy=False, subok=False). If
you want to keep the mask, use np.asanyarray, which is equivalent to
np.array(x, copy=False, subok=True) [2]
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>print mydata.mean()
3.0
>>>print np.asarray(mydata).mean()
2.5
>>>print np.asanyarray(mydata).mean()
3.0
>>>print np.mean(mydata)
3.0
On the last command, np.mean(mydata) first tries to access the .mean method of
mydata: if mydata had no such method, it would be equivalent to
np.asarray(mydata).mean()
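In your own functions, the same distinction decides whether the mask survives. Below is a minimal sketch, assuming a recent NumPy under Python 3; masked_aware_mean is a hypothetical helper written for this illustration, not something numpy.ma provides:

```python
import numpy as np
import numpy.ma as ma


def masked_aware_mean(x):
    """Hypothetical helper: average x without dropping a mask it may carry."""
    # asanyarray passes ndarray subclasses (such as MaskedArray) through.
    return np.asanyarray(x).mean()


mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

print(masked_aware_mean(mydata))              # masked entries are skipped
print(masked_aware_mean([0, 1, 2, 3, 4, 5]))  # plain input, nothing to skip
print(np.asarray(mydata).mean())              # mask dropped before averaging
```

Using np.asarray inside the helper instead would silently average the masked values too.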
Hope it helps, don't hesitate to ask for more details/explanations. Specific
examples are always easier.
I'm looking forward to your wiki page ;)
P.
[1] Almost: mydata._data is in fact a shortcut to
mydata.view(mydata._baseclass), where ._baseclass is the class of the
underlying data. For example
>>>mxdata=ma.array(np.matrix([[1,2,],[3,4,]]),mask=[[1,0],[0,0]])
>>>print mxdata._baseclass
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata._data)
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata.view(np.ndarray))
<type 'numpy.ndarray'>
[2] Note that np.asanyarray returns a masked array in numpy.ma only, not in
previous implementations.