[Numpy-discussion] MaskedArray __setitem__ Performance

Pierre GM pgmdevlist at gmail.com
Sat Feb 16 15:21:32 EST 2008


> Can I safely carry around the data, mask and MaskedArray? I'm
> considering working along the lines of the following conceptual
> outline:

That depends a lot on what calculate_results does, and whether you update the 
arrays in place or not.

> d = numpy.array(shape, dtype)
> m = numpy.array(shape, bool)
> a = numpy.ma.MaskedArray(d, m)

You should be able to update d and m, and have the changes passed on to a (as 
long as you're not using copy=True). You have to make sure that m really has 
a dtype of MaskType (i.e. bool), else you'll break the connection.

Explanation: in MaskedArray.__new__, the mask argument is converted to a dtype 
of MaskType (bool). If the mask is originally an array of integers, for 
example, a copy is made, and the _mask of your masked array does not point to 
`mask`. For example:
>>> import numpy
>>> d = numpy.array([1, 2, 3])
>>> m = numpy.array([0, 0, 1])
>>> x = numpy.ma.array(d, mask=m)
>>> print(x)
[1 2 --]
>>> d[0] = 17
>>> print(x)
[17 2 --]

OK, x is properly updated. If now we try to change the mask:

>>> m[0] = 1
>>> print(x)
[17 2 --]

x is not updated: x._mask doesn't point to m, but to a copy of m that was 
made when the dtype was converted from int to bool.
Now, if we ensure that m is an array of booleans:
>>> d = numpy.array([1, 2, 3])
>>> m = numpy.array([0, 0, 1], dtype=bool)
>>> x = numpy.ma.array(d, mask=m)
>>> print(x)
[1 2 --]
>>> d[0] = 17
>>> print(x)
[17 2 --]
>>> m[0] = 1
>>> print(x)
[-- 2 --]
m was of the correct dtype in the first place, so no copy is made, and x._mask 
does point to m.
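
If you'd rather check the connection than trust it, something along these 
lines should do. It is only a sketch: mask_is_shared is a helper name I made 
up for illustration, and it assumes a NumPy recent enough to provide 
numpy.shares_memory.

```python
import numpy as np

def mask_is_shared(mask):
    """Return True if a masked array built from `mask` reuses its memory.

    Hypothetical helper, for illustration only.
    """
    data = np.array([1, 2, 3])
    x = np.ma.array(data, mask=mask)  # copy=False is the default
    return np.shares_memory(np.ma.getmaskarray(x), mask)

int_mask = np.array([0, 0, 1])               # integer dtype: converted, so copied
bool_mask = np.array([0, 0, 1], dtype=bool)  # already bool: used as is

print(mask_is_shared(int_mask))   # False, the mask was copied
print(mask_is_shared(bool_mask))  # True, the mask is shared
```

A False here means that later changes to the mask array will not show up in 
the masked array.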

In short: in your example, updating d and m should work and be more efficient 
than updating a directly.
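
As a rough sketch of that workflow, assuming a 1-D float array (the shape 
and values below are made up, since I don't know what calculate_results 
actually produces):

```python
import numpy as np

shape = (4,)                      # hypothetical shape
d = np.zeros(shape, dtype=float)  # data buffer
m = np.zeros(shape, dtype=bool)   # mask buffer, already bool (MaskType)
a = np.ma.MaskedArray(d, mask=m)  # copy=False by default: a uses d and m

# Update the plain arrays in place; a sees the changes.
d[:] = [1.0, 2.0, 3.0, 4.0]
m[1] = True

total = a.sum()  # the masked entry is skipped
print(a)
print(total)     # 8.0
```

Note that the updates have to be in place (d[:] = ..., m[1] = ...). Rebinding 
d or m to a new array leaves a pointing at the old buffers.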


