[Numpy-discussion] Indexing structured masked arrays with multidimensional fields; what with fill_value?

Tue Dec 1 06:29:14 EST 2015

Hello,

Usually, a masked array's .fill_value attribute has ndim=0 and the
same dtype as the array's data attribute:

In [27]: ar = array((0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 0.0),
                    dtype="int, (2,3)float, float")

In [28]: arm = ma.masked_array(ar)

In [29]: arm.fill_value.ndim
Out[29]: 0

In [31]: arm.fill_value.dtype
Out[31]: dtype([('f0', '<i8'), ('f1', '<f8', (2, 3)), ('f2', '<f8')])

What should .fill_value be when I index the multidimensional field
"f1" in this case?  The current behaviour is:

In [32]: f = arm["f1"]

In [36]: f.fill_value
Out[36]:
array([[  1.00000000e+20,   1.00000000e+20,   1.00000000e+20],
       [  1.00000000e+20,   1.00000000e+20,   1.00000000e+20]])

This breaks the usual expectation that .fill_value has ndim=0, which
can cause bugs such as the one reported in issue #6723:
https://github.com/numpy/numpy/issues/6723

What should numpy do instead?  In pull request #6728, I propose to
change the behaviour so that arm["f1"].fill_value is set to
arm.fill_value["f1"].flat[0].  This is an arbitrary and somewhat
ad-hoc solution.  If I have chosen to set arm.fill_value["f1"] to
something non-uniform, such as array([[1., 2., 3.], [4., 5., 6.]]),
then every element of that fill value except the first is lost.  I
don't know whether this might lead to problems.  Does it matter?
See also:
http://stackoverflow.com/questions/33921579/what-practical-impact-if-any-does-the-fill-value-of-a-masked-array-have
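
To make the trade-off concrete, here is a minimal sketch of the
reduction described above.  It is not the code from the pull request:
scalar_fill_for_field is a hypothetical helper written only for this
illustration, and the dtype is spelled out explicitly to match Out[31].

```python
import numpy as np

# Same field layout as the dtype shown in Out[31].
dt = np.dtype([("f0", "<i8"), ("f1", "<f8", (2, 3)), ("f2", "<f8")])


def scalar_fill_for_field(fill_value, name):
    """Hypothetical helper: keep only the first element of a field's fill value."""
    return np.asarray(fill_value[name]).flat[0]


# Default case: every element of the "f1" fill value is the same,
# so reducing it to a scalar loses nothing.
arm = np.ma.masked_array(np.zeros((), dtype=dt))
default_reduced = scalar_fill_for_field(arm.fill_value, "f1")

# Non-uniform case: a structured fill value whose "f1" entry varies.
custom_fill = np.array((0, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], 0.0), dtype=dt)
reduced = scalar_fill_for_field(custom_fill, "f1")

# Everything after the first element is what the reduction throws away.
dropped = np.asarray(custom_fill["f1"]).flat[1:].tolist()

print(np.ndim(reduced), reduced)  # scalar result, first element only
print(dropped)                    # the values that are discarded
```

With the default fill value the reduction is harmless, since all six
elements are equal.  With the non-uniform fill value only 1.0 survives
and 2.0 through 6.0 are dropped, which is the loss I am asking about.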

regards,
Gerrit.