[Numpy-discussion] MaskedArray.__array__ bug? (was 'A reimplementation of MaskedArray')

Michael Sorich michael.sorich at gmail.com
Sun Nov 19 22:18:41 EST 2006


On 11/9/06, Paul Dubois <pfdubois at gmail.com> wrote:
> 2 cents from the author of the first folio:
>
> The intent was to allow creation of masked arrays with copy=no, so that the
> original data could be retrieved from it if desired. But I was quite, quite
> rigorous about NEVER assuming the data in a masked slot made any sense
> whatsoever.
>
> The intention was that there are two ways to get a numeric array out of a
> masked one:
>
> 1. Get the data field
> 2. m.filled()
>
> (1) is strictly at your own risk.

Is there consensus that in __array__ it is incorrect to return the
_data component of a MaskedArray when there are masked values? It
certainly worries me that what is underneath the mask is returned.

Currently, in the MaskedArray class in numpy.core.ma:
    def __array__ (self, t=None, context=None):
        "Special hook for numeric. Converts to numeric if possible."
        if self._mask is not nomask:
            if fromnumeric.ravel(self._mask).any():
                if context is None:
                    warnings.warn("Cannot automatically convert masked array to "\
                                  "numeric because data\n    is masked in one or "\
                                  "more locations.");
                    return self._data

I think the warning should be an exception. Alternatively, there could
be a subclass of MaskedArray for uses in which the masked values
represent missing data (so the data under the mask is spurious and
should not be exposed). In this subclass the __array__ method could be
redefined to impose stricter control over the masked data.
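
To make the proposal concrete, here is a minimal sketch of the stricter
behaviour. It is not a patch against numpy.core.ma: it assumes a small
standalone container, and the names StrictMasked and MaskedDataError are
hypothetical, made up for illustration. The only point is that __array__
raises instead of warning and returning _data.

```python
import numpy as np


class MaskedDataError(ValueError):
    """Hypothetical error: a conversion would expose data under the mask."""


class StrictMasked:
    """Hypothetical container: data plus a boolean mask (True = missing)."""

    def __init__(self, data, mask):
        self._data = np.asarray(data)
        self._mask = np.asarray(mask, dtype=bool)
        if self._mask.shape != self._data.shape:
            raise ValueError("data and mask must have the same shape")

    def filled(self, fill_value):
        """Return a new array with every masked slot replaced by fill_value."""
        return np.where(self._mask, fill_value, self._data)

    def __array__(self, dtype=None, copy=None):
        # Refuse the implicit conversion rather than warn and return _data.
        if self._mask.any():
            raise MaskedDataError(
                "cannot convert to a plain array: data is masked in one or "
                "more locations; use filled() instead"
            )
        # Nothing is masked, so the data is safe to hand out. `copy` is
        # accepted only for compatibility; this sketch always returns a copy.
        return np.array(self._data, dtype=dtype)
```

With this, np.asarray() on an object with no masked values still works,
while an object with any masked value has to go through filled(), which
is route (2) in Paul's list.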


