[Numpy-discussion] alterNEP - was: missing data discussion round 2

Benjamin Root ben.root at ou.edu
Sat Jul 2 16:10:31 EDT 2011


On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith <njs at pobox.com> wrote:

>
> I'm not sure what you mean here. If we have masked array support at
> all (and some people seem to want it), then we have to say more than
> "it's an array with a mask". Indexing such a beast has to do
> *something*, so we need some kind of theory to say what, ufuncs have
> to do *something*, ditto. I mean, I guess we could just say that a
> masked array is literally an np.ndarray where you have attached a
> field named "mask" that doesn't do anything, but I don't think that
> would really satisfy most users :-).
>
>
Indexing a masked array just returns an array with np.NA in the appropriate
elements.  This is no different than with regular ndarray objects or in
numpy.ma.  As for ufuncs, the NEP already addresses this in multiple ways.
For element-wise ufuncs, a "where" parameter is available for indicating
which elements to skip.  For reduction ufuncs, a "skipna" parameter will
indicate whether or not to skip the values.  On top of that, subclassed
ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function
that can set a default value for those parameters to make things easier for
masked array users.
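
To make the ufunc side concrete, here is a minimal sketch using only what released NumPy offers. Two things in it are assumptions rather than the NEP's API: the array names are made up for illustration, and since the "skipna" keyword from the NEP is not available in released NumPy, the reduction uses the existing "where" keyword as a stand-in for the same idea.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
# Hypothetical validity flags: False marks an element to be skipped.
valid = np.array([True, False, True, True])

# Element-wise ufunc: "where" selects which elements are computed.
# Skipped positions keep whatever was already stored in "out".
out = np.zeros_like(data)
np.multiply(data, 10.0, out=out, where=valid)

# Reduction: stand-in for the NEP's "skipna" idea, using the
# "where" keyword that ufunc reductions accept in released NumPy.
total = np.add.reduce(data, where=valid)

# The same reduction through numpy.ma, where the mask has the
# opposite sense (True means "masked out").
masked_total = np.ma.masked_array(data, mask=~valid).sum()

print(out, total, masked_total)
```

The skipped element of "out" is left at its initial value rather than being computed, which is why "out" is pre-filled here instead of being left uninitialized.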

> I don't know about others, but my main objection is this: He's
> proposing two different implementations for NA. I only need one, so
> having two is redundant and confusing. Of these two, the bit-pattern
> one has lower memory overhead (which many people have spoken up to say
> matters to them), and really obvious semantics (assignment is
> implemented as assignment, etc.). So why force people to make this
> confusing choice? What does the mask implementation add? AFAICT, its
> only purpose is to satisfy a rather different set of use cases. (See
> Gary Strangman's email here for a good description of these use cases:
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
> But AFAICT again, it's been crippled for those use cases in order to
> give it the NA semantics. So I just don't see who the masking part is
> supposed to help.
>
>
As a user of numpy.ma, I have always found masked arrays to be second-class
citizens. Developing new code with them always brought about new surprises and
discoveries of strange behavior from various functions. In this sense,
numpy.ma has always been crippled.  By sacrificing *some* of the existing
semantics (which would likely be taken care of by a re-implemented numpy.ma
to preserve backwards-compatibility), the masked array community gains a
first-class citizen in numpy, and numpy developers will have the
masked/missing data issue in the forefront whenever developing new functions
and libraries.  I am more than happy with that trade-off.  I am willing to
learn the new semantics so long as I have a guarantee that the functions I use
behave the way I expect them to.


> BTW, you can't access the memory of a masked value by taking a view,
> at least if I'm reading this version of the NEP correctly, and it
> seems to be the latest:
>
> https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst
> The only way to access the memory of a masked value is take a view
> *before* you mask it. And if the array has a mask at all when you take
> the view, you also have to set a.flags.ownmask = True, before you mask
> the value.
>

This isn't actually as bad as it sounds.  From a function's perspective, it
should only know the values that it has been given access to.  If I -- as a
user of said function -- decide that certain values should be unknown to the
function, I wouldn't want the function to be able to override that
decision.  Remember, it is possible that the masked element never was
initialized.  Therefore, we wouldn't want the function to use that element.
(Note, this is one of those "fun" surprises that a numpy.ma user sometimes
encounters when a function uses np.asarray instead of np.asanyarray).
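
A small sketch of that surprise with today's numpy.ma. The two helper functions are hypothetical and exist only to contrast the two conversion calls; the masked value 1000.0 stands in for data the caller meant to hide.

```python
import numpy as np

def mean_with_asarray(x):
    # np.asarray converts to a base ndarray, so the mask is dropped
    # and the hidden value takes part in the computation.
    return np.asarray(x).mean()

def mean_with_asanyarray(x):
    # np.asanyarray passes ndarray subclasses through unchanged,
    # so the MaskedArray (and its mask) survives.
    return np.asanyarray(x).mean()

m = np.ma.masked_array([1.0, 2.0, 1000.0], mask=[False, False, True])

leaky = mean_with_asarray(m)        # includes the masked 1000.0
respectful = mean_with_asanyarray(m)  # averages only 1.0 and 2.0

print(leaky, respectful)
```

The first function silently overrides the caller's decision to hide the third element, which is exactly the behavior a mask-aware ndarray is supposed to rule out.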

Ben Root