[Numpy-discussion] Splitting MaskedArray into a separate package

Sebastian Berg sebastian at sipsolutions.net
Wed May 23 17:48:07 EDT 2018


On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:
> On 05/23/2018 04:02 PM, Eric Firing wrote:
> > Bad or missing values (and situations where one wants to use a mask
> > to operate on a subset of an array) are found in many domains of
> > real life; do you really want python users in those domains to have
> > to fall back on Matlab-style reliance on nans and/or manual mask
> > manipulations, as the new maskedarray package is sidelined?
> 
> I also think that missing value support is important to include
> inside numpy, just as it is included in other numerical packages like
> R and Julia.
> 
> The time is ripe to write a new and better MaskedArray, because
> __array_ufunc__ exists now. With some other numpy devs a few months
> ago we also played with rewriting MA using __array_ufunc__ and fixing
> up all the bugs and inconsistencies we have discovered over time
> (e.g., getting rid of the Masked constant). Both Eric and I started
> working on some code changes, but never submitted PRs. See a little
> bit of discussion here (there was some more elsewhere I can't find
> now):
> 
> https://github.com/numpy/numpy/pull/9792#issuecomment-333346420
> 
> As I say there, numpy's current MA support is pretty poor compared to
> R - Wes McKinney partly justified his desire to move pandas away from
> numpy because of it. We have a lot to gain by implementing it nicely.
> 
> We already have an NEP discussing possible ways forward:
> https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html
> 
> I was pretty excited by the discussion above, and still am. I want to get
> back to it after I finish more immediate priorities - finishing
> printing/loading/saving fixes and structured array fixes.
> 
> But Masked-Array-2 is on my list of desired long-term enhancements
> for numpy.

Well, if we plan to replace it within numpy, I think we should wait
until then for any move on deprecation (after which it seems like the
obviously right choice)?

If we do not plan to replace it within numpy, we need to discuss a bit
how it might affect infrastructure (multiple implementations...).

There is the other discussion about how to replace it: either by
opening up/creating new masked dtypes or similar (cool, but unclear how
complex/long term), or `__array_ufunc__` based (relatively simple, and
it will get rid of the nastier hacks that are currently needed).

Or even both, just on different time scales?
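
To make the `__array_ufunc__` option a bit more concrete, here is a rough
sketch of what such a container could look like. This is only an
illustration under my own assumptions, not the design anyone has
proposed: the class name `SimpleMasked` and its `data`/`mask` attributes
are hypothetical, it handles only plain elementwise ufunc calls, and it
simply masks the output wherever any input is masked.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class SimpleMasked(NDArrayOperatorsMixin):
    """Hypothetical toy container: a data array plus a boolean mask."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        if mask is None:
            mask = np.zeros(self.data.shape, dtype=bool)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch only handles plain elementwise calls with one output
        # and no keyword arguments (no out=, where=, reductions, ...).
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented

        datas = []
        out_mask = np.zeros((), dtype=bool)
        for x in inputs:
            if isinstance(x, SimpleMasked):
                datas.append(x.data)
                # The output is masked wherever any input is masked.
                out_mask = out_mask | x.mask
            else:
                datas.append(x)

        # Values under the mask may be garbage, so silence FP warnings.
        with np.errstate(all="ignore"):
            result = ufunc(*datas)
        mask = np.broadcast_to(out_mask, np.shape(result)).copy()
        return SimpleMasked(result, mask)


a = SimpleMasked([1.0, 2.0, 3.0], mask=[False, True, False])
b = SimpleMasked([10.0, 20.0, 30.0], mask=[False, False, True])
c = a + b        # the mixin forwards this to np.add, then __array_ufunc__
d = np.sqrt(a)   # unary ufuncs go through the same hook
```

The point is just that no subclassing of ndarray and no special-casing
inside the ufunc machinery is needed; everything a real replacement
would have to add (reductions, `out=`, fill values, printing) is left
out here.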

My first gut feeling about the proposal is: I love the idea of getting
rid of it... but let's not do it, since it feels like it leaves too much
infrastructure unclear.

- Sebastian


> 
> Allan
> 