[Numpy-discussion] alterNEP - was: missing data discussion round 2

Lluís xscript at gmx.net
Fri Jul 1 13:47:16 EDT 2011

Nathaniel Smith writes:

> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett <matthew.brett at gmail.com>
>> wrote:
>>> Do you see problems with the alterNEP proposal?
>> Yes, I really like my design as it stands now, and the alterNEP removes a
>> lot of the abstraction and interoperability that are in my opinion the best
>> parts. I've made more updates to the NEP based on continuing feedback, which
>> are part of the pull request I want reviews for.
>>> If so, what are they?
>> Mainly: Reduced interoperability, more complex implementation (leading to
>> more bugs), and an unclear theoretical model for the masked part of it.

> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*

Interoperability improves code maintenance; see my other mail.

> Do you have a clearer theoretical model for the masked part of your
> proposal? The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing! It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP. Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I

Come on, let's not jump down each other's throats. I think we long ago
reached the point where we all know what "masked" means.

If you agree on the interoperability point, then I don't see how the
aNEP improves on it, bearing in mind that masks must be *explicitly*
activated (again, see the other mail).

> Well, that's not true. There are some marginal advantages in the
> special case of working with integers+NAs. But I don't think anyone's
> making that argument.

I for one would love that, instead of having to explicitly set dtypes
when using genfromtxt.
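
To illustrate the genfromtxt point, here is a minimal sketch of how an
integer column with a missing field is handled by NumPy as it exists today
(not by either proposal). The sample text is made up for illustration, and
the exact defaults (float output with nan, -1 as the integer fill value) are
how I understand genfromtxt's documented behaviour.

```python
import io
import numpy as np

# Hypothetical sample data: the middle field of the second row is missing.
text = "1,2,3\n4,,6\n"

# With the default dtype (float), the missing field is read as nan.
a = np.genfromtxt(io.StringIO(text), delimiter=",")

# With an explicit integer dtype there is no nan to fall back on, so the
# missing field is replaced by the default integer fill value.
b = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=int)

# usemask=True returns a masked array that records which field was missing.
c = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=int, usemask=True)

print(a)
print(b)
print(c)
```

An integer dtype with native NA support would remove the need to choose
between a float array, a sentinel fill value, and a mask here.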

> But as far as I can tell right now, every single person who has
> experience with handling missing data for statistical purposes (esp.
> in R) has real concerns about your proposal, and AFAICT the community
> has very much *not* reached consensus on how these features should
> look.

What I have seen is that people used to R see the mask concept as alien,
and say "I don't want to use it, so please make it more explicit so that
I will know what to avoid". My point is that np.IGNORE does not have to
be made explicit for you to avoid masks: simply do not create arrays
with masks.
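
As a rough analogy, the existing numpy.ma module already works this way: a
mask only exists if you ask for one. The sketch below uses numpy.ma as a
stand-in and is not the API of the NEP or the aNEP (np.IGNORE and np.NA are
proposals, not existing names); the array values are arbitrary.

```python
import numpy as np

# A plain array has no mask; nothing is ever hidden from a reduction.
x = np.array([1.0, 2.0, 3.0])
plain_sum = x.sum()

# A mask appears only when it is requested explicitly.
m = np.ma.masked_array(x, mask=[False, True, False])
masked_sum = m.sum()          # the masked element is skipped
hidden_value = m.data[1]      # the underlying data is left untouched

# For contrast, a nan (used here as a stand-in for an NA) propagates
# through the same reduction instead of being skipped.
n = np.array([1.0, np.nan, 3.0])
na_like_sum = n.sum()

print(plain_sum, masked_sum, hidden_value, na_like_sum)
```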


 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
