[Numpy-discussion] alterNEP - was: missing data discussion round 2

Lluís xscript at gmx.net
Fri Jul 1 13:47:16 EDT 2011


Nathaniel Smith writes:

> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett <matthew.brett at gmail.com>
>> wrote:
>>> Do you see problems with the alterNEP proposal?
>> 
>> Yes, I really like my design as it stands now, and the alterNEP removes a
>> lot of the abstraction and interoperability that are in my opinion the best
>> parts. I've made more updates to the NEP based on continuing feedback, which
>> are part of the pull request I want reviews for.
>> 
>>> 
>>> If so, what are they?
>> 
>> Mainly: Reduced interoperability, more complex implementation (leading to
>> more bugs), and an unclear theoretical model for the masked part of it.

> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*

Interoperability improves code maintenance; see my other mail.


[...]
> Do you have a clearer theoretical model for the masked part of your
> proposal? The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing! It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP. Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I

Come on, let's not jump at each other's throats. I think we arrived
long ago at a point where we all know what masked means.

If you agree on the interoperability point, then I don't see how the
aNEP improves on it, bearing in mind that masks must be *explicitly*
activated (again, see the other mail).


[...]
> Well, that's not true. There are some marginal advantages in the
> special case of working with integers+NAs. But I don't think anyone's
> making that argument.

I for one would love that, instead of having to explicitly set dtypes
when using genfromtxt.
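
To make the genfromtxt point concrete, here is a minimal sketch using
only what genfromtxt offers today, not anything from either proposal.
The CSV text is made up for illustration. By default the whole table is
read as float so that the empty field can become NaN; keeping integers
means passing dtype explicitly and, in this sketch, asking for a mask to
record the gap:

```python
import io
import numpy as np

# Hypothetical CSV: an integer table with one empty field.
text = "1,2,3\n4,,6\n"

# Default behaviour: everything is parsed as float, the gap becomes NaN.
as_float = np.genfromtxt(io.StringIO(text), delimiter=",")

# Keeping integers requires an explicit dtype; usemask=True records
# which field was empty instead of relying on a fill value.
as_masked = np.genfromtxt(
    io.StringIO(text), delimiter=",", dtype=int, usemask=True
)

print(as_float.dtype, as_float[1, 1])
print(as_masked.dtype, as_masked.mask[1, 1])
```

With an integer dtype that could carry NAs directly, neither the float
fallback nor the explicit dtype would be needed.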


[...]
> But as far as I can tell right now, every single person who has
> experience with handling missing data for statistical purposes (esp.
> in R) has real concerns about your proposal, and AFAICT the community
> has very much *not* reached consensus on how these features should
> look.

What I have seen is that people used to R see the mask concept as
alien, and say "I don't want to use it, so please make it more explicit
so that I will know what to avoid". What I am saying is that np.IGNORE
does not have to be made explicit for you to avoid masks: simply do not
create arrays with masks.
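
As a rough illustration of "masks are opt-in", here is a sketch that
uses the existing numpy.ma module as a stand-in. It is an assumption
that this mirrors the NEP closely enough; the NEP's own API is not shown
here. A plain array never hides anything, and hiding only happens once a
mask is created explicitly:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Plain array: no mask exists, so every element takes part in the mean.
plain_mean = data.mean()

# Masking is an explicit step: hide the third element.
hidden = np.ma.masked_array(data, mask=[False, False, True, False])
masked_mean = hidden.mean()

# The underlying value is still there, it is only hidden.
still_there = hidden.data[2]

print(plain_mean, masked_mean, still_there)
```

Code that never takes the explicit masking step only ever sees the
plain-array behaviour.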


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


