[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Wes McKinney wesmckinn at gmail.com
Fri Oct 28 20:45:55 EDT 2011


On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>
> On Friday, October 28, 2011, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers
>> <ralf.gommers at googlemail.com> wrote:
>>>
>>>
>>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris
>>>> <charlesr.harris at gmail.com> wrote:
>>>> >
>>>> >
>>>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett
>>>> > <matthew.brett at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett
>>>> >> <matthew.brett at gmail.com>
>>>> >> wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris
>>>> >> > <charlesr.harris at gmail.com> wrote:
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs at pobox.com>
>>>> >> >> wrote:
>>>> >> >>>
>>>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant
>>>> >> >>> <oliphant at enthought.com>
>>>> >> >>> wrote:
>>>> >> >>> > I think Nathaniel and Matthew provided very specific feedback
>>>> >> >>> > that was helpful in understanding other perspectives of a
>>>> >> >>> > difficult problem.  In particular, I really wanted bit-patterns
>>>> >> >>> > implemented.  However, I also understand that Mark did quite a
>>>> >> >>> > bit of work and altered his original designs quite a bit in
>>>> >> >>> > response to community feedback.  I wasn't a major part of the
>>>> >> >>> > pull request discussion, nor did I merge the changes, but I
>>>> >> >>> > support Charles if he reviewed the code and felt like it was
>>>> >> >>> > the right thing to do.  I likely would have done the same thing
>>>> >> >>> > rather than let Mark Wiebe's work languish.
>>>> >> >>>
>>>> >> >>> My connectivity is spotty this week, so I'll stay out of the
>>>> >> >>> technical discussion for now, but I want to share a story.
>>>> >> >>>
>>>> >> >>> Maybe a year ago now, Jonathan Taylor and I were debating what the
>>>> >> >>> best API for describing statistical models would be -- whether we
>>>> >> >>> wanted something like R's "formulas" (which I supported), or
>>>> >> >>> another approach based on sympy (his idea). To summarize, I thought
>>>> >> >>> his API was confusing, pointlessly complicated, and didn't actually
>>>> >> >>> solve the problem; he thought R-style formulas were superficially
>>>> >> >>> simpler but hopelessly confused and inconsistent underneath. Now,
>>>> >> >>> obviously, I was right and he was wrong. Well, obvious to me,
>>>> >> >>> anyway... ;-) But it wasn't like I could just wave a wand and make
>>>> >> >>> his arguments go away,
>>>> >> >>> no I should point out that the implementation hasn't - as far as
>>>> >> >>> I can
>> see - changed the discussion.  The discussion was about the API.
>> Implementations are useful for agreed APIs because they can point out
>> where the API does not make sense or cannot be implemented.  In this
>> case, the API Mark said he was going to implement - he did implement -
>> at least as far as I can see.  Again, I'm happy to be corrected.
>>
>>>> In saying that we are insisting on our way, you are saying,
>>>> implicitly, 'I am not going to negotiate'.
>>>
>>> That is only your interpretation. The observation that Mark compromised
>>> quite a bit while you didn't seems largely correct to me.
>>
>> The problem here stems from our inability to work towards agreement,
>> rather than standing on set positions.  I set out what changes I think
>> would make the current implementation OK.  Can we please, please have
>> a discussion about those points instead of trying to argue about who
>> has given more ground?
>>
>>> That commitment would of course be good. However, even if that were
>>> possible before writing code and everyone agreed that the ideas of you
>>> and Nathaniel should be implemented in full, it's still not clear that
>>> either of you would be willing to write any code. Agreement without
>>> code still doesn't help us very much.
>>
>> I'm going to return to Nathaniel's point - it is a highly valuable
>> thing to set ourselves the target of resolving substantial discussions
>> by consensus.   The route you are endorsing here is 'implementor
>> wins'.   We don't need to do it that way.  We're a mature, sensible
>> bunch of adults who can talk out the issues until we agree they are
>> ready for implementation, and then implement.  That's all Nathaniel is
>> saying.  I think he's obviously right, and I'm sad that it isn't as
>> clear to y'all as it is to me.
>>
>> Best,
>>
>> Matthew
>>
>
> Everyone, can we please not do this?! I had enough of adults doing finger
> pointing back over the summer during the whole debt ceiling debate.  I think
> we can all agree that we are better than the US Congress?
>
> Forget about rudeness or decision processes.
>
> I will start by saying that I am willing to separate ignore and absent, but
> only on the write side of things.  On read, I want a single way to identify
> the missing values.  I also want only a single way to perform calculations
> (either skip or propagate).
>
> An indicator of success would be that people stop using NaNs and magic
> numbers (-9999, anyone?) and we could even deprecate nansum(), or at least
> strongly suggest in its docs to use NA.
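
A minimal sketch of what Ben describes above, in plain Python rather than
any NumPy API. The names Column, IGNORE, ABSENT, mark, is_missing and total
are all hypothetical and exist only for illustration. It shows two distinct
markers on the write side, a single test on the read side, and one
reduction that either skips or propagates missing values. It also shows why
a magic number such as -9999 is hazardous: an ordinary sum silently
includes it.

```python
import math

# Hypothetical write-side markers (not part of NumPy or pandas).
IGNORE = "ignore"   # a value exists but is masked out for now
ABSENT = "absent"   # no value was ever recorded


class Column:
    """Toy 1-D container with a parallel list of missing-value flags."""

    def __init__(self, values):
        self.values = list(values)
        self.flags = [None] * len(self.values)

    def mark(self, i, kind):
        # Write side: the caller states *why* the value is missing.
        if kind not in (IGNORE, ABSENT):
            raise ValueError(f"unknown marker: {kind!r}")
        self.flags[i] = kind

    def is_missing(self, i):
        # Read side: one test, whichever marker was written.
        return self.flags[i] is not None

    def total(self, skip=True):
        # One reduction with two policies: skip missing values, or
        # propagate them by returning NaN as soon as one is seen.
        acc = 0.0
        for i, v in enumerate(self.values):
            if self.is_missing(i):
                if skip:
                    continue
                return math.nan
            acc += v
        return acc


# A magic number poisons an ordinary sum without any warning.
readings = [1.5, -9999, 2.5]
naive_total = sum(readings)

# Marking the same slot as missing makes the policy explicit.
col = Column(readings)
col.mark(1, ABSENT)
skipped_total = col.total(skip=True)
propagated_total = col.total(skip=False)
```

Here naive_total absorbs the sentinel, skipped_total ignores the marked
slot, and propagated_total is NaN. Whether skip or propagate should be the
default is exactly the open question in this thread; the sketch takes no
position on it.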

Well, I haven't completely made up my mind yet, will have to do some
more prototyping and playing (and potentially have some of my users
eat the differently-flavored dogfood), but I'm really not very
satisfied with the API at the moment. I'm mainly worried about the
abstraction leaking through to pandas users (this is a pretty large
group of people judging by # of downloads).

The basic position I'm in is that I'm trying to push Python into a new
space, namely mainstream data analysis and statistical computing, one
that is solidly occupied by R and other such well-known players. My
target users are not computer scientists. They are not going to invest
in understanding dtypes very deeply or the internals of ndarray. In
fact I've spent a great deal of effort making it so that pandas users
can be productive and successful while having very little
understanding of NumPy. Yes, I essentially "protect" my users from
NumPy because using it well requires a certain level of sophistication
that I think is unfair to demand of people. This might seem totally
bizarre to some of you but it is simply the state of affairs. So far I
have been successful because more people are using Python and pandas
to do things that they used to do in R. The NA concept in R is dead
simple and I don't see why we are incapable of also implementing
something that is just as dead simple. To us, the scipy elite, let's
call us, it seems simple: "oh, just pass an extra flag to all my array
constructors!" But this along with the masked array concept is going
to have two likely outcomes:

1) Create a great deal more complication in my already very large codebase

and/or

2) force pandas users to understand the new masked arrays after I've
carefully made it so they can be largely ignorant of NumPy

The mostly-NaN-based solution I've cobbled together and tweaked over
the last 42 months actually *works really well*, amazingly, with
relatively little cost in code complexity. Having found a reasonably
stable equilibrium, I'm extremely reluctant to upset the balance.
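
To make "mostly-NaN-based" concrete, here is a rough sketch of the general
idea in plain Python. It is not the pandas implementation; the helper names
is_missing, sum_skipping_missing and fill_missing are hypothetical. The
point is that a float NaN stored in the data itself acts as the missing
marker, so no separate mask or extra constructor flag is needed.

```python
import math


def is_missing(x):
    # NaN compares unequal to everything, itself included, so the
    # explicit math.isnan test is used instead of ==.
    return isinstance(x, float) and math.isnan(x)


def sum_skipping_missing(values):
    # Reduction that ignores NaN entries instead of propagating them.
    return sum(v for v in values if not is_missing(v))


def fill_missing(values, fill):
    # Replace each NaN entry with a caller-chosen value.
    return [fill if is_missing(v) else v for v in values]


prices = [10.0, math.nan, 12.5]

plain_total = sum(prices)                     # NaN propagates through +
skipping_total = sum_skipping_missing(prices)
filled = fill_missing(prices, 0.0)
```

The trade-off is that NaN is a floating-point value, so this trick only
covers float data directly; integer or boolean data has no spare value to
use as a marker.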

So I don't know. After watching these threads bounce back and forth
I'm frankly not all that hopeful about a solution arising that
actually addresses my needs.

best,
Wes

> Cheers!
> Ben Root
>


