[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Fri Oct 28 18:21:41 EDT 2011

On Friday, October 28, 2011, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>>>
>>>
>>> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>>
>>>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliphant at enthought.com>
>>>> wrote:
>>>> > I think Nathaniel and Matthew provided very specific feedback that
>>>> > was helpful in understanding other perspectives of a difficult
>>>> > problem. In particular, I really wanted bit-patterns implemented.
>>>> > However, I also understand that Mark did quite a bit of work and
>>>> > altered his original designs quite a bit in response to community
>>>> > feedback. I wasn't a major part of the pull request discussion, nor
>>>> > did I merge the changes, but I support Charles if he reviewed the
>>>> > code and felt like it was the right thing to do. I likely would have
>>>> > done the same thing rather than let Mark Wiebe's work languish.
>>>>
>>>> My connectivity is spotty this week, so I'll stay out of the technical
>>>> discussion for now, but I want to share a story.
>>>>
>>>> Maybe a year ago now, Jonathan Taylor and I were debating what the
>>>> best API for describing statistical models would be -- whether we
>>>> wanted something like R's "formulas" (which I supported), or another
>>>> approach based on sympy (his idea). To summarize, I thought his API
>>>> was confusing, pointlessly complicated, and didn't actually solve the
>>>> problem; he thought R-style formulas were superficially simpler but
>>>> hopelessly confused and inconsistent underneath. Now, obviously, I was
>>>> right and he was wrong. Well, obvious to me, anyway... ;-) But it
>>>> wasn't like I could just wave a wand and make his arguments go away,
>>>> no matter how annoying and wrong-headed I thought they were... I could
>>>> write all the code I wanted but no-one would use it unless I could
>>>> convince them it's actually the right solution, so I had to engage
>>>> with him, and dig deep into his arguments.
>>>>
>>>> What I discovered was that (as I thought) R-style formulas *do* have a
>>>> solid theoretical basis -- but (as he thought) all the existing
>>>> implementations *are* broken and inconsistent! I'm still not sure I
>>>> can actually convince Jonathan to go my way, but, because of his
>>>> stubbornness, I had to invent a better way of handling these formulas,
>>>> and so my library[1] is actually the first implementation of these
>>>> things that has a rigorous theory behind it, and in the process it
>>>> avoids two fundamental, decades-old bugs in R. (And I'm not sure the R
>>>> folks can fix either of them at this point without breaking a ton of
>>>> code, since they both have API consequences.)
>>>>
>>>> --
>>>>
>>>> It's extremely common for healthy FOSS projects to insist on consensus
>>>> for almost all decisions, where consensus means something like "every
>>>> interested party has a veto"[2]. This seems counterintuitive, because
>>>> if everyone's vetoing all the time, how does anything get done? The
>>>> trick is that if anyone *can* veto, then vetoes turn out to actually
>>>> be very rare. Everyone knows that they can't just ignore alternative
>>>> points of view -- they have to engage with them if they want to get
>>>> anything done. So you get buy-in on features early, and no vetoes are
>>>> necessary. And by forcing people to engage with each other, like me
>>>> with Jonathan, you get better designs.
>>>>
>>>> But what about the cost of all that code that doesn't get merged, or
>>>> written, because everyone's spending all this time debating instead?
>>>> Better design [...]
>
> Sorry - this was too short and a little rude. I'm sorry.
>
> I was reacting to what I perceived as intolerance for discussing the
> issues, and I may be wrong in that perception.
>
> I think what Nathaniel is saying is that it is not in the best
> interests of numpy to push through code where there is not good
> agreement.  In reverting the change, he is, I think, appealing for a
> commitment to that process, for the good of numpy.
>
> I have in the past taken some of your remarks to imply that if someone
> is prepared to write code then that overrides most potential
> disagreement.
>
> The reason I think Nathaniel is more right is that most of us, I
> believe, honestly do have the interests of numpy at heart, want to
> fully understand the problem, and are prepared to be proven wrong.
> In that situation, in my experience of writing code at least, by far
> the most fruitful way to proceed is by letting all voices be heard.
> On the other hand, if the rule becomes 'unless I see an implementation
> I'm not listening to you' - then we lose the great benefits, to the
> code, of having what is fundamentally a good and strong community.
>
> Best,
>
> Matthew
>

Maybe an alternative implementation isn't really needed. It seemed to me
that most of the current implementation isn't too far off the mark. Only
some key portions are missing or might need to be modified.

The space issue was never ignored, and Mark left room for it to be
addressed. Parameterized dtypes can still be added (and aren't all that
different from multi-NA). Perhaps I could be convinced to have np.MA
assignments mean "ignore" and np.NA mean "absent". How far off are we
really from consensus?

That said, I still think that ignore + absent = ignore.
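One way to read the "ignore" vs. "absent" proposal, sketched in plain Python rather than NumPy: `Missing.IGNORE` stands in for the suggested np.MA ("ignore") and `Missing.ABSENT` for np.NA ("absent"). The enum and the `add` and `total` helpers are hypothetical illustrations, not NumPy API, and the rule that reductions skip ignored values but propagate absent ones is an assumption about the intended semantics.

```python
from enum import Enum


class Missing(Enum):
    # Hypothetical sentinels modeling the proposal; not part of NumPy.
    IGNORE = "ignore"  # masked out: skipped by reductions
    ABSENT = "absent"  # value does not exist: propagates through reductions


def add(a, b):
    """Elementwise addition under the proposed rules.

    IGNORE dominates ABSENT ("ignore + absent = ignore"),
    and either sentinel dominates an ordinary number.
    """
    if a is Missing.IGNORE or b is Missing.IGNORE:
        return Missing.IGNORE
    if a is Missing.ABSENT or b is Missing.ABSENT:
        return Missing.ABSENT
    return a + b


def total(values):
    """Sum that skips IGNORE entries but propagates ABSENT ones (assumed rule)."""
    result = 0
    for v in values:
        if v is Missing.IGNORE:
            continue  # masked values simply do not participate
        if v is Missing.ABSENT:
            return Missing.ABSENT  # an unknown value makes the sum unknown
        result += v
    return result
```

Under this reading, `add(Missing.IGNORE, Missing.ABSENT)` returns `Missing.IGNORE`, matching the rule above, while `total([1, Missing.IGNORE, 2])` returns 3 and `total([1, Missing.ABSENT, 2])` returns `Missing.ABSENT`.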

Cheers!
Ben Root