[SciPy-Dev] scipy.stats

Mon May 31 11:50:47 EDT 2010

On Mon, May 31, 2010 at 9:32 AM, Skipper Seabold <jsseabold at gmail.com> wrote:

> On Mon, May 31, 2010 at 10:38 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Mon, May 31, 2010 at 8:23 AM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >>
> >> On Mon, May 31, 2010 at 8:16 AM, <josef.pktd at gmail.com> wrote:
> >>>
> >>> Since Travis seems to want to take back control of scipy.stats, I am
> >>> considering my role as unofficial maintainer as ended.
> >>>
> >>> I would have appreciated his help almost 3 years ago, when I started
> >>> to learn numpy, scipy, and started to submit patches for
> >>> scipy.stats.distributions.
> >>>
> >>> But by now I have pretty strong opinions about statistics in Python.
> >>> After almost three years, I'm a bit tired of cleaning up the mess of
> >>> others (and want to clean up my own mess), and there are obviously big
> >>> philosophical differences about the development process between me and
> >>> Travis (no discussion, no review, no tests).
> >>> http://projects.scipy.org/scipy/log/trunk/scipy/stats/tests
> >>>
> >>> Watching the scipy changelog and checking any function that Travis
> >>> quietly commits is no fun (see mailing list for the introduction of
> >>> curve_fit or ask Stefan).
> >>>
> >>> I said early on that I would like to trust the results that
> >>> scipy.stats produces (although I can't find the mailing list thread
> >>> anymore).
> >>>
> >>> I expected scipy to move in a stable direction, as Python does: a
> >>> kitchen sink for scientific programming that might be slow-moving
> >>> but has high standards, and not a sandbox.
> >>>
> >>> Details are at
> >>> http://mail.scipy.org/pipermail/scipy-dev/2010-April/014058.html
> >>>
> >>> After my initial scipy.stats.distributions cleanup, test coverage was
> >>> at 91%; I have no idea where it is after this weekend.
> >>>
> >>> This is more about the process than the content. distributions was
> >>> Travis's baby (although unfinished), and most of his changes are very
> >>> good, but I don't want to look for the 5-10% (?) typos anymore.
> >>>
> >>
> >> Ah Josef, there are easier ways to lodge complaints than resignation ;)
> >> I agree that it was rude of Travis to make those changes without running
> >> them through the list, and he does tend to toss stuff in that others have
> >> to clean up, the same with c-code. But maybe we can manage to get him
> >> housebroken without all moving out.
> >>
> >
> > I think a policy of mandatory review will solve these sorts of problems,
> > and that is probably a good argument for moving to github where review is
> > much easier. On stats, we probably need an additional policy of rigorous
> > testing to make sure that things are working right; the stat tests are
> > more difficult by their very nature. I think Travis is amenable to such
> > processes, but we do need to start a discussion. If you do feel strongly
> > about the recent changes maybe they can be reverted and added back in
> > after review.
> >
>
> I am perhaps wading out of my depth here, but I agree with the
> concerns and with having the proposed dialogue, as I think Josef's
> input on the direction of scipy.stats is important.
>
> This does dovetail with the move to DVCS/github and having a review
> and discussion policy in place before things go into trunk.  I don't
> recall there being a time frame set up for the move (?) though there
> was little dissent in actually making the move.  Perhaps we could
> start hashing out concrete plans for review and a renewed policy for
> testing standards so that the discussions can focus more on design and
> as little energy as possible is spent uncovering precision errors,
> typos, and niggling bugs.  Does it make sense to do this before the
> move, maybe as part of the docs marathon?  Of course there were also
> those in favor of "shoot first, sort it out as we go along" because this
> is a problem that has been solved before.
>
> Re: testing, the things that go into stats must be as test driven as
> possible given that there are plenty of choices of where to turn to do
> statistics work.  The econometricians that I have talked to who
> develop in R tell me Python is a "dark horse" for choice of language
> and having undiscovered precision errors etc., to say nothing about
> actual design, does not help our case.
>
>
With this in mind, perhaps it would be best to revert the changes so that
there is a clean starting point; we can keep them somewhere else for
review.  The discussion of process can then take place without dealing with
the specifics of the recent commits.

Chuck