[SciPy-Dev] scipy.stats

David Goldsmith d.l.goldsmith at gmail.com
Mon May 31 13:43:11 EDT 2010


On Mon, May 31, 2010 at 8:32 AM, Skipper Seabold <jsseabold at gmail.com> wrote:

> On Mon, May 31, 2010 at 10:38 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> > On Mon, May 31, 2010 at 8:23 AM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >>
> >> On Mon, May 31, 2010 at 8:16 AM, <josef.pktd at gmail.com> wrote:
> >>>
> >>> Since Travis seems to want to take back control of scipy.stats, I am
> >>> considering my role as unofficial maintainer as ended.
> >>>
> >>> I would have appreciated his help almost 3 years ago, when I started
> >>> to learn numpy, scipy, and started to submit patches for
> >>> scipy.stats.distributions.
> >>>
> >>> But by now, I have pretty strong opinions about statistics in Python.
> >>> After almost three years, I'm a bit tired of cleaning up the mess of
> >>> others (and want to clean up my own mess), and there are obviously big
> >>> philosophical differences about the development process between me and
> >>> Travis (no discussion, no review, no tests).
> >>> http://projects.scipy.org/scipy/log/trunk/scipy/stats/tests
> >>>
> >>> Watching the scipy changelog and checking any function that Travis
> >>> quietly commits is no fun (see mailing list for the introduction of
> >>> curve_fit or ask Stefan).
> >>>
> >>> I said early on that I would like to trust the results that
> >>> scipy.stats produces (although I can't find the mailing list thread
> >>> anymore).
> >>>
> >>> I had thought scipy was heading in a stable direction, as Python is:
> >>> a kitchen sink for scientific programming, which might be slow-moving
> >>> but has high standards, and not a sandbox.
> >>>
> >>> Details are at
> >>> http://mail.scipy.org/pipermail/scipy-dev/2010-April/014058.html
> >>>
> >>> After my initial scipy.stats.distributions cleanup, test coverage was
> >>> at 91%; I have no idea where it is after this weekend.
> >>>
> >>> This is more about the process than the content. Distributions was
> >>> Travis's baby (although unfinished), and most of his changes are very
> >>> good, but I don't want to look for the 5-10% (?) typos anymore.
> >>>
> >>
> >> Ah Josef, there are easier ways to lodge complaints than resignation ;)
> >> I agree that it was rude of Travis to make those changes without
> >> running them through the list, and he does tend to toss stuff in that
> >> others have to clean up, the same with c-code. But maybe we can manage
> >> to get him housebroken without all moving out.
> >>
> >
> > I think a policy of mandatory review will solve these sorts of problems,
> > and that is probably a good argument for moving to github where review
> > is much easier. On stats, we probably need an additional policy of
> > rigorous testing to make sure that things are working right, the stat
> > tests are more difficult by their very nature. I think Travis is
> > amenable to such processes, but we do need to start a discussion. If you
> > do feel strongly about the recent changes maybe they can be reverted and
> > added back in after review.
> >
>
> I am perhaps wading out of my depth here, but I agree with the
> concerns and having the proposed dialogue, as I think having Josef's
> input on the direction of scipy.stats is important.
>
> This does dovetail with the move to DVCS/github and having a review
> and discussion policy in place before things go into trunk.  I don't
> recall there being a time frame set up for the move (?) though there
> was little dissent in actually making the move.  Perhaps we could
> start hashing out concrete plans for review and a renewed policy for
> testing standards so that the discussions can focus more on design and
> as little energy as possible is spent uncovering precision errors,
> typos, and niggling bugs.  Does it make sense to do this before the
> move maybe as part of the docs marathon?  Of course there were also
> those in favor of shoot first, sort it out as we go along because this
> is a problem that has been solved before.
>
> Re: testing, the things that go into stats must be as test driven as
> possible


As in Test Driven Development
<http://en.wikipedia.org/wiki/Test_driven_development>?
Hear hear!  This would force (stats) developers to think first about
developing tests to verify the correctness of the more esoteric aspects of
the desired results of the algorithm they're working on, and would make it
much less likely that code would be submitted w/out tests (if you had to
write the tests first, why are you submitting code w/out tests?)  Obviously,
this would need to be "strongly advised," as we have no way of enforcing how
people actually write code, but we certainly could enforce that code w/out
tests (and/or without standards-compliant docstrings) won't even be
reviewed.
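
To make the test-first idea concrete, here is a minimal sketch of what it
might look like for a stats function.  Everything in it is an assumption for
illustration: norm_cdf is a hypothetical stand-in, not scipy.stats code, and
the tolerances are arbitrary.  The tests pin down a known value, symmetry,
and the accuracy of the far lower tail, which is the kind of esoteric
property that is easy to get wrong.

```python
import math


def norm_cdf(x, loc=0.0, scale=1.0):
    """Normal CDF (hypothetical function under test, not scipy.stats code).

    Uses erfc rather than 1 + erf so the lower tail keeps relative precision.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / (scale * math.sqrt(2.0))
    return 0.5 * math.erfc(-z)


# In a test-first workflow these would be written before norm_cdf itself.

def test_known_values():
    # The median of the standard normal is 0.
    assert math.isclose(norm_cdf(0.0), 0.5, abs_tol=1e-12)
    # 1.959963984540054 is the 97.5% quantile of the standard normal.
    assert math.isclose(norm_cdf(1.959963984540054), 0.975, abs_tol=1e-12)


def test_symmetry():
    # cdf(x) + cdf(-x) == 1 for a distribution symmetric about 0.
    for x in (0.1, 1.0, 3.5, 8.0):
        assert math.isclose(norm_cdf(x) + norm_cdf(-x), 1.0, abs_tol=1e-12)


def test_lower_tail_precision():
    # A relative (not absolute) check, so a result rounded to 0.0 fails.
    assert math.isclose(norm_cdf(-10.0), 7.619853024160527e-24, rel_tol=1e-9)


def test_invalid_scale():
    # A non-positive scale must be rejected rather than silently accepted.
    try:
        norm_cdf(0.0, scale=0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for scale <= 0")


if __name__ == "__main__":
    test_known_values()
    test_symmetry()
    test_lower_tail_precision()
    test_invalid_scale()
```

The tail test is the interesting one: an implementation that computes
0.5 * (1 + erf(z)) would pass the first two tests but fail it.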

DG


> given that there are plenty of choices of where to turn to do
> statistics work.  The econometricians that I have talked to who
> develop in R tell me Python is a "dark horse" for choice of language
> and having undiscovered precision errors etc., to say nothing about
> actual design, does not help our case.
>
> Skipper



-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide.  (As interpreted
by Robert Graves)