[SciPy-Dev] scipy.stats

Charles R Harris charlesr.harris at gmail.com
Mon May 31 12:32:18 EDT 2010


On Mon, May 31, 2010 at 10:06 AM, <josef.pktd at gmail.com> wrote:

> On Mon, May 31, 2010 at 11:50 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Mon, May 31, 2010 at 9:32 AM, Skipper Seabold <jsseabold at gmail.com>
> > wrote:
> >>
> >> On Mon, May 31, 2010 at 10:38 AM, Charles R Harris
> >> <charlesr.harris at gmail.com> wrote:
> >> >
> >> >
> >> > On Mon, May 31, 2010 at 8:23 AM, Charles R Harris
> >> > <charlesr.harris at gmail.com> wrote:
> >> >>
> >> >>
> >> >> On Mon, May 31, 2010 at 8:16 AM, <josef.pktd at gmail.com> wrote:
> >> >>>
> >> >>> Since Travis seems to want to take back control of scipy.stats, I am
> >> >>> considering my role as unofficial maintainer as ended.
> >> >>>
> >> >>> I would have appreciated his help almost 3 years ago, when I started
> >> >>> to learn numpy, scipy, and started to submit patches for
> >> >>> scipy.stats.distributions.
> >> >>>
> >> >>> But by now, I have pretty strong opinions about statistics in
> >> >>> Python; after almost three years, I'm a bit tired of cleaning up the
> >> >>> mess of others (and want to clean up my own mess), and there are
> >> >>> obviously big philosophical differences about the development
> >> >>> process between me and Travis (no discussion, no review, no tests).
> >> >>> http://projects.scipy.org/scipy/log/trunk/scipy/stats/tests
> >> >>>
> >> >>> Watching the scipy changelog and checking any function that Travis
> >> >>> quietly commits is no fun (see mailing list for the introduction of
> >> >>> curve_fit or ask Stefan).
> >> >>>
> >> >>> I said early on that I would like to trust the results that
> >> >>> scipy.stats produces (although I don't find the mailing list thread
> >> >>> any more).
> >> >>>
> >> >>> I expected scipy to move in a stable direction, as Python does: a
> >> >>> kitchen sink for scientific programming, which might be slow-moving
> >> >>> but with high standards, and not a sandbox.
> >> >>>
> >> >>> Details are at
> >> >>> http://mail.scipy.org/pipermail/scipy-dev/2010-April/014058.html
> >> >>>
> >> >>> After my initial scipy.stats.distributions cleanup, test coverage
> >> >>> was at 91%; I have no idea where it is after this weekend.
> >> >>>
> >> >>> This is more about the process than the content. distributions was
> >> >>> Travis's baby (although unfinished), and most of his changes are
> >> >>> very good, but I don't want to look for the 5-10% (?) typos anymore.
> >> >>>
> >> >>
> >> >> Ah Josef, there are easier ways to lodge complaints than
> >> >> resignation ;) I agree that it was rude of Travis to make those
> >> >> changes without running them through the list, and he does tend to
> >> >> toss stuff in that others have to clean up, the same with C code.
> >> >> But maybe we can manage to get him housebroken without all moving
> >> >> out.
> >> >>
> >> >
> >> > I think a policy of mandatory review will solve these sorts of
> >> > problems, and that is probably a good argument for moving to github,
> >> > where review is much easier. On stats, we probably need an
> >> > additional policy of rigorous testing to make sure that things are
> >> > working right; the stat tests are more difficult by their very
> >> > nature. I think Travis is amenable to such processes, but we do need
> >> > to start a discussion. If you do feel strongly about the recent
> >> > changes, maybe they can be reverted and added back in after review.
> >> >
> >>
> >> I am perhaps wading out of my depth here, but I agree with the
> >> concerns and having the proposed dialogue, as I think having Josef's
> >> input on the direction of scipy.stats is important.
> >>
> >> This does dovetail with the move to DVCS/github and having a review
> >> and discussion policy in place before things go into trunk.  I don't
> >> recall there being a time frame set up for the move (?) though there
> >> was little dissent in actually making the move.  Perhaps we could
> >> start hashing out concrete plans for review and a renewed policy for
> >> testing standards so that the discussions can focus more on design and
> >> as little energy as possible is spent uncovering precision errors,
> >> typos, and niggling bugs.  Does it make sense to do this before the
> >> move maybe as part of the docs marathon?  Of course there were also
> >> those in favor of shoot first, sort it out as we go along because this
> >> is a problem that has been solved before.
> >>
> >> Re:testing, the things that go into stats must be as test driven as
> >> possible given that there are plenty of choices of where to turn to do
> >> statistics work.  The econometricians that I have talked to who
> >> develop in R tell me Python is a "dark horse" for choice of language
> >> and having undiscovered precision errors etc., to say nothing about
> >> actual design, does not help our case.
> >>
> >
> > With this in mind, perhaps it would be best to revert the changes so that
> > there is a clean starting point; we can keep them somewhere else for
> > review.  The discussion of process can then take place without dealing
> > with the specifics of the recent commits.
>
> Or someone writes the tests for them and fixes possible problems, then
> I don't think it's a problem to keep them.
>
>
I think the policy has to be that additions *must* come with tests and
documentation. If the recent changes don't meet that criterion, then they
must be taken out. The policy has to be established at some point. Places
where we can afford to be less strict are simple bug fixes, or totally new
projects unrelated to current contents, where a certain period of shaking
out is to be expected, but for modifications to existing areas I believe
more care needs to be taken. Projects grow and mature, and what was
acceptable or even essential early on can become counter-productive.

Chuck