[SciPy-Dev] pull requests and code review

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Jan 6 23:37:15 EST 2012


On Fri, Jan 6, 2012 at 10:04 PM, Travis Oliphant <travis at continuum.io> wrote:
>>>
>> I do appreciate the work that various people put into this project but
>> this sort of illustrates the frustration that some of us mere users
>> have with numpy and scipy. One hand you appear to want to change the
>> numpy governance to being more open and community driven yet this
>> thread does reflects the opposite.
>>
>
> Actually, I feel like it already is pretty open and community driven *as witnessed* by this thread.   I may have a particular opinion, but it is the opinion of the active community that holds sway and drives what happens.    What I would like to do is underscore that point more broadly.    But, I also want to make sure that the current crop of maintainers doesn't make it difficult for new people to come in because they have a different mindset or are too overwhelmed by how much one has to learn or understand to make a contribution.

I think that's where the move to github has made it much easier. It's
much easier to make comments to a specific line or general comments
when reviewing a pull request.

Sometimes it's relatively easy, just make sure test coverage is high
and everything works as advertised. Sometimes it's pointing to the
existing pattern, eg. for cython code or how to wrap lapack.

Other times it's difficult to make changes that are compatible with
existing usage. The recent example are the improvements to
gaussian_kde. Ralf made improvements that would be good if it were in
a new package, but it was a lengthy process to get it into a form that
doesn't break what I considered to be the existing standard usage
(recommended on the mailinglists and on stackoverflow). It can be a
little bit painful and time consuming at times but we finally managed
what looks like a good enhancement.

Other times pull requests or suggestions just never get to the stage
where they are ready to be merged, either because the proposer doesn't
bring into into a mergable form or no "maintainer" is interested
enough to do the work himself, or it doesn't make enough sense.

My impression is that in most cases the pull request and code review
system on github works pretty well for this.

Welcoming new users with a different mindset doesn't mean that code
that is not sufficiently tested or that can be expected to make
maintenance much more difficult in the future has to be merged into
scipy.

(I'm just a reviewer, not a committer anymore, and I'm very glad about
the role that especially Ralf and Warren take in getting improvements
into the actual code.)

Josef

>
> Also, it really depends on what section of code we are talking about as to how much input is needed.   The more people are depending on code, the more review is needed.  That is clear, but it is difficult to tell how much code is actually being used.   There is a lot of SciPy that still needs to be developed more fully, and if it takes an action-by-committee to make it happen, SciPy will not reach 1.0 very quickly.
>
> Here, for example, I wonder how many active users of fmin_ncg are actually out there.  Most optimization approaches don't use the hessian parameters at all.    I rarely recommend using fmin_ncg to others.   That was one of the variables in my mental model when I rapidly reviewed the original contribution.

Overall I find it difficult to tell which code is in use (for example
for statsmodels). For scipy it's a bit easier to tell from what the
big downstream packages use, and complain when something breaks and
also when the propose improvements, or just from keeping track of the
discussion.

As for the optimizers, Skipper spend some effort on wrapping all the
unconstraint solvers and now also many constraint solvers so that they
can be used by the estimators with a simple method="ncg" or something
like this. Skipper also spend a lot of time coding gradients and
hessians.(Performance with analytical hessians is much better, but in
some cases we just use a numerical hessian.)
I think all wrapped optimizers are covered in the test suite. Which
once are commonly used is not so clear, for discrete models the
default is actually our home (Skipper) made Newton method.


>
> -Travis
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev



More information about the SciPy-Dev mailing list