[SciPy-Dev] Fwd: [Numpy-discussion] GSoC?

Thu Mar 3 11:59:23 EST 2016

---------- Forwarded message ----------
From: Ralf Gommers <ralf.gommers at gmail.com>
Date: Wed, Feb 10, 2016 at 11:02 PM
Subject: Re: [Numpy-discussion] GSoC?
To: Discussion of Numerical Python <numpy-discussion at scipy.org>

On Wed, Feb 10, 2016 at 11:55 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Wed, Feb 10, 2016 at 11:48 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>>
>> Thanks Ralf,
>>
>>> Note that we have always done a combined numpy/scipy ideas page and submission. For really good students numpy may be the right challenge, but in general scipy is easier to get started on.
>>
>>
>> yup -- good idea. Is there a page ready to go, or do we need to get one up? (I don't even know where to put it...)
>
>
> This is last year's page: https://github.com/scipy/scipy/wiki/GSoC-2015-project-ideas
>
> Some ideas have been worked on, others are still relevant. Let's copy this page to -2016- and start editing it and adding new ideas. I'll start right now actually.

OK first version: https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas
I kept some of the ideas from last year, but removed all potential
mentors as the same people may not be available this year - please
re-add yourselves where needed.

And to everyone who has a good idea, and preferably is willing to
mentor for that idea: please add it to that page.

Here are a couple of vague ideas which I'd like to hash out on the
list first. Both are mostly in the request-for-comment,
feedback-welcome stage.
(I am going to write them out even though I cannot guarantee that I'll
be available to mentor either of them.)

1. Numeric differentiation. The current proposal focuses on bundling
numdifftools. I think there's room for a complementary set of routines
for numeric differentiation with fixed step, or limited intelligence
w.r.t. the step selection. In fact, there is a fairly comprehensive
implementation in scipy already, a by-product of last year's GSoC.
It's not exposed publicly, and is hidden as
scipy.optimize._numdiff.approx_derivative. I suspect that what's
already there covers the bulk of the implementation, and what's left
is only a bit of work: to spec out the user-facing API, especially
the broadcasting behavior, and then to replace the semi-broken bits of
scipy.misc.derivative, scipy.optimize.approx_fprime etc.
The original PR has a fairly extensive discussion:
https://github.com/scipy/scipy/pull/4884
And here's an unfinished attempt at following up on that discussion:
https://gist.github.com/ev-br/7a1edb3f250bd375c46a
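To make the "fixed step" idea concrete, here is a minimal sketch of what such routines could look like. It is not the code in scipy.optimize._numdiff; the names central_derivative and central_jacobian are hypothetical, and the default step rule (eps**(1/3) scaled by max(1, |x|)) is a common rule of thumb for central differences, assumed here rather than taken from scipy.

```python
import numpy as np


def central_derivative(f, x, h=None):
    """Hypothetical fixed-step central-difference derivative.

    f must be elementwise (e.g. np.sin): it takes and returns arrays of
    the same shape. x may be a scalar or an array; h broadcasts against x.
    """
    x = np.asarray(x, dtype=float)
    if h is None:
        # Rule-of-thumb step for central differences (an assumption):
        # eps**(1/3), made relative to |x| when |x| > 1.
        h = np.finfo(float).eps ** (1.0 / 3.0) * np.maximum(1.0, np.abs(x))
    h = np.asarray(h, dtype=float)
    return (f(x + h) - f(x - h)) / (2.0 * h)


def central_jacobian(f, x, h=None):
    """Hypothetical fixed-step central-difference Jacobian of f: R^n -> R^m.

    x is treated as a 1-D vector; the result has shape (m, n), and
    column j approximates df/dx_j.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.atleast_1d(f(x)).size
    if h is None:
        h = np.finfo(float).eps ** (1.0 / 3.0) * np.maximum(1.0, np.abs(x))
    h = np.broadcast_to(np.asarray(h, dtype=float), x.shape)
    jac = np.empty((m, x.size))
    for j in range(x.size):
        # Perturb only the j-th coordinate, in both directions.
        step = np.zeros_like(x)
        step[j] = h[j]
        f_plus = np.atleast_1d(f(x + step))
        f_minus = np.atleast_1d(f(x - step))
        jac[:, j] = (f_plus - f_minus) / (2.0 * h[j])
    return jac


# Usage: derivative of sin at several points at once, and a 2x2 Jacobian.
print(central_derivative(np.sin, np.array([0.0, 1.0, 2.0])))
print(central_jacobian(lambda v: np.array([v[0] ** 2 * v[1],
                                           5.0 * v[0] + np.sin(v[1])]),
                       [1.0, 2.0]))
```

The broadcasting question raised above shows up even in this toy: central_derivative broadcasts h against x, while central_jacobian assumes a 1-D x. Pinning down such rules is the API work the proposal describes.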

2. Optimization benchmarks.

There are two quite large bodies of work on benchmarking various
optimization algorithms:

- Andrew Nelson's global optimizers benchmarks,
https://github.com/scipy/scipy/pull/4191.

- Nikolay Mayorov's nonlinear LSQ benchmarks, some of which are
available in scipy/benchmarks; the most recent version is at
https://github.com/nmayorov/lsq-benchmarks

I think both can be very valuable for the broader scipy ecosystem, and
both of these herculean efforts are currently blocked on the question
of how to actually run them. It seems to me that it would be a useful
project to come up with a standardized way of running these sorts of
benchmarks and publishing the results. Ideally, it would include:

* a runner, perhaps an ASV-based one, in which case the project might
include some enhancements to asv itself
* a way of publishing the results (again, asv has one; is it sufficient?)
* a self-contained environment for running and publishing (a VM image?
a docker container?)

By the end of the project we would then have a web page with the
reference results, some automation for refreshing them when needed,
and a documented way of reproducing them.
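For a sense of what an ASV-based runner would consume, here is a minimal sketch of a single benchmark written to asv's conventions: a class with `params`/`param_names`, a `setup` method, `time_*` methods (timed by asv) and `track_*` methods (whose return value asv records). The Rosenbrock residual problem and the class name LeastSquaresRosenbrock are illustrative assumptions, not taken from either benchmark suite; it only assumes scipy.optimize.least_squares with its "trf", "dogbox" and "lm" methods.

```python
import numpy as np
from scipy.optimize import least_squares


def rosenbrock_residuals(x):
    # Residual form of the Rosenbrock function: the sum of squares is
    # 100*(x1 - x0**2)**2 + (1 - x0)**2, with its minimum (zero) at (1, 1).
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])


class LeastSquaresRosenbrock:
    # asv runs every benchmark once per value in `params`.
    params = ["trf", "dogbox", "lm"]
    param_names = ["method"]

    def setup(self, method):
        # Classic Rosenbrock starting point.
        self.x0 = np.array([-1.2, 1.0])

    def _solve(self, method):
        return least_squares(rosenbrock_residuals, self.x0, method=method)

    def time_solve(self, method):
        # asv measures the wall-clock time of time_* methods.
        self._solve(method)

    def track_final_cost(self, method):
        # asv stores the returned number, so solution quality can be
        # tracked across versions alongside speed.
        return float(self._solve(method).cost)

    def track_nfev(self, method):
        # Number of residual evaluations used by the solver.
        return int(self._solve(method).nfev)
```

Tracking cost and evaluation counts next to timings is what would let a published results page compare solvers on quality, not just speed; whether asv's stock publishing is enough for that is exactly the open question above.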

Thoughts?

Evgeni