[SciPy-Dev] GSoC'21 participation SciPy

Mon Feb 15 14:01:44 EST 2021

On Mon, Feb 15, 2021 at 6:24 PM Pamphile Roy <roy.pamphile at gmail.com> wrote:

> Hi,
>
> Thank you for putting this together!
>
> I would have some ideas for the ideal pool :)
>

Thanks Pamphile!

> *scipy.optimize:* Would it be desirable to let workers evaluate the
> function during an optimization?
> In most industrial contexts, the function is not trivial and might require
> minutes, if not hours or even days, to compute.
> Having a simple way to parallelise the runs would help. We have
> machines with easily ten cores now and it would be great to leverage this
> here.
>

Definitely - see the mention of workers under
http://scipy.github.io/devdocs/roadmap.html#performance-improvements.

> Going in that direction, having a more general infrastructure to handle
> external workers would be great.
>

I'm assuming you mean something like standard multiprocessing, or using a
custom Pool object, for code that's trivially parallelizable. Both are
covered by the `workers` pattern. If you're thinking about something else,
can you elaborate?
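
To make the `workers` idea concrete, here is a minimal sketch of the pattern
using only the standard library. It is not SciPy's implementation:
`random_search` and `expensive` are hypothetical names made up for
illustration, and the optimizer is a deliberately naive random search. The
point is only that the optimizer hands a batch of points to a map-like
callable and does not care how that callable schedules the work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_search(func, bounds, n_iter=20, popsize=16, workers=map, seed=0):
    """Toy random search; `workers` is any map-like callable (hypothetical API)."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_iter):
        # Draw a batch of candidate points inside the bounds.
        batch = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(popsize)]
        # Evaluate the whole batch through the map-like callable.
        values = list(workers(func, batch))
        for x, f in zip(batch, values):
            if f < best_f:
                best_x, best_f = x, f
    return best_x, best_f


def expensive(x):
    # Stand-in for a costly simulation (hypothetical objective).
    return sum((xi - 1.0) ** 2 for xi in x)


bounds = [(-2.0, 2.0), (-2.0, 2.0)]

# Serial evaluation with the built-in map.
serial_x, serial_f = random_search(expensive, bounds)

# Same search, with evaluations dispatched through a pool's map method.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_x, parallel_f = random_search(expensive, bounds, workers=pool.map)
```

A thread pool is used here only to keep the sketch short; for a CPU-bound
Python function, the `map` method of a process pool would be the usual choice.
In SciPy itself, `scipy.optimize.differential_evolution` already takes a
`workers` argument that accepts either an int or a map-like callable.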

> Sure, there are external packages to do this, but then it’s not so trivial
> if you want to use SciPy’s optimizers.
>
> *scipy.optimize:* What about another optimization method such as EGO?
> This would require a Gaussian process regressor.
>

In general we'd like to continue adding high-quality optimization methods
if they bring something extra - see
https://mail.python.org/pipermail/scipy-dev/2021-January/024489.html.

Not sure about EGO in particular (I'm not familiar with it). Gaussian
processes sound a little out of scope - that's scikit-learn territory
probably.

> *scipy.stats:* there is an ANOVA section in the roadmap. But is
> sensitivity analysis in general something that would be of interest?
> I am thinking about Sobol’ indices (not related to Sobol’ sequence but
> from the same author), moment-based indices, Shapley values, cusunoro, etc.
>

I'm not 100% sure; let's see if someone more familiar with this topic has
an opinion. In general for new stats functionality we try to figure out if
it fits better in scipy.stats or in statsmodels. The latter doesn't have
much either right now, only:
https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_estimating_equations.GEEResults.sensitivity_params.html

> *scipy.metamodel:* last but not least, a metamodel/response surface
> module. This is linked to the optimization or sensitivity analysis of
> expensive functions. It would be sufficient to have Gaussian processes and
> polynomial chaos expansion. It could also include more general things like
> linear regression or other things in scipy.interpolate.
>

That is out of scope I'd say, too specific for a new submodule - at the
very least it should start as a separate package first.

Cheers,
Ralf