[SciPy-Dev] Sensitivity analysis module proposal

Sun Apr 11 13:55:43 EDT 2021

On Sun, Apr 11, 2021 at 9:07 AM Pamphile Roy <roy.pamphile at gmail.com> wrote:

>
> On 09.04.2021, at 19:51, Robert Kern <robert.kern at gmail.com> wrote:
>
> On Fri, Apr 9, 2021 at 1:42 PM Pamphile Roy <roy.pamphile at gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I would like to propose adding sensitivity analysis (SA/GSA) functions.
>> Depending on the field, this is also called uncertainty quantification
>> (UQ) or verification and validation (V&V).
>>
>
> SALib is actively developed. I recommend contributing there if there are
> any gaps that you think need to be filled.
>
> https://salib.readthedocs.io/en/latest/
>
> In my opinion, the fact that a library exists does not preclude
> adding some of its functionality to SciPy. We are discussing including
> UNU.RAN, which is arguably the same situation.
> SALib is a nice library, but as a user you will only find it and be
> willing to use it if you already know about SA, as with all niche products.
>

"It exists elsewhere" isn't my argument. While this is obviously a
judgement call, and my opinion isn't necessarily that of anyone else, I
do have a rough rubric in mind when I consider things for inclusion in
scipy. The main guiding principle is to make important functionality
available to the scientific Python community. If including that
functionality in scipy advances that, great, that's an argument for
inclusion. But sometimes, inclusion inside scipy is just a neutral move,
and I think that's the case here. That's not dispositive, but then we have
to go to more specific reasons, like wanting to use the functionality
inside other parts of scipy (like QMC in SHGO).

So to take UNU.RAN as an example, it's an old, relatively unmaintained C
library. Its important functionality is *not* currently available to the
scientific Python community. Further, we want to *use* UNU.RAN internally
to provide faster implementations of random sampling for our distributions
that lack `_rvs()` methods. In contrast, SALib is actively maintained by a
multi-developer team; it's a Python library that uses numpy; it's liberally
licensed like scipy; it is used by other projects.

> Having it in SciPy (or another project with a wider scope like
> statsmodels) would give the whole scientific community greater exposure
> to this subject. Again, this topic is gaining more and more traction, and
> SA is now a recurring theme in industrial applications.
>
>

> We should really consider the positive side effects it could have. Take
> scipy.stats.qmc, for instance. Now that it’s in, a lot of projects will
> benefit from its inclusion. Not only can they rely on it, but because this
> is SciPy, we also took great care with the design and fixed things that
> were neither obvious nor well studied (scikit-optimize, optuna, pydoe, and
> even SALib all had issues with their QMC implementations).
> Thanks to the implementation and review process, two articles were written,
> and SciPy will be presented at a conference to a new community, the QMC
> community.
> And I believe we could have the same impact here and attract people from
> the SA community. R is still heavily used in both communities.
>

The existing packages that did just QMC were often just
individually maintained projects that are not very sustainable. So the
higher-level packages that *needed* QMC often rolled their own to varying
degrees of effectiveness. Implementing QMC in scipy in the disciplined and
thorough manner that you did means that those projects can now rely on that
solid building block. The important thing wasn't necessarily "in scipy" per
se, it was the discipline and thoroughness. IMO, SALib has done the
discipline and thoroughness just fine outside of scipy.

If the SALib developers expressed interest in merging SALib into scipy,
that'd be one thing. But if they are interested in maintaining it as an
independent project, I would recommend contributing to it to build on their
success instead of starting from scratch. As a tentpole project of the
scientific Python community, we want to support the efforts of the whole
community, not replace them or absorb them.

> In the end, if we don’t want any SA in SciPy, that’s fine, but it should be
> motivated by something other than: it exists elsewhere. Because we are at
> the point where almost everything exists elsewhere.
> Furthermore, I believe SA matches our scope as we have various types of
> analysis of variance (ANOVA) in the roadmap.
>

I'm not sure that the connection between ANOVA (at least, the specific
tools that are on the roadmap) and SA is apt. You use ANOVA and SA to solve
different problems on different objects (datasets vs. models, respectively).
Some SA techniques do use ANOVA-like analyses internally on specifically
designed sample points, but I think that's as far as the connection goes. In any
case, what's on the roadmap for ANOVA is really just implementing a handful
of very standard textbook hypothesis tests.
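To make "ANOVA-like analyses on specifically designed sample points" concrete, here is a minimal sketch of first-order Sobol' indices. It assumes the Ishigami test function and the Saltelli et al. (2010) pick-freeze estimator, neither of which is named in this thread; `ishigami` and `first_order_sobol` are hypothetical helpers, not SciPy or SALib API. The designed points are two base matrices A and B from `scipy.stats.qmc.Sobol`, plus one hybrid matrix per input in which a single column of A is swapped for the matching column of B.

```python
import numpy as np
from scipy.stats import qmc


def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function, a common SA benchmark (assumed for illustration)."""
    return (np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2
            + b * x[:, 2] ** 4 * np.sin(x[:, 0]))


def first_order_sobol(func, l_bounds, u_bounds, n=2**12, seed=0):
    """Hypothetical helper: first-order Sobol' indices via a pick-freeze design."""
    d = len(l_bounds)
    # One scrambled Sobol' sample of dimension 2d, split into base matrices A and B.
    sampler = qmc.Sobol(d=2 * d, scramble=True, seed=seed)
    base = qmc.scale(sampler.random(n), list(l_bounds) * 2, list(u_bounds) * 2)
    A, B = base[:, :d], base[:, d:]
    f_A, f_B = func(A), func(B)
    total_var = np.var(np.concatenate([f_A, f_B]))

    indices = np.empty(d)
    for i in range(d):
        # A_B^i: matrix A with its i-th column taken from B.
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]
        # Saltelli et al. (2010) estimator of the partial variance V_i,
        # normalised by the total output variance.
        indices[i] = np.mean(f_B * (func(AB_i) - f_A)) / total_var
    return indices


s = first_order_sobol(ishigami, [-np.pi] * 3, [np.pi] * 3)
print(np.round(s, 2))  # analytical values are about 0.31, 0.44 and 0.00
```

The model is evaluated at points the method chooses, which is the "models, not datasets" side of the distinction. A textbook ANOVA test such as `scipy.stats.f_oneway` instead takes groups of already-observed measurements.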

-- 
Robert Kern