[SciPy-Dev] Sensitivity analysis module proposal (Robert Kern)

Ralf Gommers ralf.gommers at gmail.com
Tue Sep 14 11:11:47 EDT 2021


On Fri, Sep 3, 2021 at 7:15 PM Pamphile Roy <roy.pamphile at gmail.com> wrote:

>
>
> Hi all,
>
>
> First things first, there are too many things to discuss and email is not
> appropriate. I propose to make a call with people interested next week or
> the week after. Let me know if we can do this and what time would work so I
> can put together a doodle.
>

I think I understand the point you were trying to make - this is a large
topic, and email is not the easiest conversation mechanism. So having a
call with people who are interested to hash out a few things can be very
useful. However, let me point out that the purpose of that would be to get
to a clearer proposal - final discussions and decisions on new features and
directions for SciPy development have always been and will continue to be on
this list.

We will need a better scoped proposal, like a list of functionality that
should go into (e.g.) `scipy.stats`.


> Some replies on specific points:
>
>
> Let me preface this by saying that I am not gaining anything from working
> on SciPy. I have no project related to SA at the moment and nothing is
> planned. I am only giving some personal time and have zero financial
> support for my time.
>

This is true for almost everyone at the moment (good news is that we do
have a new grant from CZI starting soon, for ~0.5 FTE).


>    - SALib into SciPy:
>
>
> I proposed this following your input Robert. You said that if the
> maintainers of this library would be willing to do so, then we could
> discuss. I am very happy to see that they would actually be willing to do
> this and help.
>
>
>
>    - Funding:
>
>
> I can only disagree. Yes, being part of SciPy does not directly open a
> line of credit, but it does indirectly help. Art Owen already told me that
> he is using the fact that he helped us on stats.qmc to get some funding.
> And I am certain some of us also use this with their current employers.
>

I think both points of view are valid here. For dedicated direct funding,
like a grant for developing new SA methods, a separate project is probably
better. For "secondary benefits", being able to say that something goes
into SciPy - and therefore reaches potentially O(10 million) users - can be
powerful.


>
>    - Bus factor:
>
>
> We are way past the bus factor. There is the team behind SALib, me, and I
> already got interest from people like Sergei Kucherenko (one of the pillars
> of SA). And if we go that way, I am 100% sure Saltelli (another pillar)
> will be interested (I worked with him on a tooling project).
>

It's pure Python code and relatively straightforward technically, so if
there's a commitment from more than one person to be maintainers, I think
this part doesn't worry me.


> People like them are willing to contribute only because it’s SciPy. We had
> this discussion at MCM2021 when I presented SciPy’s QMC module to the QMC
> community. Researchers, at least from the communities I know and talked
> with, do not want to contribute to something that is uncertain or linked to
> a particular group. With SciPy they see an opportunity to have a large
> impact across a wide range of fields. It’s also the assurance of having a
> long-term impact.
>

This is not the primary factor for deciding whether or not something
belongs in SciPy, but it's helpful to know that there will be expert
support/reviewers. When adding new functionality, ensuring correctness and
fit for purpose of algorithms is quite often a pain point when deciding
whether to merge a PR.


>    - Why SA? How is this important?
>
> SA is becoming a major field. Not only by itself, but other fields are
> starting to use its methods. Things like Shapley values in AI, importance
> factors, etc. In most engineering fields, it’s now mandatory to assess
> uncertainties. Hence there is a real need for tooling.
>

I think this is where more detail and data are needed. For reference, these
are the sub-submodules we added in the past 10 years: `stats.qmc`,
`spatial.transform`, `signal.windows`, Cython interfaces for `linalg` and
`special`, `sparse.csgraph`. We haven't added any new top-level submodules
in over a decade. All of those seem more "general" than SA, except perhaps
`stats.qmc` which has a similar audience.

So I'd say that a new top-level submodule does not sound like a good fit. A
sub-submodule or just a set of functions/classes in `scipy.stats` could
make sense on the other hand. The question is what that list should look
like to make this effort make sense for both SciPy and SALib.

> There are mature libraries in the field. OpenTURNS, UQLab and Dakota are
> the most used among practitioners. And they are all not independent and
> open as we are (explains partially why people like Sergei or Saltelli are
> not contributors). I will not go into why I think these libraries
> should not be used.
>
>
> I really don’t get the push back on this one. It’s about adding a few
> methods in stats at most and the benefits would be arguably huge compared to
> some functions/sub modules we added recently.
>

That sounds like what I said before - so let's make that list, to make the
conversation concrete.

> Behind it, there would be the most renowned people of the field and that
> could give another great exposure for SciPy. We will talk of SciPy at new
> conferences, etc.
>
>
> My presentation of QMC had a great impact in the QMC community. I had 4
> concrete proposals for collaboration just during the conference. We went from
> not being in one field to being the recommended tool of a community.
>
>
> SciPy is at the foundation of the scientific ecosystem in Python and not
> having basic tooling about uncertainty/SA that other higher level
> packages could rely on is a puzzle for me.
>

That is usually a good reason to put something in SciPy: if other packages
with a significant user base need/want it, and those do not want to rely on
a smaller package like SALib as a dependency. This is how we got sparse
graph algorithms for example - upstreamed from scikit-learn and then
expanded.

Cheers,
Ralf



> As I said at the beginning, let’s have a talk to decide what’s in the best
> interest of SciPy.
>
>
> Cheers,
>
> Pamphile
>
>
> On 3 Sep 2021, at 03:35, Robert Kern <robert.kern at gmail.com> wrote:
>
> 
> On Thu, Sep 2, 2021 at 3:30 PM William Usher <wusher at kth.se> wrote:
>
>> Hi Robert,
>>
>> Thanks for the response. You raise good points.
>>
>> Obviously, that you are interested in the proposal assuages some of that,
>> but I'm still unclear on why you are interested in this. What is the
>> benefit that you think everyone will get by absorbing SALib into scipy? It
>> still looks to me like a mostly-lateral move that will merely be disruptive
>> to your dependent projects more than anything else.
>>
>>
>> The real value of SALib is in providing a consistent interface to a
>> (large) suite of sensitivity analysis methods which allows users to easily
>> switch between those methods.
>>
>> We (as maintainers) could benefit from reducing duplication of code and
>> implementations, such as Sobol’ sequence generation, LHS, and could
>> contribute some of the more general sample generation implementations where
>> appropriate (many are linked directly to the SA implementations and not
>> useful outside of that).
>>
>
> I definitely think anything that could plausibly fall in the purview of
> scipy.stats.qmc would be a good target for convergence. If you can't use
> scipy.stats.qmc due to missing functionality (and not just that you don't
> want to require that recent version of scipy), then let's see how we can
> shore it up. I think that where your sampling methods overlap with design
> of experiments in general might also be a fruitful place for scipy.stats to
> grow/absorb some community-wide functionality.
>
>> We think the scientific community as a whole would benefit from the
>> greater exposure a SciPy implementation of SA would bring - as a large
>> community-led effort - it could provide a neutral forum for further
>> development of these methods. This would likely come at some “cost” to the
>> successful “cottage industry” we’ve established and grown (SALib is getting
>> lots of citations and use).
>>
>
> That is gold. I am having trouble understanding why you would contemplate
> anything that might place that at risk. If I were dictator here, I would
> reject you out of hand for your own good. ;-)
>
>> I think a key argument against integration is that it may reduce the
>> agility with which we can add new methods to our SA suite (although this
>> could be mitigated with careful design).
>>
>> I think you raise an important point about our dependent projects, and
>> particularly how we would continue to support legacy releases if developer
>> resources were focussed on a SciPy integration?
>>
>> An attraction is the possibility of funding to support the development of
>> SA within SciPy.  Like all open-source projects, we suffer from resourcing
>> issues, and are predominantly volunteer-driven from the academic
>> community.  And while we are technically a “multi-developer” community,
>> we’re only a few bus accidents (or career changes) away from being a
>> lone-maintainer.
>>
>
> That's more required bus accidents than many parts of scipy. :-)
>
> I'm afraid that contributing into scipy doesn't unlock a pot of funds. I'm
> out of the grant-writing game, but I suspect that the difference between
> applying for funds on your own and arguing for allocation from competing
> needs inside of scipy is mostly a push. Have you considered applying to
> NumFocus? I occasionally look at their accepted and rejected projects, and
> I'd lump you in with the former, IMO. You're doing important work, playing
> well with other projects in the ecosystem, and have at least the seed
> kernel of sustainable community development so that funds are likely to
> actually sustain development.
>
> --
> Robert Kern
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
>