[SciPy-Dev] Adding non-parametric methods to scipy.stats

Ralf Gommers ralf.gommers at gmail.com
Wed Aug 19 17:26:31 EDT 2020


On Mon, Aug 17, 2020 at 9:44 AM Romain Jacob <jacobr at ethz.ch> wrote:

> Hello everyone,
>
> I've submitted the PR adding support for non-parametric confidence
> intervals for quantiles (https://github.com/scipy/scipy/pull/12680).
> There have been quite a few comments already, which I believe I have
> addressed appropriately.
>
> Will be happy to get some more feedback or see the PR merged :-)
>
> Note: the last commit has a failing CI job, apparently due to a file change in
> `scipy/sparse/linalg/`, which is completely unrelated. I'm not sure how to
> go about this... ?
>
If it's clearly unrelated, you can just ignore it. Or add a comment "the
only CI failure is in sparse.linalg and unrelated to this PR". Then the
reviewer can just go ahead and merge if everything else looks good - CI
doesn't have to be green.

Cheers,
Ralf

> Cheers,
> --
> Romain
>
> On 15/06/2020 08:27, Romain Jacob wrote:
>
> On 13/06/2020 20:54, josef.pktd at gmail.com wrote:
>
> On Fri, Jun 12, 2020 at 11:29 AM <josef.pktd at gmail.com> wrote:
>
>> On Fri, Jun 12, 2020 at 1:58 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>
>>> On 11/06/2020 20:54, Warren Weckesser wrote:
>>>
>>> On 6/11/20, josef.pktd at gmail.com <josef.pktd at gmail.com> wrote:
>>>
>>> I think it would make a good and useful addition and fit into scipy.stats.
>>> There are no pure confint functions yet, AFAIR.
>>>
>>> I agree with Josef and Matt, this looks like it would be a nice
>>> addition to SciPy.  At the moment, I'm not sure what the API should
>>> look like.  Romain, is the work that you've already done available
>>> online somewhere?
>>>
>>> Warren
>>>
>>> Yes, I have some functional implementation available here:
>>> https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
>>>
>>
>> An implementation detail:
>> binom has cdf and ppf functions
>> My guess, not verified, is that we can just use binom.interval
>>
>> (at least I used those for similar cases)
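A minimal sketch of that guess, not taken from the thread or the PR: it assumes the equal-tailed bounds from `binom.interval` can be mapped onto order statistics of the sorted sample. The function name `quantile_ci_two_sided` and its return values are hypothetical.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_two_sided(x, p, confidence=0.95):
    """Equal-tailed CI for the p-quantile (p in [0, 1]) from order statistics.

    Returns (lower, upper, coverage); all NaN if the sample is too small.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # The number of samples at or below the true quantile is Binomial(n, p).
    lo, hi = binom.interval(confidence, n, p)
    j = int(lo) - 1  # 0-based index of the lower order statistic
    k = int(hi)      # 0-based index of the upper order statistic
    if j < 0 or k > n - 1:
        return np.nan, np.nan, np.nan
    # Actual coverage is at least `confidence` because binom is discrete.
    coverage = binom.cdf(k, n, p) - binom.cdf(j, n, p)
    return x[j], x[k], coverage


lower, upper, coverage = quantile_ci_two_sided(np.arange(1, 101), 0.5)
print(lower, upper, round(coverage, 4))
```

For the median of 100 samples at 95% confidence this picks the 40th and 61st order statistics. The equal-tailed choice is only one of several index pairs that reach the requested coverage.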
>>
>
> I found my version again
>
> https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-592769480
>
>
> I guess that, for the two-sided confint, it is the same as in the references.
> It doesn't do interpolation, if that could be applied in this case.
>
> I don't entirely follow what you mean here: that the probabilities built
> in these two lines (
> https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438 and
> L439) can be computed directly from binom without np.cumsum? That is
> definitely correct (I actually have code also doing that somewhere).
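For illustration only, and assuming those lines accumulate binomial probability terms (the linked code is not reproduced here), the running sum of `binom.pmf` matches `binom.cdf` up to floating-point error. The values of `n` and `p` below are arbitrary.

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.9          # arbitrary sample size and quantile
k = np.arange(n + 1)    # possible counts 0..n

via_cumsum = np.cumsum(binom.pmf(k, n, p))  # accumulate the pmf by hand
via_cdf = binom.cdf(k, n, p)                # ask the distribution directly

print(np.allclose(via_cumsum, via_cdf))
```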
>
> I did not know about the `interval` method. That sounds interesting
> indeed, but it's not 100% clear to me how the uniqueness problem is
> handled. I looked for the implementation of the method but couldn't find it
> in `binom`... Am I looking in the wrong place?
>
> Cheers,
> --
> Romain
>
>
> This will eventually end up in statsmodels, but I don't know yet where.
> That's not a reason not to add it to scipy.stats.
>
> Josef
>
>
>> Josef
>>
>>
>>> There is quite some work to be done on formatting and documentation to
>>> comply with the SciPy standards, but functionally it's already there (and
>>> as you'll see, the method is quite simple).
>>>
>>> Cheers,
>>> --
>>> Romain
>>>
>>> I recently wrote a function for the confidence interval for the median,
>>> mainly because I ran into the formulas that were easy to code.
>>> Related open issue: how do we get confidence intervals for a QQ-plot?
>>>
>>> aside: I don't like "percent", I prefer quantiles in [0, 1]. See discussion
>>> a while ago in numpy.
>>>
>>> Josef
>>>
>>>
>>> On Thu, Jun 11, 2020 at 1:01 PM Matt Haberland <mhaberla at calpoly.edu>
>>> wrote:
>>>
>>>
>>> OK, we should let our statistics experts weigh in on this. (I'm not
>>> actually one of them.)
>>>
>>> On Wed, Jun 10, 2020 at 10:46 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>
>>>
>>> I think a dedicated function makes more sense. This function takes as
>>> input an array, a percentile and a confidence level, and returns the
>>> corresponding one-sided confidence intervals.
>>>
>>> I quickly looked at the list of existing functions in scipy.stats but did
>>> not see any function in "summary statistics" that does similar things. So I
>>> would go for a new function.
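A rough sketch of what such a function could look like, assuming the usual binomial argument for order statistics (the number of samples at or below the true quantile is Binomial(n, p)). The name `quantile_ci_one_sided`, its signature, and the use of a quantile in [0, 1] instead of a percentile are assumptions, not the API of the PR.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_one_sided(x, p, confidence=0.95):
    """One-sided bounds for the p-quantile (p in [0, 1]).

    Returns (lower, upper). Each is a separate one-sided bound at the
    given confidence level, or NaN if the sample is too small.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # cdf[m] = P(N <= m) for m = 0..n-1, with N ~ Binomial(n, p)
    cdf = binom.cdf(np.arange(n), n, p)

    # Upper bound: smallest order statistic x[m] with P(N <= m) >= confidence
    ok_upper = np.nonzero(cdf >= confidence)[0]
    upper = x[ok_upper[0]] if ok_upper.size else np.nan

    # Lower bound: largest order statistic x[m] with P(N <= m) <= 1 - confidence
    ok_lower = np.nonzero(cdf <= 1 - confidence)[0]
    lower = x[ok_lower[-1]] if ok_lower.size else np.nan
    return lower, upper


lower_1s, upper_1s = quantile_ci_one_sided(np.arange(1, 101), 0.5)
print(lower_1s, upper_1s)
```

With only three samples, neither bound exists for the median at 95% confidence, so the sketch returns NaN for both.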
>>> On 10/06/2020 20:38, Matt Haberland wrote:
>>>
>>> Where do you envision this living in SciPy? In its own function, or added
>>> functionality to other functions e.g. scipy.stats.percentileofscore
>>> <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileofscore.html#scipy.stats.percentileofscore>?
>>>
>>> On Tue, Jun 9, 2020 at 11:12 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>
>>>
>>> On 09/06/2020 20:18, Matt Haberland wrote:
>>>
>>> Yes, I think we would be interested in confidence intervals, but I think
>>> the algorithm should be very standard and well cited, even if it's not the
>>> best/most modern.
>>>
>>> Yes definitely! We did not invent the method I am referring to; it is a
>>> long-known approach (first proposed by Thompson in 1936 [1], extended later
>>> and commonly found in textbooks, e.g. [2,3]). This method is very simple,
>>> quite powerful, yet it has been largely overlooked in many scientific
>>> fields. I found no available implementation to facilitate its use (at least
>>> not in Python; there may be something in R, I have not looked).
>>>
>>> [1] https://www.jstor.org/stable/2957563
>>> [2] doi.org/10.1002/0471722162.ch7
>>> [3] https://perfeval.epfl.ch/
>>>
>>> @WarrenWeckesser and I had planned to work on confidence intervals for
>>> the test statistics returned by our statistical tests <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
>>>
>>>
>>> That is also definitely interesting, although I am not myself an expert
>>> in that area. I am glad to see that the complete list contains some
>>> non-parametric tests :-)
>>>
>>> Cheers,
>>> --
>>> Romain
>>>
>>>
>>> On Mon, Jun 8, 2020 at 2:11 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>>
>>>
>>> Hello everyone,
>>>
>>> I have been working for some time on the implementation of
>>> non-parametric methods to compute confidence intervals for percentiles.
>>> There are some very interesting results in the literature (see e.g. a nice
>>> pitch in [1]) which I think it would be great to add to SciPy to make them
>>> more readily available. It also seems to be rather in line with "recent"
>>> discussions of the roadmap for scipy.stats [2].
>>>
>>> I would be interested in contributing this. What do you think?
>>>
>>> Cheers,
>>> --
>>> Romain
>>>
>>> [1] https://ieeexplore.ieee.org/document/6841797
>>> [2] https://github.com/scipy/scipy/issues/10577
>>> --
>>> Romain Jacob
>>> Postdoctoral Researcher
>>> ETH Zurich - Computer Engineering and Networks Laboratory
>>> www.romainjacob.net
>>> @RJacobPartner <https://twitter.com/RJacobPartner>
>>> Gloriastrasse 35, ETZ G75
>>> 8092 Zurich
>>> +41 7 68 16 88 22
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>> --
>>> Matt Haberland
>>> Assistant Professor
>>> BioResource and Agricultural Engineering
>>> 08A-3K, Cal Poly
>>>
>>