[SciPy-Dev] Adding non-parametric methods to scipy.stats

Romain Jacob jacobr at ethz.ch
Mon Aug 17 04:38:28 EDT 2020


Hello everyone,

I've submitted the PR adding support for non-parametric confidence 
intervals for quantiles (https://github.com/scipy/scipy/pull/12680). 
There have been quite a few comments already, which I believe I have 
addressed appropriately.

I would be happy to get some more feedback or to see the PR merged :-)
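
For readers new to the thread, here is a minimal sketch of the kind of 
order-statistic interval discussed below. It is not the code in the PR: 
the name `quantile_ci` and its signature are hypothetical, it assumes a 
continuous distribution (no ties), and it splits the error equally 
between the two tails, which is only one possible convention.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci(x, p, confidence=0.95):
    """Two-sided distribution-free CI for the p-quantile (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    alpha = 1.0 - confidence
    # cdf[k] = P(K <= k), where K ~ Binomial(n, p) counts samples below the quantile.
    cdf = binom.cdf(np.arange(n), n, p)
    # 0-based indices of order statistics usable as lower / upper bounds.
    lower = np.nonzero(cdf <= alpha / 2)[0]
    upper = np.nonzero(cdf >= 1 - alpha / 2)[0]
    if lower.size == 0 or upper.size == 0:
        # Too few samples to reach the requested confidence level.
        return np.nan, np.nan
    # Tightest bounds: largest valid lower index, smallest valid upper index.
    return x[lower[-1]], x[upper[0]]


# Median of ten samples: the bounds are the 2nd and 9th order statistics.
print(quantile_ci(np.arange(1, 11), 0.5))
```

With too few samples (for example three samples and a 95% level for the 
median), no pair of order statistics qualifies and the sketch returns 
NaN for both bounds.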

Note: CI fails on the last commit, apparently because of a file change 
in `scipy/sparse/linalg/`, which is completely unrelated. I'm not sure 
how to handle this.

Cheers,
-- 
Romain

On 15/06/2020 08:27, Romain Jacob wrote:
> On 13/06/2020 20:54, josef.pktd at gmail.com wrote:
>> On Fri, Jun 12, 2020 at 11:29 AM <josef.pktd at gmail.com> wrote:
>>
>>     On Fri, Jun 12, 2020 at 1:58 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>
>>         On 11/06/2020 20:54, Warren Weckesser wrote:
>>>         On 6/11/20, josef.pktd at gmail.com <josef.pktd at gmail.com> wrote:
>>>>         I think it would make a good and useful addition and fit into scipy.stats.
>>>>         There are no pure confint functions yet, AFAIR.
>>>         I agree with Josef and Matt, this looks like it would be a nice
>>>         addition to SciPy.  At the moment, I'm not sure what the API should
>>>         look like.  Romain, is the work that you've already done available
>>>         online somewhere?
>>>
>>>         Warren
>>
>>         Yes, I have some functional implementation available here:
>>         https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
>>
>>
>>     An implementation detail:
>>     binom has cdf and ppf functions
>>     My guess, not verified, is that we can just use binom.interval
>>
>>     (at least I used those for similar cases)
>>
>>
>> I found my version again
>> https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-592769480 
>>
>>
>> I guess that's the same for two sided confint as the references.
>> It doesn't have interpolation if that could be applied in this case.
>>
> I don't entirely follow what you mean here: that the probabilities 
> computed in these two lines 
> (https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438 
> and L439) can be obtained directly from binom without np.cumsum? That 
> is definitely correct (I actually have code doing that somewhere).
>
> I did not know about the `interval` method. That sounds interesting 
> indeed, but it's not 100% clear to me how the uniqueness problem is 
> handled. I looked for the implementation of the method but couldn't 
> find it in `binom`... Am I looking in the wrong place?
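
To make the two points above concrete, a small sketch (the values of 
`n` and `p` are illustrative and not taken from the PR or from 
helpers.py). It checks that accumulating the pmf with np.cumsum gives 
the same probabilities as calling `binom.cdf` directly, and shows what 
`binom.interval` returns. As far as I can tell, `interval` is inherited 
from the generic distribution base class rather than defined on `binom` 
itself, and it simply evaluates `ppf` at the two equal tails; that 
convention picks one interval and does not by itself address which of 
several qualifying intervals is preferable.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5  # illustrative values only
k = np.arange(n + 1)

# Accumulating the pmf by hand and calling cdf give the same probabilities.
cdf_from_cumsum = np.cumsum(binom.pmf(k, n, p))
cdf_direct = binom.cdf(k, n, p)
print(np.allclose(cdf_from_cumsum, cdf_direct))

# interval() returns the ppf values at the lower and upper equal tails.
lo, hi = binom.interval(0.95, n, p)

# Probability that the binomial count falls in [lo, hi].
coverage = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)
print(lo, hi, coverage)
```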
>
> Cheers,
> -- 
> Romain
>
>
>> This will eventually end up in statsmodels, but I don't know yet 
>> where. That's not a reason not to add it to scipy.stats.
>>
>> Josef
>>
>>
>>     Josef
>>
>>         There is quite some work to be done on formatting and
>>         documentation to comply with the SciPy standards, but
>>         functionally it's already there (and as you'll see, the
>>         method is quite simple).
>>
>>         Cheers,
>>         -- 
>>         Romain
>>
>>>>         I recently wrote a function for the confidence interval for the median,
>>>>         mainly because I ran into formulas that were easy to code.
>>>>         Related open issue: how do we get confidence intervals for QQ-plots?
>>>>
>>>>         aside: I don't like "percent", I prefer quantiles in [0, 1]. See discussion
>>>>         a while ago in numpy.
>>>>
>>>>         Josef
>>>>
>>>>
>>>>         On Thu, Jun 11, 2020 at 1:01 PM Matt Haberland <mhaberla at calpoly.edu> wrote:
>>>>
>>>>>         OK, we should let our statistics experts weigh in on this. (I'm not
>>>>>         actually one of them.)
>>>>>
>>>>>         On Wed, Jun 10, 2020 at 10:46 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>
>>>>>>         I think a dedicated function makes more sense. This function takes as
>>>>>>         input an array, a percentile and a confidence level, and returns the
>>>>>>         corresponding one-sided confidence intervals.
>>>>>>
>>>>>>         I quickly looked at the list of existing functions in scipy.stats but
>>>>>>         did not see any function in "summary statistics" that does similar
>>>>>>         things. So I would go for a new function.
>>>>>>
>>>>>>         On 10/06/2020 20:38, Matt Haberland wrote:
>>>>>>
>>>>>>         Where do you envision this living in SciPy? In its own function, or
>>>>>>         added functionality to other functions, e.g. scipy.stats.percentileofscore
>>>>>>         <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileofscore.html#scipy.stats.percentileofscore>?
>>>>>>
>>>>>>         On Tue, Jun 9, 2020 at 11:12 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>>
>>>>>>>         On 09/06/2020 20:18, Matt Haberland wrote:
>>>>>>>
>>>>>>>         Yes, I think we would be interested in confidence intervals, but I
>>>>>>>         think the algorithm should be very standard and well cited, even if
>>>>>>>         it's not the best/most modern.
>>>>>>>
>>>>>>>         Yes, definitely! We did not invent the method I am referring to; it is
>>>>>>>         a long-known approach (first proposed by Thompson in 1936 [1], extended
>>>>>>>         later, and commonly found in textbooks, e.g. [2,3]). This method is very
>>>>>>>         simple and quite powerful, yet it has been largely overlooked in many
>>>>>>>         scientific fields. I found no available implementation to facilitate its
>>>>>>>         use (at least not in Python; there may be something in R, I have not
>>>>>>>         looked).
>>>>>>>
>>>>>>>         [1] https://www.jstor.org/stable/2957563
>>>>>>>         [2] http://doi.org/10.1002/0471722162.ch7
>>>>>>>         [3] https://perfeval.epfl.ch/
>>>>>>>
>>>>>>>         @WarrenWeckesser and I had planned to work on confidence intervals for
>>>>>>>         the test statistics returned by our statistical tests
>>>>>>>         <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
>>>>>>>
>>>>>>>
>>>>>>>         That is also definitely interesting, although I am not myself an expert
>>>>>>>         in that area. I am glad to see that the complete list contains some
>>>>>>>         non-parametric tests :-)
>>>>>>>
>>>>>>>         Cheers,
>>>>>>>         --
>>>>>>>         Romain
>>>>>>>
>>>>>>>
>>>>>>>         On Mon, Jun 8, 2020 at 2:11 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>>>
>>>>>>>>         Hello everyone,
>>>>>>>>
>>>>>>>>         I have been working for some time on the implementation of
>>>>>>>>         non-parametric methods to compute confidence intervals for
>>>>>>>>         percentiles. There are some very interesting results in the
>>>>>>>>         literature (see e.g. a nice pitch in [1]) which I think it would be
>>>>>>>>         great to add to SciPy to make them more readily available. It also
>>>>>>>>         seems to be rather in line with "recent" discussions of the roadmap
>>>>>>>>         for scipy.stats [2].
>>>>>>>>
>>>>>>>>         I would be interested in contributing this. What do you think?
>>>>>>>>
>>>>>>>>         Cheers,
>>>>>>>>         --
>>>>>>>>         Romain
>>>>>>>>
>>>>>>>>         [1] https://ieeexplore.ieee.org/document/6841797
>>>>>>>>         [2] https://github.com/scipy/scipy/issues/10577
>>>>>>>>         --
>>>>>>>>         Romain Jacob
>>>>>>>>         Postdoctoral Researcher
>>>>>>>>         ETH Zurich - Computer Engineering and Networks Laboratory
>>>>>>>>         www.romainjacob.net  <http://www.romainjacob.net>
>>>>>>>>         @RJacobPartner <https://twitter.com/RJacobPartner>
>>>>>>>>         Gloriastrasse 35, ETZ G75
>>>>>>>>         8092 Zurich
>>>>>>>>         +41 7 68 16 88 22
>>>>>>>>         _______________________________________________
>>>>>>>>         SciPy-Dev mailing list
>>>>>>>>         SciPy-Dev at python.org  <mailto:SciPy-Dev at python.org>
>>>>>>>>         https://mail.python.org/mailman/listinfo/scipy-dev
>>>>>>>>
>>>>>>>         --
>>>>>>>         Matt Haberland
>>>>>>>         Assistant Professor
>>>>>>>         BioResource and Agricultural Engineering
>>>>>>>         08A-3K, Cal Poly
>>>>>>>
-- 
Romain Jacob
Postdoctoral Researcher
ETH Zurich - Computer Engineering and Networks Laboratory
www.romainjacob.net <https://www.romainjacob.net/>
@RJacobPartner <https://twitter.com/RJacobPartner>
Gloriastrasse 35, ETZ G75
8092 Zurich
+41 7 68 16 88 22