[SciPy-Dev] Adding non-parametric methods to scipy.stats

Hans Dembinski hans.dembinski at gmail.com
Tue Oct 27 10:00:16 EDT 2020


Dear Warren, all,

I am following up on my message from June about integrating a general bootstrap library into scipy.

Daniel and I have been busy finishing our rewrite of the resample library, and we released version 1.0.1 for general use on August 24. I have been busy with other things, which is why I did not come back sooner, sorry.

Docs: https://resample.readthedocs.io/en/master
Source: https://github.com/resample-project/resample
PyPI: https://pypi.org/project/resample

resample is a pure Python implementation written from scratch, with scipy and numpy as its only dependencies, and released under a BSD 3-clause license. It should therefore be suitable for inclusion in scipy. I believe we have converged on a high-quality, Pythonic interface that offers both a powerful low-level API for experts and a convenient high-level API for practitioners. Our implementations are optimised to make efficient use of numpy, offloading the hot loops into C and avoiding the creation of unnecessary copies and temporary arrays.

What resample offers:

- Ordinary, balanced, and parametric bootstrap resampling with stratification of N-dimensional data
- Jackknife resampling of N-dimensional data
- For both bootstrap and jackknife resampling: computation of the bias and/or variance of an estimator (any generic Python function that maps data samples to N-dimensional output)
- Bootstrap confidence intervals (BCa and percentile)
- A battery of non-parametric permutation-based tests like the Wilcoxon rank sum test, https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
- Extensive docs in numpydoc format
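
To make the resampling items in the list above concrete, here is a minimal sketch of ordinary bootstrap replicates and of the jackknife bias and variance estimates for a generic estimator. It uses only the Python standard library and is not the resample API: the function names (`bootstrap_replicates`, `jackknife_replicates`, `jackknife_bias`, `jackknife_variance`) are hypothetical and chosen for illustration, and it handles one-dimensional samples only, without the numpy vectorisation, balancing, or stratification described above.

```python
import random
import statistics


def bootstrap_replicates(fn, sample, size=1000, seed=None):
    """Evaluate fn on `size` ordinary bootstrap resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n items from the sample with replacement.
    return [fn(rng.choices(sample, k=n)) for _ in range(size)]


def jackknife_replicates(fn, sample):
    """Evaluate fn on each leave-one-out subsample of `sample`."""
    return [fn(sample[:i] + sample[i + 1:]) for i in range(len(sample))]


def jackknife_bias(fn, sample):
    """Jackknife bias estimate: (n - 1) * (mean of replicates - fn(sample))."""
    n = len(sample)
    thetas = jackknife_replicates(fn, sample)
    return (n - 1) * (statistics.fmean(thetas) - fn(sample))


def jackknife_variance(fn, sample):
    """Jackknife variance estimate: (n - 1) / n * sum of squared deviations."""
    n = len(sample)
    thetas = jackknife_replicates(fn, sample)
    center = statistics.fmean(thetas)
    return (n - 1) / n * sum((t - center) ** 2 for t in thetas)


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0, 11.0]
    reps = bootstrap_replicates(statistics.fmean, data, size=1000, seed=1)
    print("bootstrap variance of the mean:", statistics.variance(reps))
    print("jackknife variance of the mean:", jackknife_variance(statistics.fmean, data))
    print("jackknife bias of the plug-in variance:", jackknife_bias(statistics.pvariance, data))
```

Two exact identities make this easy to check by hand: for the sample mean, the jackknife variance equals the unbiased sample variance divided by n, and for the plug-in variance (`statistics.pvariance`), the jackknife bias estimate equals minus that same quantity.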

The bootstrap and jackknife functionalities are completely generic. One can compute confidence intervals (the BCa method is state-of-the-art) for any statistical estimator, including arbitrarily complicated ones obtained from machine learning, and also for the quantile, which was the original topic of this thread.
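
As an illustration of the quantile case, the following is a minimal sketch of a bootstrap percentile interval for the median. It implements the simpler percentile method, not BCa, and it is not the resample API: `percentile_interval` and its parameters (`cl`, `size`, `seed`) are hypothetical names chosen for this example, and the interval endpoints are picked by a simple nearest-rank rule.

```python
import random
import statistics


def percentile_interval(fn, sample, cl=0.95, size=2000, seed=None):
    """Bootstrap percentile confidence interval for fn(sample).

    Percentile method only (not BCa): the interval endpoints are
    order statistics of the sorted bootstrap replicates.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Evaluate the estimator on `size` resamples drawn with replacement.
    reps = sorted(fn(rng.choices(sample, k=n)) for _ in range(size))
    alpha = (1.0 - cl) / 2.0
    # Nearest-rank indices, clamped to the valid range.
    lo_idx = max(0, int(alpha * size))
    hi_idx = min(size - 1, int((1.0 - alpha) * size))
    return reps[lo_idx], reps[hi_idx]


if __name__ == "__main__":
    data = [float(x) for x in range(1, 22)]
    low, high = percentile_interval(statistics.median, data, seed=1)
    print("median:", statistics.median(data), "interval:", (low, high))
```

Any other estimator that maps a list of numbers to a number can be passed in place of `statistics.median`, which is the sense in which the approach is generic.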

So far we only have 34 stars on GitHub, but that is mostly because we did not advertise. I believe our library has the potential to be very popular if we actually start advertising, but neither Daniel nor I am very interested in doing public relations. We both have full-time jobs, and developing resample is a hobby for us. We would be happy to have resample in SciPy so that our work can benefit from the visibility that SciPy enjoys, while SciPy can benefit from the functionality that resample offers.

Best regards,
Hans

PS: My credentials in case you need them:
I have been programming in Python for 15 years as a scientist working on big data. I have expertise in both user-friendly interface design and close-to-the-hardware numerical programming. I am the author of the Boost.Histogram library in C++14 on www.boost.org and co-author of the corresponding Python module boost-histogram. I have contributed to matplotlib and I maintain the iminuit Python module, a numerical minimiser and error computation tool that is popular in high energy physics.

PPS: Last week, I had the opportunity to listen live to a talk from Brad Efron himself, the inventor of the bootstrap. Fantastic guy.

> On 18. Jun 2020, at 17:53, Hans Dembinski <hans.dembinski at gmail.com> wrote:
> 
> Dear Warren, (Daniel in CC)
> 
>> On 18. Jun 2020, at 16:15, Warren Weckesser <warren.weckesser at gmail.com> wrote:
>> 
>> On 6/18/20, Hans Dembinski <hans.dembinski at gmail.com> wrote:
>>> Dear all,
>>> 
>>> since there was no reply to my first attempt, I am repeating my message. Daniel
>>> Saxton and I are working on a Python library called `resample`, which
>>> implements the bootstrap and jackknife. We would like to work toward merging
>>> bootstrap functions into Scipy and it would be great to get some feedback
>>> about this. We would be pleased to collaborate with people who are already
>>> working on this in Scipy. We are both pretty decent programmers,
>>> knowledgeable about statistics in general and the bootstrap in particular.
>> 
>> Thanks, Hans.  We would be very interested in adding bootstrap methods to SciPy!
>> 
>> I might not get to it for a few days, but I'll take a look at your
>> library and see if it makes sense to incorporate it into SciPy.  If
>> any other SciPy devs can get to it sooner, please take a look!
> 
> that is excellent, thanks! The basic functionality is all there. We are currently refining the interface, the docs need more work, and we want to add more unit tests. At the moment, the project is not at a quality level fit for SciPy, but I am sure we can get there.
> 
> Best regards,
> Hans


