[SciPy-Dev] Bootstrap confidence limits code

Wed Aug 8 14:38:04 EDT 2012

Hello everyone,

A few years ago I implemented a scikit for bootstrap confidence limits
(https://github.com/cgevans/scikits-bootstrap). I didn’t think much
about it after that until recently, when I realized that some people
are actually using it, and that there’s apparently been some talk
about implementing this functionality in either scipy.stats or
statsmodels (I should thank Randal Olson for discussing this and
bringing it to my attention).

I’ve therefore rewritten most of the code and written up some
docstrings. The current code can compute confidence intervals using
the basic percentile interval, bias-corrected and accelerated (BCa),
and approximate bootstrap confidence (ABC) methods, and can also
generate bootstrap and jackknife indexes. Most of it is implemented
from the descriptions in Efron and Tibshirani’s An Introduction to the
Bootstrap, but the ABC code at the moment is a port from the
modified-BSD-licensed bootstrap package for R (not the boot package),
as I’m not entirely confident in my understanding of the method.
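
To make the terminology concrete, here is a minimal sketch of bootstrap indexes, jackknife indexes, and the basic percentile interval. It is not the scikit's actual code: the names bootstrap_indexes, jackknife_indexes, and percentile_ci and their signatures are assumptions made for illustration, and it uses only the Python standard library rather than NumPy so that it runs without dependencies.

```python
import random
import statistics


def bootstrap_indexes(n, n_samples, rng):
    """Yield n_samples index lists, each with n indexes drawn with replacement."""
    for _ in range(n_samples):
        yield [rng.randrange(n) for _ in range(n)]


def jackknife_indexes(n):
    """Yield n leave-one-out index lists; list i omits index i."""
    for i in range(n):
        yield [j for j in range(n) if j != i]


def percentile_ci(data, stat=statistics.mean, alpha=0.05, n_samples=2000, seed=0):
    """Percentile interval: quantiles of the statistic over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = sorted(
        stat([data[i] for i in idx])
        for idx in bootstrap_indexes(len(data), n_samples, rng)
    )
    # Nearest-rank approximation to the alpha/2 and 1 - alpha/2 quantiles.
    lo = stats[int(round((alpha / 2) * (n_samples - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_samples - 1)))]
    return lo, hi
```

Calling percentile_ci(data) returns a (low, high) tuple for the mean of data. BCa starts from the same sorted bootstrap statistics but shifts the two quantile levels using a bias correction and an acceleration term, the latter usually estimated from the jackknife resamples.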

And so, I have a few questions for everyone:

* Is there any interest in including this sort of code in either
scipy.stats or statsmodels? If so, where do people think would be the
better place? The code is relatively small; at the moment it is less
than 200 lines, with docstrings probably making up 100 of those lines.
* If so, what would need to be changed, added, or improved
beyond what is mentioned in the Contributing to Scipy part of the
reference guide? I’m never a fan of my own code, and imagine quite a
bit would need to be fixed; I know tests will need to be added too.

In addition, I have a few questions about what would be better
practice for the API, and I haven’t really found a guide on best
practices for Scipy:

* When I started writing the code, I wrote a single function ci for
confidence intervals, with a method argument to choose the method.
This is easy for users, especially since they don’t have to look
through documentation to learn that BCa is the most generally useful
method (at least from everything I’ve read) and that there really
isn’t any reason to use many of the simpler methods. However, ABC
takes different parameters, and needs a statistic function that takes
weights, which makes this single-function organization trickier. At
the moment, I have a separate function for ABC. Would it be better to
split up all the methods into their own functions?
* ABC requires a statistic function that takes weights. I’ve noticed
that functions like np.average take a weights= argument. Would it be
better to require input of a stat(data,weights) function, or input of
a stat(data,weights=) with weights as a named argument? The latter
would be nice in terms of allowing the same function to be used for
all methods, but would rule out passing a simple one-argument lambda
as the function. Is there some other way of doing this entirely?
* Are there any missing features that anyone thinks should be added?
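
As a rough illustration of the stat(data,weights=) option from the list above, the sketch below uses one statistic function for both calling styles. The names weighted_mean, call_resampling, and call_weighted are hypothetical and are not part of the scikit; weighted_mean is a plain-Python stand-in for np.average(data, weights=weights). A Python lambda may declare a keyword parameter with a default, so a lambda can still be passed as long as it accepts weights.

```python
def weighted_mean(data, weights=None):
    """Plain-Python stand-in for np.average(data, weights=weights)."""
    if weights is None:
        return sum(data) / len(data)
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)


# A lambda can carry a defaulted keyword parameter, so it fits both call styles.
stat = lambda data, weights=None: weighted_mean(data, weights=weights)


def call_resampling(stat, data):
    """How a resampling method (percentile, BCa) would call the statistic."""
    return stat(data)


def call_weighted(stat, data, weights):
    """How a weight-based method such as ABC would call the statistic."""
    return stat(data, weights=weights)
```

With this convention, call_resampling(stat, data) and call_weighted(stat, data, weights) both accept the same stat object. The trade-off is that a one-argument function such as lambda x: max(x) works only with the resampling methods.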

I apologize if much of this is answered elsewhere; I just haven’t
found any of it. I also apologize if this is far too long-winded and
confusing!

Regards,
Constantine Evans