[SciPy-user] Bootstrap?

Joshua Stults joshua.stults at gmail.com
Tue Jul 7 22:07:15 EDT 2009


Joe,

Thanks for the tip.

On Tue, Jul 7, 2009 at 9:36 PM, Joe Harrington<jh at physics.ucf.edu> wrote:
> On Tue, Jul 7, 2009 at 6:28 AM, Joshua Stults<joshua.stults at gmail.com> wrote:
>
>> I was wondering if scipy had something similar to Octave/Matlab's
>> empirical_rnd().  Here's the blurb from Octave's help describing the
>> function:
>>
>>  -- Function File:  empirical_rnd (N, DATA)
>>  -- Function File:  empirical_rnd (DATA, R, C)
>>  -- Function File:  empirical_rnd (DATA, SZ)
>>      Generate a bootstrap sample of size N from the empirical
>>      distribution obtained from the univariate sample DATA.
>>
>>      If R and C are given create a matrix with R rows and C columns. Or
>>      if SZ is a vector, create a matrix of size SZ.
>>
>> So basically you pass it an array of data, and it returns bootstrap
>> samples (resampling from the array with replacement).
>>
>
> Be very careful and be certain you can derive the statistical
> justification for what you are doing when you use bootstrap.  There
> are numerous cases in which bootstrapping will not give you the right
> answer, such as when fitting a function that has a parameter that is
> set in just a small subset of the data, because in some samples the
> subset may be omitted completely or in large part, admitting wildly
> wrong parameter values.

I was doing a toy problem with 0-1 data (1=success, 0=failure),
estimating a reliability.  So my statistic was just:
sum(bootstrap_sample) / n.
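
For concreteness, here is a minimal sketch of that statistic, written
with only the Python standard library so the resampling step is
explicit.  The function name, the seed, and the 18-successes/2-failures
data set are made up for illustration; they are not from my actual
problem.

```python
import random

def bootstrap_reliability(data, n_boot=1000, seed=None):
    """Return n_boot bootstrap replicates of sum(sample) / n for 0-1 data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations from the data, with replacement.
        sample = rng.choices(data, k=n)
        replicates.append(sum(sample) / n)
    return replicates

# Hypothetical data: 18 successes and 2 failures.
data = [1] * 18 + [0] * 2
reps = bootstrap_reliability(data, n_boot=2000, seed=0)
point_estimate = sum(data) / len(data)
```

The spread of `reps` around `point_estimate` is what the bootstrap uses
as a stand-in for the sampling variability of the reliability estimate.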

Does your criticism apply to bootstrapping the residuals too?  I'd
appreciate it if you could point me towards any accessible (I'm not a
statistician) references.

> While you didn't specify exactly what you are
> trying to do, for many problems Markov-Chain Monte Carlo is both
> better and faster, and is often easier to code.  Plus, there is Python
> for it (pymc, I think).

Could you give an example where it's easier to code an MCMC method?
Doing a bootstrap is one or two lines of code in most high-level
languages (e.g. Matlab/Octave), and it turns out in Python too, using
the random indexing method that Josef and Ernest posted (of course you
have to put it in an interpreted loop, which is not very scalable).
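
As a rough sketch of the random indexing idea, and of one way the
interpreted loop might be avoided, the index array can be drawn as a 2-D
block so that every replicate is computed in a single NumPy call.  This
is my own guess at a vectorized variant, not the code Josef and Ernest
posted; it assumes a NumPy recent enough to provide
`np.random.default_rng`, and the function name and data are
hypothetical.

```python
import numpy as np

def bootstrap_means(data, n_boot=1000, seed=None):
    """Return n_boot bootstrap replicates of the sample mean."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = data.size
    # Each row of idx holds n random indices into data (with replacement).
    idx = rng.integers(0, n, size=(n_boot, n))
    # Fancy indexing builds all resamples at once; average across each row.
    return data[idx].mean(axis=1)

# Hypothetical 0-1 data: 18 successes and 2 failures.
data = np.array([1] * 18 + [0] * 2)
reps = bootstrap_means(data, n_boot=5000, seed=0)

# Simple percentile interval from the replicates (one of several options).
lo, hi = np.percentile(reps, [2.5, 97.5])
```

The trade-off is memory: the index array has n_boot * n entries, so for
large data sets the resampling would have to be done in chunks.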

>
> --jh--
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

Thanks again, I've been consistently impressed by the quality of
responses on this list.

-- 
Joshua Stults
Website: http://j-stults.blogspot.com


