numpy/scipy: correlation
sturlamolden
sturlamolden at yahoo.no
Sun Nov 12 18:16:25 EST 2006
robert wrote:
> > t = r * sqrt( (n-2)/(1-r**2) )
> > but too lazy/practical to dig these things out from there. You obviously got it. Out of that, what would be a final estimate for the error range of r (large n)?
> > Is it that same "const. * (1-r**2)/sqrt(n)" which I found in that other document?
I gave you the formula. Solve it for r and you get the confidence bound.
You will need to use the inverse cumulative Student t distribution.
Another quick-and-dirty solution is to use bootstrapping.
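For concreteness, here is a minimal sketch of the inversion (the function name critical_r is mine, and I am using scipy.stats.t for the inverse cumulative t distribution): solving t = r * sqrt((n-2)/(1-r**2)) for r at the two-sided critical t value gives the smallest |r| that is significant at level alpha.

```python
from math import sqrt
from scipy.stats import t as student_t

def critical_r(n, alpha=0.05):
    # two-sided critical value of Student's t with n - 2 degrees of freedom
    tc = student_t.ppf(1.0 - alpha / 2.0, n - 2)
    # invert t = r * sqrt((n-2)/(1-r**2)) for r
    return tc / sqrt(n - 2 + tc ** 2)
```

For n = 100 this gives a critical |r| of roughly 0.197, i.e. weaker correlations are not distinguishable from zero at the 5% level.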
from numpy import mean, std, sum, sort
from numpy.random import randint

def bootstrap_correlation(x, y):
    idx = randint(len(x), size=(1000, len(x)))
    bx = x[idx]  # resamples x with replacement
    by = y[idx]  # resamples y with replacement
    mx = mean(bx, axis=1)
    my = mean(by, axis=1)
    sx = std(bx, axis=1, ddof=1)  # ddof=1 to match the (n-1) denominator below
    sy = std(by, axis=1, ddof=1)
    r = sort(sum((bx - mx[:, None]) * (by - my[:, None]), axis=1)
             / ((len(x) - 1) * sx * sy))
    # bootstrap 95% confidence interval (NB! biased)
    return (r[25], r[975])
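The same resampling idea can be written more compactly with np.corrcoef and np.percentile; a self-contained sketch on synthetic data (the toy data, seed, and variable names here are mine, not from the original post):

```python
import numpy as np

rng = np.random.RandomState(0)
n = 200
x = rng.randn(n)
y = 0.5 * x + rng.randn(n)            # correlated toy data

idx = rng.randint(n, size=(1000, n))  # 1000 bootstrap resamples
r = np.sort([np.corrcoef(x[i], y[i])[0, 1] for i in idx])
lo, hi = np.percentile(r, [2.5, 97.5])
```

The interval (lo, hi) brackets the sample correlation coefficient; its width shrinks roughly like 1/sqrt(n).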
> My main concern is how to respect the fact that the (x,y) points may not distribute well along the regression line.
The bootstrap is "non-parametric" in the sense that it is distribution-free.