numpy/scipy: correlation

robert no-spam at no-spam-no-spam.invalid
Sun Nov 12 13:37:34 EST 2006


sturlamolden wrote:
> First, are you talking about rounding error (due to floating point
> arithmetics) or statistical sampling error?

About measured data. Rounding errors and sampling errors with special distributions are negligible here, so by default I assume Gaussian noise in x and y.
(This may explain that factor of ~0.7 in the rectangle Monte Carlo test.)
The (x, y) points may not distribute "nicely" along the assumed regression diagonal.

> If you are talking about the latter, I suggest you look it up in a
> statistics text book. E.g. if x and y are normally distributed, then
> 
> t = r * sqrt( (n-2)/(1-r**2) )
> 
> has a Student t-distribution with n-2 degrees of freedom. And if you
> don't know how to get the p-value from that, you should not be messing
> with statistics anyway.
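
In numpy/scipy terms the quoted recipe would be roughly this - a minimal
sketch only, assuming x, y come from a bivariate normal distribution
(scipy.stats.pearsonr gives the same pair in one call):

    import numpy as np
    from scipy import stats

    def r_and_pvalue(x, y):
        # Pearson r and two-sided p-value from the quoted t-statistic
        n = len(x)
        r = np.corrcoef(x, y)[0, 1]
        t = r * np.sqrt((n - 2) / (1.0 - r**2))  # Student t, n-2 dof
        p = 2.0 * stats.t.sf(abs(t), n - 2)      # two-sided tail area
        return r, p                              # cf. stats.pearsonr(x, y)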

Still, I'm too lazy/practical to dig the rest out of a statistics textbook. You obviously got it - from that, what would be a final estimate for the error range of r (n big)?
The same "const. * (1-r**2)/sqrt(n)" which I found in that other document?

The const. ~1 is the lesser problem.
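
For my own reference, the usual large-n recipe seems to be Fisher's
z-transform; a minimal sketch (z_crit = 1.96 is just my choice of the
two-sided 95% normal quantile, not from that document):

    import numpy as np

    def r_conf_interval(r, n, z_crit=1.96):
        # Fisher z-transform: arctanh(r) is approximately normal with
        # standard error 1/sqrt(n-3) for bivariate normal data
        z = np.arctanh(r)
        se = 1.0 / np.sqrt(n - 3)
        return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

Since dr/dz = 1 - r**2, the width of that interval comes out to about
2 * z_crit * (1-r**2)/sqrt(n) for large n - i.e. exactly the
"const. * (1-r**2)/sqrt(n)" form, with const. set by the desired coverage.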

My main concern is how to respect the fact that the (x, y) points may not distribute well along the regression line. E.g., due to the nature of the experiment, most points sit around (0, 0) and only a few spread along the interesting part of the diagonal, so those few points have a great effect on m and r. The formulas above will possibly not respect that.
I could try a weighting technique (sketched below), but maybe there is a (commonly used) speedy formula for r/r_err that respects this directly?
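
To sketch what I mean by weighting - just an idea; the weights w are
hypothetical and would have to come from knowledge of the experiment,
e.g. down-weighting the cluster at (0, 0):

    import numpy as np

    def weighted_r(x, y, w):
        # weighted Pearson r; w are nonnegative weights (chosen per
        # experiment), normalized here to sum to 1
        x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
        w = w / w.sum()
        mx, my = np.dot(w, x), np.dot(w, y)
        cov = np.dot(w, (x - mx) * (y - my))
        return cov / np.sqrt(np.dot(w, (x - mx)**2)
                             * np.dot(w, (y - my)**2))

Otherwise a bootstrap - recompute r over many resamples of the (x, y)
pairs and take the spread - would respect the actual point distribution,
though it is hardly "speedy".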


Robert


