[SciPy-dev] stats - kstest

Manuel Metz mmetz at astro.uni-bonn.de
Fri Jul 16 10:37:06 EDT 2004


Hi,
I hope this is the right place to raise this suggestion.

As far as I understand the Kolmogorov-Smirnov test from the book 
"Numerical Recipes in C++" (Chapter 14.3, Kolmogorov-Smirnov Test), the 
kstest algorithm is not implemented correctly in SciPy (or in NR?). I 
think the error is in the second-to-last line of kstest():

 >>> D = max(abs(cdfvals - sb.arange(1.0,N+1)/N))

For comparison, from NR:

 >>> double en = data.size();
 >>> for( j=0; j<n; j++) {
 >>> 	fn = (j+1)/en;
 >>> 	ff = func( data[j] );
 >>> 	dt = max( fabs(fo-ff), fabs(fn-ff) );
 >>> 	if (dt > d) d=dt;
 >>> 	fo = fn;
 >>> }

So the main difference is that the NR algorithm calculates "D" as the 
maximum distance D = max |S_N(x) - P(x)| by comparing P(x) at each data 
point with the step function S_N(x) on BOTH sides of the step (the value 
just below, fo, and the value just above, fn), while the SciPy routine 
only calculates the distance to the upper side.
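
To make the difference concrete, here is a minimal pure-Python sketch of both 
variants. The function names ks_upper_only and ks_both_sides are made up for 
this illustration and are not part of SciPy or NR. The input is assumed to be 
the already sorted list of CDF values P(x_j), which is what the quoted SciPy 
line appears to operate on.

```python
def ks_upper_only(cdfvals):
    """Largest |P(x_j) - j/N| for j = 1..N (upper side of each step only)."""
    n = len(cdfvals)
    return max(abs(f - (j + 1) / n) for j, f in enumerate(cdfvals))


def ks_both_sides(cdfvals):
    """Largest distance from P(x_j) to the step function on either side."""
    n = len(cdfvals)
    d = 0.0
    for j, f in enumerate(cdfvals):
        below = j / n        # S_N just before the step at x_j ("fo" in NR)
        above = (j + 1) / n  # S_N just after the step at x_j ("fn" in NR)
        d = max(d, abs(below - f), abs(above - f))
    return d


# Two sorted CDF values chosen so that the two variants disagree.
cdfvals = [0.6, 0.7]
print(ks_upper_only(cdfvals))  # about 0.3
print(ks_both_sides(cdfvals))  # 0.6
```

In this example the largest gap occurs just below the first step, where 
S_N(x) is still 0 but P(x) is already 0.6, so the upper-side-only variant 
misses it.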

Am I right that the error is in the SciPy algorithm? If so, could 
someone correct it for the next release of SciPy?

Manuel



