[SciPy-dev] stats - kstest

Travis Oliphant oliphant at ee.byu.edu
Mon Jul 19 12:25:32 EDT 2004


Robert Kern wrote:

> Manuel Metz wrote:
>
>> Hi,
>> hopefully I'm at the right place to raise my suggestion.
>>
>> As far as I understand the "kstest" from the book "Numerical Recipes 
>> in C++" (Chapt. 14.3, Kolmogorov-Smirnov Test), the kstest algorithm 
>> is not correctly implemented in SciPy (or in NR?). I think the error 
>> is in the second-to-last line of kstest():
>>
>>  >>> D = max(abs(cdfvals - sb.arange(1.0,N+1)/N))
>>
>> In comparison from NR:
>>
 >>  >>> int j, n = data.size();
 >>  >>> double en = n, fo = 0.0, fn, ff, dt, d = 0.0;
 >>  >>> for (j = 0; j < n; j++) {
 >>  >>>     fn = (j+1)/en;
 >>  >>>     ff = func( data[j] );
 >>  >>>     dt = max( fabs(fo-ff), fabs(fn-ff) );
 >>  >>>     if (dt > d) d = dt;
 >>  >>>     fo = fn;
 >>  >>> }
>>
>> So the main difference is that in the NR algorithm D is calculated 
>> as the maximum distance D = max |S_N(x) - P(x)|, using the distances 
>> from P(x) to both the upper AND the lower side of each step of the 
>> step function S_N(x), while in the SciPy routine only the distance 
>> to the upper side is calculated.
>>
>> Am I right that the error is in the SciPy algorithm? If so, could 
>> anyone correct it in the next release of SciPy?
>
>
> Yes, I believe you are correct.
>
I reviewed what was done again and now believe we were correct.  The 
distribution being used in kstest is the one-sided Kolmogorov 
distribution, KS+.  Because this is the distribution used, the test is 
done with a one-sided statistic.
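
To make the distinction concrete, here is a minimal sketch (not the 
actual scipy.stats code) that computes the one-sided statistics D+ and 
D- and the two-sided D = max(D+, D-) against a hypothesized CDF.  The 
normal CDF is only an example null distribution.

import numpy as np
from scipy.stats import norm

def ks_statistics(data, cdf=norm.cdf):
    """Return (D+, D-, D) for a sample against the hypothesized CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    N = len(x)
    cdfvals = cdf(x)
    # D+: largest amount by which the empirical CDF exceeds the fitted CDF
    d_plus = np.max(np.arange(1.0, N + 1) / N - cdfvals)
    # D-: largest amount by which the fitted CDF exceeds the empirical CDF
    d_minus = np.max(cdfvals - np.arange(0.0, N) / N)
    # Two-sided statistic, as in the NR loop quoted above
    return d_plus, d_minus, max(d_plus, d_minus)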

SciPy only has an approximate two-sided statistic, which is valid for 
large N.  We do not have it wrapped in a kstest-like command, but the 
distribution is available as kstwobign.
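
Along the same lines, a hedged sketch of how kstwobign could be used to 
get an approximate two-sided p-value for large N; the function name and 
the normal null distribution are placeholders, not an existing or 
proposed API.

import numpy as np
from scipy.stats import norm, kstwobign

def ks_two_sided(data, cdf=norm.cdf):
    """Approximate two-sided KS test, valid for large N."""
    x = np.sort(np.asarray(data, dtype=float))
    N = len(x)
    cdfvals = cdf(x)
    D = max(np.max(np.arange(1.0, N + 1) / N - cdfvals),
            np.max(cdfvals - np.arange(0.0, N) / N))
    # Asymptotic p-value of sqrt(N)*D from the two-sided Kolmogorov
    # (kstwobign) distribution
    return D, kstwobign.sf(np.sqrt(N) * D)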

We could modify kstest or make a new command for the two-sided test.

Questions and/or comments welcome.

-Travis O.



