[SciPy-User] ks_2samp is not giving the same results as ks.test in R

josef.pktd at gmail.com
Thu Nov 1 21:42:03 EDT 2012


On Thu, Nov 1, 2012 at 9:41 PM,  <josef.pktd at gmail.com> wrote:
> On Thu, Nov 1, 2012 at 9:14 PM,  <josef.pktd at gmail.com> wrote:
>> On Thu, Nov 1, 2012 at 8:28 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> Hi,
>>>
>>> The ks_2samp function does not give the same answer as ks.test in R.
>>> Does anybody know why they are different? Does ks_2samp compute
>>> something different?
>>>
>>> helium:~/linux/test/python/man/library/scipy/stats/ks_2samp$ Rscript main.R
>>>> ks.test(1:5, 11:15)
>>>
>>>         Two-sample Kolmogorov-Smirnov test
>>>
>>> data:  1:5 and 11:15
>>> D = 1, p-value = 0.007937
>>> alternative hypothesis: two-sided
>>>
>>>> ks.test(1:5, 11:15, alternative='less')
>>>
>>>         Two-sample Kolmogorov-Smirnov test
>>>
>>> data:  1:5 and 11:15
>>> D^- = 0, p-value = 1
>>> alternative hypothesis: the CDF of x lies below that of y
>>>
>>>> ks.test(1:5, 11:15, alternative='greater')
>>>
>>>         Two-sample Kolmogorov-Smirnov test
>>>
>>> data:  1:5 and 11:15
>>> D^+ = 1, p-value = 0.006738
>>> alternative hypothesis: the CDF of x lies above that of y
>>>
>>>>
>>>>
>>> helium:~/linux/test/python/man/library/scipy/stats/ks_2samp$ ./main.py
>>> (1.0, 0.0037813540593701006)
>>> helium:~/linux/test/python/man/library/scipy/stats/ks_2samp$ cat main.py
>>> #!/usr/bin/env python
>>>
>>> from scipy.stats import ks_2samp
>>> print(ks_2samp([1,2,3,4,5], [11,12,13,14,15]))
>>
>> By default, R uses an "exact" distribution for small samples if there
>> are no ties.
>> If there are ties, or if the sample is large, R uses the asymptotic
>> distribution.
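To make the "exact" distribution concrete, here is a minimal sketch, assuming no ties. Under the null hypothesis every interleaving of the two samples is equally likely, so the two-sided p-value is the fraction of monotone lattice paths from (0, 0) to (n1, n2) on which the difference between the two empirical CDFs reaches the observed D. The function name exact_ks_2samp_pvalue is hypothetical; this is not R's or scipy's code, only the same counting idea.

```python
from math import comb


def exact_ks_2samp_pvalue(d, n1, n2):
    """Exact two-sided p-value P(D >= d) for samples without ties (sketch)."""
    # |i/n1 - j/n2| >= d is equivalent to |i*n2 - j*n1| >= d*n1*n2,
    # and d*n1*n2 is an integer for any attainable statistic d.
    threshold = int(round(d * n1 * n2))
    # paths[j] = number of lattice paths from (0, 0) to (i, j) that never
    # reach the threshold; the list is updated row by row in i.
    paths = [0] * (n2 + 1)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if abs(i * n2 - j * n1) >= threshold:
                paths[j] = 0
            elif i == 0 and j == 0:
                paths[j] = 1
            else:
                from_left = paths[j - 1] if j > 0 else 0   # from (i, j-1)
                from_below = paths[j] if i > 0 else 0      # from (i-1, j)
                paths[j] = from_left + from_below
    return 1.0 - paths[n2] / comb(n1 + n2, n1)


# 5 versus 5 observations with D = 1: only 2 of the 252 paths reach D = 1.
print("%.6f" % exact_ks_2samp_pvalue(1.0, 5, 5))  # about 0.007937
```

For the 5-versus-5 example this gives 2/252, which agrees with the p-value of 0.007937 that R prints for the two-sided test.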
>>
>> If I read the function correctly, then scipy.stats is using a small
>> sample approximation by Stephens. (But I would have to look up the
>> formula to verify this.)
>
> http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test
> has the weighted sample size: en = np.sqrt(n1*n2/float(n1+n2))
> the small sample weighting ((en+0.12+0.11/en)*d) is the same as in
> Stephens (1970, 1985?) for the one sample test.
> I don't have a reference for the two sample approximation right now.
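As a rough check of the formula just quoted, the following minimal sketch plugs the weighted sample size and the small-sample term into the asymptotic Kolmogorov distribution (scipy.stats.kstwobign). The function name stephens_ks_2samp_pvalue is hypothetical, and the sketch only assumes that ks_2samp evaluates this expression; it is not copied from the scipy source.

```python
import numpy as np
from scipy.stats import kstwobign


def stephens_ks_2samp_pvalue(d, n1, n2):
    """Two-sided p-value using the small-sample weighting quoted above."""
    en = np.sqrt(n1 * n2 / float(n1 + n2))        # weighted sample size
    return kstwobign.sf((en + 0.12 + 0.11 / en) * d)


# 5 versus 5 observations with D = 1, as in the original question.
print(stephens_ks_2samp_pvalue(1.0, 5, 5))  # roughly 0.00378
```

The result is close to the 0.0037813540593701006 that ks_2samp printed, which supports the reading that the difference from R comes from the approximation and not from the statistic D itself.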
>
> (another bit of random information)
> tables are often only available for 0.01 to 0.25 and approximations
> are targeted on that range and might not be as accurate outside of it
(hit send too fast) that should be 0.001 to 0.25
>
> Josef
>
>
>>
>> In the example below with a bit larger sample and no ties, our
>> approximation is closer to R's "exact" pvalue than the asymptotic
>> distribution if exact=FALSE.
>>
>>>  ks.test(1:25, (10:30)-0.5, exact=FALSE)
>>
>>         Two-sample Kolmogorov-Smirnov test
>>
>> data:  1:25 and (10:30) - 0.5
>> D = 0.36, p-value = 0.1038
>> alternative hypothesis: two-sided
>>
>>>  ks.test(1:25, (10:30)-0.5, exact=TRUE)
>>
>>         Two-sample Kolmogorov-Smirnov test
>>
>> data:  1:25 and (10:30) - 0.5
>> D = 0.36, p-value = 0.07608
>> alternative hypothesis: two-sided
>>
>>
>>>>> stats.ks_2samp(np.arange(1.,26), np.arange(10,31.)-0.5)
>> (0.35999999999999999, 0.078993426961291274)
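The asymptotic number that R reports with exact=FALSE can be reproduced, as a sketch, by evaluating the limiting Kolmogorov distribution at sqrt(n1*n2/(n1+n2)) * D with no small-sample term. The function name asymptotic_ks_2samp_pvalue is hypothetical; the exact and scipy p-values in the comparison are simply the numbers quoted above.

```python
import numpy as np
from scipy.stats import kstwobign


def asymptotic_ks_2samp_pvalue(d, n1, n2):
    """Two-sided p-value from the limiting distribution, no correction."""
    en = np.sqrt(n1 * n2 / float(n1 + n2))
    return kstwobign.sf(en * d)


p_asymp = asymptotic_ks_2samp_pvalue(0.36, 25, 21)
p_exact_r = 0.07608       # quoted from ks.test(..., exact=TRUE) above
p_scipy = 0.0789934       # quoted from stats.ks_2samp above

print("asymptotic: %.4f" % p_asymp)  # about 0.1038
print("error of asymptotic vs exact: %.4f" % abs(p_asymp - p_exact_r))
print("error of scipy value vs exact: %.4f" % abs(p_scipy - p_exact_r))
```

With 25 and 21 observations the uncorrected asymptotic value is off by about 0.028, while the scipy value is off by about 0.003.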
>>
>>
>> For the 1 sample kstest I used (when I rewrote stats.kstest) an
>> approximation that is closer to the exact distribution than the
>> asymptotic distribution, but it's also not exact.
>>
>> It would be good to have better small sample approximations or exact
>> distributions, but I worked on this in scipy.stats when I barely had
>> any idea about goodness-of-fit tests.
>> Also, ks_2samp never got the enhancement for one-sided alternatives.
>> (In statsmodels I have been working so far only on one sample tests,
>> but not on two-sample tests.)
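Until ks_2samp supports one-sided alternatives, the one-sided statistics can be computed by hand. The sketch below evaluates both empirical CDFs on the pooled sample and uses the standard asymptotic one-sided p-value exp(-2 * n1*n2/(n1+n2) * D^2). The function name ks_2samp_one_sided is hypothetical, and the labels follow R's convention, where "greater" means the CDF of x lies above that of y.

```python
import numpy as np


def ks_2samp_one_sided(x, y):
    """Return (D+, p_greater, D-, p_less) using asymptotic p-values."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n1, n2 = len(x), len(y)
    pooled = np.concatenate([x, y])
    # Empirical CDFs of both samples evaluated at every pooled observation.
    cdf_x = np.searchsorted(x, pooled, side="right") / float(n1)
    cdf_y = np.searchsorted(y, pooled, side="right") / float(n2)
    # Both CDFs equal 1 at the largest pooled point, so each max is >= 0.
    d_plus = (cdf_x - cdf_y).max()    # CDF of x above CDF of y
    d_minus = (cdf_y - cdf_x).max()   # CDF of x below CDF of y
    scale = 2.0 * n1 * n2 / float(n1 + n2)
    return (d_plus, np.exp(-scale * d_plus ** 2),
            d_minus, np.exp(-scale * d_minus ** 2))


d_plus, p_greater, d_minus, p_less = ks_2samp_one_sided(
    [1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(d_plus, p_greater)   # D+ = 1, p about 0.006738
print(d_minus, p_less)     # D- = 0, p = 1
```

For the data in the original question this agrees with the R output for alternative='greater' and alternative='less' quoted at the top of the thread.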
>>
>> (I don't remember if there is a minimum size recommendation, but the
>> examples I usually checked were larger.)
>
> matlab help: http://www.mathworks.com/help/stats/kstest2.html
> "The asymptotic p value becomes very accurate for large sample sizes,
> and is believed to be reasonably accurate for sample sizes n1 and n2
> such that (n1*n2)/(n1 + n2) >= 4."
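Applied to the two examples in this thread, that rule of thumb looks like this. It is a small sketch: the helper name and the default threshold argument are hypothetical, and only the inequality itself comes from the quoted MATLAB help.

```python
def asymptotic_reasonable(n1, n2, threshold=4.0):
    """Check the quoted rule of thumb (n1*n2)/(n1 + n2) >= 4."""
    effective_n = n1 * n2 / float(n1 + n2)
    return effective_n, effective_n >= threshold


print(asymptotic_reasonable(5, 5))     # (2.5, False)
print(asymptotic_reasonable(25, 21))   # about 11.4, True
```

By this criterion the 5-versus-5 example from the original question is too small for the asymptotic p-value to be trusted.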
>>
>>
>> Since it's a community project: pull requests are welcome.
>>
>> Josef
>>
>>>
>>>
>>> --
>>> Regards,
>>> Peng


