[SciPy-User] [SciPy-user] Two Sample Kolmogorov-Smirnov Test scipy vs R

amundell andrewhdmundell at gmail.com
Wed Dec 19 07:16:39 EST 2012


I am currently creating a statistical app and comparing my hypothesis test
results against the R and Python (scipy) libraries. So far so good with most
tests. However, I have found a discrepancy between the R and Python results
for the two-sample Kolmogorov-Smirnov test. Below are the data vectors I have
been using, formatted as appropriate for both R's ks.test and
scipy.stats.mstats.ks_twosamp.

sample1=[23.4, 30.9, 18.8, 23.0, 21.4, 1, 24.6, 23.8, 24.1, 18.7, 16.3, 20.3,
         14.9, 35.4, 21.6, 21.2, 21.0, 15.0, 15.6, 24.0, 34.6, 40.9, 30.7,
         24.5, 16.6, 1, 21.7, 1, 23.6, 1, 25.7, 19.3, 46.9, 23.3, 21.8, 33.3,
         24.9, 24.4, 1, 19.8, 17.2, 21.5, 25.5, 23.3, 18.6, 22.0, 29.8, 33.3,
         1, 21.3, 18.6, 26.8, 19.4, 21.1, 21.2, 20.5, 19.8, 26.3, 39.3, 21.4,
         22.6, 1, 35.3, 7.0, 19.3, 21.3, 10.1, 20.2, 1, 36.2, 16.7, 21.1, 39.1,
         19.9, 32.1, 23.1, 21.8, 30.4, 19.62, 15.5]

sample2=[16.5, 1, 22.6, 25.3, 23.7, 1, 23.3, 23.9, 16.2, 23.0, 21.6, 10.8, 12.2,
         23.6, 10.1, 24.4, 16.4, 11.7, 17.7, 34.3, 24.3, 18.7, 27.5, 25.8, 22.5,
         14.2, 21.7, 1, 31.2, 13.8, 29.7, 23.1, 26.1, 25.1, 23.4, 21.7, 24.4, 13.2,
         22.1, 26.7, 22.7, 1, 18.2, 28.7, 29.1, 27.4, 22.3, 13.2, 22.5, 25.0, 1,
         6.6, 23.7, 23.5, 17.3, 24.6, 27.8, 29.7, 25.3, 19.9, 18.2, 26.2, 20.4,
         23.3, 26.7, 26.0, 1, 25.1, 33.1, 35.0, 25.3, 23.6, 23.2, 20.2, 24.7, 22.6,
         39.1, 26.5, 22.7]

Running the tests:
R: 
TT = ks.test(sample1, sample2)
TG = ks.test(sample1, sample2, alternative="greater")
TL = ks.test(sample1, sample2, alternative="less")

TT Result: D = 0.2204, p-value = 0.04205
           alternative hypothesis: two-sided
TG Result: D^+ = 0.2204, p-value = 0.02102
           alternative hypothesis: the CDF of x lies above that of y
TL Result: D^- = 0.1242, p-value = 0.2933
           alternative hypothesis: the CDF of x lies below that of y
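
For reference, R's wording of the alternatives suggests the following reading
of the one-sided statistics: D^+ is the largest amount by which the empirical
CDF of x exceeds that of y, and D^- is the largest amount by which it falls
below. A minimal sketch of that calculation, assuming this reading is right
(the function name ks_one_sided_stats and the toy vectors are made up for
illustration and are not part of R or scipy):

```python
import numpy as np

def ks_one_sided_stats(x, y):
    """Return (d_plus, d_minus) = (max(F_x - F_y), max(F_y - F_x))."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    d_plus = float(np.max(cdf_x - cdf_y))   # CDF of x above CDF of y
    d_minus = float(np.max(cdf_y - cdf_x))  # CDF of x below CDF of y
    return d_plus, d_minus

# Toy data (hypothetical): a sits to the left of b, so its CDF rises first.
a = [1.0, 2.0, 3.0]
b = [2.5, 3.5, 4.5]
d_plus, d_minus = ks_one_sided_stats(a, b)
print(d_plus, d_minus)
```

Calling ks_one_sided_stats(sample1, sample2) with the vectors above gives the
two numbers to compare against D^+ and D^- as reported by R.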

Scipy:

TT=scipy.stats.mstats.ks_twosamp(sample1, sample2)
TU=scipy.stats.mstats.ks_twosamp(sample1, sample2, alternative='greater')
TL=scipy.stats.mstats.ks_twosamp(sample1, sample2, alternative='less')

TT Result: D = 0.220411392405, p-value = 0.0420492678738
TU Result: D = 0.124208860759, p-value = 0.293327703926
TL Result: D = 0.220411392405, p-value = 0.0210248293393
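
The one-sided p-values above appear to depend only on the statistic and the
sample sizes. They agree with the usual asymptotic approximation
p = exp(-2 * n*m/(n+m) * D^2) for n = 80 and m = 79, the lengths of the two
vectors as posted. A quick check, offered as a sketch and not as a description
of either library's internals (one_sided_pvalue is a hypothetical helper name):

```python
import math

def one_sided_pvalue(d, n, m):
    """Asymptotic one-sided two-sample KS p-value for statistic d."""
    return math.exp(-2.0 * n * m / (n + m) * d * d)

n, m = 80, 79  # lengths of sample1 and sample2 as posted
p_for_larger_d = one_sided_pvalue(0.220411392405, n, m)
p_for_smaller_d = one_sided_pvalue(0.124208860759, n, m)
print(p_for_larger_d)   # close to the reported 0.0210
print(p_for_smaller_d)  # close to the reported 0.2933
```

If that holds, each p-value is paired consistently with its statistic in both
libraries, and the only difference is which statistic is attached to
"greater" and which to "less".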

As the results show, the one-tailed upper and lower values appear to be
reversed between the two libraries: scipy's 'greater' result matches R's
"less" result, and vice versa. The results from my app are consistent with
R's. Am I missing something obvious here, e.g. in the definitions of the
alternatives? Or is there potentially a bug in the scipy code?
Any help would be much appreciated. Cheers.







