[SciPy-User] ks_2samp and searchsorted on concatenated array

josef.pktd at gmail.com josef.pktd at gmail.com
Sun May 22 15:52:30 EDT 2011


I was looking again at Kolmogorov-Smirnov and other goodness-of-fit (gof) tests.

From ks_2samp (data1 and data2 are 1-D):

    n1 = len(data1)  # sample sizes used in the cdf normalization below
    n2 = len(data2)
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    data_all = np.concatenate([data1, data2])
    cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0*n1)
    cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0*n2)
    d = np.max(np.absolute(cdf1 - cdf2))

What does searchsorted do with an array that is the concatenation of
two sorted arrays?

I don't understand why data_all doesn't need to be sorted (after the
concatenation).
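
One way to probe the question is a small experiment. The sketch below assumes only the documented behaviour of np.searchsorted(a, v): the searched array a has to be sorted, while each value in v is looked up independently. The arrays a and queries are made-up illustration values, not data from ks_2samp.

```python
import numpy as np

# `a` is the array being searched and must be sorted.
a = np.array([1.0, 3.0, 5.0])

# `queries` mimics data_all: two sorted pieces glued together, not sorted overall.
queries = np.array([1.0, 3.0, 5.0, 0.5, 4.0, 6.0])

# With side='right', each result is the number of elements of `a` that are <= the query.
counts = np.searchsorted(a, queries, side='right')
print(counts)  # [1 2 3 0 2 3]

# Sorting the queries first only reorders the results; the values are the same.
order = np.argsort(queries, kind='stable')
counts_sorted = np.searchsorted(a, queries[order], side='right')
same = np.array_equal(counts[order], counts_sorted)
print(same)  # True
```

If that holds in general, the order of data_all would only permute cdf1 and cdf2 in the same way, and a maximum over all elements does not depend on the order.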

(I wrote this in 2008, just after learning about searchsorted, but the
Monte Carlo simulations that I ran looked good, and I haven't found a
reference for why I did it this way.)
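
A quick randomized check (not the original Monte Carlo runs, just a sketch) can compare the statistic computed with and without sorting data_all. The helper ks_stat and its sort_all switch are hypothetical names introduced here for illustration; the sample sizes and the shifted normal distribution are arbitrary choices.

```python
import numpy as np

def ks_stat(data1, data2, sort_all):
    """Two-sample KS statistic as in the ks_2samp snippet.

    sort_all is a hypothetical switch: if True, data_all is sorted
    after the concatenation; if False, it is used as is.
    """
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    n1, n2 = len(data1), len(data2)
    data_all = np.concatenate([data1, data2])
    if sort_all:
        data_all = np.sort(data_all)
    cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0 * n1)
    cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0 * n2)
    return np.max(np.absolute(cdf1 - cdf2))

rng = np.random.RandomState(0)  # fixed seed so the check is repeatable
mismatches = 0
for _ in range(200):
    n1, n2 = rng.randint(1, 50, size=2)   # arbitrary sample sizes
    x = rng.normal(size=n1)
    y = rng.normal(loc=0.5, size=n2)      # arbitrary shift between samples
    if ks_stat(x, y, sort_all=False) != ks_stat(x, y, sort_all=True):
        mismatches += 1

print(mismatches)
```

Any nonzero count of mismatches would point to a real dependence on the order of data_all.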

Bug or not? (maybe I'm just slow in thinking today)

Josef


