[SciPy-user] Usage of scipy KS test

Wed Jan 2 15:15:25 EST 2008

On 02/01/2008, Alexander Dietz <Alexander.Dietz at astro.cf.ac.uk> wrote:

> On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted at gmail.com> wrote:

> > scipy.stats.kstest(x,dict(zip(x,m)).get)

> When I use your suggestion, I get an error:
>
>  File
> "/usr/lib/python2.4/site-packages/scipy/stats/stats.py",
> line 1716, in kstest
>     cdfvals = cdf(vals, *args)
> TypeError: unhashable type
>
> I tried with get(), but this did not work either.  Also, in this example I
> do not see the vector 'm' containing the modeled values. They must enter
> the expression somehow....

Well, if x is the list of x values (floats) and m is the list of CDF
values (also floats), then zip(x,m) is the list of pairs (x, CDF(x)).
If you have arrays, you might need to convert them to lists first
(x=list(x) for example). dict(zip(x,m)) makes a dictionary out of such
a list of pairs. dict(zip(x,m)).get is a function that maps xs to ms.
Unfortunately it only maps a single x to a single m, while kstest hands
it the whole array at once (hence the "unhashable type" error); you need
to use numpy.vectorize on it:

scipy.stats.kstest(x,numpy.vectorize(dict(zip(x,m)).get))

numpy.vectorize makes it able to map an array of xs to an array of ms.
That should work. But if you can, you should give kstest your real
CDF-calculating function (possibly wrapped in numpy.vectorize, if it
doesn't work on arrays).
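
To make the two options concrete, here is a minimal sketch. The sample
data and the choice of a normal model are made up for illustration; the
names lookup and cdf_from_table are just placeholders. It assumes a
reasonably recent numpy and scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)      # sample data (made up for illustration)
m = stats.norm.cdf(x)         # model CDF value at each sample point

# Option 1: table lookup. dict.get handles one key at a time, so it is
# wrapped in numpy.vectorize to accept the array that kstest passes in.
lookup = dict(zip(x.tolist(), m.tolist()))
cdf_from_table = np.vectorize(lookup.get, otypes=[float])
d_table, p_table = stats.kstest(x, cdf_from_table)

# Option 2 (preferred): hand kstest the CDF-calculating function itself.
d_direct, p_direct = stats.kstest(x, stats.norm.cdf)

print(d_table, p_table)
print(d_direct, p_direct)
```

Both calls should report the same statistic, since the table holds
exactly the values the real CDF would return at the sample points. The
lookup only works for xs that are in the table, which is another reason
to prefer the real function.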

> Assuming I calculate the D-value myself, can I then use stats.ksprob to
> calculate the probability? Do I have to use sqrt(n)*D as the argument?

I'm not sure what ksprob wants. It will really be clearer to use kstest.
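
If you do want to go the manual route, the following is only a sketch
under assumptions, not a statement about ksprob: it computes the
two-sided D by hand and converts it with scipy.special.kolmogorov, the
survival function of Kolmogorov's limiting distribution, which takes
sqrt(n)*D as its argument. The data and the normal model are again made
up, and the result is a large-n approximation.

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)          # sample data (made up for illustration)

xs = np.sort(x)
n = len(xs)
cdf_vals = stats.norm.cdf(xs)     # model CDF at the sorted sample points

# Largest gap between the empirical CDF and the model CDF, checked just
# after and just before each step of the empirical CDF.
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
d = max(d_plus, d_minus)

# Asymptotic two-sided p-value; note the sqrt(n) scaling of D.
p_asymp = special.kolmogorov(np.sqrt(n) * d)

print(d, p_asymp)
```

The D computed this way should match the statistic kstest reports for
the same data and CDF; the p-value may differ slightly if kstest uses an
exact method for your sample size.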

I should warn you, if your probability distribution is not continuous
- like, for example, a Poisson distribution - kstest will not give
valid results, because the KS test assumes a continuous CDF.
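
A quick way to see the problem, as a sketch with made-up parameters
(mu = 3, 1000 samples): draw from a Poisson distribution and test the
sample against that same Poisson CDF. The null hypothesis is true by
construction, yet the jumps in the discrete CDF inflate D.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu = 3.0                                # hypothetical Poisson mean
sample = rng.poisson(mu, size=1000)     # discrete data with many ties

# Test the sample against the very distribution it was drawn from.
d, p_value = stats.kstest(sample, stats.poisson(mu).cdf)

# D ends up roughly as large as the biggest jump in the Poisson CDF,
# so the reported p-value says nothing useful about the fit.
print(d, p_value)
```

With a continuous distribution the same experiment would give p-values
spread evenly between 0 and 1; here the test typically rejects a model
that is exactly right.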

Anne