[SciPy-user] kstest and scipy.stats
Robin
robince at gmail.com
Thu Nov 20 11:56:45 EST 2008
Hi,
I am having trouble using kstest and the scipy.stats package, which I
suspect is due to a misunderstanding on my part.
Basically I'm confused by the below:
O is an array of observed (integer) values:
In [344]: O.shape
Out[344]: (1400,)
In [345]: O.max()
Out[345]: 21
In [346]: O.min()
Out[346]: 0
Now I am trying to use kstest to determine how closely various
distributions describe this vector of data. But I was getting very low
p values from kstest (always a p of zero, even when plotting the
distributions shows that by eye they are a very good fit).
But the thing that really confuses me is this:
In [337]: kstest(O, stats.rv_discrete(name='test', values=(r_[0:25], prob(O,25))).cdf)
Out[337]: (0.31071428571428572, 0.0)
prob is a small function of mine that returns a probability vector
from a vector of integers (shown below; I have been using it for ages
and I'm sure there is no mistake there). rv_discrete seems to
construct the right distribution (the mean and so on match), so how come
the p value is 0 when I am comparing to the distribution directly
sampled from the data?
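For anyone who wants to reproduce this without my data, here is a minimal sketch. The Poisson sample, the seed and the inline bincount call (standing in for prob) are assumptions for illustration, not my actual data or code. In this sketch the statistic comes out equal to the largest single probability mass, which would be consistent with the KS test being designed for continuous distributions, where tied values do not occur.

```python
import numpy as np
from scipy import stats

# Assumption: synthetic integer data standing in for the observed vector O.
rng = np.random.default_rng(0)
O = rng.poisson(3.0, size=1400)

# Empirical probability vector over the responses 0..24
# (inline stand-in for prob(O, 25)).
p = np.bincount(O, minlength=25) / O.size

# Discrete distribution built directly from the data.
dist = stats.rv_discrete(name='test', values=(np.arange(25), p))

# One-sample KS test of the data against its own empirical distribution.
stat, pval = stats.kstest(O, dist.cdf)

print("KS statistic:", stat)
print("p value:", pval)
print("largest probability mass:", p.max())
```

Running this should show whether the zero p value is specific to my data or appears for any integer-valued sample.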
Any help gratefully appreciated,
Robin
----
Source:
import numpy as np

def prob(x, r):
    """Sample probability of integer sequence using bincount

    Inputs:
    x - integer sequence
    r - number of possible responses (max(x)<r)

    Returns full probability vector (float)
    """
    if not np.issubdtype(x.dtype, np.integer):
        raise ValueError("Input must be of integer type")
    P = np.bincount(x).astype(float)  # counts of each value 0..max(x)
    n = P.size
    if n < r:  # resize if any responses missed
        P.resize((r,))
        P[n:] = 0
    P /= x.size  # normalise counts to probabilities
    return P
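As a quick check of what the function is meant to return, here is a small worked example. It uses a hypothetical helper, prob_sketch, which is a compact variant relying on the minlength argument of np.bincount; it is a sketch under that assumption rather than the function above, although for valid input (max(x) < r) the two should agree.

```python
import numpy as np

def prob_sketch(x, r):
    """Hypothetical compact variant: empirical probabilities over 0..r-1."""
    x = np.asarray(x)
    if not np.issubdtype(x.dtype, np.integer):
        raise ValueError("Input must be of integer type")
    # minlength pads the count vector with zeros for unobserved responses.
    return np.bincount(x, minlength=r) / x.size

# Four observations over five possible responses (0..4).
example = prob_sketch(np.array([0, 1, 1, 3]), 5)
print(example)
```

The value 1 occurs twice in four observations, so its entry is 0.5; the values 2 and 4 never occur, so their entries are 0.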