[SciPy-User] stats.chisquare issues
josef.pktd at gmail.com
Sun Sep 26 16:17:16 EDT 2010
On Sun, Sep 26, 2010 at 3:02 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
> Hello,
> Consider these examples:
> I[35]: np.histogram(ydata, bins=6)
> O[35]:
> (array([4, 1, 3, 0, 0, 1]),
> array([ 2.8 , 146.33333333, 289.86666667, 433.4 ,
> 576.93333333, 720.46666667, 864. ]))
> I[36]: np.histogram(ypred, bins=6)
> O[36]:
> (array([4, 2, 2, 0, 0, 1]),
> array([ 22.08895 , 166.34439167, 310.59983333, 454.855275 ,
> 599.11071667, 743.36615833, 887.6216 ]))
> I[45]: stats.chisquare([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> /home/gsever/Desktop/<ipython console> in <module>()
> /usr/lib/python2.6/site-packages/scipy/stats/stats.pyc in chisquare(f_obs, f_exp, ddof)
>    2516     if f_exp is None:
>    2517         f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
> -> 2518     f_exp = f_exp.astype(float)
>    2519     chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
>    2520     return chisq, chisqprob(chisq, k-1-ddof)
> AttributeError: 'list' object has no attribute 'astype'
> Here, I would expect any scipy function, including chisquare, to be able to
> handle lists.
> ############################################
> This one returns nan:
> I[46]: stats.chisquare(np.array([4, 1, 3, 0, 0, 1]), np.array([4, 2, 2, 0,
> 0, 1]))
> O[46]: (nan, nan)
> Again, I should have expected this, since the division has a 0 in it.
> After masking the zeros:
> I[47]: a1 = np.ma.masked_equal([4,1,3,0,0,1], 0)
> I[48]: a2 = np.ma.masked_equal([4,2,2,0,0,1], 0)
> Further:
> I[49]: stats.chisquare(a1, a2)
> O[49]: (1.0, 0.96256577324729631)
> I[50]: stats.mstats.chisquare(a1, a2)
> O[50]: (1.0, 0.80125195690120077)
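Masking doesn't remove the values. With stats.chisquare the masked bins
drop out of the sum but still count toward the number of bins (6 bins,
5 degrees of freedom), while stats.mstats.chisquare counts only the
4 unmasked bins (3 degrees of freedom). That is why the p-values differ.
So when you have a masked array, you should use compressed() or similar
before calling stats.chisquare.
Dropping the zero bins: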
>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 2, 1.]))
(1.0, 0.80125195690120077)
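The same numbers can be reached from the masked arrays by compressing them first. This is only a minimal sketch, assuming a SciPy version whose stats.chisquare accepts plain ndarrays as in the call above; the names obs, exp, p_dof3 and p_dof5 are made up for illustration. The two chi2.sf calls show how the p-value depends on the degrees of freedom used.

```python
import numpy as np
from scipy import stats

a1 = np.ma.masked_equal([4, 1, 3, 0, 0, 1], 0)
a2 = np.ma.masked_equal([4, 2, 2, 0, 0, 1], 0)

# compressed() returns a plain ndarray holding only the unmasked values
obs = a1.compressed().astype(float)
exp = a2.compressed().astype(float)

chisq, pvalue = stats.chisquare(obs, exp)

# Same statistic, different degrees of freedom:
# 3 (the 4 remaining bins) versus 5 (all 6 original bins)
p_dof3 = stats.chi2.sf(chisq, 3)
p_dof5 = stats.chi2.sf(chisq, 5)
```

Here p_dof3 should match the stats.mstats.chisquare p-value from the question, and p_dof5 the one from stats.chisquare on the masked arrays.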
Not accepting lists is a bug.
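Until that is fixed, converting the inputs yourself avoids the AttributeError. A small sketch of that workaround; the wrapper name chisquare_from_lists is hypothetical and not part of scipy:

```python
import numpy as np
from scipy import stats

def chisquare_from_lists(f_obs, f_exp):
    """Hypothetical helper: convert list inputs to float arrays first."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    return stats.chisquare(f_obs, f_exp)

chisq, pvalue = chisquare_from_lists([4, 1, 3, 1], [4, 2, 2, 1])
```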
Returning nan when the expected count in a bin is zero might be by
design, but we need to check this. A zero expected count gives inf,
while a zero observed count is fine:
>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 0, 1.]))
(inf, nan)
>>> stats.chisquare(np.array([4, 0, 3, 1.]),np.array([4, 2, 2, 1.]))
(2.5, 0.4752910833430205)
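To see where the inf and nan come from, it can help to look at the per-bin terms (f_obs - f_exp)**2 / f_exp from the traceback above. A minimal sketch using only numpy; the helper name chisquare_terms is made up for illustration:

```python
import numpy as np

def chisquare_terms(f_obs, f_exp):
    """Per-bin contributions (f_obs - f_exp)**2 / f_exp (illustrative helper)."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    # Silence the divide-by-zero and 0/0 warnings so the inf/nan show up as values
    with np.errstate(divide="ignore", invalid="ignore"):
        return (f_obs - f_exp) ** 2 / f_exp

# Zero expected, nonzero observed: that bin contributes 9/0 = inf
zero_expected = chisquare_terms([4, 1, 3, 1], [4, 2, 0, 1])

# Zero observed, nonzero expected: that bin contributes 4/2 = 2, no problem
zero_observed = chisquare_terms([4, 0, 3, 1], [4, 2, 2, 1])

# Zero in both (the original histograms): those bins contribute 0/0 = nan
both_zero = chisquare_terms([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
```

Summing zero_observed gives the 2.5 shown above, the sum of zero_expected is inf, and the sum of both_zero is nan, which is the (nan, nan) case from the question.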
For the chisquare test, the standard recommendation is that you need at
least 5 expected observations in each bin; otherwise you should
combine bins.
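One possible way to combine bins is to merge neighbours from left to right until each merged bin has an expected count of at least 5. This is just a sketch of one greedy strategy, not something scipy provides; the function name combine_bins, the min_expected parameter, and the example counts are all made up for illustration.

```python
import numpy as np

def combine_bins(f_obs, f_exp, min_expected=5):
    """Greedily merge adjacent bins until each expected count >= min_expected.

    Illustrative helper, not a scipy function.
    """
    obs_out, exp_out = [], []
    obs_acc = exp_acc = 0.0
    for o, e in zip(f_obs, f_exp):
        obs_acc += o
        exp_acc += e
        if exp_acc >= min_expected:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
            obs_acc = exp_acc = 0.0
    # Leftover bins that never reached the threshold go into the last merged bin
    if obs_acc > 0 or exp_acc > 0:
        if exp_out:
            obs_out[-1] += obs_acc
            exp_out[-1] += exp_acc
        else:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
    return np.array(obs_out), np.array(exp_out)

# Made-up counts, only to show the merging
obs, exp = combine_bins([12, 7, 3, 2, 0, 1], [10, 8, 4, 2, 0, 1])
# obs is [12, 7, 6] and exp is [10, 8, 7]
```

Applied to the counts from this thread ([4, 1, 3, 0, 0, 1] against [4, 2, 2, 0, 0, 1]), this sketch collapses everything into a single bin, which suggests that nine observations are too few for the test.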
Josef
> p-values differ, is this expected?
> Thanks.
>
> --
> Gökhan
>