[SciPy-User] stats.chisquare issues

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Sep 26 16:17:16 EDT 2010


On Sun, Sep 26, 2010 at 3:02 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
> Hello,
> Consider these examples:
> I[35]: np.histogram(ydata, bins=6)
> O[35]:
> (array([4, 1, 3, 0, 0, 1]),
>  array([   2.8       ,  146.33333333,  289.86666667,  433.4       ,
>         576.93333333,  720.46666667,  864.        ]))
> I[36]: np.histogram(ypred, bins=6)
> O[36]:
> (array([4, 2, 2, 0, 0, 1]),
>  array([  22.08895   ,  166.34439167,  310.59983333,  454.855275  ,
>         599.11071667,  743.36615833,  887.6216    ]))
> I[45]: stats.chisquare([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> /home/gsever/Desktop/<ipython console> in <module>()
> /usr/lib/python2.6/site-packages/scipy/stats/stats.pyc in chisquare(f_obs, f_exp, ddof)
>    2516     if f_exp is None:
>    2517         f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
> -> 2518     f_exp = f_exp.astype(float)
>    2519     chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
>    2520     return chisq, chisqprob(chisq, k-1-ddof)
> AttributeError: 'list' object has no attribute 'astype'
> Here, I expect any scipy function including chisquare should be able to
> handle lists???
> ############################################
> This one throws:
> I[46]: stats.chisquare(np.array([4, 1, 3, 0, 0, 1]), np.array([4, 2, 2, 0, 0, 1]))
> O[46]: (nan, nan)
> again I should be aware since the division has 0 in it.
> after masking:
> I[47]: a1 = np.ma.masked_equal([4,1,3,0,0,1], 0)
> I[48]: a2 = np.ma.masked_equal([4,2,2,0,0,1], 0)
> Further,
> I[49]: stats.chisquare(a1, a2)
> O[49]: (1.0, 0.96256577324729631)
> I[50]: stats.mstats.chisquare(a1, a2)
> O[50]: (1.0, 0.80125195690120077)

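Masking doesn't remove the values: stats.chisquare still counts all 6
bins (5 degrees of freedom), while stats.mstats.chisquare counts only the
4 unmasked bins (3 degrees of freedom), which is why the p-values differ.
So when you have a masked array, use .compressed() or similar before
calling stats.chisquare.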
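
The two quoted p-values can be checked against the chi-square survival
function. This is only a sketch: it assumes the statistic is 1.0 in both
calls (as the quoted output shows) and that the only difference is how
many bins are counted.

```python
from scipy import stats

chisq = 1.0  # statistic reported by both calls in the quoted session

# All 6 bins counted (masked zeros included): 6 - 1 = 5 degrees of freedom
p_all_bins = stats.chi2.sf(chisq, 6 - 1)

# Only the 4 unmasked bins counted: 4 - 1 = 3 degrees of freedom
p_unmasked = stats.chi2.sf(chisq, 4 - 1)

print(p_all_bins)  # matches O[49], about 0.9626
print(p_unmasked)  # matches O[50], about 0.8013
```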

Dropping the zero bins:
>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 2, 1.]))
(1.0, 0.80125195690120077)
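
Starting from the masked arrays in the quoted session, the same result can
be reached by removing the masked bins before the call. A minimal sketch,
assuming a bin should be dropped when it is masked in either array (the
names keep, obs and exp are just for illustration):

```python
import numpy as np
from scipy import stats

f_obs = np.ma.masked_equal([4, 1, 3, 0, 0, 1], 0)
f_exp = np.ma.masked_equal([4, 2, 2, 0, 0, 1], 0)

# Keep only bins that are unmasked in both arrays, so the two
# plain arrays stay aligned bin by bin.
keep = ~(np.ma.getmaskarray(f_obs) | np.ma.getmaskarray(f_exp))
obs = np.asarray(f_obs.data[keep], dtype=float)
exp = np.asarray(f_exp.data[keep], dtype=float)

chisq, p = stats.chisquare(obs, exp)
print(chisq, p)
```

When the masks of both arrays coincide, as they do here, calling
.compressed() on each array gives the same obs and exp.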

Not accepting a list is a bug.

Returning nans when the expected count in a bin is zero might be by
design, but we need to check this.

>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 0, 1.]))
(inf, nan)
>>> stats.chisquare(np.array([4, 0, 3, 1.]),np.array([4, 2, 2, 1.]))
(2.5, 0.4752910833430205)
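
To see where the nan and inf come from, the statistic can be computed by
hand as sum((f_obs - f_exp)**2 / f_exp), the expression in the quoted
traceback. The helper chisq_stat below is hypothetical, not part of scipy;
it converts its inputs with np.asarray, so lists work too.

```python
import numpy as np

def chisq_stat(f_obs, f_exp):
    """Pearson chi-square statistic, computed term by term (illustrative helper)."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    # Silence the divide-by-zero warnings; the nan/inf values are the point here.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum((f_obs - f_exp) ** 2 / f_exp)

# Observed 0, expected 0: the term is 0/0 = nan, so the sum is nan
print(chisq_stat([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1]))

# Observed 3, expected 0: the term is 9/0 = inf, so the sum is inf
print(chisq_stat([4, 1, 3, 1], [4, 2, 0, 1]))

# Observed 0, expected 2: the term is 4/2 = 2, finite; total is 2.5
print(chisq_stat([4, 0, 3, 1], [4, 2, 2, 1]))
```

Only a zero in the expected counts causes trouble; a zero in the observed
counts is fine.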

For the chi-square test, the standard recommendation is that you need at
least 5 expected observations in each bin; otherwise you should
combine bins.
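
One way to combine bins is to merge neighbours from left to right until
each merged bin reaches the threshold. This is only a sketch of one
possible strategy: merge_small_bins is a hypothetical helper, not a scipy
function, and the second data set is made up for illustration.

```python
import numpy as np

def merge_small_bins(f_obs, f_exp, min_expected=5.0):
    """Greedily merge adjacent bins until each has expected count >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(f_obs, f_exp):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # Fold any leftover bins into the last merged bin (or keep them
    # as a single bin if nothing reached the threshold).
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return np.array(merged_obs), np.array(merged_exp)

# Data from this thread: only 9 observations, so everything
# collapses into a single bin and no test is possible.
thread_obs, thread_exp = merge_small_bins([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
print(thread_obs, thread_exp)

# Made-up larger example: five bins become three.
demo_obs, demo_exp = merge_small_bins([12, 3, 2, 8, 1], [10, 4, 3, 7, 2])
print(demo_obs, demo_exp)
```

With the histograms from this thread the merge leaves a single bin, which
suggests the sample is too small for the test under this rule of thumb.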

Josef


> p-values differ, is this expected?
> Thanks.
>
> --
> Gökhan