[SciPy-User] stats.chisquare issues
Bruce Southey
bsouthey at gmail.com
Mon Sep 27 15:41:36 EDT 2010
On 09/26/2010 03:17 PM, josef.pktd at gmail.com wrote:
> On Sun, Sep 26, 2010 at 3:02 PM, Gökhan Sever<gokhansever at gmail.com> wrote:
>> Hello,
>> Consider these examples:
>> I[35]: np.histogram(ydata, bins=6)
>> O[35]:
>> (array([4, 1, 3, 0, 0, 1]),
>> array([ 2.8 , 146.33333333, 289.86666667, 433.4 ,
>> 576.93333333, 720.46666667, 864. ]))
>> I[36]: np.histogram(ypred, bins=6)
>> O[36]:
>> (array([4, 2, 2, 0, 0, 1]),
>> array([ 22.08895 , 166.34439167, 310.59983333, 454.855275 ,
>> 599.11071667, 743.36615833, 887.6216 ]))
>> I[45]: stats.chisquare([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
>> ---------------------------------------------------------------------------
>> AttributeError Traceback (most recent call last)
>> /home/gsever/Desktop/<ipython console> in<module>()
>> /usr/lib/python2.6/site-packages/scipy/stats/stats.pyc in chisquare(f_obs, f_exp, ddof)
>> 2516 if f_exp is None:
>> 2517 f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
>> -> 2518 f_exp = f_exp.astype(float)
>> 2519 chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
>> 2520 return chisq, chisqprob(chisq, k-1-ddof)
>> AttributeError: 'list' object has no attribute 'astype'
>> Here, I would expect any scipy function, including chisquare, to be
>> able to handle lists?
>> ############################################
>> This one returns NaNs:
>> I[46]: stats.chisquare(np.array([4, 1, 3, 0, 0, 1]), np.array([4, 2, 2, 0, 0, 1]))
>> O[46]: (nan, nan)
>> again I should be aware since the division has 0 in it.
>> after masking:
>> I[47]: a1 = np.ma.masked_equal([4,1,3,0,0,1], 0)
>> I[48]: a2 = np.ma.masked_equal([4,2,2,0,0,1], 0)
>> Further,
>> I[49]: stats.chisquare(a1, a2)
>> O[49]: (1.0, 0.96256577324729631)
>> I[50]: stats.mstats.chisquare(a1, a2)
>> O[50]: (1.0, 0.80125195690120077)
> Masking doesn't remove the values, so when you have a masked array
> you should use compressed() or similar
>
> dropping the zero bins
You should use the masked version of chisquare() in mstats for masked
array inputs. However, hiding zeros is not correct unless both observed
and expected equal zero.
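
The two p-values in O[49] and O[50] come from the same statistic (1.0)
with different degrees of freedom: presumably stats.chisquare counts all
six cells (df = 5), while mstats.chisquare counts only the four unmasked
ones (df = 3). A minimal standard-library sketch that reproduces both
numbers; chi2_sf_odd_df is a hypothetical helper written for this note,
and with SciPy at hand scipy.stats.chi2.sf(stat, df) should give the
same values:

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variable with odd df.

    Hypothetical helper: uses the closed form that exists for odd
    degrees of freedom, so it needs nothing beyond the math module.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd df >= 1")
    # Partial sum of x**j / (1 * 3 * ... * (2j + 1)) for j = 0 .. (df-3)/2
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


stat = 1.0                                  # statistic reported in O[49] and O[50]
p_all_cells = chi2_sf_odd_df(stat, 6 - 1)   # six cells counted, df = 5
p_unmasked = chi2_sf_odd_df(stat, 4 - 1)    # four unmasked cells, df = 3
print(p_all_cells, p_unmasked)
```

The first value matches O[49] and the second matches O[50], so the
disagreement is only about how many cells are counted.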
> >>> stats.chisquare(np.array([4, 1, 3, 1.]), np.array([4, 2, 2, 1.]))
> (1.0, 0.80125195690120077)
>
> Not accepting list is a bug
It is not a bug because the docstring says arrays, not array-like.
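
Until lists are accepted, converting the inputs yourself avoids the
AttributeError. A small sketch, assuming only NumPy; chisq_stat is an
illustrative name, not a SciPy function, and it computes only the
Pearson statistic, not the p-value:

```python
import numpy as np


def chisq_stat(f_obs, f_exp):
    """Pearson chi-square statistic for array-like inputs (illustrative)."""
    # np.asarray turns lists and tuples into arrays and leaves arrays alone.
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    return float(np.sum((f_obs - f_exp) ** 2 / f_exp))


# Plain lists now work; these are the counts with the zero bins dropped.
stat_from_lists = chisq_stat([4, 1, 3, 1], [4, 2, 2, 1])
print(stat_from_lists)
```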
> Returning nans in the case when you expect zero in a bin might be by
> design. But we need to check this.
>
> >>> stats.chisquare(np.array([4, 1, 3, 1.]), np.array([4, 2, 0, 1.]))
> (inf, nan)
This is correct, since the expected value for a cell is zero, which
results in division by zero. You cannot use the chi-square test in this
situation. You might be able to get the Fisher exact test (see ticket
956, http://projects.scipy.org/scipy/ticket/956) to work here.
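
One way to make the zero-expected case explicit is to refuse to compute
the statistic instead of returning inf or nan. This is a sketch in plain
Python; pearson_stat is a hypothetical helper, not part of SciPy:

```python
def pearson_stat(f_obs, f_exp):
    """Pearson statistic that rejects cells with a zero expected count."""
    zero_cells = [i for i, e in enumerate(f_exp) if e == 0]
    if zero_cells:
        # (o - e)**2 / e is undefined when e == 0, so stop here.
        raise ValueError(
            "expected count is zero in cell(s) %s; "
            "the chi-square statistic is undefined" % zero_cells
        )
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


print(pearson_stat([4, 1, 3, 1.0], [4, 2, 2, 1.0]))

try:
    pearson_stat([4, 1, 3, 1.0], [4, 2, 0, 1.0])
except ValueError as err:
    print("rejected:", err)
```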
If you are doing something like density estimation, then you probably
need to select your bins (especially in the tails) more carefully to
prevent this from happening.
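
As one possible illustration of choosing bins more carefully, adjacent
bins can be pooled until each pooled bin has a nonzero expected count.
This is only a sketch: pool_sparse_bins and min_expected are names made
up here, the threshold of 1 is an assumption (a common rule of thumb
asks for about 5), and pooling changes which hypothesis is tested:

```python
def pool_sparse_bins(f_obs, f_exp, min_expected=1.0):
    """Merge adjacent bins until each has expected count >= min_expected."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(f_obs, f_exp):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            # Enough expected mass collected: close this pooled bin.
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_obs > 0 or acc_exp > 0:
        # Fold a leftover sparse tail into the last pooled bin, if any.
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# The six-bin counts from the original question.
obs, exp = pool_sparse_bins([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
print(obs, exp)
```

For these counts the two empty bins are folded into the last one,
leaving the four-cell table used in the example quoted above.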
Bruce