[SciPy-Dev] chi-square test for a contingency (R x C) table
Warren Weckesser
warren.weckesser at enthought.com
Thu Jun 17 10:41:21 EDT 2010
Bruce Southey wrote:
> On 06/16/2010 11:58 PM, Warren Weckesser wrote:
>
>> The feedback in this thread inspired me to generalize my original code
>> to the n-way test of independence. I have attached the revised code to
>> a new ticket:
>>
>> http://projects.scipy.org/scipy/ticket/1203
>>
>> More feedback would be great!
>>
>> Warren
>>
>>
>>
>>
> The handling for a one way table is wrong:
> >>> print 'One way', chisquare_nway([6, 2])
> (0.0, 1.0, 0, array([ 6., 2.]))
>
> It should also do the marginal independence tests.
>
As I explained in the description of the ticket and in the docstring,
this function is not intended for doing the 'one-way' goodness of fit.
stats.chisquare should be used for that. Calling chisquare_nway with a
1D array amounts to doing a test of independence between groupings but
only giving a single grouping, hence the trivial result. This is
intentional.
I guess the question is: should there be a "clever" chi-square function
that figures out what the user probably wants to do?
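To illustrate the distinction above, here is a small sketch (not the code attached to the ticket) that works out both statistics by hand for the table [6, 2]. It assumes only NumPy; the variable names are made up for the example. With a single grouping, the only marginal is the table itself, so the expected frequencies equal the observed ones and the independence statistic is zero. A one-way goodness-of-fit test against equal expected frequencies, which is the kind of test stats.chisquare is meant for, gives a nonzero statistic.

```python
import numpy as np

observed = np.array([6.0, 2.0])
total = observed.sum()

# Independence with one grouping: expected = total * (marginal / total),
# and the only marginal is the table itself, so expected == observed.
expected_indep = total * (observed / total)
chi2_indep = ((observed - expected_indep) ** 2 / expected_indep).sum()

# One-way goodness of fit against equal expected frequencies.
expected_gof = np.full_like(observed, observed.mean())
chi2_gof = ((observed - expected_gof) ** 2 / expected_gof).sum()

print(chi2_indep, chi2_gof)
```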
> I would have expected the conversion of the input into an array in the
> chisquare_nway function. If the input is not an array, then there is
> a potential bug waiting to happen because you expect numpy to correctly
> compute the observed minus expected. For example, if the input is a list
> then it relies on numpy doing a list minus a ndarray. It is also
> inefficient in the sense that you have to convert the input twice (once
> for the expected values and once for the observed minus expected
> calculation).
I was going to put in something like table = np.asarray(table), but then
I noticed that, since `expected` had already been converted to an array,
the calculation worked even if `table` was a list. E.g.
In [4]: chisquare_nway([[10,10],[5,25]])
Out[4]:
(6.3492063492063489,
 0.011743382301172606,
 1,
 array([[  6.,  14.],
        [  9.,  21.]]))
But I will put in the conversion--that will make it easier to do a few
other sanity checks on the input before trying to do any calculations.
> You can also get interesting errors with a string input
> where the reason may not be obvious:
>
> >>> print 'twoway', chisquare_nway([['6', '2'], ['4', '11']])
> File "chisquare_nway.py", line 132, in chisquare_nway
> chi2 = ((table - expected)**2 / expected).sum()
> TypeError: unsupported operand type(s) for -: 'list' and 'numpy.ndarray'
>
>
> I don't recall how np.asarray handles very large numbers but I would
> also suggest an optional dtype argument instead of forcing float64 dtype:
> "table = np.asarray(table, dtype=np.float64)"
>
>
Sure, I can add that.
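A rough sketch of what the up-front conversion with an optional dtype might look like. The helper name check_table and the specific checks are hypothetical, chosen only for illustration; they are not taken from the ticket. The point is that the input is converted exactly once, and the sanity checks run before any arithmetic.

```python
import numpy as np

def check_table(table, dtype=np.float64):
    """Hypothetical helper: convert the input once, then validate it."""
    # Single conversion; lists and nested lists become an ndarray here.
    table = np.asarray(table, dtype=dtype)
    if table.ndim == 0 or table.size == 0:
        raise ValueError("table must contain at least one frequency")
    if np.any(table < 0):
        raise ValueError("all frequencies must be nonnegative")
    return table

table = check_table([[10, 10], [5, 25]])
print(table.dtype, table.shape)
```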
> In expected_nway(), you could prestore a variable with the 'range(d)'
> although the saving is little for small tables.
> Also, I would like to remove the usage of set() in the loop.
> If k=2 and d=5:
>
> >>> list(set(range(d))-set([k]))
> [0, 1, 3, 4]
> >>> rd=range(5) #which would be outside the loop
> >>> [ elem for elem in rd if elem != k ]
> [0, 1, 3, 4]
>
>
Looks good--I'll make that change.
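For reference, a minimal sketch of how the expected frequencies could be computed with the list-comprehension approach suggested above. This is an assumption about the general shape of the calculation, not the code attached to the ticket; the name expected_nway_sketch is made up to keep it distinct from the real expected_nway().

```python
import numpy as np

def expected_nway_sketch(table):
    """Expected frequencies under mutual independence of all axes."""
    table = np.asarray(table, dtype=np.float64)
    d = table.ndim
    axes = range(d)  # built once, outside the loop
    total = table.sum()
    expected = np.full(table.shape, total)
    for k in axes:
        # All axes except k, without using set().
        other_axes = tuple(ax for ax in axes if ax != k)
        # Marginal sums along axis k (a 1-d table is its own marginal).
        marginal = table.sum(axis=other_axes) if other_axes else table
        # Reshape so the marginal broadcasts along axis k.
        shape = [1] * d
        shape[k] = table.shape[k]
        expected = expected * marginal.reshape(shape) / total
    return expected

print(expected_nway_sketch([[10, 10], [5, 25]]))
```

For the 2 x 2 table from the example earlier in this message, this reproduces the expected array [[6, 14], [9, 21]]; for a 1-d table it returns the table itself.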
> Bruce