[SciPy-Dev] chi-square test for a contingency (R x C) table
Bruce Southey
bsouthey at gmail.com
Thu Jun 17 10:09:09 EDT 2010
On 06/16/2010 11:58 PM, Warren Weckesser wrote:
> The feedback in this thread inspired me to generalize my original code
> to the n-way test of independence. I have attached the revised code to
> a new ticket:
>
> http://projects.scipy.org/scipy/ticket/1203
>
> More feedback would be great!
>
> Warren
>
>
>
The handling for a one way table is wrong:
>>> print 'One way', chisquare_nway([6, 2])
(0.0, 1.0, 0, array([ 6., 2.]))
It should also do the marginal independence tests.
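To make the one-way complaint concrete, here is a minimal sketch of what a one-way result could look like, assuming the intended null hypothesis is equal expected frequencies across the cells (the default that scipy.stats.chisquare uses). The name chisquare_oneway is hypothetical and is not part of the code attached to the ticket.

```python
def chisquare_oneway(observed):
    """Chi-square statistic and degrees of freedom for a one-way table.

    Hypothetical helper: assumes equal expected counts in every cell.
    """
    observed = [float(x) for x in observed]
    k = len(observed)
    expected = sum(observed) / k            # same expected count per cell
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    dof = k - 1
    return chi2, dof


print(chisquare_oneway([6, 2]))  # (2.0, 1)
```

Under that assumption the table [6, 2] gives a statistic of 2.0 on 1 degree of freedom, rather than 0.0 on 0 degrees of freedom.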
I would have expected the input to be converted into an array inside the
chisquare_nway function. If the input is not an array, then there is
a potential bug waiting to happen because you expect numpy to correctly
compute the observed minus expected. For example, if the input is a list
then it relies on numpy doing a list minus an ndarray. It is also
inefficient in the sense that you have to convert the input twice (once
for the expected values and once for the observed minus expected
calculation). You can also get interesting errors with a string input
where the reason may not be obvious:
>>> print 'twoway', chisquare_nway([['6', '2'], ['4', '11']])
  File "chisquare_nway.py", line 132, in chisquare_nway
    chi2 = ((table - expected)**2 / expected).sum()
TypeError: unsupported operand type(s) for -: 'list' and 'numpy.ndarray'
I don't recall how np.asarray handles very large numbers but I would
also suggest an optional dtype argument instead of forcing float64 dtype:
"table = np.asarray(table, dtype=np.float64)"
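As a rough illustration of the suggestion, the sketch below converts the input once at the top of the function and exposes an optional dtype argument. It is restricted to two-way tables for brevity, and the name chisquare_twoway and its return values are assumptions for illustration, not the code on the ticket.

```python
import numpy as np


def chisquare_twoway(table, dtype=np.float64):
    """Chi-square test statistic for a two-way table (hypothetical sketch)."""
    # Convert once, up front, so every later step works on an ndarray.
    table = np.asarray(table, dtype=dtype)
    if table.ndim != 2:
        raise ValueError("expected a two-way table")

    # Expected counts under independence: (row total * column total) / n.
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / table.sum()

    chi2 = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return chi2, dof, expected


chi2, dof, expected = chisquare_twoway([[6, 2], [4, 11]])
```

With the conversion done first, a list and an ndarray take the same code path, and an input that cannot be interpreted under the requested dtype fails at the conversion step instead of inside the subtraction.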
In expected_nway(), you could store 'range(d)' in a variable before the
loop, although the saving is small for small tables.
Also, I would like to remove the usage of set() in the loop.
For example, with d=5 and k=2:
>>> list(set(range(d))-set([k]))
[0, 1, 3, 4]
>>> rd=range(5) #which would be outside the loop
>>> [ elem for elem in rd if elem != k ]
[0, 1, 3, 4]
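Put together, the two suggestions could look like the sketch below. This is a hypothetical reconstruction of an expected_nway-style function, assuming the expected counts are those of mutual independence (the product of the one-way marginal totals divided by n**(d-1)); it is not the code attached to the ticket.

```python
import numpy as np


def expected_nway(table):
    """Expected counts under mutual independence (hypothetical sketch)."""
    table = np.asarray(table, dtype=np.float64)
    d = table.ndim
    rd = range(d)                 # built once, outside the loop
    n = table.sum()

    expected = np.ones_like(table)
    for k in rd:
        # All axes except k, without building sets on every iteration.
        other_axes = tuple(elem for elem in rd if elem != k)
        marginal = table.sum(axis=other_axes)    # 1-D totals along axis k
        # Reshape so the marginal broadcasts along axis k only.
        shape = [1] * d
        shape[k] = table.shape[k]
        expected = expected * marginal.reshape(shape)

    return expected / n ** (d - 1)


e = expected_nway([[6, 2], [4, 11]])
```

For a two-way table this reduces to the usual outer product of the row and column totals divided by the grand total.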
Bruce