[SciPy-Dev] chi-square test for a contingency (R x C) table

Bruce Southey bsouthey at gmail.com
Thu Jun 17 10:09:09 EDT 2010


On 06/16/2010 11:58 PM, Warren Weckesser wrote:
> The feedback in this thread inspired me to generalize my original code
> to the n-way test of independence.  I have attached the revised code to
> a new ticket:
>
>      http://projects.scipy.org/scipy/ticket/1203
>
> More feedback would be great!
>
> Warren
>
>
>    
The handling for a one way table is wrong:
 >>>print 'One way', chisquare_nway([6, 2])
(0.0, 1.0, 0, array([ 6.,  2.]))

It should also do the marginal independence tests.
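
On the one-way case above, here is a minimal sketch of the usual
goodness-of-fit statistic for comparison. It assumes the null hypothesis is
equal expected counts in every cell, which is my assumption and not something
settled in this thread. The helper name one_way_chi2 is hypothetical.

```python
def one_way_chi2(observed):
    """Chi-square statistic for a one-way table against equal expected counts.

    Hypothetical helper for illustration only; it is not part of the
    code attached to the ticket.
    """
    n = len(observed)
    expected = sum(observed) / float(n)  # same expected count in every cell
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    dof = n - 1
    return chi2, dof


chi2, dof = one_way_chi2([6, 2])
print(chi2, dof)
```

Under that null, [6, 2] gives a nonzero statistic with one degree of freedom,
rather than the 0.0 and 0 reported above.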

I would have expected the input to be converted to an array inside the 
chisquare_nway function.  If the input is not an array, there is 
a potential bug waiting to happen because you rely on numpy to correctly 
compute the observed minus expected.  For example, if the input is a list, 
then it relies on numpy doing a list minus an ndarray.  It is also 
inefficient in the sense that the input has to be converted twice (once 
for the expected values and once for the observed minus expected 
calculation).  You can also get interesting errors with a string input, 
where the reason may not be obvious:

 >>>print 'twoway', chisquare_nway([['6', '2'], ['4', '11']])
   File "chisquare_nway.py", line 132, in chisquare_nway
     chi2 = ((table - expected)**2 / expected).sum()
TypeError: unsupported operand type(s) for -: 'list' and 'numpy.ndarray'

I don't recall how np.asarray handles very large numbers, but I would 
also suggest an optional dtype argument instead of forcing the float64 dtype:
"table = np.asarray(table, dtype=np.float64)"
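
A rough sketch of what I mean by converting once, up front, with an optional
dtype. The helper name to_table and its signature are hypothetical and are
not taken from the attached code. The two-way expected values here use the
standard outer product of the margins.

```python
import numpy as np


def to_table(table, dtype=np.float64):
    # Hypothetical helper: convert the input a single time so that every
    # later step works on an ndarray of a known dtype.
    return np.asarray(table, dtype=dtype)


# Numeric strings are parsed during the conversion.
table = to_table([['6', '2'], ['4', '11']])
row = table.sum(axis=1)
col = table.sum(axis=0)
expected = np.outer(row, col) / table.sum()
chi2 = ((table - expected) ** 2 / expected).sum()

# Non-numeric input fails at the conversion step with a ValueError,
# instead of later inside the arithmetic.
try:
    to_table([['six', '2'], ['4', '11']])
except ValueError as exc:
    message = str(exc)
```

With the conversion at the top, bad input is reported where it enters the
function, and the observed minus expected step never sees a list.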

In expected_nway(), you could store 'range(d)' in a variable before the 
loop, although the saving is small for small tables.
Also, I would like to remove the usage of set() in the loop.
For example, with d=5 and k=2:

 >>> list(set(range(d))-set([k]))
[0, 1, 3, 4]
 >>> rd=range(5) #which would be outside the loop
 >>> [ elem for elem in rd if elem != k ]
[0, 1, 3, 4]
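
Put together, the loop could look something like the sketch below. This is
not the code attached to the ticket. The function name expected_nway_sketch
is hypothetical, and it assumes a numpy version that accepts a tuple of axes
in sum(). It computes the expected frequencies under mutual independence as
the product of the one-dimensional margins divided by total**(d - 1).

```python
import numpy as np


def expected_nway_sketch(table):
    # Hypothetical sketch; convert once so the rest works on an ndarray.
    table = np.asarray(table, dtype=np.float64)
    d = table.ndim
    rd = range(d)  # built once, outside the loop
    total = table.sum()
    expected = np.ones_like(table)
    for k in rd:
        # All axes except k, without building sets.
        other_axes = tuple(ax for ax in rd if ax != k)
        marginal = table.sum(axis=other_axes)  # 1-D, length table.shape[k]
        # Reshape so the margin broadcasts along axis k only.
        shape = [1] * d
        shape[k] = table.shape[k]
        expected = expected * marginal.reshape(shape)
    return expected / total ** (d - 1)


expected = expected_nway_sketch([[6, 2], [4, 11]])
```

For a two-way table this reduces to the outer product of the row and column
sums divided by the grand total.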

Bruce







