[SciPy-Dev] chi-square test for a contingency (R x C) table

josef.pktd at gmail.com
Mon Jun 7 10:15:56 EDT 2010


On Fri, Jun 4, 2010 at 2:12 PM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> On 06/03/2010 08:27 AM, Warren Weckesser wrote:
>>>
>>> Just letting you know that I'm not ignoring all the great comments from
>>> josef, Neil and Bruce about my suggestion for chisquare_contingency.
>>> Unfortunately, I won't have time to think about all the deeper
>>> suggestions for another week or so.   For now, I'll just say that I
>>> agree with josef's and Neil's suggestions for the docstring, and that
>>> Neil's summary of the function as simply a convenience function that
>>> calls stats.chisquare with appropriate arguments to perform a test of
>>> independence on a contingency table is exactly what I had in mind.
>>>
>>> Warren
>>>
>>>
>>>
>>
>> Hi,
>> I looked at how SAS handles n-way tables. It appears to break the
>> original table down into a set of 2-way tables and analyze each of
>> them separately. So a 3 by 4 by 5 table is processed as three 2-way
>> tables, and the results for each 4 by 5 table are presented. I do not
>> know how Stata and R analyze n-way tables.
>>
>> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way
>> tables by using recursion. There should be some Python way to do that
>> recursion for any number of dimensions. I also added the 1-way table (but
>> that has a different hypothesis than the 2-way table) so users can send a
>> 1-d table.
>
> (very briefly because I don't have much time today)
>
> I think these are good extensions, but to handle all cases the
> function is getting too large and would need several options.
>
> On your code and SAS (correct me if my quick reading is wrong):
> you seem to be calculating conditional independence for the last two
> variables, conditional on the values of the first variables. I think
> this could be generalized to all pairwise independence tests.
>
> Similarly, I'm a bit surprised that SAS uses conditional rather than
> marginal independence. I would have thought that the test for marginal
> independence (aggregate out all but 2 variables) would be the more
> common use case.
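
To make the conditional/marginal distinction concrete, here is a minimal
sketch in plain NumPy. The helper name pearson_chi2 and the counts are
made up for illustration (they are not from Bruce's code or the SAS
examples). The table is built so that the last two variables are exactly
independent within each level of the first variable, yet clearly
dependent once the first variable is summed out.

```python
import numpy as np

def pearson_chi2(table2d):
    """Pearson chi-square statistic and degrees of freedom for a 2-way table."""
    obs = np.asarray(table2d, dtype=float)
    rows = obs.sum(axis=1)
    cols = obs.sum(axis=0)
    # Expected counts under independence of the row and column variables.
    expected = np.outer(rows, cols) / obs.sum()
    stat = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof

# Made-up 2 x 2 x 2 counts; axis 0 is the conditioning variable.
table3 = np.array([[[81, 9], [9, 1]],
                   [[1, 9], [9, 81]]])

# Conditional independence: one test per level of the first variable.
conditional = [pearson_chi2(table3[k]) for k in range(table3.shape[0])]

# Marginal independence: aggregate out the first variable, then test once.
marginal = pearson_chi2(table3.sum(axis=0))

print(conditional)
print(marginal)
```

With these counts the statistic is 0 in both conditional slices and
81.92 (1 degree of freedom) for the marginal table, so the two
hypotheses can give opposite answers for the same data.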

just some more questions and comments (until I have time to check this)

Looking at conditional independence is similar to linear regression
models, where the effect of the other variables is taken out. However,
looking at all chisquare tests (conditional on all possible values of
the other variables) runs into the multiple testing problem. Is there
some kind of post-hoc or Bonferroni correction, or is there a
distribution for, e.g., the max of all chisquare test statistics?
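
As a rough illustration of the correction question, a minimal sketch of
two standard adjustments for k p-values. The function names and the
p-values are hypothetical. The Sidak form assumes the k tests are
independent; under that assumption, and if all tests share the same
degrees of freedom, adjusting the smallest p-value this way is the same
as referring the largest statistic to the distribution of the max.
Whether independence is a fair assumption for the slices of one table
is not settled here.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    k = len(pvalues)
    return [min(1.0, k * p) for p in pvalues]

def sidak(pvalues):
    """Sidak-adjusted p-values, valid when the k tests are independent."""
    k = len(pvalues)
    return [1.0 - (1.0 - p) ** k for p in pvalues]

# Hypothetical p-values, e.g. one per conditional 2-way table.
pvals = [0.01, 0.04, 0.5]
alpha = 0.05

for name, adjust in [("bonferroni", bonferroni), ("sidak", sidak)]:
    adjusted = adjust(pvals)
    reject = [p <= alpha for p in adjusted]
    print(name, adjusted, reject)
```

Bonferroni needs no independence assumption but is slightly more
conservative than Sidak.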

With an iterator (see the numpy mailing list), my version for the
conditional independence of the last two variables, for all values of
the earlier variables, looks like

for ind in allbut2ax_iterator(table3, axes=(-2,-1)):
    print chisquare_contingency(table3[ind])
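
A self-contained sketch of the same loop follows. The body of
allbut2ax_iterator below is an assumption about its behavior, not the
version posted on the numpy list, and scipy.stats.chi2_contingency (the
name in current SciPy) stands in for chisquare_contingency. The counts
are arbitrary.

```python
import numpy as np
from scipy.stats import chi2_contingency

def allbut2ax_iterator(table, axes=(-2, -1)):
    """Yield index tuples that select each 2-D slice spanned by `axes`.

    Hypothetical stand-in for the iterator mentioned above.
    """
    table = np.asarray(table)
    keep = [a % table.ndim for a in axes]
    # Iterate over every axis except the two kept ones.
    shape = [1 if i in keep else n for i, n in enumerate(table.shape)]
    for idx in np.ndindex(*shape):
        yield tuple(slice(None) if i in keep else j
                    for i, j in enumerate(idx))

# Arbitrary positive counts in a 3 x 4 x 5 table.
table3 = np.arange(1, 61).reshape(3, 4, 5)

results = []
for ind in allbut2ax_iterator(table3, axes=(-2, -1)):
    chi2, p, dof, expected = chi2_contingency(table3[ind])
    results.append((ind[0], chi2, p, dof))
    print(ind[0], chi2, p, dof)
```

For the 3 by 4 by 5 table this runs three tests, one per 4 by 5 slice,
each with 12 degrees of freedom.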

Josef

>
> Initially, I was thinking just about independence of all variables in
> a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z)
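
For reference, a minimal NumPy sketch of the joint hypothesis
P(x,y,z)=P(x)*P(y)*P(z): the expected counts are the outer product of
the 1-d marginal totals, rescaled by the grand total. The function name
joint_independence_chi2 and the example counts are made up for
illustration.

```python
from functools import reduce

import numpy as np

def joint_independence_chi2(table):
    """Pearson chi-square statistic for mutual independence in an n-way table."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    d = obs.ndim
    # 1-d marginal totals: sum over every axis except axis i.
    margins = [obs.sum(axis=tuple(j for j in range(d) if j != i))
               for i in range(d)]
    # Expected counts: n * prod_i (margin_i / n).
    expected = reduce(np.multiply.outer, margins) / n ** (d - 1)
    stat = ((obs - expected) ** 2 / expected).sum()
    # Number of cells minus 1, minus the free marginal parameters.
    dof = obs.size - sum(obs.shape) + d - 1
    return stat, dof, expected

# Made-up 2 x 2 x 2 table that factorizes exactly into its margins.
table = reduce(np.multiply.outer, [[1, 2], [1, 3], [2, 5]])
stat, dof, expected = joint_independence_chi2(table)
print(stat, dof)
```

Because the example table is an exact outer product, the statistic is 0
(up to rounding) with 4 degrees of freedom. A 3 by 4 by 5 table would
have 60 - 12 + 3 - 1 = 50 degrees of freedom under this hypothesis.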
>
> My opinion is that these variations of the test would fit better in a
> class where the pairwise conditional, marginal, and joint hypotheses
> are available as methods, or they could be split up into a group of
> functions.
>
> Thanks,
>
> Josef
>
>>
>> The data used is from two SAS examples and I added a dimension to get a
>> 4-way table. I included the SAS values but these are only to 4 decimal
>> places for reference.
>>
>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm
>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm
>>
>> What is missing:
>> 1) Docstring and tests, but those depend on what is ultimately decided
>> 2) Other test statistics, but the scipy.stats versions are not very
>> friendly in that they do not accept a 2-d array
>> 3) A way to do recursion
>> 4) Ability to label the levels etc.
>> 5) Correct handling of input types.
>>
>> Bruce
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
>


