[SciPy-Dev] chi-square test for a contingency (R x C) table

Warren Weckesser warren.weckesser at enthought.com
Sat Jun 19 09:26:39 EDT 2010


josef.pktd at gmail.com wrote:
> <snip>
>   
> Forget any merging of the functions.
>
> Statistical functions should also be defined by their purpose, we are
> not creating universal f_tests and t_tests. Unless someone is
> proposing to merge and unify the various t_tests, ... ?
> misquoting: "The user's hypothesis is totally irrelevant ..." ???
>
> Testing for goodness-of-fit is a completely different use case, with
> different extensions, e.g. power discrepancy. What if I have a 2d
> array and want to check goodness-of-fit along each axis, which might
> be useful once group-by extensions to bincount handle more than 1d
> weights.


So you are anticipating something like this (where `obs` is, say, 2D):

 >>> chisquare_fit(obs, axis=-1)

Then the result would also be 2D, with the last axis having length 2 and 
holding the (chi2, p) values?
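
To make the question concrete, here is a minimal sketch of that return
layout. The name `chisquare_fit_sketch`, the uniform expected counts, and
the stacked (chi2, p) output are assumptions for illustration only; this is
not an existing scipy.stats function or a settled design.

```python
import numpy as np
from scipy.stats import chi2


def chisquare_fit_sketch(obs, axis=-1):
    """Goodness-of-fit test against uniform expected counts along `axis`.

    Hypothetical helper: returns an array whose last axis has length 2
    and holds (chi2, p) for each 1-D slice taken along `axis`.
    """
    obs = np.asarray(obs, dtype=float)
    # Move the tested axis to the end so the reduction is always over axis -1.
    obs = np.moveaxis(obs, axis, -1)
    k = obs.shape[-1]
    # Uniform null hypothesis: each category in a slice has the same expected count.
    expected = obs.sum(axis=-1, keepdims=True) / k
    stat = ((obs - expected) ** 2 / expected).sum(axis=-1)
    # k - 1 degrees of freedom, since no parameters are estimated from the data.
    p = chi2.sf(stat, k - 1)
    return np.stack([stat, p], axis=-1)


obs = np.array([[10, 10, 10],
                [5, 10, 15]])
result = chisquare_fit_sketch(obs, axis=-1)
print(result.shape)  # (2, 2): one (chi2, p) pair per row of obs
```

With a 2D `obs` of shape (2, 3), the result has shape (2, 2), so each row of
the output is the (chi2, p) pair for the corresponding row of `obs`.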

> Or if we extend it to multivariate distributions, then the
> default might be uniform for each column (and not independence).
> This is a standard test for distributions, and should not be mixed
> with contingency tables.
>
>   

Could you elaborate on this use case?  I don't know enough about it to 
be able to decide if this is something that could be implemented right 
away, or if it is something that might not happen for years, if ever.


> contingency tables are a different case, which I never use, and where
> I would go with whatever statisticians prefer. But I think, going by
> null hypothesis makes functions for statistical tests much cleaner
> (easier to categorize, explain, find) than one-stop statistics (at
> least for functions and not methods in classes) as is the current
> tradition of scipy.stats.
>
> "fit" in your function name chisquare_fit is very misleading, because
> your function doesn't do any fitting. If a rename is desired, I would
> call it chisquare_gof, but I use a similar name for the actual gof
> test based on the sample data, with automatic binning.
> Fitting the distribution parameters raises other issues which I don't
> think should be mixed with the basic chisquare-test.
>
>   

Yes, I agree.  I only used "fit" to distinguish it from "ind".  I didn't 
want to use "oneway" and "nway", because those names might lead one to 
think that "oneway" is the n=1 case of "nway", but it is not.
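
A small sketch of why the two are not related in that way. The helper names
`gof_statistic` and `independence_statistic` are hypothetical, and the
goodness-of-fit version assumes uniform expected counts. For a 1D table, the
expected counts under "independence" are just the observed counts, so the
statistic is always zero, while the goodness-of-fit statistic is not.

```python
import numpy as np


def gof_statistic(obs):
    """Chi-square statistic against uniform expected counts (1-D input)."""
    obs = np.asarray(obs, dtype=float)
    expected = np.full_like(obs, obs.sum() / obs.size)
    return ((obs - expected) ** 2 / expected).sum()


def independence_statistic(table):
    """Chi-square statistic for independence of the axes of an n-way table."""
    table = np.asarray(table, dtype=float)
    total = table.sum()
    expected = np.ones_like(table)
    for ax in range(table.ndim):
        # Marginal totals for axis `ax`, kept broadcastable against the table.
        other = tuple(i for i in range(table.ndim) if i != ax)
        marginal = table.sum(axis=other, keepdims=True) if other else table
        expected = expected * marginal
    # Product of the n marginals divided by total**(n - 1).
    expected = expected / total ** (table.ndim - 1)
    return ((table - expected) ** 2 / expected).sum()


counts = np.array([5, 10, 15])
gof = gof_statistic(counts)           # compares counts with uniform [10, 10, 10]
ind = independence_statistic(counts)  # expected equals observed when n = 1
print(gof, ind)
```

So the n=1 case of the independence test is degenerate rather than a
goodness-of-fit test, which is the reason for avoiding the "oneway"/"nway"
naming.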


Warren