[SciPy-User] Bug/Error with chi-squared distribution and df<1
josef.pktd at gmail.com
josef.pktd at gmail.com
Tue Sep 22 11:22:42 EDT 2009
On Tue, Sep 22, 2009 at 10:59 AM, Kevin Dunn <kgdunn at gmail.com> wrote:
> Hi there,
>
> I'm not an expert on distributions, but as far as I can tell, the chi2
> distribution is defined for degrees of freedom >0. I'm getting "nan"
> results however when df is less than one (but still greater than 0).
> The chi2 CDF values agree between R, MATLAB and Scipy when the the
> degrees of freedom are >= 1. For example:
> * R: pchisq(0.95, 1)
> * MATLAB: chi2cdf(0.95, 1)
> * SciPy: scipy.stats.chi2.cdf(0.95, 1)
>
> However, changing the last argument to 0.5 returns a NaN in SciPy, but
> gives a result (0.8392259) in R and MATLAB.
>>> stats.chi2.veccdf(0.95, 0.5)
array(0.83922587961194761)
>
> I'm suspecting there is something wrong with my SciPy installation,
> because the example included with SciPy, (found using
> scipy.stats.chi2? inside iPython) calls the chi2.cdf function with
> df=0.9, yet when I run that example as follows:
Where is this example? maybe it is an autogenerated example with wrong numbers?
>
> from scipy.stats import chi2
> import numpy as np
> numargs = chi2.numargs
> [ df ] = [0.9,]*numargs
> rv = chi2(df)
> x = np.linspace(0,np.minimum(rv.dist.b,3))
> prb = chi2.cdf(x,df)
>
> My result for prb is as follows; which I don't think would have been
> used as an example if this is the expected output.
> array([ 0., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
> NaN, NaN, NaN, NaN, NaN, NaN])
>
> Is my SciPy install messed up? I'm using MacOSX and downloaded
> version 0.7.1 from SourceForge this morning. I just tried it in
> Ubuntu Linux (version 0.7.0 though), and get the same results.
>
> Thanks,
> Kevin
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
chi2 uses scipy.special
class chi2_gen
def _cdf(self, x, df):
return special.chdtr(df, x)
which obviously cannot handle df<1 , which I don't know if it would
ever show up in the usual statistical tests.
pdf doesn't seem to have any problems
>>> stats.chi2.pdf(np.arange(5),0.99)
array([ Inf, 0.24042373, 0.10275665, 0.05078513, 0.02663761])
>>> stats.chi2.pdf(np.arange(5),0.5)
array([ Inf, 0.14067411, 0.05073346, 0.02270277, 0.01109756])
>>> stats.chi2.pdf(np.arange(5),0.25)
array([ Inf, 0.07382471, 0.02441481, 0.01038547, 0.00489731])
so numerical integration should also work
>>> stats.chi2.veccdf(np.arange(1,5),0.25)
array([ 0.92605422, 0.96918041, 0.98539847, 0.99265551])
>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.25)
array([ 0.54726537, 0.92671456, 0.96937499, 0.98547096, 0.99268483,
0.99618434, 0.99796201, 0.99889274, 0.99939058, 0.99966116,
0.99981006])
>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.5)
array([ 0.29308089, 0.84774539, 0.93248332, 0.96674206, 0.98278044,
0.99081467, 0.99500114, 0.99723983, 0.9984591 , 0.99913231,
0.99950797])
since pdf at zero is inf, I don't know how good the numbers are if the
cdf is calculated for points close to zero
e.g.
>>> stats.chi2.veccdf(1e-8,0.5)
array(0.0092772960765327081)
Since I don't think df<1 is a common case, I wouldn't want to switch
to numerical integration by default. But veccdf although only a
private function should work although it bypasses some of the checking
and broadcasting.
Somehow this sounds familiar, I need to check whether this or a
similar case is already on record.
Hope that helps,
Josef
More information about the SciPy-User
mailing list