[SciPy-User] Bug/Error with chi-squared distribution and df<1

Tue Sep 22 11:22:42 EDT 2009

On Tue, Sep 22, 2009 at 10:59 AM, Kevin Dunn <kgdunn at gmail.com> wrote:
> Hi there,
>
> I'm not an expert on distributions, but as far as I can tell, the chi2
> distribution is defined for degrees of freedom >0.  I'm getting "nan"
> results however when df is less than one (but still greater than 0).
> The chi2 CDF values agree between R, MATLAB and Scipy when the the
> degrees of freedom are >= 1.  For example:
> * R: pchisq(0.95, 1)
> * MATLAB: chi2cdf(0.95, 1)
> * SciPy: scipy.stats.chi2.cdf(0.95, 1)
>

> However, changing the last argument to 0.5 returns a NaN in SciPy, but
> gives a result (0.8392259) in R and MATLAB.

>>> stats.chi2.veccdf(0.95, 0.5)
array(0.83922587961194761)

>
> I'm suspecting there is something wrong with my SciPy installation,
> because the example included with SciPy, (found using
> scipy.stats.chi2? inside iPython) calls the chi2.cdf function with
> df=0.9, yet when I run that example as follows:

Where is this example? maybe it is an autogenerated example with wrong numbers?

>
> from scipy.stats import chi2
> import numpy as np
> numargs = chi2.numargs
> [ df ] = [0.9,]*numargs
> rv = chi2(df)
> x = np.linspace(0,np.minimum(rv.dist.b,3))
> prb = chi2.cdf(x,df)
>
> My result for prb is as follows; which I don't think would have been
> used as an example if this is the expected output.
> array([  0.,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,
>        NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,
>        NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,
>        NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,
>        NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
>
> Is my SciPy install messed up?  I'm using MacOSX and downloaded
> version 0.7.1 from SourceForge this morning.  I just tried it in
> Ubuntu Linux (version 0.7.0 though), and get the same results.
>
> Thanks,
> Kevin
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

chi2 uses scipy.special

class chi2_gen

def _cdf(self, x, df):
        return special.chdtr(df, x)

which obviously cannot handle df<1 , which I don't know if it would
ever show up in the usual statistical tests.

pdf doesn't seem to have any problems

>>> stats.chi2.pdf(np.arange(5),0.99)
array([        Inf,  0.24042373,  0.10275665,  0.05078513,  0.02663761])
>>> stats.chi2.pdf(np.arange(5),0.5)
array([        Inf,  0.14067411,  0.05073346,  0.02270277,  0.01109756])
>>> stats.chi2.pdf(np.arange(5),0.25)
array([        Inf,  0.07382471,  0.02441481,  0.01038547,  0.00489731])

so numerical integration should also work

>>> stats.chi2.veccdf(np.arange(1,5),0.25)
array([ 0.92605422,  0.96918041,  0.98539847,  0.99265551])
>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.25)
array([ 0.54726537,  0.92671456,  0.96937499,  0.98547096,  0.99268483,
        0.99618434,  0.99796201,  0.99889274,  0.99939058,  0.99966116,
        0.99981006])
>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.5)
array([ 0.29308089,  0.84774539,  0.93248332,  0.96674206,  0.98278044,
        0.99081467,  0.99500114,  0.99723983,  0.9984591 ,  0.99913231,
        0.99950797])

since pdf at zero is inf, I don't know how good the numbers are if the
cdf is calculated for points close to zero
e.g.

>>> stats.chi2.veccdf(1e-8,0.5)
array(0.0092772960765327081)

Since I don't think df<1 is a common case, I wouldn't want to switch
to numerical integration by default. But veccdf although only a
private function should work although it bypasses some of the checking
and broadcasting.

Somehow this sounds familiar, I need to check whether this or a
similar case is already on record.

Hope that helps,

Josef