[SciPy-dev] Bug/Error with chi-squared distribution and df<1

Kevin Dunn kgdunn at gmail.com
Tue Sep 22 14:04:37 EDT 2009


Thanks for the prompt reply Josef; please see my comments below.

> On Tue, Sep 22, 2009 at 10:59 AM, Kevin Dunn <kgdunn at gmail.com> wrote:
>> Hi there,
>>
>> I'm not an expert on distributions, but as far as I can tell, the chi2
>> distribution is defined for degrees of freedom >0. ?I'm getting "nan"
>> results however when df is less than one (but still greater than 0).
>> The chi2 CDF values agree between R, MATLAB and Scipy when the the
>> degrees of freedom are >= 1. ?For example:
>> * R: pchisq(0.95, 1)
>> * MATLAB: chi2cdf(0.95, 1)
>> * SciPy: scipy.stats.chi2.cdf(0.95, 1)
>>
>
>
>> However, changing the last argument to 0.5 returns a NaN in SciPy, but
>> gives a result (0.8392259) in R and MATLAB.
>
>>>> stats.chi2.veccdf(0.95, 0.5)
> array(0.83922587961194761)
>
>>
>> I'm suspecting there is something wrong with my SciPy installation,
>> because the example included with SciPy, (found using
>> scipy.stats.chi2? inside iPython) calls the chi2.cdf function with
>> df=0.9, yet when I run that example as follows:
>
> Where is this example? maybe it is an autogenerated example with wrong numbers?

The example was found when typing "scipy.stats.chi2?" in ipython after
importing scipy.stats.  The code below was taken from the example,
only I omitted the matplotlib plotting commands.

>> from scipy.stats import chi2
>> import numpy as np
>> numargs = chi2.numargs
>> [ df ] = [0.9,]*numargs
>> rv = chi2(df)
>> x = np.linspace(0,np.minimum(rv.dist.b,3))
>> prb = chi2.cdf(x,df)
>>
>> My result for prb is as follows; which I don't think would have been
>> used as an example if this is the expected output.
>> array([ ?0., ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN,
>> ? ? ? ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN,
>> ? ? ? ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN,
>> ? ? ? ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN,
>> ? ? ? ?NaN, ?NaN, ?NaN, ?NaN, ?NaN, ?NaN])
>>
>> Is my SciPy install messed up? ?I'm using MacOSX and downloaded
>> version 0.7.1 from SourceForge this morning. ?I just tried it in
>> Ubuntu Linux (version 0.7.0 though), and get the same results.
>>
>
> chi2 uses scipy.special
>
> class chi2_gen
>
> def _cdf(self, x, df):
>        return special.chdtr(df, x)
>
> which obviously cannot handle df<1 , which I don't know if it would
> ever show up in the usual statistical tests.

I don't believe it is obvious why special.chdtr can't handle values df
< 1. Please see my additional comments below.

>
> pdf doesn't seem to have any problems
>
>>>> stats.chi2.pdf(np.arange(5),0.99)
> array([        Inf,  0.24042373,  0.10275665,  0.05078513,  0.02663761])
>>>> stats.chi2.pdf(np.arange(5),0.5)
> array([        Inf,  0.14067411,  0.05073346,  0.02270277,  0.01109756])
>>>> stats.chi2.pdf(np.arange(5),0.25)
> array([        Inf,  0.07382471,  0.02441481,  0.01038547,  0.00489731])
>
> so numerical integration should also work
>
>>>> stats.chi2.veccdf(np.arange(1,5),0.25)
> array([ 0.92605422,  0.96918041,  0.98539847,  0.99265551])
>>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.25)
> array([ 0.54726537,  0.92671456,  0.96937499,  0.98547096,  0.99268483,
>        0.99618434,  0.99796201,  0.99889274,  0.99939058,  0.99966116,
>        0.99981006])
>>>> stats.chi2.veccdf(np.linspace(0.01,10.0,11),0.5)
> array([ 0.29308089,  0.84774539,  0.93248332,  0.96674206,  0.98278044,
>        0.99081467,  0.99500114,  0.99723983,  0.9984591 ,  0.99913231,
>        0.99950797])
>
> since pdf at zero is inf, I don't know how good the numbers are if the
> cdf is calculated for points close to zero
> e.g.
>
>>>> stats.chi2.veccdf(1e-8,0.5)
> array(0.0092772960765327081)
>
>
> Since I don't think df<1 is a common case, I wouldn't want to switch
> to numerical integration by default. But veccdf although only a
> private function should work although it bypasses some of the checking
> and broadcasting.

Thanks for pointing out the numerical integration approach; 'll give
it a try.  As for df < 1, please see my other comments below.

>
> Somehow this sounds familiar, I need to check whether this or a
> similar case is already on record.
>
> Hope that helps,
>
> Josef

And regarding your other reply:

> I didn't find anything on the scipy trac nor on the mailinglist, since
> I'm subscribed to them. If you think that df<1 is an important case
> you could file a ticket for scipy.special. I have no idea how
> difficult it would be to extend the chi squared related functions for
> this or whether there exists another workaround.

I just downloaded the latest SVN code and found the code that does the
work is /scipy/special/cephes/chdtr.c

In that code it seems that when the degrees of freedom are less than
one, the code is (artificially) forced to return a NaN:
        if (df < 1.0)
        {
            mtherr( "chdtrc", DOMAIN );
            return(NPY_NAN);
        }
       return( igamc( df/2.0, x/2.0 ) );

This seems a somewhat artificial constraint to me, since the gamma
function can accept values of 0 < df < 1.  Does someone else on this
list know why that constraint is there for df<1?

As for a practical usage case, I'm computing CDF values for a variable
by matching moments, and the value for the degrees of freedom term is
a computed value.  This value is always non-integer, and can sometimes
be less than one.

Additional justification for df < 1 can be sought by looking at plots
of the chi-squared distribution (e.g.
http://en.wikipedia.org/wiki/Chi-square_distribution) when plotting
with a degrees of freedom parameter (they call it "k" in the Wikipedia
article).  There's no reason why df can't be less than one, anymore
than there's reason for df being non-integer.

I'll give it a go compiling SciPy with removing those constraints in
chdtr.c and see how it works.  I've never done this before, but I'm
busy installing the NumPy/SciPy installation requirements at the
moment, and I'll report how it went.

If things works out, I'll file a trac report.  I've also cc'd the
scipy-dev list on this reply.

Thanks for your help,
Kevin

> The only similar story was  for the negative binomial for n<1,
> http://projects.scipy.org/scipy/ticket/978 . In that case
> scipy.special.nbdtr didn't allow the extension for n<1.
>
> Josef



More information about the SciPy-Dev mailing list