[SciPy-dev] fix for ncx2 bug?

Wed Jun 17 15:00:43 EDT 2009

josef.pktd at gmail.com wrote:

> On Wed, Jun 17, 2009 at 1:53 PM, Neal Becker<ndbecker2 at gmail.com> wrote:
>> josef.pktd at gmail.com wrote:
>>
>>> On Wed, Jun 17, 2009 at 1:08 PM, Neal Becker<ndbecker2 at gmail.com> wrote:
>>>> josef.pktd at gmail.com wrote:
>>>>
>>>>> On Wed, Jun 17, 2009 at 10:45 AM, Neal Becker<ndbecker2 at gmail.com>
>>>>> wrote:
>>>>>> As reported earlier, there is a bug in ncx2 that causes a strange
>>>>>> discontinuity.
>>>>>>
>>>>>> Is there a patch available?
>>>>>>
>>>>>
>>>>> It's a ticket, but it could take some time until someone finds all the
>>>>> imprecision for "outlier" cases in scipy.special.
>>>>>
>>>>> I'm not sure these functions are even designed to have a high
>>>>> precision, because for distributions such as chisquare or ncx2, that
>>>>> are primarily used for test statistics, it make only sense to report a
>>>>> few digits.
>>>>>
>>>>> I would have recommended ncx2.veccdf for your case, but a quick check
>>>>> showed that over a large range ncx2.veccdf and ncx2.cdf only agree up
>>>>> to 1e-8. For ncx2.cdf close to one, there might also be a precision
>>>>> loss.
>>>>>
>>>>> What's your use case that you need ncx2 at high precision?
>>>>>
>>>>> Josef
>>>>
>>>> I'm using this to compute what's often called "Receiver Operating
>>>> Characteristic", which is a signal detection problem.  You may see over
>>>> some range that probability of miss or false detection is quite low.
>>>> Probably you don't really care about accuracy here, but it's making the
>>>> plots look bad.
>>>
>>> If speed doesn't matter too much, you can still use veccdf, I don't
>>> think it's more accurate but it is smooth, no discontinuities and it
>>> also goes to 1 in the upper tail.
>>>
>>> veccdf is a private method, that avoids the argument check, but as
>>> long as you call it with valid arguments you get good solutions, and
>>> it is vectorized (based on np.vectorize). The generic methods are
>>> pretty heavily tested, so besides slower speed from the numerical
>>> integration, I wouldn't expect any problems.
>>>
>>> In contrast to central chisquare, I didn't see any formulas that would
>>> allow a direct calculation of the ncx2.cdf as function of some other
>>> special functions.
>>>
>>> Josef
>>
>> I'm no expert on this.  I do see 2 refs:
>>
>> Continuous Univariate Distributions, Vol 2, 2nd edition, Johnson, Kotz,
>> Balakrishnan has an entire chapter on the subject.
>>
>> 
http://www.boost.org/doc/libs/1_39_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/nc_chi_squared_dist.html
>>
>> The code says:
>> //
>> // Computes the complement of the Non-Central Chi-Square
>> // Distribution CDF by summing a weighted sum of complements
>> // of the central-distributions.  The weighting factor is
>> // a Poisson Distribution.
>> //
>> // This is an application of the technique described in:
>> //
>> // Computing discrete mixtures of continuous
>> // distributions: noncentral chisquare, noncentral t
>> // and the distribution of the square of the sample
>> // multiple correlation coeficient.
>> // D. Benton, K. Krishnamoorthy.
>> // Computational Statistics & Data Analysis 43 (2003) 249 - 267
>>
>>
>> I see that the 1st ref above also talks about the fact that cdf on ncx2
>> can be obtained as weighted sum of central distributions, but it also
>> goes on to give other methods of calculation.
>>
> 
> The problem is that the sum has an infinite number of terms. I looked
> at JKB vol.2 but they only discuss the central chisquare distribution
> extensively, but I didn't see much for non-central chi-square.
> 
> Wikipidia as the infinite series expression for the ncx2.cdf
>  http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution
> 
> 
> But I'm not a numerical precision person, and I usually don't worry
> about numerical, floating point precision of convergence of a series
> with an infinite number of terms. My preferred bugfix in this would be
> special.hyp2f1.  But, I don't have the background and I prefer to
> chase the "big bugs", 1e10 and 1e1 instead of 1e-8 and 1e-15.
> 
> The infinite mixture might be relatively easy to calculate for
> standard cases, but it might be difficult to maintain a high precision
> when users start to plug in "funny" parameters and/or expect it to
> work for every tail case.
> 
> Josef

How about the boost reference?  This is an existence proof of code that is 
supposed to be good quality.  Refer to non_central_chi_squared_cdf in 
boost/math/distributions/non_central_chi_squared.hpp.