[SciPy-dev] Warning about remaining issues in stats.distributions ?

josef.pktd at gmail.com
Tue Dec 9 16:28:09 EST 2008


On Tue, Dec 9, 2008 at 1:32 PM, Yaroslav Halchenko <lists at onerussian.com> wrote:
>> * distributions that have problems for some range of parameters
> so a good (imho) piece to add to unittests for the 'issues' to be fixed:
>
> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps)
>

I know that there are several problems where the calculations are not
very precise at the boundaries. But I didn't know about this case and
I hope to get rid of the exceptions. So please file tickets for any of
these cases.

In my testing so far, I was quite happy if I got results like these.

stats.rdist(1.32, 0, 1).cdf(-1.0+1e-13)
1.2060822041379395e-009

But I guess there might still be a systematic problem with the
treatment of open parameter spaces or open domains, c>0, x>0, which I
have not tried out at all.

Per Brodtkorb has proposed some improvements in numerical precision,
which we will commit after 0.7 is out.

For some numerical methods, e.g. ppf, it won't be possible to solve
arbitrarily close to 0 or 1 for every distribution, given the way the
generic method finds the inverse function. In the test suite, I think
I used 0.001 and 0.999.
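
To make the point about generic inversion concrete, here is a minimal
sketch in plain Python. It assumes bisection on a bracketing interval as
a stand-in for whatever root finder the generic ppf actually uses, and
the names generic_ppf and expon_cdf are made up for this illustration.

```python
import math

def generic_ppf(cdf, q, lo, hi, tol=1e-12, max_iter=200):
    """Invert a monotone cdf on [lo, hi] by bisection (illustrative stand-in)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # Keep the half of the bracket that still contains the quantile.
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def expon_cdf(x):
    # Standard exponential cdf, written with expm1 for accuracy near 0.
    return -math.expm1(-x) if x > 0 else 0.0

# Round trip q -> ppf(q) -> cdf at the probabilities mentioned above.
roundtrip = {q: expon_cdf(generic_ppf(expon_cdf, q, 0.0, 50.0))
             for q in (0.001, 0.5, 0.999)}
for q, back in roundtrip.items():
    print(q, back)
```

Pushing q much closer to 1 eventually fails in any scheme of this kind,
because the cdf itself rounds to exactly 1.0 in floating point well
before the upper end of the bracket.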

In my fuzz testing there were some cases where, for example, the
shape parameter has the restriction c>0 but the distribution seems to
work only for c>2. I postponed these bugs for now, since I have to go
over each distribution individually to figure out what the problem is.
Additionally, for many distributions I don't know if anyone would ever
need the distribution for that parameter range.
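
A scan of this kind could look roughly like the sketch below. It is not
the actual fuzz test; scan_shape is a hypothetical helper, and the toy
distribution (cdf x**c on [0, 1]) has a deliberately planted defect for
c <= 2 so that the scan has something to find.

```python
import math

def scan_shape(cdf, ppf, shapes, probs=(0.001, 0.5, 0.999), tol=1e-8):
    """Return {c: reason} for shape values where cdf(ppf(q)) does not give back q."""
    failures = {}
    for c in shapes:
        for q in probs:
            try:
                back = cdf(ppf(q, c), c)
            except (ValueError, ZeroDivisionError, OverflowError) as exc:
                failures[c] = "%s at q=%g" % (type(exc).__name__, q)
                break
            if not math.isfinite(back) or abs(back - q) > tol:
                failures[c] = "round trip gave %r at q=%g" % (back, q)
                break
    return failures

# Toy stand-in distribution on [0, 1] with cdf(x) = x**c, nominally valid for c > 0.
def toy_cdf(x, c):
    return x ** c

def toy_ppf(q, c):
    # Planted defect (illustration only): wrong formula whenever c <= 2.
    return q ** (1.0 / c) if c > 2 else q ** c

shapes = [0.5, 1.5, 2.5, 5.0, 10.0]
failures = scan_shape(toy_cdf, toy_ppf, shapes)
for c in sorted(failures):
    print(c, failures[c])
```

With a real distribution the cdf and ppf callables would wrap the
corresponding stats methods, and the list of failing shape values gives
a first idea of the parameter range that needs a closer look.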

Back to the rdist example:
In my bugfixes, I temporarily removed the rdist._cdf, since the
generic method works also for large parameters, while
special.hyp2f1(0.5,1.0-c/2.0,1.5,x*x) does not work for large c. Once
I know over which domain the special functions are reliable,
dispatching to the generic methods only for some part of the parameter
space will be an improvement, but this requires some time consuming
testing.
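
The dispatching idea can be sketched in plain Python as follows. Several
things here are assumptions for illustration: hyp2f1_series is a naive
truncated series standing in for special.hyp2f1, the generic route is a
simple Simpson integration of the pdf rather than the real generic
method, and the cutoff C_MAX_SPECIAL is a placeholder, not a tested
value. The sketch also assumes |x| < 1.

```python
import math

def _log_beta_half(c):
    # log B(1/2, c/2), the normalising constant of the R-distribution.
    return math.lgamma(0.5) + math.lgamma(c / 2.0) - math.lgamma(0.5 + c / 2.0)

def rdist_pdf(x, c):
    # Density on (-1, 1): (1 - x^2)^(c/2 - 1) / B(1/2, c/2)
    return (1.0 - x * x) ** (c / 2.0 - 1.0) * math.exp(-_log_beta_half(c))

def hyp2f1_series(a, b, c, z, max_terms=500):
    # Naive Gauss series for |z| < 1; a stand-in for special.hyp2f1.
    total = term = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1.0)) * z
        total += term
        if abs(term) <= 1e-17 * abs(total):
            break
    return total

def cdf_special(x, c):
    # Closed form: 0.5 + x * 2F1(1/2, 1 - c/2; 3/2; x^2) / B(1/2, c/2)
    f = hyp2f1_series(0.5, 1.0 - c / 2.0, 1.5, x * x)
    return 0.5 + x * f * math.exp(-_log_beta_half(c))

def cdf_generic(x, c, n=2000):
    # Simpson integration of the pdf from 0 to x, using symmetry about 0.
    h = x / n
    s = rdist_pdf(0.0, c) + rdist_pdf(x, c)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * rdist_pdf(i * h, c)
    return 0.5 + s * h / 3.0

C_MAX_SPECIAL = 10.0  # hypothetical cutoff; the reliable range would have to be tested

def rdist_cdf(x, c):
    # Use the special-function form only where it is assumed reliable.
    if c <= C_MAX_SPECIAL:
        return cdf_special(x, c)
    return cdf_generic(x, c)

print(rdist_cdf(0.3, 4.0))    # closed-form branch
print(rdist_cdf(0.3, 200.0))  # generic branch
```

For c = 4 the cdf is the polynomial 0.5 + 0.75*(x - x**3/3), so both
branches can be checked against it directly; choosing the real cutoff is
exactly the time consuming part.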

So any help, bug reports and patches are very welcome, especially from
users who actually use the specific distribution.

Josef


