[SciPy-User] scipy.stats.beta does not handle small parameters

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Apr 20 09:52:02 EDT 2010


On Mon, Apr 19, 2010 at 9:49 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
> I've been using scipy.stats.beta and I've noticed it returns NaN when
> the parameters are small. I've tried using the same parameters in R code
> and it handles them just fine. R also handles parameters up to 17 orders
> of magnitude smaller. Is there any documentation on which parameter
> ranges are acceptable? Can I expect similar results with other
> distributions? Should I file a bug report?

General answer:
Many distributions have numerical problems in the boundary range of
the valid distribution parameters, and there are no systematic tests
for these. In the last two years, we improved the numerical precision
in many of these cases. However, in other cases, working around
numerical precision problems is too much effort, too difficult or
would require too many special cases, so it's not really worth it for
the "common" usage.
Filing a bug report is always good, (and even better with a patch), in
the worst case it adds to documenting the limitation of the current
implementation.

In the beta.rvs case the relevant code is in numpy.random
    def _rvs(self, a, b):
        return mtrand.beta(a,b,self._size)

which means digging through the pyrex or c code to find the source of
the nans. Most of the other code in distributions.beta is from
scipy.special (either c or fortran code).

I just tried your first example (the following is also intended as
documentation for myself)

>>> alpha, beta
(7.1e-008, 4.2199999999999999e-007)

This is very close to a two point distribution, cdf in scipy and R aggree
>>> stats.beta.cdf(1e-8, alpha, beta)
0.85598265330617773
>>> r.pbeta(1e-8, alpha, beta)[0]
0.85598265330617762
>>> stats.beta.sf(1-1e-8, alpha, beta)
0.14401510767081682
>>> 1-r.pbeta(1-1e-8, alpha, beta)[0]
0.14401510767081682
>>> stats.beta.cdf(1e-8, alpha, beta) + stats.beta.sf(1-1e-8, alpha, beta)
0.99999776097699455

quantile/ppf: R and scipy disagree on what almost zero

>>> r.qbeta(0.5, alpha, beta)[0]
1.0547236079445126e-230
>>> stats.beta.ppf(0.5, alpha, beta)
4.9406564584124654e-324
>>> r.qbeta(0.85, alpha, beta)[0]
1.7763568394002505e-015
>>> stats.beta.ppf(0.85, alpha, beta)
0.0

pdf looks ok
>>> r.dbeta(1e-8, alpha, beta)[0]
6.0774768992486097
>>> stats.beta.pdf(1e-8, alpha, beta)
6.0774768992486035
>>> r.dbeta(1-1e-12, alpha, beta)[0]
60775.483679644421
>>> stats.beta.pdf(1-1e-12, alpha, beta)
60775.483679644392

ppf cdf roundtripping looks ok in one direction

>>> stats.beta.ppf(stats.beta.cdf(0.5, alpha, beta), alpha, beta)
0.49999999974425491
>>> r.qbeta(r.pbeta(0.5, alpha, beta)[0], alpha, beta)[0]
0.50000000006892631

roundtripping doesn't work in R nor in scipy in the other direction
>>> r.qbeta(r.qbeta(0.5, alpha, beta)[0], alpha, beta)[0]
4.0301331108853522e-308
>>> stats.beta.cdf(stats.beta.ppf(0.5, alpha, beta), alpha, beta)
0.85593853078304538

#check roundtripping formula is correct
>>> stats.norm.cdf(stats.norm.ppf(0.5, alpha, beta), alpha, beta)
0.5
>>> stats.norm.cdf(stats.norm.ppf(0.25, alpha, beta), alpha, beta)
0.25

rvs in R looks ok
>>> rrvs = r.rbeta(1000, alpha, beta)
>>> rrvsp = [rrvs[i] for i in range(1000)]
>>> (np.array(rrvsp)<0.5).mean()
0.85499999999999998
>>> (np.array(rrvsp)>0.5).mean()
0.14499999999999999

but not in numpy
>>> np.random.beta(alpha, beta, size=4)
array([ NaN,  NaN,  NaN,  NaN])

So it might be worth a look into numpy.random.beta to see what causes the nan.
beta.ppf and qbeta both have problems in the neighborhood of zero (and one ?)

Josef



> Here's the code I used:
> import scipy.stats as S, numpy as N
> from rpy2.robjects import r
>
> alpha, beta = 0.0710, 0.4222
> for i in xrange(20):
>     x_from_scipy = S.beta.rvs(alpha, beta)
>     x_from_R = r.rbeta(1, alpha, beta)
>     print 'Alpha=%.2e; Beta=%.2e; scipy.stats.beta.rvs=%.2e;
> R.rbeta=%.2e' % (alpha, beta, x_from_scipy, x_from_R[0])
>     alpha /= 10.
>     beta /= 10.
>
> and the output from it:
> Alpha=7.10e-02; Beta=4.22e-01; scipy.stats.beta.rvs=2.75e-11;
> R.rbeta=6.60e-02
> Alpha=7.10e-03; Beta=4.22e-02; scipy.stats.beta.rvs=3.73e-84;
> R.rbeta=4.50e-124
> Alpha=7.10e-04; Beta=4.22e-03; scipy.stats.beta.rvs=1.00e+00;
> R.rbeta=1.00e+00
> Alpha=7.10e-05; Beta=4.22e-04; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-313
> Alpha=7.10e-06; Beta=4.22e-05; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-07; Beta=4.22e-06; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-315
> Alpha=7.10e-08; Beta=4.22e-07; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-316
> Alpha=7.10e-09; Beta=4.22e-08; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-317
> Alpha=7.10e-10; Beta=4.22e-09; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-318
> Alpha=7.10e-11; Beta=4.22e-10; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-319
> Alpha=7.10e-12; Beta=4.22e-11; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-320
> Alpha=7.10e-13; Beta=4.22e-12; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-321
> Alpha=7.10e-14; Beta=4.22e-13; scipy.stats.beta.rvs=nan; R.rbeta=3.95e-322
> Alpha=7.10e-15; Beta=4.22e-14; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-16; Beta=4.22e-15; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-17; Beta=4.22e-16; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-18; Beta=4.22e-17; scipy.stats.beta.rvs=nan; R.rbeta=1.00e+00
> Alpha=7.10e-19; Beta=4.22e-18; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-20; Beta=4.22e-19; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
> Alpha=7.10e-21; Beta=4.22e-20; scipy.stats.beta.rvs=nan; R.rbeta=0.00e+00
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list