[SciPy-User] Loss of precision with powers and factorials?

Fri May 6 09:23:50 EDT 2016

Use the stats module instead of doing this calculation yourself.  A huge
amount of work has already been put in to calculate things like this in
numerically robust ways.

In [1]: from scipy import stats

In [2]: stats.binom.cdf(250,36000, 0.00625)

Out[2]: 0.95406011210591368

On Fri, May 6, 2016 at 2:55 AM, Evgeni Burovski <evgeny.burovskiy at gmail.com>
wrote:

> On Fri, May 6, 2016 at 7:39 AM, Alok Singhal <gandalf013 at gmail.com> wrote:
> > Chris Cameron <chris <at> upnix.com> writes:
> >>
> >> ############
> >> import numpy as np
> >> import scipy as sp
> >>
> >> n = 36000
> >> i = np.array(range(0, 250))
> >>
> >> np.sum(
> >>     (sp.misc.factorial(n) / \
> >>     (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \
> >>     np.power(0.00625, i) * np.power((1-0.00625), (n-i)))
> >> ############
> >>
> >> When I execute this I get ‘nan’.
> >>
> >> In Mathematica this same formula and values get me a seemingly
> >> reasonable answer:
> >> Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i),
> >>  {i, 0, 250}] = 0.95406
> >>
> >> I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside
> >> iPython 4.1.2
> >>
> >> From trying to execute this piece by piece, it seems like the
> “np.power()”
> >> function  is just returning “0” when the result is super small. Could
> the
> >> problem be not enough precision is being held?
> >
> > You are trying to calculate factorial(36000), which is ~1e148395, so your
> > calculation overflows in the first step.  Similarly for factorial(36000
> - i),
> > and then their ratio is nan.
> >
> > Instead of calculating the factorials and then dividing, you should
> "divide
> > as you go", so that the numbers don't get too large.  gmpy automatically
> does
> > that for you:
> >
> >>>> import gmpy
> >>>> a = gmpy.mpf('0.00625')
> >>>> ivals = [gmpy.mpz(x) for x in range(251)]
> >>>> n = 36000
> >>>> result = sum(gmpy.comb(n, x) * a**x * (1-a)**(n-x) for x in ivals)
> >>>> float(result)
> > 0.954060112106136
> >
> > This is pretty fast and gives an answer that matches the answer you get
> from
> > Mathematica.
> >
> > In fact, you don't even need to calculate the sum yourself.  You can use
> the
> > formula for CDF of binomial distribution from
> > http://mathworld.wolfram.com/BinomialDistribution.html to get the
> answer you
> > want:
> >
> > result = 1.0 - I_p(n+1, N-n) (where p = 0.00625, n = 250, N = 36000)
> >
> > result = B(p; n+1, N-n) / B(n+1, N-n)
> >
> > So, using scipy:
> >
> >>>> import scipy.special
> >>>> betainc = scipy.special.betainc
> >>>> n = 250
> >>>> N = 36000
> >>>> p = 0.00625
> >>>> result = 1.0 - betainc(n+1, N-n, p) / betainc(n+1, N-n, 1.0)
> >>>> print(result)
> > 0.95406011210468011
> >
> > (Note that in your Python implementation, you are summing from 0 to 249,
> > whereas in your Mathematica implementation, you are summing from 0 to
> 250.
> > I am including 250 in my examples above).
>
>
> I'd add that you can in principle use python loooong integers for
> factorials:
>
> >>> from scipy.special import factorial as f
> >>> f(36000)
> array(inf)
> >>> x = f(36000, exact=True)
> >>> y = f(35999, exact=True)
> >>> x // y
> 36000
> >>> type(x)
> <class 'int'>
>
> >>> import scipy
> >>> scipy.__version__
> '0.18.0.dev0+8b07439'
>
> Not saying it's a good idea to use them for calculations of binomials
> (it isn't), but at least technically the option is there.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20160506/f4b5ff0e/attachment.html>