Hypergeometric distribution

Scott David Daniels scott.daniels at acm.org
Sun Jan 1 22:48:53 EST 2006


Steven D'Aprano wrote:
> On Sun, 01 Jan 2006 14:24:39 -0800, Raven wrote:
> 
>> Thanks Steven for your very interesting post.
>>
>> This was a critical instance from my problem:
>>
>>>> >from scipy import comb
>>>>> comb(14354,174)
>> inf
> 
> Curious. It wouldn't surprise me if scipy was using floats, because 'inf'
> is usually a floating point value, not an integer.
> 
> Using my test code from yesterday, I got:
> 
>>>> bincoeff(14354,174)
> ...
>> Yes I am calculating hundreds of hypergeometric probabilities so I
>> need fast calculations
> 
> Another possibility, if you want exact integer maths rather than floating
> point with logarithms, is to memoise the binomial coefficients. Something
> like this:
> 
> # untested
> def bincoeff(n,r, \
>              cache={}):
>     try:
>         return cache((n,r))
>     except KeyError:
>         x = 1
>         for i in range(r+1, n+1):
>             x *= i
>         for i in range(1, n-r+1):
>             x /= i
>         cache((n,r)) = x
>         return x

Well, there is a much better optimization to use first:

     def bincoeff1(n, r):
         if r < n - r:
             r = n - r
         x = 1
         for i in range(n, r, -1):
             x *= i
         for i in range(n - r, 1, -1):
             x /= i
         return x

Then, if you still need to speed it up:

     def bincoeff2(n, r, cache={}):
         if r < n - r:
             r = n - r
         try:
             return cache[n, r]
         except KeyError:
             pass
         x = 1
         for i in range(n, r, -1):
             x *= i
         for i in range(n - r, 1, -1):
             x /= i
         cache[n, r] = x
         return x


--Scott David Daniels
scott.daniels at acm.org



More information about the Python-list mailing list