Hypergeometric distribution
Scott David Daniels
scott.daniels at acm.org
Sun Jan 1 22:48:53 EST 2006
Steven D'Aprano wrote:
> On Sun, 01 Jan 2006 14:24:39 -0800, Raven wrote:
>
>> Thanks Steven for your very interesting post.
>>
>> This was a critical instance from my problem:
>>
>>>> >from scipy import comb
>>>>> comb(14354,174)
>> inf
>
> Curious. It wouldn't surprise me if scipy was using floats, because 'inf'
> is usually a floating point value, not an integer.
>
> Using my test code from yesterday, I got:
>
>>>> bincoeff(14354,174)
> ...
>> Yes I am calculating hundreds of hypergeometric probabilities so I
>> need fast calculations
>
> Another possibility, if you want exact integer maths rather than floating
> point with logarithms, is to memoise the binomial coefficients. Something
> like this:
>
> # untested
> def bincoeff(n,r, \
> cache={}):
> try:
> return cache((n,r))
> except KeyError:
> x = 1
> for i in range(r+1, n+1):
> x *= i
> for i in range(1, n-r+1):
> x /= i
> cache((n,r)) = x
> return x
Well, there is a much better optimization to use first:
def bincoeff1(n, r):
if r < n - r:
r = n - r
x = 1
for i in range(n, r, -1):
x *= i
for i in range(n - r, 1, -1):
x /= i
return x
Then, if you still need to speed it up:
def bincoeff2(n, r, cache={}):
if r < n - r:
r = n - r
try:
return cache[n, r]
except KeyError:
pass
x = 1
for i in range(n, r, -1):
x *= i
for i in range(n - r, 1, -1):
x /= i
cache[n, r] = x
return x
--Scott David Daniels
scott.daniels at acm.org
More information about the Python-list
mailing list