Hypergeometric distribution

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sun Jan 1 21:31:23 EST 2006


On Sun, 01 Jan 2006 14:24:39 -0800, Raven wrote:

> Thanks Steven for your very interesting post.
> 
> This was a critical instance from my problem:
> 
> >>> from scipy import comb
> >>> comb(14354,174)
> inf

Curious. It wouldn't surprise me if scipy was using floats, because 'inf'
is usually a floating point value, not an integer.
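
To make that concrete, here is a small sketch of why a float-based
calculation would give up on this input. It is not scipy's actual code,
only an illustration: it assumes Python 3.8 or later for math.comb, and
fits_in_float is a helper name made up for this example.

```python
import math
import sys

def fits_in_float(value):
    """Return True if the integer converts to a finite float."""
    try:
        float(value)
        return True
    except OverflowError:
        # Python refuses to convert an int beyond the float range.
        return False

exact = math.comb(14354, 174)   # exact integer, no floats involved
print(len(str(exact)), "decimal digits")
print("largest float:", sys.float_info.max)
print("fits in a float?", fits_in_float(exact))
```

The exact coefficient is far beyond the largest double, so any routine
that works in floats has to overflow, which is consistent with the 'inf'
you saw.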

Using my test code from yesterday, I got:

>>> bincoeff(14354,174)
11172777193562324917353367958024437473336018053487854593870
07090637489405604489192488346144684402362344409632515556732
33563523161308145825208276395238764441857829454464446478336
90173777095041891067637551783324071233625370619908633625448
31076677382448616246125346667737896891548166898009878730510
57476139515840542769956414204130692733629723305869285300247
645972456505830620188961902165086857407612722931651840L

Took about three seconds on my system.



> Yes I am calculating hundreds of hypergeometric probabilities so I
> need fast calculations

Another possibility, if you want exact integer maths rather than floating
point with logarithms, is to memoise the binomial coefficients. Something
like this:

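# untested
def bincoeff(n, r, cache={}):
    # The mutable default argument persists between calls,
    # so it doubles as the memoisation cache.
    try:
        return cache[(n, r)]
    except KeyError:
        # Build n!/r! first, then divide out (n-r)! one factor at a time.
        x = 1
        for i in range(r+1, n+1):
            x *= i
        for i in range(1, n-r+1):
            x //= i  # integer division; exact at every step
        cache[(n, r)] = x
        return x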
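
For what it's worth, here is a rough sketch of how memoised coefficients
could feed the hypergeometric probabilities themselves. It assumes
Python 3.8+ and swaps the hand-rolled cache for functools.lru_cache and
math.comb. The names hypergeom_pmf and log_hypergeom_pmf and their
parameter order are invented for this example, and neither function
validates its arguments.

```python
import math
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def bincoeff(n, r):
    """Exact binomial coefficient, cached across calls."""
    return math.comb(n, r)

def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k): k successes in n draws, without replacement,
    from a population of N items of which K are successes."""
    numerator = bincoeff(K, k) * bincoeff(N - K, n - k)
    return Fraction(numerator, bincoeff(N, n))

def log_binom(n, r):
    """Natural log of the binomial coefficient via log-gamma (floats)."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def log_hypergeom_pmf(k, N, K, n):
    """Floating point alternative: log of the same probability."""
    return log_binom(K, k) + log_binom(N - K, n - k) - log_binom(N, n)

# Small example: 5 items, 2 of them marked, draw 2, want exactly 1 marked.
p = hypergeom_pmf(1, 5, 2, 2)
print(p, float(p))
print(math.exp(log_hypergeom_pmf(1, 5, 2, 2)))
```

The Fraction version stays exact until the final conversion to float,
at the price of big-integer arithmetic. The log-gamma version never
builds the huge integers at all, so it should be much faster when you
need hundreds of probabilities, but it is only as accurate as floating
point allows.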


-- 
Steven.



