Hypergeometric distribution

Steven D'Aprano steve at REMOVETHIScyber.com.au
Mon Dec 26 20:23:00 EST 2005


On Mon, 26 Dec 2005 12:18:55 -0800, Raven wrote:

> Hi to all, I need to calculate the hypergeometric distribution:
> 
> 
>                        choose(r, x) * choose(b, n-x)
>         p(x; r,b,n) =  -----------------------------
>                            choose(r+b, n)
> 
> choose(r,x) is the binomial coefficient
> I use the factorial to calculate the above formula but since I am using
> large numbers, the result of choose(a,b) (ie: the binomial coefficient)
> is too big even for a long int.

Are you sure about that? Python long ints can be as big as you have enough
memory for. My Python can print 10L**10000 to the console with a barely
detectable pause, and 10L**100000 with about a ten second delay. (Most of
that delay is printing it, not calculating it.)
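
If you want to check this yourself, here is a minimal sketch. It assumes
Python 3 syntax, where the L suffix is gone and every int is arbitrary
precision. It uses bit_length() instead of printing, since converting a huge
int to decimal is the slow part (and recent Python versions limit that
conversion by default).

```python
# Python 3: plain ints are arbitrary precision, so no L suffix is needed.
big = 10 ** 100000

# bit_length() reports the size of the number without the costly
# conversion to a decimal string.
print(big.bit_length())

# Arithmetic on the value stays exact.
print(big // 10 ** 99999)
```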

25206 is the first integer whose factorial exceeds 10L**100000, so even if
you are calculating the binomial coefficient using the most naive
algorithm, calculating the factorials and dividing, you should easily be
able to generate it for a,b up to 20,000 unless you have a severe
shortage of memory.
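
To make that concrete, here is a rough sketch of the naive approach in
Python 3. The function names choose and hypergeometric are my own, picked to
match the formula quoted above, and using fractions.Fraction for the final
division is just one way to keep the result exact. This is not how scipy
does it.

```python
from fractions import Fraction
from math import factorial


def choose(n, k):
    """Binomial coefficient via the naive factorial formula, using exact ints."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def hypergeometric(x, r, b, n):
    """p(x; r, b, n) as an exact Fraction, following the quoted formula."""
    return Fraction(choose(r, x) * choose(b, n - x), choose(r + b, n))


p = hypergeometric(2, 5, 5, 4)
print(p)         # exact rational value
print(float(p))  # convert to float only at the very end
```

Because everything stays an integer until the last step, nothing overflows.
Something like hypergeometric(1000, 5000, 5000, 2000) should work the same
way, just with larger intermediate values.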

> I've tried the scipy library, but this library calculates the
> hypergeometric using the factorials too, so the problem persists.

What exactly is your problem? What values of hypergeometric(x; r,b,n) fail
for you?



-- 
Steven.



