Hypergeometric distribution

Raven balckraven at gmail.com
Mon Jan 2 06:35:33 EST 2006


Well, what to say? I am very happy for all the solutions you guys have
posted :-)
For Paul:
I would prefer not to use Stirling's approximation


The problem with long integers is that to calculate the hypergeometric
I need to do float division and multiplication because integer division
returns 0. A solution could be to calculate log(Long_Factorial_Integer)
and finally calculate the hypergeometric with the logarithmic values.
I've done a test: iterated 1000 times two different functions for the
hypergeometric, the first one based on scipy.special.gammaln:

from scipy.special import gammaln

def lnchoose(n, m):
  nf = gammaln(n + 1)
  mf = gammaln(m + 1)
  nmmnf = gammaln(n - m + 1)
  return nf - (mf + nmmnf)

def hypergeometric_gamma(k, n1, n2, t):
  if t > n1 + n2:
    t = n1 + n2
  if k > n1 or k > t:
    return 0
  elif t > n2 and ((k + n2) < t):
    return 0
  else:
    c1 = lnchoose(n1,k)
    c2 = lnchoose(n2, t - k)
    c3 = lnchoose(n1 + n2 ,t)

    return exp(c1 + c2 - c3)

and the second one based on the code by Steven and Scott:


import time
from math import log, exp

def bincoeff1(n, r):
  if r < n - r:
    r = n - r
  x = 1
  for i in range(n, r, -1):
    x *= i
  for i in range(n - r, 1, -1):
    x /= i
  return x

def hypergeometric(k, n1, n2, t):
  if t > n1 + n2:
    t = n1 + n2
  if k > n1 or k > t:
    return 0
  elif t > n2 and ((k + n2) < t):
    return 0
  else:
    c1 = log(raw_bincoeff1(n1,k))
    c2 = log(raw_bincoeff1(n2, t - k))
    c3 = log(raw_bincoeff1(n1 + n2 ,t))

    return exp(c1 + c2 - c3)

def main():
  t = time.time()
  for i in range(1000):
    r = hypergeometric(6,6,30,6)
  print time.time() - t

  t = time.time()
  for i in range(1000):
    r = hypergeometric_gamma(6,6,30,6)
  print time.time() - t


if __name__ == "__main__":
  main()


and the result is:

0.0386447906494
0.192448139191

The first approach is faster so I think I will adopt it.




More information about the Python-list mailing list