[SciPy-User] Off by one bug in Scipy.stats.hypergeom

Jacob Biesinger jake.biesinger at gmail.com
Sun Aug 1 01:16:20 EDT 2010


I have seen the error of my ways and apologize for the noise in the list-- I
should have been using the probability mass function instead of the survival
function.  Doing the summation manually shows that all is well:

def hypergeom(x, fgTotal, bgMatching, bgTotal):
        return scipy.comb(fgTotal,x) *
scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)

>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
range(4,min(10+1,50+1)))
0.0040835205497557125
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
0.0040835205497557186


>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
range(5,min(10+1,50+1)))
0.00011893749174045688
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
0.00011893749174045679


Thanks anyway!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587


On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger
<jake.biesinger at gmail.com>wrote:

> Hi!
>
> Perhaps I'm using the module incorrectly, but it looks like the x parameter
> in scipy.stats.hypergeom is off by one.  Specifically, I think it's
> one-too-high.
>
> From the wikipedia article
> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
> know they could be wrong-- just hear me out on this),
>
> scipy.stats.hypergeom?
> Hypergeometric distribution
>
>        Models drawing objects from a bin.
>        M is total number of objects, n is total number of Type I objects.
>        RV counts number of Type I objects in N drawn without replacement
> from
>        population.
> So translating wikipedia's example...
> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
> choose(50,10) = .003964583
> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
> choose(50,10) = .0001189375
>
> Which you can check with the python code:
> from scipy import comb as chse   # "combination" => choose
>
> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) /
> chse(50,10,exact=1)  # example one
> 0.0039645830580150654155  # okay!
>
> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) /
> chse(50,10,exact=1) # example two
> 0.00011893749174045196247  # okay!
>
> Try example one with scipy.stats.hypergeom:
> # scipy.stats.hypergeom.sf(x, M, n, N)
> scipy.stats.hypergeom.sf(4,50,5,10)
> 0.00011893749169422652     # correct value for x=5, not x=4
> scipy.stats.hypergeom.sf(5,50,5,10)
> -4.6185277824406512e-14    # wrong
>
> It seems that changing the loc value from =0 (default) to =1 fixes the
> issue...
> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073    # close enough
>
> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652   # okay!
>
> Am I using the package wrong?
> --
> Jake Biesinger
> Graduate Student
> Xie Lab, UC Irvine
> (949) 231-7587
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20100731/5c74fd9c/attachment.html>


More information about the SciPy-User mailing list