[SciPy-User] fisherexact.py returns "NA" for large #s

josef.pktd at gmail.com josef.pktd at gmail.com
Fri May 7 16:15:19 EDT 2010


On Fri, May 7, 2010 at 3:45 PM, Pete Shepard <peter.shepard at gmail.com> wrote:
> Hello List,
>
>
> I am using "fisherexact.py" to calculate the p-value of two ratios; however,
> when large #s are involved, it returns "NA". Is there a way to override
> this?


Do you mean the fisherexact in http://projects.scipy.org/scipy/ticket/956 ?

Do you have an example? Can you add it to the ticket?

Do you have large ratios or large numbers in each cell?
If you have a large number of entries in each cell, then the chi-square
test or a similar asymptotic test should be pretty reliable.
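
A rough sketch of the asymptotic route, assuming a SciPy version that
provides scipy.stats.chi2_contingency. The 2x2 counts below are made up
for illustration and are not taken from the ticket:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table with large counts in every cell.
table = np.array([[12000, 11500],
                  [9800, 10400]])

# Pearson chi-square test of independence (no continuity correction).
chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)

print("chi2 statistic:", chi2)
print("p-value:", p_value)
print("degrees of freedom:", dof)

# The usual rule of thumb is that the approximation is reasonable when
# all expected cell counts are well above 5.
print("smallest expected count:", expected.min())
```

If the smallest expected count is small, the asymptotic approximation is
less trustworthy and an exact test remains the safer choice.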

Last time I tried, I didn't manage to get rid of the incorrect results
when the first cell is zero, and I didn't understand the details of the
algorithm well enough to figure out what's going on (within a reasonable
time).

If you add some print statements, you could find out whether the nan comes
from a 0./0. division or from the hypergeometric distribution.
Do you get the same result if you permute the rows or columns?
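
A minimal sketch of both checks, assuming a SciPy version that ships
scipy.stats.fisher_exact and scipy.stats.hypergeom (this may behave
differently from the fisherexact.py attached to the ticket). The helper
name `diagnose` and the example table are hypothetical:

```python
import numpy as np
from scipy import stats

def diagnose(table):
    """Compare Fisher p-values across permuted tables and check the
    hypergeometric pmf of the observed first cell for nan/inf."""
    table = np.asarray(table)
    variants = {
        "original": table,
        "rows swapped": table[::-1, :],
        "columns swapped": table[:, ::-1],
        "transposed": table.T,
    }
    pvalues = {}
    for name, t in variants.items():
        # The two-sided p-value should not depend on the cell ordering.
        _, p = stats.fisher_exact(t)
        pvalues[name] = p

    # Hypergeometric pmf of the first cell given the table margins:
    # population = total count, successes = first column total,
    # draws = first row total.
    total = table.sum()
    first_row = table[0].sum()
    first_col = table[:, 0].sum()
    pmf = stats.hypergeom.pmf(table[0, 0], total, first_col, first_row)
    return pvalues, pmf

# Hypothetical table with a zero in the first cell.
pvalues, pmf = diagnose([[0, 500], [300, 2000]])
for name, p in pvalues.items():
    print(name, p)
print("hypergeom pmf of first cell finite:", np.isfinite(pmf))
```

If the p-values disagree between permutations, or the pmf is not finite,
that narrows down where the nan originates.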

fisherexact works very well over a large range of values, but I'm
waiting for someone to provide a patch for the cases that don't work.

Josef





>
> TIA