[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Bruce Southey bsouthey at gmail.com
Mon Jun 13 14:56:24 EDT 2011


On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>>
>> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>
>>> On Sun, Jun 12, 2011 at 7:52 PM,  <josef.pktd at gmail.com> wrote:
>>> >
>>> > All the p-values agree for the alternatives two-sided, less, and
>>> > greater; the odds ratio is defined differently, as explained pretty
>>> > well in the docstring.
>>> >
>>> > Josef
>>> Yes, but you said to follow BOTH R and SAS; that means providing all
>>> three:
>>>
>>> The FREQ Procedure
>>>
>>> Table of Exposure by Response
>>>
>>> Exposure     Response
>>>
>>> Frequency|       0|       1|  Total
>>> ---------+--------+--------+
>>>       0 |    190 |    800 |    990
>>> ---------+--------+--------+
>>>       1 |    200 |    900 |   1100
>>> ---------+--------+--------+
>>> Total         390     1700     2090
>>>
>>>
>>> Statistics for Table of Exposure by Response
>>>
>>> Statistic                     DF       Value      Prob
>>> ------------------------------------------------------
>>> Chi-Square                     1      0.3503    0.5540
>>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>>> Phi Coefficient                       0.0129
>>> Contingency Coefficient               0.0129
>>> Cramer's V                            0.0129
>>>
>>>
>>>     Pearson Chi-Square Test
>>> ----------------------------------
>>> Chi-Square                  0.3503
>>> DF                               1
>>> Asymptotic Pr >  ChiSq      0.5540
>>> Exact      Pr >= ChiSq      0.5741
>>>
>>>
>>>       Fisher's Exact Test
>>> ----------------------------------
>>> Cell (1,1) Frequency (F)       190
>>> Left-sided Pr <= F          0.7416
>>> Right-sided Pr >= F         0.2960
>>>
>>> Table Probability (P)       0.0376
>>> Two-sided Pr <= P           0.5741
>>>
>>> Sample Size = 2090
>>>
>>> Thus providing all three is the correct answer.
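For reference, the four rows under "Fisher's Exact Test" in the SAS listing can be recomputed directly from their definitions. The sketch below is a minimal illustration, not SciPy's or SAS's implementation. It assumes both margins are fixed, so the (1,1) cell F follows a hypergeometric distribution, and it uses exact `Fraction` arithmetic. The function name `fisher_exact_sas_style` is hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_sas_style(table):
    """Exact Fisher p-values for a 2x2 table, labelled as in the SAS listing.

    With both margins fixed, the (1,1) cell F is hypergeometric; every
    probability returned is an exact Fraction.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    # Feasible values of the (1,1) cell given the margins.
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)

    # Number of tables whose (1,1) cell is x; these counts sum to comb(n, row1).
    counts = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in support}
    total = comb(n, row1)
    observed = counts[a]

    return {
        "Left-sided Pr <= F": Fraction(sum(v for x, v in counts.items() if x <= a), total),
        "Right-sided Pr >= F": Fraction(sum(v for x, v in counts.items() if x >= a), total),
        "Table Probability (P)": Fraction(observed, total),
        # Two-sided: add up every table that is no more probable than the observed one.
        "Two-sided Pr <= P": Fraction(sum(v for v in counts.values() if v <= observed), total),
    }


for label, p in fisher_exact_sas_style([[190, 800], [200, 900]]).items():
    print(f"{label:<22} {float(p):.4f}")
```

Because the left- and right-sided sums both include the observed table, they add up to 1 plus the table probability (0.7416 + 0.2960 = 1.0376 in the listing). The two-sided value is not built from either tail; it orders tables by their probability instead.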
>>>
>> Eh, we do. The interface is the same as that of R, and all three of
>> {two-sided, less, greater} are extensively checked against R. It looks like
>> you are reacting to only one statement Josef made to explain his
>> interpretation of less/greater. Please check the actual commit and then
>> comment if you see anything wrong.
>>
>> Ralf
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> I have looked at it (again) and the comments still stand:
>> A user should not have to read a statistics book and then the code to
>> figure out what was actually implemented here.  So I do strongly object to
>> Josef's statements, as you just cannot interpret Fisher's test in that way.
>> Just looking at how SAS presents the results should give a huge clue that
>> the two-sided test is different from the one-sided tests.
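A sketch of how SciPy's `alternative` keyword lines up with the SAS labels, assuming a SciPy release whose `scipy.stats.fisher_exact` accepts `alternative` (the option under discussion in this thread). On the SAS table the three calls are expected to agree with the listing to about four decimals. The pairing in the comments is an interpretation of the two sets of definitions, not wording taken from either manual.

```python
from scipy import stats

# The 2x2 table from the SAS listing (rows: Exposure 0/1, columns: Response 0/1).
table = [[190, 800], [200, 900]]

# SciPy 'alternative' values paired with the SAS rows they should reproduce:
# 'less' is P(F <= observed) and 'greater' is P(F >= observed), where F is the
# (1,1) cell; 'two-sided' sums the tables no more probable than the observed one.
pairs = [("less", "Left-sided Pr <= F"),
         ("greater", "Right-sided Pr >= F"),
         ("two-sided", "Two-sided Pr <= P")]

for alternative, sas_label in pairs:
    _, pvalue = stats.fisher_exact(table, alternative=alternative)
    print(f"alternative={alternative!r:<12} p = {pvalue:.4f}   (SAS: {sas_label})")
```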
>
> Okay, I am pasting the entire docstring below. You seem to know a lot about
> this, so can you please suggest wording for things to be added/changed?
>
> I have compared with the R doc
> (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
> that's not much different as far as I can tell.
>
> Thanks a lot,
> Ralf

You are assuming a lot by saying that I even agree with the R documentation :-)
If you noticed, I never referred to it, because it is not correct
compared to SAS and the other sources given.


>
>
>     Performs a Fisher exact test on a 2x2 contingency table.
>
>     Parameters
>     ----------
>     table : array_like of ints
>         A 2x2 contingency table.  Elements should be non-negative integers.
>     alternative : {'two-sided', 'less', 'greater'}, optional
>         Which alternative hypothesis to the null hypothesis the test uses.
>         Default is 'two-sided'.
>
>     Returns
>     -------
>     oddsratio : float
>         This is the prior odds ratio, not a posterior estimate.
>     p_value : float
>         P-value, the probability of obtaining a distribution at least as
>         extreme as the one that was actually observed, assuming that the
>         null hypothesis is true.
>
>     See Also
>     --------
>     chisquare : inexact alternative that can be used when sample sizes are
>                 large enough.
>
>     Notes
>     -----
>     The calculated odds ratio is different from the one R uses. In R's
>     terminology, this implementation returns the (more common)
>     "unconditional Maximum Likelihood Estimate", while R uses the
>     "conditional Maximum Likelihood Estimate".
>
>     For tables with large numbers the (inexact) `chisquare` test can also be
>     used.
>
>     Examples
>     --------
>     Say we spend a few days counting whales and sharks in the Atlantic and
>     Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark; in
>     the Indian ocean, 2 whales and 5 sharks. Then our contingency table is::
>
>                 Atlantic  Indian
>         whales     8        2
>         sharks     1        5
>
>     We use this table to find the p-value:
>
>     >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
>     >>> pvalue
>     0.0349...
>
>     The probability that we would observe this or an even more imbalanced
>     ratio by chance is about 3.5%.  A commonly used significance level is
>     5%; if we adopt that, we can therefore conclude that our observed
>     imbalance is statistically significant: whales prefer the Atlantic while
>     sharks prefer the Indian ocean.
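To make the odds-ratio note concrete, here is a sketch computing both estimates for the whale table. The "unconditional" value is the sample cross-product ratio ad/bc. The "conditional MLE" is taken here to be the odds ratio psi at which the mean of Fisher's noncentral hypergeometric distribution equals the observed (1,1) cell, which is where the conditional likelihood is maximised; that this matches R's reported estimate is an assumption. The function names are hypothetical, and plain bisection is used rather than whatever root finder R uses.

```python
import math


def log_comb(n, k):
    """log(n choose k) via lgamma, so large tables do not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def sample_odds_ratio(table):
    """Unconditional estimate: the cross-product ratio a*d / (b*c)."""
    (a, b), (c, d) = table
    if b * c == 0:
        return math.nan if a * d == 0 else math.inf
    return (a * d) / (b * c)


def conditional_mle_odds_ratio(table, tol=1e-10):
    """Odds ratio psi at which E_psi[X] equals the observed (1,1) cell, where X
    follows Fisher's noncentral hypergeometric distribution given the margins."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    if a == lo:
        return 0.0  # likelihood is maximised at the boundary psi = 0
    if a == hi:
        return math.inf
    xs = list(range(lo, hi + 1))
    log_counts = [log_comb(col1, x) + log_comb(n - col1, row1 - x) for x in xs]

    def mean(log_psi):
        # Mean of X with tables weighted by psi**x (log-sum-exp for stability).
        logw = [lc + x * log_psi for lc, x in zip(log_counts, xs)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

    # The mean increases with log(psi): bracket the root, then bisect.
    t_lo, t_hi = -1.0, 1.0
    while mean(t_lo) > a:
        t_lo *= 2
    while mean(t_hi) < a:
        t_hi *= 2
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if mean(mid) < a:
            t_lo = mid
        else:
            t_hi = mid
    return math.exp(0.5 * (t_lo + t_hi))


whales = [[8, 2], [1, 5]]
print(sample_odds_ratio(whales))  # 20.0, i.e. (8*5)/(2*1)
print(conditional_mle_odds_ratio(whales))
```

For this table the conditional estimate lands between 15 and 20, below the sample value of 20, which is why the two conventions report different numbers for the same data.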

So did two of the six whales give birth?
That docstring is incomplete and probably does not meet the SciPy
documentation guidelines because not everything is explained. It is
not a small amount of effort to clean this up to be technically
correct: 0.0349 is not 'about 3.5%'.

Bruce
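The exact value behind the truncated `0.0349...` doctest output can be worked out by enumerating the whale table's hypergeometric distribution with exact fractions. This minimal sketch uses only the Python standard library and the same rule as the SAS "Two-sided Pr <= P" row: sum the probabilities of all tables no more probable than the observed one.

```python
from fractions import Fraction
from math import comb

# Whale/shark table from the docstring (rows: whales, sharks;
# columns: Atlantic, Indian).
(a, b), (c, d) = [[8, 2], [1, 5]]
row1, col1, n = a + b, a + c, a + b + c + d  # margins 10 and 9, total 16

# Exact null probability of each feasible value of the (1,1) cell.
support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
prob = {x: Fraction(comb(col1, x) * comb(n - col1, row1 - x), comb(n, row1))
        for x in support}

# Two-sided p-value: total probability of tables no more likely than observed.
p_two_sided = sum(p for p in prob.values() if p <= prob[a])
print(p_two_sided, float(p_two_sided))
```

Of the 8008 ways to fill a table with these margins, 280 belong to tables no more probable than the observed one, so the two-sided p-value is 280/8008 = 5/143, about 0.03497.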


