[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?
Bruce Southey
bsouthey at gmail.com
Mon Jun 13 14:56:24 EDT 2011
On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
>>
>> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>
>>> On Sun, Jun 12, 2011 at 7:52 PM, <josef.pktd at gmail.com> wrote:
>>> >
>>> > All the p-values agree for the alternatives two-sided, less, and
>>> > greater, the odds ratio is defined differently as explained pretty
>>> > well in the docstring.
>>> >
>>> > Josef
>>> Yes, but you said to follow BOTH R and SAS - that means providing all
>>> three:
>>>
>>> The FREQ Procedure
>>>
>>> Table of Exposure by Response
>>>
>>> Exposure     Response
>>>
>>> Frequency|       0|       1|  Total
>>> ---------+--------+--------+
>>>        0 |    190 |    800 |    990
>>> ---------+--------+--------+
>>>        1 |    200 |    900 |   1100
>>> ---------+--------+--------+
>>> Total         390     1700     2090
>>>
>>>
>>> Statistics for Table of Exposure by Response
>>>
>>> Statistic                     DF       Value      Prob
>>> ------------------------------------------------------
>>> Chi-Square                     1      0.3503    0.5540
>>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
>>> Continuity Adj. Chi-Square     1      0.2869    0.5922
>>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
>>> Phi Coefficient                       0.0129
>>> Contingency Coefficient               0.0129
>>> Cramer's V                            0.0129
>>>
>>>
>>> Pearson Chi-Square Test
>>> ----------------------------------
>>> Chi-Square                  0.3503
>>> DF                               1
>>> Asymptotic Pr > ChiSq       0.5540
>>> Exact Pr >= ChiSq           0.5741
>>>
>>>
>>> Fisher's Exact Test
>>> ----------------------------------
>>> Cell (1,1) Frequency (F)       190
>>> Left-sided Pr <= F          0.7416
>>> Right-sided Pr >= F         0.2960
>>>
>>> Table Probability (P)       0.0376
>>> Two-sided Pr <= P           0.5741
>>>
>>> Sample Size = 2090
>>>
>>> Thus providing all three is the correct answer.
>>>
>> Eh, we do. The interface is the same as that of R, and all three of
>> {two-sided, less, greater} are extensively checked against R. It looks like
>> you are reacting to only one statement Josef made to explain his
>> interpretation of less/greater. Please check the actual commit and then
>> comment if you see anything wrong.
>>
>> Ralf
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> I have looked at it (again) and the comments still stand:
>> A user should not have to read a statistics book and then the code to
>> figure out what was actually implemented here. So I do strongly object to
>> Josef's statements, as you just cannot interpret Fisher's test in that way.
>> Just look at how SAS presents the results: that should give a huge clue that
>> the two-sided test is different from the one-sided tests.
>
> Okay, I am pasting the entire docstring below. You seem to know a lot about
> this, so can you please suggest wording for things to be added/changed?
>
> I have compared with the R doc
> (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
> that's not much different as far as I can tell.
>
> Thanks a lot,
> Ralf
You are assuming a lot by saying that I even agree with the R documentation :-)
If you noticed, I never referred to it, because it is not correct when
compared with SAS and the other sources given.
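
To make the comparison with the SAS output quoted above concrete, here is a minimal pure-Python sketch of the three p-values as SAS labels them: left-sided (Pr <= F), right-sided (Pr >= F) and two-sided (Pr <= P, i.e. summed over every table no more probable than the observed one). The function names and the `rel_tol` tie tolerance are my own illustrative choices, not part of scipy, R or SAS, and this is not necessarily how `fisher_exact` is implemented.

```python
from math import comb


def cell11_distribution(table):
    """Hypergeometric probabilities of cell (1,1) with all margins fixed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    denom = comb(row1 + row2, col1)
    # Exact integer arithmetic in the numerator, one float division per term.
    return {x: comb(row1, x) * comb(row2, col1 - x) / denom
            for x in range(lo, hi + 1)}


def fisher_pvalues(table, rel_tol=1e-7):
    """Return the table probability and left, right and two-sided p-values."""
    a = table[0][0]
    probs = cell11_distribution(table)
    p_obs = probs[a]
    # One-sided tests accumulate over the value of cell (1,1).
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # The two-sided test accumulates over table probabilities instead.
    # rel_tol is an assumed guard against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + rel_tol))
    return {"table_prob": p_obs, "left": left, "right": right,
            "two_sided": min(two_sided, 1.0)}


# The Exposure by Response table from the SAS run quoted above.
result = fisher_pvalues([[190, 800], [200, 900]])
for name, value in result.items():
    print(f"{name:>10}: {value:.4f}")
```

The one-sided values are sums over the value of cell (1,1), while the two-sided value is a sum over table probabilities, so it cannot in general be obtained by doubling either one-sided value.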
>
>
> Performs a Fisher exact test on a 2x2 contingency table.
>
> Parameters
> ----------
> table : array_like of ints
>     A 2x2 contingency table. Elements should be non-negative integers.
> alternative : {'two-sided', 'less', 'greater'}, optional
>     Which alternative hypothesis to the null hypothesis the test uses.
>     Default is 'two-sided'.
>
> Returns
> -------
> oddsratio : float
>     This is prior odds ratio and not a posterior estimate.
> p_value : float
>     P-value, the probability of obtaining a distribution at least as
>     extreme as the one that was actually observed, assuming that the
>     null hypothesis is true.
>
> See Also
> --------
> chisquare : inexact alternative that can be used when sample sizes are
> large enough.
>
> Notes
> -----
> The calculated odds ratio is different from the one R uses. In R language,
> this implementation returns the (more common) "unconditional Maximum
> Likelihood Estimate", while R uses the "conditional Maximum Likelihood
> Estimate".
>
> For tables with large numbers the (inexact) `chisquare` test can also be
> used.
>
> Examples
> --------
> Say we spend a few days counting whales and sharks in the Atlantic and
> Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in the
> Indian ocean 2 whales and 5 sharks. Then our contingency table is::
>
>             Atlantic  Indian
>     whales     8        2
>     sharks     1        5
>
> We use this table to find the p-value:
>
> >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
> >>> pvalue
> 0.0349...
>
> The probability that we would observe this or an even more imbalanced ratio
> by chance is about 3.5%. A commonly used significance level is 5%, if we
> adopt that we can therefore conclude that our observed imbalance is
> statistically significant; whales prefer the Atlantic while sharks prefer
> the Indian ocean.
>
>
>
>
>
So did two of the six whales give birth?
That docstring is incomplete and probably does not meet the SciPy
documentation guidelines, because not everything is explained. Cleaning
it up so that it is technically correct is not a small amount of effort:
for instance, 0.0349 is not 'about 3.5%'.
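
One item the docstring mentions without explaining is the difference between the "unconditional" and "conditional" maximum likelihood estimates of the odds ratio. The rough sketch below shows both for the docstring's own table. It assumes the standard characterization that the conditional MLE is the value psi at which the mean of cell (1,1), under the noncentral hypergeometric distribution with fixed margins, equals the observed count. The function names, the bisection bracket and the tolerance are illustrative choices of mine, not scipy's or R's code, and the sketch only handles small tables with no zero cells.

```python
from math import comb, exp


def unconditional_odds_ratio(table):
    """Sample odds ratio a*d / (b*c); undefined when b*c == 0."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def conditional_mean(psi, row1, row2, col1):
    """Mean of cell (1,1) under the noncentral hypergeometric distribution."""
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    weights = [comb(row1, x) * comb(row2, col1 - x) * psi ** x
               for x in support]
    return sum(x * w for x, w in zip(support, weights)) / sum(weights)


def conditional_odds_ratio(table, tol=1e-12):
    """Bisect on log(psi) until the conditional mean matches cell (1,1)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    # Assumed bracket for log(psi); adequate for small tables only,
    # since psi ** x can overflow for large counts.
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The conditional mean increases monotonically with psi.
        if conditional_mean(exp(mid), row1, row2, col1) < a:
            lo = mid
        else:
            hi = mid
    return exp((lo + hi) / 2)


whales_table = [[8, 2], [1, 5]]
unconditional = unconditional_odds_ratio(whales_table)
conditional = conditional_odds_ratio(whales_table)
print("unconditional MLE:", unconditional)
print("conditional MLE:  ", conditional)
```

The two estimates differ for the same table, which is why a user comparing against R needs the docstring to say which one is returned.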
Bruce