[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Mon Jun 13 15:19:29 EDT 2011

On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey <bsouthey at gmail.com> wrote:

> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
> >
> >
> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> >>
> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote:
> >>
> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> >>>
> >>> On Sun, Jun 12, 2011 at 7:52 PM,  <josef.pktd at gmail.com> wrote:
> >>> >
> >>> > All the p-values agree for the alternatives two-sided, less, and
> >>> > greater, the odds ratio is defined differently as explained pretty
> >>> > well in the docstring.
> >>> >
> >>> > Josef
> >>> Yes, but you said to follow BOTH R and SAS - that means providing all
> >>> three:
> >>>
> >>> The FREQ Procedure
> >>>
> >>> Table of Exposure by Response
> >>>
> >>> Exposure     Response
> >>>
> >>> Frequency|       0|       1|  Total
> >>> ---------+--------+--------+
> >>>       0 |    190 |    800 |    990
> >>> ---------+--------+--------+
> >>>       1 |    200 |    900 |   1100
> >>> ---------+--------+--------+
> >>> Total         390     1700     2090
> >>>
> >>>
> >>> Statistics for Table of Exposure by Response
> >>>
> >>> Statistic                     DF       Value      Prob
> >>> ------------------------------------------------------
> >>> Chi-Square                     1      0.3503    0.5540
> >>> Likelihood Ratio Chi-Square    1      0.3500    0.5541
> >>> Continuity Adj. Chi-Square     1      0.2869    0.5922
> >>> Mantel-Haenszel Chi-Square     1      0.3501    0.5541
> >>> Phi Coefficient                       0.0129
> >>> Contingency Coefficient               0.0129
> >>> Cramer's V                            0.0129
> >>>
> >>>
> >>>     Pearson Chi-Square Test
> >>> ----------------------------------
> >>> Chi-Square                  0.3503
> >>> DF                               1
> >>> Asymptotic Pr >  ChiSq      0.5540
> >>> Exact      Pr >= ChiSq      0.5741
> >>>
> >>>
> >>>       Fisher's Exact Test
> >>> ----------------------------------
> >>> Cell (1,1) Frequency (F)       190
> >>> Left-sided Pr <= F          0.7416
> >>> Right-sided Pr >= F         0.2960
> >>>
> >>> Table Probability (P)       0.0376
> >>> Two-sided Pr <= P           0.5741
> >>>
> >>> Sample Size = 2090
> >>>
> >>> Thus providing all three is the correct answer.
> >>>
> >> Eh, we do. The interface is the same as that of R, and all three of
> >> {two-sided, less, greater} are extensively checked against R. It looks like
> >> you are reacting to only one statement Josef made to explain his
> >> interpretation of less/greater. Please check the actual commit and then
> >> comment if you see anything wrong.
> >>
> >> Ralf
> >>
> >>
> >> _______________________________________________
> >> SciPy-User mailing list
> >> SciPy-User at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/scipy-user
> >>
> >> I have looked at it (again) and the comments still stand:
> >> A user should not have to read a statistical book and then the code to
> >> figure out what was actually implemented here.  So I do strongly object to
> >> Josef's statements, as you just cannot interpret Fisher's test in that way.
> >> Just look at how SAS presents the results; that should give a huge clue
> >> that the two-sided test is different from the one-sided tests.
> >
> > Okay, I am pasting the entire docstring below. You seem to know a lot about
> > this, so can you please suggest wording for things to be added/changed?
> >
> > I have compared with the R doc
> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and
> > that's not much different as far as I can tell.
> >
> > Thanks a lot,
> > Ralf
>
> You are assuming a lot by saying that I even agree with the R documentation :-)
>

Didn't assume that.

> If you noticed, I never referred to it because it is not correct
> compared with SAS and the other sources given.
>
>
> >
> >
> >     Performs a Fisher exact test on a 2x2 contingency table.
> >
> >     Parameters
> >     ----------
> >     table : array_like of ints
> >         A 2x2 contingency table.  Elements should be non-negative
> >         integers.
> >     alternative : {'two-sided', 'less', 'greater'}, optional
> >         Which alternative hypothesis to the null hypothesis the test
> >         uses.
> >         Default is 'two-sided'.
> >
> >     Returns
> >     -------
> >     oddsratio : float
> >         This is prior odds ratio and not a posterior estimate.
> >     p_value : float
> >         P-value, the probability of obtaining a distribution at least as
> >         extreme as the one that was actually observed, assuming that the
> >         null hypothesis is true.
> >
> >     See Also
> >     --------
> >     chisquare : inexact alternative that can be used when sample sizes
> >                 are large enough.
> >
> >     Notes
> >     -----
> >     The calculated odds ratio is different from the one R uses. In R
> >     language, this implementation returns the (more common)
> >     "unconditional Maximum Likelihood Estimate", while R uses the
> >     "conditional Maximum Likelihood Estimate".
> >
> >     For tables with large numbers the (inexact) `chisquare` test can also
> >     be used.
> >
> >     Examples
> >     --------
> >     Say we spend a few days counting whales and sharks in the Atlantic
> >     and Indian oceans. In the Atlantic ocean we find 6 whales and 1
> >     shark, in the Indian ocean 2 whales and 5 sharks. Then our
> >     contingency table is::
> >
> >                 Atlantic  Indian
> >         whales     8        2
> >         sharks     1        5
> >
> >     We use this table to find the p-value:
> >
> >     >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]])
> >     >>> pvalue
> >     0.0349...
> >
> >     The probability that we would observe this or an even more imbalanced
> >     ratio by chance is about 3.5%.  A commonly used significance level is
> >     5%, if we adopt that we can therefore conclude that our observed
> >     imbalance is statistically significant; whales prefer the Atlantic
> >     while sharks prefer the Indian ocean.
> >
> >
>
> So did two of the six whales give birth?
>
> That docstring is incomplete and probably does not meet the Scipy
> documentation guidelines because not everything is explained.

Yes, which ones do? It's a lot better than it was, and more complete than
your average scipy docstring. Same for the tests. So I'm just going to be
satisfied with the bug fix and added functionality.
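
For anyone who wants to check the SAS figures quoted above without R or SAS
at hand, here is a minimal sketch in plain Python (standard library only, so
it is not the scipy implementation). The helper name `fisher_exact_pvalues`
is made up for illustration. It assumes the usual definitions: condition on
the table margins, sum the hypergeometric probabilities up to (or from) the
observed cell (1,1) count for the one-sided tests, and sum the probabilities
of all tables no more likely than the observed one for the two-sided test.

```python
from math import comb


def fisher_exact_pvalues(table):
    """Return (left, right, two_sided, table_prob) for a 2x2 table.

    Hypothetical helper for illustration only; not scipy's code.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)  # number of tables with these margins

    # Possible values of cell (1,1) given the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights, kept as exact integers.
    weight = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weight[a]

    left = sum(w for k, w in weight.items() if k <= a) / total
    right = sum(w for k, w in weight.items() if k >= a) / total
    # Two-sided: every table that is no more probable than the observed one.
    two_sided = sum(w for w in weight.values() if w <= observed) / total
    return left, right, two_sided, observed / total


# The Exposure by Response table from the SAS listing.
left, right, two_sided, table_prob = fisher_exact_pvalues([[190, 800], [200, 900]])
print(f"left={left:.4f} right={right:.4f} "
      f"two-sided={two_sided:.4f} table={table_prob:.4f}")
```

If these definitions match what SAS does, the four numbers should agree with
the FREQ listing to the printed precision (0.7416, 0.2960, 0.5741 and
0.0376). Note that the two-sided value is not twice either one-sided value.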

> It is not a small amount of effort to clean this up to be technically
> correct -  0.0349 is not 'about 3.5%'.
>

Note the ellipsis? It's also not exactly 0.0349. So I fail to see the
problem. There are bigger fish to fry.
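
For the record, the value behind the ellipsis can be worked out exactly. A
minimal sketch in standard-library Python (not the scipy code), with the
hypothetical helper name `two_sided_exact`, assuming the two-sided p-value is
the sum of the probabilities of all tables with the same margins that are no
more likely than the observed one:

```python
from fractions import Fraction
from math import comb


def two_sided_exact(table):
    """Exact two-sided Fisher p-value of a 2x2 table as a Fraction.

    Hypothetical helper for illustration only; not scipy's code.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    # Possible values of cell (1,1) given the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights; they sum to comb(n, col1).
    weights = [comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)]
    observed = weights[a - lo]
    return Fraction(sum(w for w in weights if w <= observed), sum(weights))


# The whales/sharks table from the docstring example.
p = two_sided_exact([[8, 2], [1, 5]])
print(p, float(p))
```

For the whales/sharks table this gives 5/143, i.e. 0.034965..., which
truncates to 0.0349 and rounds to 3.5%.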

Ralf