[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Ralf Gommers ralf.gommers at googlemail.com
Tue Jun 7 17:40:15 EDT 2011


On Mon, Jun 6, 2011 at 9:34 PM, <josef.pktd at gmail.com> wrote:

> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
> >> What should be the policy on one-sided versus two-sided?
> > Yes :-)
> >
> >> The main reason right now for looking at this is
> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a
> >> "one-sided" alternative and provides both lower and upper tail.
> > That refers to Fisher's test rather than the more 'traditional'
> > one-sided tests. Each value of Fisher's test has a special meaning
> > about the value or probability of the 'first cell' under the null
> > hypothesis.  So it is necessary to provide those three values.
> >
> >> I would prefer that we follow the alternative patterns similar to R
> >>
> >> currently only kstest has    alternative : 'two_sided' (default),
> >> 'less' or 'greater'
> >> but this should be added to other tests where it makes sense
> > I think that these Kolmogorov-Smirnov tests do not have the traditional
> > meaning either. It is a little mind-boggling to try to think about cdfs!
> >
> >> R fisher.exact
> >> """alternative        indicates the alternative hypothesis and must be one
> >> of "two.sided", "greater" or "less". You can specify just the initial
> >> letter. Only used in the 2 by 2 case."""
> >>
> >> mannwhitneyu reports a one-sided test without actually specifying
> >> which alternative is used  (I thought I remembered other cases like
> >> this but don't find any right now)
> >>
> >> related:
> >> in many cases in the two-sided tests the test statistic has a sign
> >> that indicates in which tail the test-statistic falls.
> >> This is useful in ttests for example, because the one-sided tests can
> >> be backed out from the two-sided tests. (With symmetric distributions
> >> one-sided p-value is just half of the two-sided pvalue)
> >>
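The halving rule above can be sketched in a few lines. This is only an illustration, assuming a null distribution that is symmetric about zero (as for a t statistic); the helper name `one_sided_pvalue` and its signature are hypothetical and not part of scipy.stats.

```python
def one_sided_pvalue(statistic, p_two_sided, alternative):
    """Back out a one-sided p-value from a signed statistic and a
    two-sided p-value.

    Assumption: the null distribution of the statistic is symmetric
    about zero. Hypothetical helper, not a scipy function.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_sided / 2.0
    # The sign of the statistic tells us which tail it fell in.
    if alternative == "greater":
        in_tested_tail = statistic > 0
    else:
        in_tested_tail = statistic < 0
    # If the statistic lies in the tail being tested, the one-sided
    # p-value is half the two-sided one; otherwise it is the complement.
    return half if in_tested_tail else 1.0 - half
```

For example, a statistic of 2.1 with a two-sided p-value of 0.04 gives 0.02 for 'greater' and 0.98 for 'less'. This is why the sign matters: without it, the user cannot tell which of the two one-sided results applies.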
> >> In the discussion of https://github.com/scipy/scipy/pull/8  I argued
> >> that this might mislead users to interpret a two-sided result as a
> >> one-sided result. However, I now doubt that this is a strong argument
> >> against reporting the signed test statistic.
> > (I do not follow pull requests so is there a relevant ticket?)
> >
> >> After going through scipy.stats.stats, it looks like we always report
> >> the signed test statistic.
> >>
> >> The test statistic in ks_2samp is in all cases defined as a max value
> >> and doesn't have a sign in R either, so adding a sign there would
> >> break with the standard definition.
> >> A one-sided option for ks_2samp would just require finding the
> >> distribution of the test statistics D+, D-
> >>
> >> ---
> >>
> >> So my proposal for the general pattern (with exceptions for special
> >> reasons) would be
> >>
> >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater'
> >> http://projects.scipy.org/scipy/ticket/1394  for now,
> >> and adjustments of existing tests in the future (adding the option can
> >> be mostly done in a backwards compatible way and for symmetric
> >> distributions like ttest it's just a convenience)
> >> mannwhitneyu seems to be the only "weird" one
>

This would actually make the fisher_exact implementation more consistent,
since only one p-value is returned in all cases. I just don't like the R
naming much; alternative="greater" does not convey to me that this is a
one-sided test using the upper tail. How about:
    test : {"two-tailed", "lower-tail", "upper-tail"}
with two-tailed the default?

Ralf



> >>
> >> * report signed test statistic for two-sided alternative (when a
> >> signed test statistic exists):  which is the status quo in
> >> stats.stats, but I didn't know that this is actually pretty consistent
> >> across tests.
> >>
> >> Opinions ?
> >>
> >> Josef
> >> _______________________________________________
> >> SciPy-User mailing list
> >> SciPy-User at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/scipy-user
> > I think that there is some valid misunderstanding (I was in the same
> > situation) regarding what is meant here. My understanding is that
> > under a one-sided hypothesis, all the values of the null hypothesis only
> > exist in one tail of the test distribution. In contrast, the values of
> > the null distribution exist in both tails with a two-sided hypothesis. Yet
> > that interpretation does not have the same meaning as the tails in the
> > Fisher or Kolmogorov-Smirnov tests.
>
> The tests have a clear Null Hypothesis (equality) and Alternative
> Hypothesis (not equal or directional, less or greater).
> So the "alternative" should be clearly specified in the function
> argument, as in R.
>
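As a rough sketch of what an explicit `alternative` argument could look like, here is a toy z-test that uses only the standard library. The function name `z_test_pvalue` is made up for illustration, and the option strings simply follow the 'two_sided' / 'less' / 'greater' spelling discussed in this thread.

```python
from statistics import NormalDist


def z_test_pvalue(z, alternative="two_sided"):
    """p-value of a standard-normal test statistic z.

    Hypothetical example function; the alternative is stated in terms
    of the hypothesis, not in terms of which tail is used.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two_sided":
        # Both tails: twice the area beyond |z|.
        return 2.0 * std_normal.cdf(-abs(z))
    if alternative == "less":
        # H1: parameter is smaller, so small z is evidence against H0.
        return std_normal.cdf(z)
    if alternative == "greater":
        # H1: parameter is larger; 1 - cdf(z) equals cdf(-z) by symmetry.
        return std_normal.cdf(-z)
    raise ValueError("alternative must be 'two_sided', 'less' or 'greater'")
```

For this symmetric case 'less' and 'greater' happen to map onto the lower and upper tails, but the caller never has to know that.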
> Whether this corresponds to left and right tails of the distribution
> is an "implementation detail" which holds for ttests but not for
> kstest/ks_2samp.
>
> kstest/ks_2samp   H0: cdf1 == cdf2  and  H1: cdf1 != cdf2  or
> H1: cdf1 < cdf2  or  H1: cdf1 > cdf2
> (looks similar to comparing two survival curves in Kaplan-Meier ?)
>
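To make the two-sample statistics concrete, the sketch below computes D+, D- and D from the empirical cdfs using only the standard library. The function name is hypothetical, and the sketch deliberately stops at the statistics: it does not compute p-values and does not decide which of D+ or D- belongs to 'less' or 'greater'.

```python
from bisect import bisect_right


def ks_2samp_statistics(sample1, sample2):
    """Return (D+, D-, D) for two samples.

    D+ = max over x of (ecdf1(x) - ecdf2(x))
    D- = max over x of (ecdf2(x) - ecdf1(x))
    D  = max(D+, D-), the usual unsigned two-sided statistic.
    Hypothetical helper for illustration only.
    """
    x1, x2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(x1), len(x2)
    # Both empirical cdfs are 0 left of all data, so the difference
    # starts at 0; it can only change at observed data points.
    d_plus = 0.0
    d_minus = 0.0
    for x in x1 + x2:
        cdf1 = bisect_right(x1, x) / n1  # fraction of sample1 <= x
        cdf2 = bisect_right(x2, x) / n2  # fraction of sample2 <= x
        d_plus = max(d_plus, cdf1 - cdf2)
        d_minus = max(d_minus, cdf2 - cdf1)
    return d_plus, d_minus, max(d_plus, d_minus)
```

With sample1 = [1, 2, 3] and sample2 = [4, 5, 6] this gives D+ = 1.0, D- = 0.0 and D = 1.0. D itself carries no sign, which is why the direction has to come from looking at D+ and D- separately.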
> fisher_exact (2 by 2)  H0: odds-ratio == 1 and H1: odds-ratio != 1 or
> H1: odds-ratio < 1 or H1: odds-ratio > 1
>
> I know the kolmogorov-smirnov tests, but for fisher exact and
> contingency tables I rely on R
>
> from R-help:
> For 2 by 2 tables, the null of conditional independence is equivalent
> to the hypothesis that the odds ratio equals one. <...> The
> alternative for a one-sided test is based on the odds ratio, so
> alternative = "greater" is a test of the odds ratio being bigger than
> "or" (the hypothesized odds ratio argument of fisher.test).
> Two-sided tests are based on the probabilities of the tables, and take
> as ‘more extreme’ all tables with probabilities less than or equal to
> that of the observed table, the p-value being the sum of such
> probabilities.
>
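The R-help description above can be turned into a small standard-library sketch for the 2 by 2 case with a null odds ratio of 1. This is an assumption-laden illustration, not the scipy or R implementation: the function name is hypothetical, and the small relative tolerance used for ties in the two-sided case is a choice made here to guard against floating-point noise.

```python
from math import comb


def fisher_exact_pvalue(table, alternative="two_sided"):
    """p-value of Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Hypothetical sketch. With all margins fixed, the first cell follows
    a hypergeometric distribution under H0: odds ratio == 1.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo = max(0, col1 - row2)  # smallest feasible first cell
    hi = min(row1, col1)      # largest feasible first cell
    denom = comb(n, col1)
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(lo, hi + 1)}
    if alternative == "less":
        # H1: odds ratio < 1, i.e. first cell as small or smaller.
        p = sum(pr for k, pr in probs.items() if k <= a)
    elif alternative == "greater":
        # H1: odds ratio > 1, i.e. first cell as large or larger.
        p = sum(pr for k, pr in probs.items() if k >= a)
    elif alternative == "two_sided":
        # Sum all tables no more probable than the observed one.
        cutoff = probs[a] * (1.0 + 1e-7)  # tolerance for ties (assumed)
        p = sum(pr for pr in probs.values() if pr <= cutoff)
    else:
        raise ValueError("alternative must be 'two_sided', 'less' or 'greater'")
    return min(p, 1.0)
```

For the table [[3, 1], [1, 3]] the first cell can take the values 0 to 4 with probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so 'greater' gives 17/70, 'less' gives 69/70 and 'two_sided' gives 34/70. Only one p-value comes back in each case, selected by the alternative.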
> Josef
>
>
> >
> > I never paid much attention to the frequency based tests but it would not
> > surprise me if there are no one-sided tests. Most are rank-based so it is
> > rather hard to do in a simple manner - actually I am not even sure how
> > to use a permutation test.
> >
> > Bruce
> >
> >
> >