[SciPy-User] scipy.stats one-sided two-sided less, greater, signed ?

Sun Jun 12 09:56:51 EDT 2011

On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
>>
>>
>> On Wed, Jun 8, 2011 at 12:56 PM, <josef.pktd at gmail.com> wrote:
>>>
>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers
>>> > <ralf.gommers at googlemail.com> wrote:
>>> >>
>>> >>
>>> >> On Mon, Jun 6, 2011 at 9:34 PM, <josef.pktd at gmail.com> wrote:
>>> >>>
>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey <bsouthey at gmail.com>
>>> >>> wrote:
>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote:
>>> >>> >> What should be the policy on one-sided versus two-sided?
>>> >>> > Yes :-)
>>> >>> >
>>> >>> >> The main reason right now for looking at this is
>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a
>>> >>> >> "one-sided" alternative and provides both lower and upper tail.
>>> >>> > That refers to the Fisher's test rather than the more 'traditional'
>>> >>> > one-sided tests. Each value of the Fisher's test has special
>>> >>> > meanings about the value or probability of the 'first cell' under
>>> >>> > the null hypothesis.  So it is necessary to provide those three
>>> >>> > values.
>>> >>> >
>>> >>> >> I would prefer that we follow the alternative patterns similar to R
>>> >>> >>
>>> >>> >> currently only kstest has    alternative : 'two_sided' (default),
>>> >>> >> 'less' or 'greater'
>>> >>> >> but this should be added to other tests where it makes sense
>>> >>> > I think that these Kolmogorov-Smirnov tests do not have the
>>> >>> > traditional meaning either. It is a little mind-boggling to try to
>>> >>> > think about cdfs!
>>> >>> >
>>> >>> >> R fisher.test
>>> >>> >> """alternative        indicates the alternative hypothesis and must
>>> >>> >> be one of "two.sided", "greater" or "less". You can specify just the
>>> >>> >> initial letter. Only used in the 2 by 2 case."""
>>> >>> >>
>>> >>> >> mannwhitneyu reports a one-sided test without actually specifying
>>> >>> >> which alternative is used  (I thought I remembered other cases like
>>> >>> >> this but don't find any right now)
>>> >>> >>
>>> >>> >> related:
>>> >>> >> in many cases in the two-sided tests the test statistic has a sign
>>> >>> >> that indicates in which tail the test statistic falls.
>>> >>> >> This is useful in ttests for example, because the one-sided tests
>>> >>> >> can be backed out from the two-sided tests. (With symmetric
>>> >>> >> distributions the one-sided p-value is just half of the two-sided
>>> >>> >> p-value when the statistic falls in the tail named by the
>>> >>> >> alternative.)
>>> >>> >>
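The backing-out step for a symmetric null distribution can be sketched as
below. This is only an illustration, not code from scipy.stats: the helper
name one_sided_from_two_sided is made up here, and a z statistic with a
standard normal null stands in for the t distribution so that the example
needs nothing beyond the Python standard library.

```python
from statistics import NormalDist


def one_sided_from_two_sided(stat, p_two_sided, alternative):
    """Back out a one-sided p-value from a signed statistic and its
    two-sided p-value, assuming a null distribution symmetric about 0.

    Hypothetical helper for illustration only.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_sided / 2.0
    # Does the sign of the statistic point into the tail named by the
    # alternative hypothesis?
    in_named_tail = (stat > 0) if alternative == "greater" else (stat < 0)
    return half if in_named_tail else 1.0 - half


# Illustration with a z statistic (standard normal null distribution).
z = -1.7
norm = NormalDist()
p_two = 2.0 * (1.0 - norm.cdf(abs(z)))
p_less = one_sided_from_two_sided(z, p_two, "less")
p_greater = one_sided_from_two_sided(z, p_two, "greater")
```

Halving is only right when the sign of the statistic agrees with the
alternative; otherwise the one-sided p-value is one minus half of the
two-sided value.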
>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8  I
>>> >>> >> argued that this might mislead users to interpret a two-sided
>>> >>> >> result as a one-sided result. However, I doubt now that this is a
>>> >>> >> strong argument against reporting the signed test statistic.
>>> >>> > (I do not follow pull requests so is there a relevant ticket?)
>>> >>> >
>>> >>> >> After going through scipy.stats.stats, it looks like we always
>>> >>> >> report the signed test statistic.
>>> >>> >>
>>> >>> >> The test statistic in ks_2samp is in all cases defined as a max
>>> >>> >> value and doesn't have a sign in R either, so adding a sign there
>>> >>> >> would break with the standard definition.
>>> >>> >> A one-sided option for ks_2samp would just require finding the
>>> >>> >> distribution of the test statistics D+, D-
>>> >>> >>
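For reference, the one-sided statistics mentioned above can be computed
directly from the two empirical cdfs. A rough sketch, assuming the usual
definitions D+ = max(F_x - F_y) and D- = max(F_y - F_x); the function name
ks_2samp_statistics is hypothetical, and no p-value is computed, since that
needs the distribution of D+ and D-.

```python
from bisect import bisect_right


def ks_2samp_statistics(x, y):
    """Return (D_plus, D_minus, D) for two samples.

    D_plus  = max over t of (F_x(t) - F_y(t))
    D_minus = max over t of (F_y(t) - F_x(t))
    D       = max(D_plus, D_minus), the usual unsigned statistic

    Hypothetical helper for illustration only.
    """
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    # The empirical cdfs only change at observed points, so it is enough
    # to evaluate the differences there.
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d_plus = max(d_plus, fx - fy)
        d_minus = max(d_minus, fy - fx)
    return d_plus, d_minus, max(d_plus, d_minus)


x = [0.1, 0.2, 0.3, 0.4]
y = [0.35, 0.5, 0.6, 0.7]
d_plus, d_minus, d = ks_2samp_statistics(x, y)
```

Swapping the two samples swaps D+ and D- but leaves D unchanged, which is
why the two-sided statistic carries no sign.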
>>> >>> >> ---
>>> >>> >>
>>> >>> >> So my proposal for the general pattern (with exceptions for special
>>> >>> >> reasons) would be
>>> >>> >>
>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or
>>> >>> >> 'greater'
>>> >>> >> http://projects.scipy.org/scipy/ticket/1394  for now,
>>> >>> >> and adjustments of existing tests in the future (adding the option
>>> >>> >> can be mostly done in a backwards compatible way and for symmetric
>>> >>> >> distributions like ttest it's just a convenience)
>>> >>> >> mannwhitneyu seems to be the only "weird" one
>>> >>
>>> >> This would actually make the fisher_exact implementation more
>>> >> consistent, since only one p-value is returned in all cases. I just
>>> >> don't like the R naming much; alternative="greater" does not convey to
>>> >> me that this is a one-sided test using the upper tail. How about:
>>> >>     test : {"two-tailed", "lower-tail", "upper-tail"}
>>> >> with two-tailed the default?
>>>
>>> I think MATLAB uses (in general) larger and smaller. The advantage of
>>> less/smaller and greater/larger is that they refer directly to the
>>> alternative hypothesis, while the meaning in terms of tails is not
>>> always clear (in kstest and I guess some others the test statistic is
>>> just reversed and uses the same tail in both cases)
>>>
>>> so greater/smaller is mostly "future proof" across tests, while a
>>> reference to the tail can only be used where this is an unambiguous
>>> statement. but see below
>>>
>> I think I understand your terminology a bit better now, and consistency
>> across all tests is important. So I've updated the Fisher's exact patch to
>> use alternative={'two-sided', 'less', 'greater'} and sent a pull request:
>> https://github.com/scipy/scipy/pull/32
>>
>> Cheers,
>> Ralf
>>
>>>
>>>
>>> >>
>>> >> Ralf
>>> >>
>>> >>
>>> >>>
>>> >>> >>
>>> >>> >> * report signed test statistic for two-sided alternative (when a
>>> >>> >> signed test statistic exists):  which is the status quo in
>>> >>> >> stats.stats, but I didn't know that this is actually pretty
>>> >>> >> consistent across tests.
>>> >>> >>
>>> >>> >> Opinions ?
>>> >>> >>
>>> >>> >> Josef
>>> >>> >> _______________________________________________
>>> >>> >> SciPy-User mailing list
>>> >>> >> SciPy-User at scipy.org
>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>>> >>> > I think that there is some valid misunderstanding here (as I was in
>>> >>> > the same situation) regarding what is meant here. My understanding
>>> >>> > is that under a one-sided hypothesis, all the values of the null
>>> >>> > hypothesis only exist in one tail of the test distribution. In
>>> >>> > contrast, the values of the null distribution exist in both tails
>>> >>> > with a two-sided hypothesis. Yet that interpretation does not have
>>> >>> > the same meaning as the tails in the Fisher or Kolmogorov-Smirnov
>>> >>> > tests.
>>> >>>
>>> >>> The tests have a clear Null Hypothesis (equality) and Alternative
>>> >>> Hypothesis (not equal or directional, less or greater).
>>> >>> So the "alternative" should be clearly specified in the function
>>> >>> argument, as in R.
>>> >>>
>>> >>> Whether this corresponds to left and right tails of the distribution
>>> >>> is an "implementation detail" which holds for ttests but not for
>>> >>> kstest/ks_2samp.
>>> >>>
>>> >>> kstest/ks_2samp   H0: cdf1 == cdf2  and H1:  cdf1 != cdf2 or H1:
>>> >>> cdf1 < cdf2 or H1:  cdf1 > cdf2
>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?)
>>> >>>
>>> >>> fisher_exact (2 by 2)  H0: odds-ratio == 1 and H1: odds-ratio != 1 or
>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1
>>> >>>
>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and
>>> >>> contingency tables I rely on R
>>> >>>
>>> >>> from R-help:
>>> >>> For 2 by 2 tables, the null of conditional independence is equivalent
>>> >>> to the hypothesis that the odds ratio equals one. <...> The
>>> >>> alternative for a one-sided test is based on the odds ratio, so
>>> >>> alternative = "greater" is a test of the odds ratio being bigger than
>>> >>> "or" [the hypothesized odds ratio argument of fisher.test].
>>> >>> Two-sided tests are based on the probabilities of the tables, and take
>>> >>> as ‘more extreme’ all tables with probabilities less than or equal to
>>> >>> that of the observed table, the p-value being the sum of such
>>> >>> probabilities.
>>> >>>
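The two-sided rule quoted from the R help can be written out for a 2 by 2
table as follows. A minimal sketch using only the standard library;
table_probability and fisher_two_sided are names invented for this
illustration, the relative tolerance on ties is my own choice, and none of
this is the scipy or R implementation.

```python
from math import comb


def table_probability(a, row1, row2, col1):
    """Hypergeometric probability of the 2x2 table whose (1,1) cell is a,
    given fixed row totals row1, row2 and first-column total col1."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_two_sided(table):
    """Two-sided p-value: sum the probabilities of all tables (same
    margins) that are no more probable than the observed table.

    Hypothetical helper for illustration only.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible (1,1) values
    p_obs = table_probability(a, row1, row2, col1)
    # Small relative tolerance so that exact ties are not lost to
    # floating point rounding (an assumption of this sketch).
    cutoff = p_obs * (1 + 1e-7)
    probs = (table_probability(k, row1, row2, col1)
             for k in range(lo, hi + 1))
    return sum(p for p in probs if p <= cutoff)


table = [[3, 1], [1, 3]]
p_two_sided = fisher_two_sided(table)
```

Here the margins are all 4, the observed table has probability 16/70, and
the tables with (1,1) cell equal to 0, 1, 3 or 4 are the ones counted as at
least as extreme.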
>>> >>> Josef
>>> >>>
>>> >>>
>>> >>> >
>>> >>> > I never paid much attention to the frequency based tests but it
>>> >>> > would not surprise me if there are no one-sided tests. Most are
>>> >>> > rank-based so it is rather hard to do in a simple manner - actually
>>> >>> > I am not even sure how to use a permutation test.
>>> >>> >
>>> >>> > Bruce
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> > But that is NOT the correct interpretation  here!
>>> > I tried to explain to you that this is not the usual idea of
>>> > one-sided vs two-sided tests.
>>> > For example:
>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt
>>> > "The test holds the marginal totals fixed and computes the
>>> > hypergeometric probability that n11 is at least as large as the
>>> > observed value"
>>>
>>> this still sounds like a less/greater test to me
>>>
>>>
>>> > "The output consists of three p-values:
>>> > Left: Use this when the alternative to independence is that there is
>>> > negative association between the variables.  That is, the observations
>>> > tend to lie in lower left and upper right.
>>> > Right: Use this when the alternative to independence is that there is
>>> > positive association between the variables. That is, the observations
>>> > tend to lie in upper left and lower right.
>>> > 2-Tail: Use this when there is no prior alternative.
>>> > "
>>> > There is also the book "Categorical data analysis: using the SAS
>>> > system  By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came
>>> > up via Google that also refers to the n11 cell.
>>> >
>>> > http://www.langsrud.com/fisher.htm
>>>
>>> I was trying to read the Agresti paper referenced there but it has too
>>> much detail to get through in 15 minutes :)
>>>
>>> > "The output consists of three p-values:
>>> >
>>> >    Left: Use this when the alternative to independence is that there
>>> > is negative association between the variables.
>>> >    That is, the observations tend to lie in lower left and upper right.
>>> >    Right: Use this when the alternative to independence is that there
>>> > is positive association between the variables.
>>> >    That is, the observations tend to lie in upper left and lower right.
>>> >    2-Tail: Use this when there is no prior alternative.
>>> >
>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or
>>> > looking at) the data."
>>> >
>>> > But you will get a different p-value if you switch rows and columns
>>> > because of the dependence on the n11 cell. If you do that then the
>>> > p-values switch between left and right sides as these now refer to
>>> > different hypotheses regarding that first cell.
>>>
>>> switching rows and columns doesn't change the p-value in R;
>>> reversing the columns reverses the definition of less and greater
>>>
>>> The problem with 2 by 2 contingency tables with given marginals, i.e.
>>> row and column totals, is that we only have one free entry. Any test
>>> on one entry, e.g. element 0,0, pins down all the other ones and
>>> (many) tests then become equivalent.
>>>
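To make the row/column point concrete, here is a small sketch of the
one-sided p-values defined through the (1,1) cell with fixed margins.
fisher_one_sided is a hypothetical helper written for this illustration
(standard library only), not the API of scipy or R.

```python
from math import comb


def fisher_one_sided(table, alternative):
    """One-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]].

    'less'    sums hypergeometric probabilities of tables whose (1,1)
              cell is <= the observed a;
    'greater' sums those whose (1,1) cell is >= a.
    Row and column totals are held fixed.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible (1,1) values
    if alternative == "less":
        cells = range(lo, a + 1)
    elif alternative == "greater":
        cells = range(a, hi + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    count = sum(comb(row1, k) * comb(row2, col1 - k) for k in cells)
    return count / comb(row1 + row2, col1)


table = [[2, 5], [6, 1]]
transposed = [[2, 6], [5, 1]]      # rows and columns switched
col_reversed = [[5, 2], [1, 6]]    # columns reversed

p_less = fisher_one_sided(table, "less")
p_less_transposed = fisher_one_sided(transposed, "less")
p_greater_reversed = fisher_one_sided(col_reversed, "greater")
```

With this definition, transposing the table leaves the 'less' p-value
unchanged, while reversing the columns turns the 'less' p-value of the
original table into the 'greater' p-value of the reordered one.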
>>>
>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm
>>> some math got lost
>>> """
>>> For <2 by 2> tables, one-sided p-values for Fisher’s exact test are
>>> defined in terms of the frequency of the cell in the first row and
>>> first column of the table, the (1,1) cell. Denoting the observed (1,1)
>>> cell frequency by n11, the left-sided p-value for Fisher’s exact test
>>> is the probability that the (1,1) cell frequency is less than or equal
>>> to n11. For the left-sided p-value, the set includes those tables with
>>> a (1,1) cell frequency less than or equal to n11. A small left-sided
>>> p-value supports the alternative hypothesis that the probability of an
>>> observation being in the first cell is actually less than expected
>>> under the null hypothesis of independent row and column variables.
>>>
>>> Similarly, for a right-sided alternative hypothesis, the set consists
>>> of tables where the frequency of the (1,1) cell is greater than or
>>> equal to that in the observed table. A small right-sided p-value
>>> supports the alternative that the probability of the first cell is
>>> actually greater than that expected under the null hypothesis.
>>>
>>> Because the (1,1) cell frequency completely determines the table when
>>> the marginal row and column sums are fixed, these one-sided
>>> alternatives can be stated equivalently in terms of other cell
>>> probabilities or ratios of cell probabilities. The left-sided
>>> alternative is equivalent to an odds ratio less than 1, where the odds
>>> ratio equals (n11 n22)/(n12 n21). Additionally, the left-sided
>>> alternative is equivalent to the column 1 risk for row 1 being less
>>> than the column 1 risk for row 2. Similarly, the right-sided
>>> alternative is equivalent to the column 1 risk for row 1 being greater
>>> than the column 1 risk for row 2. See Agresti (2007) for details.
>>> """
>>>
>>> I'm not a user of Fisher's exact test (and I have a hard time keeping
>>> the different statements straight), so if left/right or lower/upper
>>> makes more sense to users, then I don't complain.
>>>
>>> To me they are all just independence tests with possible one-sided
>>> alternatives that one distribution dominates the other. (with the same
>>> pattern as ks_2samp or ttest_2samp)
>>>
>>> Josef
>>>
>>> >
>>> >
>>> > Bruce
>>
>>
>>
>>
> This is just wrong and plain ignorant! Please read the references and
> stats books about what the tails actually mean!
>
> You really need all three tests because these have different meanings
> and you do not know in advance which one you need.

Sorry, but I'm perfectly happy to follow R and SAS in this.

Josef

>
> Bruce
>