[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Sun Jun 10 23:09:49 EDT 2012

On Sun, Jun 10, 2012 at 11:49 AM, Junkshops <junkshops at gmail.com> wrote:
>
>
> On 6/10/2012 4:29 AM, Nathaniel Smith wrote:
>> On Sun, Jun 10, 2012 at 11:33 AM, Ralf Gommers
>> <ralf.gommers at googlemail.com>  wrote:
>>> *snip*
>>>
>>> The PR is now merged, with 0/0 -->  NaN, and equal_var=True.
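A minimal pure-Python sketch of what the "0/0 --> NaN" rule means for the pooled-variance (equal_var=True) two-sample statistic. It is an illustration, not the code merged in the PR. The names `t_ratio` and `student_t` are hypothetical. Sending nonzero/0 to a signed infinity is an assumption that mirrors IEEE float division; the thread only settles the 0/0 case.

```python
import math
import statistics


def t_ratio(diff, se):
    """Divide like IEEE floats: 0/0 -> nan, x/0 -> signed inf (se >= 0 assumed)."""
    if se == 0:
        return math.nan if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def student_t(a, b):
    """Pooled-variance (equal_var=True) two-sample t statistic."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_ratio(statistics.fmean(a) - statistics.fmean(b), se)


# Two constant samples with equal means: numerator and denominator are both 0.
print(student_t([1.0, 1.0, 1.0], [1.0, 1.0]))  # nan
```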
>>>
>>> Two things left to decide:
>>> 1) Do we want to transition to equal_var is False?
>>> 2) Do we want to unify the current 3 t-test functions into one, like R/SAS?
>>>
>>> My answer to 2) would be yes, which would also allow us to do 1) without generating
>>> a deprecation warning. IMO this would simplify the API quite a bit, making
>>> things more understandable also for non-statisticians. Comparing APIs, I
>>> find ours quite poor:
>>>
>>> R: ttest
>>> SAS: TTEST
>>> Matlab: ttest, ttest2
>>> SciPy: ttest_ind, ttest_1samp, ttest_rel
>>>
>>> The signature of a combined function ttest() would still be simple:
>>>
>>> def ttest(a, b=None, axis=0, popmean=0, equal_var=False)
>> You need at least an argument for paired versus non-paired as well. R
>> also has an argument to specify whether you want a two-tailed or
>> one-tailed test (alternative="two.sided"/"less"/"greater"), which I
>> guess is handy.
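An R-style `alternative` argument only changes how the statistic is turned into a p-value. A small sketch, assuming the caller supplies the null distribution's CDF (for a t-test, Student's t with the test's degrees of freedom). `p_value` is a hypothetical helper. The standard normal CDF in the usage lines is only a stand-in, since the standard library has no t distribution. The two-sided formula relies on the null distribution being symmetric, which holds for Student's t.

```python
from statistics import NormalDist


def p_value(t, cdf, alternative="two.sided"):
    """Map a test statistic to a p-value for an R-style `alternative`."""
    if alternative == "two.sided":
        # symmetric null distribution: double the smaller tail
        return 2 * min(cdf(t), 1 - cdf(t))
    if alternative == "less":
        return cdf(t)
    if alternative == "greater":
        return 1 - cdf(t)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Stand-in only: a standard normal CDF in place of Student's t.
z = NormalDist().cdf
print(p_value(1.96, z))             # ~0.05
print(p_value(1.96, z, "greater"))  # ~0.025
```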
>>
>> I do think the combined signature is a little confusing, since many of
>> the arguments only make sense for specific values of the other
>> arguments. popmean is only meaningful for 1 sample tests (and paired
>> tests, I guess, if we choose to interpret as the expected difference
>> in that case?), equal_var and paired are only meaningful for
>> two-sample tests, equal_var is only meaningful if paired is False.
>> OTOH, I don't know if anyone cares -- obviously the rest of the world
>> is getting by just fine with only 1 entry-point, and it's probably
>> easier to find in the docs that way.
>>
>> -N
>
> If we go with
>
> def ttest(a, b=None, axis=0, popmean=0, equal_var=False, paired=False)

def ttest(a, b=None, value=0, paired=False, equal_var=False, axis=0)

I wanted to suggest merging b and popmean, but I think the better
solution is to allow for a more general version of the ttest, where
the comparison is not necessarily 0.

The null hypotheses are:
mean(a) = value            #ttest_1samp
mean(a) - mean(b) = value  #ttest_ind
mean(a - b) = value        #ttest_rel

The only option that would then be ignored in two cases is equal_var,
and paired would be irrelevant if b is None.
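A pure-Python sketch of this generalized signature, covering the three null hypotheses above. `ttest` here is hypothetical, not a SciPy function. `axis` is left out (1-D sequences only), and only the statistic and degrees of freedom are computed, because the standard library has no Student t CDF for p-values. The unequal-variance branch uses Welch's statistic with Welch-Satterthwaite degrees of freedom, and 0/0 gives NaN as in the merged PR.

```python
import math
import statistics


def _ratio(num, den):
    # IEEE-style division for a nonnegative denominator: 0/0 -> nan, x/0 -> +-inf
    if den == 0:
        return math.nan if num == 0 else math.copysign(math.inf, num)
    return num / den


def ttest(a, b=None, value=0.0, paired=False, equal_var=False):
    """Return (t, df) for the null hypothesis selected by b and paired."""
    a = [float(x) for x in a]
    if b is None:
        # H0: mean(a) = value  (ttest_1samp); paired/equal_var ignored
        n = len(a)
        se = statistics.stdev(a) / math.sqrt(n)
        return _ratio(statistics.fmean(a) - value, se), n - 1
    b = [float(x) for x in b]
    if paired:
        # H0: mean(a - b) = value  (ttest_rel); equal_var ignored
        if len(a) != len(b):
            raise ValueError("paired samples must have equal length")
        d = [x - y for x, y in zip(a, b)]
        n = len(d)
        se = statistics.stdev(d) / math.sqrt(n)
        return _ratio(statistics.fmean(d) - value, se), n - 1
    # H0: mean(a) - mean(b) = value  (ttest_ind)
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    diff = statistics.fmean(a) - statistics.fmean(b) - value
    if equal_var:
        # Student: pooled variance, n1 + n2 - 2 degrees of freedom
        df = n1 + n2 - 2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom
        s1, s2 = v1 / n1, v2 / n2
        se = math.sqrt(s1 + s2)
        df = _ratio((s1 + s2) ** 2, s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return _ratio(diff, se), df
```

With `value` in place of `popmean`, the one-sample case keeps its meaning and the two-sample cases gain a nonzero hypothesized difference, e.g. `ttest(a, b, value=-3)`.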

(I didn't see an equal_var=False option for the paired t-test in the SAS manual. Is there one?)

>
> Should the function have a hierarchy of tests that are performed when
> the function input is ambiguous regarding the desired test or raise an
> exception? Also, I'd suggest the default popmean=None to avoid ambiguity
> between the 1 sample and Welch's tests.
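If the combined function raises instead of guessing, a `popmean=None` default makes "not given" distinguishable from "given as 0". A minimal sketch of that policy; `select_test` and its error messages are hypothetical.

```python
def select_test(b=None, popmean=None, paired=False):
    """Pick the test from the inputs; raise on ambiguous or meaningless calls."""
    if b is None and popmean is None:
        # neither a second sample nor a population mean: refuse to guess
        raise ValueError("give b (two-sample test) or popmean (one-sample test)")
    if b is None:
        if paired:
            raise ValueError("paired=True requires b")
        return "one-sample"
    if popmean is not None and not paired:
        # an explicit popmean has no meaning for independent samples
        raise ValueError("popmean is not used for independent samples")
    return "paired" if paired else "independent"
```

Compared with silently ignoring arguments, this rejects calls like `select_test()` outright, at the cost of making the one-sample case require an explicit `popmean`.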
>
> An alternative:
>
> def ttest(a, b=None, axis=0, popmean=None, test='2sample_uneq_var')
>
> This doesn't have quite as many arguments, but breaks from the R-style
> parameters and the 'test' kw input strings might be unwieldy. However,
> the test the user wants to perform is unambiguous. Alternatively we
> could define constants rather than using strings, e.g. test =
> stats.2SAMPLE_UNEQ_VAR.
>
> The popmean and test keyword args could even be combined, but that is
> almost certainly too confusing.
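For the `test=` keyword, Python's `enum` module is one way to accept both constants and strings. A name like `2SAMPLE_UNEQ_VAR` cannot be used as written, since Python identifiers may not start with a digit. The member names and string values below are hypothetical.

```python
import enum


class TTest(enum.Enum):
    # hypothetical constants for a test= keyword; values double as string aliases
    ONE_SAMPLE = "1sample"
    PAIRED = "paired"
    TWO_SAMPLE_EQ_VAR = "2sample_eq_var"
    TWO_SAMPLE_UNEQ_VAR = "2sample_uneq_var"


def parse_test(test):
    """Accept an enum member or its string value; anything else raises ValueError."""
    return TTest(test)
```

Either `parse_test("2sample_uneq_var")` or `parse_test(TTest.TWO_SAMPLE_UNEQ_VAR)` yields the same member, so strings and constants can coexist.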

There is no gain in usability in this case.
I'd rather choose among three functions (that are next to each other
with tab completion), than specifying the same three functions as a
keyword argument.

I find the current split into 3 functions for 3 different sampling
cases quite natural.

Josef

>
> -g
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev