[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Sun Jun 10 11:49:42 EDT 2012

On 6/10/2012 4:29 AM, Nathaniel Smith wrote:
> On Sun, Jun 10, 2012 at 11:33 AM, Ralf Gommers
> <ralf.gommers at googlemail.com>  wrote:
>> *snip*
>>
>> The PR is now merged, with 0/0 -->  NaN, and equal_var=True.
>>
>> Two things left to decide:
>> 1) Do we want to transition to equal_var is False?
>> 2) Do we want to unify the current 3 t-test function into one, like R/SAS?
>>
>> My answer to 2) would be yes, which also allows to do 1) without generating
>> a deprecation warning. IMO this would simplify the API quite a bit, making
>> things more understandable also for non-statisticians. Comparing APIs, I
>> find ours quite poor:
>>
>> R: ttest
>> SAS: TTEST
>> Matlab: ttest, ttest2
>> SciPy: ttest_ind, ttest_1samp, ttest_rel
>>
>> The signature of a combined function ttest() would still be simple:
>>
>> def ttest(a, b=None, axis=0, popmean=0, equal_var=False)
> You need at least an argument for paired versus non-paired as well. R
> also has an argument to specify whether you want a two-tailed or
> one-tailed test (alternative="two.sided"/"less"/"greater"), which I
> guess is handy.
>
> I do think the combined signature is a little confusing, since many of
> the arguments only make sense for specific values of the other
> arguments. popmean is only meaningful for 1 sample tests (and paired
> tests, I guess, if we choose to interpret as the expected difference
> in that case?), equal_var and paired are only meaningful for
> two-sample tests, equal_var is only meaningful if paired is False.
> OTOH, I don't know if anyone cares -- obviously the rest of the world
> is getting by just fine with only 1 entry-point, and it's probably
> easier to find in the docs that way.
>
> -N

If we go with

def ttest(a, b=None, axis=0, popmean=0, equal_var=False, paired=False)

Should the function fall back on a fixed precedence of tests when the
arguments are ambiguous about which test is wanted, or should it raise an
exception? Also, I'd suggest the default popmean=None to avoid ambiguity
between the 1-sample test and Welch's test.
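
To make the "raise an exception" option concrete, here is a rough sketch
of a combined function that dispatches to the three existing
scipy.stats functions and refuses ambiguous input. The name ttest and
the error messages are hypothetical, nothing here is a merged API, and
it assumes ttest_ind accepts the equal_var keyword from the PR:

```python
import numpy as np
from scipy import stats


def ttest(a, b=None, axis=0, popmean=None, equal_var=False, paired=False):
    """Hypothetical combined t-test that raises on ambiguous input."""
    if b is None:
        # One sample: popmean is required and paired makes no sense.
        if popmean is None:
            raise ValueError("a one-sample test needs popmean")
        if paired:
            raise ValueError("paired=True needs a second sample b")
        return stats.ttest_1samp(a, popmean, axis=axis)
    # Two samples: popmean is rejected in this sketch rather than
    # reinterpreted as an expected difference.
    if popmean is not None:
        raise ValueError("popmean is only meaningful when b is None")
    if paired:
        # equal_var is silently ignored here; with a plain boolean
        # default we cannot tell whether the caller set it explicitly.
        return stats.ttest_rel(a, b, axis=axis)
    return stats.ttest_ind(a, b, axis=axis, equal_var=equal_var)
```

With this, ttest(a, b) runs Welch's test, ttest(a, popmean=0.5) runs
the 1-sample test, and ttest(a) or ttest(a, b, popmean=0.5) raise
ValueError instead of guessing.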

An alternative:

def ttest(a, b=None, axis=0, popmean=None, test='2sample_uneq_var')

This has fewer arguments, but it breaks from the R-style parameters, and
the 'test' keyword strings might be unwieldy. However, the test the user
wants to perform is unambiguous. Alternatively, we could define constants
rather than using strings, e.g. test = stats.TWO_SAMPLE_UNEQ_VAR (spelled
out because a Python name cannot start with a digit).
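
A rough sketch of how the constants could look, with the argument
checking the 'test' keyword would still need. All names here are
hypothetical; the two-sample constants are spelled out as TWO_SAMPLE_...
because a Python identifier cannot begin with a digit:

```python
# Hypothetical module-level constants; the string values are what a
# user would otherwise type for the `test` keyword.
ONE_SAMPLE = "1sample"
PAIRED = "paired"
TWO_SAMPLE_EQ_VAR = "2sample_eq_var"
TWO_SAMPLE_UNEQ_VAR = "2sample_uneq_var"

_VALID_TESTS = (ONE_SAMPLE, PAIRED, TWO_SAMPLE_EQ_VAR, TWO_SAMPLE_UNEQ_VAR)


def check_ttest_args(b=None, popmean=None, test=TWO_SAMPLE_UNEQ_VAR):
    """Validate the arguments for the requested test and return its name."""
    if test not in _VALID_TESTS:
        raise ValueError("unknown test %r, expected one of %r"
                         % (test, _VALID_TESTS))
    if test == ONE_SAMPLE:
        # The 1-sample test compares a against popmean only.
        if b is not None:
            raise ValueError("the 1-sample test takes no second sample b")
        if popmean is None:
            raise ValueError("the 1-sample test needs popmean")
    else:
        # Paired and both 2-sample variants compare a against b.
        if b is None:
            raise ValueError("test=%r needs a second sample b" % test)
        if popmean is not None:
            raise ValueError("popmean is only used by the 1-sample test")
    return test
```

The constants are plain strings, so test='2sample_uneq_var' and
test=TWO_SAMPLE_UNEQ_VAR would be interchangeable for the caller.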

The popmean and test keyword args could even be combined, but that is 
almost certainly too confusing.

-g