[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Sun Jun 10 07:29:44 EDT 2012

On Sun, Jun 10, 2012 at 11:33 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Sat, Jun 9, 2012 at 1:04 PM, Ralf Gommers <ralf.gommers at googlemail.com>
> wrote:
>>
>>
>>
>> On Thu, Jun 7, 2012 at 10:13 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>> On Thu, Jun 7, 2012 at 5:29 AM, Junkshops <junkshops at gmail.com> wrote:
>>> > - I'll merge the two 2 sample t-test functions
>>> > - add an uneq_var=False kw arg; setting it to True will use the new code
>>>
>>> equal_var would be a better name, to avoid the double-negative.
>>>
>>> Would it be possible/desirable to make equal_var=False the default?
>>> Obviously this would require a deprecation period, but as semantic
>>> changes go it's relatively low risk -- anyone who misses the warnings
>>> etc. would just find one day that their t tests were producing more
>>> conservative/realistic values.
>>
>>
>> I'm not in favor of adding a deprecation warning for this. It's a minor
>> thing, and warnings are annoying - it does require the user to go and figure
>> out what changed. My preference would be to merge the current PR as is, and
>> add a new function that combines all four t-tests with an interface similar
>> to R. There the new default can be equal_var=False without annoying anyone.
>>
>>>
>>>
>>> (R defaults to doing the unequal variances test, and I have actually
>>> seen this fact used in their advocacy, as evidence for their branding
>>> as the tool for people who care about statistical rigor and
>>> soundness.)
>>>
>>> > - add a zoz=np.nan kw arg and a check that it's np.nan, 0 or 1.
>>> > Otherwise raise ValueError
>>>
>>> Let's please not add this "zoz=" feature. Adding features has a real
>>> cost (in terms of testing, writing docs, maintenance, and most
>>> importantly, the total time spent by all users reading about this
>>> pointless thing in the docs and being distracted by it). Its only
>>> benefit would be to smooth over this debate on the mailing list; I
>>> can't believe that any real user will actually care about this, ever.
>>
>>
>> Agreed.
>>
>> And +1 for 0/0 --> NaN.
>
>
> The PR is now merged, with 0/0 --> NaN, and equal_var=True.
>
> Two things left to decide:
> 1) Do we want to transition to equal_var=False as the default?
> 2) Do we want to unify the current 3 t-test functions into one, like R/SAS?
>
> My answer to 2) would be yes, which also allows us to do 1) without generating
> a deprecation warning. IMO this would simplify the API quite a bit, making
> things more understandable also for non-statisticians. Comparing APIs, I
> find ours quite poor:
>
> R: t.test
> SAS: TTEST
> Matlab: ttest, ttest2
> SciPy: ttest_ind, ttest_1samp, ttest_rel
>
> The signature of a combined function ttest() would still be simple:
>
> def ttest(a, b=None, axis=0, popmean=0, equal_var=False)

You need at least an argument for paired versus non-paired as well. R
also has an argument to specify whether you want a two-tailed or
one-tailed test (alternative="two.sided"/"less"/"greater"), which I
guess is handy.

I do think the combined signature is a little confusing, since many of
the arguments only make sense for specific values of the other
arguments. popmean is only meaningful for 1 sample tests (and paired
tests, I guess, if we choose to interpret it as the expected difference
in that case?), equal_var and paired are only meaningful for
two-sample tests, equal_var is only meaningful if paired is False.
OTOH, I don't know if anyone cares -- obviously the rest of the world
is getting by just fine with only 1 entry-point, and it's probably
easier to find in the docs that way.
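
To make the argument interactions concrete, here is a rough stdlib-only
sketch of what a single entry point might look like. Everything in it is
hypothetical: the name ttest_sketch, the paired= keyword, and the choice to
raise ValueError on meaningless combinations are assumptions for
illustration, not the merged SciPy code. It returns only the statistic and
degrees of freedom (no p-value, no axis handling), and maps 0/0 to NaN as
agreed above.

```python
import math
from statistics import mean, variance


def _ratio(num, se):
    """Divide num by se, mapping 0/0 to NaN and x/0 to a signed infinity."""
    if se == 0:
        return math.nan if num == 0 else math.copysign(math.inf, num)
    return num / se


def _one_sample(x, popmean):
    """One-sample t statistic and degrees of freedom."""
    n = len(x)
    se = math.sqrt(variance(x) / n)
    return _ratio(mean(x) - popmean, se), n - 1


def ttest_sketch(a, b=None, popmean=0.0, paired=False, equal_var=False):
    """Hypothetical combined t-test; returns (t, df) for 1-D sequences."""
    if b is None:
        # One-sample test: paired is meaningless, equal_var is ignored.
        if paired:
            raise ValueError("paired=True requires a second sample b")
        return _one_sample(a, popmean)

    if paired:
        # Paired test: popmean is read as the expected mean difference;
        # equal_var is ignored because only the differences are used.
        if len(a) != len(b):
            raise ValueError("paired samples must have equal length")
        return _one_sample([x - y for x, y in zip(a, b)], popmean)

    # Independent two-sample test: popmean has no meaning here.
    if popmean != 0:
        raise ValueError("popmean only applies to one-sample or paired tests")

    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    diff = mean(a) - mean(b)

    if equal_var:
        # Pooled-variance (Student) test.
        df = na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / df
        se = math.sqrt(pooled * (1 / na + 1 / nb))
    else:
        # Unequal-variance (Welch) test with Welch-Satterthwaite df.
        qa, qb = va / na, vb / nb
        se = math.sqrt(qa + qb)
        if se > 0:
            df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
        else:
            df = math.nan
    return _ratio(diff, se), df
```

Called as ttest_sketch(a), ttest_sketch(a, b), ttest_sketch(a, b,
equal_var=True) or ttest_sketch(a, b, paired=True), this covers the cases
the existing three functions handle. Note that with equal_var defaulting
to False, the function cannot tell whether a caller passed it explicitly in
the one-sample or paired case, so it is silently ignored there.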

-N