[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Sun Jun 10 06:33:27 EDT 2012

On Sat, Jun 9, 2012 at 1:04 PM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:

>
>
> On Thu, Jun 7, 2012 at 10:13 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Thu, Jun 7, 2012 at 5:29 AM, Junkshops <junkshops at gmail.com> wrote:
>> > - I'll merge the two 2 sample t-test functions
>> > - add an uneq_var=False kw arg, setting to true will use the new code
>>
>> equal_var would be a better name, to avoid the double-negative.
>>
>> Would it be possible/desirable to make equal_var=False the default?
>> Obviously this would require a deprecation period, but as semantic
>> changes go it's relatively low risk -- anyone who misses the warnings
>> etc. would just find one day that their t tests were producing more
>> conservative/realistic values.
>>
>
> I'm not in favor of adding a deprecation warning for this. It's a minor
> thing, and warnings are annoying - it does require the user to go and
> figure out what changed. My preference would be to merge the current PR as
> is, and add a new function that combines all four t-tests with an interface
> similar to R. There the new default can be equal_var=False without annoying
> anyone.
>
>
>>
>> (R defaults to doing the unequal variances test, and I have actually
>> seen this fact used in their advocacy, as evidence for their branding
>> as the tool for people who care about statistical rigor and
>> soundness.)
>>
>> > - add a zoz=np.nan kw arg and a check that it's np.nan, 0 or 1.
>> > Otherwise raise ValueError
>>
>> Let's please not add this "zoz=" feature. Adding features has a real
>> cost (in terms of testing, writing docs, maintenance, and most
>> importantly, the total time spent by all users reading about this
>> pointless thing in the docs and being distracted by it). Its only
>> benefit would be to smooth over this debate on the mailing list; I
>> can't believe that any real user will actually care about this, ever.
>>
>
> Agreed.
>
> And +1 for 0/0 --> NaN.
>

The PR is now merged, with 0/0 --> NaN, and equal_var=True.
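
For anyone who wants to see what those two choices mean in practice, here is
a rough pure-Python sketch (not the merged SciPy code). The function name
two_sample_t and its (t, df) return value are made up for illustration, and
returning a signed infinity for nonzero/0 is my assumption of what plain
floating-point division would give.

```python
import math
from statistics import mean, variance


def two_sample_t(a, b, equal_var=True):
    """Return (t, df) for two independent 1-D samples.

    Illustrative only: the name and the (t, df) return value are
    assumptions, not the scipy.stats API.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 divisor)
    if equal_var:
        # Student's t-test: pool the variances, df = na + nb - 2.
        df = na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / df
        se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite df.
        sa, sb = va / na, vb / nb
        se = math.sqrt(sa + sb)
        denom = sa ** 2 / (na - 1) + sb ** 2 / (nb - 1)
        df = (sa + sb) ** 2 / denom if denom > 0 else math.nan
    diff = mean(a) - mean(b)
    if se == 0:
        # 0/0 is undefined -> NaN; nonzero/0 -> signed infinity (assumed).
        t = math.nan if diff == 0 else math.copysign(math.inf, diff)
    else:
        t = diff / se
    return t, df
```

With two identical constant samples both the numerator and the denominator
are zero, so t comes back as NaN. With unequal variances the Welch branch
gives fewer degrees of freedom than the pooled one, which is where the more
conservative results come from.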

Two things left to decide:
1) Do we want to transition to equal_var=False as the default?
2) Do we want to unify the current 3 t-test functions into one, like R/SAS?

My answer to 2) would be yes, which would also let us do 1) without generating
a deprecation warning. IMO this would simplify the API quite a bit, making
things more understandable also for non-statisticians. Comparing APIs, I
find ours quite poor:

R: t.test
SAS: TTEST
Matlab: ttest, ttest2
SciPy: ttest_ind, ttest_1samp, ttest_rel

The signature of a combined function ttest() would still be simple:

def ttest(a, b=None, axis=0, popmean=0, equal_var=False)
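
A rough sketch of how the dispatch inside such a function could look, in
pure Python so it stands on its own. This is only an illustration under a
few assumptions: it handles 1-D sequences (so axis is left out), it returns
just the t statistic without a p-value, and the paired keyword is
hypothetical, borrowed from R's t.test(), since the signature above has no
way yet to select the ttest_rel case.

```python
import math
from statistics import mean, variance


def _safe_div(num, den):
    # 0/0 -> NaN as agreed in this thread; nonzero/0 -> signed infinity
    # (the infinity case is an assumption of this sketch).
    if den == 0:
        return math.nan if num == 0 else math.copysign(math.inf, num)
    return num / den


def ttest(a, b=None, popmean=0, equal_var=False, paired=False):
    """Hypothetical unified t-test; returns only the t statistic.

    `paired` is not part of the proposed signature; it is an assumption
    modelled on R's t.test().
    """
    if b is None:
        # One-sample test against popmean (cf. ttest_1samp).
        return _safe_div(mean(a) - popmean, math.sqrt(variance(a) / len(a)))
    if paired:
        # Paired test (cf. ttest_rel): one-sample test on the differences.
        if len(a) != len(b):
            raise ValueError("paired samples must have equal length")
        d = [x - y for x, y in zip(a, b)]
        return _safe_div(mean(d), math.sqrt(variance(d) / len(d)))
    # Independent samples (cf. ttest_ind).
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    if equal_var:
        # Student: pooled variance.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
    else:
        # Welch: separate variances (the proposed new default).
        se = math.sqrt(va / na + vb / nb)
    return _safe_div(mean(a) - mean(b), se)
```

So ttest(x, popmean=3) would cover the one-sample case, ttest(x, y) the
Welch test, and ttest(x, y, equal_var=True) the current ttest_ind behaviour.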

Ralf