[SciPy-dev] changes to stats t-tests

Sat Dec 20 14:33:19 EST 2008

I finally looked at the t-tests in more detail, and I would like to make
three changes to them:

1) ttest_1samp: add an axis argument. It is the only t-test (or similar
function) without one. Currently the return for multi-dimensional
arrays is wrong: the array is flattened for the calculation of mean
and variance, but the number of observations is taken along axis 0
(the test suite only checks the 1-dim case).
One problem is the default axis. The usual default axis in statistics
(and in scipy.stats) is zero, but it is not used consistently:
ttest_rel(a,b,axis=None), ttest_ind(a, b, axis=0), sem(a, axis=0)
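
To illustrate item 1, here is a minimal sketch of a one-sample t-test that
takes mean, variance and the number of observations along the same axis.
The name ttest_1samp_axis and the axis=0 default are assumptions made for
this example, not the actual scipy.stats code.

```python
import numpy as np
from scipy import stats


def ttest_1samp_axis(a, popmean, axis=0):
    """Hypothetical axis-aware one-sample t-test (illustration only)."""
    a = np.asarray(a, dtype=float)
    if axis is None:
        # Treat the whole array as one sample.
        a = a.ravel()
        axis = 0
    n = a.shape[axis]                    # observations along the chosen axis
    df = n - 1
    mean = a.mean(axis=axis)
    var = a.var(axis=axis, ddof=1)       # unbiased sample variance
    t = (mean - popmean) / np.sqrt(var / n)
    p = 2 * stats.t.sf(np.abs(t), df)    # two-sided p-value
    return t, p


# Two columns, three observations each: one test per column.
x = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, 9.0]])
t_cols, p_cols = ttest_1samp_axis(x, 0.0, axis=0)
```

With this layout each column of a 2-dim array gives the same result as
running the test on that column alone.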

2) return for t statistic: return a number (scalar) instead of a
0-dimensional array; no changes for higher dimensions.
     Same for all three tests.

current return (t-statistic, p-value)
(array(0.81248591389165681), 0.41846234511362179)
proposed return
(0.81248591389165681, 0.41846234511362179)
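
One possible way to get the proposed return is to unwrap 0-dimensional
results and leave everything else alone. The helper name as_scalar_if_0d is
hypothetical and only used for this sketch.

```python
import numpy as np


def as_scalar_if_0d(x):
    """Hypothetical helper: unwrap a 0-d array, leave other arrays as is."""
    x = np.asarray(x)
    # Indexing a 0-d array with an empty tuple returns a NumPy scalar.
    return x[()] if x.ndim == 0 else x


t0 = as_scalar_if_0d(np.array(0.81248591389165681))   # scalar, not an array
t1 = as_scalar_if_0d(np.array([0.5, 1.5]))            # unchanged 1-d array
```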

3) clean up handling of the zero division problem: the current handling
doesn't make sense, it returns a t-statistic arbitrarily set
to one. This only applies to cases with zero variance.
    Proposed change: return either inf (if the numerator is different
from zero) or zero (if the numerator is also zero).
     Same for all three tests.

current:
>>> print st.ttest_rel( [0,0], [1,1],0)    #mean is different
(array(1.0), 0.49999999999999956)
>>> print st.ttest_rel( [0,0], [0,0],0)   #mean is the same
(array(1.0), 0.49999999999999956)

proposed:
>>> print ttest_rel( [0,0], [1,1],0)
(-1.#INF, 0.0)
>>> print ttest_rel( [0,0], [0,0],0)
(0.0, 1.0)
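
The proposed rule can be sketched as follows, assuming the t-statistic is
formed as numerator / denominator with the denominator equal to zero when
the variance is zero. The function name t_from_parts is hypothetical; only
np.errstate, np.where and scipy.stats.t.sf are real library calls.

```python
import numpy as np
from scipy import stats


def t_from_parts(num, denom, df):
    """Hypothetical sketch of the proposed zero-variance handling."""
    num = np.asarray(num, dtype=float)
    denom = np.asarray(denom, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = num / denom                  # x/0 -> +-inf, 0/0 -> nan
    # Numerator and denominator both zero: report t = 0 instead of nan.
    t = np.where((num == 0) & (denom == 0), 0.0, t)
    p = 2 * stats.t.sf(np.abs(t), df)    # two-sided p-value
    return np.asarray(t)[()], np.asarray(p)[()]


# Paired differences of [0,0] and [1,1]: mean -1, zero variance, df = 1.
t_diff, p_diff = t_from_parts(-1.0, 0.0, 1)
# Paired differences of [0,0] and [0,0]: mean 0, zero variance, df = 1.
t_same, p_same = t_from_parts(0.0, 0.0, 1)
```

In this sketch the first case gives an infinite t-statistic with a p-value
of zero, and the second gives a t-statistic of zero with a p-value of one.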

Since we are just before the release, I don't know whether I can make
these changes now or should wait until after the release. Most of it
fixes incorrect returns, and I'm writing the tests for all cases.

Josef