[SciPy-Dev] Adding t-test with unequal variances to stats.py

Junkshops junkshops at gmail.com
Wed May 23 03:38:34 EDT 2012


Hi all,

I've issued a pull request (http://github.com/scipy/scipy/pull/227) for 
a version of scipy/stats/stats.py with the following changes:

1) Adds a method for running a t-test with unequal or unknown population 
variances. ttest_ind assumes that population variances are equal.
2) Refactors common code in the four t-test methods into shared methods.
3) This section of code, which has variations in multiple methods, looks 
buggy to me:

d = np.mean(a,axis) - np.mean(b,axis)
svar = ((n1-1)*v1+(n2-1)*v2) / float(df)

t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2))
t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means

Surely if d == 0 then t should be set to 0, not 1, regardless of svar 
(the comment on that line even says 0). Similarly, if svar == 0 then 
both sample variances are zero (assuming that each data set has at 
least 2 points; perhaps there should be a check for this?). In that 
case t should be zero if d == 0 and +/-inf otherwise. Hence the 
(svar==0) test is redundant.

Accordingly, I've changed the lines in all functions to be the equivalent of

t = np.where((d==0), 0.0, t)

This sets t to 0 in the 0/0 case where both d and svar are 0; when d is 
0 and svar is not, the division already gives t = 0. The respective 
tests have also been changed.
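
To make the difference concrete, here is a minimal standalone sketch 
(not the code in the pull request) that recomputes the pooled statistic 
for three small samples and applies both rules. The helper names raw_t, 
old_rule and new_rule are made up for this illustration, and the inputs 
are assumed to be 1-D with at least 2 points each.

```python
import numpy as np

def raw_t(a, b):
    """Pooled-variance t statistic before any special-casing (1-D inputs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    df = n1 + n2 - 2
    v1 = np.var(a, ddof=1)
    v2 = np.var(b, ddof=1)
    d = np.mean(a) - np.mean(b)
    svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
    # Silence the divide-by-zero warnings so the nan/inf results are visible.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = d / np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
    return d, svar, t

def old_rule(d, svar, t):
    # Existing line: maps the 0/0 case to 1.0.
    return np.where((d == 0) * (svar == 0), 1.0, t)

def new_rule(d, t):
    # Proposed line: any zero difference in means gives t = 0.
    return np.where(d == 0, 0.0, t)

cases = [
    ("identical constants", [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]),  # d = 0, svar = 0
    ("different constants", [3.0, 3.0, 3.0], [2.0, 2.0, 2.0]),  # d != 0, svar = 0
    ("equal means",         [1.0, 2.0, 3.0], [0.0, 2.0, 4.0]),  # d = 0, svar > 0
]

for name, a, b in cases:
    d, svar, t = raw_t(a, b)
    print(name, float(t), float(old_rule(d, svar, t)), float(new_rule(d, t)))
```

The two rules differ only in the first case, where the raw value is nan: 
the old rule returns 1.0 and the new one 0.0. The second case stays at 
inf and the third at 0 under both.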

If I'm missing something here, please let me know.

Thanks, Gavin
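
P.S. For anyone unfamiliar with item 1: a test that does not assume 
equal population variances is commonly known as Welch's t-test. The 
sketch below shows that statistic and its Welch-Satterthwaite degrees 
of freedom for 1-D samples. It is an illustration under that 
assumption, not the code in the pull request, and the name welch_t is 
hypothetical.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (1-D inputs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Unbiased sample variances; each sample needs at least 2 points.
    v1 = np.var(a, ddof=1)
    v2 = np.var(b, ddof=1)
    # Squared standard error of the difference in means, without pooling.
    se2 = v1 / n1 + v2 / n2
    t = (np.mean(a) - np.mean(b)) / np.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(t, df)
```

With equal sample sizes and equal sample variances, df reduces to 
n1 + n2 - 2 and t matches the pooled statistic, so the two tests agree 
in that case.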



