[SciPy-Dev] Adding t-test with unequal variances to stats.py
Junkshops
junkshops at gmail.com
Wed May 23 03:38:34 EDT 2012
Hi all,
I've opened a pull request (http://github.com/scipy/scipy/pull/227) that
modifies scipy/stats/stats.py as follows:
1) Adds a function for running a t-test with unequal or unknown population
variances (Welch's t-test). ttest_ind assumes that the population variances
are equal.
2) Refactors code common to the four t-test functions into shared helper
functions.
3) The following code, variants of which appear in several functions, looks
buggy to me:
d = np.mean(a,axis) - np.mean(b,axis)
svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2))
t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means
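Surely if d == 0, t should be set to 0 regardless of svar, not 1. When
d == 0 and svar > 0 the division already gives t = 0; the np.where only
fires in the 0/0 case, and there it sets t to 1.0, contradicting its own
comment.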
Similarly, if svar == 0 then both variances are zero (assuming that each
data set has at least 2 points; perhaps there should be a check for
this?). In that case t should be 0 if d == 0, and +/-inf otherwise. Hence
the (svar==0) factor in the condition is redundant.
Accordingly, I've changed the corresponding line in every function to the
equivalent of
t = np.where((d==0), 0.0, t)
This also covers the case where both d and svar are 0. The corresponding
tests have been updated as well.
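To make the difference in item 3 concrete, here is a small standalone sketch that runs the old and the proposed np.where rules on three hand-picked cases. The input values (d, svar, n1, n2) are illustrative assumptions, not data from the pull request, and the snippet only mirrors the quoted lines rather than calling any scipy function.

```python
import numpy as np

# Illustrative inputs (assumed): each position is one edge case.
#   case 0: d == 0, svar > 0   -> 0 / positive = 0
#   case 1: d == 0, svar == 0  -> 0 / 0 = nan
#   case 2: d != 0, svar == 0  -> positive / 0 = +inf
d = np.array([0.0, 0.0, 1.5])
svar = np.array([2.0, 0.0, 0.0])
n1, n2 = 5, 5

# Same expression as the quoted code; silence the expected 0/0 and x/0 warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    t_raw = d / np.sqrt(svar * (1.0 / n1 + 1.0 / n2))

# Existing rule: only the 0/0 case is overwritten, and it becomes 1.0.
t_old = np.where((d == 0) * (svar == 0), 1.0, t_raw)

# Proposed rule: any zero mean difference gives t = 0.
t_new = np.where(d == 0, 0.0, t_raw)

# t_old is [0.0, 1.0, inf]; t_new is [0.0, 0.0, inf]
```

Only case 1 differs between the two rules, which is the point of the proposed change: with svar > 0 a zero difference already yields t = 0, so the extra (svar==0) factor only matters for 0/0.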
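For reference on item 1, a minimal sketch of Welch's t statistic with the Welch-Satterthwaite degrees of freedom for two 1-D samples. The function name welch_t is hypothetical and this is not the code in the pull request; it is an illustration of the standard formula, and it applies the d == 0 convention discussed in item 3.

```python
import numpy as np

def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df.

    A sketch for 1-D samples only; returns (t, df) as Python floats.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    if n1 < 2 or n2 < 2:
        # ddof=1 variances are undefined for fewer than 2 points.
        raise ValueError("each sample needs at least 2 observations")

    # Unbiased sample variances, each divided by its sample size.
    vn1 = np.var(a, ddof=1) / n1
    vn2 = np.var(b, ddof=1) / n2
    d = a.mean() - b.mean()
    denom = vn1 + vn2

    with np.errstate(divide="ignore", invalid="ignore"):
        # Welch-Satterthwaite approximation (generally non-integer).
        df = denom**2 / (vn1**2 / (n1 - 1) + vn2**2 / (n2 - 1))
        t = d / np.sqrt(denom)

    # Zero mean difference gives t = 0, including the 0/0 case.
    t = np.where(d == 0, 0.0, t)
    return float(t), float(df)

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
```

Unlike the pooled-variance test, the variances are not combined, so df depends on both sample variances. A two-sided p-value would then come from the t distribution with df degrees of freedom (for example 2 * scipy.stats.t.sf(abs(t), df)); current SciPy releases expose this test as scipy.stats.ttest_ind(a, b, equal_var=False), which can serve as a cross-check.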
If I'm missing something here, please let me know.
Thanks, Gavin