[SciPy-Dev] Adding t-test with unequal variances to stats.py
Junkshops
junkshops at gmail.com
Sun Jun 3 02:13:33 EDT 2012
After some of the recent list mails advising checking results against R,
I checked this code against the R t.test() calls. I hadn't bothered
before since I had checked the t statistic and df values, and they were
right, and as I was just calling distribution.t.sf() like every other
t-test function assumed I should be OK. However, it turns out that the
statistics book I used as a source had the df function wrong, believe it
or not.
Hence I closed the old PR and opened a new one with a fix and more tests
here: https://github.com/scipy/scipy/pull/235
Apparently the thing to do is to beg for your PRs to go into the
upcoming version. I'm certainly willing to do so, but I was wondering if
the scipy devs respond better to other inveiglements - say, threats of
world destruction, under the table 'gifts', favors of a lascivious
nature, or cake. All proposals considered.
Cheers, Gavin
On 5/23/2012 12:38 AM, Junkshops wrote:
> Hi all,
>
> I've issued a pull request (http://github.com/scipy/scipy/pull/227)
> for a version of scipy/stats/stats.py with the following changes:
>
> 1) Adds a method for running a t-test with unequal or unknown
> population variances. ttest_ind assumes that population variances are
> equal.
> 2) Refactored common code in the 4 t-test methods into shared methods.
> 3) This section of code, which has variations in multiple methods,
> looks buggy to me:
>
> d = np.mean(a,axis) - np.mean(b,axis)
> svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
>
> t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2))
> t = np.where((d==0)*(svar==0), 1.0, t) #define t=0/0 = 0, identical means
>
> Surely if d=0, regardless of svar, t should be set to 0, not 1.
> Similarly, if svar = 0 then both variances are zero (assuming that
> each data set has at least 2 points - perhaps there should be a check
> for this?). In that case, if d==0 t should be zero. Otherwise, t
> should be +/-inf. Hence, (svar==0) is redundant.
>
> Accordingly, I've changed the lines in all functions to be the
> equivalent of
>
> t = np.where((d==0), 0.0, t)
>
> This handles the case where both d and svar are 0. The respective
> tests have also been changed.
>
> If I'm missing something here, please let me know.
>
> Thanks, Gavin
>
More information about the SciPy-Dev
mailing list