[SciPy-User] Bug t-test for identical means with no variance?

Fri Jul 8 18:51:56 EDT 2011

On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and
> ttest_1samp) in the case of identical means and no variance.
>
> Same means, no variance
>
> import numpy as np
> from scipy import stats
>
> d1 = np.ones(10)
> d2 = np.array([1., 1.])
> stats.ttest_ind(d1, d2)
> (1.0, 0.34089313230206009)
>
> Different means, no variance
>
> d1 = np.array([ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.])
> d2 = np.array([ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.])
> stats.ttest_ind(d1,d2)
> (inf, 0.0)
>
> The first result doesn't make sense. In the code there are notes about
> catching this case that conflict with each other and with what the code
> actually does:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044
>
> I think defining t = 0/0 to be 0 is the least wrong thing to do, and
> certainly better than defining t = 0/0 as 1, which gives an arbitrary
> p-value that depends on the sample sizes. Is there an accepted
> definition for this case? Does returning (nan, 1.0) make more sense?
>
> Skipper
>
> [1] http://projects.scipy.org/scipy/ticket/1475

See the scipy-dev mailing list thread "changes to stats t-tests" from
Dec 20, 2008, for the original change.

If anyone finds a justification for the 0/0 case, ....
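
To make the sample-size dependence concrete, here is a minimal sketch
in plain Python (standard library only). It is not the scipy
implementation: pooled_t and two_sided_p are hypothetical helper names,
and returning nan for 0/0 is just one of the options raised above, not
an accepted convention.

```python
import math


def pooled_t(a, b):
    """Pooled two-sample t statistic and degrees of freedom.

    Assumption for illustration: 0/0 is reported as nan and nonzero/0
    as +/-inf. This is one possible convention, not scipy's behavior.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    df = n1 + n2 - 2
    denom = math.sqrt(ss / df * (1 / n1 + 1 / n2))
    diff = m1 - m2
    if denom == 0.0:
        t = math.nan if diff == 0.0 else math.copysign(math.inf, diff)
    else:
        t = diff / denom
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for t, using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)


# Same means, no variance: numerator and denominator are both zero.
print(pooled_t([1.0] * 10, [1.0, 1.0]))

# If t were forced to 1, the p-value would vary with the sample sizes;
# if t were forced to 0, it would always be 1.
for n1, n2 in [(10, 2), (2, 2), (50, 50)]:
    df = n1 + n2 - 2
    print(n1, n2, two_sided_p(1.0, df), two_sided_p(0.0, df))
```

With n1 = 10 and n2 = 2 (df = 10), forcing t = 1 reproduces the
0.3409 p-value from the first example. Other sample sizes give other
p-values for exactly the same kind of degenerate data, which is the
arbitrariness described above.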

Josef
