[SciPy-User] Bug t-test for identical means with no variance?
josef.pktd at gmail.com
Fri Jul 8 18:51:56 EDT 2011
On Fri, Jul 8, 2011 at 6:41 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> A ticket was filed [1] for ttest_ind (same issue with ttest_rel and
> ttest_1samp) in the case of identical means and no variance.
>
> Same means, no variance
>
> d1 = np.ones(10)
> d2 = np.array([1,1.])
> stats.ttest_ind(d1,d2)
> (1.0, 0.34089313230206009)
>
> Different means, no variance
>
> d1 = np.array([ 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.])
> d2 = np.array([ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])
> stats.ttest_ind(d1,d2)
> (inf, 0.0)
>
> The first result doesn't make sense. The code contains notes about
> catching this case that conflict with each other and with what the
> code actually does:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2873
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L2963
> https://github.com/scipy/scipy/blob/master/scipy/stats/stats.py#L3044
>
> I think defining t = 0/0 to be 0 is the least wrong thing to do, but
> certainly not t = 0/0 as 1, which gives an arbitrary p-value depending
> on sample sizes. Is there an accepted definition for this case? Does
> returning (nan, 1.0) make more sense?
>
> Skipper
>
> [1] http://projects.scipy.org/scipy/ticket/1475
See the scipy-dev mailing list thread "changes to stats t-tests"
(Dec 20, 2008) for the original change.
If anyone finds a justification for the 0/0 case, ....
Josef
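
To make the 0/0 case concrete, here is a minimal sketch that computes the
pooled-variance t statistic by hand for the two examples quoted above. It
assumes the textbook equal-variance formula; pooled_t is a hypothetical
helper written for illustration, not the code in scipy/stats/stats.py. The
last lines show why forcing t = 1 is arbitrary: the two-sided p-value then
depends only on the degrees of freedom, i.e. on the sample sizes.

```python
import numpy as np
from scipy import stats


def pooled_t(a, b):
    """Equal-variance two-sample t statistic, computed by hand.

    Hypothetical helper for illustration only; it does not special-case
    a zero denominator, so 0/0 comes back as nan and x/0 as +/-inf.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    df = n1 + n2 - 2
    # Pooled sample variance of the two groups.
    svar = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    denom = np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
    # Silence the divide-by-zero warnings; we want to inspect the raw result.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.divide(a.mean() - b.mean(), denom)
    return t, df


# Same means, no variance: numerator and denominator are both 0.
t_same, df_same = pooled_t(np.ones(10), np.array([1, 1.]))

# Different means, no variance: nonzero numerator over 0.
t_diff, df_diff = pooled_t(np.full(10, 5.0), np.full(10, 2.0))

# If t is forced to 1, the two-sided p-value is a function of df alone.
# With df = 10 this reproduces the 0.3409 reported above.
p_forced = {df: 2 * stats.t.sf(1.0, df) for df in (2, 10, 100)}

print(t_same, df_same)
print(t_diff, df_diff)
print(p_forced)
```

Without any special-casing, IEEE arithmetic gives nan for the first case and
inf for the second, so returning nan for the statistic would at least be
consistent with what the formula itself produces.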