[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Junkshops junkshops at gmail.com
Wed Jun 6 17:18:01 EDT 2012


Hi Nathaniel,

At the outset, I'll just say that if the consensus is that we should 
return NaN, I'll accept that. I'll still try to argue my case, though.

> My R seems to throw an exception whenever the variance is zero
> (regardless of the mean difference), not return NaN:
Sorry, yes, that's correct.

> Like any parametric test, the t-test only makes sense under some kind
> of (at least approximate) assumptions about the data generating
> process. When the sample variance is 0, then those assumptions are
> clearly violated,
So this seems similar to argument J2, and I still don't understand it. 
Let's say we assume our population data is normally distributed and we 
take three samples from the population and get [1,1,1]. How does that 
prove our assumption is incorrect? It's certainly possible to pull the 
same number three times from a normal distribution.
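
To put a rough number on that, here is a small sketch. It rests on an 
assumption of mine that is not part of the PR: the measurements are 
recorded to the nearest integer, so a whole interval of underlying 
values maps to the same recorded number. The function name and the 
parameters (mean 1.0, standard deviation 0.2) are made up purely for 
illustration.

```python
from statistics import NormalDist


def prob_all_round_to(mu, sigma, value, n):
    """Probability that n independent draws from N(mu, sigma) all round
    to the integer `value` (assumes rounding to the nearest integer)."""
    dist = NormalDist(mu, sigma)
    # A single draw rounds to `value` when it lands in [value - 0.5, value + 0.5).
    p_single = dist.cdf(value + 0.5) - dist.cdf(value - 0.5)
    return p_single ** n


# Hypothetical population: mean 1.0, standard deviation 0.2, three draws.
p_all_ones = prob_all_round_to(mu=1.0, sigma=0.2, value=1, n=3)
print(p_all_ones)
```

With a spread that is small relative to the recording resolution, a 
sample like [1,1,1] is the expected outcome rather than a surprise.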

> and it doesn't seem appropriate to me to start
> making up numbers according to some other rule that we hope might give
> some sort-of appropriate result ("In the face of ambiguity, refuse the
> temptation to guess."). So I actually like the R/Matlab option of
> throwing an exception or returning NaN.

Well, we're not making up numbers here - in the 0/0 case we know for 
certain that the two sample means are identical. Hence t = 0 and p = 1.
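
For anyone following along, this is roughly what the 0/0 case looks 
like. It is a hand-rolled sketch of a pooled two-sample t-statistic 
using only the standard library, not the code in the PR or in 
scipy.stats. The names t_parts, t_stat and the zero_over_zero argument 
are hypothetical, and returning a signed infinity when only the 
denominator is zero is just one possible convention.

```python
import math
from statistics import mean, variance


def t_parts(a, b):
    """Return (numerator, denominator) of the pooled two-sample t-statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    numerator = mean(a) - mean(b)
    denominator = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return numerator, denominator


def t_stat(a, b, zero_over_zero=math.nan):
    """t-statistic with an explicit, caller-chosen value for the 0/0 case."""
    numerator, denominator = t_parts(a, b)
    if denominator == 0:
        if numerator == 0:
            # The case under discussion: NaN by default, 0.0 if the caller asks.
            return zero_over_zero
        # Assumption of this sketch: nonzero difference, zero variance -> +/- inf.
        return math.copysign(math.inf, numerator)
    return numerator / denominator


a = [1.0, 1.0, 1.0]
b = [1.0, 1.0, 1.0]
print(t_parts(a, b))                     # both parts are zero
print(t_stat(a, b))                      # NaN convention
print(t_stat(a, b, zero_over_zero=0.0))  # t = 0 convention
```

The two proposals in this thread differ only in what gets passed as 
zero_over_zero; everything else in the computation is identical.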

-g


