[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

josef.pktd at gmail.com
Wed Jun 6 21:10:00 EDT 2012


On Wed, Jun 6, 2012 at 8:50 PM, Junkshops <junkshops at gmail.com> wrote:
> OK, I give! NaN it is.
>
> That being said:
>
> Skipper said:
>> This doesn't seem to be of all that much practical importance. In what
>> situation do you expect this to really matter?
> Eh, you're probably right. I tend to enjoy arguing back and forth (as
> long as it doesn't get heated) and sometimes pick pointless battles.
> Plus sometimes you learn a lot, and I'm not much of a statistician, so
> there's lots of opportunities for such.
>
> If you're pulling data from a discrete distribution it could happen
> though (unless I'm mistaken).

It can happen in the discrete distribution case, but there the event
has positive probability, and the calculations can follow the
standard theory (no 0/0).
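
To make the discrete case concrete, here is a minimal sketch that computes
the probability that all n IID draws from a discrete distribution are
identical, which is the sum over outcomes of p_k**n. The helper name
prob_all_equal and the fair-coin example are illustrative assumptions, not
anything from the PR.

```python
from fractions import Fraction

def prob_all_equal(pmf, n):
    """P(all n IID draws are identical) for a discrete pmf.

    pmf maps each outcome to its probability; the result is sum_k p_k**n.
    Hypothetical helper for illustration only.
    """
    return sum(Fraction(p) ** n for p in pmf.values())

# Fair coin, 5 draws: all heads or all tails.
fair_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p = prob_all_equal(fair_coin, 5)
print(p)  # 1/16
```

So a constant sample, and with it a zero sample variance, is an event with
positive probability here, unlike in the continuous case.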


>
> Nathan said:
>> Well, no, it isn't possible really -- taking n IID samples from a
>> normal distribution and getting exactly the same number twice is an
>> event that has probability zero.
> Would you mind humoring me and explaining why this is true? It seems
> counterintuitive that getting the same sample twice from independent
> random draws is impossible.

You have a continuum of numbers (an uncountably infinite number of
possibilities). Each point has zero probability of being selected.
(But there is a density that gives the probability of selecting a
point in a neighborhood dx.)
You can select a first point, but then the second point has to be the
same to an infinite number of decimals. The probability that all
decimals are the same is zero.
With floating point doubles, someone might be able to calculate what
the tiny discrete probability is.
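
A rough sketch of that floating point calculation, under a strong
assumption: the draw is an exact real number from N(0, 1) that is then
rounded to the nearest double, so the probability of landing on a given
double x is roughly density(x) times the spacing of doubles at x. Real
random number generators do not produce doubles this way, so this is an
order-of-magnitude illustration only. The function names are made up for
the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def approx_prob_hit(x):
    """Rough P(an idealized N(0, 1) draw rounds to the double x).

    Assumption: mass at x is about density(x) * ulp(x), where ulp(x) is
    the gap between x and the next larger double (math.ulp needs
    Python 3.9 or later).
    """
    return normal_pdf(x) * math.ulp(x)

# Chance that a second draw repeats a first draw that happened to be 1.0.
p = approx_prob_hit(1.0)
print(p)  # tiny but positive, roughly 5e-17
```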

>
> OK, so what next? Shall I make the changes and push again? Or should we
> wait a bit and see if anyone else weighs in?
>
> If a push is warranted the other issue is the style of the 4 t-tests (1
> sample, paired, 2 sample equal variances, 2 sample unequal variances):
>
> A. 4 separate functions (as in the PR)
> B. 1 combined function, select test via keyword arg, keep old function
> stubs for backward compatibility
> C. Functions for 1 sample, paired, 2 sample with keyword selection of
> equal vs unequal variances.
>
> I don't have strong feelings either way, but I think C is a little weird
> - should be all or none IMO. We could also go with A for now and change
> to B after the release; I think it's more important that the
> functionality gets in than consolidation of functions.

I thought C was the most obvious solution: paired versus unpaired are
two different sampling schemes.
Assuming common versus heterogeneous variances is just a detail
compared to that.

Users might want to test the assumption of equal variances
(homoscedasticity), but they will usually know whether the sampling
scheme is paired (repeated) or independent (unpaired).
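
A pure-Python sketch of what layout C could look like: one function per
sampling scheme, with a keyword choosing pooled versus unequal variances in
the two-sample case. This is not the PR's code; the names t_1samp, t_paired,
t_ind, _ratio and the keyword equal_var are assumptions made for
illustration. Only the statistic is computed (no p-value), and returning
+/-inf for a nonzero numerator over a zero denominator is a choice of this
sketch; only the 0/0 = NaN outcome comes from the discussion above.

```python
import math
from statistics import mean, variance

def _ratio(num, den):
    """num / den with IEEE-style results for a zero denominator.

    0/0 gives NaN, the outcome agreed on in this thread. Returning
    +/-inf for nonzero/0 is an assumption of this sketch.
    """
    if den == 0:
        return math.nan if num == 0 else math.copysign(math.inf, num)
    return num / den

def t_1samp(x, popmean):
    """One-sample t statistic."""
    return _ratio(mean(x) - popmean, math.sqrt(variance(x) / len(x)))

def t_paired(a, b):
    """Paired t statistic: a one-sample test on the differences."""
    return t_1samp([x - y for x, y in zip(a, b)], 0.0)

def t_ind(a, b, equal_var=True):
    """Two-sample t statistic; equal_var picks pooled vs. unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    else:
        se = math.sqrt(va / na + vb / nb)
    return _ratio(mean(a) - mean(b), se)

a = [1.0, 2.0, 4.0, 7.0]
b = [2.0, 3.0, 3.0, 9.0]
print(t_paired(a, b))
print(t_ind(a, b, equal_var=True))
print(t_ind(a, b, equal_var=False))
print(t_ind([1.0, 1.0], [1.0, 1.0]))  # nan: the 0/0 case
```

The sampling scheme decides which function is called, while the variance
assumption is only a keyword on the two-sample function.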

Josef

>
> Cheers, Gavin
>


