[SciPy-User] ttest_rel with unequal groups

josef.pktd at gmail.com
Fri Nov 8 06:04:08 EST 2013


On Thu, Nov 7, 2013 at 10:51 PM, Horea Christian <h.chr at mail.ru> wrote:

> I managed to get ttest_rel to work by replacing my missing values
> either with NaN or with False. I have yet to determine whether or not
> that distorts my data (it could be that d = (a - b).astype(np.float64) is
> zero for entries where one value is False, or that False is read as
> zero and d = (a - b).astype(np.float64) will be -b[x] wherever a[x] is
> False...)
>

It will distort your results, since the filled-in value is treated as a
non-missing observation. That affects both the estimated mean difference and
the number of observations, i.e. the degrees of freedom for the p-value.
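A minimal sketch of the difference, assuming missing trials are encoded as `np.nan` and that only complete pairs should enter the paired test. The reaction times and the helper name `paired_ttest_complete_cases` are hypothetical, for illustration only. Filling the gaps with 0 (which is what `False` becomes in arithmetic) keeps the incomplete "pairs", so the fake zeros change both the mean difference and the degrees of freedom:

```python
import numpy as np
from scipy import stats

def paired_ttest_complete_cases(a, b):
    """Hypothetical helper: paired t-test on pairs where both values exist.

    Assumes missing values are encoded as np.nan.
    Returns (t, pvalue, number_of_complete_pairs).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    complete = ~np.isnan(a) & ~np.isnan(b)  # keep only fully observed pairs
    t, p = stats.ttest_rel(a[complete], b[complete])
    return t, p, int(complete.sum())

# Hypothetical reaction times in seconds; np.nan marks an excluded trial.
a = np.array([0.51, 0.48, np.nan, 0.55, 0.50, 0.47, 0.53, 0.49])
b = np.array([0.56, 0.52, 0.58, np.nan, 0.54, 0.51, 0.57, 0.55])

t_ok, p_ok, n_ok = paired_ttest_complete_cases(a, b)  # uses 6 complete pairs

# Filling gaps with 0 (like False) keeps all 8 "pairs": the fake zeros enter
# the mean difference and the degrees of freedom.
t_bad, p_bad = stats.ttest_rel(np.nan_to_num(a, nan=0.0),
                               np.nan_to_num(b, nan=0.0))
print(n_ok, t_ok, t_bad)
```

Recent SciPy versions also accept a `nan_policy` argument for `ttest_rel`; check its documentation for how it treats incomplete pairs in your version.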



>
> In any case, I am a bit uncertain about how to use this method: am I
> supposed to pass it a 1d array or a 2d array? I am thinking 2d should
> be mandatory, because otherwise the method can't tell which groups'
> measures are related. I tried doing that (my array being
> N(participants) x N(measurements)), but that gave me a 2d output. That
> can't be right; I just want one t and one p value, not a multidimensional array.
>
> So, how do I use this? (The docs are not very informative on what
> happens to 2d vs 1d inputs.)
>

You need to give it two arrays; the difference between the arrays is
calculated internally.
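A short sketch of the 1d case, with hypothetical numbers: element i of both arrays is assumed to belong to the same participant. Two equal-length 1d arrays give a single t statistic and a single p-value, and the result is the same as a one-sample t-test of the differences against 0:

```python
import numpy as np
from scipy import stats

# Hypothetical paired values: position i in both arrays is the same participant.
cond1 = np.array([0.52, 0.49, 0.55, 0.50, 0.47, 0.53, 0.51, 0.48, 0.54, 0.50])
cond2 = np.array([0.56, 0.50, 0.58, 0.55, 0.49, 0.57, 0.52, 0.53, 0.55, 0.54])

# Two 1d arrays of equal length -> one t statistic and one p-value.
t_rel, p_rel = stats.ttest_rel(cond1, cond2)

# The same test written out: one-sample t-test of the differences against 0.
t_1s, p_1s = stats.ttest_1samp(cond1 - cond2, 0.0)
print(t_rel, p_rel)
```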

If the arrays are 2d, then the test is calculated for each column (more
generally, along `axis`) of the broadcast difference.
These are separate tests, one per column, that give the same result as
looping over the columns.
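A sketch of the 2d behaviour with hypothetical random data (participants in rows, measurements in columns). It shows that the default `axis=0` gives one test per column, that this matches an explicit loop, and that transposed data needs `axis=1` to reproduce the same tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: 12 participants (rows) x 3 measurements (columns),
# observed in two conditions with the same layout.
before = rng.standard_normal((12, 3))
after = before + rng.standard_normal((12, 3)) + 0.5

# axis=0 (the default): one paired test per column -> arrays of length 3.
t_cols, p_cols = stats.ttest_rel(before, after, axis=0)

# Identical to looping over the columns one at a time.
t_loop = np.array([stats.ttest_rel(before[:, k], after[:, k])[0]
                   for k in range(3)])

# Transposed layout (measurements x participants) needs axis=1 instead.
t_T, p_T = stats.ttest_rel(before.T, after.T, axis=1)
print(t_cols, p_cols)
```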

If one array has only one column (for example, a benchmark treatment) and
the other array has several columns, then we get a ttest_rel for each
comparison of a column of the second array with the first array.

The result contains as many t-statistics and p-values as there are columns.
There is no multiple testing correction for the p-values.
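If a correction is wanted, one common choice (swapped in here; it is not part of `ttest_rel`) is the Holm-Bonferroni step-down adjustment. A minimal NumPy sketch, where `holm_adjust` is a hypothetical helper name; `statsmodels.stats.multitest.multipletests(pvals, method='holm')` is an existing library alternative:

```python
import numpy as np

def holm_adjust(pvalues):
    """Hypothetical helper: Holm-Bonferroni adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                    # smallest p-value first
    scaled = (m - np.arange(m)) * p[order]   # multiply by m, m-1, ..., 1
    # Running maximum keeps adjusted p-values monotone; cap them at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted        # restore the original order
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))      # approximately [0.03 0.06 0.06]
```

The adjusted values can be compared directly against the chosen significance level, one per column.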

>>> import numpy as np
>>> from scipy import stats
>>> outcome = np.random.randn(20, 4) + [0, 0, 1, 2]  # random data, no seed
>>> stats.ttest_rel(outcome[:, :1], outcome[:, 1:])
(array([-1.60220806, -3.13556782, -7.1567637 ]),
 array([  1.25604679e-01,   5.44534856e-03,   8.41006537e-07]))

>>> [stats.ttest_rel(outcome[:, 0], outcome[:, k]) for k in range(1, 4)]
[(array(-1.6022080647700057), 0.12560467940402195),
(array(-3.135567822455234), 0.005445348556616313),
(array(-7.156763700790868), 8.4100653703218436e-07)]


Aside: I think the following does the right thing for testing the joint
hypothesis:

>>> diff = outcome[:, 1:] - outcome[:, :1]
>>> stats.f_oneway(*diff.T)
(10.606594036595835, 0.00012132595252973279)
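`f_oneway` compares the means of the difference columns with each other. If the joint hypothesis of interest is instead that all mean differences are zero, one standard alternative (swapped in here, not what the message above uses) is the one-sample Hotelling's T² test on the difference vectors. A sketch, where `hotelling_one_sample` is a hypothetical helper; with a single column it reduces to `ttest_rel` (T² equals t²):

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(diff):
    """Hypothetical helper: one-sample Hotelling's T^2, H0: mean(diff) == 0.

    diff has shape (n_observations, n_variables).
    Returns (T2, F, pvalue).
    """
    diff = np.asarray(diff, dtype=float)
    if diff.ndim == 1:
        diff = diff[:, None]
    n, p = diff.shape
    if n <= p:
        raise ValueError("need more observations than variables")
    mean = diff.mean(axis=0)
    cov = np.atleast_2d(np.cov(diff, rowvar=False))  # sample covariance, ddof=1
    t2 = n * mean @ np.linalg.solve(cov, mean)
    f_stat = (n - p) / (p * (n - 1)) * t2            # F(p, n - p) under H0
    return t2, f_stat, stats.f.sf(f_stat, p, n - p)

rng = np.random.default_rng(0)
outcome = rng.standard_normal((20, 4)) + [0, 0, 1, 2]
diff = outcome[:, 1:] - outcome[:, :1]
print(hotelling_one_sample(diff))
```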


Josef



>
> Cheers,
> christian
>
> On Thu 07 Nov 2013 10:52:57 CET, Hjalmar Turesson wrote:
> > Hi,
> >
> > If I'm not confused, ttest_rel is a paired samples ttest
> > (http://en.wikipedia.org/wiki/Paired_difference_test), and thus
> > requires that all samples are paired (this does not depend on the
> > particular scipy implementation).
> > If occasional samples in a group are missing, and you still want to
> > perform the paired t-test, then you will probably have to exclude the
> > corresponding sample in the 2nd group, or generate pseudo-values to
> > replace the missing values in the 1st group. Alternatively, you can
> > use ttest_ind
> > (http://en.wikipedia.org/wiki/Ttest#Independent_samples), which
> > doesn't require exactly the same number of samples in the two groups.
> >
> >
> > On Thu, Nov 7, 2013 at 2:18 AM, Horea Christian <h.chr at mail.ru> wrote:
> >
> >     Hey there! I would like to use the ttest_rel function to compare
> >     reaction times for two conditions tested over 10 participants. We
> >     have done 100 trials per participant, but some of them had errors
> >     and were excluded. For instance, for participants 1 and 2 I have
> >     condition1: 95 trials, condition2: 100 trials AND condition1: 100
> >     trials, condition2: 99 trials.
> >
> >     Depending on whether or not I transpose my dataframe, I get a
> >     complaint either at
> >
> >          if a.shape[axis] != b.shape[axis]:
> >              raise ValueError('unequal length arrays')
> >
> >     or at
> >
> >          d = (a - b).astype(np.float64)
> >
> >     .
> >
> >
> >     What can I do about this? I found it surprising that it doesn't "just
> >     work" since in most experiments it is expected for some of the
> >     measurements to fail.
> >
> >     Many Thanks!
> >     Christian
> >
> >     --
> >     Horea Christian
> >     http://chymera.eu
> >
> >     _______________________________________________
> >     SciPy-User mailing list
> >     SciPy-User at scipy.org
> >     http://mail.scipy.org/mailman/listinfo/scipy-user
> >
>
>