[SciPy-dev] scipy.stats.sem is wrong

David Huard david.huard at gmail.com
Wed Nov 15 09:21:56 EST 2006


Roman,

A couple of months ago we held Statistical Review Month, during which users
and devs were asked to look at the functions from stats, weed out the
duplicates, add docstrings, etc. If I remember correctly, at the end of the
month the unreviewed functions were to be moved to the sandbox (a good
incentive if you ask me). The work has started (thanks to Robert), but it's
not finished. If you want to have a go at it, look at the scipy trac site:
there are dozens of open tickets for statistical functions. That's also the
place to submit patches.

http://projects.scipy.org/scipy/scipy/report/8

Regards,
David




2006/11/15, Roman Bertle <bertle at smoerz.org>:
>
> Hello,
>
> I think scipy.stats.sem is wrong. It gives the same result as
> scipy.stats.stderr (it divides samplestd by sqrt(N-1), not sqrt(N)),
> whereas scipy.stats.tsem uses N and gives the correct result. I have
> attached a patch correcting this.
>
> Related to this, I wonder why there are so many related functions in
> scipy.stats doing the same thing in slightly different ways. E.g. there
> are nanstd, std, tstd; some use numpy.std, some do not; some take an axis
> argument, some do not. And there are samplestd and samplevar, but the
> matching error function is called sem rather than sampleerr. Shouldn't
> these functions be unified somehow?
>
> Regards,
>
> Roman
> -------------------------
> diff -rud python-scipy-0.5.1/Lib/stats/stats.py python-scipy-0.5.1-new/Lib/stats/stats.py
> --- python-scipy-0.5.1/Lib/stats/stats.py       2006-08-29 11:58:37.000000000 +0200
> +++ python-scipy-0.5.1-new/Lib/stats/stats.py   2006-11-15 12:18:23.000000000 +0100
> @@ -1166,9 +1166,7 @@
> integer (the axis over which to operate)
> """
>      a, axis = _chk_asarray(a, axis)
> -    n = a.shape[axis]
> -    s = samplestd(a,axis) / sqrt(n-1)
> -    return s
> +    return samplestd(a,axis) / float(sqrt(a.shape[axis]))
>
>
> def z(a, score):
> diff -rud python-scipy-0.5.1/Lib/stats/tests/test_stats.py python-scipy-0.5.1-new/Lib/stats/tests/test_stats.py
> --- python-scipy-0.5.1/Lib/stats/tests/test_stats.py    2006-08-29 11:58:37.000000000 +0200
> +++ python-scipy-0.5.1-new/Lib/stats/tests/test_stats.py        2006-11-15 12:11:29.000000000 +0100
> @@ -740,15 +740,16 @@
> ##        assert_approx_equal(y,0.775177399)
>          y = scipy.stats.stderr(self.testcase)
>          assert_approx_equal(y,0.6454972244)
> +
>      def check_sem(self):
>          """
>          this is not in R, so used
> -        sqrt(var(testcase)*3/4)/sqrt(3)
> +        sqrt(samplevar(testcase))/sqrt(4)
>          """
>          #y = scipy.stats.sem(self.shoes[0])
>          #assert_approx_equal(y,0.775177399)
>          y = scipy.stats.sem(self.testcase)
> -        assert_approx_equal(y,0.6454972244)
> +        assert_approx_equal(y,0.5590169944)
>
>      def check_z(self):
>          """
> -------------------------
>
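For reference, the two expected values in the quoted test patch can be reproduced with plain NumPy. This is only a sketch: it assumes that samplestd is the population standard deviation (dividing by N, i.e. ddof=0) and that testcase is [1, 2, 3, 4]. The test data is not shown in the patch, so that array is a guess that happens to match both numbers; the names sem_old, sem_new and sem_ddof1 are purely illustrative.

```python
import numpy as np

# Assumed test data: not shown in the patch, chosen because it
# reproduces both expected values from test_stats.py.
testcase = np.array([1.0, 2.0, 3.0, 4.0])
n = testcase.size

# Assumption: samplestd is the population standard deviation (ddof=0).
pop_std = testcase.std(ddof=0)

# Formula before the patch: samplestd / sqrt(n - 1)
sem_old = pop_std / np.sqrt(n - 1)

# Formula after the patch: samplestd / sqrt(n)
sem_new = pop_std / np.sqrt(n)

# The pre-patch formula is algebraically identical to the
# ddof=1 standard deviation divided by sqrt(n).
sem_ddof1 = testcase.std(ddof=1) / np.sqrt(n)

print(round(sem_old, 10), round(sem_new, 10), round(sem_ddof1, 10))
```

Under these assumptions sem_old matches the 0.6454972244 asserted before the patch and sem_new matches the 0.5590169944 asserted after it, so the two versions differ only in whether the N-1 correction is applied.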