[SciPy-user] information on statistical functions

Wed Dec 17 20:53:53 EST 2008

On Wed, Dec 17, 2008 at 7:58 PM, Tim Michelsen
<timmichelsen at gmx-topmail.de> wrote:
> Hello,
> I observed that there are 2 standard deviation functions in the
> scipy/numpy modules:
>
> Numpy:
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html#numpy.std
>
> Scipy:
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.std.html#scipy.stats.std
>
> What is the difference?
> There is no formula included within the docstrings.
>
> I suppose that np.std() is for the whole population and scipy.std is
> designed for a smaller sample in the population.
> Is that true?

The difference between the population (numpy) and sample (scipy.stats)
variance and standard deviation is whether the estimator is
biased, i.e. normalized by 1/n, or not, i.e. 1/(n-1).  Look at the description in the source
http://docs.scipy.org/scipy/source/scipy/dist/lib64/python2.4/site-packages/scipy/stats/stats.py#1359
for the deprecation warning.

See also the distinction in your Wikipedia reference between biased and unbiased.
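
A minimal sketch of the two normalizations, using the ddof argument of
np.std (ddof=0 is the default and divides by n; ddof=1 divides by n-1).
The data array x is made up for illustration and is not from the
original question.

```python
import numpy as np

# Made-up illustrative data (an assumption, not from the thread)
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Population form: normalizes the sum of squared deviations by n
std_pop = np.std(x)              # same as np.std(x, ddof=0)

# Sample form: normalizes by n - 1
std_sample = np.std(x, ddof=1)

# The same two quantities written out by hand
ss = np.sum((x - x.mean())**2)   # sum of squared deviations from the mean
manual_pop = np.sqrt(ss / n)
manual_sample = np.sqrt(ss / (n - 1))
```

For large n the two values are close; the choice matters mostly for
small samples.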

>
> Are there any functions for calculating the mean bias error (MBE)?
>
> I am looking for formula 3 in
> http://en.wikipedia.org/wiki/Mean_squared_error#Examples

I'm not sure what your use case is but, in the referenced 3rd line,
the MSE is the theoretical MSE of the estimator and it is not
calculated from the sample.

Overall, these are one-liners in any matrix/array package.

For example, when I do a Monte Carlo for an estimator, theta_hat, where
the true parameter is theta (a scalar constant) and theta_hat is the
array of estimates from the n different runs, then the RMSE is just

RMSE = np.sqrt( np.sum( (theta_hat - theta)**2 ) / float(n) )

For the first Wikipedia example: MSE for observed Y_i  compared to
predicted Yhat_i is just
MSE = np.sum(  (Y - Y_hat)**2  ) / float(n)
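
Putting the Monte Carlo case together, here is a hedged sketch that also
computes the mean bias error asked about above. The simulated estimates,
the built-in bias of 0.1, and the sign convention (estimate minus true
value) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

theta = 2.0                      # true parameter (scalar constant)
n = 1000                         # number of Monte Carlo runs

# Hypothetical estimates: true value plus a deliberate bias of 0.1 plus noise
theta_hat = theta + 0.1 + rng.normal(scale=0.5, size=n)

errors = theta_hat - theta       # assumed sign convention: estimate minus truth

mbe = np.mean(errors)            # mean bias error
mse = np.mean(errors**2)         # mean squared error
rmse = np.sqrt(mse)              # root mean squared error
```

With this setup the MSE splits into squared bias plus variance, i.e.
mse equals mbe**2 + np.var(theta_hat) up to floating-point error, which
is a quick sanity check on the three quantities.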

Josef