[SciPy-User] Standard error of the mean for weighted data

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Jan 22 09:08:02 EST 2014


On Wed, Jan 22, 2014 at 7:29 AM, Kevin Kunzmann <kevinkunzmann at gmx.net> wrote:
> Hey,
>
> er, why not use the weighted sample variance? It's on the same Wikipedia
> page, a little further down. Take care when deriving the std from that, as
> the estimator is no longer unbiased.
>
> cheers,
>
> Kevin
>
> On 22.01.2014 11:46, Ramon Crehuet wrote:
>> Dear all,
>> I would like to calculate the standard error of the mean for data values that
>> each have a (normalized) weight. I guess this cannot be done with
>> scipy.stats.sem...
>> I thought of coding that, but I'm afraid I don't know what to code! For weighted
>> data, the SEM cannot be std/sqrt(N), even if the std is calculated from the
>> weighted data as explained here:
>> http://en.wikipedia.org/wiki/Weighted_arithmetic_mean
>> Imagine I have 1000 values, but only 2 have weights different from zero. It
>> makes no sense to divide the weighted std by sqrt(1000). Right?
>> Any help or suggestion is welcome. (I also looked at scikits.bootstrap but I
>> don't think I can define an array of weights anywhere).
>> Thanks in advance,

Two ways to do this using statsmodels:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> nobs=100
>>> x = 1 + np.random.randn(nobs)
>>> w = np.random.chisquare(5, size=nobs)

>>> res = sm.WLS(x, np.ones(nobs), weights=w).fit()
>>> res.params
array([ 1.22607483])
>>> res.bse
array([ 0.09177795])

For DescrStatsW, the weights need to be normalized so that they sum to the
number of observations:

>>> ws = sm.stats.DescrStatsW(x, weights = w / w.sum() * nobs)
>>> ws.mean
1.2260748286171022
>>> ws.std_mean
0.09177794656529388
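
For reference, a minimal numpy-only sketch of the quantity both approaches
above appear to compute: the weighted sum of squared deviations divided by
(nobs - 1) * sum(w), then the square root. The function name
weighted_mean_sem is just an illustration, not a statsmodels or scipy
function, and the formula is my reading of the WLS and DescrStatsW results
rather than a quote from their source.

```python
import numpy as np

def weighted_mean_sem(x, w):
    """Weighted mean and its standard error, treating w as relative weights.

    Hypothetical helper for illustration; not part of scipy or statsmodels.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    nobs = x.size                       # counts every row, including zero-weight rows
    mean = np.sum(w * x) / np.sum(w)    # weighted mean
    wss = np.sum(w * (x - mean) ** 2)   # weighted sum of squared deviations
    # Rescaling w by a constant leaves this unchanged, so no normalization is needed here.
    sem = np.sqrt(wss / ((nobs - 1) * np.sum(w)))
    return mean, sem

rng = np.random.RandomState(0)
x = 1 + rng.randn(100)
w = rng.chisquare(5, size=100)
mean, sem = weighted_mean_sem(x, w)
```

With equal weights this reduces to the usual std(ddof=1) / sqrt(nobs). Note
that nobs counts all observations, so in the "1000 values, 2 non-zero
weights" case the degrees of freedom are still 999; whether that is
appropriate depends on what the weights represent.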


>>> ws.ttest_mean()
(13.359144266153654, 6.9422438873768692e-24, 99.0)
>>> res.tvalues, res.pvalues
(array([ 13.35914427]), array([  6.94224389e-24]))

>>> ws.ttest_mean(value=1)
(2.4632805273788185, 0.01549276757842121, 99.0)
>>> tt = res.t_test(r_matrix=[1], q_matrix=[1])
>>> tt.tvalue, tt.pvalue
(array([[ 2.46328053]]), array(0.015492767578421171))
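
The t-tests above can also be sketched by hand with scipy.stats, assuming
the same standard error formula as the WLS fit and nobs - 1 degrees of
freedom (which is what the 99.0 in the output suggests). The function name
weighted_ttest_mean is hypothetical and only meant to show the arithmetic.

```python
import numpy as np
from scipy import stats

def weighted_ttest_mean(x, w, value=0.0):
    """Two-sided t-test that the weighted mean equals `value`.

    Hypothetical helper for illustration; returns (tstat, pvalue, df).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    nobs = x.size
    mean = np.sum(w * x) / np.sum(w)
    # Standard error of the weighted mean, assuming relative (variance) weights.
    sem = np.sqrt(np.sum(w * (x - mean) ** 2) / ((nobs - 1) * np.sum(w)))
    df = nobs - 1
    tstat = (mean - value) / sem
    pvalue = 2 * stats.t.sf(np.abs(tstat), df)  # two-sided p-value
    return tstat, pvalue, df
```

Calling weighted_ttest_mean(x, w, value=1) corresponds to
ws.ttest_mean(value=1) above; with equal weights it should agree with
scipy.stats.ttest_1samp.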


http://statsmodels.sourceforge.net/devel/stats.html#basic-statistics-and-t-tests-with-frequency-weights

Josef

>> Ramon
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user


