[Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

Thu Sep 9 23:32:07 EDT 2010

On Thu, Sep 9, 2010 at 8:07 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Thu, Sep 9, 2010 at 7:22 PM, cpblpublic <cpblpublic+numpy at gmail.com> wrote:
>> I am looking for some really basic statistical tools. I have some
>> sample data, some sample weights for those measurements, and I want to
>> calculate a mean and a standard error of the mean.
>
> How about using a bootstrap?
>
> Array and weights:
>
>>> import numpy as np
>>> a = np.arange(100)
>>> w = np.random.rand(100)
>>> w = w / w.sum()
>
> Initialize:
>
>>> n = 1000
>>> ma = np.zeros(n)
>
> Save mean of each bootstrap sample:
>
>>> for i in range(n):
>   ....:     idx = np.random.randint(0, 100, 100)
>   ....:     ma[i] = np.dot(a[idx], w[idx])
>   ....:
>   ....:
>
> Error in mean:
>
>>> ma.std()
>   3.854023384833674
>
> Sanity check:
>
>>> np.dot(w, a)
>   49.231127299096954
>>> ma.mean()
>   49.111478821225127
>
> Hmm...should w[idx] be renormalized to sum to one in each bootstrap sample?

Or perhaps there is no uncertainty about the weights, in which case:

>> for i in range(n):
   ....:     idx = np.random.randint(0, 100, 100)
   ....:     ma[i] = np.dot(a[idx], w)
   ....:
   ....:
>> ma.std()
   3.2548815339711115
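
On the renormalization question above: one way to compare the two choices is to wrap the loop in a small function with a switch. The following is only a sketch under assumptions, not a recommended estimator. The function name bootstrap_weighted_mean and its parameters (n_boot, renormalize, seed) are made up for illustration, and np.random.default_rng is a newer NumPy API than the np.random.randint calls used earlier in this thread. With renormalize=False it does the same thing as the first loop (values and weights resampled together, weights left as drawn); with renormalize=True the drawn weights are rescaled to sum to one in each bootstrap sample, so each bootstrap mean is a proper weighted average of the drawn values.

```python
import numpy as np

def bootstrap_weighted_mean(a, w, n_boot=1000, renormalize=True, seed=0):
    """Return n_boot bootstrap replicates of the weighted mean of `a`.

    Hypothetical helper: (value, weight) pairs are resampled together.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    size = a.size
    means = np.empty(n_boot)
    for i in range(n_boot):
        # Draw `size` indices with replacement.
        idx = rng.integers(0, size, size)
        wi = w[idx]
        if renormalize:
            # Rescale the drawn weights so they sum to one in this sample.
            wi = wi / wi.sum()
        means[i] = np.dot(a[idx], wi)
    return means

# Same kind of setup as earlier in the thread (random weights, so the
# numbers will differ from the ones quoted above).
a = np.arange(100)
w = np.random.default_rng(1).random(100)
w = w / w.sum()

means = bootstrap_weighted_mean(a, w)
print("weighted mean:      ", np.dot(w, a))
print("bootstrap mean:     ", means.mean())
print("bootstrap std error:", means.std())
```

Calling bootstrap_weighted_mean(a, w, renormalize=False) gives the unnormalized version for comparison. In that version the sum of the drawn weights also varies from sample to sample, which adds to the spread of the replicates.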