[Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

Thu Sep 9 22:38:18 EDT 2010

Excerpts from cpblpublic's message of Thu Sep 09 22:22:05 -0400 2010:
> I am looking for some reaally basic statistical tools. I have some
> sample data, some sample weights for those measurements, and I want to
> calculate a mean and a standard error of the mean.
> 
> Here are obvious places to look:
> 
> numpy
> scipy.stats
> statsmodels
> 
> It seems to me that numpy's "mean" and "average" functions have their
> names backwards. That is, often a mean is defined more generally than
> average, and includes the possibility of weighting, but in this case
> it is "average" that has a weights argument. Can these functions be
> merged/renamed/deprecated in the future?  It's clear to me that "mean"
> should allow for weights.
> 
> None of these modules, above, offer standard error of the mean which
> incorporates weights. scipy.stats.sem() doesn't, and that's the closest
> thing. numpy's "var" doesn't allow weights.
> There aren't any weighted variances in the above modules.
> 
> Again, are there favoured codes for these functions? Can they be
> incorporated appropriately in the future?
> 
> Most immediately, I'd love to get code for weighted sem. I'll write it
> otherwise, but it might be crude and dumb...
> Thanks!
> Chris Barrington-Leigh
> UBC

This code below should do what you want.   It is part of the stat
sub-package of esutil  http://code.google.com/p/esutil/

Hope this helps,
Erin Scott Sheldon
Brookhaven National Laboratory

def wmom(arrin, weights_in, inputmean=None, calcerr=False, sdev=False):
    """
    NAME:
      wmom()

    PURPOSE:
      Calculate the weighted mean, error, and optionally standard deviation of
      an input array.  By default error is calculated assuming the weights are
      1/err^2, but if you send calcerr=True this assumption is dropped and the
      error is determined from the weighted scatter.

    CALLING SEQUENCE:
     wmean,werr = wmom(arr, weights, inputmean=None, calcerr=False, sdev=False)

    INPUTS:
      arr: A numpy array or a sequence that can be converted.
      weights: A set of weights for each elements in array.
    OPTIONAL INPUTS:
      inputmean: 
          An input mean value, around which them mean is calculated.
      calcerr=False: 
          Calculate the weighted error.  By default the error is calculated as
          1/sqrt( weights.sum() ).  If calcerr=True it is calculated as sqrt(
          (w**2 * (arr-mean)**2).sum() )/weights.sum()
      sdev=False: 
          If True, also return the weighted standard deviation as a third
          element in the tuple.

    OUTPUTS:
      wmean, werr: A tuple of the weighted mean and error. If sdev=True the
         tuple will also contain sdev: wmean,werr,wsdev

    REVISION HISTORY:
      Converted from IDL: 2006-10-23. Erin Sheldon, NYU

   """

    # no copy made if they are already arrays
    arr = numpy.array(arrin, ndmin=1, copy=False)

    # Weights is forced to be type double. All resulting calculations
    # will also be double
    weights = numpy.array(weights_in, ndmin=1, dtype='f8', copy=False)

    wtot = weights.sum()

    # user has input a mean value
    if inputmean is None:
        wmean = ( weights*arr ).sum()/wtot
    else:
        wmean=float(inputmean)

    # how should error be calculated?
    if calcerr:
        werr2 = ( weights**2 * (arr-wmean)**2 ).sum()
        werr = numpy.sqrt( werr2 )/wtot
    else:
        werr = 1.0/numpy.sqrt(wtot)

    # should output include the weighted standard deviation?
    if sdev:
        wvar = ( weights*(arr-wmean)**2 ).sum()/wtot
        wsdev = numpy.sqrt(wvar)
        return wmean,werr,wsdev
    else:
        return wmean,werr