[issue20479] Efficiently support weight/frequency mappings in the statistics module

Oscar Benjamin report at bugs.python.org
Sun Jan 20 16:11:48 EST 2019


Oscar Benjamin <oscar.j.benjamin at gmail.com> added the comment:

> I would find it very helpful if somebody has time to do a survey of
> other statistics libraries or languages (e.g. numpy, R, Octave, Matlab,
> SAS etc) and see how they handle data with weights.

Numpy has only sporadic support for this. The standard mean function
does not have any way to provide weights but there is an alternative
called average that computes the mean and has an optional weights
argument. I've never heard of average before searching for "numpy
weighted mean" just now. Numpy's API often has bits of old cruft from
where various numerical packages were joined together so I'm not sure
they would recommend their current approach. I don't think there are
any other numpy functions for providing weighted statistics.

Statsmodels does provide an API for this as explained here:
https://stackoverflow.com/a/36464881/9450991
Their API is that you create an object with data and weights and can
then call methods/attributes for statistics.

Matlab doesn't support even weighted mean as far as I can tell. There
is wmean on the matlab file exchange:

>
> - what APIs do they provide?
> - do they require weights to be positive integers, or do they
>   support arbitrary float weights?
> - including negative weights?
>   (what physical meaning does a negative weight have?)
>
> At the moment, a simple helper function seems to do the trick for
> non-negative integer weights:
>
> def flatten(items):
>     for item in items:
>         yield from item
>
> py> data = [1, 2, 3, 4]
> py> weights = [1, 4, 1, 2]
> py> statistics.mean(flatten([x]*w for x, w in zip(data, weights)))
> 2.5
>
> In principle, the implementation could be as simple as a single
> recursive call:
>
> def mean(data, weights=None):
>     if weights is not None:
>         return mean(flatten([x]*w for x, w in zip(data, weights)))
>     # base case without weights is unchanged
>
> or perhaps it could be just a recipe in the docs.
>
> ----------
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <https://bugs.python.org/issue20479>
> _______________________________________

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue20479>
_______________________________________


More information about the Python-bugs-list mailing list