[Numpy-discussion] Computing Simple Statistics When Only the Frequency Distribution is Known

Sat Nov 28 22:20:15 EST 2009

On Fri, Nov 27, 2009 at 9:25 PM, Wayne Watson
<sierra_mtnview at sbcglobal.net> wrote:

> I actually wrote my own several days ago. When I began getting myself
> more familiar with numpy, I was hoping there would be an easy to use
> version in it for this frequency approach. If not, then I'll just stick
> with what I have. It seems something like this should be common.
>
> A simple way to do it with the present capabilities would be to "unwind"
> the frequencies. For example, given [2,1,3] for some corresponding set
> of x, say, [1,2,3], produce [1, 1, 2, 3, 3, 3]. I have no idea if numpy
> does anything like that, but, if so, the typical mean, std, ... could be
> used. In my case, it's sort of pointless. It would produce an array of
> 307,200 items for 256 x (0,1,2,...,255), and just slow down the
> computations "unwinding" it in software. The sub-processor hardware
> already produced the 256 frequencies.
>
> Basically, this amounts to having a pdf, and values of x.
> Mathematically, the statistics are produced directly from it.
>
> josef.pktd at gmail.com wrote:
> > On Fri, Nov 27, 2009 at 9:47 PM, Wayne Watson
> > <sierra_mtnview at sbcglobal.net> wrote:
> >
> >> How do I compute avg, std dev, min, max and other simple stats if I only
> >> know the frequency distribution?
> >>
> >
> > If you are willing to assign to all observations in a bin the value at
> > the bin midpoint, then you could do it with weights in the statistics
> > calculations. However, numpy.average is, I think, the only statistic
> > that takes weights. min and max are independent of weight, but std and var
> > need to be calculated indirectly.
> >
> > If you need more stats with weights, then the attachment in
> > http://projects.scipy.org/scipy/ticket/604  is a good start.
> >
> > Josef
>

Wayne:

There is no need to "unwind". If Y(X) is the (unnormalized) frequency
distribution of the random variable/data X, start by computing
y = Y/Y.sum() (if Y is already normalized, skip this step).  Then:

    av(X) = np.dot(X, y)
    sd(X) = np.sqrt(np.dot(X*X, y) - av(X)**2)

and higher-moment statistics can be calculated using similar formulae.
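
For concreteness, here is a minimal sketch of the above, using the small
[1,2,3] / [2,1,3] example from earlier in the thread. The helper name
freq_stats is made up for illustration and is not part of numpy. It computes
the variance in the centered form sum((X - av)**2 * y), which is
algebraically the same as E[X^2] - av^2 but tends to lose less precision,
and it returns the population standard deviation (ddof=0). It also takes
min and max over the bins with nonzero frequency, which is an assumption
about what is wanted there.

```python
import numpy as np

def freq_stats(x, freq):
    """Mean, population std, min and max from values x and their counts freq.

    Hypothetical helper for illustration; not a numpy function.
    """
    x = np.asarray(x, dtype=float)
    freq = np.asarray(freq, dtype=float)
    p = freq / freq.sum()             # normalize counts to relative frequencies
    mean = np.dot(x, p)
    var = np.dot((x - mean) ** 2, p)  # centered form of E[X^2] - mean^2
    occupied = freq > 0               # only bins that actually occur
    return mean, np.sqrt(var), x[occupied].min(), x[occupied].max()

x = np.array([1, 2, 3])
freq = np.array([2, 1, 3])
mean, sd, lo, hi = freq_stats(x, freq)

# Cross-check against the "unwound" data [1, 1, 2, 3, 3, 3].
unwound = np.repeat(x, freq)
assert np.isclose(mean, unwound.mean())
assert np.isclose(sd, unwound.std())

# numpy.average accepts weights directly and gives the same mean.
assert np.isclose(mean, np.average(x, weights=freq))
```

For the 256-bin case, x would be np.arange(256) and freq the 256 counts from
the hardware, so nothing of length 307,200 is ever built. The np.repeat call
is only there as a check on small inputs.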

DG