[Python-ideas] Running average and stdev in the statistics module?

Mon May 6 13:10:44 EDT 2019

On Sun, May 5, 2019 at 1:08 PM Luca Baldini <luca.baldini at pi.infn.it> wrote:
>
> Hi here,
> I wonder if the idea of adding to the statistics module a class to
> calculate the running statistics (average and standard deviation) of a
> generic input data stream has ever come up in the past.
>
> The basic idea is to do the necessary book-keeping as the data are fed
> into the accumulator class and to be able to query the average and
> variance of the sequence at any point in time without having to loop
> over the thing again. The obvious way to do that is well known, and
> described, e.g., in Knuth TAOCP vol 2, 3rd edition, page 232. FWIW, it
> is something that through the years I have coded myself a myriad of
> times (e.g., for real-time data processing)---and maybe worth
> considering for addition to the standard library.
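
For anyone following along, here is a minimal sketch of what such an
accumulator might look like. I am assuming the recurrence meant here is
the one usually attributed to Welford; the class and method names are
made up for illustration and nothing like this exists in the statistics
module today.

```python
import math


class RunningStats:
    """Hypothetical accumulator; not part of the statistics module."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def push(self, x):
        """Feed one value and update the book-keeping in O(1)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old mean (via delta) and the freshly updated mean.
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 in the denominator)."""
        if self.n < 2:
            raise ValueError("variance requires at least two data points")
        return self._m2 / (self.n - 1)

    def stdev(self):
        return math.sqrt(self.variance())
```

Usage would be along the lines of calling acc.push(x) for each incoming
value and reading acc.mean or acc.stdev() whenever needed, with no second
pass over the data.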

Personally, I would definitely use this in a number of places in the
real-life code I contribute to.

The problem I have with this idea is that it's not clear how an
accumulator class should store its data. What about code that runs in
different contexts under asyncio and/or multithreading?
I think it would be useful to let user code pass in its own storage
implementation, which would cover almost any possible scenario. In that
case the accumulator doesn't need to be a class at all, or to bother
with any intermediate storage. It could be a set of module-level
functions providing an effective implementation of the algorithm for
users to build on.
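
To make that concrete, here is a rough sketch of the functional variant
I have in mind. The names (EMPTY, update, running_mean,
running_variance) are hypothetical, and I am assuming the Welford-style
update for the algorithm. The state is just an immutable tuple that the
caller owns, so the caller decides whether it lives in a local
variable, behind a lock, or in a contextvars.ContextVar.

```python
# Hypothetical module-level API; none of these names exist in statistics.
# State is a plain tuple: (count, mean, sum of squared deviations).
EMPTY = (0, 0.0, 0.0)


def update(state, x):
    """Return a new state that includes the value x; the input is not mutated."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return (n, mean, m2)


def running_mean(state):
    n, mean, _ = state
    if n < 1:
        raise ValueError("mean requires at least one data point")
    return mean


def running_variance(state):
    """Sample variance (n - 1 in the denominator)."""
    n, _, m2 = state
    if n < 2:
        raise ValueError("variance requires at least two data points")
    return m2 / (n - 1)


# The caller owns the state and threads it through explicitly.
state = EMPTY
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    state = update(state, value)
```

Because update() returns a new tuple instead of mutating anything
shared, there is no hidden storage to synchronize; how the state is kept
per thread or per task is left entirely to the user's code.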