[SciPy-Dev] Boost for stats

Hans Dembinski hans.dembinski at gmail.com
Mon Feb 15 08:03:54 EST 2021


> On 15. Feb 2021, at 13:47, Neal Becker <ndbecker2 at gmail.com> wrote:
> 
> One thing I've missed with the current scipy histogram is the ability
> to do 'online' or 'incremental' collection of the histogram data.  For
> this reason I have written my own histogram code.  I am often
> collecting data from monte-carlo simulations and want to accumulate
> stats from data that arrives in batches.
> I don't know if boost-histogram supports this but if so I would find
> this very welcome.

I think the answer is yes, if I understood you correctly.

Boost.Histogram has an object-oriented design: the histogram is an object that can be filled incrementally with input arrays.
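To illustrate the object-oriented idea, here is a minimal sketch in plain NumPy. The class `IncrementalHistogram` and its `fill` method are hypothetical and are not the Boost.Histogram API. The counts array lives inside the object, and each `fill` call adds one batch to it in place. Bin indices are still computed per batch, but no per-batch counts array is created.

```python
import numpy as np


class IncrementalHistogram:
    """Hypothetical minimal histogram object (not the Boost.Histogram API).

    Regular bins over [lo, hi]; bin i covers [edges[i], edges[i+1]) and the
    last bin also includes hi, matching numpy.histogram. Values outside
    [lo, hi] are ignored.
    """

    def __init__(self, nbins, lo, hi):
        self.edges = np.linspace(lo, hi, nbins + 1)
        # Persistent counts that every fill() call adds to.
        self.counts = np.zeros(nbins, dtype=np.int64)

    def fill(self, batch):
        x = np.asarray(batch, dtype=float).ravel()
        # Keep only values inside the histogram range.
        x = x[(x >= self.edges[0]) & (x <= self.edges[-1])]
        # Index of the bin whose left edge is <= x.
        idx = np.searchsorted(self.edges, x, side="right") - 1
        # x == hi lands one past the last bin; put it into the last bin.
        idx[idx == len(self.counts)] = len(self.counts) - 1
        # Increment the stored counts in place (no per-batch counts array).
        np.add.at(self.counts, idx, 1)
        return self
```

Each batch of simulation output is passed to `h.fill(batch)` as it arrives, and `h.counts` always holds the running total.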

I personally like the functional paradigm behind np.histogram and friends, but it is not as efficient for incremental collection. With numpy.histogram, each batch produces a temporary array of counts, which then has to be added to the main array. The object-oriented approach avoids this temporary.
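For comparison, here is a sketch of the functional pattern with numpy.histogram. It assumes the bin edges are fixed in advance, because automatically chosen edges would differ from batch to batch and the counts could not be added. The random batches are only stand-ins for Monte Carlo output.

```python
import numpy as np

# Fixed bin edges shared by all batches (an assumption for this sketch).
edges = np.linspace(0.0, 1.0, 11)
total = np.zeros(len(edges) - 1, dtype=np.int64)

rng = np.random.default_rng(42)
for _ in range(5):
    batch = rng.random(1000)  # stand-in for one batch of simulation output
    # np.histogram allocates a new counts array on every call...
    counts, _ = np.histogram(batch, bins=edges)
    # ...which is then added to the running total and discarded.
    total += counts
```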

In my field (high-energy physics and astroparticle physics), incremental filling is also the default. We typically have large amounts of data to convert into histograms, so our analysis code usually fills histograms incrementally.

The performance issue of numpy.histogram could also be fixed by adding an "out" keyword argument to numpy.histogram, so that the user can pass in the array to be filled.
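A sketch of what such an "out" argument could look like, written as a separate helper. `histogram_out` is a hypothetical name, and numpy.histogram itself has no `out` parameter. The sketch supports only explicit, increasing bin edges and adds counts into `out` in place, so `out` serves as the running total.

```python
import numpy as np


def histogram_out(a, bins, out):
    """Hypothetical numpy.histogram variant with an ``out`` argument.

    ``bins`` must be increasing bin edges; counts are added to ``out`` in
    place. Bin i covers [bins[i], bins[i+1]); the last bin also includes
    bins[-1]; values outside [bins[0], bins[-1]] are ignored.
    """
    a = np.asarray(a, dtype=float).ravel()
    bins = np.asarray(bins, dtype=float)
    if out.shape != (len(bins) - 1,):
        raise ValueError("out must have len(bins) - 1 entries")
    # Drop values outside the histogram range.
    a = a[(a >= bins[0]) & (a <= bins[-1])]
    # Index of the bin whose left edge is <= a.
    idx = np.searchsorted(bins, a, side="right") - 1
    # Values equal to the last edge belong to the last bin.
    idx[idx == len(bins) - 1] = len(bins) - 2
    # Unbuffered in-place increment of the caller's array.
    np.add.at(out, idx, 1)
    return out
```

With this interface, the incremental loop becomes `histogram_out(batch, edges, out=total)`, and no counts array is allocated per batch.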

Best regards,
Hans

