[SciPy-Dev] Boost for stats

Neal Becker ndbecker2 at gmail.com
Mon Feb 15 07:47:48 EST 2021


One thing I've missed with the current SciPy histogram is the ability
to do 'online' or 'incremental' collection of the histogram data.  For
this reason I have written my own histogram code.  I am often
collecting data from Monte Carlo simulations and want to accumulate
statistics from data that arrives in batches.
I don't know whether boost-histogram supports this, but if it does I
would very much welcome it.
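
To illustrate what 'incremental' collection means here, this is a minimal
sketch using plain NumPy. It assumes the bin edges are fixed up front, so
per-batch counts can simply be summed. The class name RunningHistogram and
its update method are hypothetical, made up for this illustration; they are
not part of SciPy or boost-histogram, and this is not the custom code
mentioned above.

```python
import numpy as np


class RunningHistogram:
    """Hypothetical helper: fixed-bin histogram accumulated over batches."""

    def __init__(self, bins, lo, hi):
        # Bin edges are chosen once; every batch is binned against them.
        self.edges = np.linspace(lo, hi, bins + 1)
        self.counts = np.zeros(bins, dtype=np.int64)

    def update(self, batch):
        # Bin only the new batch and add its counts to the running total.
        # Values outside [lo, hi] are ignored by np.histogram.
        counts, _ = np.histogram(batch, bins=self.edges)
        self.counts += counts


# Simulated Monte Carlo output arriving in batches (assumed data).
rng = np.random.default_rng(0)
batches = [rng.normal(size=1000) for _ in range(5)]

hist = RunningHistogram(bins=20, lo=-4.0, hi=4.0)
for batch in batches:
    hist.update(batch)
```

Because the edges never change, the accumulated counts match what a single
np.histogram call over all the data would return, without keeping every
batch in memory.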

On Mon, Feb 15, 2021 at 7:42 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Mon, Feb 15, 2021 at 1:35 PM Hans Dembinski <hans.dembinski at gmail.com> wrote:
>>
>>
>> > On 15. Feb 2021, at 01:47, Warren Weckesser <warren.weckesser at gmail.com> wrote:
>> >
>> > * The Boost histogram library might provide some benefits over the
>> >   existing NumPy and SciPy options.  (Hans Dembinski, the author
>> >   of the histogram library, has already commented in this email
>> >   thread.)
>>
>> I would happily support this. We currently offer a Python front-end to Boost.Histogram
>> https://github.com/scikit-hep/boost-histogram
>> which includes a numpy.histogram compatible interface.
>>
>> Switching to Boost.Histogram may offer performance benefits, see
>> https://boost-histogram.readthedocs.io/en/latest/notebooks/PerformanceComparison.html
>>
>> Compared to np.histogram we saw a 1.7 times speedup single-threaded, and more if multiple threads are used. Compared to np.histogram2d we saw an 11 times speedup. These numbers should probably be checked more carefully before decisions are made.
>>
>> Boost.Histogram offers generalised histograms with arbitrary accumulators per cell, so it could also replace the implementations of https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html and friends.
>
>
> That would be really nice. binned_statistic is currently pure Python, and can be a performance hotspot (I've seen multiple cases of that in dealing with image and geospatial data).
>
> Cheers,
> Ralf
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev



-- 
Those who don't understand recursion are doomed to repeat it

