[SciPy-Dev] Boost for stats

Ralf Gommers ralf.gommers at gmail.com
Mon Feb 15 07:41:51 EST 2021


On Mon, Feb 15, 2021 at 1:35 PM Hans Dembinski <hans.dembinski at gmail.com>
wrote:

>
> > On 15. Feb 2021, at 01:47, Warren Weckesser <warren.weckesser at gmail.com> wrote:
> >
> > * The Boost histogram library might provide some benefits over the
> >   existing NumPy and SciPy options.  (Hans Dembinski, the author
> >   of the histogram library, has already commented in this email
> >   thread.)
>
> I would happily support this. We currently offer a Python front-end to
> Boost.Histogram
> https://github.com/scikit-hep/boost-histogram
> which includes a numpy.histogram-compatible interface.
>
> Switching to Boost.Histogram may offer performance benefits, see
>
> https://boost-histogram.readthedocs.io/en/latest/notebooks/PerformanceComparison.html
>
> Compared to np.histogram we saw a 1.7 times speedup single-threaded, and
> more if multiple threads are used. Compared to np.histogram2d we saw an 11
> times speedup. These numbers should probably be checked more carefully
> before decisions are made.
>
> Boost.Histogram offers generalised histograms with arbitrary accumulators
> per cell, so it could also replace the implementations of
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html
> and friends.
>

That would be really nice. binned_statistic is currently pure Python, and
can be a performance hotspot (I've seen multiple cases of that in dealing
with image and geospatial data).
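
To make the "arbitrary accumulators per cell" idea concrete, here is a
minimal pure-Python sketch of a 1-D binned mean in which every cell holds
its own accumulator. It is only an illustration of the concept, not
Boost.Histogram's API and not how scipy.stats.binned_statistic is
implemented; the names MeanAccumulator and binned_mean are hypothetical,
and the bin-edge convention (half-open bins, last bin closed on the right)
is an assumption chosen to match numpy.histogram.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class MeanAccumulator:
    """Running mean of the values filled into one cell (hypothetical helper)."""
    count: int = 0
    total: float = 0.0

    def fill(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # An empty cell has no defined mean, so report NaN.
        return self.total / self.count if self.count else float("nan")


def binned_mean(x, values, edges):
    """Mean of `values`, grouped by the bin of `edges` that each `x` falls into."""
    cells = [MeanAccumulator() for _ in range(len(edges) - 1)]
    for xi, vi in zip(x, values):
        if xi < edges[0] or xi > edges[-1]:
            continue  # samples outside the binned range are ignored
        # Bins are half-open [lo, hi); the last bin also includes its right edge.
        idx = min(bisect_right(edges, xi) - 1, len(cells) - 1)
        cells[idx].fill(vi)
    return [cell.mean for cell in cells]


x = [0.5, 1.5, 1.7, 2.5, 3.0]
values = [10.0, 20.0, 40.0, 5.0, 15.0]
result = binned_mean(x, values, edges=[0.0, 1.0, 2.0, 3.0])
print(result)  # [10.0, 30.0, 10.0]
```

Swapping MeanAccumulator for another accumulator (sum, min, max, ...) gives
a different binned statistic without touching the binning loop, which is
the kind of generality described above.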

Cheers,
Ralf