[SciPy-User] numpy.histogram is slow

Jerome Kieffer Jerome.Kieffer at esrf.fr
Tue Oct 23 01:30:12 EDT 2012


On Mon, 22 Oct 2012 13:59:23 +0200
Sturla Molden <sturla at molden.no> wrote:

> On 18.10.2012 09:26, Jerome Kieffer wrote:
> 
> > I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
> > It is much faster than the one provided by numpy:
> > 4ms vs 25ms in your case on my computer
> > https://github.com/kif/pyFAI/blob/master/src/histogram.pyx
> 
> Is there a reason why you set cdivision to True in a code that has no 
> integer division?

No... I would say this is legacy code. Basically I am (was) interested in the
ratio (weighted histogram)/(unweighted histogram). This part has been
removed from the code.
I re-implemented histogram because I needed faster execution, but the
implementation in Cython is not optimal, as you mentioned (large
storage because there is no atomic add in Cython, so the speed-up
does not scale). I also moved away from histograms as I needed more
precision.
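
To make the storage issue concrete, here is a rough sketch in plain NumPy
(not the pyFAI code). Without an atomic add, each worker needs its own private
histogram, and these are summed at the end, so memory grows as
n_workers * n_bins. The function name chunked_mean_histogram and its
parameters are made up for this illustration, and the chunks are processed
sequentially here just to show the bookkeeping.

```python
import numpy as np

def chunked_mean_histogram(pos, weights, bins, pos_range, n_chunks=4):
    """Return (weighted histogram / unweighted histogram, counts).

    Hypothetical helper: each chunk stands in for one worker that
    owns a private pair of histograms (no atomic add available).
    """
    pos = np.asarray(pos, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Private storage: 2 * n_chunks * bins values in total.
    priv_w = np.zeros((n_chunks, bins), dtype=np.float64)
    priv_c = np.zeros((n_chunks, bins), dtype=np.float64)
    chunks = zip(np.array_split(pos, n_chunks),
                 np.array_split(weights, n_chunks))
    for i, (p, w) in enumerate(chunks):
        priv_w[i], _ = np.histogram(p, bins=bins, range=pos_range, weights=w)
        priv_c[i], _ = np.histogram(p, bins=bins, range=pos_range)
    # Reduction step: sum the private histograms bin by bin.
    sum_w = priv_w.sum(axis=0)
    sum_c = priv_c.sum(axis=0)
    # Ratio weighted/unweighted; empty bins are left at 0.
    mean = np.zeros(bins, dtype=np.float64)
    np.divide(sum_w, sum_c, out=mean, where=sum_c > 0)
    return mean, sum_c
```

The result should match a single-pass np.histogram with the same bins and
range; only the intermediate storage differs.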
 
> Cython prange scales badly unless you do a lot of work on each 
> iteration. That is, each iteration of a prange loop does a barrier 
> synchronization through an OpenMP flush. Don't use it the way you do 
> here. A Cython prange loop is not nearly as cheap as a C loop with 
> "#pragma omp parallel for". If you really want to use OpenMP, let your 
> Cython code call C code.

I totally agree ... this is why I changed the algorithm to be able to
implement it in OpenCL (using pyopencl). OpenCL on the CPU is much
faster than Cython and almost as dynamic as Python when using pyopencl.

Cheers,

-- 
Jérôme Kieffer
Data analysis unit - ESRF
