[Numpy-discussion] Question about broadcasting vs for loop performance

Benjamin Root ben.root at ou.edu
Mon Sep 15 09:50:51 EDT 2014


Broadcasting, by itself, should not create large arrays in memory. It uses
stride tricks to make an array appear larger while reusing the same memory
block; that is exactly why it is so valuable: it doesn't make a copy.
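
A quick way to see this is a minimal sketch with np.broadcast_arrays (the
exact stride values depend on dtype and platform):

import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(4)                 # shape (4,)
av, bv = np.broadcast_arrays(a, b)
print av.shape, bv.shape         # (3, 4) (3, 4)
print av.strides, bv.strides     # the broadcast axes get stride 0,
                                 # so no element data is copied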

Now, what may be happening is that the result of evaluating the broadcast
arrays is too large to fit comfortably in the CPU cache, so the subsequent
summation hits performance penalties there. Essentially, your first example
may be a poor man's implementation of data chunking. I bet that if you ran
these benchmarks over a wide range of array sizes, you would see some
interesting results.
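
If so, a hybrid approach should win: broadcast over modest chunks of xs so
that each intermediate stays cache-sized. Here is a rough sketch reusing the
names from your script (the chunk size is an arbitrary guess worth tuning):

import numpy as np

def chunked_try(chunk=1000):
    sim_inten = np.empty_like(xs)
    for i in range(0, xs.size, chunk):
        block = xs[i:i + chunk].reshape(-1, 1)   # (chunk, 1) slice of xs
        # Broadcast against all 100 peaks, then reduce immediately,
        # so the (chunk, 100) intermediate stays small.
        sim_inten[i:i + chunk] = lorentz(block, poss, intens, 5.0).sum(axis=1)
    return sim_inten

Timing that against first_try and second_try for various chunk sizes should
show the cache crossover directly.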

Cheers!
Ben Root


On Sun, Sep 14, 2014 at 10:53 PM, Ryan Nelson <rnelsonchem at gmail.com> wrote:

> I think I figured out my own question. I guess that the broadcasting
> approach generates a very large 2D intermediate array in memory, and
> filling it takes extra time. I gathered this from the last example on the
> following site:
> http://wiki.scipy.org/EricsBroadcastingDoc
> I tried this again with a much smaller "xs" array (~100 points), and the
> broadcasting version was much faster.
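>
> For scale, a rough back-of-the-envelope check (reusing the arrays from my
> script): that intermediate is 10000 x 100 float64 values, about 8 MB, far
> larger than a typical CPU cache:
>
> import numpy as np
> xs = np.linspace(0, 10, 10000)
> poss = np.random.rand(100)
> tmp = xs.reshape((-1, 1)) - poss   # the broadcast intermediate
> print tmp.shape, tmp.nbytes        # (10000, 100) 8000000
>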
> Thanks
>
> Ryan
>
> Note: the link to the SciPy wiki page above is broken at the bottom of
> NumPy's broadcasting documentation page; otherwise I would have seen it
> earlier. Sorry for the noise.
>
> On Sun, Sep 14, 2014 at 10:22 PM, Ryan Nelson <rnelsonchem at gmail.com>
> wrote:
>
>> Hello all,
>>
>> I have a question about the performance of broadcasting versus Python for
>> loops. I have the following sample code that approximates some simulation
>> I'd like to do:
>>
>> ## Test Code ##
>>
>> import numpy as np
>>
>> def lorentz(x, pos, inten, hwhm):
>>     # Lorentzian line shape centered at pos, with half-width hwhm.
>>     return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) )
>>
>> poss = np.random.rand(100)
>> intens = np.random.rand(100)
>> xs = np.linspace(0, 10, 10000)
>>
>> def first_try():
>>     # Python loop: accumulate one peak at a time into a (10000,) array.
>>     sim_inten = np.zeros(xs.shape)
>>     for freq, inten in zip(poss, intens):
>>         sim_inten += lorentz(xs, freq, inten, 5.0)
>>     return sim_inten
>>
>> def second_try():
>>     # Broadcasting: evaluate the full (10000, 100) array, then sum it.
>>     sim_inten2 = lorentz(xs.reshape((-1, 1)), poss, intens, 5.0)
>>     sim_inten2 = sim_inten2.sum(axis=1)
>>     return sim_inten2
>>
>> print np.array_equal(first_try(), second_try())
>>
>> ## End Test ##
>>
>>
>> Running this script prints "True" for the final equality test. However,
>> IPython's %timeit magic gives ~10 ms for first_try and ~30 ms for
>> second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux
>> machine, both with Python 2.7 and NumPy 1.8.2.
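>>
>> (For anyone reproducing this, the timings came from something along the
>> lines of:
>>
>> %timeit first_try()
>> %timeit second_try()
>>
>> in an IPython session, after running the script above.)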
>>
>>
>> I understand in principle why broadcasting should be faster than Python
>> loops, but I'm wondering why I'm getting worse results with the pure NumPy
>> version. Are there general rules for when broadcasting might give worse
>> performance than a Python loop?
>>
>>
>> Thanks
>>
>>
>> Ryan

