[Numpy-discussion] Ufunc memory access optimization

David Cournapeau cournape at gmail.com
Tue Jun 15 11:37:35 EDT 2010


On Wed, Jun 16, 2010 at 12:16 AM, Pauli Virtanen <pav at iki.fi> wrote:
> ti, 2010-06-15 kello 10:10 -0400, Anne Archibald kirjoitti:
>> Correct me if I'm wrong, but this code still doesn't seem to make the
>> optimization of flattening arrays as much as possible. The array you
>> get out of np.zeros((100,100)) can be iterated over as an array of
>> shape (10000,), which should yield very substantial speedups. Since
>> most arrays one operates on are like this, there's potentially a large
>> speedup here. (On the other hand, if this optimization is being done,
>> then these tests are somewhat deceptive.)
>
> It does perform this optimization, and unravels the loop as much as
> possible. If all arrays are wholly contiguous, iterators are not even
> used in the ufunc loop. Check the part after
>
>        /* Determine how many of the trailing dimensions are contiguous
>        */
>
> However, in practice it seems that this typically is not a significant
> win -- I don't get speedups over the unoptimized numpy code even for
> shapes
>
>        (2,)*20
>
> where you'd think that the iterator overhead could be important:
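[Editor's note: the flattening optimization discussed above can be illustrated with a short Python sketch. The helper name `collapsible_trailing_dims` is hypothetical and is not the actual C implementation in the ufunc loop; it only mirrors the idea of walking strides from the last axis inward to see how many trailing dimensions form one contiguous block.]

```python
import numpy as np

# Anne's point: a C-contiguous (100, 100) array can be traversed as if it
# had shape (10000,). ravel() returns a view (no copy) exactly because the
# data occupy one unbroken memory block.
a = np.zeros((100, 100))
flat = a.ravel()
assert flat.shape == (10000,)
assert flat.base is a            # a view, not a copy

# A strided slice is not contiguous, so ravel() must copy.
b = a[:, ::2]
assert not b.flags['C_CONTIGUOUS']
assert b.ravel().base is not b   # a fresh copy

def collapsible_trailing_dims(arr):
    """Hypothetical sketch of the 'trailing contiguous dimensions' check:
    walk dimensions from last to first, stopping at the first stride that
    no longer matches the size of the block inside it."""
    expected = arr.itemsize
    n = 0
    for dim, stride in zip(reversed(arr.shape), reversed(arr.strides)):
        if stride != expected:
            break
        expected *= dim
        n += 1
    return n

print(collapsible_trailing_dims(a))         # 2: both dims collapse
print(collapsible_trailing_dims(a[:, ::2])) # 0: strided columns block it
```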

I unfortunately don't have much time to look into the code ATM, but
the tests should be run on different CPUs. When I implemented the
neighborhood iterator, I observed significant differences (sometimes
several tens of percent) between machines - the gcc version also matters.

David
