[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Sat Mar 20 14:56:03 EDT 2010


Pauli Virtanen wrote:
> Anne Archibald wrote:
>> I'm not knocking numpy; it does (almost) the best it can. (I'm not
>> sure of the optimality of the order in which ufuncs are executed; I
>> think some optimizations there are possible.)
>
> Ufuncs and reductions are not performed in a cache-optimal fashion; IIRC,
> dimensions are always traversed in order from left to right. Large
> speedups are possible in some cases, but in a quick try I didn't manage to
> come up with an algorithm that would always improve the speed (there was a
> thread about this last year or so, and there's a ticket). Things varied
> between computers, so this probably depends a lot on the actual cache
> arrangement.
>
> But perhaps numexpr has such heuristics, and we could steal them?

At least in MultiIter (and I always assumed in ufuncs too, but perhaps not)
there's functionality to remove the largest dimension from the iterator so
that it can be put innermost in a loop. In many situations, removing the
dimension with the smallest stride instead would probably work much better.
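To make the difference concrete (an illustrative array; nothing below is
NumPy-internal, just its shape and strides):

    import numpy as np

    a = np.empty((100000, 8))   # C-contiguous float64
    print(a.shape)              # (100000, 8)
    print(a.strides)            # (64, 8)

    # The largest dimension is axis 0 (length 100000), but its stride is
    # 64 bytes, so putting it innermost jumps a whole cache line per
    # element.  The smallest stride belongs to axis 1 (8 bytes), but it is
    # only 8 elements long, so putting it innermost restarts the inner
    # loop 100000 times.

Neither choice is obviously right for an array like that.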

It's all about balancing iterator overhead and memory overhead. Something
simple like "select the dimension with length > 200 that has the smallest
stride, or the dimension with the largest length if none are above 200"
would perhaps work well?
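
In code, the rule might look something like this (the function name and the
threshold parameter are just mine for illustration, not anything that exists
in NumPy):

    import numpy as np

    def pick_inner_dimension(arr, length_threshold=200):
        """Hypothetical sketch of the heuristic above, not NumPy API."""
        shape, strides = arr.shape, arr.strides
        # Dimensions long enough to amortize the inner-loop setup cost.
        candidates = [d for d in range(arr.ndim) if shape[d] > length_threshold]
        if candidates:
            # Among those, the smallest stride is the most cache-friendly
            # choice for the innermost loop.
            return min(candidates, key=lambda d: abs(strides[d]))
        # Nothing is long enough: fall back to the longest dimension to
        # keep iterator overhead down.
        return max(range(arr.ndim), key=lambda d: shape[d])

    a = np.empty((300, 400, 500))   # C order, strides (1600000, 4000, 8)
    print(pick_inner_dimension(a))  # -> 2: all axes are long, smallest stride wins

    b = np.empty((100000, 8))       # strides (64, 8)
    print(pick_inner_dimension(b))  # -> 0: only axis 0 is longer than 200

    c = np.empty((100, 50))         # nothing longer than 200
    print(pick_inner_dimension(c))  # -> 0: fall back to the largest length

The 200-element cutoff would presumably need tuning per machine, given the
variation between computers that Pauli mentions.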

Dag Sverre



