[Numpy-discussion] Vectorizing code, for loops, and all that

A. M. Archibald peridot.faceted at gmail.com
Tue Oct 3 12:38:55 EDT 2006


On 03/10/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:

> I had an idea regarding which axis to operate on first. Rather than
> operate on strictly the longest axis or strictly the innermost axis, a
> hybrid approach could be used. We would operate on the longest axis that
> would not result in the inner loop overflowing the cache. The idea is
> to minimize the loop overhead, as we do now by choosing the largest axis,
> while at the same time attempting to maintain cache friendliness.
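
For concreteness, here is a rough sketch of how I read that hybrid rule. The
function name pick_inner_axis, the 32 KiB cache size, and the way the inner
loop's memory footprint is estimated are all my own assumptions for
illustration, not anything in the numpy source:

```python
def pick_inner_axis(shape, strides, itemsize, cache_bytes=32 * 1024):
    """Pick the longest axis whose inner loop should still fit in cache.

    Hypothetical helper: the footprint of looping along an axis is
    estimated as the byte span from its first to its last element.
    """
    candidates = []
    for ax, (n, s) in enumerate(zip(shape, strides)):
        span = (n - 1) * abs(s) + itemsize  # bytes covered by the inner loop
        if span <= cache_bytes:
            candidates.append((n, ax))
    if not candidates:
        # Nothing fits: fall back to the axis with the smallest stride.
        return min(range(len(shape)), key=lambda ax: abs(strides[ax]))
    # Longest fitting axis (ties go to the higher axis index).
    return max(candidates)[1]


# C-ordered float64 arrays with 16 columns: strides are (128, 8) bytes.
print(pick_inner_axis((200, 16), (128, 8), 8))     # long axis still fits
print(pick_inner_axis((100000, 16), (128, 8), 8))  # long axis overflows
```

With these assumed numbers the first call picks axis 0 (200 elements spanning
about 25 KB), while the second falls back to axis 1 because axis 0 spans
several megabytes.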

If elements are smaller than a cache line (lines are usually at least
eight bytes, I think, and often much larger), we may pull many times
more bytes into the cache than we actually need unless we loop along
the axes with small strides first.
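
To illustrate the point, here is a small sketch that orders axes by stride
and counts how many cache lines a short loop would touch. The helper names
and the 64-byte line size are assumptions chosen for the example; real line
sizes vary by processor, and the count assumes the data starts on a line
boundary:

```python
import numpy as np


def axes_by_stride(a):
    """Return axis indices ordered from smallest to largest absolute stride."""
    return sorted(range(a.ndim), key=lambda ax: abs(a.strides[ax]))


def cache_lines_touched(n_elements, stride, itemsize, line_size=64):
    """Count distinct cache lines touched by n_elements spaced stride bytes apart.

    Assumes the first element sits at the start of a cache line.
    """
    lines = set()
    for i in range(n_elements):
        start = i * stride
        lines.add(start // line_size)                   # line holding first byte
        lines.add((start + itemsize - 1) // line_size)  # line holding last byte
    return len(lines)


a = np.zeros((1000, 8), dtype=np.float64)  # C order, strides (64, 8)
print(a.strides)
print(axes_by_stride(a))    # last axis has the smallest stride
print(axes_by_stride(a.T))  # transposed view: first axis is now smallest

# Eight float64 values along each axis of `a`:
print(cache_lines_touched(8, 8, 8))   # contiguous: all share one line
print(cache_lines_touched(8, 64, 8))  # stride 64: one line per element
```

Under these assumptions, eight elements read along the small-stride axis come
from a single 64-byte line, while eight elements along the other axis pull in
eight lines, i.e. 512 bytes fetched to use 64.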

Can BLAS be used for some of these operations?

A. M. Archibald
