numarray speed question

grv grv575 at hotmail.com
Mon Aug 9 01:17:13 EDT 2004


cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote in 
<qnkn015ujoh.fsf at arbutus.physics.mcmaster.ca>:

>At some point, grv575 at hotmail.com (grv575) wrote:

>> Heh.  Try timing the example I gave (a += 5) using byteswapped vs.
>> byteswap().  It's fairly fast to do the byteswap.  If you go the
>> interpretation way (byteswapped) then all subsequent array operations
>> are at least an order of magnitude slower (5 million elements test
>> example).
>
>You mean something like
>a = arange(0, 5000000, type=Float64).byteswapped()
>a += 5
>
>vs.
>a = arange(0, 5000000, type=Float64)
>a.byteswap()
>a += 5
>
>? I get the same time for the a+=5 in each case -- and it's only twice
>as slow as operating on a non-byteswapped version. Note that numarray
>calls the ufunc add routine with non-byteswapped numbers; it takes a
>block, orders it correctly, then adds 5 to that, does the byteswap on
>the result, and stores that back. (You're not making a full copy of
>the array; just a large enough section at a time to do useful work.)
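To make that block-wise strategy concrete, here is a rough Python-level
sketch of the idea.  (The real version lives in numarray's C ufunc
machinery; the block size, .copy(), and byteorder bookkeeping here are
simplified for illustration only.)

    def add_scalar_byteswapped(a, scalar, block=100000):
        # Process the byteswapped array one chunk at a time, so only a
        # small working buffer is ever copied -- never the whole array.
        for start in range(0, len(a), block):
            chunk = a[start:start+block].copy()  # small working copy
            chunk.byteswap()            # put the bytes in native order
            chunk += scalar             # do the real work on native data
            chunk.byteswap()            # swap back to the array's byte order
            a[start:start+block] = chunk   # store the result back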

It must be using some sort of cache for the multiplication.  It seems that 
the first run takes 6 seconds and subsequent runs take 0.05 seconds, for 
either version.
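Something along these lines is what I mean by timing it (numarray,
Python 2; the numbers will obviously vary by machine, and the first call
picks up whatever one-time setup or caching is happening):

    import time
    from numarray import arange, Float64

    def time_add(a, repeats=5):
        # Time 'a += 5' several times so the first-call overhead shows
        # up separately from the steady-state cost.
        times = []
        for i in range(repeats):
            t0 = time.time()
            a += 5
            times.append(time.time() - t0)
        return times

    a1 = arange(0, 5000000, type=Float64).byteswapped()
    a2 = arange(0, 5000000, type=Float64)
    a2.byteswap()

    print "byteswapped():", time_add(a1)
    print "byteswap():   ", time_add(a2)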

>Maybe what you need is a package designed for *small* arrays ( < 1000).
>Simple C wrappers; just C doubles and ints, no byteswap, non-aligned.
>Maybe a fixed number of dimensions. Probably easy to throw something
>together using Pyrex. Or, wrap blitz++ with boost::python.
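For what it's worth, a quick (unscientific) way to see how much of the
cost is fixed per-call overhead rather than per-element work is something
along these lines -- if the time per call barely changes as the array
shrinks, tiny arrays are paying almost pure overhead:

    import time
    from numarray import arange, Float64

    def per_call_cost(n, loops):
        # Average cost of one 'a += 5' call on an n-element array.
        a = arange(0, n, type=Float64)
        t0 = time.time()
        for i in range(loops):
            a += 5
        return (time.time() - t0) / loops

    for n, loops in ((10, 100000), (1000, 10000), (1000000, 100)):
        print n, "elements:", per_call_cost(n, loops), "sec per call"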

I'll check out Numeric first.  I'd rather have a drop-in solution (which 
will hopefully get more optimized in future releases) than hack together my 
own wrappers.  Is it some purist mentality that's keeping numarray from 
dropping to C code for the time-critical routines?  Or can a lot of the 
speed issues be attributed to the overhead of using objects for the library 
(numarray does seem more general)?
