numarray speed question

Tim Hochberg tim.hochberg at ieee.org
Wed Aug 25 20:57:27 EDT 2004


grv575 wrote:
> Is it true that Python uses doubles for all its internal floating
> point arithmetic 

Yes.

> even if you're using something like numarray's
> Complex32?  

No. Numarray is an extension module and can use whatever numeric types 
it likes. A Float32 array, for instance, is stored as an array of C 
floats (assuming floats are 32 bits on your box, which they almost 
certainly are).
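As a quick illustration of the point (using modern NumPy, numarray's successor, since that's what runs today), you can check the per-element storage of each type directly:

```python
import numpy as np

# Each array type maps onto a fixed-width C type: float32 is a
# 4-byte C float, float64 an 8-byte C double.
a32 = np.zeros(5, dtype=np.float32)
a64 = np.zeros(5, dtype=np.float64)

print(a32.itemsize)  # 4 bytes per element
print(a64.itemsize)  # 8 bytes per element

# A plain Python float, by contrast, is always a C double internally,
# which is why a list of floats converts to float64 by default.
print(np.asarray([1.0]).dtype)  # float64
```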


> Is it possible to do single precision ffts in numarray or
> no?

I believe so, but I'm not sure off the top of my head. I recommend that 
you ask on numpy-discussion <numpy-discussion at lists.sourceforge.net> or 
peek at the implementation. It's possible that all FFTs are done in 
double precision, but I don't think so.
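For what it's worth, in modern NumPy (numarray's successor) the stock FFT routines do work in double precision regardless of the input type: single-precision input is upcast, and the result comes back as complex128. That's easy to verify:

```python
import numpy as np

# numpy.fft computes in double precision: a float32 input array
# is upcast, and the transform is returned as complex128.
x = np.ones(8, dtype=np.float32)
y = np.fft.fft(x)
print(y.dtype)  # complex128, even though the input was float32
```

If single-precision transforms matter, scipy.fft preserves the input precision (float32 in, complex64 out), so that's the place to look today.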

-tim

> 
> cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote in message news:<qnkisbrv8r9.fsf at arbutus.physics.mcmaster.ca>...
> 
>>At some point, grv575 at hotmail.com (grv) wrote:
>>
>>
>>>cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote in 
>>><qnkn015ujoh.fsf at arbutus.physics.mcmaster.ca>:
>>>
>>>
>>>>At some point, grv575 at hotmail.com (grv575) wrote:
>>
>> 
>>
>>>>>Heh.  Try timing the example I gave (a += 5) using byteswapped vs.
>>>>>byteswap().  It's fairly fast to do the byteswap.  If you go the
>>>>>interpretation way (byteswapped) then all subsequent array operations
>>>>>are at least an order of magnitude slower (5 million elements test
>>>>>example).
>>>>
>>>>You mean something like
>>>>a = arange(0, 5000000, type=Float64).byteswapped()
>>>>a += 5
>>>>
>>>>vs.
>>>>a = arange(0, 5000000, type=Float64)
>>>>a.byteswap()
>>>>a += 5
>>>>
>>>>? I get the same time for the a+=5 in each case -- and it's only twice
>>>>as slow as operating on a non-byteswapped version. Note that numarray
>>>>calls the ufunc add routine with non-byteswapped numbers; it takes a
>>>>block, orders it correctly, then adds 5 to that, does the byteswap on
>>>>the result, and stores that back. (You're not making a full copy of
>>>>the array; just a large enough section at a time to do useful work.)
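[A rough translation of the quoted example into modern NumPy, just to show that arithmetic on a byteswapped array gives the same numbers as the native-order route; the view/byteswap API differs slightly from numarray's byteswapped()/byteswap() pair:]

```python
import numpy as np

a = np.arange(0, 10, dtype=np.float64)

# Route 1: swap the bytes AND flip the byte-order flag, so the values
# are unchanged but stored in non-native order. Ufuncs handle the
# non-native order transparently (at some speed cost).
b = a.byteswap().view(a.dtype.newbyteorder())
b = b + 5

# Route 2: plain native-order arithmetic.
c = a + 5

print(np.array_equal(b, c))  # True: same values either way
```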
>>>
>>>It must be using some sort of cache for the multiplication.  Seems like on 
>>>the first run it takes 6 seconds and subsequently .05 seconds for either 
>>>version.
>>
>>There is. The ufunc for the addition gets cached, so the first time
>>takes longer (but not that much???)
>>
>>
>>>>Maybe what you need is a package designed for *small* arrays ( < 1000).
>>>>Simple C wrappers; just C doubles and ints, no byteswap, non-aligned.
>>>>Maybe a fixed number of dimensions. Probably easy to throw something
>>>>together using Pyrex. Or, wrap blitz++ with boost::python.
>>>
>>>I'll check out Numeric first.  Would rather have a drop-in solution (which 
>>>hopefully will get more optimized in future releases) rather than hacking 
>>>my own wrappers.  Is it some purist mentality that's keeping numarray from 
>>>dropping to C code for the time-critical routines?  Or can a lot of the 
>>>speed issues be attributed to the overhead of using objects for the library 
>>>(numarray does seem more general)?
>>
>>It's the object overhead in numarray. The developers moved stuff up to
>>Python, where it's more flexible to handle. Numeric is faster for
>>small arrays (say < 3000), but numarray is much better at large
>>arrays. I have some speed comparisons at
>>http://arbutus.mcmaster.ca/dmc/numpy/
>>
>>I did a simple wrapper using Pyrex the other night for a vector of
>>doubles (it just does addition, so it's not much good :-) It's twice
>>as fast as Numeric, so I might give it a further try.



