[Numpy-discussion] optimizing power() for complex and real cases

Wed Feb 22 19:56:03 EST 2006

David M. Cooke wrote:

>Tim Hochberg <tim.hochberg at cox.net> writes:
>
>  
>
>>David M. Cooke wrote:
>>    
>>
[SNIP]

>
>I've gone through your code you checked in, and fixed it up. Looks
>good. One side effect is that
>
>def zl(x):
>    a = ones_like(x)
>    a[:] = 0
>    return a
>
>is now faster than zeros_like(x) :-)
>  
>
I noticed that ones_like was faster than zeros_like, but I didn't think 
to try that. That's pretty impressive considering how ridicuously easy 
it was to write.

>One problem I had is that in PyArray_SetNumericOps, the "copy" method
>wasn't picked up on. It may be due to the order of initialization of
>the ndarray type, or something (since "copy" isn't a ufunc, it's
>initialized in a different place). I couldn't figure out how to fiddle
>that, so I replaced the x.copy() call with a call to PyArray_Copy().
>  
>
Interesting. It worked fine here.

>  
>
>>>Yes; because it's the implementation of __pow__, the second argument can
>>>be anything.
>>>
>>>
>>>      
>>>
>>No, you misunderstand.. What I was talking about was that the *first*
>>argument can also be something that's not a PyArrayObject, despite the
>>functions signature.
>>    
>>
>
>Ah, I suppose that's because the power slot in the number protocol
>also handles __rpow__.
>  
>
That makes sense. It was giving me fits whatever the cause.

[SNIP]

>
>>but the real fix would be to dig into
>>PyArray_EnsureArray and see why it's slow for Python_Ints. It is much
>>faster for numarray scalars.
>>    
>>
>
>Right; that needs to be looked at.
>  
>
It doesn't look to bad. But I haven't had a chance to try to do anything 
about it yet.

>>Another approach is to actually compute (x*x)*(x*x) for pow(x,4) at
>>the level of array_power. I think I could make this work. It would
>>probably work well for medium size arrays, but might well make things
>>worse for large arrays that are limited by memory bandwidth since it
>>would need to move the array from memory into the cache multiple times.
>>    
>>
>
>I don't like that; I think it would be better memory-wise to do it
>elementwise. Not just speed, but size of intermediate arrays.
>  
>
Yeah, for a while I was real hot on the idea since I could do everything 
without messing with ufuncs. But then I decided not to pursue it because 
I thought it would be slow because of memory usage -- it would be 
pullling data into the cache over and over again and I think that would 
slow things down a lot.

[SNIP]

>'int_power' we could do; that would the next step I think. The half
>integer powers we could maybe leave; if you want x**(-3/2), for
>instance, you could do y = x**(-1)*sqrt(x) (or do y = x**(-1);
>sqrt(y,y) if you're worried about temporaries).
>
>Or, 'fast_power' could be documented as doing the optimizations for
>integer and half-integer _scalar_ exponents, up to a certain size,
>like 100), and falling back on pow() if necessary. I think we could do
>a precomputation step to split the exponent into appropiate squarings
>and such that'll make the elementwise loop faster.
>
There's a clever implementation of this in complexobject.c.  Speaking of 
complexobject.c, I did implement fast integer powers for complex objects 
at the nc_pow level. For small powers at least, it's over 10 times as 
fast. And, since it's at the nc_pow level it works for matrix matrix 
powers as well. My implementation is arguably slightly faster than 
what's in complexobject, but I won't have a chance to check it in till 
next week -- I'm off for some snowboarding tomorrow.

I kind of like power and scalar_power. Then ** could be advertised as 
calling scalar_power for scalars and power for arrays. Scalar power 
would do optimizations on integer and half_integer powers. Of course 
there's no real way to enforce that scalar power is passed scalars, 
since presumably it would be a ufunc, short of making _scalar_power a 
ufunc instead and doing something like:

def scalar_power(x, y):
    "compute x**y, where y is a scalar optimizing integer and half 
integer powers possibly at some minor loss of accuracy"
    if not is_scalar(y): raise ValuerError("Naughty!!")
    return _scalar_power(x,y)

> Half-integer
>exponents are exactly representable as doubles (up to some number of
>course), so there's no chance of decimal-to-binary conversions making
>things look different. That might work out ok. Although, at that point
>I'd suggest we make it 'power', and have 'rawpower' (or ????) as the
>version that just uses pow().
>  
>
>Another point is to look at __div__, and use reciprocal if the
>dividend is 1.
>  
>
That would be easy, but wouldn't it be just as easy to optimize __div__ 
for scalar divisions. Should probably check that this isn't just as fast 
since it would be a lot more general.

>
>I've added a page to the developer's wiki at
>http://projects.scipy.org/scipy/numpy/wiki/PossibleOptimizationAreas
>to keep a list of areas like that to look into if someone has time :-)
>  
>
Ah, good plan.

-tim