[Numpy-discussion] optimizing power() for complex and real cases

Sun Feb 19 16:39:00 EST 2006

On Sat, Feb 18, 2006 at 06:17:47PM -0700, Tim Hochberg wrote:
> 
> OK, I now have a faily clean implementation in C of:
> 
> def __pow__(self, p):
>    if p is not a scalar:
>       return power(self, p)
>    elif p == 1:
>       return p
>    elif p == 2:
>       return square(self)
> #    elif p == 3:
> #       return cube(self)
> #    elif p == 4:
> #       return power_4(self)
> #    elif p == 0:
> #       return ones(self.shape, dtype=self.dtype)
> #    elif p == -1:
> #       return 1.0/self
>    elif p == 0.5:
>       return sqrt(self)
> 
> 
> First a couple of technical questions, then on to the philosophical portion 
> of this note.
> 
> 1. Is there a nice fast way to get a matrix filled with ones from C. I've 
> been tempted to write a ufunc 'ones_like', but I'm afraid that might be 
> considered inappropriate.
> 
> 2. Are people aware that array_power is sometimes passed non arrays as its 
> first argument? Despite having the signature:
> 
> array_power(PyArrayObject *a1, PyObject *o2) 
> 
> This caused me almost no end of headaches, not to mention crashes during 
> numpy.test().

Yes; because it's the implementation of __pow__, the second argument can
be anything.

> I'll check this into the power_optimization branch RSN, hopefully with a 
> fix for the zero power case. Possibly also after extending it to inplace 
> power as well.
> 
> 
> OK, now on to more important stuff. As I've been playing with this my 
> opinion has gone in circles a couple of times. I now think the issue of 
> optimizing integer powers of complex numbers and integer powers of floats 
> are almost completely different. Because complex powers are quite slow and 
> relatively inaccurate, it is appropriate to optimize them for integer 
> powers at the level of nc_pow. This should be just a matter of liberal 
> borrowing from complexobject.c, but I haven't tried it yet.

Ok.

> On the other hand, real powers are fast enough that doing anything at the 
> single element level is unlikely to help. So in that case we're left with 
> either optimizing the cases where the dimension is zero as David has done, 
> or optimizing at the __pow__ (AKA array_power) level as I've done now based 
> on David's original suggestion. This second approach is faster because it 
> avoids the mysterious python scalar -> zero-D array conversion overhead. 
> However, it suffers if we want to optimize lots of different powers since 
> one needs a ufunc for each one. So the question becomes, which powers 
> should we optimize?

Hmm, ufuncs are passed a void* argument for passing info to them. Now,
what that argument is defined when the ufunc is created, but maybe
there's a way to piggy-back on it.

> My latest thinking on this is that we should optimize only those cases 
> where the optimized result is no less accurate than that produced by pow. 
> I'm going to assume that all C operations are equivalently accurate, so 
> pow(x,2) has roughly the same amount of error as x*x. (Something on the 
> order of 0.5 ULP I'd guess). In that case:
>   pow(x, -1) -> 1 / x
>   pow(x, 0) -> 1
>   pow(x, 0.5) -> sqrt(x)
>   pow(x, 1) -> x
>   pow(x, 2) -> x*x
> can all be implemented in terms of multiply or divide with the same 
> accuracy as the original power methods. Once we get beyond these, the error 
> will go up progressively.
> 
> The minimal set described above seems like it should be relatively 
> uncontroversial and it's what I favor. Once we get beyond this basic set, 
> we would need to reach some sort of consensus on how much additional error 
> we are willing to tolerate for optimizing these extra cases. You'll notice 
> that I've changed my mind, yet again, over whether to optimize A**0.5. 
> Since the set of additional ufuncs needed in this case is relatively small, 
> just square and inverse (==1/x), this minimal set works well if optimizing 
> in pow as I've done.

Ok. I'm still not happy with the speed of pow(), though. I'll have to
sit and look at. We may be able to optimize integer powers better.
And there's another area: integer powers of integers. Right now that
uses pow(), whereas we might be able to do better. I'm looking into
that. A framework for that could be helpful for the complex powers too.

Too bad we couldn't make a function generator :-) [Well, we could using
weave...]

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca