[Numpy-discussion] New ufuncs

Wed Nov 5 23:12:32 EST 2008

On Wed, Nov 5, 2008 at 3:09 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:

>
>
> On Wed, Nov 5, 2008 at 2:41 PM, Stéfan van der Walt <stefan at sun.ac.za> wrote:
>
>> 2008/11/5 T J <tjhnson at gmail.com>:
>> > numpy, it seems that logadd or logaddexp is probably a more fitting
>> > name.  So long as it is documented, I doubt it matters much though...
>>
>> Please don't call it logadd.  `logaddexp` or `logsumexp` are both
>> fine, but the `exp` part is essential in emphasising that you are not
>> calculating a+b using logs.
>>
>
> I'm inclined to go with logaddexp and add logsumexp as an alias for
> logaddexp.reduce. But I'll wait until tomorrow to see if there are more
> comments.
>

Here are some timings of the ufunc versus implementations built from
currently available functions. I've implemented the ufunc as logaddexp and,
for quick convenience, defined the corresponding plain-Python functions as
logadd and logsum. Results:

In [15]: def logsum(x) :
   ....:     off = x.max(axis=0)
   ....:     return off + log(sum(exp(x - off), axis=0))
   ....:

In [57]: def logadd(x,y) :
   ....:     max1 = maximum(x,y)
   ....:     min1 = minimum(x,y)
   ....:     return max1 + log1p(exp(min1 - max1))
   ....:
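
The two definitions above can be checked against the ufunc with a short
standalone script. This is only a sketch, not part of the original session: it
assumes a NumPy release that ships `np.logaddexp`, and the seed, array shapes,
and variable names are illustrative choices. It also shows why the `exp` in
the name matters (the ufunc computes log(exp(x) + exp(y)), not x + y) and why
shifting by the maximum avoids overflow.

```python
import numpy as np


def logadd(x, y):
    """log(exp(x) + exp(y)), computed relative to the larger argument."""
    hi = np.maximum(x, y)
    lo = np.minimum(x, y)
    return hi + np.log1p(np.exp(lo - hi))


def logsum(x):
    """log(sum(exp(x))) along axis 0, shifted by the per-column maximum."""
    off = x.max(axis=0)
    return off + np.log(np.sum(np.exp(x - off), axis=0))


# Illustrative inputs; the seed and shapes are arbitrary assumptions.
rng = np.random.default_rng(0)
a = rng.random((50, 40))
b = rng.random((50, 40))

# The hand-written versions should agree with the ufunc and its reduction.
pairwise_ok = np.allclose(logadd(a, b), np.logaddexp(a, b))
reduce_ok = np.allclose(logsum(a), np.logaddexp.reduce(a, axis=0))

# logaddexp works on logs of the operands: it returns log(p + q).
p, q = 0.25, 0.5
identity_ok = np.isclose(np.logaddexp(np.log(p), np.log(q)), np.log(p + q))

# A naive log(exp(x) + exp(y)) overflows for large inputs; the shifted form
# stays finite and returns 1000 + log(2) here.
big = np.float64(1000.0)
with np.errstate(over="ignore"):
    naive = np.log(np.exp(big) + np.exp(big))
stable = logadd(big, big)
```

Here `naive` overflows to infinity, while `stable` remains finite because
only the difference `min1 - max1`, which is never positive, is exponentiated.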

In [61]: a = np.random.random(size=(1000,1000))

In [62]: b = np.random.random(size=(1000,1000))

In [63]: time x = logadd(a,b)
CPU times: user 0.15 s, sys: 0.02 s, total: 0.17 s
Wall time: 0.17 s

In [65]: time x = logaddexp(a,b)
CPU times: user 0.12 s, sys: 0.00 s, total: 0.13 s
Wall time: 0.13 s

In [67]: time x = logsum(a)
CPU times: user 0.10 s, sys: 0.01 s, total: 0.11 s
Wall time: 0.11 s

In [69]: time x = logaddexp.reduce(a, axis=0)
CPU times: user 0.14 s, sys: 0.00 s, total: 0.14 s
Wall time: 0.14 s

It looks like the ufunc implementation is a bit faster for adding two
arrays (0.13 s vs 0.17 s), but for summing along an axis logsum is a bit
faster (0.11 s vs 0.14 s). This isn't unexpected, because repeated calls to
logaddexp aren't the most efficient way to sum. For smaller arrays, say
10x10, the ufunc wins in both cases by a significant margin (about 2x)
because of function call overhead. What sort of array sizes do folks
typically use?
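
For anyone who wants to repeat the comparison at other sizes, here is a rough
sketch using the standard-library `timeit` module instead of IPython's `time`
magic. It assumes a NumPy release that ships `np.logaddexp`; the helper names
`best_time` and `reduce_axis0`, the repeat counts, and the two sizes are
illustrative assumptions, and the numbers it prints will depend on the machine
and NumPy version.

```python
import timeit

import numpy as np


def logadd(x, y):
    """Pairwise log(exp(x) + exp(y)) built from existing NumPy functions."""
    hi = np.maximum(x, y)
    lo = np.minimum(x, y)
    return hi + np.log1p(np.exp(lo - hi))


def logsum(x):
    """log(sum(exp(x))) along axis 0, shifted by the per-column maximum."""
    off = x.max(axis=0)
    return off + np.log(np.sum(np.exp(x - off), axis=0))


def reduce_axis0(x):
    """Sum along axis 0 by repeated application of the logaddexp ufunc."""
    return np.logaddexp.reduce(x, axis=0)


def best_time(func, *args, number=5, repeat=3):
    """Best per-call wall time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


timings = {}
for n in (10, 1000):  # small and large square arrays
    rng = np.random.default_rng(n)
    a = rng.random((n, n))
    b = rng.random((n, n))
    timings[n] = {
        "logadd": best_time(logadd, a, b),
        "logaddexp": best_time(np.logaddexp, a, b),
        "logsum": best_time(logsum, a),
        "logaddexp.reduce": best_time(reduce_axis0, a),
    }

for n, row in timings.items():
    for name, seconds in row.items():
        print(f"{n:>5} x {n:<5} {name:<18} {seconds * 1e6:12.1f} us")
```

Taking the minimum over repeats reduces noise from other processes, which
matters most for the 10x10 case where per-call overhead dominates.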

Chuck