[SciPy-User] vectorized version of logsumexp? (from scipy.maxentropy)
Warren Weckesser
warren.weckesser at enthought.com
Sat Oct 17 17:37:46 EDT 2009
Hi,
Here's the code for logsumexp from scipy.maxentropy:
----------
def logsumexp(a):
"""Compute the log of the sum of exponentials log(e^{a_1}+...e^{a_n})
of the components of the array a, avoiding numerical overflow.
"""
a = asarray(a)
a_max = a.max()
return a_max + log((exp(a-a_max)).sum())
----------
What's missing is an 'axis' keyword argument to restrict the reduction
to a single axis of the array.
Here's a version with an 'axis' keyword:
----------
def my_logsumexp(a, axis=None):
if axis is None:
# Use the scipy.maxentropy version.
return logsumexp(a)
a = asarray(a)
shp = list(a.shape)
shp[axis] = 1
a_max = a.max(axis=axis)
s = log(exp(a - a_max.reshape(shp)).sum(axis=axis))
lse = a_max + s
return lse
----------
The attached script includes this function, and runs your calculation
twice, once as you originally wrote it, and once using my_logsumexp.
Depending on the number of vectors, the new version is 75 to 100 times
faster.
Cheers,
Warren
per freem wrote:
> hi all,
>
> in my code, i use the function 'logsumexp' from scipy.maxentropy a
> lot. as far as i can tell, this function has no vectorized version
> that works on an m-x-n matrix. i might be doing something wrong here,
> but i found that this function can run extremely slowly if used as
> follows: i have an array of log probability vectors, such that each
> column sums to one. i want to simply iterate over each column and
> renormalize it, using exp(col - logsumexp(col)). here is the code that
> i used to profile this operation:
>
> from scipy import *
> from numpy import *
> from numpy.random.mtrand import dirichlet
> from scipy.maxentropy import logsumexp
> import time
>
> # build an array of probability vectors. each column represents a
> probability vector.
> num_vectors = 1000000
> log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
> # now renormalize each column, using logsumexp
> norm_prob_vectors = []
> t1 = time.time()
> for n in range(num_vectors):
> norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
> norm_prob_vectors.append(norm_p)
> t2 = time.time()
> norm_prob_vectors = array(norm_prob_vectors)
> print "logsumexp renormalization (%d many times) took %s seconds."
> %(num_vectors, str(t2-t1))
>
> i found that even with only 100,000 elements, this code takes about 5 seconds:
>
> logsumexp renormalization (100000 many times) took 5.07085394859 seconds.
>
> with 1 million elements, it becomes prohibitively slow:
>
> logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.
>
> is there a way to speed this up? most vectorized operations that work
> on matrices in numpy/scipy are incredibly fast and it seems like a
> vectorized version of logsumexp should be near instant on this scale.
> is there a way to rewrite the above snippet so that it's faster?
>
> thanks very much for your help.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: my_logsumexp.py
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20091017/748893b4/attachment.ksh>
More information about the SciPy-User
mailing list