[SciPy-User] vectorized version of logsumexp? (from scipy.maxentropy)

Warren Weckesser warren.weckesser at enthought.com
Sat Oct 17 17:37:46 EDT 2009


Hi,

Here's the code for logsumexp from scipy.maxentropy:

----------
def logsumexp(a):
    """Compute the log of the sum of exponentials log(e^{a_1}+...e^{a_n})
    of the components of the array a, avoiding numerical overflow.
    """
    a = asarray(a)
    a_max = a.max()
    return a_max + log((exp(a-a_max)).sum())
----------
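
A quick illustration of why the subtraction of a_max matters (not from the
scipy source, just a small check): with large inputs the naive expression
overflows, while logsumexp does not.

----------
import numpy as np
from scipy.maxentropy import logsumexp

a = np.array([1000.0, 1000.0])
print np.log(np.exp(a).sum())   # exp overflows, so this prints inf
print logsumexp(a)              # prints 1000.69314718 (i.e. 1000 + log(2))
----------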

What's missing is an 'axis' keyword argument to restrict the reduction 
to a single axis of the array.

Here's a version with an 'axis' keyword:

----------
from numpy import asarray, log, exp
from scipy.maxentropy import logsumexp


def my_logsumexp(a, axis=None):
    """Like logsumexp, but reduces along only the given axis."""
    if axis is None:
        # No axis given: fall back to the scipy.maxentropy version.
        return logsumexp(a)
    a = asarray(a)
    # Shape of the result with the reduced axis kept as length 1,
    # so that a_max broadcasts against a.
    shp = list(a.shape)
    shp[axis] = 1
    a_max = a.max(axis=axis)
    s = log(exp(a - a_max.reshape(shp)).sum(axis=axis))
    lse = a_max + s
    return lse
----------
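
A quick sanity check on a small array, comparing the axis version against
calling logsumexp column by column:

----------
import numpy as np
from scipy.maxentropy import logsumexp

a = np.log(np.array([[0.2, 0.5],
                     [0.3, 0.1],
                     [0.5, 0.4]]))
print my_logsumexp(a, axis=0)
print np.array([logsumexp(a[:, j]) for j in range(a.shape[1])])
# Both agree (approximately [ 0.  0.]), since each column of exp(a) sums to 1.
----------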

The attached script includes this function, and runs your calculation 
twice, once as you originally wrote it, and once using my_logsumexp.  
Depending on the number of vectors, the new version is 75 to 100 times 
faster.
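
In case the attachment gets scrubbed, the vectorized replacement for your
loop is essentially the following sketch (assuming the same layout as your
snippet, i.e. each column of log_prob_vectors is one log-probability vector):

----------
import numpy as np
from numpy.random.mtrand import dirichlet

num_vectors = 1000000
log_prob_vectors = np.transpose(np.log(dirichlet([1, 1, 1], num_vectors)))

# One call over axis=0 replaces the per-column Python loop.
# (my_logsumexp as defined above; the result has shape (3, num_vectors),
# whereas the loop built a (num_vectors, 3) array.)
norm_prob_vectors = np.exp(log_prob_vectors - my_logsumexp(log_prob_vectors, axis=0))
----------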

Cheers,

Warren




per freem wrote:
> hi all,
>
> in my code, i use the function 'logsumexp' from scipy.maxentropy a
> lot. as far as i can tell, this function has no vectorized version
> that works on an m-x-n matrix. i might be doing something wrong here,
> but i found that this function can run extremely slowly if used as
> follows: i have an array of log probability vectors, such that each
> column sums to one. i want to simply iterate over each column and
> renormalize it, using exp(col - logsumexp(col)). here is the code that
> i used to profile this operation:
>
> from scipy import *
> from numpy import *
> from numpy.random.mtrand import dirichlet
> from scipy.maxentropy import logsumexp
> import time
>
> # build an array of probability vectors.  each column represents a
> # probability vector.
> num_vectors = 1000000
> log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
> # now renormalize each column, using logsumexp
> norm_prob_vectors = []
> t1 = time.time()
> for n in range(num_vectors):
>     norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
>     norm_prob_vectors.append(norm_p)
> t2 = time.time()
> norm_prob_vectors = array(norm_prob_vectors)
> print "logsumexp renormalization (%d many times) took %s seconds."
> %(num_vectors, str(t2-t1))
>
> i found that even with only 100,000 elements, this code takes about 5 seconds:
>
> logsumexp renormalization (100000 many times) took 5.07085394859 seconds.
>
> with 1 million elements, it becomes prohibitively slow:
>
> logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.
>
> is there a way to speed this up? most vectorized operations that work
> on matrices in numpy/scipy are incredibly fast and it seems like a
> vectorized version of logsumexp should be near instant on this scale.
> is there a way to rewrite the above snippet so that it's faster?
>
> thanks very much for your help.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>   

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: my_logsumexp.py
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20091017/748893b4/attachment.ksh>

