[SciPy-Dev] Proposed improvement: recursive integrator for arbitrary dimensionality using QUADPACK

Mon Dec 12 15:17:59 EST 2011

That's quite a mouthful. In simpler terms: I've been playing with 
scipy/integrate/quadpack.py lately, specifically using dblquad and 
tplquad to meet my needs for continuous mutual information 
(https://en.wikipedia.org/wiki/Mutual_information).

After implementing this, I refactored my algorithm to perform a 
deterministic walk of multivariate mutual information 
(https://en.wikipedia.org/wiki/Interaction_information) for cases where 
the shared information increased. And so entered my dilemma: what I 
needed was a general-purpose, canned integrator that could handle an 
arbitrary number of terms during multiple integration using 
fundamentally the same algorithm (to minimize work).

Here's what I came up with:

def _infuncn(x,func,a,b,count,epsabs,epsrel,more_args):
     myargs = (x,) + more_args
     if count == 1:
         return quad(func,a,b,args=myargs,epsabs=epsabs,epsrel=epsrel)[0]
     else:
         return 
quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,myargs),epsabs=epsabs,epsrel=epsrel)

def nquad(func, a, b, args=(), epsabs=1.49e-8, epsrel=1.49e-8, count=2):
     ''' Recursive, N-depth integrator with number of variables given '''

     # Preconditions
     if not isinstance( count, int ):
         raise TypeError('Variable "count" must be an integer value')
     if count < 2:
         raise ValueError('Variable "count" must be at least 2 to perform \
             meaningful multivariate integration. Consider using 
quad(...) instead.')

     return 
quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,args),epsabs=epsabs,epsrel=epsrel)

Basically, what you get is this construct that recursively calls into 
QUADPACK with a series of simplifying assumptions (ie, fixed domain, 
equivalent epsilon values propagated down to all invocations of quad) 
and allowable calls from an arbitrarily-shaped function pointer. 
Invoking this function for any number of terms is as easy as abusing the 
lambda function and argument pointers in Python, like so:

nquad(lambda *args: my_nspace_function(args), min_bound, max_bound, 
count=number_of_args)

For example:

# Define INF as float('inf')

points = [[1,2,3],[1,1,2],[2,2,6]]
gkde_n = gaussian_kde(points)

# ... compute Shannon entropy, store as variable "entropy".

mutual_info = lambda *args: gkde_n(args) * \
                    math.log((gkde_n(args) / (entropy_mult + \
MIN_DOUBLE)) + MIN_DOUBLE)

# Compute MI(X1..Xn)
(minfo_n, err_n) = nquad(mutual_info, -INF, INF, count=scale)

I bring this up here for two reasons. The first is I'd find this to be 
greatly beneficial to the current Python interface into QUADPACK, 
insofar as it solves the general case instead of forcing programmers to 
rely on (for example) invocations directly into quad, dblquad, or 
tplquad. The second is my surprise that a function to compute the 
discrete and continuous Shannon-Weaver mutual information does not 
appear to be extant in SciPy or NumPy, which I found greatly distressing 
given their usefulness in discovering a select set of exceptionally 
useful properties about paired data sets that are not well expressed in 
monotonically-dependent tests for independence like Pearson's or 
Spearman's correlations.

Hence my purpose bringing these issues up for general review. I am 
absolutely convinced that my prototypical code above may be greatly 
improved, and I would wholeheartedly support it insofar as I've been 
spinning my wheels over the problem of a robust, fast implementation. 
What I have is an implementation that reliably offers a good 
approximation, complains bitterly when integration fails to rapidly 
converge, and which makes relatively naive assumptions that could be 
greatly improved upon by developers more actively involved in the code 
than I am.

Thanks for reading. I do hope someone out there finds this post valuable.

- Tom Robinson