[SciPy-Dev] Proposed improvement: recursive integrator for arbitrary dimensionality using QUADPACK
Thomas Robinson
trobinson at systemsbiology.org
Mon Dec 12 15:17:59 EST 2011
That's quite a mouthful. In simpler terms: I've been playing with
scipy/integrate/quadpack.py lately, specifically using dblquad and
tplquad to meet my needs for continuous mutual information
(https://en.wikipedia.org/wiki/Mutual_information).
After implementing this, I refactored my algorithm to perform a
deterministic walk of multivariate mutual information
(https://en.wikipedia.org/wiki/Interaction_information) for cases where
the shared information increased. And so entered my dilemma: what I
needed was a general-purpose, canned integrator that could handle an
arbitrary number of terms during multiple integration using
fundamentally the same algorithm (to minimize work).
Here's what I came up with:
def _infuncn(x,func,a,b,count,epsabs,epsrel,more_args):
myargs = (x,) + more_args
if count == 1:
return quad(func,a,b,args=myargs,epsabs=epsabs,epsrel=epsrel)[0]
else:
return
quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,myargs),epsabs=epsabs,epsrel=epsrel)
def nquad(func, a, b, args=(), epsabs=1.49e-8, epsrel=1.49e-8, count=2):
''' Recursive, N-depth integrator with number of variables given '''
# Preconditions
if not isinstance( count, int ):
raise TypeError('Variable "count" must be an integer value')
if count < 2:
raise ValueError('Variable "count" must be at least 2 to perform \
meaningful multivariate integration. Consider using
quad(...) instead.')
return
quad(_infuncn,a,b,(func,a,b,count-1,epsabs,epsrel,args),epsabs=epsabs,epsrel=epsrel)
Basically, what you get is this construct that recursively calls into
QUADPACK with a series of simplifying assumptions (ie, fixed domain,
equivalent epsilon values propagated down to all invocations of quad)
and allowable calls from an arbitrarily-shaped function pointer.
Invoking this function for any number of terms is as easy as abusing the
lambda function and argument pointers in Python, like so:
nquad(lambda *args: my_nspace_function(args), min_bound, max_bound,
count=number_of_args)
For example:
# Define INF as float('inf')
points = [[1,2,3],[1,1,2],[2,2,6]]
gkde_n = gaussian_kde(points)
# ... compute Shannon entropy, store as variable "entropy".
mutual_info = lambda *args: gkde_n(args) * \
math.log((gkde_n(args) / (entropy_mult + \
MIN_DOUBLE)) + MIN_DOUBLE)
# Compute MI(X1..Xn)
(minfo_n, err_n) = nquad(mutual_info, -INF, INF, count=scale)
I bring this up here for two reasons. The first is I'd find this to be
greatly beneficial to the current Python interface into QUADPACK,
insofar as it solves the general case instead of forcing programmers to
rely on (for example) invocations directly into quad, dblquad, or
tplquad. The second is my surprise that a function to compute the
discrete and continuous Shannon-Weaver mutual information does not
appear to be extant in SciPy or NumPy, which I found greatly distressing
given their usefulness in discovering a select set of exceptionally
useful properties about paired data sets that are not well expressed in
monotonically-dependent tests for independence like Pearson's or
Spearman's correlations.
Hence my purpose bringing these issues up for general review. I am
absolutely convinced that my prototypical code above may be greatly
improved, and I would wholeheartedly support it insofar as I've been
spinning my wheels over the problem of a robust, fast implementation.
What I have is an implementation that reliably offers a good
approximation, complains bitterly when integration fails to rapidly
converge, and which makes relatively naive assumptions that could be
greatly improved upon by developers more actively involved in the code
than I am.
Thanks for reading. I do hope someone out there finds this post valuable.
- Tom Robinson
More information about the SciPy-Dev
mailing list