[SciPy-User] Probability Density Estimation

Hans Georg Schaathun hg+scipy at schaathun.net
Mon Apr 11 05:52:50 EDT 2011


On Wed, Apr 06, 2011 at 11:47:26AM -0400, josef.pktd at gmail.com wrote:
> My objective was measures for general (non-linear) dependence between
> two random variables, and tests for independence.

OK.

The measure I need to estimate is

  I(X;Y) = H(X) - 0.5( H(X|Y=0) + H(X|Y=1) )

where X is continuous and multivariate normal, and Y is Boolean.  The
0.5 weights correspond to equally likely classes, in which case the
average above is just the conditional entropy H(X|Y).
The entropy H is estimated as

  H(X) ~= -(1/n) sum log f(x)

where the sum runs over all n observations x of X, and f is the KDE.
This follows Ahmad and Lin (IEEE Trans IT 1989).
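
In code, the estimate is a one-liner on top of scipy.stats.gaussian_kde.
A minimal sketch (the toy data and names are mine, for illustration
only):

  import numpy as np
  from scipy.stats import gaussian_kde

  def entropy_resub(sample, kde):
      # Resubstitution estimate: H(X) ~= -(1/n) sum_i log f(x_i)
      return -np.mean(np.log(kde.evaluate(sample)))

  # Toy data: 500 draws from a bivariate normal, shaped (d, n) with
  # one column per observation, as gaussian_kde expects.
  rng = np.random.default_rng(0)
  x = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], 500).T
  print(entropy_resub(x, gaussian_kde(x)))   # true value is ~2.69 nats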

I initially estimated the three KDEs independently, one for the full
set of X observations and one for each class (Y=0 and Y=1), each with
its own bandwidth.  This gave some silly results.

A much better approach is to calculate the bandwidth once: I calculate
it from the full set of X observations and then use the same bandwidth
matrix for all three KDEs.  Thanks to all of you for getting me on
track.
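
Since gaussian_kde only exposes a scalar bandwidth factor (it always
scales each sample's own covariance), sharing one bandwidth matrix
means either patching its internals, as I did, or hand-rolling a small
fixed-bandwidth KDE.  A rough sketch of the latter, assuming Scott's
rule on the full sample (illustrative code, not my cleaned-up version):

  import numpy as np

  def make_kde(data, bw):
      # Gaussian KDE with a fixed, shared bandwidth (covariance) matrix.
      # data: (d, n) array, one column per observation; bw: (d, d).
      inv_bw = np.linalg.inv(bw)
      norm = 1.0 / (data.shape[1] * np.sqrt(np.linalg.det(2 * np.pi * bw)))
      def f(pts):
          diff = data[:, :, None] - pts[:, None, :]          # (d, n, m)
          q = np.einsum('inm,ij,jnm->nm', diff, inv_bw, diff)
          return norm * np.exp(-0.5 * q).sum(axis=0)
      return f

  def entropy_resub(sample, f):      # as in the sketch above
      return -np.mean(np.log(f(sample)))

  # Toy data: two equally likely classes with different means.
  rng = np.random.default_rng(0)
  x0 = rng.multivariate_normal([0, 0], np.eye(2), 500).T
  x1 = rng.multivariate_normal([1, 1], np.eye(2), 500).T
  x = np.hstack([x0, x1])

  # One bandwidth matrix from the full sample (Scott's rule), reused
  # for all three KDEs.
  d, n = x.shape
  bw = np.cov(x) * n ** (-2.0 / (d + 4))
  mi = (entropy_resub(x, make_kde(x, bw))
        - 0.5 * (entropy_resub(x0, make_kde(x0, bw))
                 + entropy_resub(x1, make_kde(x1, bw))))
  print(mi)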

I'll see if I can clean up my annotated and augmented version of
gaussian_kde for publication.  I'll share it when it is readable and
consistent :-)

If you have any ideas or are curious about further details, I shall be
happy to listen and discuss it further.

-- 
:-- Hans Georg


