[SciPy-User] Probability Density Estimation
Hans Georg Schaathun
hg+scipy at schaathun.net
Mon Apr 11 05:52:50 EDT 2011
On Wed, Apr 06, 2011 at 11:47:26AM -0400, josef.pktd at gmail.com wrote:
> My objective was measures for general (non-linear) dependence between
> two random variables, and tests for independence.
OK.
The measure I need to estimate is
I(X;Y) = H(X) - 0.5*( H(X|Y=0) + H(X|Y=1) )

where X is continuous and multivariate normal, Y is boolean, and the
0.5 weights assume the two classes are equally likely, P(Y=0) = P(Y=1) = 1/2.
The entropy H is estimated as

H(X) ~= -(1/n) sum log f(x)

where the sum runs over the n observations x of X and f is the KDE.
This is following Ahmad and Lin (IEEE Trans IT 1989).
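As a rough sketch of that plug-in estimator (my own toy example, not
code from this thread), using scipy's gaussian_kde on a 1-D standard
normal sample, where the estimate can be checked against the analytic
entropy 0.5*log(2*pi*e):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # toy 1-D sample

# Fit a Gaussian KDE and plug the sample back in:
# H(X) ~= -(1/n) * sum_i log f(x_i)   (Ahmad-Lin style plug-in estimate)
kde = gaussian_kde(x)
h_est = -np.mean(np.log(kde(x)))

# Analytic differential entropy of N(0,1): 0.5*log(2*pi*e) ~= 1.4189
h_true = 0.5 * np.log(2 * np.pi * np.e)
print(h_est, h_true)
```

Evaluating the KDE at its own sample points biases the estimate
slightly downward, but for moderate sample sizes it lands close to the
analytic value.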
I initially estimated the KDE independently for the full set of X
observations, and for each class (Y=0 and Y=1). This gave some
silly results.
A much better approach is to calculate the bandwidth once, from the
full set of X observations, and then use the same bandwidth for all
three KDEs. Thanks to all of you for getting me on track.
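One way to sketch this shared-bandwidth scheme with scipy (again a toy
example with made-up two-class data, not the poster's actual code): take
the scalar bandwidth factor from a KDE fitted on the pooled sample and
pass it as bw_method to each class-conditional KDE. Note that
gaussian_kde scales this factor by each dataset's own covariance, so
this shares the scalar factor rather than the full bandwidth matrix;
sharing the matrix itself would require overriding the KDE's covariance
attributes.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Hypothetical class-conditional samples: X|Y=0 and X|Y=1
x0 = rng.normal(0.0, 1.0, 500)
x1 = rng.normal(2.0, 1.0, 500)
x_all = np.concatenate([x0, x1])

# Bandwidth factor from the full sample (Scott's rule by default),
# reused for the pooled and both class-conditional KDEs.
factor = gaussian_kde(x_all).factor
kde_all = gaussian_kde(x_all, bw_method=factor)
kde0 = gaussian_kde(x0, bw_method=factor)
kde1 = gaussian_kde(x1, bw_method=factor)

def entropy(kde, pts):
    """Plug-in entropy estimate: -(1/n) * sum log f(x_i)."""
    return -np.mean(np.log(kde(pts)))

# I(X;Y) = H(X) - 0.5*( H(X|Y=0) + H(X|Y=1) ), equal class priors
mi = entropy(kde_all, x_all) - 0.5 * (entropy(kde0, x0) + entropy(kde1, x1))
print(mi)
```

Since Y is binary, the true mutual information is bounded by
H(Y) = log 2 ~= 0.693 nats, which gives a quick sanity check on the
estimate.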
I'll see if I can clean up my annotated and augmented version of
gaussian_kde for publication. I'll share it when it is readable and
consistent :-)
If you have any ideas or are curious about further details, I shall be
happy to listen and discuss it further.
--
:-- Hans Georg