[SciPy-Dev] Entropy from empirical high-dimensional data

Gael Varoquaux gael.varoquaux at normalesup.org
Wed May 25 17:35:33 EDT 2011


Hi list,

I am looking at estimating entropy and conditional entropy from data for
which I only have access to observations, and not to the underlying
probabilistic laws.

With low-dimensional data, I would simply use an empirical estimate of
the probabilities by converting each observation to its quantile, and
then apply the standard formula for entropy (for instance using
scipy.stats.entropy).
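
A minimal sketch of that low-dimensional approach, under stated
assumptions: it uses equal-width histogram bins (via numpy.histogramdd)
rather than a conversion to quantiles, and the function name
binned_entropy and the default bin count are hypothetical choices made
for illustration only. The result is the plug-in entropy of the
discretized data, in nats, not the differential entropy.

```python
import numpy as np
from scipy.stats import entropy


def binned_entropy(x, bins=10):
    """Plug-in entropy (in nats) of data discretized on a regular grid.

    x has shape (n_observations,) or (n_observations, n_features).
    `bins` is the (assumed) number of equal-width bins per feature.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]
    # Count observations in each cell of the grid.
    counts, _ = np.histogramdd(x, bins=bins)
    counts = counts.ravel()
    # scipy.stats.entropy normalizes the counts to probabilities.
    return entropy(counts[counts > 0])


# Four equally populated bins give an entropy of log(4).
x = np.repeat([0.0, 1.0, 2.0, 3.0], 25)
print(binned_entropy(x, bins=4))
```

The estimate depends on the bin count, so it is only comparable between
data sets discretized in the same way.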

However, I have high-dimensional data (~100 features, and 30000
observations). Not only is it harder to convert observations to
probabilities under the empirical law, but I am also worried about
curse-of-dimensionality effects: density estimation in high dimensions
is a difficult problem.
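
To give a rough sense of scale for that concern, the sketch below counts
the cells of a regular grid against the number of observations. The
function name and the bin counts (10 and 2 per feature) are hypothetical,
chosen only for illustration; the only figures taken from the message
are the ~100 features and 30000 observations.

```python
def max_dims_with_one_obs_per_cell(n_obs, bins_per_feature):
    """Largest dimension d such that bins_per_feature**d <= n_obs.

    Beyond this d, a regular grid has more cells than observations,
    so most cells are necessarily empty.
    """
    if bins_per_feature < 2:
        raise ValueError("bins_per_feature must be at least 2")
    d = 0
    while bins_per_feature ** (d + 1) <= n_obs:
        d += 1
    return d


n_obs, n_features = 30000, 100

# Number of grid cells with 10 bins per feature (a 101-digit integer).
n_cells = 10 ** n_features
print(len(str(n_cells)))

print(max_dims_with_one_obs_per_cell(n_obs, 10))
print(max_dims_with_one_obs_per_cell(n_obs, 2))
```

Even with only two bins per feature, the grid outgrows 30000
observations long before reaching 100 features.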

Does anybody have advice, or Python code to point to, for this task?

Cheers,

Gaël


