[SciPy-User] [SciPy-Dev] Entropy from empirical high-dimensional data

Gael Varoquaux gael.varoquaux at normalesup.org
Thu May 26 01:17:31 EDT 2011


On Wed, May 25, 2011 at 03:43:30PM -0700, Nathaniel Smith wrote:
> Depending on the situation even this technique can generate extremely
> biased estimates (you basically end up measuring your sample size
> instead of the real entropy).

That's exactly what I have observed on small simulations and naive
estimators.
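
For what it's worth, the effect is easy to reproduce in a small simulation (a hedged sketch, not anyone's production code): the plug-in estimator H = -sum(p_hat * log(p_hat)) over empirical frequencies can never report more than log(n) nats from n observations, so for small samples it tracks the sample size rather than the true entropy.

```python
import numpy as np
from scipy.stats import entropy

# Plug-in (naive) entropy estimate on samples from a known discrete
# distribution: uniform over 1000 symbols, true entropy log(1000).
rng = np.random.default_rng(0)
n_symbols = 1000
p = np.ones(n_symbols) / n_symbols
true_h = entropy(p)                      # log(1000) ~= 6.91 nats

estimates = {}
for n in (50, 500, 5000):
    samples = rng.choice(n_symbols, size=n, p=p)
    counts = np.bincount(samples, minlength=n_symbols)
    # zero-count bins contribute nothing, so the estimate is capped
    # at log(n) no matter what the true entropy is
    estimates[n] = entropy(counts / n)

# estimates[50] is bounded by log(50) ~= 3.9, far below true_h
```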

> What I've ended up doing for estimating entropy of word probability
> distributions (which have long Zipfian tails) is to just fit a
> reasonable parametric distribution (e.g., zeta) and then calculate the
> theoretical entropy of that distribution. Might be another approach
> worth considering, if you know enough about your data to do it.

Unfortunately, I want to use entropy as a model selection tool, so
using a parametric approximation is hard to justify.

That said, it seems that I might be able to get away with estimating the
entropy of the marginal distributions only, i.e. entropy in one
dimension, which is heaps easier.
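
In one dimension a simple histogram estimate already behaves reasonably, given enough samples. A minimal sketch (the function name and binning choice are mine, not from the thread):

```python
import numpy as np

def marginal_entropy_1d(x, bins=50):
    """Histogram estimate of the differential entropy of a 1-D sample:
    H ~= -sum_i p_i * log(p_i / w_i), with bin probabilities p_i and
    bin widths w_i."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0                            # empty bins contribute 0
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

# Sanity check: a standard normal has true differential entropy
# 0.5 * log(2*pi*e) ~= 1.4189 nats.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
h = marginal_entropy_1d(x)                # should land close to 1.4189
```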

Gael
