[SciPy-Dev] Entropy from empirical high-dimensional data

Emanuele Olivetti emanuele at relativita.com
Thu May 26 04:17:16 EDT 2011


Hi Gael,

I recently played with a related problem which you might find of interest:

(short paper)
http://nilab.cimec.unitn.it/people/olivetti/work/prni2011/olivetti_bayes_error.pdf

(slides)
http://nilab.cimec.unitn.it/people/olivetti/work/prni2011/olivetti_prni2011_bayesian.pdf

The proposed model can be used to estimate the posterior probability that
the data carry information, given the observations and a classifier's
predictions. Note that these are just preliminary results. If this is of
any help to you, just let me know :-)

I've recently talked to Stephen Strother about these topics, and he
pointed me to this paper:
http://www.ncbi.nlm.nih.gov/pubmed/20533565

HTH,

Emanuele

On 05/25/2011 11:35 PM, Gael Varoquaux wrote:
> Hi list,
>
> I am looking at estimating entropy and conditional entropy from data
> for which I have access only to observations, not to the underlying
> probability distributions.
>
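
(One remark on the conditional part: since

    H(X|Y) = H(X,Y) - H(Y),

conditional entropy follows from two ordinary entropy estimates, so the
two problems reduce to the same one.)
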
> With low dimensional data, I would simply use an empirical estimate of
> the probabilities by converting each observation to its quantile, and
> then apply the standard formula for entropy (for instance using
> scipy.stats.entropy).
>
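
For the low-dimensional case, if I understand the binning idea correctly,
a minimal sketch for a single variable could look like this (untested,
and the toy sample and bin count below are arbitrary choices of mine):

    import numpy as np
    from scipy.stats import entropy

    x = np.random.randn(30000)                # toy 1-d sample
    # Empirical law: bin the observations and normalize the counts.
    # (scipy.stats.entropy would normalize for us, but being explicit
    # makes the estimate easier to read.)
    counts, _ = np.histogram(x, bins=100)
    p = counts / float(counts.sum())
    h = entropy(p)                            # Shannon entropy, in nats

The estimate depends on the binning, of course, which is exactly the part
that stops working in high dimension.
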
> However, I have high-dimensional data (~100 features, and 30000
> observations). Not only is it harder to convert observations to
> probabilities under the empirical distribution, but I am also worried
> about curse-of-dimensionality effects: density estimation in high
> dimensions is a difficult problem.
>
> Does anybody have advice, or Python code to point to, for this task?
>
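
One direction that avoids explicit density estimation altogether is the
Kozachenko-Leonenko k-nearest-neighbour estimator: the distance from each
point to its k-th neighbour serves as a local density estimate, and only
those distances enter the formula. A rough, untested sketch (my own
notation; entropy in nats, Euclidean distances):

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import gammaln, psi

    def knn_entropy(X, k=3):
        """Kozachenko-Leonenko estimate of differential entropy (nats).

        X is an (n_observations, n_features) array; duplicate points
        (zero neighbour distances) should be removed or jittered first.
        """
        n, d = X.shape
        tree = cKDTree(X)
        # query() returns each point itself at index 0, hence k + 1
        r = tree.query(X, k=k + 1)[0][:, k]
        # log-volume of the d-dimensional Euclidean unit ball
        log_vd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1.0)
        return psi(n) - psi(k) + log_vd + d * np.mean(np.log(r))

As a sanity check, a d-dimensional standard normal has true entropy
d/2 * log(2*pi*e), so it is easy to test on synthetic data of your size
(30000 x 100). The estimator still degrades as d grows, but usually much
more gracefully than anything based on explicit density estimates.
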
> Cheers,
>
> Gaël