[SciPy-User] [SciPy-Dev] Entropy from empirical high-dimensional data

josef.pktd at gmail.com josef.pktd at gmail.com
Wed May 25 19:48:36 EDT 2011


On Wed, May 25, 2011 at 6:45 PM,  <josef.pktd at gmail.com> wrote:
> On Wed, May 25, 2011 at 5:40 PM, Gael Varoquaux
> <gael.varoquaux at normalesup.org> wrote:
>> Sorry for the noise, I sent this to the dev list, while it belongs on the
>> user list.
>>
>> Hi list,
>>
>> I am looking at estimating entropy and conditional entropy from data for
>> which I only have access to observations, and not the underlying
>> probabilistic laws.
>>
>> With low-dimensional data, I would simply use an empirical estimate of
>> the probabilities by converting each observation to its quantile, and
>> then apply the standard formula for entropy (for instance using
>> scipy.stats.entropy).
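
For reference, a minimal sketch of that low-dimensional plug-in estimate. It
assumes a toy 1-D sample and equal-width histogram bins rather than the
quantile conversion described above; the bin count of 30 is an arbitrary
choice, and the names (x, h_binned, h_diff) are only illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=5000)   # toy 1-D sample (assumption), not the data discussed here

# Empirical probabilities from a histogram; 30 bins is an arbitrary choice.
counts, edges = np.histogram(x, bins=30)
probs = counts / counts.sum()

# Plug-in entropy of the binned variable, in nats (empty bins contribute 0).
h_binned = stats.entropy(probs)

# Rough differential entropy: add log(bin width), valid for equal-width bins.
h_diff = h_binned + np.log(edges[1] - edges[0])

# Reference value: closed-form entropy of a standard normal.
h_true = 0.5 * np.log(2 * np.pi * np.e)
```

In one dimension h_diff should land close to h_true; the same recipe needs a
number of bins that grows exponentially with the number of features.
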
>>
>> However, I have high-dimensional data (~100 features, and 30000
>> observations). Not only is it harder to convert observations to
>> probabilities in the empirical law, but I am also worried about
>> curse-of-dimensionality effects: density estimation in high dimensions
>> is a difficult problem.
>>
>> Does anybody have advice, or Python code to point to, for this task?
>
> 30000 doesn't sound like a lot of observations for 100 dimensions.
> Even with only two bins per dimension that is 2**100 bins, so binning
> sounds pretty much impossible.
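
To put numbers on that, a small sketch of the arithmetic, assuming two bins
per feature (the variable names are only illustrative):

```python
n_obs = 30000
n_features = 100
bins_per_feature = 2   # assumption: the coarsest possible binning

n_bins = bins_per_feature ** n_features

# Each observation falls in exactly one bin, so at most n_obs bins can be
# non-empty; this is an upper bound on the occupied fraction of the grid.
max_occupied_fraction = n_obs / n_bins

print(n_bins)
print(max_occupied_fraction)
```

Almost every bin is empty, so the histogram says essentially nothing about
the density.
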
>
> Are you willing to impose some structure? (A Gaussian copula might be
> able to handle it, or blockwise independence (?).) But even then
> integration in 100 dimensions sounds tough.
>
> gaussian_kde with Monte Carlo integration?
>
> Maybe PCA or some other dimension reduction helps, if the data is
> clustered in some dimensions.
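
A rough sketch of the gaussian_kde plus Monte Carlo idea, on toy data only.
It assumes a 3-dimensional standard normal sample (for example, what might be
left after a dimension reduction step) and the default bandwidth; the sizes
and names are made up for illustration. The entropy H = -E[log p(X)] is
approximated by averaging -log p_hat over draws from the fitted KDE.

```python
import numpy as np
from scipy import stats

np.random.seed(0)      # seed the global state that resample() falls back on
d, n = 3, 2000         # toy sizes (assumption), far smaller than 100 x 30000

# gaussian_kde expects the data with shape (n_dimensions, n_points).
data = np.random.standard_normal((d, n))
kde = stats.gaussian_kde(data)   # default bandwidth (Scott's rule)

# Monte Carlo integration: draw from the KDE, average -log density.
draws = kde.resample(2000)
h_kde = -np.mean(kde.logpdf(draws))

# Reference value: closed-form entropy of a d-dimensional standard normal.
h_true = 0.5 * d * np.log(2 * np.pi * np.e)
```

The kernel smoothing inflates the estimate somewhat even in 3 dimensions, and
I would expect it to get much worse as the dimension grows.
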

Maybe something like what this page is talking about:
http://www.cs.utah.edu/~suyash/Dissertation_html/node13.html

(It's not quite clear whether you have a discrete sample space, as in
Nathaniel's reference, or a continuous space in R^100.)

Josef

>
> Josef
>
>>
>> Cheers,
>>
>> Gaël
>>
>
