[SciPy-Dev] Entropy Calculations

Michael Nowotny nowotnym at gmail.com
Fri Jul 31 00:07:02 EDT 2020


Hi Matt,

I have added a references section to the README on GitHub. The formulas were implemented directly from the corresponding Wikipedia articles.

The densities of continuous distributions are estimated with kernel methods. At the moment I am using KDE objects from the statsmodels package, but we could just as well switch to SciPy's gaussian_kde implementation if we prefer to avoid a dependency on statsmodels.
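
For reference, the SciPy-only route could look roughly like the following sketch. It estimates the differential entropy of a one-dimensional sample with scipy.stats.gaussian_kde in place of the statsmodels KDE objects and integrates with scipy.integrate.quad. The function name differential_entropy_kde and the finite integration window (n_std sample standard deviations beyond the data range) are illustrative assumptions, not code taken from Divergence.

```python
import numpy as np
from scipy import integrate, stats


def differential_entropy_kde(sample, n_std=8.0):
    """Estimate h(X) = -integral of p(x) log p(x) dx from a 1-D sample.

    Sketch only (hypothetical helper): p is approximated by
    scipy.stats.gaussian_kde and the integral is evaluated with
    scipy.integrate.quad over a finite window around the data.
    """
    sample = np.asarray(sample, dtype=float)
    kde = stats.gaussian_kde(sample)

    def integrand(x):
        p = kde(x)[0]  # gaussian_kde returns an array of densities
        # Guard against log(0) where the density underflows in the tails.
        return -p * np.log(p) if p > 0.0 else 0.0

    # Assumed window: n_std standard deviations beyond the observed range.
    spread = n_std * sample.std()
    lower, upper = sample.min() - spread, sample.max() + spread
    value, _abserr = integrate.quad(integrand, lower, upper, limit=200)
    return value
```

For a large standard normal sample the result should land close to the exact value 0.5 * log(2 * pi * e), about 1.419 nats, slightly inflated by the KDE smoothing.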

It turns out that quadpy is no longer needed. In Divergence it has been superseded by the cubature package, a Python wrapper around this C library: https://github.com/stevengj/cubature. I am using its adaptive Clenshaw-Curtis-based rules, which perform better than SciPy's quad for this particular application. This could easily be switched back to SciPy's quadrature functions, albeit at a performance cost.
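
As a rough illustration of falling back to SciPy's quadrature functions, the sketch below estimates the relative entropy D(P || Q) between two one-dimensional samples by integrating the log-ratio of two gaussian_kde estimates with scipy.integrate.quad. It does not use cubature; the function name relative_entropy_kde and the integration window are hypothetical choices made for this example.

```python
import numpy as np
from scipy import integrate, stats


def relative_entropy_kde(sample_p, sample_q, n_std=8.0):
    """Estimate D(P || Q) = integral of p(x) log(p(x) / q(x)) dx.

    Sketch only (hypothetical helper): both densities come from
    scipy.stats.gaussian_kde and the integral uses scipy.integrate.quad
    over a finite window around the pooled data.
    """
    sample_p = np.asarray(sample_p, dtype=float)
    sample_q = np.asarray(sample_q, dtype=float)
    kde_p = stats.gaussian_kde(sample_p)
    kde_q = stats.gaussian_kde(sample_q)

    def integrand(x):
        # Log-densities avoid taking log(0) where the densities underflow.
        log_p = kde_p.logpdf(x)[0]
        log_q = kde_q.logpdf(x)[0]
        if not (np.isfinite(log_p) and np.isfinite(log_q)):
            return 0.0  # numerical guard for the far tails only
        return np.exp(log_p) * (log_p - log_q)

    pooled = np.concatenate([sample_p, sample_q])
    spread = n_std * pooled.std()
    lower, upper = pooled.min() - spread, pooled.max() + spread
    value, _abserr = integrate.quad(integrand, lower, upper, limit=200)
    return value
```

Working with gaussian_kde.logpdf instead of taking the log of the density is meant to keep the log-ratio finite in the tails; the explicit isfinite check is an extra guard in case it is not. For samples from N(0, 1) and N(1, 1) the exact value is 0.5 nats.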

For 10 million observations, the conditional entropy calculation is about 3 times faster with Numba than without. Numba makes little difference for the other information-theoretic measures. We could probably drop Numba in the first iteration and, if that 3x speedup is deemed important enough, later rewrite the conditional entropy code for samples from discrete distributions in C or Cython.
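
To gauge whether Numba is needed at all, here is a minimal pure-NumPy sketch of a plug-in conditional entropy estimate for paired samples from discrete distributions. It uses the identity H(Y|X) = H(X,Y) - H(X) and vectorised np.unique counts instead of compiled loops. The function names are hypothetical, and whether this approach matches the Numba version's speed at 10 million observations is untested.

```python
import numpy as np


def _plugin_entropy(counts, base=2.0):
    """Plug-in (maximum-likelihood) entropy from a vector of positive counts."""
    probs = counts / counts.sum()
    return -np.sum(probs * np.log(probs)) / np.log(base)


def conditional_entropy_discrete(x, y, base=2.0):
    """Estimate H(Y | X) = H(X, Y) - H(X) from paired discrete samples.

    Sketch only (hypothetical helper): relies on vectorised np.unique
    calls rather than Numba; it is not the Divergence implementation.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Count each distinct (x, y) pair and each distinct x value.
    _, joint_counts = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)
    _, x_counts = np.unique(x, return_counts=True)
    return _plugin_entropy(joint_counts, base) - _plugin_entropy(x_counts, base)
```

For example, if y is a deterministic function of x the estimate is 0 bits, while two independent fair coins give 1 bit.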

Cocos is my own NumPy-like multi-GPU computing package for Python; it includes an adaptation of SciPy's gaussian_kde class for GPUs (see https://github.com/michaelnowotny/cocos). Since SciPy itself does not support GPUs, I would suggest simply leaving any Cocos-based, GPU-accelerated functionality in Divergence out of the SciPy contribution.

Best, 

Michael

Date: Wed, 29 Jul 2020 13:23:06 -0700
From: Matt Haberland <mhaberla at calpoly.edu>
To: SciPy Developers List <scipy-dev at python.org>
Subject: Re: [SciPy-Dev] Entropy Calculations
Message-ID:
	<CADuxUixZ76sKz4KgjKCTdvDrJqoJtf7Ak7au+o1S1j0DfAEuEw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thanks for letting us know. Can you send a reference for the algorithms you
implemented? I didn't see any in a quick look through the notebook and code.

Also, I see that this uses Numba, but we don't have that as a dependency
yet. How important is that speedup? How essential are the other
dependencies - cocos, cubature, quadpy?

---------- Forwarded message ---------
From: Michael Nowotny <nowotnym at gmail.com>
Date: Fri, Jul 24, 2020 at 7:00 PM
Subject: [SciPy-Dev] Entropy Calculations
To: <scipy-dev at python.org>


Dear SciPy developers,

I have noticed that the statistical functions for calculating entropy and
KL divergence currently only support discrete distributions whose
probability mass function is known. I recently needed to compute various
information-theoretic measures from samples of distributions and created
the package `Divergence`. It offers functionality for entropy, cross
entropy, relative entropy, Jensen-Shannon divergence, joint entropy,
conditional entropy, and mutual information, and is available on GitHub at
https://github.com/michaelnowotny/divergence. It supports samples from
both discrete and continuous distributions. For continuous distributions,
the measures are computed by numerically integrating kernel density
estimates generated from the sample. I would be happy to contribute some
or all of its functionality to SciPy. Please let me know if you are
interested.
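
To illustrate the gap for discrete samples: scipy.stats.entropy already computes entropy and relative entropy once probability vectors are available, and it normalises its inputs, so plug-in estimates mainly need empirical frequencies on a common support. The sketch below is a minimal example under that assumption; the helper names are hypothetical and this is not Divergence's API.

```python
import numpy as np
from scipy import stats


def empirical_pmfs(sample_p, sample_q):
    """Relative frequencies of two discrete samples on their common support."""
    support = np.union1d(sample_p, sample_q)  # sorted unique values of both
    # searchsorted maps each observation to its index in the sorted support.
    p = np.bincount(np.searchsorted(support, sample_p), minlength=support.size)
    q = np.bincount(np.searchsorted(support, sample_q), minlength=support.size)
    return p / p.sum(), q / q.sum()


def relative_entropy_discrete(sample_p, sample_q, base=2):
    """Plug-in D(P || Q); infinite if P's sample has a value Q's sample lacks."""
    p, q = empirical_pmfs(sample_p, sample_q)
    return stats.entropy(p, q, base=base)


def mutual_information_discrete(x, y, base=2):
    """Plug-in I(X; Y) = H(X) + H(Y) - H(X, Y) from paired discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    # stats.entropy normalises its input, so raw counts can be passed in.
    _, cx = np.unique(x, return_counts=True)
    _, cy = np.unique(y, return_counts=True)
    _, cxy = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)
    return (stats.entropy(cx, base=base) + stats.entropy(cy, base=base)
            - stats.entropy(cxy, base=base))
```

The plug-in estimates are biased for small samples, which is one reason a dedicated sample-based API would be more than a thin wrapper.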

Thank you,

Michael
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at python.org
https://mail.python.org/mailman/listinfo/scipy-dev