[SciPy-Dev] Adding statistical distances

Ralf Gommers ralf.gommers at gmail.com
Sun Jul 16 06:27:56 EDT 2017


Hi Charles,


On Thu, Jul 6, 2017 at 2:40 AM, Charles-Philippe Masson <
charles.masson at datadoghq.com> wrote:

> Hi,
>
> I am a data scientist at Datadog, a cloud monitoring company. We have been
> working with statistical distances, which are distances between
> distributions, and more specifically on a family of distances that can be
> computed from CDFs, e.g., the first Wasserstein distance and the Cramér-von
> Mises distance.
>
> We wrote and optimized some code in Python to compute those distances.
> Since those distances have various applications, we think that it might be
> helpful to others and that is why we intend to share it. Here is the PR:
> https://github.com/scipy/scipy/pull/7563
>

Thanks for contributing!
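
For anyone following the thread, here is a rough illustration of what
"computed from CDFs" means for the first Wasserstein distance between two
samples: it is the area between their empirical CDFs. This is a minimal
pure-Python sketch and not the code from the PR; the names `empirical_cdf`
and `wasserstein_1` are made up for this example, and it assumes unweighted
one-dimensional samples.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of points in an already sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def wasserstein_1(u_values, v_values):
    """Area between the empirical CDFs of two 1-D samples (hypothetical helper)."""
    u_sorted = sorted(u_values)
    v_sorted = sorted(v_values)
    # Both CDFs are step functions that only change at observed values,
    # so the integral of |F_u - F_v| is a finite sum over the merged grid.
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = abs(empirical_cdf(u_sorted, left) - empirical_cdf(v_sorted, left))
        total += gap * (right - left)
    return total


# Shifting a sample by a constant moves every point by that constant.
shifted = wasserstein_1([0, 1, 3], [5, 6, 8])
```

A vectorized NumPy version would be the natural choice for real data; the
loop is only meant to make the definition visible.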

> I put the code in scipy.stats.stats as statistical distances share common
> features and applications with statistical tests (such as chisquare or
> ks_2samp) but let me know if that is not the appropriate place.
>

I had a look at the other possible place to put them,
scipy.spatial.distance. While it could fit there as well - your function
signatures fit with distance.cdist - I agree that putting statistical
distances in scipy.stats makes more sense. The Kullback-Leibler divergence
is also present in scipy.stats already (a bit hidden, it's in `entropy`).
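
To show where it is hidden: when `scipy.stats.entropy` is given a second
distribution `qk`, it returns the Kullback-Leibler divergence of `pk` from
`qk` instead of the Shannon entropy. A small sketch, where the two
distributions are arbitrary values picked only for illustration:

```python
import numpy as np
from scipy.stats import entropy

# Two discrete distributions over the same two outcomes (illustrative values).
pk = np.array([0.5, 0.5])
qk = np.array([0.9, 0.1])

# With a second argument, entropy() computes sum(pk * log(pk / qk)).
kl_pq = entropy(pk, qk)
kl_manual = np.sum(pk * np.log(pk / qk))

# The divergence is not symmetric, so swapping the arguments changes the result.
kl_qp = entropy(qk, pk)
```

Since it is not symmetric, the Kullback-Leibler divergence is not a distance
in the strict sense, unlike the CDF-based distances in the PR.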

Cheers,
Ralf