[SciPy-Dev] differential_entropy? quartile_coeff_dispersion?

Matt Haberland mhaberla at calpoly.edu
Mon Mar 22 02:31:50 EDT 2021


Two PRs I've reviewed recently are in reasonably good shape, but they
should be considered by the community before merging.

TL;DR:
Differential entropy of a continuous distribution from a sample in gh-13631
<https://github.com/scipy/scipy/pull/13631>; closes gh-4080
<https://github.com/scipy/scipy/issues/4080>.
Quartile Coefficient of Dispersion in gh-13475
<https://github.com/scipy/scipy/pull/13475>; closes gh-13385
<https://github.com/scipy/scipy/issues/13385>.

More information in the postscript. Thanks for your thoughts!
Matt

---

*Differential Entropy from a Sample*
@vnmabus submitted gh-13631 <https://github.com/scipy/scipy/pull/13631>,
which would add the function `differential_entropy`. This would close
gh-4080 <https://github.com/scipy/scipy/issues/4080>, which asks for a way
of approximating the differential entropy of a continuous distribution from
a sample. Currently, the function implements only the Vasicek estimator,
but there would be a follow-up PR to add a `method` parameter for choosing
among other estimation methods.
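
For readers unfamiliar with the Vasicek estimator, here is a rough,
dependency-free sketch of the idea. It is not the code in the PR: the
function name `vasicek_entropy`, the parameter `m`, and the default window
`floor(sqrt(n) + 0.5)` are assumptions made for illustration. The estimate
is the mean of `log(n / (2m) * (X[i+m] - X[i-m]))` over the sorted sample,
with indices clamped at both ends.

```python
import math
import random


def vasicek_entropy(sample, m=None):
    """Illustrative Vasicek estimate of differential entropy (in nats).

    `m` is the half-window length; the default below is an assumption
    chosen for this sketch, not necessarily what the PR uses.
    """
    x = sorted(sample)
    n = len(x)
    if m is None:
        m = int(math.sqrt(n) + 0.5)
    if not 1 <= m < n / 2:
        raise ValueError("m must satisfy 1 <= m < n / 2")

    total = 0.0
    for i in range(n):
        # Clamp the indices so spacings near the ends reuse the extremes.
        upper = x[min(i + m, n - 1)]
        lower = x[max(i - m, 0)]
        # Note: tied observations can make a spacing zero, and log(0) fails.
        total += math.log(n / (2 * m) * (upper - lower))
    return total / n


# Compare against the exact entropy of a standard normal, 0.5 * ln(2*pi*e).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
estimate = vasicek_entropy(sample)
exact = 0.5 * math.log(2 * math.pi * math.e)
print(estimate, exact)
```

The estimator is known to be biased for finite samples, so the two printed
values should be close but not identical.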

My opinion: it is essentially ready to merge and it would be a useful
addition. `scipy.stats.entropy`
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html>
is only for discrete distributions, and differential entropy is not the
continuous analog of discrete entropy, so I think it deserves its own
function to avoid confusion. There is one question in the PR about the file
in which the new function should live.
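
One way to see why the two quantities should not be conflated: discrete
entropy is never negative, whereas differential entropy can be. The small
sketch below uses only textbook formulas; the helper names are made up for
illustration and neither function is part of SciPy.

```python
import math


def discrete_entropy(pk):
    """Shannon entropy (in nats) of a discrete distribution given as probabilities."""
    return -sum(p * math.log(p) for p in pk if p > 0)


def uniform_differential_entropy(width):
    """Differential entropy (in nats) of a uniform distribution on an interval of this width."""
    return math.log(width)


# Four equally likely outcomes: entropy is ln(4), which is positive.
h_discrete = discrete_entropy([0.25, 0.25, 0.25, 0.25])

# Uniform density on an interval of width 0.5: entropy is ln(0.5), which is negative.
h_differential = uniform_differential_entropy(0.5)

print(h_discrete, h_differential)
```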

*Quartile/Quantile Coefficient of Dispersion*
@YarivLevy81 submitted gh-13475 <https://github.com/scipy/scipy/pull/13475/>,
which would add the function `quartile_coeff_dispersion`. This would close
gh-13385 <https://github.com/scipy/scipy/issues/13385>, which asks for a
"robust variation" statistic.

My opinion: We might want to change the name to
`quantile_coeff_dispersion`, as it uses `np.quantile` and allows quantiles
other than 0.25/0.5/0.75. If there is interest in the function returning
additional information (e.g. confidence interval) in the future, we should
consider having the function return some sort of object other than a scalar
or array.
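
For context, the statistic is usually defined as (Q3 - Q1) / (Q3 + Q1). The
sketch below generalizes it to arbitrary lower/upper quantiles, which is
the reason the `quantile_` name seems more apt. It is not the PR's
implementation: `qcd_sketch`, `lower`, and `upper` are hypothetical names,
and the helper reimplements linear interpolation between order statistics
(the default behavior of `np.quantile`) so that the example has no
dependencies.

```python
import math


def linear_quantile(sorted_x, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def qcd_sketch(sample, lower=0.25, upper=0.75):
    """Illustrative quantile coefficient of dispersion.

    With the defaults this is the quartile version, (Q3 - Q1) / (Q3 + Q1).
    The parameter names are assumptions made for this example.
    """
    x = sorted(sample)
    q_lo = linear_quantile(x, lower)
    q_hi = linear_quantile(x, upper)
    return (q_hi - q_lo) / (q_hi + q_lo)


data = [2, 4, 6, 8, 10]
quartile_version = qcd_sketch(data)               # Q1 = 4, Q3 = 8
decile_version = qcd_sketch(data, 0.1, 0.9)       # 10th and 90th percentiles
print(quartile_version, decile_version)
```

Returning a bare float, as here, is exactly what would make it awkward to
add a confidence interval later without breaking callers.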


-- 
Matt Haberland
Assistant Professor
BioResource and Agricultural Engineering
08A-3K, Cal Poly