[Numpy-discussion] Automatic number of bins for numpy histograms
Jaime Fernández del Río
jaime.frio at gmail.com
Sun Apr 12 03:45:12 EDT 2015
On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:
>
> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb
>
> Long story short, histogram visualisations that depend on numpy (such as
> matplotlib, or nearly all of them) have poor default behaviour, as I have to
> constantly play around with the number of bins to get a good idea of what
> I'm looking at. The bins=10 default works OK for up to 1000 points or very
> normal data, but performs poorly on anything else, and doesn't account for
> variability either. I don't have a method readily available to scale the
> number of bins to the data.
>
> R doesn't suffer from these problems and provides methods for use with its
> hist method. I would like to provide similar functionality for matplotlib,
> to at least provide some kind of good starting point, as histograms are very
> useful for initial data discovery.
>
> The notebook above provides an explanation of the problem as well as some
> proposed alternatives. Use different datasets (type and size) to see the
> performance of the suggestions. All of the methods proposed exist in R and
> in the literature.
>
> I've put together an implementation to add this new functionality, but am
> hesitant to make a pull request as I would like some feedback from a
> maintainer before doing so.
>
+1 on the PR.
Jaime
--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
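For readers following the thread, the kind of rules the quoted message refers to can be sketched in a few lines. The following is a minimal, standard-library-only Python sketch of three rules that R exposes for hist (nclass.Sturges, nclass.scott and nclass.FD). It is not the code from the linked notebook or from the proposed pull request; the function names are hypothetical, and the constants (3.49 for Scott, 2 for Freedman-Diaconis) are the textbook values, which may differ slightly from any particular implementation.

```python
import math
import statistics


def sturges_bins(data):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins.

    Grows only logarithmically with n, so it tends to oversmooth large or
    non-normal samples.
    """
    return math.ceil(math.log2(len(data))) + 1


def bins_from_width(data, width):
    """Turn a bin width into the number of bins spanning [min, max]."""
    span = max(data) - min(data)
    if width <= 0 or span == 0:
        return 1  # constant data or zero spread: fall back to a single bin
    return max(1, math.ceil(span / width))


def scott_bins(data):
    """Scott's normal-reference rule: h = 3.49 * s * n**(-1/3).

    s is the sample standard deviation; needs at least two points.
    """
    width = 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)
    return bins_from_width(data, width)


def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: h = 2 * IQR * n**(-1/3).

    Uses the interquartile range instead of the standard deviation, so it
    is less sensitive to outliers and heavy tails. Needs at least two points.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    width = 2 * (q3 - q1) * len(data) ** (-1 / 3)
    return bins_from_width(data, width)


if __name__ == "__main__":
    import random

    random.seed(0)
    # Compare the three rules on normal samples of increasing size.
    for n in (100, 1_000, 100_000):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        print(n, sturges_bins(sample), scott_bins(sample),
              freedman_diaconis_bins(sample))
```

With n = 1000, Sturges' rule gives 11 bins, close to the fixed bins=10 default. Scott and Freedman-Diaconis instead scale the bin width with n**(-1/3) and with the spread of the data, so their bin counts grow faster than Sturges' for large samples.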