[Numpy-discussion] Automatic number of bins for numpy histograms

Jaime Fernández del Río jaime.frio at gmail.com
Sun Apr 12 03:45:12 EDT 2015


On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:

>
> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb
>
> Long story short, histogram visualisations that depend on numpy (such as
> matplotlib, or nearly all of them) have poor default behaviour as I have to
> constantly play around with the number of bins to get a good idea of what
> I'm looking at. The bins=10 works ok for up to 1000 points or very normal
> data, but has poor performance for anything else, and doesn't account for
> variability either. I don't have a method easily available to scale the
> number of bins given the data.
>
> R doesn't suffer from these problems and provides methods for use with its
> hist method. I would like to provide similar functionality for matplotlib,
> to at least provide some kind of good starting point, as histograms are
> very useful for initial data discovery.
>
> The notebook above provides an explanation of the problem as well as some
> proposed alternatives. Use different datasets (type and size) to see the
> performance of the suggestions. All of the methods proposed exist in R and
> in the literature.
>
> I've put together an implementation to add this new functionality, but am
> hesitant to make a pull request as I would like some feedback from a
> maintainer before doing so.
>
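
For readers of the archive, here is a minimal sketch of what scaling the number of bins to the data can look like. It is not the implementation from the notebook or the proposed PR. It assumes the rules in question are the three textbook ones that R's hist also offers (Sturges, Scott, Freedman-Diaconis), and the function names below are hypothetical, chosen only for illustration.

```python
import numpy as np


def _bins_from_width(x, width):
    """Turn a bin width into a bin count covering the data range (at least 1)."""
    data_range = x.max() - x.min()
    if width <= 0 or data_range == 0:
        # Degenerate data (e.g. all values equal): fall back to a single bin.
        return 1
    return max(1, int(np.ceil(data_range / width)))


def sturges_bins(x):
    """Sturges' rule: ceil(log2(n)) + 1 bins. Depends only on the sample size."""
    n = np.asarray(x).size  # assumes a non-empty sample
    return int(np.ceil(np.log2(n))) + 1


def scott_bins(x):
    """Scott's rule: bin width 3.5 * sigma * n**(-1/3)."""
    x = np.asarray(x, dtype=float)
    width = 3.5 * x.std(ddof=1) * x.size ** (-1.0 / 3.0)
    return _bins_from_width(x, width)


def fd_bins(x):
    """Freedman-Diaconis rule: bin width 2 * IQR * n**(-1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    width = 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)
    return _bins_from_width(x, width)
```

Any of these can be passed straight to the existing interface, e.g. np.histogram(data, bins=fd_bins(data)). Sturges looks only at the number of points, while the Scott and Freedman-Diaconis widths shrink with n and grow with the spread of the data, which is the "account for variability" part of the request.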

+1 on the PR.

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
