[Numpy-discussion] Automatic number of bins for numpy histograms

Eric Moore ewm at redtetrahedron.org
Wed Apr 15 12:14:59 EDT 2015


This blog post on the t-digest, and the links within it, also seem relevant.
It appears to have Python code available to try things out as well.

https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest
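
As a rough illustration of the "resolution of the bins throughout the
domain" idea quoted below, here is a minimal sketch that places bin edges at
quantiles so each bin holds about the same number of points. This is not the
P-square algorithm or the t-digest: it is a plain batch computation with
np.percentile on data that fits in memory, and the function name
quantile_bin_edges and its n_bins parameter are made up for this example.

```python
import numpy as np


def quantile_bin_edges(data, n_bins=10):
    """Return bin edges chosen so each bin holds roughly equal counts.

    Hypothetical helper for illustration only; not part of numpy.
    """
    data = np.asarray(data, dtype=float)
    # Evenly spaced percentiles from 0 to 100 give equal-mass bins.
    percentiles = np.linspace(0, 100, n_bins + 1)
    edges = np.percentile(data, percentiles)
    # Heavily tied data can produce repeated edges; keep each edge once.
    return np.unique(edges)


# Skewed sample: fixed-width bins would crowd most points into a few bins.
rng = np.random.RandomState(0)
sample = rng.lognormal(size=10000)

edges = quantile_bin_edges(sample, n_bins=8)
counts, _ = np.histogram(sample, bins=edges)
widths = np.diff(edges)
```

The counts come out close to equal while the widths vary a lot: narrow bins
where the data is dense, wide bins in the tail. A streaming estimator would
aim for similar edges without keeping all of the data around.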

-Eric

On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root <ben.root at ou.edu> wrote:

> "Then you can set about convincing matplotlib and friends to
> use it by default"
>
> Just to note, this proposal was originally made over in the matplotlib
> project. We sent it over here where its benefits would have wider reach.
> Matplotlib's plan is not to change the defaults, but to offload as much as
> possible to numpy so that it can support these new features if they are
> available. We might need to do some input validation so that users running
> older versions of numpy get a sensible error message.
>
> Cheers!
> Ben Root
>
>
> On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar <mistersheik at gmail.com>
>> wrote:
>> > Can I suggest that we instead add the P-square algorithm for the dynamic
>> > calculation of histograms?
>> > (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf)
>> >
>> > This is already implemented in C++'s boost library
>> > (http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp)
>> >
>> > I implemented it in Boost Python as a module, which I'm happy to share.
>> > This is much better than fixed-width histograms in practice.  Rather than
>> > adjusting the number of bins, it adjusts what you really want, which is the
>> > resolution of the bins throughout the domain.
>>
>> This definitely sounds like a useful thing to have in numpy or scipy
>> (though if it's possible to do without using Boost/C++ that would be
>> nice). But yeah, we should leave the existing histogram alone (in this
>> regard) and add a new name for this like "adaptive_histogram" or
>> something. Then you can set about convincing matplotlib and friends to
>> use it by default :-)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- http://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>