[Numpy-discussion] Automatic number of bins for numpy histograms

Neil Girdhar mistersheik at gmail.com
Wed Apr 15 11:06:48 EDT 2015


You got it.  I remember this from when I worked at Google and we would
process (many many) logs.  With enough bins, the approximation is still
really close.  It's great if you want to make an automatic plot of data.
Calling numpy.partition a hundred times is probably slower than running P^2
with n=100 bins.  I don't think P^2 does O(n) computations per data point; I
think it's more like O(log(n)).
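
For comparison, here is a minimal sketch of the np.partition route Jaime
describes below, assuming the goal is bin edges that split the sample into
roughly equally filled bins. The helper name equal_fill_edges and the choice
of ranks are illustrative assumptions, not anything from numpy itself. It
relies on np.partition accepting a sequence of kth indices, so all the order
statistics come out of a single call, and then uses np.histogram to get the
exact counts.

```python
import numpy as np

def equal_fill_edges(data, n_bins):
    """Hypothetical helper: edges splitting `data` into n_bins similarly filled bins."""
    data = np.asarray(data, dtype=float).ravel()
    n = data.size
    # Ranks of the order statistics we need: minimum, interior quantiles, maximum.
    ranks = np.round(np.linspace(0, n - 1, n_bins + 1)).astype(int)
    # A single np.partition call puts every requested rank in its sorted position.
    part = np.partition(data, np.unique(ranks).tolist())
    return part[ranks]

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)

edges = equal_fill_edges(sample, 100)
# The edges here are exact order statistics, and np.histogram does the binning.
counts, _ = np.histogram(sample, bins=edges)
```

With P^2 the edges would only be approximate, so the final np.histogram pass
would still be needed to get exact counts.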

Best,

Neil

On Wed, Apr 15, 2015 at 10:02 AM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
>
>> Yeah, I'm not arguing, I'm just curious about your reasoning.  That
>> explains why not C++.  Why would you want to do this in C and not Python?
>>
>
> Well, the algorithm has to iterate over all the inputs, updating the
> estimated percentile positions at every iteration. Because the estimated
> percentiles may change in every iteration, I don't think there is an easy
> way of vectorizing the calculation with numpy. So I think it would be very
> slow if done in Python.
>
> Looking at this in some more detail, how is this typically used? Because
> it gives you approximate values that should split your sample into
> similarly filled bins, but because the values are approximate, to compute a
> proper histogram you would still need to do the binning to get the exact
> results, right? Even with this drawback P^2 does have an algorithmic
> advantage, so for huge inputs and many bins it should come out ahead. But for
> many medium sized problems it may be faster to simply use np.partition,
> which gives you the whole thing in a single go. And it would be much
> simpler to implement.
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
> plans for world domination.
>