[Numpy-discussion] histogram2d bug?

Emanuele Olivetti emanuele at relativita.com
Thu Apr 19 17:51:30 EDT 2007


David Huard wrote:
> Hi Emanuele,
>
> The bug is due to a part of the code that shifts the last bin's
> position to make sure the array's maximum value is counted in the last
> bin, and not as an outlier. To do so, the code computes an approximate
> precision used to shift the bin edge by an amount small compared to the
> array's value. In your example, since all values in x are identical,
> the precision is ``infinite''. So my question is, what kind of
> behaviour would you be expecting in this case for the automatic
> placement of bin edges ?
>
> That is, given
> x : array of identical values, eg. [0, 0, 0, 0, 0, ..., 0]
> smin, smax = x.min(), x.max()
> How do you select the bin edges ?
>
> One solution is to use the same scheme used by histogram:
> if smin == smax:
>     edges[i] = linspace(smin-.5, smax+.5, nbin[i]+1)
>
> Would that be ok ?
>
> David
>
>
>  I'll submit a patch.
>
The histogram solution seems ok. I can't see any drawbacks.
My concern is to avoid exceptions in degenerate cases,
like the example I sent. I need to estimate many probability
distributions by counting samples efficiently, so the histogram*
functions are really nice.

Please submit the patch. By the way, the same issue affects
histogramdd.
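
For the archive, here is a minimal sketch of that fallback as I read it.
The helper name auto_edges is my own (hypothetical, not a numpy function),
and the actual patch may well differ:

```python
import numpy as np

def auto_edges(x, nbin):
    """Hypothetical helper (not numpy API): nbin equal-width bin edges for x."""
    x = np.asarray(x, dtype=float)
    smin, smax = x.min(), x.max()
    if smin == smax:
        # Degenerate case: all samples are identical, so widen the
        # range by 0.5 on each side instead of raising an exception.
        smin, smax = smin - 0.5, smax + 0.5
    return np.linspace(smin, smax, nbin + 1)

x = [0, 0, 0, 0, 0]
edges = auto_edges(x, 4)
counts, _ = np.histogram(x, bins=edges)
print(edges)
print(counts)
```

With this scheme every sample still lands in some bin, so the counts sum
to the number of samples even when x is constant.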

Thanks a lot,

Emanuele



