[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)
Nils Becker
n.becker at amolf.nl
Fri Aug 6 11:46:53 EDT 2010
Hi,
I found what looks like a bug in histogram, when the option normed=True
is used together with non-uniform bins.
Consider this example:
import numpy as np
data = np.array([1, 2, 3, 4])
bins = np.array([.5, 1.5, 4.5])
bin_widths = np.diff(bins)
(counts, dummy) = np.histogram(data, bins)
(densities, dummy) = np.histogram(data, bins, normed=True)
What this gives is:
bin_widths
array([ 1., 3.])
counts
array([1, 3])
densities
array([ 0.1, 0.3])
The documentation claims that histogram with normed=True gives a
density, which integrates to 1. In this example, it is true that
(densities * bin_widths).sum() is 1. However, clearly the data are
equally spaced, so their density should be uniform and equal to 0.25.
Note that (0.25 * bin_widths).sum() is also 1.
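The arithmetic above can be checked directly. This is a small sketch with the reported output typed in by hand, so it does not depend on the normed option; the names reported and uniform are only for illustration:

```python
import numpy as np

bin_widths = np.diff(np.array([0.5, 1.5, 4.5]))  # widths 1 and 3

reported = np.array([0.1, 0.3])  # output of normed=True as quoted above
uniform = np.full(2, 0.25)       # density expected for equally spaced data

# Both candidates integrate to 1 over the bins, so that property
# alone does not single out the correct density.
reported_integral = (reported * bin_widths).sum()
uniform_integral = (uniform * bin_widths).sum()
print(reported_integral, uniform_integral)
```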
I believe np.histogram(data, bins, normed=True) effectively does:
counts / (counts * bin_widths).sum()
which reproduces the [0.1, 0.3] above. However, it _should_ do
counts / (counts.sum() * bin_widths)
to get a true density over the data coordinate as a result. It's easy to
fix by hand, but I think the documentation is at least misleading?!
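The by-hand fix might look like the sketch below. It starts from the raw counts only. The observed line is a guess: it is one expression that reproduces the reported [0.1, 0.3], not a statement about the actual implementation. The names observed and density are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
bins = np.array([0.5, 1.5, 4.5])
bin_widths = np.diff(bins)

counts, _ = np.histogram(data, bins)  # raw counts per bin

# Guess: normalising by the sum of counts * widths matches the reported output.
observed = counts / (counts * bin_widths).sum()

# Density by hand: fraction of samples in each bin, per unit bin width.
density = counts / (counts.sum() * bin_widths)

print(observed)
print(density)
print((density * bin_widths).sum())
```

With this normalisation each bin's height is its share of the samples divided by its own width, so wide bins are not inflated relative to narrow ones.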
Sorry if this has been discussed before; I did not find it. (I am using
numpy 1.3.)