[SciPy-user] histogram(a, normed=True) doesn't normalize?
Martin Spacek
scipy at mspacek.mm.st
Mon Jun 19 05:02:57 EDT 2006
I've searched around on this and I can't find anything. I'm confused by
the 'normed' argument in histogram(). According to the numpy book:
"If normed is True, then the histogram will be normalized and comparable
with a probability density function, otherwise it will be a count of the
number of items in each bin."
The sum of the heights of all the bins should be 1 for a PDF (right?).
But I get the following in numpy 0.9.8:
>>> import numpy as np
>>> np.histogram([1,2,3], bins=3, normed=False)
(array([1, 1, 1]), array([ 1., 1.66666667, 2.33333333]))
>>> np.histogram([1,2,3], bins=3, normed=True)
(array([ 0.5, 0.5, 0.5]), array([ 1., 1.66666667, 2.33333333]))
Adding up the bins gives 1.5 in this case. Here's the code:
C:\bin\Python24\Lib\site-packages\numpy\lib\function_base.py:
def histogram(a, bins=10, range=None, normed=False):
    a = asarray(a).ravel()
    if not iterable(bins):
        if range is None:
            range = (a.min(), a.max())
        mn, mx = [mi+0.0 for mi in range]
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = linspace(mn, mx, bins, endpoint=False)
    n = sort(a).searchsorted(bins)
    n = concatenate([n, [len(a)]])
    n = n[1:]-n[:-1]
    if normed:
        db = bins[1] - bins[0]
        return 1.0/(a.size*db) * n, bins
    else:
        return n, bins
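To see what that normalization does numerically, here is a minimal pure-Python sketch of the same logic. The name histogram_sketch is hypothetical (it is not part of numpy), it only handles an integer bin count with the range taken from the data, and it uses bisect_left from the standard library in place of searchsorted. Under those assumptions, the heights themselves need not sum to 1, but the heights multiplied by the bin width do.

```python
from bisect import bisect_left


def histogram_sketch(a, nbins, normed=False):
    """Pure-Python rendering of the quoted logic (hypothetical helper, not numpy's API)."""
    data = sorted(float(x) for x in a)
    mn, mx = data[0], data[-1]
    if mn == mx:
        # Same guard as the quoted code: widen a zero-width range.
        mn -= 0.5
        mx += 0.5
    db = (mx - mn) / nbins
    # Left bin edges only, mirroring linspace(mn, mx, bins, endpoint=False).
    edges = [mn + i * db for i in range(nbins)]
    # Index of the first sample >= each edge, plus len(data) as the final boundary.
    idx = [bisect_left(data, e) for e in edges] + [len(data)]
    counts = [idx[i + 1] - idx[i] for i in range(nbins)]
    if normed:
        # Divide by (number of samples * bin width), as in the quoted source.
        return [c / (len(data) * db) for c in counts], edges
    return counts, edges


heights, edges = histogram_sketch([1, 2, 3], 3, normed=True)
db = edges[1] - edges[0]
print(sum(heights))                  # sum of the bar heights
print(sum(h * db for h in heights))  # total area under the bars
```

For the [1, 2, 3] example, the first print gives about 1.5 and the second gives about 1.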
From what I can tell, normed divides n by a.size*db, the number of samples
times the bin width (here 3 * 0.667 = 2, which happens to equal the total
span of the bins). That seems an odd thing to do. Here's my interpretation
of what it should do:
    if normed:
        return 1.0/sum(n) * n, bins
    else:
        return n, bins
which then gives me:
>>> np.histogram([1,2,3], bins=3, normed=True)
(array([ 0.33333333, 0.33333333, 0.33333333]), array([ 1., 1.66666667, 2.33333333]))
Which adds to 1. Am I way off on this?
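For comparison, a small sketch of how the two normalizations relate when all bins have the same width. The variable names are made up for illustration, and the counts and bin width are the ones from the [1, 2, 3] example. Dividing by the sum of the counts gives per-bin fractions, and dividing those fractions by the bin width reproduces the values the current code returns.

```python
counts = [1, 1, 1]   # per-bin counts from the example above
db = 2.0 / 3.0       # bin width for 3 bins spanning 1..3

# Proposed normalization: fraction of samples in each bin.
fractions = [c / sum(counts) for c in counts]

# Current normalization: fractions divided by the bin width.
density = [f / db for f in fractions]

print(fractions, sum(fractions))
print(density, sum(d * db for d in density))
```

So, at least for equal-width bins, the two results differ only by a constant factor of the bin width.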
Cheers,
Martin