[SciPy-user] histogram(a, normed=True) doesn't normalize?

Martin Spacek scipy at mspacek.mm.st
Mon Jun 19 05:02:57 EDT 2006


I've searched around on this and I can't find anything. I'm confused by 
the 'normed' argument in histogram(). According to the numpy book:

"If normed is True, then the histogram will be normalized and comparable 
with a probability density function, otherwise it will be a count of the 
number of items in each bin."

The sum of the heights of all the bins should be 1 for a PDF (right?). 
But I get the following in numpy 0.9.8:

 >>> import numpy as np
 >>> np.histogram([1,2,3], bins=3, normed=False)
(array([1, 1, 1]), array([ 1.,  1.66666667,  2.33333333]))
 >>> np.histogram([1,2,3], bins=3, normed=True)
(array([ 0.5,  0.5,  0.5]), array([ 1.,  1.66666667,  2.33333333]))

Adding up the bin heights gives 1.5 in this case. Here's the code:

C:\bin\Python24\Lib\site-packages\numpy\lib\function_base.py:

def histogram(a, bins=10, range=None, normed=False):
     a = asarray(a).ravel()
     if not iterable(bins):
         if range is None:
             range = (a.min(), a.max())
         mn, mx = [mi+0.0 for mi in range]
         if mn == mx:
             mn -= 0.5
             mx += 0.5
         bins = linspace(mn, mx, bins, endpoint=False)

     n = sort(a).searchsorted(bins)
     n = concatenate([n, [len(a)]])
     n = n[1:]-n[:-1]

     if normed:
         db = bins[1] - bins[0]
         return 1.0/(a.size*db) * n, bins
     else:
         return n, bins
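To see where the 0.5 comes from, here is a rough pure-Python sketch of the same steps. It is not the numpy code itself: the name histogram_sketch is made up for illustration, bisect_left stands in for searchsorted (assuming left-side insertion), and, like the original, it needs at least two bins so that a bin width can be taken from the first two edges.

```python
from bisect import bisect_left


def histogram_sketch(a, nbins, normed=False):
    """Illustrative re-creation of the quoted logic (hypothetical helper)."""
    a = sorted(a)
    mn, mx = float(a[0]), float(a[-1])
    if mn == mx:
        mn -= 0.5
        mx += 0.5
    width = (mx - mn) / nbins
    # Left bin edges only, mirroring linspace(mn, mx, nbins, endpoint=False)
    edges = [mn + i * width for i in range(nbins)]
    # Index of each edge in the sorted data, plus len(a) as the final boundary
    idx = [bisect_left(a, e) for e in edges] + [len(a)]
    # Differences between consecutive indices are the per-bin counts
    counts = [idx[i + 1] - idx[i] for i in range(nbins)]
    if normed:
        db = edges[1] - edges[0]
        # Same scaling as the quoted code: 1.0 / (a.size * db)
        return [c / (len(a) * db) for c in counts], edges
    return counts, edges


counts, edges = histogram_sketch([1, 2, 3], 3)
heights, _ = histogram_sketch([1, 2, 3], 3, normed=True)
print(counts, heights, sum(heights))
```

For [1,2,3] with 3 bins, db is about 0.667 and a.size is 3, so each count of 1 gets multiplied by 1/(3*0.667), which is about 0.5.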

From what I can tell, normed divides n by the number of samples times the
bin width (a.size*db), which seems an odd thing to do. Here's my
interpretation of what it should do:

     if normed:
         return 1.0/sum(n) * n, bins
     else:
         return n, bins

which then gives me:

 >>> np.histogram([1,2,3], bins=3, normed=True)
(array([ 0.33333333,  0.33333333,  0.33333333]), array([ 1.,  1.66666667,  2.33333333]))

Which adds to 1. Am I way off on this?
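One way to compare the two readings of "normalized" side by side is the small check below. It is only a sketch with illustrative variable names, using the counts and bin width from the example above. Dividing by a.size*db gives heights whose area (height times bin width, summed over bins) is 1, which is the sense in which a probability density is normalized. Dividing by sum(n) gives per-bin fractions whose heights sum to 1.

```python
counts = [1, 1, 1]       # per-bin counts from the example above
n_samples = sum(counts)  # 3 items in total
width = 2.0 / 3.0        # bin width: (3 - 1) / 3 bins

# Density-style scaling: count / (number of samples * bin width)
density = [c / (n_samples * width) for c in counts]

# Fraction-style scaling: count / total count
fractions = [c / n_samples for c in counts]

area_under_density = sum(h * width for h in density)
sum_of_fractions = sum(fractions)
print(density, area_under_density)
print(fractions, sum_of_fractions)
```

The two scalings differ only by the factor db, so they agree whenever the bin width is exactly 1.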

Cheers,

Martin



