[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

Sun Aug 29 09:21:55 EDT 2010

> On Sat, Aug 28, 2010 at 04:12, Zbyszek Szmek <zbyszek at in.waw.pl> wrote:
>> Hi,
>>
>> On Fri, Aug 27, 2010 at 06:43:26PM -0600, Charles R Harris wrote:
>>>    On Fri, Aug 27, 2010 at 2:47 PM, Robert Kern <robert.kern at gmail.com>
>>>    wrote:
>>>
>>>      On Fri, Aug 27, 2010 at 15:32, David Huard <david.huard at gmail.com>
>>>      wrote:
>>>      > Nils and Joseph,
>>>      > Thanks for the bug report, this is now fixed in SVN (r8672).
>>>
>>>      While we're at it, can we change the name of the argument? "normed"
>>>      has caused so much confusion over the years. We could deprecate
>>>      normed=True in favor of pdf=True or density=True.
>> I think it might be a good moment to also include a different type of normalization:
>>       n = n / n.sum()
>> i.e. the frequency of counts in each bin. This one is of course very simple to calculate
>> by hand, but very common. I think it would be useful to have this normalization
>> available too. [http://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm]
> 
> My feeling is that this is trivial to do "by hand". I do not see a
> reason to add an option to histogram() to do this.
> 
Hi,

+1 for not silently changing the behavior of normed=True. (I'm one of
the people who have worked around it).

One argument in favor of offering both normalization styles, 'frequency'
and 'density', is that the documentation would automatically become very
clear: a user sees all the options, and there is little chance of a
misunderstanding. Of course, a sentence like "If you want frequency
normalization, compute n, bins = histogram(data) and then use
n / float(n.sum())" would also make things clear, without adding the
frequency option.
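
To make the difference between the two styles concrete, here is a
minimal sketch that computes both by hand from the raw counts, so it
does not depend on how the normed keyword ends up behaving. The sample
data and bin edges are made up for illustration; the bins are
deliberately of unequal width, since that is the case where the two
normalizations differ.

```python
import numpy as np

# Hypothetical sample data and bin edges, chosen only for illustration.
data = np.array([0.5, 1.5, 1.7, 2.2, 2.4, 2.9, 3.1, 7.5])
bins = np.array([0.0, 1.0, 2.0, 4.0, 8.0])  # unequal bin widths

# Raw counts per bin, no normalization.
counts, edges = np.histogram(data, bins=bins)

# 'frequency': fraction of samples in each bin; the values sum to 1.
frequency = counts / float(counts.sum())

# 'density': frequency divided by bin width; the integral over all
# bins (sum of density * width) is 1, but the values need not sum to 1.
widths = np.diff(edges)
density = frequency / widths

print(counts)     # raw counts
print(frequency)  # sums to 1
print(density)    # integrates to 1
```

With equal-width bins the two results differ only by a constant factor
(the bin width), which is probably part of why they get confused.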

Nils