[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

David Huard david.huard at gmail.com
Mon Aug 30 09:29:48 EDT 2010


Thanks for the feedback,

As far as I understand it, the proposal is to keep histogram as it is for
1.5, then in 2.0, deprecate normed=True but keep its buggy behavior, while
adding a density keyword that fixes the bug. In a later release, we could
then remove normed. While the bug is not present in histogramdd and
histogram2d, the keyword change should be mirrored in those functions as
well.
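
To make the intended fix concrete, here is a minimal sketch of what a
density normalization is expected to compute when the bins have unequal
widths. It assumes a NumPy version in which the density keyword already
exists; the sample data and bin edges are made up for illustration. The
frequency normalization mentioned in the quoted thread below is included
for contrast.

```python
import numpy as np

# Made-up sample and deliberately unequal bin widths (illustrative only).
data = np.array([0.5, 1.5, 1.7, 2.5, 3.0, 6.0, 9.0])
edges = np.array([0.0, 1.0, 2.0, 4.0, 10.0])

counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)

# Density: counts divided by (total count * bin width), so that the
# histogram integrates to 1 over the binned range.
density_by_hand = counts / (counts.sum() * widths)

# Assumption: the density keyword is available in the installed NumPy.
density_kw, _ = np.histogram(data, bins=edges, density=True)

# Frequency normalization, for contrast: the values sum to 1, but the
# histogram does not integrate to 1 unless every bin has width 1.
frequency = counts / counts.sum()
```

With unequal bin widths the two normalizations differ, which is why the
choice of keyword name and its documentation matter.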

I personally am not too keen on renaming the keyword normed to density. I
feel we are trading clarity for a few new users against additional trouble
for many existing users. We could mitigate this by first documenting the
change in the docstring and living with both keywords for a few years before
raising a DeprecationWarning.
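
As a rough sketch of what living with both keywords could look like once
the warning is eventually turned on: the function name histogram_compat and
its signature are hypothetical, not NumPy's API. For simplicity it maps
normed onto the density computation and does not reproduce the old normed
behavior.

```python
import warnings
import numpy as np

def histogram_compat(a, bins=10, normed=None, density=None):
    """Hypothetical wrapper (not part of NumPy) accepting both keywords."""
    if normed is not None:
        # Later stage of the transition: warn, but still honor the keyword.
        warnings.warn("normed is deprecated; use density instead",
                      DeprecationWarning, stacklevel=2)
        if density is None:
            density = normed
    counts, edges = np.histogram(a, bins=bins)
    if density:
        # Divide by total count and bin width so the result integrates to 1.
        counts = counts / (counts.sum() * np.diff(edges))
    return counts, edges
```

During the documentation-only phase, the warnings.warn call would simply
be left out, so existing code using normed keeps running silently.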

Since this has a direct impact on matplotlib's hist, I'd be keen to hear from
the matplotlib devs on this.

David



On Sun, Aug 29, 2010 at 5:06 PM, Sebastian Haase <seb.haase at gmail.com> wrote:

> On Sun, Aug 29, 2010 at 3:21 PM, Nils Becker <n.becker at amolf.nl> wrote:
> >> On Sat, Aug 28, 2010 at 04:12, Zbyszek Szmek <zbyszek at in.waw.pl> wrote:
> >>> Hi,
> >>>
> >>> On Fri, Aug 27, 2010 at 06:43:26PM -0600, Charles R Harris wrote:
> >>>>    On Fri, Aug 27, 2010 at 2:47 PM, Robert Kern <robert.kern at gmail.com>
> >>>>    wrote:
> >>>>
> >>>>      On Fri, Aug 27, 2010 at 15:32, David Huard <david.huard at gmail.com>
> >>>>      wrote:
> >>>>      > Nils and Joseph,
> >>>>      > Thanks for the bug report, this is now fixed in SVN (r8672).
> >>>>
> >>>>      While we're at it, can we change the name of the argument? "normed"
> >>>>      has caused so much confusion over the years. We could deprecate
> >>>>      normed=True in favor of pdf=True or density=True.
> >>> I think it might be a good moment to also include a different type of
> >>> normalization:
> >>>       n = n / n.sum()
> >>> i.e. the frequency of counts in each bin. This one is of course very
> >>> simple to calculate by hand, but very common. I think it would be useful
> >>> to have this normalization available too.
> >>> [http://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm]
> >>
> >> My feeling is that this is trivial to do "by hand". I do not see a
> >> reason to add an option to histogram() to do this.
> >>
> > Hi,
> >
> > +1 for not silently changing the behavior of normed=True. (I'm one of
> > the people who have worked around it).
> >
> > One argument in favor of putting both normalizing styles 'frequency' and
> > 'density' may be that the documentation will automatically become very
> > clear. A user sees all options and there is little chance of a
> > misunderstanding. Of course, a sentence like "If you want frequency
> > normalization, compute counts = histogram(data)[0] and use
> > counts/counts.sum()" would also make things clear, without adding the
> > frequency option.
> >
> I am in favor of adding an option for the density mode (not for this
> release I guess).
> I often have a long expression in place of `data` and the one extra
> keyword saves lots of typing.
>
> -Sebastian Haase