[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Bruce Southey bsouthey at gmail.com
Mon Apr 7 10:44:10 EDT 2008


Hi,
Thanks, David, for pointing out the information I forgot to include in
my original email.

-1 for 'raise an exception' because, as Dan points out, the problem
stems from the user providing the bins.

+1 for the outliers keyword. Should 'exclude' distinguish points that
are too low and those that are too high?
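
To make the question concrete, here is a minimal sketch of how such a
keyword could behave. It is only an illustration: the function name
histogram_with_outliers, its option names and the (counts, n_low, n_high)
return value are hypothetical, not an existing numpy API, and it assumes a
numpy whose np.histogram takes the nbins+1 edges and ignores points
outside them. Reporting the low and high counts separately is one way to
let 'exclude' keep the two kinds of outliers apart.

```python
import numpy as np


def histogram_with_outliers(a, edges, outliers="exclude"):
    """Hypothetical helper, not part of numpy.

    `edges` holds the nbins + 1 bin edges.
    Returns (counts, n_low, n_high), where n_low and n_high are the
    numbers of points below the first edge and above the last edge.
    """
    a = np.asarray(a).ravel()
    edges = np.asarray(edges)
    low = a < edges[0]
    high = a > edges[-1]

    if outliers == "raise":
        if low.any() or high.any():
            raise ValueError("data outside the bin range")
    elif outliers == "include":
        # Fold the outliers into the first and last bins.
        a = np.clip(a, edges[0], edges[-1])
    elif outliers != "exclude":
        raise ValueError("unknown outliers option: %r" % (outliers,))

    # Assumption: np.histogram ignores values outside the given edges.
    counts, _ = np.histogram(a, bins=edges)
    return counts, int(low.sum()), int(high.sum())
```

With 'exclude' the caller still gets n_low and n_high back, so nothing is
silently lost even though the counts only cover the requested bins.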

+1 for axis.
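
For reference, a rough sketch of the axis idea built on
np.apply_along_axis. The helper name histogram_along_axis is made up for
illustration, and the sketch assumes np.histogram accepts the bin edges
directly.

```python
import numpy as np


def histogram_along_axis(a, edges, axis=-1):
    """Hypothetical helper: histogram every 1-D slice of `a` along `axis`."""
    edges = np.asarray(edges)

    def count(slice_1d):
        # Keep only the counts; the edges are the same for every slice.
        return np.histogram(slice_1d, bins=edges)[0]

    return np.apply_along_axis(count, axis, np.asarray(a))
```

The binned axis is replaced by one of length nbins, so a (2, 4) array with
3 bins gives a (2, 3) result for axis=1 and a (3, 4) result for axis=0.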

Really, I was only trying to see what it would take to close this
bug, but I am willing to test any code.

Thanks
Bruce



On Mon, Apr 7, 2008 at 8:55 AM, David Huard <david.huard at gmail.com> wrote:
> +1 for an outlier keyword. Note that this implies that when bins are passed
> explicitly, the edges are given (nbins+1), not simply the left edges
> (nbins).
>
> While we are refactoring histogram, I'd suggest adding an axis keyword. This
> is pretty straightforward to implement using the np.apply_along_axis
> function.
>
> Also, I noticed that the current normalization is buggy for non-uniform bin
> sizes, because it uses only the width of the first bin:
>     if normed:
>         db = bins[1] - bins[0]
>         return 1.0/(a.size*db) * n, bins
>
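
A possible fix, as a sketch only: divide each count by its own bin width,
taken from np.diff of the edges. normalize_counts is a hypothetical
helper, and normalizing by the number of binned points (n.sum()) rather
than a.size is an assumption about how outliers should be treated.

```python
import numpy as np


def normalize_counts(n, edges):
    """Hypothetical helper: turn bin counts into a density.

    Each count is divided by the total count and by the width of its
    own bin, so the result integrates to 1 over the binned range.
    """
    n = np.asarray(n, dtype=float)
    widths = np.diff(edges)  # one width per bin, from nbins + 1 edges
    return n / (n.sum() * widths)
```

For uniform bins this reduces to the single-width formula quoted above,
provided no points fall outside the edges.
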
> Finally, whatever option is chosen in the end, we should make sure it is
> consistent across all histogram functions. This may mean that we will also
> break the behavior of histogramdd and histogram2d.
>
> Bruce: I did some work over the weekend on the histogram function, including
> tests. If you want, I'll send that to you in the evening.
>
> David
>
>
>
>
> 2008/4/7, Hans Meine <meine at informatik.uni-hamburg.de>:
>
> > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> >
> > > There's also a fourth option - raise an exception if any points are
> > > outside the range.
> >
> >
> > +1
> >
> > I think this should be the default.  Otherwise, I tend towards "exclude",
> > in order to have comparable bin sizes (when plotting, I always find peaks
> > at the ends annoying); this could also be called "clip" BTW.
> >
> > But really, an exception would follow the Zen: "In the face of ambiguity,
> > refuse the temptation to guess."  And with a kwarg: "Explicit is better
> > than implicit."
> >
> > histogram(a, arange(10), outliers = "clip")
> > histogram(a, arange(10), outliers = "include")
> > # better names? "include"->"accumulate"/"map to border"/"map"/"boundary"
> >
> >
> > --
> > Ciao, /  /
> >      /--/
> >     /  / ANS
> >
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
> >
>
>
