[SciPy-user] Pb with numpy.histogram

David Huard david.huard at gmail.com
Mon Oct 1 10:57:07 EDT 2007


2007/10/1, LB <berthe.loic at gmail.com>:
>
> > I think histogram has had this weird behavior since the Numeric era and
> > a lot of code may break if we fix it. Basically, histogram discards
> > values lower than the range as outliers but puts values higher than the
> > range into the last bin.
> I think this should be clearly explained in the docstring. The
> current docstring says
> "Values outside of this range are allocated to the closest bin".
> This is wrong, can lead to bugs, and should be fixed.


You're right. In fact, the docstring did describe the actual behavior at
some point, but it seems to have been edited out.


> numpy.histogram's behavior still seems weird to me, and I don't see why
> values lower than the range should always be discarded as outliers.
> If the real problem is consistency with older versions from the Numeric
> era, what about adding a new keyword to the function, say "discard", which
> could be used to decide what to do with values outside the range:
>    - 'low'      => values lower than the range are discarded, values
>                    higher are added to the last bin
>    - 'up'       => values higher than the range are discarded, values
>                    lower are added to the first bin
>    - 'out'      => values outside the range are discarded
>    - None       => values outside the range are allocated to the
>                    closest bin
>
> For compatibility reasons, a default value of 'low' could be used.


Good idea. Better yet would be to raise a deprecation warning and change the
function in the next release or the one after, and ideally replace it with
something written in C to speed things up. The final decision is up to
someone other than me, though.
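
To make the proposal concrete, here is a rough sketch of how such a keyword
might behave, written as a standalone helper rather than a patch to numpy.
The names histogram_discard and value_range are made up for illustration and
do not exist in numpy. I am also assuming equal-width bins, an explicit
range, and that a value exactly equal to the upper edge counts as inside the
range and goes into the last bin.

```python
import numpy as np


def histogram_discard(a, bins, value_range, discard='low'):
    """Hypothetical sketch of the proposed `discard` keyword (not a numpy API).

    Returns (counts, edges) for `bins` equal-width bins over `value_range`.
    """
    if discard not in ('low', 'up', 'out', None):
        raise ValueError("discard must be 'low', 'up', 'out' or None")

    a = np.asarray(a, dtype=float).ravel()
    lo, hi = value_range
    edges = np.linspace(lo, hi, bins + 1)

    # Raw bin index: -1 for values below lo, `bins` for values >= hi.
    idx = np.searchsorted(edges, a, side='right') - 1

    below = a < lo
    above = a > hi  # assumption: a == hi is in range (last bin)

    # Move every out-of-range value to its closest bin first ...
    idx = np.clip(idx, 0, bins - 1)

    # ... then drop the side(s) selected by `discard`.
    keep = np.ones(a.shape, dtype=bool)
    if discard in ('low', 'out'):
        keep &= ~below
    if discard in ('up', 'out'):
        keep &= ~above

    counts = np.bincount(idx[keep], minlength=bins)
    return counts, edges


data = [-1, 0, 0.5, 1, 2, 3, 5]
for mode in ('low', 'up', 'out', None):
    counts, _ = histogram_discard(data, bins=3, value_range=(0, 3), discard=mode)
    print(mode, counts)
```

With discard='low' this reproduces the behavior described above: the -1 is
dropped and the 5 lands in the last bin.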

Cheers,

David