[SciPy-user] Pb with numpy.histogram

LB berthe.loic at gmail.com
Mon Oct 1 05:49:30 EDT 2007


> I think histogram has had this weird behavior since the numeric era and a
> lot of code may break if we fix it. Basically, histogram discards the lower
> than range values as outliers but puts the higher than range values into the
> last bin.
I think this should be clearly explained in the doc string. The
current doc string says
"Values outside of this range are allocated to the closest bin".
This is wrong, can lead to bugs, and should be fixed.
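
A quick way to see how a given NumPy installation actually treats
out-of-range values is to histogram a few points on both sides of the
range and compare the totals. This is only a minimal check; the sample
data and the range (0, 3) are made up for illustration, and the result
may differ between NumPy versions.

```python
import numpy as np

# Illustrative data: one value below the range, three inside, one above.
data = np.array([-10.0, 0.5, 1.5, 2.5, 10.0])

counts, edges = np.histogram(data, bins=3, range=(0.0, 3.0))

# Number of values that really lie inside the requested range.
n_inside = int(((data >= 0.0) & (data <= 3.0)).sum())

print("bin edges:", edges)
print("counts per bin:", counts)
print("values inside range:", n_inside)
print("values counted:", int(counts.sum()))
```

If "values counted" is larger than "values inside range", some
out-of-range values were placed in a bin; inspecting the first and last
counts shows which side they came from.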

numpy.histogram's behavior still seems weird to me, and I don't see
why values lower than the range should always be discarded as outliers.
If the real problem is consistency with older versions from the Numeric
era, what about adding a new keyword to the function, say "discard",
which could be used to decide what to do with values outside the range:
   - 'low' => values lower than the range are discarded, values higher
              are added to the last bin
   - 'up'  => values higher than the range are discarded, values lower
              are added to the first bin
   - 'out' => values outside the range are discarded
   - None  => values outside the range are allocated to the closest bin

For compatibility reasons, a default value of 'low' could be used.
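
The proposed keyword could be sketched as a wrapper along the following
lines. This is a rough illustration only, not numpy's implementation:
the function name histogram_discard and the parameter name value_range
are hypothetical, and the sketch assumes equal-width bins. It only
passes in-range values to numpy.histogram, so it does not depend on how
numpy.histogram itself treats outliers.

```python
import numpy as np

def histogram_discard(a, bins=10, value_range=None, discard='low'):
    """Hypothetical wrapper illustrating the proposed 'discard' keyword."""
    if discard not in ('low', 'up', 'out', None):
        raise ValueError("discard must be 'low', 'up', 'out' or None")

    a = np.asarray(a, dtype=float).ravel()
    if value_range is None:
        lo, hi = a.min(), a.max()
    else:
        lo, hi = value_range

    # Equal-width bin edges spanning the requested range.
    edges = np.linspace(lo, hi, bins + 1)

    below = a < lo
    above = a > hi

    # Count only the values inside the range; outliers are handled below.
    counts, _ = np.histogram(a[~(below | above)], bins=edges)

    # 'up' and None keep the low outliers, in the first bin.
    if discard in ('up', None):
        counts[0] += int(below.sum())
    # 'low' and None keep the high outliers, in the last bin.
    if discard in ('low', None):
        counts[-1] += int(above.sum())

    return counts, edges

# Made-up data: one value below the range, three inside, one above.
data = [-5.0, 0.5, 1.5, 2.5, 9.0]
for mode in ('low', 'up', 'out', None):
    counts, _ = histogram_discard(data, bins=3, value_range=(0.0, 3.0),
                                  discard=mode)
    print(repr(mode), counts)
```

With discard='low' as the default, the wrapper follows the behavior
described in the quoted message: low outliers are dropped and high
outliers land in the last bin.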

> I'm generally using my own histograming routines, I could send them your way
> if you're interested.
Thanks, I will check the code you've put in the sandbox when I'm at home.

--
LB



