[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

Tommy Grav tgrav at mac.com
Mon Apr 7 16:26:29 EDT 2008


On Apr 7, 2008, at 4:14 PM, LB wrote:
> +1 for axis and +1 for a keyword to define what to do with values
> outside the range.
>
> For the keyword, rather than 'outliers', I would propose 'discard' or
> 'exclude', because it could be used to describe the four
> possibilities:
>   - discard='low' => values lower than the range are discarded,
>     values higher are added to the last bin
>   - discard='up'  => values higher than the range are discarded,
>     values lower are added to the first bin
>   - discard='out' => values out of the range are discarded
>   - discard=None  => values outside of this range are allocated to
>     the closest bin
>
> As for the default behavior: in most cases, I think the sum of the
> bin populations should equal the size of the original data set, so
> I would prefer discard=None. But I'm also okay with discard='low' in
> order not to break older code, if this is clearly stated.

It seems that people in this discussion are forgetting that the bins
are actually defined by the lower boundaries supplied, such that

bins = [1,3,5]

actually currently means

bin1 -> 1 to 2.99999...
bin2 -> 3 to 4.99999...
bin3 -> 5 to inf

(Of course, in version 1.0.1 the documentation is inconsistent with the
behavior, as described by the original poster.) This definition of bins
makes it hard to exclude values, because it forces the user to give an
extra value in the bin definition, i.e. the bins statement above would
then give only two bins while supplying three values. That seems
confusing to me.
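
To illustrate the lower-boundary reading described above, here is a small
sketch. It does not call numpy.histogram; it emulates the semantics with
np.searchsorted, and the helper name lower_edge_histogram is made up for
this example:

```python
import numpy as np

def lower_edge_histogram(data, lower_edges):
    """Hypothetical helper: each bin runs from its lower edge up to the
    next lower edge, and the last bin is open-ended (up to inf)."""
    data = np.asarray(data)
    lower_edges = np.asarray(lower_edges)
    # For each value, count how many lower edges are <= that value.
    # 0 means the value lies below the first edge and falls in no bin.
    idx = np.searchsorted(lower_edges, data, side='right')
    counts = np.bincount(idx, minlength=len(lower_edges) + 1)
    return counts[0], counts[1:]

below, counts = lower_edge_histogram([0.5, 1, 2.9, 3, 4.9, 5, 100], [1, 3, 5])
print(below)   # number of values below the first edge
print(counts)  # one count per supplied lower edge
```

With bins = [1,3,5] this returns three counts, and the value 100 lands in
the last one, so an upper limit can only be imposed by supplying one more
value than the number of bins actually wanted.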

I am not sure what the right approach is, but currently using range will
clip the values that fall outside the range the user wants.

Cheers
   Tommy






