[SciPy-Dev] binned_statistic: binnumber and array shapes

Luke Zoltan Kelley lzkelley at gmail.com
Tue Nov 3 10:03:49 EST 2015


The docs for `scipy.stats.binned_statistic` describe the returned `binnumber` array as:

binnumber : 1-D ndarray of ints
    This assigns to each observation an integer that represents the bin
    in which this observation falls. Array has the same length as `values`.

However, it's very difficult to understand how the returned values match this description.  For example:

>>> import scipy.stats
>>> a1 = [0.1, 0.1, 0.1, 0.6]
>>> a2 = [2.1, 2.6, 2.1, 2.1]
>>> b1 = [0.0, 0.5, 1.0]
>>> b2 = [2.0, 2.5, 3.0]
>>> scipy.stats.binned_statistic_2d(a1, a2, None, 'count', bins=[b1, b2])
BinnedStatistic2dResult(statistic=array([[ 2.,  1.],
       [ 1.,  0.]]), x_edge=array([ 0. ,  0.5,  1. ]), y_edge=array([ 2. ,  2.5,  3. ]), binnumber=array([5, 6, 5, 9]))

The resulting 'statistic' array makes sense; but the 'binnumber' array is... cryptic...
Before being returned, [`statistic` is reshaped and cleaned up](https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L452-L461).

Should the same thing be happening to `binnumber`?
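
One reading that reproduces the numbers above (this is my assumption, not something the docstring states) is that `binnumber` is a flattened, row-major index into a grid that has one extra outlier bin on each side of every dimension, so the 2x2 grid here becomes 4x4. A minimal sketch of undoing that with plain NumPy, using values copied from the example:

```python
import numpy as np

# Values copied from the example above.
binnumber = np.array([5, 6, 5, 9])
b1 = [0.0, 0.5, 1.0]
b2 = [2.0, 2.5, 3.0]

# Assumption: each dimension is padded with one outlier bin on each side,
# so n edges give (n - 1) + 2 = n + 1 bins per dimension.
padded_shape = (len(b1) + 1, len(b2) + 1)  # (4, 4)

# Undo the row-major flattening to get per-dimension bin numbers
# (in-range observations start at 1 because bin 0 is the low outlier bin).
x_bin, y_bin = np.unravel_index(binnumber, padded_shape)

# Subtract 1 to get zero-based indices into `statistic`.
x_idx, y_idx = x_bin - 1, y_bin - 1
print(x_idx, y_idx)  # [0 0 0 1] [0 1 0 0]
```

Under that assumption the four observations land in `statistic[0, 0]`, `statistic[0, 1]`, `statistic[0, 0]` and `statistic[1, 0]`, which agrees with the counts shown.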

(Unfortunately) I created an [issue for this](https://github.com/scipy/scipy/issues/5449), but it seemed like this (the mailing list) was probably far more appropriate; whoops.  One other minor point: the docstring for `binned_statistic_2d` says that `x` and `y` can have different lengths.  I think that's a mistake; they have to be the same shape, right?

Luke