[SciPy-Dev] binned_statistic: binnumber and array shapes

Luke Zoltan Kelley lzkelley at gmail.com
Tue Nov 3 10:31:50 EST 2015


> On Nov 3, 2015, at 10:29 AM, josef.pktd at gmail.com wrote:
> 
> 
> 
> On Tue, Nov 3, 2015 at 10:03 AM, Luke Zoltan Kelley <lzkelley at gmail.com> wrote:
> The docs for `scipy.stats.binned_statistic` describe the returned `binnumber` array as:
> 
> binnumber : 1-D ndarray of ints
>     This assigns to each observation an integer that represents the bin
>     in which this observation falls. Array has the same length as `values`.
> 
> However, it's very difficult to understand how the returned values match this description.  For example:
> 
> >>> import scipy.stats
> >>> a1 = [0.1, 0.1, 0.1, 0.6]
> >>> a2 = [2.1, 2.6, 2.1, 2.1]
> >>> b1 = [0.0, 0.5, 1.0]
> >>> b2 = [2.0, 2.5, 3.0]
> >>> stats = scipy.stats.binned_statistic_2d(a1, a2, None, 'count', bins=[b1,b2])
> >>> stats
> BinnedStatistic2dResult(statistic=array([[ 2.,  1.],
>        [ 1.,  0.]]), x_edge=array([ 0. ,  0.5,  1. ]), y_edge=array([ 2. ,  2.5,  3. ]), binnumber=array([5, 6, 5, 9]))
> 
> The resulting 'statistic' array makes sense; but the 'binnumber' array is... cryptic...
> 
> 
> My guess is that there are outlier bins, one row and one column in front of (and behind) the actual bins, so binnumber counts positions in a larger, flattened array.
> 
> 
> 0, 1, 2, 3
> 4, 5*, 6*, 7
> 8, 9*, 10*, 11
> 12, 13, 14, 15

That's exactly what `statistic` looks like before having the 'outlier' bins cleaned up.  It doesn't seem like there's any benefit to preserving this format.
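
To check that reading against my example, here is a rough sketch that uses only numpy. It assumes the padded layout Josef describes (one outlier bin on each side of every dimension, flattened in row-major order with `x` as the first axis); the variable names are just mine for illustration, and I have not verified this against the scipy internals beyond this one case.

```python
import numpy as np

a1 = [0.1, 0.1, 0.1, 0.6]
a2 = [2.1, 2.6, 2.1, 2.1]
b1 = [0.0, 0.5, 1.0]
b2 = [2.0, 2.5, 3.0]

# Assumption: each dimension has len(edges) - 1 real bins plus two outlier bins.
padded_shape = (len(b1) + 1, len(b2) + 1)   # (4, 4)

# np.digitize returns 0 for values below the first edge, so real bins start at 1.
ix = np.digitize(a1, b1)
iy = np.digitize(a2, b2)

# Flatten the per-dimension indices into the padded grid.
flat = np.ravel_multi_index((ix, iy), padded_shape)
print(flat)          # [5 6 5 9], the same as the returned binnumber

# Going the other way: recover indices into the cleaned-up `statistic` array.
binnumber = np.array([5, 6, 5, 9])
rows, cols = np.unravel_index(binnumber, padded_shape)
rows, cols = rows - 1, cols - 1
print(rows, cols)    # [0 0 0 1] [0 1 0 0]

# Rebuilding the counts from those indices reproduces `statistic`.
counts = np.zeros((len(b1) - 1, len(b2) - 1))
np.add.at(counts, (rows, cols), 1)
print(counts)        # [[2. 1.] [1. 0.]]
```

So, at least for this case, the numbers are recoverable, but only if you already know the padded shape and the flattening order.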

> You only get the starred entries in the center, without outliers.
> 
> I didn't check the details.
> 
> Josef
> 
>  
> Before being returned, [`statistic` is reshaped and cleaned up](https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L452-L461)
> 
> Should the same thing be happening to `binnumber`?
> 
> (Unfortunately) I created an [issue for this](https://github.com/scipy/scipy/issues/5449), but it seemed like this (the mailing list) was probably far more appropriate; whoops.  One other minor point is that the docstring for `binned_statistic_2d` says that `x` and `y` can have different lengths.  I think that's a mistake; they have to be the same shape, right?
> 
> Luke
> 
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org <mailto:SciPy-Dev at scipy.org>
> https://mail.scipy.org/mailman/listinfo/scipy-dev <https://mail.scipy.org/mailman/listinfo/scipy-dev>
