[SciPy-Dev] binned_statistic: binnumber and array shapes
Luke Zoltan Kelley
lzkelley at gmail.com
Tue Nov 3 10:31:50 EST 2015
> On Nov 3, 2015, at 10:29 AM, josef.pktd at gmail.com wrote:
>
>
>
> On Tue, Nov 3, 2015 at 10:03 AM, Luke Zoltan Kelley <lzkelley at gmail.com> wrote:
> The docs for `scipy.stats.binned_statistic` explain the `binnumber` returned arrays as:
>
> binnumber : 1-D ndarray of ints
> This assigns to each observation an integer that represents the bin
> in which this observation falls. Array has the same length as `values`.
>
> However, it's very difficult to understand how the returned values match this description. For example:
>
> >>> import scipy.stats
> >>> a1 = [0.1, 0.1, 0.1, 0.6]
> >>> a2 = [2.1, 2.6, 2.1, 2.1]
> >>> b1 = [0.0, 0.5, 1.0]
> >>> b2 = [2.0, 2.5, 3.0]
> >>> scipy.stats.binned_statistic_2d(a1, a2, None, 'count', bins=[b1, b2])
> BinnedStatistic2dResult(statistic=array([[ 2., 1.],
>        [ 1., 0.]]), x_edge=array([ 0. , 0.5, 1. ]), y_edge=array([ 2. , 2.5, 3. ]), binnumber=array([5, 6, 5, 9]))
>
> The resulting 'statistic' array makes sense; but the 'binnumber' array is... cryptic...
>
>
> My guess is that there are outlier bins, one row and one column in front of the actual bins, so counting for binnumber is based on a larger array.
>
>
> 0, 1, 2, 3
> 4, 5*, 6*, 7
> 8, 9*, 10*, 11
> 12, 13, 14, 15
That's exactly what `statistic` looks like before the 'outlier' bins are cleaned up. There doesn't seem to be any benefit to preserving this format for `binnumber`.
> You only get the starred entries in the center without outliers.
>
> I didn't check the details.
>
> Josef
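Josef's guess can be checked with plain NumPy. The sketch below is an assumption about the mechanism, not SciPy's actual code: it numbers each axis with `np.digitize` (0 for values below the first edge, `len(edges)` for values at or above the last edge) and then flattens the pair of per-axis indices over the padded 4x4 grid in C order. For the data above it reproduces `binnumber=array([5, 6, 5, 9])`.

```python
import numpy as np

# Data and bin edges from the example above
a1 = [0.1, 0.1, 0.1, 0.6]
a2 = [2.1, 2.6, 2.1, 2.1]
b1 = [0.0, 0.5, 1.0]
b2 = [2.0, 2.5, 3.0]

# Per-axis bin index (np.digitize convention): 0 = below the first edge,
# 1..nbins = real bins, nbins + 1 = at/above the last edge.
# Assumption: SciPy puts a value equal to the last edge inside the last
# bin; this sketch does not replicate that special case.
ix = np.digitize(a1, b1)
iy = np.digitize(a2, b2)

# With one outlier bin on each side, every axis has nbins + 2 slots,
# i.e. len(edges) + 1 slots.
padded_shape = (len(b1) + 1, len(b2) + 1)

# Flatten (ix, iy) in C order over the padded grid.
binnumber = np.ravel_multi_index((ix, iy), padded_shape)
print(binnumber)  # [5 6 5 9]
```

So binnumber 5 is row 1, column 1 of the padded grid, which is the top-left real bin.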
>
>
> Before being returned, [`statistic` is reshaped and cleaned up](https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L452-L461).
>
> Should the same thing be happening to `binnumber`?
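Until then, the linearized numbers can be mapped back by hand. A minimal sketch, assuming the padded `(nx + 2, ny + 2)` C-order layout described above; `unravel_binnumber` is a hypothetical helper, not a SciPy function. (Later SciPy releases added an `expand_binnumbers` option to the multidimensional functions that returns per-axis bin indices; check the docs for your version.) Unlike that option, this helper also shifts to 0-based indices into the cleaned `statistic` array and flags outliers with -1.

```python
import numpy as np


def unravel_binnumber(binnumber, nx, ny):
    """Hypothetical helper: map linearized 2-D bin numbers to (i, j)
    indices into the cleaned (nx, ny) `statistic` array.

    Assumes `binnumber` is 1-D and was flattened in C order over a
    (nx + 2, ny + 2) grid that includes one outlier bin on each side.
    Observations in an outlier bin get index -1 on the offending axis.
    """
    # Undo the C-order flattening over the padded grid.
    ix, iy = np.unravel_index(np.asarray(binnumber), (nx + 2, ny + 2))
    # Shift by one to drop the leading outlier row/column.
    i = ix - 1
    j = iy - 1
    # Flag anything that landed in an outlier bin on either side.
    i[(i < 0) | (i >= nx)] = -1
    j[(j < 0) | (j >= ny)] = -1
    return i, j


i, j = unravel_binnumber([5, 6, 5, 9], nx=2, ny=2)
print(i, j)  # [0 0 0 1] [0 1 0 0]
```

Counting the (i, j) pairs gives back the `statistic` array from the example: two observations in (0, 0), one in (0, 1), one in (1, 0).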
>
> (Unfortunately) I created an [issue for this](https://github.com/scipy/scipy/issues/5449), but it seemed like this (the mailing list) was probably far more appropriate; whoops. One other minor point: the docstring for `binned_statistic_2d` says that `x` and `y` can have different lengths. I think that's a mistake; they have to be the same shape, right?
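The reasoning behind the same-shape requirement: observation k is the point `(x[k], y[k])`, so the two inputs only make sense pairwise. A minimal sketch of that check; `paired_observations` is a hypothetical helper for illustration, and it makes no claim about what SciPy itself raises on mismatched inputs.

```python
import numpy as np


def paired_observations(x, y):
    """Hypothetical helper: stack x and y into an (N, 2) array of points."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Observation k is the point (x[k], y[k]), so both inputs must
    # have the same shape for the pairing to be defined.
    if x.shape != y.shape:
        raise ValueError(
            f"x and y must have the same shape, got {x.shape} and {y.shape}"
        )
    return np.column_stack((x, y))
```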
>
> Luke
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-dev
>
>