[SciPy-User] Sort b according to histogram(a)

Wed Dec 16 15:52:52 EST 2009

On Wed, Dec 16, 2009 at 12:21 PM, Arthur M. Greene <amg at iri.columbia.edu> wrote:
> This can be accomplished in a loop, but I'm hoping there is a more efficient
> way: Starting with two 1-D arrays indexed the same, e.g.,
>
> x0,x1,x2... y0,y1,y2...
>
> the x's are first binned normally, i.e., a set of edges is defined and each
> x that falls in a particular bin generates a count. (This can be
> accomplished using histogram.) What I then need to do though, is find the
> average value of the corresponding y's. Example:
>
> x = (1,4,7), y = (200,100,1000), edges = (0,5,10)
>
> Then
>
> counts = (2,1), ydata = (150,1000)
>
> Size of x or y is only about 500, but the procedure needs to be repeated
> many times and looping makes the execution quite slow. I've been looking at
> np.digitize, but haven't quite figured out how this (or some other call I
> don't know) might be used to "vectorize" the process. Suggestions
> appreciated!

If you don't have many edges then looping might be faster than
something like this:

>> import numpy as np
>> x = (1,4,7)
>> y = (200,100,1000)
>> edges = (0,5,10)

>> idx = np.digitize(x, edges)
>> idx
   array([1, 1, 2])

>> jdx = np.equal.outer(idx, np.unique(idx))
>> jdx

array([[ True, False],
       [ True, False],
       [False,  True]], dtype=bool)

Now, how to use jdx to find the means....
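
One possible way to finish, offered as a sketch rather than a tested recipe: each column of jdx is a mask for one occupied bin, so summing the columns gives the counts and a dot product with y gives the per-bin sums. The names bins, counts, ysum and ymean are mine, y is converted to float by assumption, and np.unique is used here on the assumption of a NumPy version where it returns the sorted unique values.

```python
import numpy as np

x = np.array([1, 4, 7])
y = np.array([200, 100, 1000], dtype=float)  # float so the means are not truncated
edges = np.array([0, 5, 10])

idx = np.digitize(x, edges)        # bin number for each x (1-based for in-range values)
bins = np.unique(idx)              # only the bins that actually contain points
jdx = np.equal.outer(idx, bins)    # boolean mask, shape (len(x), len(bins))

counts = jdx.sum(axis=0)           # number of x's in each occupied bin
ysum = np.dot(y, jdx)              # sum of the y's in each occupied bin
ymean = ysum / counts              # mean of the y's in each occupied bin

print(counts)
print(ymean)
```

Note that only occupied bins show up, so an empty bin gets no entry in counts or ymean, and any x outside the edges lands in bin 0 or bin len(edges) according to np.digitize. The mask has len(x) * len(bins) elements, which is fine for about 500 points but is why a loop may win when there are few edges.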