[Numpy-discussion] Efficient way of binning points and applying functions to these groups
Eric Emsellem
eric.emsellem at eso.org
Wed Dec 26 04:09:42 EST 2012
Hi!
I am looking for an efficient way of doing some simple binning of points
and then applying some functions to points within each bin.
I have tried several ways, including crude looping over the indices, or
using digitize (see below) but I cannot manage to get it as efficient as
I need it to be. I have a comparison with a similar (although complex)
code in idl, and thought I would ask the forum. In idl there is a way to
"invert" an histogram and get a reverse set of indices (via the
histogram function) which seems to do the trick (or maybe it is faster
for another reason).
Below I provide a (dummy) example of what I wish to achieve. Any hint on
how to do this EFFICIENTLY using numpy is most welcome. I need to speed
things up quite a bit (at the moment, the version I have, see below, is
10 times slower than the more complex idl routine..., I must be doing
something wrong!).
thanks!!
Eric
========================================================
# I have a random set of data points in 2D with coordinates x,y :
import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)
# I have now a 2D grid given by let's say 10x10 grid points:
nx = 11
ny = 21
lx = linspace(0,1,nx)
ly = linspace(0,1,ny)
gx, gy = np.meshgrid(lx, ly)
# So my set of 2D bins are (not needed in the solution I present but
just for clarity)
bins = np.dstack((gx.ravel(), gy.ravel()))[0]
# Now I want to have the list of points in each bin and
# if the number of points in that bin is larger than 10, apply (dummy)
function func1 (see below)
# If less than 10, apply (dummy) function func2 so (dum?)
# if 0, do nothing
# for two dummy functions like (for example):
def func1(x) : return x.mean()
def func2(x) : return x.std()
# One solution would be to use digitize in 1D and histogram in 2D (don't
need gx, gy for this one):
h = histogram2d(x, y, bins=[lx, ly])[0]
digitX = np.digitize(x, lx)
digitY = np.digitize(y, ly)
# create the output array, with -999 values to make sure I see which
ones are not filled in
result = np.zeros_like(h) - 999
for i in range(nx-1) :
for j in range(ny-1) :
selectionofpoints = (digitX == i+1) & (digitY == j+1)
if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints])
elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints])
More information about the NumPy-Discussion
mailing list