[Numpy-discussion] 2D binning
Zachary Pincus
zachary.pincus at yale.edu
Tue Jun 1 16:49:16 EDT 2010
> Hi
> Can anyone think of a clever (non-looping) solution to the following?
>
> I have a list of latitudes, a list of longitudes, and a list of data
> values. All lists are the same length.
>
> I want to compute an average of data values for each lat/lon pair.
> e.g. if lat[1001], lon[1001] == lat[2001], lon[2001], then
> data[1001] = (data[1001] + data[2001])/2
>
> Looping is going to take way too long.
As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
not floating-point, or floats that will always compare equal, that is,
the floating-point bit-patterns will be guaranteed to be identical) or
approximately equal to float tolerance?
If you're in the approx-equal case, then look at the KD-tree in
scipy.spatial for doing nearest-neighbor queries.
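For the approx-equal case, here is a rough sketch of what the KD-tree route might look like. It assumes scipy's cKDTree and its query_ball_point method. The function name and the tolerance `tol` are made up for illustration. Distance here is plain Euclidean distance in degrees, and "within tol" is not transitive, so each point is simply averaged with whatever lies within tol of it:

```python
import numpy as np
from scipy.spatial import cKDTree

def average_near_duplicates(lat, lon, data, tol=1e-6):
    """Average each data value with all values whose (lat, lon) lies within tol.

    Hypothetical helper: the name and the default tol are assumptions.
    """
    points = np.column_stack([lat, lon])
    data = np.asarray(data, dtype=float)
    tree = cKDTree(points)
    # For every point, the indices of all points within radius tol
    # (each point is always its own neighbor).
    neighbors = tree.query_ball_point(points, r=tol)
    # Still a Python-level loop over points, but the neighbor search
    # itself is done by the tree rather than by comparing all pairs.
    return np.array([data[idx].mean() for idx in neighbors])
```

Whether this is adequate depends on how the near-duplicates cluster; if chains of nearby points should be merged into one group, something more than a per-point neighborhood average is needed.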
If you're in the exact-equal case, you could consider hashing the
lat/lon pairs, e.g. by using them as dictionary keys. At least then
the looping is O(N) and not O(N^2):
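import collections
import numpy

# Group the data values by their exact (lat, lon) pair.
grouped = collections.defaultdict(list)
for lt, ln, da in zip(lat, lon, data):
    grouped[(lt, ln)].append(da)
# Average each group: maps (lat, lon) -> mean of its values.
averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())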
Is that fast enough?
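If not, and the pairs really are exactly equal, the grouping can probably be pushed into numpy itself. This is only a sketch: it assumes a numpy recent enough that numpy.unique accepts the `axis` argument, and the function name is made up for illustration. It also assumes there are no NaNs in the coordinates:

```python
import numpy as np

def average_by_pair(lat, lon, data):
    """Mean of data for each exactly-equal (lat, lon) pair, without a Python loop.

    Hypothetical helper; returns the unique pairs, the mean per pair,
    and the per-point means aligned with the input order.
    """
    coords = np.column_stack([lat, lon])
    data = np.asarray(data, dtype=float)
    # inverse[i] is the row of unique_pairs that point i belongs to.
    unique_pairs, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across numpy versions
    sums = np.bincount(inverse, weights=data, minlength=len(unique_pairs))
    counts = np.bincount(inverse, minlength=len(unique_pairs))
    means = sums / counts
    return unique_pairs, means, means[inverse]
```

The third return value gives what the original question asked for: each data[i] replaced by the average over all points sharing its lat/lon.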
Zach