[Numpy-discussion] creating zonal statistics from two arrays
josef.pktd at gmail.com
josef.pktd at gmail.com
Wed Dec 8 12:48:51 EST 2010
On Wed, Dec 8, 2010 at 12:12 PM, Gregory, Matthew
<matt.gregory at oregonstate.edu> wrote:
> Hi all,
>
> Likely a very newbie type of question. I'm using numpy with GDAL to calculate zonal statistics on images. The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population. (I'm well aware that this approach may have memory issues with large rasters ...)
>
> GDAL ReadAsArray gives you a chunk of raster data as a numpy array. Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way.
>
> zone_stats = {}
> zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size)
> value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size)
> for row in xrange(y_size):
> for col in xrange(x_size):
> zone = zone_block[row][col]
> value = value_block[row][col]
> try:
> zone_stats[zone].append(value)
> except KeyError:
> zone_stats[zone] = [value]
>
> # Then calculate stats per zone
> ...
Just a thought since I'm not doing spatial statistics.
If you can create (integer) labels that assigns each point to a zone,
then you can treat it essentially as a 1d grouped data, and you could
use np.bincount to calculate some statistics, or alternatively
scipy.ndimage.measurements for some additional statistics.
This would avoid any python loop, but require a full label array.
Josef
>
> Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed.
>
> matt
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
More information about the NumPy-Discussion
mailing list