[Numpy-discussion] Multidimension array access in C via Python API

Tue Apr 5 14:16:14 EDT 2016

On Tue, Apr 5, 2016 at 9:48 AM, mpc <matt.p.conte at gmail.com> wrote:

> The idea is that I want to thin a large 2D buffer of x,y,z points to a
> given
> resolution by dividing the data into equal sized "cubes" (i.e. resolution
> is
> number of cubes along each axis) and averaging the points inside each cube
> (if any).
>

are the original x,y,z points aranged along a nice even grid? or
arbitrarily spaced?

if the former, I have Cython code that does that :-) I could dig it up,
haven't used it in a while. or scikit.image might have something.

If the latter, then Ben is right -- you NEED a spatial index --
scipy.spatial.kdtree will probably do what you want, though it would be
easier to use a sphere to average over than a cube.

Also, maybe Kernel Density Estimation could help here????

https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/

Otherwise, you could use Cython to write a non-vectorized version of your
below code -- it would be order NM where N is the number of "cubes" and M
is the number of original points. I think, but would be a lot faster than
the pure python.

-CHB

Here is where you would do the cython:
    while x_idx < max_x:

>         y_idx = min_y
>         while y_idx < max_y:
>             z_idx = min_z
>             while z_idx < max_z:
>                 inside_block_points = np.where((x_buffer >= x_idx) &
>                                                              (x_buffer <=
> x_idx + x_block) &
>                                                              (y_buffer >=
> y_idx) &
>                                                              (y_buffer <=
> y_idx + y_block) &
>                                                              (z_buffer >=
> z_idx) &
>                                                              (z_buffer <=
> z_idx + z_block))
>

instead of where, you could loop through all your points and find the ones
inside your extents.

though now that I think about it -- you are mapping arbitrary points to a
regular grid, so you only need to go through the points once, assigning
each one to a bin, and then compute the average in each bin.

Is this almost a histogram?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20160405/0eafcb0f/attachment.html>