[Numpy-discussion] effectively computing variograms with numpy

Fri Jun 22 11:18:12 EDT 2007

Tim,

this is the best I could come up with until now:

import numpy as N

def naive_variogram(data, binsize=100., stepsize=5.):
    """calculates variograms along the rows and columns of the given
    array which is supposed to contain equally spaced data with
    stepsize stepsize"""

    # how many elements do fit in one bin?

    binlength = int(binsize/stepsize)

    #bins in x- and y- direction (+1 for the possible 
    #elements larger than int(binsize/stepsize):
    x_bins = (data.shape[1])/binlength+1
    y_bins = (data.shape[0])/binlength+1

    #arrays to store the reuslts in
    x_result = N.zeros(x_bins, dtype = float)
    y_result = N.zeros(y_bins, dtype = float)

    #arrays to get teh denominators right
    x_denominators = N.zeros(x_bins, dtype=float)
    y_denominators = N.zeros(x_bins, dtype=float)

    #what is the last index?
    xlast = data.shape[1]
    ylast = data.shape[0]
    for i in range(data.shape[0]):
        datasquared = (data - data[:,i])**2
        #number of bins to fill until the end of the array:
        numbins = 1 + (xlast - i)/binlength
        for j in range(numbins):
            x_result[j]+=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].sum()
            x_denominators[j] +=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].size
        try:
            #Is there a rest?
            x_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
            x_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
        except IndexError:
            pass

    x_result /= x_denominators

    for i in range(data.shape[1]):
        datasquared = (data - data[i])**2
        #number of bins to fill until the end of the array:
        numbins = 1 + (ylast - i)/binlength
        #Fill the bins
        for j in range(numbins):

y_result[j]+=datasquared[i+1+j*binlength:i+1+(j+1)*binlength].sum()
            y_denominators[j] +=
datasquared[i+1+j*binlength:i+1+(j+1)*binlength].size
        try:
            #Is there a rest?
            y_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
            y_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
        except IndexError:
            pass

    y_result /= y_denominators

    return x_result, y_result

Thanks,
Hanno

Timothy Hochberg <tim.hochberg at ieee.org> said:

> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
> 
> On 6/22/07, Hanno Klemm <klemm at phys.ethz.ch> wrote:
> >
> >
> > Hi,
> >
> > I have an array which represents regularly spaced spatial data. I now
> > would like to compute the (semi-)variogram, i.e.
> >
> > gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i - z_j)**2,
> >
> > where h is the (approximate) spatial difference between the
> > measurements z_i, and z_j, and N(h) is the number of measurements with
> > distance h.
> >
> > However, I only want to calculate the thing along the rows and
> > columns. The naive approach involves two for loops and a lot of
> > searching, which becomes painfully slow on large data sets. Are there
> > better implementations around in numpy/scipy or does anyone have a
> > good idea of how to do that more efficient? I looked around a bit but
> > couldn't find anything.
> 
> 
> Can you send the naive code as well. Its often easier to see what's
going on
> with code in addition to the equations.
> 
> Regards.
> 
> -tim
> 
> 
> 
> -- 
> .  __
> .   |-\
> .
> .  tim.hochberg at ieee.org
> 
> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/html; charset=ISO-8859-1
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
> 
> <br><br><div><span class="gmail_quote">On 6/22/07, <b
class="gmail_sendername">Hanno Klemm</b> <<a
href="mailto:klemm at phys.ethz.ch">klemm at phys.ethz.ch</a>>
wrote:</span><blockquote class="gmail_quote" style="border-left: 1px
solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> <br>Hi,<br><br>I have an array which represents regularly spaced
spatial data. I now<br>would like to compute the (semi-)variogram,
i.e.<br><br>gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i -
z_j)**2,<br><br>where h is the (approximate) spatial difference
between the
> <br>measurements z_i, and z_j, and N(h) is the number of
measurements with<br>distance h.<br><br>However, I only want to
calculate the thing along the rows and<br>columns. The naive approach
involves two for loops and a lot of
> <br>searching, which becomes painfully slow on large data sets. Are
there<br>better implementations around in numpy/scipy or does anyone
have a<br>good idea of how to do that more efficient? I looked around
a bit but<br>couldn't find anything.
> </blockquote><div><br>Can you send the naive code as well. Its often
easier to see what's going on with code in addition to the
equations.<br><br>Regards.<br><br>-tim<br><br></div></div><br
clear="all"><br>-- <br>.  __
> <br>.   |-\<br>.<br>.  <a
href="mailto:tim.hochberg at ieee.org">tim.hochberg at ieee.org</a>
> 
> ------=_Part_157389_1558912.1182523880067--
> 

-- 
Hanno Klemm
klemm at phys.ethz.ch