[SciPy-User] Understanding the cross-correlation function numpy.correlate & how to use it properly with real and synthetic data

Rob Newman rlnewman at ucsd.edu
Thu Dec 15 11:38:48 EST 2011


Hi Kevin,

Thanks for that chunk of code and the explanation - it's a great help.

Happy holidays.
- Rob Newman


On Dec 14, 2011, at 12:49 PM, Kevin Gullikson wrote:

> Rob,
> 
> > I understand that the correlate function returns an array that is twice the size of both the input arrays minus 1 (when using mode='full'), but what do I need to do to that resulting array to get the correlation value (if there is indeed a value to be returned) and the timeshift that needs to be applied to the real data to match the synthetic data.
> 
> numpy.correlate returns an array of correlation values, so you don't need to do anything to get that. Getting the timeshift is the somewhat tricky part. Here is some code that I use (I stole it from somewhere, but don't remember where...):
> 
> import numpy
> 
> # Do the correlation. x and y are the x and y components of your data (so I guess x is time and y is whatever you are modeling); template is what you are cross-correlating with
> ycorr = numpy.correlate(y, template, mode="full")
> 
> # Generate an x axis
> xcorr = numpy.arange(ycorr.size)
> 
> # Convert this into lag units, but still not really physical
> # (index 0 of the "full" output is lag -(template.size-1); same as y.size-1 when both arrays have equal length)
> lags = xcorr - (template.size - 1)
> distancePerLag = (x[-1] - x[0])/float(x.size - 1)  # This is just the x-spacing (or for you, the timestep) in your data, assuming uniform sampling
> 
> # Convert your lags into physical units (the sign makes this the shift to apply to y so that it lines up with template)
> offsets = -lags*distancePerLag
> 
> 
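> You can then use numpy.argmax() to find the index in ycorr that has the highest cross-correlation value. The entry of offsets at that same index is the time shift to apply to your data, and the entry of ycorr at that index is the peak cross-correlation value.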
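
As a rough, self-contained sketch of the steps above on synthetic data: the function name estimate_shift, the Gaussian test pulses, and the normalization (mean removed, scaled by the standard deviations and the array length) are assumptions made for illustration and are not part of Kevin's code. The normalization is one common convention that makes the peak value comparable to a correlation coefficient, which may help with the "level of confidence" question. It assumes uniformly sampled arrays of equal length.

```python
import numpy as np

def estimate_shift(x, y, template):
    """Return (shift, peak) for uniformly sampled y and template.

    shift is the amount to add to the x values of y so that y lines up
    with template; peak is the normalized correlation at that lag.
    """
    dx = (x[-1] - x[0]) / float(x.size - 1)  # uniform sample spacing
    # Remove the mean and scale so that two identical signals give 1.0
    # at zero lag (a normalization assumed for this sketch; numpy.correlate
    # itself does not normalize).
    yn = (y - y.mean()) / (y.std() * y.size)
    tn = (template - template.mean()) / template.std()
    ycorr = np.correlate(yn, tn, mode="full")
    # Index 0 of the "full" output corresponds to lag -(template.size - 1).
    lags = np.arange(ycorr.size) - (template.size - 1)
    best = np.argmax(ycorr)
    return -lags[best] * dx, ycorr[best]

# Synthetic check: a Gaussian pulse and a copy delayed by 0.5 time units.
x = np.linspace(0.0, 10.0, 1001)             # spacing of 0.01
template = np.exp(-((x - 4.0) ** 2) / 0.1)   # stands in for the synthetic data
y = np.exp(-((x - 4.5) ** 2) / 0.1)          # stands in for the real data, arriving late

shift, peak = estimate_shift(x, y, template)
print(shift, peak)
```

Here shift should come out close to -0.5 (the real data has to be moved 0.5 earlier) and peak close to 1. What counts as a high enough peak to trust the real data is a judgment call that depends on the noise in the data; any particular threshold would be an assumption.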
> 
> Cheers,
> Kevin Gullikson
> 
> 
> 
> On Wed, Dec 14, 2011 at 2:14 PM, Rob Newman <rlnewman at ucsd.edu> wrote:
> Hi SciPy gurus,
> 
> First up - I am not a physicist, so please be gentle!
> 
> I have an array of real data and an array of synthetic data. I am trying to determine the cross-correlation of the two signals and the timeshift that needs to be applied to the real data to best match the synthetic data. I also want to only use the real data later in the script if the cross-correlation result is above some level of confidence.
> 
> I have read the man page on numpy.correlate, but I am not entirely sure of what that function returns to me, and how I should use it. I have looked at James Battat's website that has a useful script on the discrete correlation function of two functions (https://www.cfa.harvard.edu/~jbattat/computer/python/science/#correlation) but I think his example is more complicated than my needs.
> 
> I understand that the correlate function returns an array that is twice the size of both the input arrays minus 1 (when using mode='full'), but what do I need to do to that resulting array to get the correlation value (if there is indeed a value to be returned) and the timeshift that needs to be applied to the real data to match the synthetic data.
> 
> Thanks in advance,
> - Rob
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
