[Numpy-discussion] Numpy correlate
Sudheer Joseph
sudheer.joseph at yahoo.com
Tue Mar 19 03:12:00 EDT 2013
Thank you Pierre,
It appears that numpy.correlate uses the frequency-domain method to compute the CCF. I would like to know how serious the normalization issue is, and what exactly it consists of. I have computed cross-correlations with this function and am interpreting my results based on them, so it would be helpful if you could tell me whether there is a significant bug in the function.
with best regards,
Sudheer
From: Pierre Haessig <pierre.haessig at crans.org>
To: numpy-discussion at scipy.org
Sent: Monday, 18 March 2013 10:30 PM
Subject: Re: [Numpy-discussion] Numpy correlate
Hi Sudheer,
On 14/03/2013 10:18, Sudheer Joseph wrote:
> Dear Numpy/Scipy experts,
> Attached is a script which I made to test numpy.correlate (which is called by plt.xcorr) to see how the cross-correlation is calculated. From this it appears that if I call plt.xcorr(x, y), y is slid back in time compared to x, i.e. if y is a process that causes a delayed response in x after 5 timesteps, then there should be a high correlation at lag 5. However, in the attached plot the response is seen only on the negative side of the lags.
> Can anyone advise me on how to see exactly which way the two series are slid back or forth, and so understand the cause-and-result relation better? (I understand that one cannot assume a cause-and-result relation from correlation alone, but it is important to know which series is older in time at a given lag.)
You have indeed pointed out a lack of documentation in matplotlib's xcorr function, because the definition of covariance can be ambiguous.
The way I would try to get an interpretation of xcorr function
(& its friends) is to go back to the theoretical definition of
cross-correlation, which is a normalized version of the covariance.
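To make "normalized version of the covariance" concrete, here is a minimal sketch of one common normalization: center both series, then divide the raw cross-products by sqrt(sum(x^2) * sum(y^2)). The helper name normalized_xcorr is hypothetical, and this is only one possible convention, not necessarily the one any particular library applies.

```python
import numpy as np

def normalized_xcorr(x, y):
    """Hypothetical helper: cross-correlation scaled to lie in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center both series so the cross-products estimate a covariance.
    x = x - x.mean()
    y = y - y.mean()
    c = np.correlate(x, y, mode="full")
    # Assumed normalization: geometric mean of the two zero-lag energies.
    scale = np.sqrt(np.dot(x, x) * np.dot(y, y))
    # For mode="full" the lags run from -(len(y)-1) to len(x)-1.
    lags = np.arange(-(len(y) - 1), len(x))
    return lags, c / scale

sig = np.sin(np.linspace(0.0, 10.0, 50))
lags, r = normalized_xcorr(sig, sig)
print(r[lags == 0])  # autocorrelation at lag 0
```

With this scaling, the autocorrelation of a series equals 1 at lag 0, and every value is bounded by 1 in magnitude (Cauchy-Schwarz).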
In your example you've created a time series X(k) and a lagged one :
Y(k) = X(k-5)
Now, the covariance function of X and Y is commonly defined as :
Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation
(assuming that X and Y are centered for the sake of clarity).
If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)).
The two factors are the same sample when k+h = k-5, so it follows
naturally that the covariance is maximal at h=-5 and not h=+5.
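The sign of the peak lag can be checked numerically. The sketch below assumes X is white noise (so the only strong correlation is at the true delay) and that the lag axis for mode="full" runs from -(n-1) to n-1. The seed, the length n and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
n, delay = 200, 5

# Build both series from one longer noise record to avoid wrap-around.
base = rng.standard_normal(n + delay)
x = base[delay:]   # X(k)
y = base[:n]       # Y(k) = X(k-5) for k >= 5

# Lags for mode="full" with equal-length inputs run from -(n-1) to n-1.
lags = np.arange(-(n - 1), n)

c_xy = np.correlate(x, y, mode="full")
c_yx = np.correlate(y, x, mode="full")

peak_xy = lags[np.argmax(c_xy)]
peak_yx = lags[np.argmax(c_yx)]
print("peak lag, correlate(x, y):", peak_xy)
print("peak lag, correlate(y, x):", peak_yx)
```

Swapping the argument order mirrors the lag axis, which is one quick way to find out which convention a given function follows.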
Note that this reasoning yields the opposite result with a
different definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) *
Y(k+h)) (and that's what I first did!).
Therefore, I think there should be a definition of cross-correlation
in the matplotlib xcorr docstring. In R's acf doc, there is
this mention: "The lag k value returned by ccf(x, y) estimates the
correlation between x[t+k] and y[t]."
(see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
Now, I believe the discussion above really belongs on the matplotlib
ML. I'll file an issue on GitHub (I just spotted a mistake in the
definition of the normalization anyway).
Coming back to numpy:
There's a strange thing: the documentation of numpy.correlate seems to
give the other definition, "z[k] = sum_n a[n] * conj(v[n+k])" (http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html), although its usage proves otherwise. What did I miss?
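One way to see which of the two definitions numpy.correlate follows in practice is to evaluate both sums by hand on a tiny real-valued example and compare them with the function's output. This is only a sketch: corr_sum is a hypothetical helper, the inputs are assumed to be real and of equal length (so the conjugate is a no-op), and the result may depend on the NumPy version in use.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])
n = len(a)
lags = range(-(n - 1), n)

def corr_sum(a, v, k, shift_on_a):
    """Hypothetical helper: one lag of either candidate definition."""
    total = 0.0
    for i in range(len(a)):
        j = i + k
        if 0 <= j < len(a):
            # shift_on_a=True : sum_i a[i+k] * v[i]
            # shift_on_a=False: sum_i a[i]   * v[i+k]
            total += a[j] * v[i] if shift_on_a else a[i] * v[j]
    return total

shift_a = np.array([corr_sum(a, v, k, True) for k in lags])
shift_v = np.array([corr_sum(a, v, k, False) for k in lags])
full = np.correlate(a, v, mode="full")

print("matches sum_n a[n+k] * v[n]:", np.allclose(full, shift_a))
print("matches sum_n a[n] * v[n+k]:", np.allclose(full, shift_v))
```

For real inputs the two candidates are mirror images of each other along the lag axis, so an asymmetric pair like this one is enough to tell them apart.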
best,
Pierre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion