[SciPy-dev] Fixing correlate: handling api breakage ?

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sun May 24 08:26:34 EDT 2009


josef.pktd at gmail.com wrote:
> On Sun, May 24, 2009 at 7:38 AM, David Cournapeau
> <david at ar.media.kyoto-u.ac.jp> wrote:
>   
>> josef.pktd at gmail.com wrote:
>>     
>>> On Sun, May 24, 2009 at 6:16 AM, David Cournapeau
>>> <david at ar.media.kyoto-u.ac.jp> wrote:
>>>
>>>       
>>>> Hi,
>>>>
>>>>    I have taken a look at the correlate function in scipy.signal. There
>>>> are several problems with it. First, it is wrong on several accounts:
>>>>       - It assumes that the correlation of complex numbers corresponds
>>>> to complex multiplication, but this is not the definition followed by
>>>> most textbooks, at least as far as signal processing is concerned.
>>>>       - More significantly, it is wrong with respect to the ordering:
>>>> it assumes that correlate(a, b) == correlate(b, a), which is not true in
>>>> general.
>>>>
>>>>         
>>> I don't see this in the results. There was recently a report on the
>>> mailing list that np.correlate and signal.correlate switch arrays if
>>> the second array is longer.
>>>
>>> >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0])
>>> array([0, 0, 1, 2, 0, 0, 0, 0, 0])
>>>
>>> >>> signal.correlate([0, 0, 1, 0, 0], [1, 2, 0, 0, 0])
>>> array([0, 0, 0, 0, 0, 2, 1, 0, 0])
>>>
>> Well, you just happened to have very peculiar entries :)
>>
>> signal.correlate([-1, -2, -3], [1, 2, 3])
>> -> array([ -3,  -8, -14,  -8,  -3])
>>
>> signal.correlate([1, 2, 3], [-1, -2, -3])
>> -> array([ -3,  -8, -14,  -8,  -3])
>>     
>
> One of your arrays is just the negative of the other, and correlate is
> the same in this case. For other cases, the results differ.
>   

Grr, you're right of course :) But it still fails for arrays where any
dimension of the second argument is larger than the corresponding dimension
of the first one. I don't know whether that assumption is relied on in the C
implementation (the arrays are swapped in the C code in that case - even
though they already seem to be swapped in the Python code).

> I have only looked at it for examples that calculate auto-correlation and
> cross-correlation in time series, and had to experiment to see which
> version works best.
>   

Yes, it depends. I know that correlate is way too slow for my own use in
speech processing, for example for linear predictive coding. As only a few
lags are needed there, a direct implementation is often faster than an
FFT-based one - I have my own straightforward autocorrelation in
scikits.talkbox; I believe Matlab's xcorr (1-d correlation) always uses the
FFT.
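As an aside, here is a minimal sketch of that trade-off (illustrative only,
not the scikits.talkbox code; the function names are made up):

import numpy as np

def autocorr_direct(x, nlags):
    # Direct autocorrelation for lags 0..nlags-1: O(N * nlags) work.
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(nlags)])

def autocorr_fft(x, nlags):
    # FFT-based autocorrelation (Wiener-Khinchin): O(N log N) work,
    # regardless of how few lags are actually needed.
    x = np.asarray(x, dtype=float)
    nfft = 2 ** int(np.ceil(np.log2(2 * len(x) - 1)))  # pad to avoid circular wrap-around
    r = np.fft.irfft(np.abs(np.fft.rfft(x, nfft)) ** 2, nfft)
    return r[:nlags]

x = np.random.randn(4096)
assert np.allclose(autocorr_direct(x, 11), autocorr_fft(x, 11))

For a handful of lags, as in LPC analysis, the direct loop does far less
work than the full FFT.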

I know other people have had problems with scipy.signal's correlate as well
for large arrays (the C code makes a copy if the inputs are not contiguous,
for example - my own code, which uses iterators, should not need any copy of
the inputs).

> Is convolve in all cases compatible with, or identical (through
> delegation) to, correlate?
>   

I don't think so - I don't think convolution uses the conjugate for complex
values. I don't know of any use of complex convolution, although I am sure
there are some. Correlation, on the other hand, is always defined with the
complex conjugate of the second argument AFAIK. For real inputs, convolution
should always be implementable as correlation, at least when considering
zero padding at the boundaries. There may be problems for huge arrays,
though - deriving convolution from correlation without making copies while
staying fast may not always be easy, but I have never tried to do it.
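
To make the relationship concrete, here is a small sketch (illustrative
only - xcorr_full is a made-up reference implementation of the textbook
convention, not what scipy.signal does internally):

import numpy as np

def xcorr_full(a, v):
    # "Full" cross-correlation that conjugates the second argument and does
    # not time-reverse it: out[k] = sum_i a_padded[k + i] * conj(v[i]).
    a, v = np.asarray(a), np.asarray(v)
    n, m = len(a), len(v)
    dtype = np.result_type(a, v)
    pad = np.zeros(m - 1, dtype)
    ap = np.concatenate([pad, a.astype(dtype), pad])
    return np.array([np.dot(ap[k:k + m], np.conj(v)) for k in range(n + m - 1)])

a = np.array([1 + 2j, -1j, 3.0])
v = np.array([2 - 1j, 0.5j, 1.0, -2.0])

# Ordering matters: swapping the arguments conjugates and reverses the result.
assert np.allclose(xcorr_full(v, a), np.conj(xcorr_full(a, v))[::-1])

# Convolution from correlation: reverse and conjugate the second argument.
assert np.allclose(np.convolve(a, v), xcorr_full(a, np.conj(v)[::-1]))

For real inputs the conjugation is a no-op, so reversing the second argument
alone turns correlation into convolution (given zero padding at the
boundaries).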

cheers,

David


