INTERNAL: C_CORREL: calculating cross-correlation and cross-covariance via FFT

Anonym Saturday, June 23, 2001

Topic:
IDL has a function C_CORRELATE that calculates the cross-correlation and the cross covariance, using the ususal sum of products.

For large datasets, this can become slow. A much faster approach is to use the FFT.

In the discussion below, a general cross correlation function is given. While IDL's C_CORRELATE requires X and Y to be of the same size, this function (C_CORREL) does NOT place such a restriction on X and Y.Discussion:
The cross-correlation and cross covariance as defined in the documentation on C_CORRELATE, can alternatively be calculated using convolution. By using the CONV function that takes advantage of the FFT to calculate the convolution we can quickly calculate the cross-correlation and covariance for any two N dimensional arrays.

The results from C_CORREL are identical to those from C_CORRELATE, for example:
IDL> X = [3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57]
IDL> Y = [2.31, 2.76, 3.02, 3.13, 3.72, 3.88, 3.97, 4.39, 4.34, 3.95]
IDL> lag = [-5, 0, 1, 5, 6, 7]
IDL> print, c_correlate(x,y,lag)
-0.428246 0.914755 0.674547 -0.405140 -0.403100 -0.339685
IDL> print, c_correl(x,y,lag)
-0.428246 0.914755 0.674547 -0.405140 -0.403100 -0.339685

The only difference is that C_CORREL will return all possible lag values when LAG is not specified, while C_CORRELATE requires a third argument (LAG) to be specified. In other words, the following two statements are equivalent:
IDL> print, c_correlate(x,y, lindgen(19)-9, /COVARIANCE)
-0.0559123 -0.194156 -0.353806 -0.427855 -0.360766 -0.127134 0.182833
0.441737 0.659711 0.770614 0.568256 0.311850 0.0190799 -0.243051
-0.341300 -0.339582 -0.286159 -0.168423 -0.0559365
IDL> print, c_correl(x,y, /COVARIANCE)
-0.0559123 -0.194156 -0.353806 -0.427855 -0.360766 -0.127134 0.182833
0.441737 0.659711 0.770614 0.568256 0.311850 0.0190799 -0.243051
-0.341300 -0.339582 -0.286159 -0.168423 -0.0559364

The advantage of using C_CORREL over C_CORRELATE is the increased speed. In these two examples, on a windows 2000 system using a pentium II 450MHz processor, C_CORREL was about 75 and 168 times faster than C_CORRELATE, respectively.

IDL> n = sin(dindgen(10000))
IDL> t0=systime(/sec) & a = c_correl(n, n, lindgen(8000)) & print, systime(/sec)-t0, format='(f5.2)'
0.11
IDL> t0=systime(/sec) & a = c_correlate(n, n, lindgen(8000)) & print, systime(/sec)-t0, format='(f5.2)'
8.24
IDL> print, 8.24/0.11
74.9091
IDL> t0=systime(/sec) & a = c_correl(n, n) & print, systime(/sec)-t0, format='(f5.2)'
0.11
IDL> t0=systime(/sec) & a = c_correlate(n, n, lindgen(19999)-9999) & print, systime(/sec)-t0, format='(f5.2)'
18.45
IDL> print, 18.45/0.11
167.727Solution:

; Takes the same arguments and give the same result as C_CORRELATE.
;
; C_CORREL returns all possible lag values if a third argument 
; is not specified
;
function reverse_all,a
  ; reverses all N dimensions of a
  b=a
  b[*]=reverse(b[*])
  return,b
end
 
function c_correl, a, b, lag, COVARIANCE=cov
  ;
  ; Calculate the cross correlation
  ; Using the same definition as C_CORRELATE
  ;
  if keyword_set(cov) then $
    res=conv(reverse_all(a)-mean(a),b-mean(b))/sqrt(n_elements(a))/sqrt(n_elements(b)) $
  else $
    res=conv(reverse(a)-mean(a),b-mean(b))/sqrt(total((abs(a-mean(a))^2)* $
      total((abs(b-mean(b))^2))))
  if (n_params() eq 3) then return, res[lag+n_elements(res)/2] $
  else return, res
end