Can we calculate correlation using convolution?

The Convolution Series

  1. Definition of convolution and intuition behind it
  2. Mathematical properties of convolution
  3. Convolution property of Fourier, Laplace, and z-transforms
  4. Identity element of the convolution
  5. Star notation of the convolution
  6. Circular vs. linear convolution
  7. Fast convolution
  8. Convolution vs. correlation
  9. Convolution in MATLAB, NumPy, and SciPy
  10. Deconvolution: Inverse convolution
  11. Convolution in probability: Sum of independent random variables

In many contexts, convolution and correlation are mixed up. One of the biggest sources of this confusion is deep learning, where convolutional neural networks are often implemented using discrete correlation rather than discrete convolution. That is possible, because the order of elements in the convolution masks does not matter: it can be simply learned as flipped [3].

Let’s explain the difference between correlation and convolution once and for all.

Convolution Definition

Continuous convolution is defined as

x(t)h(t)=x(tτ)h(τ)dτ.(1) x(t) \ast h(t) = \int \limits_{-\infty}^{\infty} x(t - \tau) h(\tau) d\tau. \quad (1)

Discrete convolution is defined as

x[n]h[n]=k=x[nk]h[k],nZ.(2) x[n] \ast h[n] = \sum_{k=-\infty}^{\infty} x[n - k] h[k], \quad n \in \mathbb{Z}. \quad (2)

Click here to read about the rationale behind these formulas.

Correlation Definition

Correlation is a notion from the field of stochastic processes. In signal processing we simply use an entity called the correlation function [1]

ϕxh(t)=x(t+τ)h(τ)dτ.(3)\phi_{xh}(t) = \int \limits_{-\infty}^{\infty} x(t + \tau) h(\tau) d\tau. \quad (3)

ϕxx(t)\phi_{xx}(t) is called the autocorrelation function, and ϕxh(t)\phi_{xh}(t) is called the cross-correlation function.

Analogously, we have a correlation function for real-valued discrete-time signals [1]

ϕxh[n]=k=x[n+k]h[k].(4)\phi_{xh}[n] = \sum \limits_{k=-\infty}^{\infty} x[n + k]h[k]. \quad (4)

Sometimes ϕxh[n]\phi_{xh}[n] is referred to as correlation sequence to stress its discrete character [2].

In the subsequent discussion, we assume that the integrals in Equations 1 and 3, and sums in Equations 2 and 4 converge.

Example

Let’s assume we have two signals of length 40, x[n]x[n]

Figure 1. x[n]x[n].

and y[n]y[n]

Figure 2. y[n]y[n].

The convolution between xx and yy is shown in Figure 3

Figure 3. Convolution of xx and yy.

and their correlation in Figure 4.

Figure 4. Correlation of xx and yy.

As you can observe, they are kind of similar.

Correlation as a Similarity Measure

In the context of signal processing, correlation is interpreted as a similarity measure, i.e., how similar are the two correlated signals for a specific lag in samples (relative shift between the signals).

This concept is best visibile for autocorrelation, which is a measure of self-similarity.

In Figure 5, we can see the autocorrelation of signal xx from Figure 1.

Figure 5. Autocorrelation of xx.

Obviously, autocorrelation achieves its peak value for lag n=0n=0 because signal is most similar to an unshifted version of itself.

However, two clear maxima can be observed for relative shifts of n=20n=20 and n=20n=-20. That is because the period of the sine in signal xx is exactly 20.

As you can see, autocorrelation can help us determine the periodicity of a signal. This property is used in pitch estimation (e.g., estimation of the fundamental frequency of the human voice) and also for tempo estimation in the domain of music information retrieval. Autocorrelation can also help us determine whether the observed signal is periodic at all.

Relation Between Convolution and Correlation

As we can observe, Equations 1 and 3, 2 and 4 and Figures 3 and 4 are somewhat similar. Indeed, only the sign of the “shift” in the argument of xx differs. We will now show how to obtain correlation using convolution.

Discrete Correlation Obtained Using Discrete Convolution

ϕxh[n]=k=x[n+k]h[k]=k=x[(n)+k]h[k]=k=x[((n)k)]h[k]=x1[l]=x[l]k=x1[((n)k)]h[k]=(x1[n]h[n])[n]=x[l]=x1[l](x[n]h[n])[n].(5)\phi_{xh}[n] \newline = \sum \limits_{k=-\infty}^{\infty} x[n + k]h[k] \newline = \sum \limits_{k=-\infty}^{\infty} x[-(-n) + k]h[k] \newline = \sum \limits_{k=-\infty}^{\infty} x[-((-n) - k)]h[k] \newline \stackrel{x_1[l]=x[-l]}{=} \sum \limits_{k=-\infty}^{\infty} x_1[((-n) - k)]h[k] \newline = (x_1[n] \ast h[n])[-n] \newline \stackrel{x[-l]=x_1[l]}{=} (x[-n] \ast h[n])[-n]. \quad (5)

In the above derivation, we used the “helper function trick”, which you can read more about here. Index ll in the substitution formulas was used not to confuse the reader but it still denotes the discrete-time index.

It turned out that correlation can be obtained by convolving the signals to be correlated, with one of them having its element order reversed, and then reversing the output of the convolution.

Continuous Correlation Obtained Using Continuous Convolution

Analogously to the discrete case,

ϕxh(t)=x(t+τ)h(τ)dτ=x(((t)τ)))h(τ)dτ=(x(t)h(t))(t).(6)\phi_{xh}(t) = \int \limits_{-\infty}^{\infty} x(t + \tau) h(\tau) d\tau \newline = \int \limits_{-\infty}^{\infty} x(-((-t) - \tau))) h(\tau) d\tau \newline = (x(-t) \ast h(t))(-t). \quad (6)

Final Test

To ultimately test the validity of Equation 5, we do a quick in-code test: we generate 100000 samples of uniformly distributed noise, apply the correlation-from-convolution formula, and test for equality to direct correlation up to the machine precision.

import numpy as np
from matplotlib import pyplot as plt


def main():
    rg = np.random.default_rng()
    shape = (100000,)
    x = rg.uniform(-1.0, 1.0, shape)
    y = rg.uniform(-1.0, 1.0, shape)

    correlation_from_convolution = np.flip(
        np.convolve(np.flip(x), y, 'full'))
    correlation = np.correlate(x, y, 'full')

    np.testing.assert_array_almost_equal(
        correlation_from_convolution, correlation, decimal=10)

if __name__=='__main__':
    main()

Application

The fact that correlation can be obtained using convolution is significant. For example, one could use the fast convolution algorithms to compute correlation efficiently; that is the basis of fast correlation algorithms [2].

This fact also points to how closely convolution and correlation are related. This similarity was mentioned in the introduction in the context of deep learning, where terms “convolution” and “correlation” are used interchangeably [3]. What is more, exactly as we have circular convolution, we also have circular correlation. However, the correlation function does not have many useful properties that convolution has, e.g., correlation is not commutative [3].

Bibliography

[1] Alan V. Oppenheim, Alan S. Willsky, with S. Hamid Signals and Systems, 2nd Edition, Pearson 1997.

[2] Alan V Oppenheim, Ronald W. Schafer Discrete-Time Signal Processing, 3rd Edition, Pearson 2010.

[3] I. Goodfellow, Y. Bengio, A. Courville Deep learning, MIT Press, 2016, https://www.deeplearningbook.org/.