How to convolve and do nothing at the same time?

The Convolution Series

  1. Definition of convolution and intuition behind it
  2. Mathematical properties of convolution
  3. Convolution property of Fourier, Laplace, and z-transforms
  4. Identity element of the convolution
  5. Star notation of the convolution
  6. Circular vs. linear convolution
  7. Fast convolution
  8. Convolution vs. correlation
  9. Convolution in MATLAB, NumPy, and SciPy
  10. Deconvolution: Inverse convolution
  11. Convolution in probability: Sum of independent random variables

Table of Contents

  1. Introduction
  2. Justification of the need for an identity element
  3. Identity element of the discrete convolution
  4. Identity element of the continuous convolution
  5. The sifting property
  6. Delay
  7. Signal representation using the delay
  8. Summary


For any operation, a very important concept is the neutral or identity element. Adding 0 to any number results in the same number. Multiplying a number by 1 results in the same number. These trivial facts are extensively used to prove numerous theorems of mathematics, especially in engineering. Particularly popular is adding and subtracting a variable or a constant (effectively adding 0) to introduce a desired element in the inspected (in)equality.

More formally, a neutral element or an identity element with respect to a binary operation \ast defined on a set AA is an element eAe \in A such that [1, Sec.]

ea=ae=aaA.(1)e \ast a = a \ast e = a \quad \forall a \in A. \quad (1)

What is the identity element of convolution?

Why do we need a neutral element?

We often want to represent a “do nothing” operation in our processing, regardless of the domain. Examples of such operations are “add 0” for addition and “multiply by 1” for multiplication, as mentioned in the introduction. Another example is the NOP (“no operation”) instruction of processors used, for instance, for memory alignment.

Imagine that you would like to identify the impulse response of a certain system. As we know from one of the previous articles, the output of an LTI system is the convolution of its impulse response with the input. What if the system does nothing? We need a way to represent its impulse response (Figure 1).

Figure 1. How to represent a system that does not alter our signal at all?

Identity element of the discrete convolution

Let’s focus on the discrete convolution first. We are looking for a discrete signal, let’s denote it by δ[n]\delta[n], such that for any signal x[n]x[n] it holds (according to Equation 1) that

x[n]δ[n]=δ[n]x[n]=k=x[k]δ[nk]=x[n],nZ.(2)x[n] \ast \delta[n] = \delta[n] \ast x[n] = \sum_{k=-\infty}^{\infty} x[k] \delta[n - k] = x[n], \quad n \in \mathbb{Z}. \quad (2)

From the above it is clear that δ[nk]\delta[n-k] should be equal to 1 if k=nk = n and 0 for every other kk. In this way, we can pick out untouched x[n]x[n] from the infinite sum and only x[n]x[n].

If δ[nk]=1\delta[n-k] = 1 for k=nk = n, then δ[0]=1\delta[0]=1. Thus,

δ[n]={1 if n=0,0 if n0.(3)\delta[n] = \begin{cases} 1 &\text{ if } n=0,\newline 0 &\text{ if } n \neq 0. \end{cases}\quad (3)

And so we have found our neutral element! The signal defined in Equation 3 is called a unit sample sequence, a discrete-time impulse, or just an impulse [2]. I have also often stumbled upon the name discrete (Dirac) delta (impulse).

The definition in Equation 3 makes sense also from a different perspective. In the first article in the convolution series, we said that convolution in the context of filtering means delaying and scaling the impulse response by the samples of the input signal. If the impulse response consists of a single sample with value 1, convolving a signal with it should yield only delayed successive weights, i. e., just the input signal.

Identity element of the continuous convolution

How does the neutral element look in the case of continuous convolution? According to Equation 2, we obtain

x(t)δ(t)=δ(t)x(t)=x(τ)δ(tτ)dτ=x(t).(4)x(t) \ast \delta(t) = \delta(t) \ast x(t) = \int \limits_{-\infty}^{\infty} x(\tau) \delta(t - \tau) d\tau = x(t). \quad (4)

Are you able to extract the formula for δ(t)\delta(t) out of this equation? Me neither. So how is our δ(t)\delta(t) defined, then?

It turns out that there exists no function satisfying Equation 4. We need another type of entity called a generalized function or a distribution. Then our δ(t)\delta(t) is called the Dirac δ\delta function and can be approximated by [1, Eq. 15.33a]

δ(t)=limϵ0f(t,ϵ),(5)\delta(t) = \lim_{\epsilon \rightarrow 0} f(t,\epsilon), \quad (5)


f(t,ϵ)={1ϵ if t<ϵ2,0 if tϵ2.(6) f(t,\epsilon) = \begin{cases} \frac{1}{\epsilon} &\text{ if } |t|<\frac{\epsilon}{2},\newline 0 &\text{ if } |t|\geq\frac{\epsilon}{2}. \end{cases}\quad (6)

How to tackle this definition? I try to think about it as a function being 0 everywhere apart from t=0t=0. At t=0t=0, δ(t)\delta(t) tends to ++\infty like an infinitesimally narrow impulse of infinite height. But it is just an intuition; a correct mathematical definition is beyond the scope of this article.

Dirac δ\delta function is ubiquitious in mathematics and engineering. It is often used to define empirical probability distributions (i.e., the ones resulting directly from data) [3]. Additionally, I have seen it in action when defining the excitation function of partial differential equations (PDEs), e.g., representing the influence of a hammer strucking a piano string in physical modeling sound synthesis [4].

The sifting property

Dirac δ\delta function has a valuable property

tat+ax(τ)δ(tτ)dτ=x(t)a>0.(7) \int \limits_{t-a}^{t+a} x(\tau) \delta(t-\tau) d\tau = x(t) \quad \forall a > 0.\quad (7)

Substituting a=a=\infty (what we can do) yields exactly our desired Equation 4.

The property in Equation 7 is called the sifting property of the δ\delta function, because the δ\delta function “sifts” our signal only to return the value of xx at a point where the argument of δ\delta is equal to 0.

In the discrete case, the sifting property was shown in action in Equation 2; there we extracted a single element x[n]x[n] out of the (possibly infinite) xx sequence.


What happens if we shift the argument of the discrete-time impulse by 1?

x[n]δ[n1]=k=x[k]δ[n1k]=x[n1],nZ.(8)x[n] \ast \delta[n-1] = \sum_{k=-\infty}^{\infty} x[k] \delta[n - 1 - k] = x[n-1], \quad n \in \mathbb{Z}. \quad (8)

Looking at the discrete time instant nn, the convolution with an argument-shifted impulse, δ[n1]\delta[n-1], yielded x[n1]x[n-1], i. e., a sample that was already “known” to us (we are at time nn so we already observed x[n1]x[n-1] at time n1n-1). That is the concept of a unit delay.

By adjusting the argument shift n0n_0 of δ[nn0]\delta[n-n_0] and convolving the result with a signal we can obtain an arbitrarily delayed signal. If n0<0n_0 < 0, we can even obtain “samples from the future”, i. e., x[n]δ[n+1]=x[n+1]x[n] \ast \delta[n+1] = x[n+1]. n0n_0 is called the delay length or simply the delay.

The concept of the delay and its application in digital signal processing and audio programming is very profound. Delay is an inherent property of any filter, or more generally, any LTI system. You may have stumbled upon the “Delay effect” as an audio plug-in to a digital audio workstation (DAW); the underlying principle relies on delaying the input signal and possibly adding it to the original. Delaying one channel with respect to the other helps to set up panning based on interaural time difference (ITD). Examples of other applications of the delay, just in the domain of audio effects, include artificial reverberation, comb filter, flanger, chorus, and Karplus-Strong synthesis.

Graphical representation

In DSP diagrams, the delay by n0n_0 samples is marked with a zn0z^{-n_0} box (Figure 2).

Figure 2. Representation of a delay by n0n_0 samples as a functional block in a DSP diagram.

That is because the zz-transform of δ[nn0]\delta[n-n_0] is equal to zn0z^{-n_0}

Z{δ[nn0]}=n=δ[nn0]zn=zn0.(9) \mathcal{Z}\{\delta[n-n_0]\} = \sum_{n=-\infty}^{\infty} \delta[n-n_0] z^{-n} = z^{-n_0}. \quad (9)

Notice that Equation 9 could be viewed as an application of the sifting property. From an infinite “stream” of znz^{-n} we pick out only the one for which n=n0n=n_0.

Arranging delays in a series

From the associativity property of the convolution, which we derived in one of the previous articles, it can be inferred that arranging delays in a series results in a delay of length equal to the sum of the individual delay lengths. That is because

δ[nn0]δ[nn1]=k=δ[kn0]δ[nn1k]=δ[nn0n1].(10)\delta[n-n_0] \ast \delta[n-n_1] = \sum_{k=-\infty}^{\infty} \delta[k - n_0]\delta[n-n_1 - k] \newline = \delta[n-n_0-n_1]. \quad (10)

(δ[kn0]δ[nn1k]=1\delta[k - n_0]\delta[n-n_1 - k]=1 only if kn0=0k-n_0=0 what results in k=n0k=n_0).

That means we can stack the delays one after another to increase the delay length (Figure 3).

Figure 3. Appending a delay element to the system results in adding its delay length to the original delay of the system.

Unsurprisingly, the znz^{-n} notation in Figure 3 results directly from the convolution property of the zz-transform, which we discussed in the previous article

Z{δ[nn0]δ[nn1]}=Z{δ[nn0]}Z{δ[nn1]}=zn0zn1=z(n0+n1).(11) \mathcal{Z}\{\delta[n-n_0] \ast \delta[n-n_1]\} = \mathcal{Z}\{\delta[n-n_0] \} \mathcal{Z}\{\delta[n-n_1]\} \newline = z^{-n_0} z^{-n_1} = z^{-(n_0+n_1)}. \quad (11)

What is a signal, really?

Let’s recap once again the convolutional sum of Equation 2 [2, Eq. 2.5]

x[n]=k=x[k]δ[nk],nZ.(12)x[n] = \sum_{k=-\infty}^{\infty} x[k] \delta[n - k], \quad n \in \mathbb{Z}. \quad (12)

Let’s evaluate it for a few conrete values of nn.

x[0]=k=x[k]δ[0k]=x[0]δ[00]=x[0],x[0] = \sum_{k=-\infty}^{\infty} x[k] \delta[0 - k] = x[0]\delta[0 - 0] = x[0],
x[1]=k=x[k]δ[1k]=x[1]δ[11]=x[1],x[1] = \sum_{k=-\infty}^{\infty} x[k] \delta[1 - k] = x[1]\delta[1 - 1] = x[1],
x[2]=k=x[k]δ[2k]=x[2]δ[22]=x[2],x[2] = \sum_{k=-\infty}^{\infty} x[k] \delta[2 - k] = x[2]\delta[2 - 2] = x[2],
x[n]=k=x[k]δ[nk]=x[n]δ[nn]=x[n].x[n] = \sum_{k=-\infty}^{\infty} x[k] \delta[n - k] = x[n]\delta[n - n] = x[n].

As the value of nn changes, the corresponding shift kk of the delta argument must change as well to make nkn-k equal to 0 and produce a single result x[n]x[n]. We could think of this change of kk as a change of the delay length.

Let’s now assume that x[n]x[n] starts at 0, i. e., x[n]=0n<0x[n]= 0 \forall n <0. Writing down the sum in Equation 12 explicitly yields [2]

x[n]=x[0]δ[n]+x[1]δ[n1]+x[2]δ[n2]++x[n1]δ[n(n1)]+x[n]δ[nn]+(13)x[n] = x[0]\delta[n] + x[1]\delta[n-1] + x[2]\delta[n-2] + \dots \newline + x[n-1]\delta[n-(n-1)] + x[n]\delta[n - n] + \dots \quad (13)

Can you see the beauty of it? x[n]x[n] already contains all possible samples of the sequence xx; we just need to delay it properly to receive the desired sample. In other words, any discrete-time signal is a convolutional sum, a weighted sum of delayed impulses. Fixing index nn to some concrete value sets the delay length accordingly so as to return the signal value for that particular nn.


In this article we examined the identity element of the convolution, i. e., δ[n]\delta[n] for the discrete convolution (Equation 3) and δ(t)\delta(t) for the continuous convolution (Equation 5). The former is much more easily tractable mathematically [2].

We introduced the sifting property of the delta impulse and interpreted it as the delay in the context of digital signal processing.

Finally, we looked at a discrete-time signal as a weighted sum of delayed impulses.


[1] I.N. Bronshtein et. al. Handbook of Mathematics, 5th Edition, Springer, 2007.

[2] A. V. Oppenheim, R. W. Schafer Discrete-Time Signal Processing, 3rd Edition, Pearson, 2010.

[3] I. Goodfellow, Y. Bengio, A. Courville Deep learning, MIT Press, 2016,

[4] M. Schäfer Simulation of Distributed Parameter Systems by Transfer Function Models Ph.D. dissertation, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2020.