Do you remember the slowing down of time in “Inception”? Or the “bullet time” scenes in “The Matrix”? How can we create such a slow-motion effect in the audio domain? That is the topic of today’s article!
In this article, we will demonstrate the usefulness of sampling theory in music and audio effects. If you don’t know what sampling is all about, check out this article. If you need a refresher on aliasing, check out this article.
Decimation in time
What would happen if we removed every other sample from the signal? To simplify things, let’s look at an example.
Here we have a 500 Hz sine wave sampled at 16 kHz:
500 Hz sine sampled at 16,000 Hz.
Note that one period is 2 ms long.
Removing every second sample, we obtain:
The signal after removing every other sample.
We can see that the duration of the sine shrank to 2.5 ms, which is no surprise, since we removed half of the samples and 2.5 ms is half of the original 5 ms.
But if the duration changed while the sample rate remained constant, the frequency changed as well! The new period is 1 ms long, so the frequency of the sine is not 500 Hz anymore, but

$$f = \frac{1}{T} = \frac{1}{1\,\text{ms}} = 1000\,\text{Hz}.$$
The process of removing particular samples is called decimation or downsampling; here we performed decimation with a decimation factor of 2 (half as many samples). It sometimes involves an anti-aliasing filter, which will be explained below in more detail.
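To make this concrete, here is a minimal NumPy sketch of the example above (my own illustration; no anti-aliasing filter is applied, just raw sample removal):

import numpy as np

sample_rate = 16_000                      # Hz
frequency = 500.0                         # Hz, i.e., a period of 2 ms
t = np.arange(0, 0.005, 1 / sample_rate)  # 5 ms time axis
sine = np.sin(2 * np.pi * frequency * t)

decimated = sine[::2]                     # remove every other sample (decimation factor 2)

# Replayed at the unchanged rate of 16 kHz, the decimated signal lasts 2.5 ms
# instead of 5 ms and its period is 1 ms, i.e., it has become a 1000 Hz sine.
print(len(sine), len(decimated))          # 80 40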
Tape replay
If you have ever rewound a magnetic cassette tape, you could hear a sped-up version of the recording, which enabled you to find the exact spot you wanted to hear. That is exactly the same effect you can witness with sample decimation.
We can view that change in the signal as though the replay sample rate had changed. If we output the samples at twice the original sample rate, we run out of samples in half the duration of the original signal, since we output twice as many samples per second as before.
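A minimal sketch of this “tape replay” effect (my own illustration, using the sounddevice package that also appears in the code example at the end of the article; it assumes your audio device accepts both sample rates):

import numpy as np
import sounddevice as sd

sample_rate = 16_000
t = np.arange(0, 1.0, 1 / sample_rate)         # 1 second of signal
samples = 0.3 * np.sin(2 * np.pi * 500.0 * t)  # 500 Hz sine

sd.play(samples, samplerate=sample_rate)       # normal replay: 1 s long, 500 Hz
sd.wait()
sd.play(samples, samplerate=2 * sample_rate)   # double-rate replay: 0.5 s long, 1000 Hz
sd.wait()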
Mathematical formulation
We may express it in mathematical terms as the length of the output signal’s period in seconds:

$$T_{\text{out}} = \frac{T_{\text{in}}}{v},$$

where $T_{\text{in}}$ is the period of the input signal, $T_{\text{out}}$ is the period of the output signal, and $v$ is the speed (time scale) factor.

Converting to the frequency domain means taking the reciprocal of the above equation, which results in the following frequency scaling:

$$f_{\text{out}} = v f_{\text{in}},$$

where $f_{\text{in}} = 1 / T_{\text{in}}$ and $f_{\text{out}} = 1 / T_{\text{out}}$ are the corresponding frequencies.
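As a quick sanity check against the decimation example from the beginning of the article: removing every other sample corresponds to $v = 2$, so

$$f_{\text{out}} = 2 \cdot 500\,\text{Hz} = 1000\,\text{Hz}, \qquad T_{\text{out}} = \frac{2\,\text{ms}}{2} = 1\,\text{ms},$$

exactly as we observed in the plots.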
Possible ways to rescale signal’s time
How can we achieve the effect of variable speed replay using digital devices rather than a tape player? There are basically two ways:
- change the sampling frequency of the DAC (e.g., in a stage setting),
- resample the signal before output.
As this is a blog about audio programming we will deal with the second option, namely the variable speed replay algorithm.
Variable speed replay algorithm
Assuming that the speed factor can be written as a ratio of two positive integers, $v = N / M$, we can realize variable speed replay according to the following scheme:
Variable speed replay algorithm scheme.
Here $f_s$ denotes the sample rate of the input signal; the two cutoff frequencies below refer to the upsampled signal, whose sample rate is $M f_s$. The scheme consists of:

- upsampling by a factor of M (inserting M - 1 zeros after each sample),
- low-pass filtering with the cutoff frequency set to $\frac{f_s}{2} = \frac{M f_s}{2M}$ (to remove the repeated spectral components that were dragged below the Nyquist frequency as a result of lowering the frequency of the repeated digital spectra),
- low-pass filtering with the cutoff frequency set to $\frac{M f_s}{2N}$ (to prevent the aliasing of the stretched spectra, since all repeated spectral components will be stretched by downsampling),
- downsampling by a factor of N (keeping every N-th sample, i.e., removing N - 1 samples from each block of N samples, exactly what we did to our 500 Hz sine).
Note the use of low-pass filters. They are crucial for preventing any additional components from appearing in the processed signal’s spectrum. Whether we are raising the frequency or lowering it, we must always filter to remove the extra components that would interfere with our original signal. In practice, the two filters can be merged into a single low-pass filter whose cutoff is the lower of the two frequencies, which is what the code example below does.
Upsampling is performed first because downsampling (removing certain samples) may be considered a removal of information; therefore, it is best done at the end of the processing chain.
All in all, this scheme is a typical resampling scheme. The only difference in the variable speed replay algorithm is that we treat the output signal as if it still had the original sample rate $f_s$: we do not change the sample rate at playback, so the signal simply plays faster or slower than the original.
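As a side note, the whole upsample-filter-downsample chain is exactly what polyphase resampling routines such as scipy.signal.resample_poly implement. Below is a minimal sketch of mine (not part of the command-line tool further down) that rescales a single frame this way, provided the result is then replayed at the original sample rate:

import numpy as np
from scipy.signal import resample_poly

def rescale_frame(frame, M, N):
    # Upsample by M, low-pass filter, and downsample by N in one call.
    # Replayed at the unchanged sample rate, the result sounds v = N / M times faster.
    return resample_poly(frame, up=M, down=N)

sample_rate = 16_000
t = np.arange(0, 1.0, 1 / sample_rate)
sine = np.sin(2 * np.pi * 500.0 * t)
slowed_down = rescale_frame(sine, M=2, N=1)  # twice as long, an octave lower at replay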
The outcome of variable speed replay application
We can reason about the result of such signal rescaling in two domains: time and frequency.
Modified frequency structure
The frequency structure changes with the scaling accordingly:
- if we are speeding the signal up (scaling the time speed up), we obtain a “Mickey Mouse” or “Chipmunk” effect (especially powerful when applied to speech),
- if we are slowing the signal down (scaling the time speed down), we perceive a “slow motion” effect (exactly like the one in “Inception” or “The Matrix”).
Modified time structure
The time structure is altered with the following perceivable changes:
- the transients are spread out (slower) or contracted (more rapid),
- the vibrato technique (varying the frequency of the instrument in a certain range around the played note) loses its character: it becomes a slower (when slowing down time) or faster (when speeding up time) modulation.
Negative speed?!
If we set the speed factor $v$ to a negative value, the samples are read in reverse order, so the recording is played backwards.

Keeping the magnitude $|v|$ different from 1 at the same time additionally speeds the reversed playback up or slows it down.
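Below is a minimal sketch of this idea (my own illustration, not code from the article; it assumes that a negative $v$ simply means reading the samples back to front before rescaling time by $|v| = N / M$):

from scipy.signal import resample_poly

def negative_speed_replay(samples, M, N):
    # Read the samples in reverse order (backwards playback)...
    reversed_samples = samples[::-1]
    # ...and then rescale time by |v| = N / M (the output is M / N times as long).
    return resample_poly(reversed_samples, up=M, down=N)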
The above examples show that the variable speed replay algorithm is a creative application of sampling theory.
Unity Time.timeScale correspondence
If you have ever worked with the Unity Editor, you may have stumbled upon the Time.timeScale parameter. It changes the speed at which things happen: if it is set to 1, time flows normally, but if you decrease it to 0.5, events happen two times slower. The Time.timeScale parameter thus corresponds to the speed factor $v$ of the variable speed replay algorithm.
Note: the Time.timeScale parameter does not change the rate at which the physics engine updates. Thus, the corresponding Time.fixedDeltaTime parameter should be updated accordingly (typically Time.fixedDeltaTime = 0.02f * Time.timeScale, where 0.02f denotes the typical default value of Time.fixedDeltaTime, but it can of course be different).
Time scaling preserving pitch
It is not trivial to scale a signal in time without altering its pitch (i.e., to lengthen or shorten the signal). Here, a variety of overlap-add methods apply, which are beyond the scope of this article. However, they are likely to appear in future WolfSound articles, so stay tuned!
Summary
We have discussed the variable speed replay algorithm, which lets us manipulate the perceived flow of time in a signal, and we have explored its consequences and applications.
If you want to learn more, I highly encourage you to check out the book “DAFX: Digital Audio Effects” by Udo Zölzer et al. [1]:

It offers an even broader perspective on the variable speed replay algorithm and many more effects; it is one of the best books to learn about audio algorithms!
Code example
The following code is a command-line tool that creates the slow-motion or ‘chipmunk’ sound effect. The ‘crude’ version is a direct implementation of the algorithm as presented in the scheme above; the ‘smart’ version uses the resampling facility of SciPy and is significantly faster.
#!/usr/bin/env python3
"""
Variable speed replay algorithm implementation.
A command-line tool that processes the supplied wave file and plays out the result.
"""
import argparse
from scipy.io.wavfile import read
from scipy.signal import resample, butter, lfilter
import sounddevice as sd
import numpy as np
__author__ = "Jan Wilczek"
__license__ = "GPL"
__version__ = "1.0.0"
def parse_args():
    parser = argparse.ArgumentParser(description='Apply the Variable Speed Replay algorithm to an audio file '
                                                 'and play it out.')
    parser.add_argument('-f', '--filename', type=str, required=True, help='filename of the wave file to process',
                        dest='filename')
    parser.add_argument('-N', type=int, required=True, help='numerator of the time scale parameter (v = N / M)',
                        dest='N')
    parser.add_argument('-M', type=int, required=True, help='denominator of the time scale parameter (v = N / M)',
                        dest='M')
    parser.add_argument('-c', '--crude', dest='process_frame', action='store_const', const=process_frame_crude,
                        default=process_frame_smart,
                        help='process each frame with the brute-force implementation '
                             '(default: process each frame optimally)')
    return parser.parse_args()


def read_and_preprocess_file(filename):
    sample_rate, data = read(filename)
    if data.ndim > 1:
        data = data[:, 0]  # extract the first channel of a multichannel file
    data = data.astype(np.float64)
    normalized_data = data / np.amax(np.abs(data))  # scale to the [-1, 1] range
    return sample_rate, normalized_data


def play_samples_blocking(samples, sample_rate, volume=1.0):
    sd.play(volume * samples, samplerate=sample_rate)
    sd.wait()


def upsample(frame, M):
    """Insert M - 1 zeros after each sample of the frame."""
    upsampled = np.zeros(len(frame) * M)
    upsampled[::M] = frame
    return upsampled


def filter_frame(frame, nyquist_frequency, M, N):
    """Low-pass filter the upsampled frame to remove spectral images and prevent aliasing."""
    order = 5
    cutoff_frequency = min(nyquist_frequency / M, nyquist_frequency / N)
    normalized_cutoff = cutoff_frequency / nyquist_frequency
    b, a = butter(order, normalized_cutoff, btype='low', analog=False)
    # Multiplication by M compensates for the amplitude loss introduced by zero insertion.
    return M * lfilter(b, a, frame)


def downsample(frame, N):
    return frame[::N]  # keep every N-th sample


def process_frame_crude(frame, M, N, fs):
    upsampled = upsample(frame, M)
    filtered = filter_frame(upsampled, fs / 2, M, N)
    downsampled = downsample(filtered, N)
    return downsampled


def process_frame_smart(frame, M, N, _):
    # Resampling to M / N of the original length corresponds to the time scale v = N / M.
    return resample(frame, int(len(frame) * M / N))


def variable_speed_replay(samples, fs, M, N, process_frame_function, frame_size=16384):
    if N / M == 1:
        return samples
    output = np.empty((0,))
    for i in range(0, len(samples), frame_size):
        end = min(i + frame_size, len(samples))
        resampled_frame = process_frame_function(samples[i:end], M, N, fs)
        output = np.append(output, resampled_frame)
    return output


if __name__ == '__main__':
    args = parse_args()
    sample_rate, data = read_and_preprocess_file(args.filename)
    transformed = variable_speed_replay(data, sample_rate, args.M, args.N, args.process_frame)
    play_samples_blocking(transformed, sample_rate, volume=0.4)
Feel free to copy and test out the above code. If you are not sure how to use it, run variable_speed_replay.py --help (assuming you copied the code into a file named ‘variable_speed_replay.py’) and follow the instructions given.
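For example (with my_recording.wav being a placeholder name for a wave file of your choice), slow motion with v = N / M = 1/2 and the ‘chipmunk’ effect with v = 2 could be obtained, respectively, with:

python variable_speed_replay.py --filename my_recording.wav -N 1 -M 2
python variable_speed_replay.py --filename my_recording.wav -N 2 -M 1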
References
[1] Zölzer, U. (ed.). DAFX: Digital Audio Effects. 2nd ed. John Wiley & Sons Ltd, 2011.