Visualize a sound file using Python!
In digital signal processing (DSP), machine learning, and deep learning, we often need to represent an audio signal as an image.
The closest we can get is a spectrogram: the magnitude of a short-time Fourier transform (STFT).
In the code snippet below and the linked YouTube tutorial, I show you how to calculate the spectrogram, plot it, and save it.
What is a short-time Fourier transform (STFT)?
A short-time Fourier transform (STFT) is the result of
- windowing a signal and
- calculating its discrete Fourier transform (DFT)
every few samples.
To calculate the STFT:
- Window a part of the signal, as long as the window length, with a window function, for example, the Hann window.
- If the given DFT size is larger than the window length, pad the windowed signal with zeros so that its length equals the DFT size.
- Calculate the DFT of the windowed and zero-padded signal.
- Advance by the hop length (in samples) and go back to the first step. Repeat until the whole signal has been processed.
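The steps above can be sketched in plain NumPy. This is only a minimal illustration, not the script from the video: the function name `stft` and its parameter names are my own, the window is a periodic Hann window, and trailing samples that do not fill a whole window are simply dropped (libraries such as librosa pad the signal instead).

```python
import numpy as np


def stft(signal, window_length, hop_length, dft_size):
    """Return the STFT as an array of shape (dft_size // 2 + 1, n_frames)."""
    if dft_size < window_length:
        raise ValueError("dft_size must be at least window_length")
    if len(signal) < window_length:
        raise ValueError("signal must be at least window_length samples long")

    # Periodic Hann window, built by hand to keep the sketch dependency-free
    n = np.arange(window_length)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / window_length)

    # Number of full windows that fit into the signal
    n_frames = 1 + (len(signal) - window_length) // hop_length
    frames = []
    for i in range(n_frames):
        start = i * hop_length  # advancing by the hop length each iteration
        # Window a part of the signal
        segment = signal[start:start + window_length] * window
        # rfft zero-pads the segment to dft_size when needed and returns
        # the non-negative-frequency half of the DFT (enough for real input)
        frames.append(np.fft.rfft(segment, n=dft_size))
    return np.stack(frames, axis=1)


# Illustrative input: one second of a 1 kHz sine sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
X = stft(tone, window_length=512, hop_length=128, dft_size=1024)
```

Here `X` holds one column of complex DFT coefficients per frame, and the strongest bin of each column corresponds to the 1 kHz tone.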
The following parameters of the STFT are important:
- Window length or window size,
- Hop length or hop size,
- DFT size (often called FFT size or FFT length).
These parameters are given in samples and they influence the time and frequency resolution of the STFT.
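To get a feel for how these parameters translate into time and frequency units, here is a small arithmetic sketch. The helper `stft_resolution` is hypothetical, and the sample rate and parameter values are assumptions chosen purely for illustration.

```python
def stft_resolution(sample_rate, window_length, hop_length, dft_size):
    """Convert STFT parameters given in samples to time/frequency units."""
    return {
        # How much signal each frame covers
        "window_duration_ms": 1000.0 * window_length / sample_rate,
        # Time step between consecutive frames
        "hop_duration_ms": 1000.0 * hop_length / sample_rate,
        # Distance between neighboring frequency bins
        "bin_spacing_hz": sample_rate / dft_size,
        # Bins from 0 Hz up to and including the Nyquist frequency
        "n_frequency_bins": dft_size // 2 + 1,
    }


# Assumed values: 16 kHz audio, 512-sample window, 128-sample hop, 1024-point DFT
example = stft_resolution(16000, 512, 128, 1024)
print(example)
```

Note that a DFT size larger than the window length only samples the spectrum more densely; the ability to separate two close frequencies is still governed by the window length.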
What is a spectrogram?
A spectrogram is the magnitude of the STFT.
Each STFT coefficient is a complex number. By taking the magnitude of each coefficient, we obtain a real-valued spectrogram.
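As a tiny illustration, the coefficients below are made-up complex numbers standing in for an STFT (rows are frequency bins, columns are frames); `np.abs` turns them into real, non-negative magnitudes.

```python
import numpy as np

# Made-up complex "STFT" values, only to illustrate the magnitude step
stft_coefficients = np.array([[3 + 4j, 1 - 1j],
                              [0 + 2j, -5 + 0j]])

# The magnitude of a + bj is sqrt(a^2 + b^2)
spectrogram = np.abs(stft_coefficients)
print(spectrogram)
```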
How to calculate the spectrogram in Python?
The code snippet comes first, followed by the explanations and, finally, a video showing step by step how the script was created.
From the video, you will learn:
- ✅ Which libraries to use
- ✅ How to effortlessly compute the STFT of an audio signal
- ✅ Step-by-step writing of the plot_spectrogram_and_save() function
- ✅ How to plot the spectrogram in decibels full-scale (dBFS)
- ✅ How to mark the frequency axis using the ISO-standardized octave band marks
- ✅ How to adjust the figure to your needs (colors, labels, font size, and more)
- ✅ How to export your figure to a .png file effortlessly
Code snippet to plot the spectrogram of an audio signal in Python
Explanation:
- plot_spectrogram_and_save()
  - calculates the short-time Fourier transform (the STFT),
  - computes its magnitude (i.e., the spectrogram),
  - converts it to decibels full scale (normalized to the highest value),
  - plots the spectrogram with beautiful formatting,
  - saves it to a file.
- Example main() function
  - reads an audio file from a specified location,
  - passes it to the plotting function.
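The steps listed above could look roughly like the sketch below. It is not the script from the video: to stay self-contained it computes the STFT with plain NumPy instead of librosa, and the names `magnitude_to_dbfs` and `plot_spectrogram_sketch`, the default parameter values, the -80 dB floor, and the colormap are all my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def magnitude_to_dbfs(spectrogram, floor_db=-80.0):
    """Convert magnitudes to dB relative to the largest value (peak at 0 dB)."""
    # Small lower bounds avoid division by zero and log10(0) for silent input
    reference = max(float(np.max(spectrogram)), 1e-12)
    db = 20.0 * np.log10(np.maximum(spectrogram, 1e-12) / reference)
    return np.maximum(db, floor_db)


def plot_spectrogram_sketch(signal, sample_rate, output_path,
                            window_length=512, hop_length=128, dft_size=1024):
    """Compute a dB spectrogram of `signal`, plot it, and save it as an image."""
    # STFT: slice the signal into Hann-windowed frames, then take the DFT
    window = np.hanning(window_length)
    n_frames = 1 + (len(signal) - window_length) // hop_length
    frames = np.stack(
        [signal[i * hop_length:i * hop_length + window_length] * window
         for i in range(n_frames)],
        axis=1,
    )
    stft = np.fft.rfft(frames, n=dft_size, axis=0)

    # Magnitude (the spectrogram), normalized to its highest value, in dB
    spectrogram_db = magnitude_to_dbfs(np.abs(stft))

    # Axes in physical units
    times = np.arange(n_frames) * hop_length / sample_rate
    frequencies = np.fft.rfftfreq(dft_size, d=1.0 / sample_rate)

    fig, ax = plt.subplots(figsize=(10, 4))
    mesh = ax.pcolormesh(times, frequencies, spectrogram_db,
                         shading="auto", cmap="inferno")
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    fig.colorbar(mesh, ax=ax, label="Magnitude [dBFS]")
    fig.savefig(output_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return spectrogram_db
```

A `main()`-style caller would load a mono signal and its sample rate (for example with `soundfile.read`, which returns the data and the sample rate) and pass both to the function together with an output path such as `spectrogram.png`.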
The example speech comes from the LibriSpeech database.
Feel free to copy & paste & modify the snippet according to your needs!
Python libraries used:
- numpy
- matplotlib
- pathlib
- librosa version 0.9.2
- soundfile (only for reading the example audio file, not needed for the spectrogram per se)
Code explanation video
Watch how this code was written and why I included particular lines in this explainer video:
Want to know what knowledge from digital signal processing is needed for audio programming? Check out my free Audio Plugin Developer Checklist!