How To Plot Audio Spectrogram For Machine Learning In Python Using Librosa & Matplotlib | Tutorial for Beginners

Visualize a sound file using Python!

In digital signal processing (DSP), machine learning, and deep learning, we often need a representation of an audio signal in image form.

The closest we can get is by using a spectrogram: the magnitude of a short-time Fourier transform (STFT).

In the code snippet and linked YouTube tutorial below, I show you how to calculate the spectrogram, plot it, and save it.

What is a short-time Fourier transform (STFT)?

A short-time Fourier transform (STFT) is the result of windowing a signal and calculating its discrete Fourier transform (DFT) every few samples.

To calculate the STFT:

1. Window a part of the signal of length $W$ with a window, for example, the Hann window.
2. If the given DFT size $N_\text{DFT}$ is larger than $W$, pad the windowed signal with zeros so that it is of length $N_\text{DFT}$.
3. Calculate the DFT of the windowed and zero-padded signal.
4. Advance by $H$ samples and go to step 1. Repeat until the whole signal has been processed.
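
The steps above can be sketched in plain NumPy. This is only a minimal illustration, not the code from the tutorial: the function name `stft_sketch` and its parameter names are my own, and it skips the edge padding and frame centering that `librosa.stft` applies by default, so the two will not produce identical frames.

```python
import numpy as np


def stft_sketch(signal, window_length, hop_size, dft_size):
    """Naive STFT that follows the four steps literally (hypothetical helper)."""
    if dft_size < window_length:
        raise ValueError("dft_size must be at least window_length")

    window = np.hanning(window_length)  # Hann window of length W
    frames = []
    start = 0
    while start + window_length <= len(signal):
        # Step 1: window a part of the signal of length W
        segment = signal[start:start + window_length] * window
        # Steps 2 and 3: rfft zero-pads the segment to dft_size
        # and computes the DFT of a real-valued signal
        frames.append(np.fft.rfft(segment, n=dft_size))
        # Step 4: advance by H samples
        start += hop_size

    # Rows are frequency bins, columns are time frames
    return np.array(frames).T


# Usage: a 1 kHz sine sampled at 8 kHz (values assumed for illustration)
sample_rate = 8000
time = np.arange(1000) / sample_rate
sine = np.sin(2 * np.pi * 1000 * time)
stft = stft_sketch(sine, window_length=256, hop_size=128, dft_size=512)
print(stft.shape)
```

Because the input is real-valued, `np.fft.rfft` returns only the non-negative frequencies, which is why the result has `dft_size // 2 + 1` rows.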

The following parameters of the STFT are important: the window length $W$, the DFT size $N_\text{DFT}$, and the hop size $H$.

These parameters are given in samples and they influence the time and frequency resolution of the STFT.
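
To get a feeling for what the sample-based parameters mean in seconds and hertz, here is a small back-of-the-envelope calculation. The sample rate and the values of the window length, DFT size, and hop are assumptions picked for illustration, not recommendations from the tutorial.

```python
sample_rate = 22050   # Hz (assumed for illustration)
window_length = 1024  # W, in samples
dft_size = 2048       # N_DFT, in samples
hop_size = 256        # H, in samples

# Time span covered by one windowed frame
window_duration_s = window_length / sample_rate
# Time step between two consecutive frames
hop_duration_s = hop_size / sample_rate
# Spacing between neighboring DFT frequency bins
bin_spacing_hz = sample_rate / dft_size
# Number of non-negative frequency bins for a real-valued signal
num_frequency_bins = dft_size // 2 + 1

print(f"window: {1000 * window_duration_s:.1f} ms")
print(f"hop: {1000 * hop_duration_s:.1f} ms")
print(f"bin spacing: {bin_spacing_hz:.2f} Hz")
print(f"frequency bins: {num_frequency_bins}")
```

Roughly speaking, a longer window resolves frequencies more finely but smears events in time, while a smaller hop gives more frames per second at the cost of more computation.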

The spectrogram is the magnitude of the STFT.

Each STFT coefficient is a complex number. By taking their magnitude, we obtain a real-valued spectrogram.
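
As a tiny illustration of that step, `np.abs` applied to an array of complex numbers returns their magnitudes. The coefficient values below are made up so that the result is easy to check by hand.

```python
import numpy as np

# Made-up complex "STFT coefficients": 2 frequency bins x 2 time frames
stft_coefficients = np.array([[3 + 4j, 0 + 1j],
                              [1 + 0j, -2 + 0j]])

# Magnitude of each complex coefficient: sqrt(real^2 + imag^2)
spectrogram = np.abs(stft_coefficients)

print(spectrogram.dtype)
print(spectrogram)
```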

Below is the code snippet, followed by the explanations and, finally, a video showing step by step how the script was created.

From the video, you will learn:

Explanation:

plot_spectrogram_and_save()
1. calculates the short-time Fourier transform (the STFT),
2. computes its magnitude (i.e., the spectrogram),
3. converts it to decibels full scale (normalized to the highest value),
4. plots the spectrogram with beautiful formatting,
5. saves it to a file.
Example main() function
1. reads an audio file from a specified location,
2. passes it to the plotting function.