Visualize a sound file using Python!

In digital signal processing (DSP), machine learning, and deep learning we often need a representation of an audio signal in an image form.

The closest we can get is a **spectrogram**: the magnitude of a short-time Fourier transform (STFT).

In the code snippet and the linked YouTube tutorial below, I show you how to calculate the spectrogram, plot it, and save it.

## What is a short-time Fourier transform (STFT)?

A short-time Fourier transform (STFT) is the result of

- windowing a signal and
- calculating its discrete Fourier transform (DFT)

every few samples.

To calculate the STFT:

1. Window a part of the signal of length $W$ with a window, for example, the Hann window.
2. If the given DFT size $N_\text{DFT}$ is larger than $W$, pad the windowed signal with zeros so that it is of length $N_\text{DFT}$.
3. Calculate the DFT of the windowed and zero-padded signal.
4. Advance by $H$ samples and go to step 1. Repeat until the whole signal has been processed.
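The steps above can be sketched in plain NumPy. This is a minimal illustration, not the code from the video: the function name `naive_stft` and its parameter names are my own, and I assume that trailing samples that do not fill a whole window are simply dropped (librosa, by contrast, pads and centers the frames by default).

```python
import numpy as np


def naive_stft(signal, window_length, hop_length, dft_size):
    """Return the STFT as an array of shape (dft_size // 2 + 1, n_frames).

    Hypothetical helper for illustration; only non-negative frequencies
    are kept because the input signal is assumed to be real-valued.
    """
    if dft_size < window_length:
        raise ValueError("dft_size must be at least window_length")

    window = np.hanning(window_length)  # Hann window of length W
    frames = []
    start = 0
    while start + window_length <= len(signal):
        # Step 1: window a part of the signal of length W
        segment = signal[start:start + window_length] * window
        # Steps 2 and 3: rfft zero-pads the segment to dft_size, then computes the DFT
        frames.append(np.fft.rfft(segment, n=dft_size))
        # Step 4: advance by H samples
        start += hop_length
    return np.array(frames).T


# Example: one second of a 1 kHz sine sampled at 16 kHz
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * time)
stft = naive_stft(tone, window_length=400, hop_length=160, dft_size=512)
print(stft.shape)  # (frequency bins, frames)
```

In practice, you would rather call a library routine such as `librosa.stft`, which is vectorized and handles the signal edges for you.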

The following parameters of the STFT are important:

- Window length or window size $W$,
- Hop length or hop size $H$,
- DFT size (often called FFT size or FFT length) $N_\text{DFT}$.

These parameters are given in samples and they influence the time and frequency resolution of the STFT.
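To get a feeling for these numbers, here is a small sketch that converts the parameters from samples to seconds and hertz. The helper name `stft_resolution` and the example values (a 16 kHz sample rate with $W = 400$, $H = 160$, $N_\text{DFT} = 512$) are assumptions chosen for illustration only.

```python
def stft_resolution(sample_rate, window_length, hop_length, dft_size):
    """Express the STFT parameters in physical units (hypothetical helper)."""
    return {
        # Duration of the signal part covered by one window
        "window_duration_s": window_length / sample_rate,
        # Time step between two consecutive STFT frames
        "hop_duration_s": hop_length / sample_rate,
        # Spacing between neighboring DFT bins
        "bin_spacing_hz": sample_rate / dft_size,
        # Number of non-negative frequency bins for a real-valued signal
        "n_frequency_bins": dft_size // 2 + 1,
    }


for name, value in stft_resolution(16000, 400, 160, 512).items():
    print(f"{name}: {value}")
```

Note that zero-padding (a larger $N_\text{DFT}$) only makes the frequency grid denser. How well two close frequencies can be told apart still depends on the window length $W$ and the window shape: a longer window separates frequencies better but smears events in time.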

## What is a spectrogram?

A spectrogram is the magnitude of the STFT.

Each STFT coefficient is a complex number. By taking their magnitude, we obtain a real-valued spectrogram.
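As a quick illustration, the sketch below takes the magnitude of a few made-up complex coefficients and expresses it in decibels relative to the largest value. The function name `magnitude_db` and the -80 dB floor are my assumptions; the floor is only there to avoid taking the logarithm of zero.

```python
import numpy as np


def magnitude_db(stft_matrix, floor_db=-80.0):
    """Magnitude of complex STFT coefficients in dB relative to the peak.

    Hypothetical helper; values below `floor_db` are clipped to it.
    """
    magnitude = np.abs(stft_matrix)  # real-valued spectrogram
    peak = magnitude.max()
    if peak == 0:
        # An all-zero input has no peak to normalize to
        return np.full(magnitude.shape, floor_db)
    ratio = np.maximum(magnitude / peak, 10 ** (floor_db / 20))
    return 20 * np.log10(ratio)


# Made-up coefficients: magnitudes are 5, 0.5, 0, and 5
coefficients = np.array([[3 + 4j, 0.5j], [0, -5]])
print(np.abs(coefficients))
print(magnitude_db(coefficients))
```

With librosa, the same conversion is available as `librosa.amplitude_to_db(magnitude, ref=np.max)`.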

## How to calculate the spectrogram in Python?

The code snippet comes first, followed by the explanations and, finally, a video showing step by step how the script was created.

From the video, you will learn:

- ✅ Which libraries to use
- ✅ How to effortlessly compute the STFT of an audio signal
- ✅ Step-by-step writing of the `plot_spectrogram_and_save()` function
- ✅ How to plot the spectrogram in decibels full-scale (dBFS)
- ✅ How to mark the frequency axis using the ISO-standardized octave band marks
- ✅ How to adjust the figure to your needs (colors, labels, font size, and more)
- ✅ How to export your figure to a .png file effortlessly

### Code snippet to plot the spectrogram of an audio signal in Python

Explanation:

- `plot_spectrogram_and_save()`
  - calculates the short-time Fourier transform (the STFT),
  - computes its magnitude (i.e., the spectrogram),
  - converts it to decibels full scale (normalized to the highest value),
  - plots the spectrogram with beautiful formatting,
  - saves it to a file.
- Example `main()` function
  - reads an audio file from a specified location,
  - passes it to the plotting function.

The example speech comes from the LibriSpeech database.

Feel free to copy & paste & modify the snippet according to your needs!

Python libraries used:

- numpy
- matplotlib
- pathlib
- librosa version 0.9.2
- soundfile (only for reading the example audio file, not needed for the spectrogram per se)
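If you want to experiment without installing librosa, here is a rough, dependency-light sketch of a function that follows the steps listed in the explanation. It is not the snippet from the video: I swapped `librosa.stft`, `librosa.amplitude_to_db`, and `librosa.display.specshow` for plain NumPy and matplotlib calls, and the default parameter values, the colormap, and the figure size are my assumptions.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Nominal octave-band center frequencies in Hz, used as frequency axis marks
OCTAVE_BANDS_HZ = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]


def plot_spectrogram_and_save(signal, sample_rate, output_path,
                              window_length=1024, hop_length=256, dft_size=1024):
    """Plot a dBFS spectrogram of a mono signal and save it as a .png file.

    Sketch only: assumes a non-silent signal at least window_length samples long.
    """
    # STFT: Hann-windowed frames, zero-padded to dft_size by rfft
    window = np.hanning(window_length)
    starts = np.arange(0, len(signal) - window_length + 1, hop_length)
    stft = np.array([np.fft.rfft(signal[s:s + window_length] * window, n=dft_size)
                     for s in starts]).T

    # Spectrogram in dB relative to its highest value, clipped at -80 dB
    magnitude = np.abs(stft)
    spectrogram_db = 20 * np.log10(np.maximum(magnitude / magnitude.max(), 1e-4))

    # Axes: frame centers in seconds and bin frequencies in hertz
    times = (starts + window_length / 2) / sample_rate
    frequencies = np.fft.rfftfreq(dft_size, d=1 / sample_rate)

    fig, ax = plt.subplots(figsize=(10, 4))
    # The 0 Hz bin is skipped because it cannot be shown on a logarithmic axis
    mesh = ax.pcolormesh(times, frequencies[1:], spectrogram_db[1:],
                         shading="auto", cmap="inferno")
    ax.set_yscale("log")
    ticks = [f for f in OCTAVE_BANDS_HZ if frequencies[1] <= f <= sample_rate / 2]
    ax.set_yticks(ticks)
    ax.set_yticklabels([f"{f:g}" for f in ticks])
    ax.minorticks_off()
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    fig.colorbar(mesh, ax=ax, format="%+.0f dBFS")

    output_path = Path(output_path).with_suffix(".png")
    fig.savefig(output_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
    return output_path


# Example with a synthetic 440 Hz tone instead of a speech recording
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
print(plot_spectrogram_and_save(np.sin(2 * np.pi * 440 * time), sample_rate, "spectrogram"))
```

To plot a real recording, you could read it first with `signal, sample_rate = soundfile.read(path_to_your_file)` and pass both values to the function. For stereo files, pick or average the channels beforehand, because this sketch expects a one-dimensional signal.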

### Code explanation video

Watch how this code was written and why I included particular lines in this explainer video:

Want to know what knowledge from digital signal processing is needed for audio programming? Check out my free Audio Plugin Developer Checklist!
