Spectrogram Analysis using Python

Keywords: Spectrogram, signal processing, time-frequency analysis, speech recognition, music analysis, frequency domain, time domain, python

Introduction

A spectrogram is a visual representation of the frequency content of a signal over time. Spectrograms are widely used in signal processing applications to analyze and visualize time-varying signals, such as speech and audio signals. In this article, we will explore the concept of spectrograms, how they are generated, and their applications in signal processing.

What is a Spectrogram?

A spectrogram is a two-dimensional representation of the frequency content of a signal over time. The x-axis of a spectrogram represents time, while the y-axis represents frequency. The color or intensity of each point represents the strength of the signal at that time and frequency, given as its magnitude or power and often expressed in decibels.

How are Spectrograms Generated?

Spectrograms are typically generated using a mathematical operation called the short-time Fourier transform (STFT).

The STFT is a variation of the Fourier transform that computes the frequency content of a signal in short, overlapping time windows. The resulting frequency content is then plotted over time to generate the spectrogram.

\[ \mathrm{STFT}\{x\}(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt \]

where \(x(t)\) is the input signal and \(w(t)\) is a window function concentrated around \(t = 0\), so that \(w(t - \tau)\) selects the portion of the signal near time \(\tau\). The angular frequency is \(\omega = 2\pi f\), where \(f\) is the frequency, and \(e^{-j\omega t}\) is the complex exponential that represents the frequency component at \(\omega\). The result is a function of two variables: the window position \(\tau\) and the frequency \(\omega\).
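Software works with sampled signals, so in practice it evaluates a discrete form of this integral. One common convention is shown below, although libraries differ in how they index frames and where they take the phase reference:

\[ X[m, k] = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N - 1 \]

Here \(N\) is the window length in samples, \(H\) is the hop size (the number of samples the window advances between frames), \(m\) is the frame index and \(k\) is the frequency bin. For a sampling rate \(f_s\), bin \(k\) corresponds to the frequency \(k f_s / N\), and frame \(m\) starts at time \(m H / f_s\). This form measures the exponent from the start of each frame rather than from \(t = 0\). That choice multiplies each frame by the constant phase factor \(e^{-j 2\pi k m H / N}\), which leaves the magnitude unchanged.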

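In other words, the STFT slides the window along the signal. At each position \(\tau\) it multiplies the signal by the window and takes the Fourier transform of the windowed segment. The duration of the window \(w\) sets the length of each segment; \(\tau\) only says where the segment is centred. Successive window positions usually overlap. Because the window tapers smoothly towards zero at its edges, it reduces the spectral leakage that an abrupt cut would cause. The resulting complex-valued STFT is a function of both time and frequency, and can be visualized as a spectrogram.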
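The procedure above can be sketched directly in NumPy. This is a minimal illustration under simplifying assumptions: a Hann window from `np.hanning`, no padding at the signal edges, and only frames that fit entirely inside the signal. It is not how any particular library implements the STFT, and the function name `stft` and its parameters were chosen for this example.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Minimal STFT: slide a Hann window over x and FFT each frame.

    Returns a complex array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are window positions.
    """
    window = np.hanning(n_fft)  # tapers frame edges to reduce spectral leakage
    n_frames = 1 + (len(x) - n_fft) // hop  # frames that fit entirely inside x
    frames = np.stack([x[m * hop : m * hop + n_fft] for m in range(n_frames)])
    # rfft keeps only the non-negative frequencies, since x is real-valued
    return np.fft.rfft(frames * window, axis=1).T


# Example: one second of a 1 kHz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

X = stft(x, n_fft=1024, hop=256)
freqs = np.arange(1024 // 2 + 1) * fs / 1024  # frequency of each row in Hz
print(X.shape)                                # (513, 28)
print(freqs[np.argmax(np.abs(X[:, 0]))])      # 1000.0
```

The bins are 8000/1024 = 7.8125 Hz apart, so the 1 kHz tone falls exactly on bin 128 and the peak appears there. Libraries such as librosa also pad the signal by default so that each frame is centred on its time stamp.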

In practice, the STFT is computed on a sampled signal. The window slides along the signal and advances by a fixed number of samples, called the hop size, between frames. The window length controls the resolution trade-off. A longer window gives better frequency resolution but poorer temporal resolution, while a shorter window gives better temporal resolution but poorer frequency resolution. The hop size, or equivalently the overlap between successive windows, controls how densely the spectrogram is sampled in time rather than its underlying resolution.

The spectrogram is the squared magnitude of the STFT:

\[ S(\tau, \omega) = \left| \mathrm{STFT}\{x\}(\tau, \omega) \right|^2 \]

where \(S(\tau, \omega)\) is the spectrogram, \(\mathrm{STFT}\{x\}(\tau, \omega)\) is the short-time Fourier transform of the input signal \(x(t)\), \(\omega\) is the angular frequency, and \(\tau\) is the time position of the window.

Squaring the magnitude discards the phase of the complex-valued STFT and leaves a real, non-negative value at each time-frequency point. This value measures the strength (power) of the signal at that time and frequency. The spectrogram displays it as color or intensity, usually on a decibel scale.

Spectrogram using Python

To generate a spectrogram in Python, we can use the librosa library which provides an easy-to-use interface for computing and visualizing spectrograms. Here’s an example program that generates a spectrogram for an audio signal:

```python
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load audio file (librosa resamples to 22050 Hz and converts to mono by default)
y, sr = librosa.load('audio_147793__setuniman__sweet-waltz-0i-22mi.hq.ogg')

# Compute a mel-scaled power spectrogram
spec = librosa.feature.melspectrogram(y=y, sr=sr)

# Convert power to decibels, relative to the loudest time-frequency bin
spec_db = librosa.power_to_db(spec, ref=np.max)

# Plot spectrogram; pass sr so the time and frequency axes are labeled correctly
fig, ax = plt.subplots(nrows=1, ncols=1)
img = librosa.display.specshow(spec_db, sr=sr, x_axis='time', y_axis='mel', ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set_title('Spectrogram')
plt.show()
```

Figure 1: Spectrogram of an example audio file using the librosa Python library

In this program, we first load an audio file using the librosa.load() function. It returns the audio signal (y) as a NumPy array together with its sampling rate (sr). By default, librosa.load() resamples the audio to 22050 Hz and mixes it down to mono. We then compute the spectrogram using the librosa.feature.melspectrogram() function. This function computes a power spectrogram from an STFT, using n_fft=2048 and hop_length=512 samples by default. It then maps the frequency axis onto 128 mel bands. The mel scale spaces frequencies roughly according to how human hearing perceives pitch, so it gives more detail at low frequencies than at high ones.

To convert the power spectrogram to decibels, we use the librosa.power_to_db() function. With ref=np.max, values are expressed relative to the loudest bin, which therefore sits at 0 dB. By default, anything more than 80 dB below that peak is clipped (top_db=80). We then plot the spectrogram using the librosa.display.specshow() function. It displays the spectrogram as a heatmap with time on the x-axis and mel-scaled frequency on the y-axis.

Finally, we add a colorbar and a title using fig.colorbar() and ax.set_title(), respectively, and display the figure with plt.show(). Calling fig.show() instead does not run a GUI event loop, so in a standalone script the window may appear only briefly or not at all.

Note that this program assumes the audio file is located in the current working directory. If the file is stored elsewhere, pass its full path to librosa.load(). The function is not limited to OGG and can also read other common formats such as WAV and FLAC. The sample audio file can be obtained from the librosa GitHub repository. In recent librosa versions, librosa.ex('sweetwaltz') downloads it and returns its local path, which can be passed directly to librosa.load().

Applications of Spectrograms in Signal Processing

Spectrograms are widely used in signal processing applications, particularly in speech and audio processing. Some common applications of spectrograms include:

  1. Speech Analysis: Spectrograms are used to analyze the frequency content of speech signals, which can provide insight into the characteristics of the speaker, such as gender, age, and emotional state.
  2. Music Analysis: Spectrograms are used to analyze the frequency content of music signals, which can provide information about the genre, tempo, and instrument composition of the music.
  3. Noise Reduction: Spectrograms can be used to identify and suppress noise in a signal, for example by estimating the noise level in each frequency band and attenuating the time-frequency components that are dominated by noise.
  4. Voice Activity Detection: Spectrograms can be used to detect the presence of speech in a noisy environment by analyzing the frequency content of the signal.

Conclusion

Spectrograms are a powerful tool in signal processing for analyzing and visualizing time-varying signals. They provide a detailed view of the frequency content of a signal over time, enabling accurate analysis and interpretation of complex signals such as speech and audio signals. With their numerous applications in speech and audio processing, spectrograms are an essential tool for researchers and practitioners in these fields.
