Audio signal processing Archives

Spectrogram Analysis using Python

Keywords: Spectrogram, signal processing, time-frequency analysis, speech recognition, music analysis, frequency domain, time domain, python

Introduction

A spectrogram is a visual representation of the frequency content of a signal over time. Spectrograms are widely used in signal processing applications to analyze and visualize time-varying signals, such as speech and audio signals. In this article, we will explore the concept of spectrograms, how they are generated, and their applications in signal processing.

What is a Spectrogram?

A spectrogram is a two-dimensional representation of the frequency content of a signal over time. The x-axis of a spectrogram represents time, while the y-axis represents frequency. The color or intensity of each point in the spectrogram represents the magnitude of the frequency content at that time and frequency.

How are Spectrograms Generated?

Spectrograms are typically generated using a mathematical operation called the short-time Fourier transform (STFT).

The STFT is a variation of the Fourier transform that computes the frequency content of a signal in short, overlapping time windows. The resulting frequency content is then plotted over time to generate the spectrogram.

\[ STFT(x(t), f, \tau)(\omega, \tau) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt \]

where \(STFT(x(t), f, \tau)\) is the STFT of the signal \(x(t)\) with respect to frequency f and time shift \(\tau\), \(\omega = 2 \pi f\) is the angular frequency variable, \(w(t-\tau)\) is the window function centered at time \(\tau\), and \(e^{-j\omega t}\) is the complex exponential that represents the frequency component at \(\omega\).

The STFT is computed by sliding the window \(w(t-\tau)\) along the signal \(x(t)\) and applying the Fourier transform to each windowed segment. Note that \(\tau\) sets the position of the window, not its length. The window function tapers the edges of each segment to minimize spectral leakage. The resulting complex-valued STFT is a function of both frequency and time, and can be visualized as a spectrogram.

In practice, the STFT is computed using a sliding window that moves over the signal in small increments. The size and overlap of the window determine the frequency and temporal resolution of the spectrogram. A larger window size provides better frequency resolution but poorer temporal resolution, while a smaller window size provides better temporal resolution but poorer frequency resolution.

The spectrogram is computed as the squared magnitude of the STFT:

\[S(f, t) = |STFT(x(t), f, \tau)|^2\]

where \(S(f, t)\) is the spectrogram, STFT is the short-time Fourier transform, \(x(t)\) is the input signal, f is the frequency, and \(\tau\) is the time shift or window shift.

The STFT is calculated by dividing the signal into overlapping windows, each positioned at a time shift \(\tau\), and computing the Fourier transform of each window. The squared magnitude of the resulting complex-valued STFT then gives the spectrogram. The spectrogram provides a time-frequency representation of the signal, where the value at each time and frequency point represents the power of the signal at that particular time and frequency.
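The steps above can be sketched with scipy.signal.stft on a synthetic signal. This is a minimal illustration, not the method used for the figures in this article: the sampling rate, the two test tones and the window lengths are all assumed values chosen so that the result is easy to check.

```python
import numpy as np
from scipy.signal import stft

fs = 8000                          # sampling rate in Hz (assumed)
t = np.arange(fs) / fs             # one second of samples
# Test signal: a 500 Hz tone followed by a 1500 Hz tone
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

# STFT with a 256-sample Hann window and 50% overlap
f, tau, Z = stft(x, fs=fs, window="hann", nperseg=256, noverlap=128)
S = np.abs(Z) ** 2                 # spectrogram S(f, t) = |STFT|^2

# Strongest frequency in every time frame
dominant = f[np.argmax(S, axis=0)]
early = dominant[(tau > 0.05) & (tau < 0.4)]   # frames inside the first tone
late = dominant[(tau > 0.6) & (tau < 0.95)]    # frames inside the second tone
print("early frames:", np.unique(early), "Hz")
print("late frames: ", np.unique(late), "Hz")

# Window length versus resolution: longer windows give finer frequency bins,
# but each frame then covers a longer stretch of time
for nperseg in (64, 1024):
    f_n, tau_n, _ = stft(x, fs=fs, nperseg=nperseg)
    print(nperseg, "samples:", f_n[1] - f_n[0], "Hz per bin,",
          nperseg / fs, "s per window")
```

The array S has one row per frequency bin and one column per time frame, which is exactly the grid that a spectrogram plot colors in.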

Spectrogram using python

To generate a spectrogram in Python, we can use the librosa library which provides an easy-to-use interface for computing and visualizing spectrograms. Here’s an example program that generates a spectrogram for an audio signal:

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load audio file
y, sr = librosa.load('audio_147793__setuniman__sweet-waltz-0i-22mi.hq.ogg')

# Compute spectrogram
spec = librosa.feature.melspectrogram(y=y, sr=sr)

# Convert power to decibels
spec_db = librosa.power_to_db(spec, ref=np.max)

# Plot spectrogram
fig, ax = plt.subplots(nrows = 1, ncols = 1)
img = librosa.display.specshow(spec_db, sr=sr, x_axis='time', y_axis='mel', ax = ax)
fig.colorbar(img, ax = ax, format='%+2.0f dB')
ax.set_title('Spectrogram')
fig.show()

Figure 1: Spectrogram of an example audio file using librosa python library


In this program, we first load an audio file using the librosa.load() function, which returns the audio signal (y) and its sampling rate (sr). We then compute the spectrogram using the librosa.feature.melspectrogram() function, which computes a mel-scaled spectrogram of the audio signal.

To convert the power spectrogram to decibels, we use the librosa.power_to_db() function, which scales the spectrogram to decibels relative to a maximum reference power. We then plot the spectrogram using the librosa.display.specshow() function, which displays the spectrogram as a heatmap with time on the x-axis and frequency on the y-axis.

Finally, we add a colorbar and title to the plot using the fig.colorbar() and ax.set_title() functions, respectively, and display the plot using the fig.show() function.

Note that this program assumes that the audio file is in OGG format and is located in the current working directory. If the file is in a different format or location, you will need to modify the librosa.load() function accordingly. The sample audio file can be obtained from the librosa github repository.

Applications of Spectrograms in Signal Processing

Spectrograms are widely used in signal processing applications, particularly in speech and audio processing. Some common applications of spectrograms include:

Speech Analysis: Spectrograms are used to analyze the frequency content of speech signals, which can provide insight into the characteristics of the speaker, such as gender, age, and emotional state.
Music Analysis: Spectrograms are used to analyze the frequency content of music signals, which can provide information about the genre, tempo, and instrument composition of the music.
Noise Reduction: Spectrograms can be used to identify and remove noise from a signal by filtering out the frequency components that correspond to the noise.
Voice Activity Detection: Spectrograms can be used to detect the presence of speech in a noisy environment by analyzing the frequency content of the signal.

Conclusion

Spectrograms are a powerful tool in signal processing for analyzing and visualizing time-varying signals. They provide a detailed view of the frequency content of a signal over time, enabling accurate analysis and interpretation of complex signals such as speech and audio signals. With their numerous applications in speech and audio processing, spectrograms are an essential tool for researchers and practitioners in these fields.

Statistical measures for stochastic signals

Key focus: Discuss statistical measures for stochastic signals : mean, variance, skewness, kurtosis, histogram, scatterplot, cross-correlation and auto-correlation.

Deterministic and stochastic signals

A deterministic signal is exactly predictable for the given time span of interest. It could be expressed using analytic form (example: x(t) = sin (2 π f_c t) ).

Many of the signals that are encountered in real world signal processing applications cannot be expressed or predicted using analytic equations, because they have an element of uncertainty associated with them. As a consequence, such signals are characterized using statistical concepts. Therefore, these signals are outside the realm of deterministic signals and are classified as stochastic signals.

For example, consider an electrical circuit in which we monitor the voltage across a resistor for a period of time. Under an applied electric field, the particles inside the resistor tend to move randomly, which manifests as heat. This random thermal motion causes random fluctuations in the voltage measured across the resistor. Therefore, the measured voltage can be characterized by a probability distribution and can be analyzed using statistical methods, but it cannot be predicted with precision. This is an example of a signal that is a stochastic function of time. Such functions usually evolve according to certain probabilistic laws and are assumed to be generated by an underlying stochastic process (thermal motion in the example above).

*Figure 1: Examples for deterministic and stochastic signals*

Given the amplitude of the stochastic voltage signal at time t, we cannot predict its value at another time t’. However, if we observe the signal for a sufficient amount of time, we can empirically determine its probability distribution, based on which we should be able to answer questions like:

Given the amplitude of the voltage at time t, what is the average (expected or mean) voltage at time t’?
How much can we expect the voltage at time t’ to fluctuate from the mean? In other words, we are interested in the variance of the voltage at time t’.
What is the probability that the voltage at t’ exceeds or falls below a certain threshold value?
How are the voltages at times t and t’ related? In other words, we are interested in correlation.
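As a rough sketch of how such questions can be answered empirically, the snippet below simulates the noise voltage as Gaussian white noise. The Gaussian model, the standard deviation of 0.5 V and the threshold are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sigma = 0.5                                          # assumed noise standard deviation (volts)
v = rng.normal(loc=0.0, scale=sigma, size=100_000)   # simulated noise voltage samples

mean_est = v.mean()                    # empirical mean of the voltage
var_est = v.var()                      # empirical variance (fluctuation around the mean)
threshold = sigma                      # hypothetical threshold of interest
p_exceed = np.mean(v > threshold)      # fraction of samples above the threshold

# Correlation coefficient between the voltage at sample n and at sample n+1
r_lag1 = np.corrcoef(v[:-1], v[1:])[0, 1]

print("mean:", mean_est)
print("variance:", var_est)
print("P(v > threshold):", p_exceed)
print("lag-1 correlation:", r_lag1)
```

For white noise the lag-1 correlation comes out close to zero, which reflects the fact that knowing the voltage at one instant tells us nothing about the next sample.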

Summary of descriptive statistical measures

Different statistical measures are available to gather, review, analyze and draw conclusions from stochastic signals. Some of the statistical measures are summarized below:

Quantitative measures of shape:

In many statistical analyses, the fundamental task is to get a general idea about the shape of a distribution, and this is done by using moments.

Measure of central tendency – mean – the first moment:

Measures of central tendency attempt to identify the central position in the distribution of samples that make up the stochastic signal. Mean, mode and median are different measures of central tendency. In signal processing, we are mostly interested in computing mean which is the average of values of the given samples. Given a discrete random variable X with probability mass function p_X(x) the mean is denoted by

\[\mu = E \left[ X\right] = \sum_{x: p_X(x) > 0} x p_X(x) \]

Measure of dispersion – variance – the second moment:

Measures of dispersion describe how the values in the signal samples are spread out around a central location. Variance and standard deviation are some of the measures of dispersion. If the values are widely dispersed, the central location is said to be less representative of the values as a whole. If the values are tightly dispersed, the central location is considered more reliable.

For a discrete random variable X with probability mass function p_X(x) , the variance (σ_X²) is given by the second central moment. The term central moment implies that this is measured relative to mean. For an electrical signal, the second moment is proportional to the average power.

\[\sigma_X^2 = E \left[ \left(X - \mu \right)^2\right] = \sum_{x: p_X(x) > 0} \left(x - \mu \right)^2 p_X(x) \]

The square root of the variance is the standard deviation

\[\sigma_X = \sqrt{E \left[ \left(X - \mu \right)^2\right]}\]
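As a small worked example of the formulas for the mean, variance and standard deviation, consider a fair six-sided die. The die is chosen here only for illustration; any discrete random variable with a known probability mass function works the same way.

```python
import numpy as np

values = np.arange(1, 7)          # outcomes x = 1, 2, ..., 6
pmf = np.full(6, 1 / 6)           # p_X(x) = 1/6 for every outcome

mu = np.sum(values * pmf)                     # first moment (mean)
variance = np.sum((values - mu) ** 2 * pmf)   # second central moment
sigma = np.sqrt(variance)                     # standard deviation

print("mean:", mu)
print("variance:", variance)
print("standard deviation:", sigma)
```

The sums run over the outcomes with non-zero probability, exactly as in the definitions above.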

Figure 2 demonstrates the use of a histogram for getting a visual impression of the shape of the distribution of samples from two different stochastic signals. Though the signals look similar in the time domain, their histograms reveal a different picture altogether. The central location (mean) of the first signal is around zero (no DC shift in the time domain view), while the mean of the second signal is at 0.75. The values of the first signal are spread more widely around the mean (wider histogram, hence larger variance) than those of the second signal.

*Figure 2: Histogram is a visual method that provides a general idea about the shape of a distribution.*

Higher order moments – skewness and kurtosis:

Further characterization of the shape of a distribution includes the higher order moments: skewness and kurtosis. They help identify anomalies and outliers in many signal processing applications.

Skewness provides a measure to quantify the presence of asymmetry in the shape of the distribution. Actually, skewness measures the relative size of the tails in a distribution. Presence of asymmetry manifests as non-zero value for the standardized third central moment. The term “standardized” stands for the normalization of the third central moment by σ³.

\[Skewness = \frac{E \left[ \left( X - \mu \right)^3 \right]}{\sigma^3}\]

*Figure 3: A random sequence showing positive skewness (asymmetry in shape) in its distribution*

Kurtosis measures the amount of probability in the two tails of a distribution, relative to a normal distribution of same variance. Kurtosis is 3 for normal distribution (perfect bell shaped curve). If the kurtosis of a distribution is greater than 3, it implies that the tails are heavier compared to that of normal distribution. A value less than 3 for kurtosis implies lighter tails compared to that of the normal distribution. Essentially, kurtosis is a measure of outliers. Kurtosis is calculated as the standardized fourth central moment.

\[Kurtosis = \frac{E \left[ \left( X - \mu \right)^4 \right]}{\sigma^4}\]

*Figure 4: Histogram showing kurtosis of a random sequence vs. kurtosis of normal distribution*
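The two measures can be estimated from samples with scipy.stats, as in the sketch below. The three test distributions and the sample size are assumptions for illustration; they are not the sequences shown in the figures.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(seed=7)
n = 200_000
samples = {
    "normal": rng.normal(size=n),            # symmetric, kurtosis 3
    "uniform": rng.uniform(-1, 1, size=n),   # symmetric, lighter tails
    "exponential": rng.exponential(size=n),  # long right tail
}

results = {}
for name, x in samples.items():
    # fisher=False returns the standardized fourth central moment,
    # so a normal distribution gives a value near 3 (not 0)
    results[name] = (skew(x), kurtosis(x, fisher=False))
    print(name, "skewness:", results[name][0], "kurtosis:", results[name][1])
```

Note that scipy.stats.kurtosis reports excess kurtosis (kurtosis minus 3) by default, which is why fisher=False is passed to match the definition used in this article.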

Measures of association

Statistics that describe a single distribution, such as measures of central tendency, dispersion and higher order moments, are called univariate statistics. If we are interested in the relationship between two or more variables, we have to move at least to the realm of bivariate statistics.

Measures of association summarize the strength of association between two variables. Most measures of association are scaled to a range of values. For example, a measure of association can be constructed to have a range of 0 to 1, where the value 0 implies no relationship and the value 1 implies a perfect relationship between the two variables under study. In another case, the measure can range from -1 to 1, which helps us determine whether the two variables have a negative or positive relationship.
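One common measure that ranges from -1 to 1 is the Pearson correlation coefficient. It is not named in the text above and is used here only as an example; the signals and the noise level are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 50_000
x = rng.normal(size=n)
noise = rng.normal(size=n)

y_pos = x + 0.3 * noise       # moves with x
y_neg = -x + 0.3 * noise      # moves against x
y_ind = rng.normal(size=n)    # generated independently of x

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    return np.corrcoef(a, b)[0, 1]

r_pos = pearson(x, y_pos)
r_neg = pearson(x, y_neg)
r_ind = pearson(x, y_ind)
print("positive relationship:", r_pos)
print("negative relationship:", r_neg)
print("no relationship:", r_ind)
```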

Scatter plot

A scatter plot is a great way to visually assess the nature of the relationship between two variables. The following figure contains the scatter plots between different stochastic signals, exhibiting different strengths of relationships between the signals.

Correlation

Correlation functions are commonly used in signal processing, for measuring the similarity of two signals. They are especially used in signal detection and pattern recognition.

Cross-correlation

Cross-correlation is commonly used for measuring the similarity between two different signals. In signal processing, cross-correlation measures the similarity of two waveforms as a function of the time lag applied to one of the waveforms.

For discrete-time waveforms x[n] and y[n], cross-correlation is defined as

\[Corr_{xy}[l] = \sum_{n=-\infty}^{\infty} x[n]^{\ast} y[n+l] = \sum_{n=-\infty}^{\infty} x[n-l]^{\ast} y[n]\]

where * denotes the complex conjugate operation and l is the discrete time lag applied to one of the waveforms. That is, the cross-correlation can be calculated as the dot product of a sequence with each shifted version of another sequence.
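A typical use of this definition is estimating the delay between two signals. The sketch below uses numpy.correlate for finite-length sequences; the noise-like test signal and the delay of 25 samples are hypothetical values chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
N = 500
delay = 25                                   # hypothetical delay in samples
x = rng.normal(size=N)                       # reference waveform
# y is x delayed by `delay` samples, plus a little measurement noise
y = np.concatenate([np.zeros(delay), x[:N - delay]]) + 0.1 * rng.normal(size=N)

# np.correlate(y, x, "full") evaluates sum_n conj(x[n]) * y[n + l]
# for every lag l from -(len(x) - 1) to len(y) - 1
corr = np.correlate(y, x, mode="full")
lags = np.arange(-(len(x) - 1), len(y))

best_lag = lags[np.argmax(np.abs(corr))]
print("estimated delay:", best_lag, "samples")
```

The lag at which the cross-correlation peaks is the shift that best aligns the two waveforms.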

Auto-correlation

Auto-correlation is the cross-correlation of a waveform/sequence with itself.

For a discrete-time waveform x[n], auto-correlation is defined as

\[Corr_{xx}[l] = \sum_{n=-\infty}^{\infty} x[n]^{\ast} x[n+l] = \sum_{n=-\infty}^{\infty} x[n-l]^{\ast} x[n] \]

where * denotes the complex conjugate operation and l is the discrete time lag applied to the copy of the same waveform.

Correlation properties are useful for identifying/distinguishing a known bit sequence from a set of other possible known sequences. For example, in GPS technology, each satellite is assigned a unique Gold code sequence that is generated using 10-bit shift registers and is 2¹⁰ − 1 = 1023 chips long. The cross-correlation between different Gold code sequences is very low, since the Gold codes are nearly orthogonal to each other. However, the auto-correlation of a Gold code is maximum at zero lag. The satellites are identified using these correlation properties of Gold codes. (I have described the hardware implementation of Gold codes, it can be accessed here).

The following plots illustrate the application of auto-correlation for audio analysis. The auto-correlation of an audio signal will have a peak at zero lag (i.e., where there is no time shifting when computing the correlation) as shown in Figure 6. Figure 7 contains the same audio file that is synthetically processed to produce reverberation characteristics. By looking at the time series plot in Figure 7, we cannot infer anything. However, the auto-correlation plot reveals the reverberation characteristics embedded in the audio.

*Figure 6: Normalized auto-correlation of an original sound*

*Figure 7: Normalized auto-correlation of sound with simulated reverberation characteristics (synthetically processed through an IIR comb filter)*
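The effect can be reproduced on synthetic data. In the sketch below, white noise stands in for the audio and the IIR comb filter parameters (delay D and feedback gain g) are hypothetical; the original audio file and filter settings are not known.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(seed=11)
N = 4000
D = 50            # hypothetical echo delay in samples
g = 0.5           # hypothetical feedback gain
dry = rng.normal(size=N)          # stand-in for the original sound

# IIR comb filter: y[n] = x[n] + g * y[n - D]
a = np.zeros(D + 1)
a[0] = 1.0
a[D] = -g
wet = lfilter([1.0], a, dry)      # sound with simulated reverberation

def normalized_autocorr(x):
    """Auto-correlation for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

r_dry = normalized_autocorr(dry)
r_wet = normalized_autocorr(wet)

# Strongest peak away from zero lag reveals the echo spacing
echo_lag = 1 + np.argmax(r_wet[1:])
print("strongest non-zero lag in processed sound:", echo_lag)
```

The dry signal has no significant peak away from zero lag, while the processed signal shows peaks at multiples of the comb filter delay.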


For further reading

[1] Steven M. Kay, “Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory”, ISBN: 978-0133457117, Prentice Hall, Edition 1, 1993.↗

Design FIR filter to reject unwanted frequencies

Let’s see how to design a simple digital FIR filter to reject unwanted frequencies in an incoming signal. As a prerequisite, I urge you to read through this post: Introduction to digital filter design

Background on transfer function

The transfer function of a system provides the underlying support for ascertaining vital system response characteristics without solving the difference equations of the system. As mentioned earlier, the transfer function of a generic filter in Z-domain is given by ratio of polynomials in z

\[H(z) = \frac{ \displaystyle{\sum_{k=0}^{M} b_k z^{-k}}}{ 1 + \displaystyle{\sum_{k=1}^{N} a_k z^{-k}} } \quad\quad (1)\]

The values of z when H(z) =0 are called zeros of H(z). The values of z when H(z) = ∞ are called poles of H(z).

It is often easy to solve for zeros {z_i} and poles {p_j}, when the polynomials in the numerator and denominator are expressed as resolvable factors.

\[H(z) = \frac{ \displaystyle{\sum_{k=0}^{M} b_k z^{-k}}}{ 1 + \displaystyle{\sum_{k=1}^{N} a_k z^{-k}} } = \frac{N(z)}{D(z)} = b_0 \, z^{N-M} \frac{(z - z_1)(z - z_2)\cdots (z - z_M)}{(z - p_1)(z - p_2)\cdots (z - p_N)} \quad\quad (2) \]

The zeros {z_i} are obtained by finding the roots of the equation

\[N(z) = 0 \quad\quad (3)\]

The poles {p_j} are obtained by finding the roots of the equation

\[D(z) = 0 \quad\quad (4) \]

Pole-zero plots are suited for visualizing the relationship between the Z-domain and the frequency response characteristics. As mentioned before, the frequency response of the system H(e^jω) can be computed by evaluating the transfer function H(z) at specific values of z = e^jω. Because the frequency response is periodic with period 2π, it is sufficient to evaluate the frequency response for the range -π <= ω < π (that is one loop around the unit circle on the z-plane starting from z=-1 and ending at the same point).
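The relationship between the transfer function, its poles and zeros, and the frequency response can be checked numerically. The first-order coefficients below are hypothetical and serve only as an example; the manual evaluation on the unit circle is compared against scipy.signal.freqz.

```python
import numpy as np
from scipy.signal import freqz

b = np.array([1.0, 0.5])      # hypothetical numerator coefficients b_0, b_1
a = np.array([1.0, -0.3])     # hypothetical denominator coefficients a_0 = 1, a_1

# N(z) = b_0 + b_1 z^-1 = z^-1 (b_0 z + b_1), so the zeros are the roots of
# the polynomial with coefficients b; likewise the poles are the roots of a
zeros = np.roots(b)
poles = np.roots(a)
print("zeros:", zeros, "poles:", poles)

# Evaluate H(z) at z = e^{jw} for -pi <= w < pi (one loop around the unit circle)
w = np.linspace(-np.pi, np.pi, 512, endpoint=False)
z_inv = np.exp(-1j * w)                       # z^-1 on the unit circle
# np.polyval expects the highest power first, hence the reversed coefficients
H_manual = np.polyval(b[::-1], z_inv) / np.polyval(a[::-1], z_inv)

# Reference: the same frequencies passed to freqz
_, H_ref = freqz(b, a, worN=w)
print("max difference:", np.max(np.abs(H_manual - H_ref)))
```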

FIR filter design

FIR filters contain only zeros and no poles in the pole-zero plot (in fact all the poles sit at the origin for a causal FIR). For an FIR filter, the location of zeros of H(z) on the unit-circle nullifies specific frequencies. So, to design an FIR filter to nullify specific frequency ω, we just have to place the zeros at corresponding locations on the unit circle (z=e^jω) where the gain of the filter needs to be 0. Let’s illustrate this using an example.

For this illustration, we would use this audio file as an input to the filtering system. As a first step in the filter design process, we should understand the nature of the input signal. Discrete-time Fourier transform (DTFT) is a tool for analyzing the frequency domain characteristics of the given signal.

The following function is used to compute the DTFT of the sequence read from the audio file.

import numpy as np
import matplotlib.pyplot as plt

from math import ceil,log,pi,cos
from scipy.fftpack import fft,fftfreq,fftshift

def compute_DTFT(x,M=0):
    """
    Compute DTFT of the given sequence x
    M is the desired length for computing DTFT (optional).
    Returns the DTFT X[k] and corresponding frequencies w (omega) arranged as -pi to pi
    """
    N = max(M,len(x))
    N = 2**(ceil(log(N)/log(2)))
    
    X = fftshift(fft(x,N))
    w = 2*pi*fftshift(fftfreq(N))    
    return (X,w)

Let’s read the audio file, load the samples as a signal sequence x[n], and plot the sequence in the time domain and the frequency domain (using DTFT).

from scipy.io.wavfile import read

samplerate, x = read('speechWithNoise.wav')
duration = len(x)/samplerate
time = np.arange(len(x))/samplerate #time vector with exactly one entry per sample

fig1, (ax1,ax2) = plt.subplots(nrows=2,ncols=1)
ax1.plot(time,x)
ax1.set_xlabel('Time (s)')
ax1.set_ylabel('Amplitude')
ax1.set_title('speechWithNoise.wav - x[n]')

(X,w)= compute_DTFT(x)
ax2.plot(w,abs(X))
ax2.set_xlabel('Normalized frequency (radians/sample)')
ax2.set_ylabel('|X[k]|')
ax2.set_title('Magnitude vs. Frequency')

*Figure 1: Time-domain and frequency domain characteristics of the given audio sample*

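The magnitude vs. frequency plot shows huge spikes at θ=±1.32344 radians. The location of the spikes is captured by applying the numpy.argmax function to the magnitude of the DTFT. Either of the two spikes may be returned; the sign does not matter, since the filter coefficients derived below depend only on cos(θ).

maxIndex = np.argmax(np.abs(X)) #index of the largest magnitude
theta = w[maxIndex]
print(theta)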

Since a sinusoid can be mathematically represented as

\[x[n] = cos (\theta n) = \frac{1}{2}\left( e^{j \theta n} + e^{-j \theta n }\right) \quad\quad (5)\]

The two spikes at θ=±1.32344 radians in the frequency domain, will manifest as a sinusoidal signal in the time domain.
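The link between a sinusoid and its pair of spectral spikes can be verified on a synthetic sequence. In this sketch the sequence length is an assumption, and numpy.fft is used in place of the compute_DTFT helper so that the example is self-contained.

```python
import numpy as np

theta = 1.32344                  # frequency of the sinusoid in radians/sample
N = 4096                         # number of samples (assumed)
n = np.arange(N)
x = np.cos(theta * n)            # x[n] = cos(theta n)

# Spectrum arranged from -pi to pi
X = np.fft.fftshift(np.fft.fft(x))
w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(N))

# Locate the spike using the magnitude of the (complex) spectrum
theta_est = abs(w[np.argmax(np.abs(X))])
print("estimated frequency:", theta_est, "radians/sample")
print("frequency grid spacing:", 2 * np.pi / N)
```

The estimate can only be as fine as the frequency grid, so it agrees with θ to within one grid spacing.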

Zooming in the area between θ= ±0.4 radians, the frequency domain plot reveals a hidden signal.

*Figure 2: Hidden signal revealed in frequency domain*

Now, our goal is to design an FIR filter that rejects the sinusoid at θ=±1.32344 radians, so that only the hidden signal passes through.

Since the sinusoid that we want to reject is occurring at some θ radians in the frequency domain, for the FIR filter design, we place two zeros at

\[z_1 = e^{j \theta} \quad\quad z_2 = e^{-j \theta}\quad\quad (6)\]

Therefore, the transfer function of the filter system is given by

\[\begin{aligned} H_f(z) &= \left( 1 - z_1 z^{-1}\right)\left(1 - z_2 z^{-1} \right) \\ &= \left( 1 - e^{j \theta} z^{-1}\right)\left(1 - e^{-j \theta}z^{-1} \right) \\ & = 1 - 2\cos\left(\theta\right)z^{-1} + z^{-2} \end{aligned}\]

This is a second order FIR filter. The coefficients of the filter are

\[b_0 = 1,\; b_1 = -2 \cos (\theta),\; b_2 = 1 \text{ and } a_0=1\]

For the given problem, to reject the sinusoid at θ=±1.32344 radians, we set θ=1.32344 in the filter coefficients above.
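Before applying the filter to the audio, the design can be sanity-checked on synthetic data. The sketch below confirms that the zeros lie on the unit circle at ±θ and that a sinusoid at θ is removed while another component passes; the second tone at 0.2 radians/sample is a hypothetical stand-in for the hidden signal.

```python
import numpy as np
from scipy.signal import lfilter, freqz

theta = 1.32344
b = [1.0, -2.0 * np.cos(theta), 1.0]    # b_0, b_1, b_2 from the derivation above
a = [1.0]

# Zeros of the filter: expected at e^{+j theta} and e^{-j theta}
zeros = np.roots(b)
print("zero magnitudes:", np.abs(zeros), "zero angles:", np.angle(zeros))

# Gain at the rejected frequency and at a hypothetical wanted frequency
_, h = freqz(b, a, worN=np.array([theta, 0.2]))
print("gain at theta:", abs(h[0]), "gain at 0.2 rad/sample:", abs(h[1]))

# Time-domain check: unwanted tone plus a slow wanted tone
n = np.arange(200)
tone = np.cos(theta * n)                # sinusoid to be rejected
slow = np.cos(0.2 * n)                  # hypothetical wanted component
y = lfilter(b, a, tone + slow)
y_slow_only = lfilter(b, a, slow)

# After the two-sample start-up transient, the rejected tone leaves no trace
residual = np.max(np.abs(y[2:] - y_slow_only[2:]))
print("residual of the rejected tone:", residual)
```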

Filter in action

Filter the input audio signal through the designed filter and plot the filtered output in time-domain and frequency domain. The lfilter function from the scipy.signal package is utilized for the filtering operation.

from scipy.signal import lfilter
b = [1,-2*cos(theta),1] #filter coefficients derived above
a = [1]
y_signal = lfilter(b, a, x)
fig3, (ax3,ax4) = plt.subplots(nrows=1,ncols=2)
ax3.plot(time,y_signal,'g')
ax3.set(title='Noise removed speech - y[n]',xlabel='Time (s)',ylabel='Amplitude')

(Y,w)= compute_DTFT(y_signal)
ax4.plot(w,abs(Y)/max(abs(Y)),'r')
ax4.set(title='Frequency content of Y',xlabel='Normalized frequency (radians/sample)',ylabel='Magnitude - |Y[k]|')

The filter has effectively removed the sinusoidal noise, as evident from both time-domain and frequency domain plots.

*Figure 4: Extracted speech and its frequency content*

Save the filtered output signal as .wav file for audio playback

from scipy.io.wavfile import write
output_data = np.clip(np.round(y_signal), -32768, 32767).astype(np.int16) #clip to the int16 range, then convert y to int16 format
write("noise_removed_output.wav", samplerate, output_data)

Characteristics of the designed filter

Let’s compute the double sided frequency response of the designed FIR filter. The frequency response of the designed FIR digital filter is computed using the freqz function from the scipy.signal package.

from scipy.signal import freqz
b = [1,-2*cos(theta),1] #filter coefficients
a = [1]
w, h = freqz(b,a,whole=True)#frequency response h[e^(jw)]
#whole = True returns output for whole range 0 to 2*pi
#To plot double sided response, use fftshift
w = w - 2*np.pi*(w>=np.pi) #convert to range -pi to pi
w = fftshift(w)
h = fftshift(h)

Plot the magnitude response, phase response, pole-zero plot and the impulse response of the designed filter.

#Plot Magnitude-frequency response
fig2, (ax) = plt.subplots(nrows=2,ncols=2)
ax[0,0].plot(w,abs(h),'b')
ax[0,0].set(title='Magnitude response',xlabel='Frequency [radians/sample]',ylabel='Magnitude (linear)')
ax[0,0].grid();ax[0,0].axis('tight');

#Plot phase response
angles = np.unwrap(np.angle(h))
ax[0,1].plot(w,angles,'r')
ax[0,1].set(title='Phase response',xlabel='Frequency [radians/sample]',ylabel='Angles [radians]')
ax[0,1].grid();ax[0,1].axis('tight');

#Transfer function to pole-zero representation
from scipy.signal import tf2zpk
z, p, k = tf2zpk(b, a)

#Plot pole-zeros on a z-plane
from  matplotlib import patches
patch = patches.Circle((0,0), radius=1, fill=False,
                    color='black', ls='dashed')
ax[1,0].add_patch(patch)
ax[1,0].plot(np.real(z), np.imag(z), 'xb',label='Zeros')
ax[1,0].plot(np.real(p), np.imag(p), 'or',label='Poles')
ax[1,0].legend(loc=2)
ax[1,0].set(title='Pole-Zero Plot',xlabel='Real',ylabel='Imaginary')
ax[1,0].grid()

#Impulse response
#create an impulse signal
imp = np.zeros(20)
imp[0] = 1

from scipy.signal import lfilter
y_imp = lfilter(b, a, imp) #drive the impulse through the filter
ax[1,1].stem(y_imp,linefmt='-.')
ax[1,1].set(title='Impulse response',xlabel='Sample index [n]',ylabel='Amplitude')
ax[1,1].axis('tight')

*Figure 3: Magnitude response, phase response, pole-zero plot and impulse response of the designed second order FIR filter*

Questions

Use the comment form below to answer the following questions

1) Is the filter that we designed a lowpass, highpass, bandpass or a bandstop filter?
2) To reject a single sinusoidal signal, why are two zeros needed in the above filter design?
3) What do you understand from the phase response plotted above?


Plot audio file as time series using Scipy python

Visualizing an audio sample file as time-series data is often the most basic step in the signal processing of audio files.

Audio can be thought of as a one-dimensional vector that stores the numerical value of each sample. The time-series plot is a two-dimensional plot of those sample values as a function of time.

Python’s SciPy library comes with a collection of modules for reading from and writing data to a variety of file formats. For example, the scipy.io.wavfile module can be used to read from and write to a .wav format file.

For the following demonstration, sample audio files given in this URL are used for the visualization task.

The read function in the scipy.io.wavfile module can be utilized to open the selected wav file. It returns the sample rate and the data samples.

>>> from scipy.io.wavfile import read #import the required function from the module

>>> samplerate, data = read('CantinaBand3.wav')

>>> samplerate #echo samplerate
22050

>>> data #echo data -> note that the data is a single dimensional array
array([   3,    7,    0, ...,  -12, -427, -227], dtype=int16)

Compute the duration and the time vector of the audio sample from the sample rate

>>> import numpy as np
>>> duration = len(data)/samplerate
>>> time = np.arange(len(data))/samplerate #time vector, one entry per sample

Plot the time-series data using matplotlib package

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> plt.plot(time,data)
>>> plt.xlabel('Time [s]')
>>> plt.ylabel('Amplitude')
>>> plt.title('CantinaBand3.wav')
>>> plt.show()

Figure 1: Time series plot of audio file using Python Scipy
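If the sample audio file is not at hand, the same read-and-inspect steps can be tried on a synthetic file. The sketch below writes a short 440 Hz tone to a temporary .wav file and reads it back; the tone, its duration and the file name are assumptions made only for this illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io.wavfile import read, write

samplerate = 22050                        # samples per second (assumed)
duration_s = 0.5                          # length of the test tone in seconds
n = np.arange(int(samplerate * duration_s))
# 440 Hz tone at half of the int16 full scale
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * n / samplerate)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tone.wav")  # hypothetical file name
    write(path, samplerate, tone)         # save as 16-bit PCM
    rate, data = read(path)               # read it back

duration = len(data) / rate               # duration in seconds
time = np.arange(len(data)) / rate        # time vector, one entry per sample
print("sample rate:", rate, "dtype:", data.dtype, "duration:", duration, "s")
```

The arrays time and data can then be passed to plt.plot exactly as in the session above.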
