Design FIR filter to reject unwanted frequencies

Let’s see how to design a simple digital FIR filter to reject unwanted frequencies in an incoming signal. As a prerequisite, I urge you to read through this post: Introduction to digital filter design

Background on transfer function

The transfer function of a system provides the underlying support for ascertaining vital system response characteristics without solving the difference equations of the system. As mentioned earlier, the transfer function of a generic filter in the Z-domain is given by a ratio of polynomials in z

\[H(z) = \frac{ \displaystyle{\sum_{i=0}^{M} b_i z^{-i}}}{ 1 + \displaystyle{\sum_{k=1}^{N} a_k z^{-k}} } \quad\quad (1)\]

The values of z for which H(z) = 0 are called the zeros of H(z). The values of z for which H(z) = ∞ are called the poles of H(z).

It is often easy to solve for the zeros {zi} and poles {pj} when the polynomials in the numerator and denominator are expressed as resolvable factors.

\[H(z) = \frac{ \displaystyle{\sum_{i=0}^{M} b_i z^{-i}}}{ 1 + \displaystyle{\sum_{k=1}^{N} a_k z^{-k}} } = \frac{N(z)}{D(z)} = \frac{b_M}{a_N} \frac{(z - z_1)(z - z_2)\cdots (z - z_M)}{(z - p_1)(z - p_2)\cdots (z - p_N)} \quad\quad (2) \]

The zeros {zi} are obtained by finding the roots of the equation

\[N(z) = 0 \quad\quad (3)\]

The poles {pj} are obtained by finding the roots of the equation

\[D(z) = 0 \quad\quad (4) \]
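
In practice, the roots can be computed numerically. As a quick sketch (the coefficient values below are hypothetical, chosen only for illustration), numpy's roots function finds the zeros and poles from the coefficient arrays of N(z) and D(z):

import numpy as np

#Hypothetical example: H(z) = (1 - 0.5z^-1)/(1 - 1.2z^-1 + 0.81z^-2)
b = [1, -0.5] #numerator coefficients of N(z)
a = [1, -1.2, 0.81] #denominator coefficients of D(z)

zeros = np.roots(b) #roots of N(z) = 0
poles = np.roots(a) #roots of D(z) = 0
print(zeros) #[0.5]
print(poles) #[0.6+0.67j 0.6-0.67j] approximately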

Pole-zero plots are suited for visualizing the relationship between the Z-domain and the frequency response characteristics. As mentioned before, the frequency response of the system H(e^jω) can be computed by evaluating the transfer function H(z) at specific values of z = e^jω. Because the frequency response is periodic with period 2π, it is sufficient to evaluate it over the range -π <= ω < π (that is, one loop around the unit circle on the z-plane, starting from z = -1 and ending at the same point).

FIR filter design

FIR filters contain only zeros and no poles in the pole-zero plot (in fact, all the poles sit at the origin for a causal FIR filter). For an FIR filter, the location of the zeros of H(z) on the unit circle nullifies specific frequencies. So, to design an FIR filter that nullifies a specific frequency ω, we just have to place zeros at the corresponding locations on the unit circle (z = e^jω) where the gain of the filter needs to be zero. Let’s illustrate this using an example.

For this illustration, we will use this audio file as input to the filtering system. As a first step in the filter design process, we should understand the nature of the input signal. The discrete-time Fourier transform (DTFT) is a tool for analyzing the frequency domain characteristics of the given signal.

The following function is used to compute the DTFT of the sequence read from the audio file.

import numpy as np
import matplotlib.pyplot as plt

from math import ceil,log,pi,cos
from scipy.fftpack import fft,fftfreq,fftshift

def compute_DTFT(x,M=0):
    """
    Compute DTFT of the given sequence x
    M is the desired length for computing DTFT (optional).
    Returns the DTFT X[k] and corresponding frequencies w (omega) arranged as -pi to pi
    """
    N = max(M,len(x))
    N = 2**(ceil(log(N)/log(2)))
    
    X = fftshift(fft(x,N))
    w = 2*pi*fftshift(fftfreq(N))    
    return (X,w)

Let’s read the audio file, load the samples as a signal sequence x[n], and plot the sequence in the time domain and frequency domain (using the DTFT).

from scipy.io.wavfile import read

samplerate, x = read('speechWithNoise.wav')
duration = len(x)/samplerate
time = np.arange(0,duration,1/samplerate)

fig1, (ax1,ax2) = plt.subplots(nrows=2,ncols=1)
ax1.plot(time,x)
ax1.set_xlabel('Time (s)')
ax1.set_ylabel('Amplitude')
ax1.set_title('speechWithNoise.wav - x[n]')

(X,w)= compute_DTFT(x)
ax2.plot(w,abs(X))
ax2.set_xlabel('Normalized frequency (radians/sample)')
ax2.set_ylabel('|X[k]|')
ax2.set_title('Magnitude vs. Frequency')
Figure 1: Time-domain and frequency domain characteristics of the given audio sample

The magnitude vs. frequency plot shows huge spikes at θ = ±1.32344 radians. The location of the strongest spike is captured using the numpy.argmax function.

maxIndex = np.argmax(abs(X)) #locate the strongest spectral peak
theta = w[maxIndex]
print(theta)

Since a sinusoid can be mathematically represented as

\[x[n] = \cos (\theta n) = \frac{1}{2}\left( e^{j \theta n} + e^{-j \theta n }\right) \quad\quad (5)\]

the two spikes at θ = ±1.32344 radians in the frequency domain will manifest as a sinusoidal signal in the time domain.

Zooming into the region between θ = ±0.4 radians, the frequency domain plot reveals a hidden signal.

Figure 2: Hidden signal revealed in frequency domain

Now, our goal is to design an FIR filter that rejects the sinusoid at θ = ±1.32344 radians, so that only the hidden signal passes through.

Since the sinusoid that we want to reject occurs at some θ radians in the frequency domain, for the FIR filter design we place two zeros at

\[z_1 = e^{j \theta} \quad\quad z_2 = e^{-j \theta}\quad\quad (6)\]

Therefore, the transfer function of the filter system is given by

\[\begin{aligned} H_f(z) &= \left( 1 - z_1 z^{-1}\right)\left(1 - z_2 z^{-1} \right) \\ &= \left( 1 - e^{j \theta} z^{-1}\right)\left(1 - e^{-j \theta}z^{-1} \right) \\ &= 1 - 2\cos\left(\theta\right)z^{-1} + z^{-2} \end{aligned}\]

This is a second-order FIR filter. The coefficients of the filter are

\[b_0 = 1,\; b_1 = -2 \cos (\theta),\; b_2 = 1 \text{ and } a_0=1\]

For the given problem, to reject the sinusoid at θ = ±1.32344 radians, we set θ = 1.32344 in the filter coefficients above.
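
As a quick sanity check (a small sketch, not part of the original design flow), we can verify numerically that the gain of this filter vanishes at the frequency being rejected:

import numpy as np

theta = 1.32344 #frequency to reject (radians/sample)
b = [1.0, -2*np.cos(theta), 1.0] #filter coefficients b0, b1, b2

z = np.exp(1j*theta) #evaluate H(z) at z = e^(j*theta)
H = b[0] + b[1]*z**-1 + b[2]*z**-2
print(abs(H)) #~1e-16, i.e., the gain at theta is zero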

Filter in action

Let’s filter the input audio signal through the designed filter and plot the filtered output in the time domain and frequency domain. The lfilter function from the scipy.signal package↗ is utilized for the filtering operation.

from scipy.signal import lfilter

b = [1, -2*cos(theta), 1] #filter coefficients derived above
a = [1] #no feedback coefficients for an FIR filter
y_signal = lfilter(b, a, x)
fig3, (ax3,ax4) = plt.subplots(nrows=1,ncols=2)
ax3.plot(time,y_signal,'g')
ax3.set(title='Noise removed speech - y[n]',xlabel='Time (s)',ylabel='Amplitude')

(Y,w)= compute_DTFT(y_signal)
ax4.plot(w,abs(Y)/max(abs(Y)),'r')
ax4.set(title='Frequency content of Y',xlabel='Normalized frequency (radians/sample)',ylabel='Magnitude - |Y[k]|')

The filter has effectively removed the sinusoidal noise, as evident from both time-domain and frequency domain plots.

Figure 4: Extracted speech and its frequency content

Save the filtered output signal as a .wav file for audio playback.

from scipy.io.wavfile import write
output_data = np.asarray(y_signal, dtype=np.int16)#convert y to int16 format
write("noise_removed_output.wav", samplerate, output_data)

Characteristics of the designed filter

Let’s compute the double-sided frequency response of the designed FIR filter. The frequency response is computed using the freqz function from the scipy.signal package↗.

from scipy.signal import freqz
b = [1,-2*cos(theta),1] #filter coefficients
a = [1]
w, h = freqz(b,a,whole=True)#frequency response h[e^(jw)]
#whole = True returns output for whole range 0 to 2*pi
#To plot double sided response, use fftshift
w = w - 2*np.pi*(w>=np.pi) #convert to range -pi to pi
w = fftshift(w)
h = fftshift(h)

Plot the magnitude response, phase response, pole-zero plot and the impulse response of the designed filter.

#Plot Magnitude-frequency response
fig2, (ax) = plt.subplots(nrows=2,ncols=2)
ax[0,0].plot(w,abs(h),'b')
ax[0,0].set(title='Magnitude response',xlabel='Frequency [radians/sample]',ylabel='Magnitude')
ax[0,0].grid();ax[0,0].axis('tight');

#Plot phase response
angles = np.unwrap(np.angle(h))
ax[0,1].plot(w,angles,'r')
ax[0,1].set(title='Phase response',xlabel='Frequency [radians/sample]',ylabel='Angles [radians]')
ax[0,1].grid();ax[0,1].axis('tight');

#Transfer function to pole-zero representation
from scipy.signal import tf2zpk
z, p, k = tf2zpk(b, a)

#Plot pole-zeros on a z-plane
from  matplotlib import patches
patch = patches.Circle((0,0), radius=1, fill=False,
                    color='black', ls='dashed')
ax[1,0].add_patch(patch)
ax[1,0].plot(np.real(z), np.imag(z), 'xb',label='Zeros')
ax[1,0].plot(np.real(p), np.imag(p), 'or',label='Poles')
ax[1,0].legend(loc=2)
ax[1,0].set(title='Pole-Zero Plot',xlabel='Real',ylabel='Imaginary')
ax[1,0].grid()

#Impulse response
#create an impulse signal
imp = np.zeros(20)
imp[0] = 1

from scipy.signal import lfilter
y_imp = lfilter(b, a, imp) #drive the impulse through the filter
ax[1,1].stem(y_imp,linefmt='-.')
ax[1,1].set(title='Impulse response',xlabel='Sample index [n]',ylabel='Amplitude')
ax[1,1].axis('tight')
Figure 3: Magnitude response, phase response, pole-zero plot and impulse response of the designed second order FIR filter

Questions

Use the comment form below to answer the following questions

1) Is the filter that we designed a lowpass, highpass, bandpass or bandstop filter?
2) To reject a single sinusoidal signal, why are two zeros needed in the above filter design?
3) What do you understand from the phase response plotted above?




Digital filter design – Introduction

Key focus: Develop basic understanding of digital filter design. Learn about fundamentals of FIR and IIR filters and the design choices.

Analog filters and digital filters are the two major classifications of filters, depending on the type of signal they process. An analog filter processes continuous-time analog signals, whereas digital filters process sampled, discrete-time signals.

The design of digital filters involves the use of both frequency domain and time domain techniques. This is because the filter specifications are often given in the frequency domain, while the implementation is done in the time domain in the form of difference equations. The Z-transform and the discrete-time Fourier transform (DTFT) are typical tools used for frequency domain analysis of filters.

Linear Time-Invariant Causal filter

The choice of filter and the design process depend on the design specification, the application, and the performance issues associated with them. In this section, the focus is on the principles used for designing linear time-invariant causal filters.

A linear system obeys the principle of superposition, which means that an arbitrary signal can be represented as a weighted sum of shifted unit impulse functions. In a linear time-invariant filtering system, the filter coefficients do not change with time; therefore, if the input signal is time-shifted, there will be a corresponding time-shift in the output signal. The term causal implies that the output of the system depends only on the present and past samples of the input or output, and not on future samples. Causality is required for real-time applications.

Solution for LTI causal filtering

The input-output relationship of a linear time-invariant causal filter system is mathematically described by the following difference equation

\[ \begin{align} \sum_{k=0}^{N} a_k y[n-k] &= \sum_{i=0}^{M} b_i x[n-i] \\ \Rightarrow y[n] &= \sum_{i=0}^{M} b_i x[n-i] - \sum_{k=1}^{N} a_k y[n-k] \end{align} \quad \quad (1) \]

where x[n] and y[n] are the input and output (sampled, discrete-time) signals of the filtering system, and ak and bi are the coefficients of the filter, programmed to certain values for achieving the given filtering task. The values of M and N determine the number of such coefficients; in other words, the filter has M zeros and N poles.

Linear time-invariant systems are completely characterized by their response to an impulse signal δ[n]. Therefore, the analysis and solution for a given filtering task are easily obtained by using the impulse response h[n], which is the response of the LTI system to an impulse signal δ[n].

Therefore, the solution for the recursive equation in (1) is found by exciting the input with an impulse, that is, x[n] = δ[n].

\[h[n] = \sum_{i=0}^{M} b_i \delta[n-i] - \sum_{k=1}^{N} a_k h[n-k] \quad \quad (2) \]
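
To make equation (2) concrete, the impulse response can be computed recursively, sample by sample. The following is a minimal Python sketch with hypothetical coefficients (a0 is assumed to be 1):

import numpy as np

b = [0.5, 0.3] #hypothetical feedforward coefficients b_i
a = [1.0, -0.4] #hypothetical feedback coefficients a_k (a[0] = 1)

L = 10 #number of impulse response samples to compute
h = np.zeros(L)
for n in range(L):
    h[n] = b[n] if n < len(b) else 0.0 #b_i*delta[n-i] picks out b_n
    for k in range(1, len(a)): #feedback terms -a_k*h[n-k]
        if n-k >= 0:
            h[n] -= a[k]*h[n-k]
print(h) #matches scipy.signal.lfilter(b, a, unit_impulse)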

Z-transform and DTFT

The Z-transform and the discrete-time Fourier transform (DTFT) are important tools for analyzing difference equations and the frequency response of filters. The Z-transform converts a discrete-time signal into a complex frequency domain representation.

The impulse response of a discrete-time causal system is analyzed using the unilateral or one-sided Z-transform. The unilateral Z-transform of a discrete-time signal x[n] is given by

\[X(z) = \sum_{n=0}^{\infty} x[n] z^{-n} \quad \quad (3)\]

where n is an integer and z is a complex number with magnitude r and complex argument ω in radians.

\[ z = re^{j \omega} = r \cdot \left[ \cos(\omega) + j\; \sin (\omega) \right] \quad \quad (4)\]
Figure 1: Z-transform and Z-plane

If we let |z| = r = 1, the equation for the Z-transform reduces to the discrete-time Fourier transform (DTFT)

\[X(e^{j \omega}) = \sum_{n=0}^{\infty} x[n] e^{-j \omega n} \quad \quad (5)\]

That is, the DTFT is the Z-transform evaluated on the unit circle

\[X(e^{j \omega}) = X(z) \Big|_{z = e^{j \omega}} \quad \quad (6)\]

The Z-plane contains all values of z, whereas the unit circle (defined by |z| = r = 1) contains only the z = e^jω values. Therefore, we can state that the Z-transform may exist anywhere on the Z-plane, while the DTFT exists only on the unit circle. One period of the DTFT is equivalent to one loop around the unit circle on the Z-plane.

One of the important properties of the Z-transform concerns time-shifting of discrete-time samples. A delay of K samples in the time domain is equivalent to multiplication by z^-K in the Z-transform domain.

\[x[n-K] \rightleftharpoons z^{-K} X(z) \quad \quad (7)\]

Re-writing equation (1) in the Z-domain,

\[\sum_{k=0}^{N} a_k z^{-k} Y(z) = \sum_{i=0}^{M} b_i z^{-i} X(z) \quad \quad (8)\]

Therefore, the transfer function of the generic digital filter is given by

\[H(z) = \frac{Y(z)}{X(z)} = \frac{\displaystyle{\sum_{i=0}^{M} b_i z^{-i}}}{1 + \displaystyle{\sum_{k=1}^{N} a_k z^{-k}}} \quad \quad (9)\]

From equations (5) and (6), the frequency response of the system can be computed by evaluating the transfer function H(z) on the unit circle (i.e., |z| = r = 1 → z = e^jω)

\[H(e^{j \omega}) = H(z) \Big|_{z = e^{j \omega}} = \frac{\displaystyle{\sum_{i=0}^{M} b_i e^{-j \omega i}}}{1 + \displaystyle{\sum_{k=1}^{N} a_k e^{-j \omega k}}} \quad \quad (10)\]
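
Numerically, this evaluation on the unit circle is exactly what scipy.signal.freqz does. A short sketch with hypothetical coefficients:

import numpy as np
from scipy.signal import freqz

b = [0.5, 0.3] #hypothetical numerator coefficients
a = [1.0, -0.4] #hypothetical denominator coefficients

w, h = freqz(b, a) #evaluates H(z) at z = e^(jw), w in [0, pi)
print(abs(h[0])) #gain at DC (w = 0)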

IIR filter

From implementation standpoint, there are two classes of digital filters: infinite impulse response (IIR) and finite impulse response (FIR) filters.

When ak ≠ 0, as in equation (2), the filter structure is characterized by the presence of feedback elements. Due to these feedback elements, the impulse response of the filter may not become zero beyond a certain point in time, but continues indefinitely, hence the name infinite impulse response (IIR) filter.

\[IIR: \quad \quad h[n] = \sum_{i=0}^{M} b_i \delta[n-i] - \sum_{k=1}^{N} a_k h[n-k], \quad a_k \neq 0 \quad \quad (11) \]

Therefore, equations (9) and (10) are essentially the transfer function and the frequency response of an IIR filtering system.

There exist different methods for implementing the filter structure; examples include the direct form I structure, direct form II structure, lattice structure, transposed forms, and state-space representations. Each method has its own advantages and disadvantages: for example, one structure may be more robust to finite-precision issues and coefficient sensitivity, or may require less memory and computation. The structure is chosen depending on the application.

Figure 2 shows the direct form I signal flow graph for the generic IIR filter described by equation (1). The boxes labeled z^-1 are unit delays, since the Z-transform of a unit delay is z^-1.

Figure 2: Direct form I signal flow graph for an IIR filter
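
The signal flow graph maps directly to code. Below is a sketch of a direct form I realization of the difference equation in (1) (a0 is assumed to be 1; this is for illustration only, scipy.signal.lfilter is the practical choice):

import numpy as np

def direct_form_I(b, a, x):
    """Direct form I realization of equation (1), assuming a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        #feedforward taps: b_i * x[n-i]
        acc = sum(b[i]*x[n-i] for i in range(len(b)) if n-i >= 0)
        #feedback taps: -a_k * y[n-k] for k >= 1
        acc -= sum(a[k]*y[n-k] for k in range(1, len(a)) if n-k >= 0)
        y[n] = acc
    return y

For the same coefficient arrays, the output of this sketch agrees with scipy.signal.lfilter(b, a, x).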

FIR filter

When ak = 0 for all k, there is no feedback in the filter structure and there are no poles in the pole-zero plot (in fact, all the poles sit at the origin for a causal FIR filter). The equation is no longer recursive, and the impulse response of such a filter dies out (becomes zero) beyond a certain point in time, hence the name finite impulse response (FIR) filter.

Setting x[n] = δ[n] and ak=0 for all k, the impulse response of an FIR filter is given by,

\[FIR: \quad \quad h[n] = \sum_{i=0}^{M} b_i \delta[n-i] \quad \quad (13) \]

Evaluating the Z-transform of the impulse input x[n] = δ[n], the following mapping holds in the Z-domain

\[\delta[n-i] \rightleftharpoons z^{-i} \quad \quad (14)\]

From equations (13) and (14), the transfer function of the FIR filter in the Z-domain is given by

\[FIR: \quad \quad H(z) = \sum_{i=0}^{M} b_i z^{-i} \quad \quad (15)\]

The frequency response is given by evaluating the transfer function on the unit circle |z| = 1.

\[FIR: \quad \quad H(e^{j \omega}) = \sum_{i=0}^{M} b_i e^{-j \omega i} \quad \quad (16)\]

The direct form I signal flow graph for the FIR filter is shown in Figure 3

Figure 3: Direct form I signal flow graph for FIR filter
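
Because there is no feedback, FIR filtering is simply the convolution of the input with the coefficients bi. A minimal sketch (hypothetical coefficients):

import numpy as np
from scipy.signal import lfilter

b = [1.0, -1.2, 1.0] #hypothetical FIR coefficients
x = np.random.randn(50) #arbitrary test input

y1 = np.convolve(b, x)[:len(x)] #FIR filtering as convolution
y2 = lfilter(b, [1.0], x) #same result via lfilter
print(np.allclose(y1, y2)) #True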





Linear regression using python – demystified

Key focus: Let’s demonstrate basics of univariate linear regression using Python SciPy functions. Train the model and use it for predictions.

Linear regression model

Regression is a framework for fitting models to data. At a fundamental level, a linear regression model assumes a linear relationship between the input variables (x) and the output variable (y). The input variables are often referred to as independent variables, features or predictors. The output is often referred to as the dependent variable, target, observed variable or response variable.

If there is only one input variable and one output variable in the given dataset, this is the simplest configuration for coming up with a regression model, and the regression is termed univariate regression. Multivariate regression extends the concept to include more than one independent variable and/or dependent variable.

Univariate regression example

Let us start by considering the following example of a fictitious dataset. To begin, we construct the fictitious dataset ourselves and use it to understand the problem of linear regression, which is a supervised machine learning technique. Let’s consider linear-looking, randomly generated data samples.

import numpy as np
import matplotlib.pyplot as plt #for plotting

np.random.seed(0) #to generate predictable random numbers

m = 100 #number of samples
x = np.random.rand(m,1) #uniformly distributed random numbers
theta_0 = 50 #intercept
theta_1 = 35 #coefficient
noise_sigma = 3

noise = noise_sigma*np.random.randn(m,1) #gaussian random noise

y = theta_0 + theta_1*x + noise #noise added target
 
plt.ion() #interactive plot on
fig,ax = plt.subplots(nrows=1,ncols=1)
plt.plot(x,y,'.',label='training data')
plt.xlabel(r'Feature $x_1$');plt.ylabel(r'Target $y$')
plt.title('Feature vs. Target')
Figure 1: Simulated data for linear regression problem

In this example, the data samples represent the feature x and the corresponding targets y. Given this dataset, how can we predict the target y as a function of the feature x? This is a typical regression problem.

Linear regression

Let the pair (x, y) form one training example (one point on the plot above). Assuming there are m such sample points as training examples, the training set contains all the m pairs.

In the univariate linear regression problem, we seek to approximate the target y as a linear function of the input x1, which implies the equation of a straight line (example in Figure 2), as given by

\[\hat{y} = \theta_0 x_0 + \theta_1 x_1 \quad\quad (1)\]

where θ0 is the intercept, θ1 is the slope of the straight line that is sought, and x0 is always set to 1. The approximated target ŷ serves as a guideline for prediction.

Using all the m samples from the training set, we wish to find the parameters θ0 and θ1 that best approximate the relationship between the given target samples and the straight-line function.

If we represent the parameters θ0 and θ1 as a vector θ, and arrange the input samples (with the constant x0 = 1 as the first column) into a matrix X and the target samples into a vector y, then equation (1) can be expressed as a matrix product

\[\hat{y} = X \theta \quad\quad (2)\]

It may seem that the solution for finding θ is straightforward

\[\theta = X^{-1} y \quad\quad (3)\]

However, matrix inversion is not defined for matrices that are not square. The Moore-Penrose pseudo inverse generalizes the concept of matrix inversion to non-square matrices. Denoting the Moore-Penrose pseudo inverse of X as X⁺, the solution for finding θ is

\[\theta = X^{+} y \quad\quad (4)\]

For coding in Python, we utilize the scipy.linalg.pinv function to compute the Moore-Penrose pseudo inverse and estimate θ.

xMat = np.c_[ np.ones([len(x),1]), x ] #form x matrix
from scipy.linalg import pinv
theta_estimate = pinv(xMat).dot(y)
print(f'theta_0 estimate: {theta_estimate[0]}')
print(f'theta_1 estimate: {theta_estimate[1]}')

The code results in the following estimates for θ0 and θ1, which are very close to the values used to generate the random data points for this problem.

>> theta_0 estimate: [50.66645323]
>> theta_1 estimate: [34.81080506]
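
As a side note, the same least squares estimate can be obtained without explicitly forming the pseudo inverse, for example with numpy's lstsq solver (a quick sketch, reusing xMat and y from above):

theta_lstsq, residuals, rank, sv = np.linalg.lstsq(xMat, y, rcond=None)
print(theta_lstsq) #matches the pinv-based estimate above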

Now that we know the parameters of our example system, target predictions for new values of the feature can be made as follows

x_new = np.array([[-0.2],[0.5],[1.2] ]) #new unseen inputs
x_newmat = np.c_[ np.ones([len(x_new),1]), x_new ] #form xNew matrix
y_predict  = np.dot(x_newmat,theta_estimate)
>>> y_predict #predicted y values for new inputs for x_1
array([[43.70429222],
       [68.07185576],
       [92.43941931]])

The approximated target, as a linear function of the feature, is plotted as a straight line.

plt.plot(x_new,y_predict,'-',label='prediction')
plt.text(0.7, 55, r'Intercept $\theta_0$ = %0.2f'%theta_estimate[0])
plt.text(0.7, 50, r'Coefficient $\theta_1$ = %0.2f'%theta_estimate[1])
plt.text(0.5, 45, r'y= $\theta_0+ \theta_1 x_1$ = %0.2f + %0.2f $x_1$'%(theta_estimate[0],theta_estimate[1]))
plt.legend() #plot legend
Figure 2: Linear Regression – training samples and prediction




Yule Walker Estimation and simulation in Matlab

If time series data are assumed to follow an Auto-Regressive (AR(N)) model of the form

\[x[n] + a_1 x[n-1] + a_2 x[n-2] + \cdots + a_N x[n-N] = w[n]\]

the natural tendency is to estimate the model parameters a1, a2, …, aN.

The least squares method can be applied here to estimate the model parameters, but the computations become cumbersome as the order N increases.

Fortunately, the AR model coefficients can be solved for using the Yule-Walker equations.

Yule-Walker Equations

The Yule-Walker equations relate the auto-regressive model parameters to the auto-covariance rxx[k] of the random process x[n]. They can be applied to find the AR(N) model parameters from given time series data x[n] that can indeed be assumed to be an AR process (visually examining the auto-correlation function (ACF) and partial auto-correlation function (PACF) may give a clue as to whether the data can be modeled as an AR or MA process with an appropriate model order N).

Steps to find model parameters using Yule-Walker Equations:

  • Given x[n], estimate the auto-covariance of the process, rxx[k]
  • Solve the Yule-Walker equations to find the ak (k = 1, …, N) and the noise variance σ2

To solve the Yule-Walker equations, one has to know the model order. Although the Yule-Walker equations can be used to find the model parameters, they cannot give any insight into the model order N directly. Several methods exist to ascertain the model order.

  • ACF and PACF: the autocorrelation function (ACF) and partial autocorrelation function (PACF) may give insight into the model order of the underlying process
  • Akaike Information Criterion (AIC)[1]: associates the number of model parameters with the goodness of fit; it also includes a penalty factor to avoid over-fitting (increasing the model order always improves the fit, so the fitting process has to be stopped somewhere)
  • Bayesian Information Criterion (BIC)[2]: very similar to AIC
  • Cross validation[3]: the given time series is broken into subsets; one subset of data is taken at a time, the model parameters are estimated, and the estimates are cross-validated with data from the remaining subsets

Deriving Yule-Walker Equations

A generic AR model is given by

\[x[n] = -\sum_{k=1}^{N} a_k x[n-k] + w[n] \quad\quad (1)\]

and can be written in compact form as

\[\sum_{k=0}^{N} a_k x[n-k] = w[n] \quad\quad (2)\]

Note that the coefficient of the x[n] term is a0 = 1.

Multiplying (2) by x[n-l] and taking the expectation,

\[\sum_{k=0}^{N} a_k E\left\{ x[n-k]\; x[n-l] \right\} = E\left\{ w[n]\; x[n-l] \right\} \quad\quad (3)\]

One can readily identify the auto-correlation and cross-correlation terms as

\[\sum_{k=0}^{N} a_k r_{xx}[l-k] = r_{wx}[l] \quad\quad (4)\]

The next step is to compute the identified cross-correlation term rwx[l] and relate it to the auto-correlation term rxx[l-k].

The term x[n-l] can also be obtained from equation (1) as

\[x[n-l] = -\sum_{k=1}^{N} a_k x[n-l-k] + w[n-l] \quad\quad (5)\]

Note that data and noise are always uncorrelated, therefore E{x[n-k-l] w[n]} = 0. Also, the auto-correlation of the noise is zero at all lags except at lag 0, where its value is equal to σ2 (remember the flat power spectral density of white noise and its auto-correlation). These two properties are used in the following steps. Restricting the lags to positive values and zero,

\[r_{wx}[l] = E\left\{ w[n]\; x[n-l] \right\} = \sigma^2 \delta[l], \quad l \geq 0 \quad\quad (6)\]

Substituting (6) in (4),

\[\sum_{k=0}^{N} a_k r_{xx}[l-k] = \sigma^2 \delta[l], \quad l \geq 0 \quad\quad (7)\]

Here there are two cases to be solved: l > 0 and l = 0. For the l > 0 case, equation (7) becomes

\[r_{xx}[l] + \sum_{k=1}^{N} a_k r_{xx}[l-k] = 0, \quad l > 0 \quad\quad (8)\]

Clue: notice the lower limit of the summation, which changed from k = 0 to k = 1.

Now, equation (8) can be written in matrix form (using rxx[-k] = rxx[k] for a real process)

\[\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = -\begin{bmatrix} r_{xx}[1] \\ r_{xx}[2] \\ \vdots \\ r_{xx}[N] \end{bmatrix} \quad\quad (9)\]

These are the Yule-Walker equations: a set of N linear equations in N unknown parameters.

Representing equation (9) in compact form,

\[\mathbf{R} \mathbf{a} = -\mathbf{r} \quad\quad (10)\]

The solution is

\[\mathbf{a} = -\mathbf{R}^{-1} \mathbf{r} \quad\quad (11)\]

Once we solve for the vector a (equivalently, the model parameters ak), the noise variance σ2 can be found by applying the estimated values of ak in equation (7), setting l = 0:

\[\sigma^2 = r_{xx}[0] + \sum_{k=1}^{N} a_k r_{xx}[k] \quad\quad (12)\]

Matlab’s aryule function efficiently solves the Yule-Walker equations using the Levinson-Durbin algorithm [4][5].
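
For readers working in Python, the same steps can be sketched with NumPy/SciPy: estimate the autocorrelation sequence, form the Toeplitz system of equation (9), and solve it. This is only a rough, illustrative equivalent of aryule (the function name and details below are ours, not a library API):

import numpy as np
from scipy.linalg import toeplitz, solve

def yule_walker_ar(x, N):
    """Estimate AR(N) parameters and noise variance via Yule-Walker."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    #biased autocorrelation estimates r[0..N]
    r = np.array([np.dot(x[:len(x)-k], x[k:])/len(x) for k in range(N+1)])
    R = toeplitz(r[:N]) #N x N autocorrelation matrix of equation (9)
    a = solve(R, -r[1:N+1]) #solve R a = -r, equation (11)
    sigma2 = r[0] + np.dot(a, r[1:N+1]) #noise variance, equation (12)
    return np.concatenate(([1.0], a)), sigma2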

Simulation:

Let’s generate an AR(3) process and pretend that we do not know anything about the model parameters. We will take this as the input data for the Yule-Walker method and check whether it can estimate the model parameters properly.

Generating the data from the AR(3) process given above

a=[1  0.9   0.6 0.5]; %model parameters
x=filter(1,a,randn(1000,1)); %data generation with gaussian noise - 1000 samples
plot(x);
title('Sample data from AR(3) process');
xlabel('Sample index');
ylabel('Value');
Figure 1: Simulated data from AR(3) process

Plot the periodogram (PSD plot) for reference. The PSD plots from the estimated models will be checked against this reference plot later.

figure(2);
periodogram(x); hold on; %plot the original frequency response of the data

Run the simulation for three model orders and check which model order fits best.

N=[2,3,4];

for i=1:3,
    [a,p] = aryule(x,N(i))
    [H,w]=freqz(sqrt(p),a);
    hp = plot(w/pi,20*log10(2*abs(H)/(2*pi))); %plotting in log scale
end
xlabel('Normalized frequency (\times \pi rad/sample)');
ylabel('One-sided PSD (dB/rad/sample)');
legend('PSD estimate of x','PSD of model output N=2','PSD of model output N=3','PSD of model output N=4');
Figure 2: Power Spectral density of modeled data

The estimated model parameters and the noise variances computed by the Yule-Walker system are given below. The estimated parameters are approximately the same as expected. See how the error decreases as the model order N increases. The optimum model order in this case is N = 3, since the error does not change significantly when increasing the order further, and the model parameter a4 of the AR(4) fit is not significantly different from zero.

Figure 3: Estimated model parameters and prediction errors for the given model orders


References

[1] Akaike H. (1998) Information Theory and an Extension of the Maximum Likelihood Principle. In: Parzen E., Tanabe K., Kitagawa G. (eds) Selected Papers of Hirotugu Akaike. Springer Series in Statistics (Perspectives in Statistics). Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1694-0_15
[2] McQuarrie, A. D. R., and Tsai, C.-L., 1998. Regression and Time Series Model Selection. World Scientific.↗
[3] Arlot, Sylvain, and Alain Celisse. “A survey of cross-validation procedures for model selection.” Statistics surveys 4 (2010): 40-79.
[4] J. Durbin, “The fitting of time series models”, Review of the International Statistical Institute, 28:233-243, 1960
[5] G. H. Golub and C. F. Van Loan, “Matrix Computations”, The Johns Hopkins University Press, third edition, 1996.↗



Solving ARMA model – minimization of squared error

Key focus: Can a unique solution exist when solving an ARMA (Auto Regressive Moving Average) model? Apply minimization of the squared error to find out.

As discussed in the previous post, the ARMA model is a generalized model that is a mix of both the AR and MA models. Given a signal x[n], an AR model is easier to fit than a suitable ARMA model. Let’s see why this is so.

AR model error and minimization

In the AR model, the present output sample x[n] and the past N-1 output samples determine the source input w[n]. The difference equation that characterizes this model is given by

\[w[n] = x[n] + \sum_{k=1}^{N-1} a_k x[n-k]\]

The model can be viewed from another perspective, where the input noise w[n] is viewed as an error: the difference between the present output sample x[n] and the sample of x[n] predicted from the previous N-1 output samples. Let’s term this the “AR model error”. Rearranging the difference equation,

\[w[n] = x[n] - \left( -\sum_{k=1}^{N-1} a_k x[n-k] \right)\]

The summation term inside the brackets is viewed as the output sample predicted from the past N-1 output samples, and the difference is the error w[n].

From the equation above, w2[n] is the squared error that we wish to minimize. Here, w2[n] is a quadratic function of the unknown model parameters ak, and quadratic functions have a unique minimum. Therefore, the least squares estimates of the coefficients ak are found by evaluating the first derivative of the squared error with respect to each ak and equating it to zero, thereby finding the minimum.
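
As an illustration, the AR least squares problem can be set up directly in a few lines: stack lagged output samples into a matrix and solve for the coefficients that minimize the squared error (a Python sketch under our notation; the model order N is assumed known):

import numpy as np

def ar_least_squares(x, N):
    """Least squares fit of AR predictor coefficients from data x."""
    x = np.asarray(x, dtype=float)
    #each row holds the past N samples used to predict x[n]
    X = np.column_stack([x[N-k:len(x)-k] for k in range(1, N+1)])
    y = x[N:]
    c, *_ = np.linalg.lstsq(X, y, rcond=None) #minimize ||y - Xc||^2
    return c #prediction x[n] ~ sum_k c_k x[n-k]; c_k = -a_k in our sign convention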

ARMA model error and minimization

The difference equation that characterizes this model is given by

\[x[n] = -\sum_{k=1}^{N-1} a_k x[n-k] + \sum_{k=0}^{M-1} b_k w[n-k]\]

Re-arranging (and taking b0 = 1), the ARMA model error w[n] is given by

\[w[n] = x[n] - \left( -\sum_{k=1}^{N-1} a_k x[n-k] + \sum_{k=1}^{M-1} b_k w[n-k] \right)\]

Now, the predictor (the terms inside the brackets) considers weighted combinations of past values of both the input and output samples.

The squared error w2[n] is NOT a quadratic function of the unknowns, and we have two sets of unknowns: ak and bk. Therefore, no unique solution may be available to minimize this squared error, since multiple minima pose a difficult numerical optimization problem.





Understand AR, MA and ARMA models

Key focus: AR, MA & ARMA models express the nature of the transfer function of an LTI system. Understand the basic idea behind those models & know their frequency responses.

Introduction

Signal models are used to analyze stationary univariate time series. The goal of signal modeling is to estimate the process from which the desired signal is generated. Though the concept described here is related to the topic of “system identification”, the two are quite different.

A signal model is a unique combination of a filter and a source input that may fall into any of the following categories:

  • Filter: state-space model, AR, MA, ARMA (see below)
  • Source: pulse, pulse train, white noise, …

Motivation

Let’s say we observe a real world signal x[n] that has a spectrum X(ω) (the spectrum can be arbitrary: bandpass, baseband, etc.). We would like to describe the long sequence of x[n] using very few parameters (application: Linear Predictive Coding (LPC)). The modeling approach described here tries to answer the following two questions.

  • Is it possible to model the first order (mean/variance) and second order (correlations, spectrum) statistics of the signal just by shaping a white noise spectrum using a transfer function? (see Figure 1)
  • Does this produce the same statistics (spectrum, correlations, mean and variance) for a white noise input?

If the answer is “yes” to the above two questions, we can simply set the modeled parameters of the system and excite the system with white (flat) noise to produce the desired real world signal. This reduces the amount of data we need to transmit in a communication system application.

Figure 1: Shaping a white noise spectrum (flat spectrum) to achieve desired spectrum

LTI system model

In the model given below, the random signal x[n] is observed. Given the observed signal x[n], the goal here is to find a model that best describes the spectral properties of x[n] under the following assumptions:

● x[n] is WSS (Wide Sense Stationary) and ergodic.
● The input signal to the LTI system is white noise following a Gaussian distribution with zero mean and variance σ2.
● The LTI system is BIBO (Bounded Input Bounded Output) stable.

Figure 2: Linear Time Invariant (LTI) system – signal model

In the model shown above, the input to the LTI system is white noise following a Gaussian distribution with zero mean and variance σ2. The power spectral density (PSD) of the noise w[n] is

\[S_w(e^{j \omega}) = \sigma^2\]

The noise process drives the LTI system with frequency response H(e^jω), producing the signal of interest x[n]. The PSD of the output process is therefore

\[S_x(e^{j \omega}) = \left| H(e^{j \omega}) \right|^2 \sigma^2\]
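
A quick numerical sketch of this relationship (with hypothetical filter coefficients): filter white noise through an LTI system and compare the estimated PSD of the output with σ2|H(e^jω)|2.

import numpy as np
from scipy.signal import lfilter, freqz, welch

sigma = 1.0
w_noise = sigma*np.random.randn(100000) #white Gaussian noise
b, a = [1.0], [1.0, -0.8] #hypothetical shaping filter

x = lfilter(b, a, w_noise) #shaped output process
f, Pxx = welch(x, nperseg=1024) #estimated PSD of x[n]
_, H = freqz(b, a, worN=2*np.pi*f) #H(e^jw) at the same frequencies
#Pxx should follow sigma^2 * |H|^2 (up to the one-sided PSD scaling)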

Three cases are possible given the nature of the transfer function of the LTI system that is under investigation here.

  • Auto Regressive (AR) models: H(e^jω) is an all-poles system
  • Moving Average (MA) models: H(e^jω) is an all-zeros system
  • Auto Regressive Moving Average (ARMA) models: H(e^jω) is a pole-zero system

Auto Regressive (AR) models (all-poles model)

In the AR model, the present output sample x[n] and the past N output samples determine the source input w[n]. The difference equation that characterizes this model is given by

\[w[n] = x[n] + \sum_{k=1}^{N} a_k x[n-k]\]

Here, the LTI system is an Infinite Impulse Response (IIR) filter. This is evident from the fact that the above equation considers past samples of x[n] when determining w[n], thereby creating a feedback loop from the output of the filter.

The frequency response of the IIR filter is well known:

\[H(e^{j \omega}) = \frac{1}{1 + \displaystyle{\sum_{k=1}^{N} a_k e^{-j \omega k}}}\]

Figure 3: Spectrum of all-pole transfer function (representing AR model)

The transfer function H(e^jω) is an all-pole transfer function (when the denominator goes to zero, the transfer function goes to infinity, creating peaks in the spectrum). Poles are best suited to model resonant peaks in a given spectrum; at the peaks, the poles are close to the unit circle. This model is well suited for modeling peaky spectra.
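
For instance, a conjugate pole pair placed close to the unit circle at angle ±θ produces a sharp resonant peak near ω = θ. A small sketch (pole radius and angle are assumed values):

import numpy as np
from scipy.signal import freqz

r, theta = 0.98, np.pi/4 #assumed pole radius and angle
a = np.poly([r*np.exp(1j*theta), r*np.exp(-1j*theta)]).real #denominator
w, h = freqz([1.0], a) #all-pole frequency response
print(w[np.argmax(abs(h))]) #peak frequency, close to theta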


Moving Average (MA) models (all-zeros model)

In the MA model, the present output sample x[n] is determined by the present source input w[n] and the past N samples of the source input. The difference equation that characterizes this model is given by

\[x[n] = \sum_{k=0}^{N} b_k w[n-k]\]

Here, the LTI system is a Finite Impulse Response (FIR) filter; as evident from the above equation, no feedback is involved from output to input.

The frequency response of the FIR filter is well known:

\[H(e^{j \omega}) = \sum_{k=0}^{N} b_k e^{-j \omega k}\]

The transfer function H(e^jω) is an all-zero transfer function (when the numerator is set to zero, the transfer function goes to zero, creating nulls in the spectrum). Zeros are best suited to model sharp nulls in a given spectrum.

Figure 4: Spectrum of all-zeros transfer function (representing MA model)

Auto Regressive Moving Average (ARMA) model (pole-zero model)

The ARMA model is a generalized model that is a combination of the AR and MA models. The output of the filter is a linear combination of both weighted inputs (present and past samples) and weighted past output samples. The difference equation that characterizes this model is given by

\[x[n] = -\sum_{k=1}^{N} a_k x[n-k] + \sum_{k=0}^{M} b_k w[n-k]\]

The frequency response of this generalized filter is well known:

\[H(e^{j \omega}) = \frac{\displaystyle{\sum_{k=0}^{M} b_k e^{-j \omega k}}}{1 + \displaystyle{\sum_{k=1}^{N} a_k e^{-j \omega k}}}\]

The transfer function H(e^jω) is a pole-zero transfer function. It is best suited for modeling complex spectra with well-defined resonant peaks and nulls.



