As discussed in the previous post, the ARMA model is a generalized model that combines the AR and MA models. Given a signal \(x[n]\), an AR model is much easier to find than a suitable ARMA model. Let’s see why.

## AR model error and minimization

In the AR model, the present output sample \(x[n]\) and the past \(N\) output samples determine the source input \(w[n]\). The difference equation that characterizes this model is given by

$$x[n] + a_1 x[n-1] + a_2 x[n-2] + … + a_N x[n-N] = w[n]$$
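Rearranging the difference equation as \(x[n] = w[n] - \sum_{k=1}^{N} a_k x[n-k]\) gives a direct way to generate an AR process from a noise input. The Python sketch below assumes zero initial conditions (samples before \(n = 0\) are zero). The function name `simulate_ar` and the AR(2) coefficients are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_ar(a, w):
    """Generate x[n] from x[n] + a_1 x[n-1] + ... + a_N x[n-N] = w[n].

    a: coefficients [a_1, ..., a_N]; w: driving noise samples.
    Samples before n = 0 are taken as zero (assumed initial rest).
    """
    a = np.asarray(a, dtype=float)
    N = len(a)
    x = np.zeros(len(w))
    for n in range(len(w)):
        # Weighted sum of the past N outputs; terms with n - k < 0 are skipped (zero)
        past = sum(a[k - 1] * x[n - k] for k in range(1, N + 1) if n - k >= 0)
        x[n] = w[n] - past
    return x

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
x = simulate_ar([-0.75, 0.5], w)  # hypothetical stable AR(2): a_1 = -0.75, a_2 = 0.5
```

Each generated sample satisfies the difference equation exactly: \(x[n] - 0.75\,x[n-1] + 0.5\,x[n-2] = w[n]\).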

The model can also be viewed from another perspective, in which the input noise \(w[n]\) is treated as an error: the difference between the present output sample \(x[n]\) and its prediction from the previous \(N\) output samples. Let’s call this the “AR model error”. Rearranging the difference equation,

$$w[n]= x[n]-\left(-\sum^{N}_{k=1}a_k x[n-k] \right)$$

The term inside the brackets is the prediction of \(x[n]\) formed from the past \(N\) output samples, and the difference between \(x[n]\) and this prediction is the error \(w[n]\).
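A minimal sketch of the AR model error, assuming the coefficients \(a_k\) are already known: it forms the prediction \(\hat{x}[n] = -\sum_{k=1}^{N} a_k x[n-k]\) and subtracts it from \(x[n]\). The function `ar_prediction_error` is a hypothetical helper; it only evaluates \(n \ge N\), where all \(N\) past samples exist.

```python
import numpy as np

def ar_prediction_error(x, a):
    """Return (x_hat, w) for n = N, ..., len(x) - 1, where
    x_hat[n] = -sum_{k=1}^{N} a_k x[n-k]  (prediction from the past N samples)
    w[n]     = x[n] - x_hat[n]            (AR model error)
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    N = len(a)
    # x[n-N:n][::-1] is [x[n-1], x[n-2], ..., x[n-N]], aligned with [a_1, ..., a_N]
    x_hat = np.array([-np.dot(a, x[n - N:n][::-1]) for n in range(N, len(x))])
    return x_hat, x[N:] - x_hat

# a_1 = -2, a_2 = 1 predicts x[n] = 2 x[n-1] - x[n-2] (straight-line extrapolation)
x_hat, w = ar_prediction_error([1.0, 2.0, 3.0, 4.0], [-2.0, 1.0])
print(x_hat, w)  # [3. 4.] [0. 0.]
```

Because the predictor in this example is a straight-line extrapolation, a linearly increasing signal is predicted perfectly and the AR model error is zero.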

Least-squares estimates of the coefficients \(a_k\) are found by taking the first derivative of the squared error with respect to each \(a_k\) and setting it to zero, i.e., by locating the minimum. From the equation above, \(w^2[n]\) is the squared error we wish to minimize (in practice, its sum over all available samples \(n\)). Since \(w[n]\) is linear in the unknown model parameters \(a_k\), \(w^2[n]\) is a quadratic function of them. Such a quadratic has a single global minimum (unique as long as the data are not degenerate), and setting its derivatives to zero yields a set of linear equations in \(a_k\). Therefore, the least-squares estimates of \(a_k\) are easy to find by minimizing \(w^2[n]\).
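Because \(w[n]\) is linear in \(a_k\), minimizing \(\sum_n w^2[n]\) is an ordinary linear least-squares problem. Stacking the past samples into a matrix \(\mathbf{X}\) (the row for time \(n\) holds \(x[n-1], \dots, x[n-N]\)), the error vector is \(\mathbf{w} = \mathbf{x} + \mathbf{X}\mathbf{a}\), and setting the derivative to zero gives the normal equations \(\mathbf{X}^T\mathbf{X}\,\mathbf{a} = -\mathbf{X}^T\mathbf{x}\). The sketch below solves this with `numpy.linalg.lstsq`. Summing only over \(n \ge N\) (no zero padding, often called the covariance method) is one choice among several; the helper name `ar_least_squares` and the test coefficients are hypothetical.

```python
import numpy as np

def ar_least_squares(x, N):
    """Least-squares estimate of [a_1, ..., a_N] in
    x[n] + a_1 x[n-1] + ... + a_N x[n-N] = w[n],
    minimizing sum_n w[n]^2 over n = N, ..., len(x) - 1."""
    x = np.asarray(x, dtype=float)
    # Column k-1 holds x[n-k] for n = N, ..., len(x) - 1
    X = np.column_stack([x[N - k:len(x) - k] for k in range(1, N + 1)])
    # w = x[N:] + X @ a, so minimizing ||w||^2 means solving X @ a ≈ -x[N:]
    a_hat, *_ = np.linalg.lstsq(X, -x[N:], rcond=None)
    return a_hat

# Synthetic test data from a hypothetical AR(2) model
rng = np.random.default_rng(1)
a_true = np.array([-0.75, 0.5])
w = rng.standard_normal(5000)
x = np.zeros_like(w)
for n in range(len(w)):
    x[n] = w[n] - sum(a_true[k - 1] * x[n - k] for k in (1, 2) if n >= k)

a_hat = ar_least_squares(x, N=2)
print(a_hat)  # estimates of [a_1, a_2]; should be close to a_true
```

With 5000 samples the estimates should land close to the true coefficients; they are not exact because \(w[n]\) is random. `lstsq` is used instead of explicitly inverting \(\mathbf{X}^T\mathbf{X}\) because it solves the same minimization with better numerical behavior.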

## ARMA model error and minimization

The difference equation that characterizes this model is given by

$$x[n] + a_1 x[n-1] + … + a_N x[n-N] = b_0 w[n] + b_1 w[n-1] + … + b_M w[n-M]$$
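Solving the ARMA difference equation for the present output gives \(x[n] = \sum_{k=0}^{M} b_k w[n-k] - \sum_{k=1}^{N} a_k x[n-k]\). The sketch below implements this recursion directly, assuming zero initial conditions; `simulate_arma` and the ARMA(1,1) coefficients in the example are hypothetical.

```python
import numpy as np

def simulate_arma(a, b, w):
    """Generate x[n] from
    x[n] + sum_{k=1}^{N} a_k x[n-k] = sum_{k=0}^{M} b_k w[n-k]
    with zero initial conditions. a = [a_1, ..., a_N], b = [b_0, ..., b_M]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.asarray(w, dtype=float)
    x = np.zeros(len(w))
    for n in range(len(w)):
        # MA part: current and past inputs
        ma = sum(b[k] * w[n - k] for k in range(len(b)) if n - k >= 0)
        # AR part: past outputs
        ar = sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        x[n] = ma - ar
    return x

# Impulse response of a hypothetical ARMA(1,1): a_1 = -0.5, b_0 = 1, b_1 = 0.4
h = simulate_arma([-0.5], [1.0, 0.4], [1.0, 0.0, 0.0, 0.0])
print(h)  # approximately [1.0, 0.9, 0.45, 0.225]
```

If SciPy is available, `scipy.signal.lfilter(b, np.r_[1.0, a], w)` implements the same recursion with zero initial conditions; the explicit loop is kept here so that each term of the difference equation is visible.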

Rearranging, and normalizing \(b_0 = 1\), the ARMA model error \(w[n]\) is given by

$$w[n]= x[n]-\left(-\sum^{N}_{k=1}a_k x[n-k] + \sum^{M}_{k=1}b_k w[n-k] \right)$$

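Now the predictor (the terms inside the brackets) is a weighted combination of past values of both the output \(x[n]\) and the input \(w[n]\). The past inputs are not observed directly: they are themselves model errors, computed recursively from \(x[n]\) using the same unknown coefficients.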
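The recursive nature of the ARMA model error shows up clearly in code. The sketch below assumes \(b_0 = 1\), as the error equation above implies, and zero initial conditions. To compute \(w[n]\) it needs the earlier errors \(w[n-1], \dots, w[n-M]\), which were themselves computed with the same trial coefficients. `arma_model_error` is a hypothetical helper name.

```python
import numpy as np

def arma_model_error(x, a, b):
    """ARMA model error with b_0 = 1 and zero initial conditions:
    w[n] = x[n] - ( -sum_{k=1}^{N} a_k x[n-k] + sum_{k=1}^{M} b_k w[n-k] ).
    a = [a_1, ..., a_N], b = [b_1, ..., b_M] (trial coefficients)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.zeros(len(x))
    for n in range(len(x)):
        ar_part = -sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
        # Uses errors already computed with the same trial b_k,
        # so every w[n] depends on the b_k recursively
        ma_part = sum(b[k - 1] * w[n - k] for k in range(1, len(b) + 1) if n >= k)
        w[n] = x[n] - (ar_part + ma_part)
    return w

# First samples of the impulse response of an ARMA(1,1) with a_1 = -0.5, b_1 = 0.4
w = arma_model_error([1.0, 0.9, 0.45, 0.225], [-0.5], [0.4])
print(w)  # approximately [1, 0, 0, 0]
```

With the correct coefficients, the model error recovers the unit impulse that generated the data, i.e., computing the ARMA model error amounts to inverse filtering \(x[n]\).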

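Because each past error \(w[n-k]\) itself depends on \(a_k\) and \(b_k\), the squared error \(w^2[n]\) is NOT a quadratic function of the unknowns, and there are now two sets of unknowns, \(a_k\) and \(b_k\). Setting the derivatives to zero no longer yields a set of linear equations, and the error surface may have multiple local minima. Minimizing this squared error is therefore a difficult numerical optimization problem, with no guarantee of a unique solution.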
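A quick numerical check of the contrast between the two cases uses a first-order example: the AR(1) error versus the MA(1) error (\(N = 0\), \(M = 1\), \(b_0 = 1\)). On an evenly spaced grid of coefficient values, a quadratic cost has third differences that vanish up to round-off. The sketch evaluates \(\sum_n w^2[n]\) for both error definitions on hypothetical synthetic MA(1) data. It only shows that the MA cost is not quadratic; it does not by itself exhibit multiple minima. The helper names `ar1_error` and `ma1_error` are hypothetical.

```python
import numpy as np

def ar1_error(x, a1):
    """AR(1) model error: w[n] = x[n] + a1 * x[n-1], for n >= 1 (linear in a1)."""
    return x[1:] + a1 * x[:-1]

def ma1_error(x, b1):
    """MA(1) model error with b_0 = 1: w[n] = x[n] - b1 * w[n-1], with w[-1] = 0."""
    w = np.zeros(len(x))
    for n in range(len(x)):
        w[n] = x[n] - (b1 * w[n - 1] if n > 0 else 0.0)
    return w

rng = np.random.default_rng(2)
e = rng.standard_normal(501)
x = e[1:] + 0.6 * e[:-1]  # hypothetical MA(1) data with b_1 = 0.6

grid = np.linspace(-0.9, 0.9, 37)
J_ar = np.array([np.sum(ar1_error(x, c) ** 2) for c in grid])
J_ma = np.array([np.sum(ma1_error(x, c) ** 2) for c in grid])

# Third differences of a quadratic on an evenly spaced grid are zero
print(np.max(np.abs(np.diff(J_ar, 3))))  # round-off level only
print(np.max(np.abs(np.diff(J_ma, 3))))  # far from zero
```

For the MA(1) case, unrolling \(w[n] = x[n] - b_1 w[n-1]\) gives \(w[n] = \sum_{j=0}^{n} (-b_1)^j x[n-j]\), a polynomial of degree \(n\) in \(b_1\), so the summed squared error is a high-degree polynomial rather than a quadratic.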