Linear Models – Least Squares Estimator (LSE)

Key focus: Understand, step by step, the least squares estimator for parameter estimation, followed by a hands-on example that fits a curve using least squares estimation.

Background:

Classical estimation techniques such as Maximum Likelihood Estimation (MLE), Minimum Variance Unbiased Estimation (MVUE) and the Best Linear Unbiased Estimator (BLUE) require assumptions or knowledge about second-order statistics (covariance) before they can be applied. The linear least squares approach discussed here does not require any statistical model to begin with. It only requires a signal model in linear form.

Linear models are ubiquitously used in various fields for studying the relationship between two or more variables. Linear models include regression analysis models, ANalysis Of VAriance (ANOVA) models, variance component models etc. Here, one variable is considered as a dependent (response) variable which can be expressed as a linear combination of one or more independent (explanatory) variables.

Studying the dependence between variables is fundamental to linear models. Applying these concepts to a real application involves the following procedure:

  1. Problem identification
  2. Model selection
  3. Statistical performance analysis
  4. Criticism of the model based on statistical analysis
  5. Conclusions and recommendations

The following text elaborates on linear models applied to parameter estimation using Ordinary Least Squares (OLS).

Linear Regression Model

A regression model relates a dependent (response) variable y to a set of k independent explanatory variables {x1, x2 ,…, xk} using a function. When the relationship is not exact, an error term e is introduced.

$latex y = f(x_1,x_2,\ldots,x_k) + e \quad\quad (1) &s=1$

If the function f is not linear, the above model is referred to as a Non-Linear Regression Model. If f is linear, equation (1) is expressed as a linear combination of the independent variables x1, x2, …, xk weighted by the unknown parameter vector θ = {θ1, θ2, …, θk} that we wish to estimate.

$latex y = x_1 \theta_1 + x_2 \theta_2 + \ldots + x_k \theta_k + e \quad\quad (2) &s=1$

Equation (2) is referred to as the Linear Regression Model. When N such observations are made, each observation has its own error term:

$latex y_i = x_{1i} \theta_1 + x_{2i} \theta_2 + \ldots + x_{ki} \theta_k + e_i , \left(i=1,2,\ldots,N \right) \quad (3) &s=1$

where,
yi – response variable
x1i, x2i, …, xki – independent variables, which are known and collected in the observation matrix X with rank k
θ1, θ2, …, θk – set of parameters to be estimated
ei – disturbances/measurement errors, modeled as a noise vector e with PDF N(0, σ²I)

It is convenient to express all the variables in matrix form when N observations are made.

$latex y=\begin{bmatrix} y_1\\ \vdots \\ y_N \end{bmatrix} ,\; X=\begin{bmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ \vdots &\vdots & \ddots & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{kN} \end{bmatrix} ,\; \theta =\begin{bmatrix} \theta_1\\ \vdots \\ \theta_k \end{bmatrix} ,\; e=\begin{bmatrix} e_1\\ \vdots \\ e_N \end{bmatrix} \quad (4) &s=1$

Expressing equation (3) using the notation in (4),

$latex y = X \theta + e \quad\quad (5) &s=1$

Except for X, which is an N ⨉ k matrix, all other variables are column vectors.
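As a small illustration of equations (3) to (5), the sketch below builds y, X, θ and e for a made-up model with k = 3 explanatory variables and N = 6 observations, and checks that the matrix form reproduces the row-by-row sums. All numerical values (the parameters, the regressors and the noise level) are assumptions chosen only for illustration.

```matlab
% Hypothetical example: N = 6 observations, k = 3 explanatory variables
N = 6; k = 3;
theta = [2; -1; 0.5];                 % assumed "true" parameters (k x 1)
X = [ones(N,1) (1:N)' ((1:N)').^2];   % N x k matrix, row i holds [x_1i x_2i x_3i]
e = 0.1*randn(N,1);                   % assumed noise vector (N x 1)

% Matrix form, equation (5)
y = X*theta + e;

% Row-by-row form, equation (3), for comparison
y_rows = zeros(N,1);
for i = 1:N
    y_rows(i) = X(i,1)*theta(1) + X(i,2)*theta(2) + X(i,3)*theta(3) + e(i);
end

disp(size(X));                        % N rows, k columns
disp(max(abs(y - y_rows)));           % close to zero: both forms agree
```

Each row of X holds one observation of all k explanatory variables, so adding observations adds rows while the length of θ stays fixed at k.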

Ordinary Least Squares Estimation (OLS)

In OLS, all errors are given equal weight, as opposed to Weighted Least Squares, where some errors are considered more significant than others.

If $latex \hat{\theta}$ is a k ⨉ 1 vector of estimates of θ, then the estimated model can be written as

$latex y = X \hat{\theta} + e \quad\quad(6) &s=1$

Thus the error vector e can be computed from the observed data vector y and the estimated $latex \hat{\theta}$ as

$latex e = y-X \hat{\theta} \quad\quad (7) &s=1$

Here, the errors are assumed to follow a multivariate normal distribution with zero mean and covariance σ²I, that is, each error has variance σ².

To determine the least squares estimator, we write the sum of squares of the residuals (as a function of $latex \hat{\theta}$ ) as

$latex \begin{aligned} S(\hat{\theta})&=\sum e^2_i = e^Te=(y-X\hat{\theta})^T(y-X\hat{\theta})\\ &=y^Ty-y^T X \hat{\theta} -\hat{\theta}^TX^Ty + \hat{\theta}^TX^TX\hat{\theta} \end{aligned} \quad (8) &s=1$

The least squares estimator is obtained by minimizing $latex S(\hat{\theta})$. In order to get the estimate that gives the least square error, differentiate with respect to $latex \hat{\theta}$ and equate to zero.

$latex \begin{aligned} \frac{\partial S}{\partial \hat{\theta}}&= -2X^Ty+2X^TX\hat{\theta} = 0\\ &\Rightarrow \hat{\theta} = \left (X^TX \right )^{-1}X^Ty \end{aligned}\quad (9) &s=1$
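
The step from (8) to (9) can be spelled out as follows, assuming real-valued data so that T is the ordinary transpose. Since $latex y^TX\hat{\theta}$ is a scalar, it equals its own transpose $latex \hat{\theta}^TX^Ty$, so the two middle terms of (8) combine:

$latex S(\hat{\theta}) = y^Ty - 2\hat{\theta}^TX^Ty + \hat{\theta}^TX^TX\hat{\theta} &s=1$

Applying the standard vector-derivative identities $latex \frac{\partial}{\partial \theta}\left(\theta^T a\right) = a$ and $latex \frac{\partial}{\partial \theta}\left(\theta^T A \theta\right) = 2A\theta$ for symmetric A, with $latex a = X^Ty$ and $latex A = X^TX$, gives the gradient $latex -2X^Ty+2X^TX\hat{\theta}$. The second derivative $latex 2X^TX$ is positive definite when X has rank k, so the stationary point is indeed a minimum.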

Thus, the least squares estimate of θ is given by

$latex \boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty } &s=1$

where the superscript T denotes the transpose. For complex-valued data it is replaced by the Hermitian transpose (conjugate transpose).
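
A minimal Matlab sketch of the boxed formula on made-up real-valued data is given below. It computes the estimate from the normal equations, compares it with the backslash operator and the pseudo-inverse, and checks that the residual is orthogonal to the columns of X. The data, noise level and variable names are assumptions for illustration only.

```matlab
% Hypothetical data: N = 50 observations, k = 2 parameters
N = 50;
x = linspace(0, 1, N)';
X = [ones(N,1) x];                   % N x k observation matrix
theta_true = [1; 3];                 % assumed parameters used to simulate y
y = X*theta_true + 0.2*randn(N,1);   % observations with an assumed noise level

theta_ne  = (X'*X) \ (X'*y);         % normal equations, as in the boxed formula
theta_bs  = X \ y;                   % backslash: least squares solution of X*theta = y
theta_pin = pinv(X)*y;               % pseudo-inverse, same result for full-rank X

e = y - X*theta_ne;                  % residual vector, equation (7)
S = e'*e;                            % sum of squared residuals, equation (8)

disp([theta_ne theta_bs theta_pin]); % the three columns should agree
disp(norm(X'*e));                    % near zero: residual is orthogonal to the columns of X
```

The check on X'*e is equation (9) rearranged as X^T(y − Xθ̂) = 0. In practice X\y is usually preferred over forming the inverse of X'*X explicitly, because the condition number of X'*X is the square of that of X.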

Summary of computations

  1. Step 1: Choice of variables. Choose the variable to be explained (y) and the explanatory variables { x1, x2 ,…, xk }, where x1 is often (optionally) set to a constant that always takes the value 1, which incorporates a DC component (intercept) in the model.
  2. Step 2: Collect data. Collect N observations of y for a set of known values of { x1, x2 ,…, xk }. Example: { x1, x2 ,…, xk } is the pilot data in OFDM, from which we would like to estimate the channel impulse response θ, and y is the received vector of samples. Store the observed data y in an N⨉1 vector and the data on the explanatory variables in the N⨉k matrix X.
  3. Step 3: Compute the estimates. Compute the least squares estimates by the formula
    $latex \boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty } &s=1$

The superscript T indicates the transpose operation. For complex-valued data, such as the pilot example in Step 2, it is the Hermitian transpose (conjugate transpose).
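
The three steps above can be sketched for complex-valued data as follows. This is a simplified, hypothetical pilot-based channel estimation in the time domain, not a full OFDM chain: the number of taps, the pilot sequence, the channel values and the noise level are all assumptions made for illustration. Note that the `'` operator in Matlab is the conjugate (Hermitian) transpose, which is what the formula needs here.

```matlab
% Step 1: choice of variables (hypothetical setup, not a full OFDM chain)
k  = 3;                                   % assumed number of channel taps
Np = 64;                                  % assumed number of pilot symbols
h  = [0.9+0.1i; 0.3-0.4i; -0.1+0.2i];     % assumed channel, used only to simulate y

% Step 2: collect data
p = (sign(randn(Np,1)) + 1i*sign(randn(Np,1)))/sqrt(2);  % known QPSK-like pilots
X = toeplitz([p; zeros(k-1,1)], [p(1) zeros(1,k-1)]);    % N x k convolution matrix
N = size(X,1);                                           % N = Np + k - 1
w = 0.05*(randn(N,1) + 1i*randn(N,1));                   % assumed complex noise
y = X*h + w;                                             % received samples (N x 1)

% Step 3: compute the estimate; ' is the conjugate transpose
h_hat = (X'*X) \ (X'*y);

disp([h h_hat]);                          % estimate should be close to the assumed channel
```

Using `.'` (plain transpose) instead of `'` would not give the least squares solution for complex data, which is why the Hermitian transpose matters in this case.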

Key Points

  • We do not need a probabilistic assumption but only a deterministic signal model.
  • As a result, it has a broader range of applications than estimators that rely on a statistical model.
  • The least squares estimator is unbiased when the errors have zero mean.
  • The disturbance variance can be estimated from the residuals (k parameters to estimate and N observations available):
    $latex \hat{\sigma}^2 = \frac{e^Te}{N-k} &s=1$
  • To keep the variance low, the number of observations must be greater than the number of parameters to estimate.
  • The observation matrix X should have full column rank (rank k), that is, linearly independent columns, which is usually the case with real data. This ensures that (XTX) is invertible.
  • Least Squares Estimator can be used in block processing mode with overlapping segments – similar to Welch’s method of PSD estimation.
  • Useful in time-frequency analysis.
  • Adaptive filters are utilized for non-stationary applications.
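
Some of the points above (the rank condition, the variance estimate and unbiasedness) can be checked numerically. The sketch below does so for a straight-line model. The true parameters, the noise standard deviation and the number of Monte Carlo trials are assumptions chosen for illustration, and the printed values vary from run to run.

```matlab
% Hypothetical setup: straight-line model with k = 2 parameters
N = 200; k = 2;
x = linspace(-5, 5, N)';
X = [ones(N,1) x];
theta = [5.3; 1.2];                  % assumed true parameters
sigma = 2;                           % assumed noise standard deviation

% Full column rank guarantees that X'*X is invertible
assert(rank(X) == k, 'X must have rank k');

% Single experiment: estimate theta and the disturbance variance
y = X*theta + sigma*randn(N,1);
theta_hat = X \ y;
e = y - X*theta_hat;                 % residuals
sigma2_hat = (e'*e)/(N - k);         % should be near sigma^2 = 4

% Monte Carlo check of unbiasedness: average the estimate over many noise draws
M = 2000;                            % assumed number of trials
Theta_hat = zeros(k, M);
for m = 1:M
    Theta_hat(:,m) = X \ (X*theta + sigma*randn(N,1));
end
theta_mean = mean(Theta_hat, 2);     % should be close to theta

disp(sigma2_hat);
disp([theta theta_mean]);
```

Dividing by N − k rather than N accounts for the k degrees of freedom used up by the estimated parameters.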

LSE applied to curve fitting

A Matlab snippet that implements the least squares estimate to fit a curve is given below.

x = -5:.1:5;                     % known explanatory variable values
y = 5.3 + 1.2*x;                 % straight line without noise
e = randn(size(y));              % zero-mean, unit-variance Gaussian noise
y = y + e;                       % noisy observations
% Linear model: y = X*a + e, where a holds the parameters to be estimated

X = [ones(length(x),1) x'];      % first column is all ones since x_1 = 1
y = y';                          % column vector for proper dimensions in the multiplication
a = (X'*X)\(X'*y)                % least squares estimator; equivalent to a = X\y
plot(x, y, 'o');                 % observed data
hold on;
plot(x, a(1) + a(2)*x, 'r-');    % fitted line
legend('observed samples', ['y=' num2str(a(1)) '+' num2str(a(2)) 'x']);
title('Least Squares Estimate for Curve Fitting');
xlabel('X values');
ylabel('Y values');

Simulation Results

Figure 1: Least Squares Estimate for Curve Fitting
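
The same estimator also fits curves that are not straight lines, as long as the model stays linear in the parameters. As a hedged sketch, the example below fits a quadratic by adding an x² column to X and compares the result with Matlab's `polyfit`. The coefficients in `a_true` and the unit noise variance are assumptions made for illustration.

```matlab
x = (-5:.1:5)';                      % known explanatory values (column vector)
a_true = [5.3; 1.2; -0.4];           % assumed coefficients: constant, linear, quadratic
X = [ones(size(x)) x x.^2];          % columns 1, x, x^2: nonlinear in x, linear in a
y = X*a_true + randn(size(x));       % noisy observations, unit-variance noise assumed

a = X \ y;                           % least squares estimate of [a1; a2; a3]
p = polyfit(x, y, 2);                % polynomial fit, highest power first

disp([a flipud(p(:))]);              % the two columns should agree
```

The fitted curve can be overlaid on the data with `plot(x, y, 'o'); hold on; plot(x, X*a, 'r-');`, in the same way as the straight-line example.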


4 thoughts on “Linear Models – Least Squares Estimator (LSE)”

  1. can u please tell me how to do same estimation of parameter in linear model using Maximum likelihood? as soon as possible…in MLE u have solved only x=A+wn but I want to know for x = H*s(n)+w

  2. Hello Sir

    I want to do channel equalization and I am using the zero forcing equalizer.

    I am using this code.

    enbtx=dlmread('input.txt');
    uerx_cap=dlmread('output.txt');
    enbtx=enbtx(:,1)+1i*enbtx(:,2);
    enbtx_norm=enbtx/max(abs(enbtx));
    uerx_cap=uerx_cap(:,1)+1i*uerx_cap(:,2);
    uerx_cap_norm=uerx_cap/max(abs(uerx_cap));
    x=enbtx_norm; % I/P
    y=uerx_cap_norm; % o/p
    X=fft(x,);
    Y=fft(y,);
    H=Y*pinv(X); % channel estimation
    H_zf=pinv(H); % making 1/H(z)

    As channel is estimated then I take new data which is passed by the same channel

    z is the new data taken

    Z=fft(z);
    Y_eq=H_zf*Y;
    y_eq=ifft(Y_eq);

    But for the new input output the equalizer is not working
    Kindly help me, I am stuck in it.

    With warm regards
