The Mean Square Error – Why do we use it for estimation problems

“Mean Square Error”, abbreviated as MSE, is a ubiquitous term found in texts on estimation theory. Have you ever wondered what this term actually means and why it is used so often in estimation theory?

Any communication system has a transmitter, a channel or medium, and a receiver. Given the channel impulse response and the channel noise, the goal of the receiver is to decipher what was sent by the transmitter. A simple channel is usually characterized by a channel response – \( h \) – and an additive noise term – \( n \). In the time domain, this can be written as

$$ y = h \circledast x + n $$

Here, \( \circledast \) denotes the convolution operation. Equivalently, convolution in the time domain becomes multiplication in the frequency domain (and vice-versa):

$$ Y = HX + N $$
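As a quick numerical sanity check of this time–frequency duality, here is a minimal sketch (assuming NumPy is available; the signal and filter values below are arbitrary illustrations):

```python
import numpy as np

# Arbitrary illustrative transmit samples and channel impulse response
x = np.random.randn(16)          # transmitted samples
h = np.array([0.9, 0.4, 0.1])    # short channel impulse response

y = np.convolve(h, x)            # y = h (*) x, noise-free time-domain output

# Linear convolution <-> multiplication of DFTs (with sufficient zero-padding)
L = len(x) + len(h) - 1
Y = np.fft.fft(y, L)
HX = np.fft.fft(h, L) * np.fft.fft(x, L)

print(np.allclose(Y, HX))        # True: Y = H X in the frequency domain
```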

Remember!!! Capitalized letters indicate the frequency domain representation and lowercase letters the time domain representation. The frequency domain equation looks simple and the only spoiler is the noise term. The receiver receives the information over the channel, corrupted by noise, and tries to decipher what was sent from the transmitter – \(X\). If the noise is cancelled out at the receiver, \(N = 0\), the observed spectrum at the receiver will look like

$$ Y = HX $$

Now, to know \( X \), the receiver has to know \(H\). Then it can simply divide the observed spectrum \(Y\) by the channel frequency response \(H\) to get \(X\). Unfortunately, things are not that easy. Removing the noise from the received samples/spectrum is the hardest part. Complete nullification of the noise at the receiver is practically impossible, and much of communication system design revolves around reducing this noise to a level that still yields acceptable performance.

Given the noise term, how do we find \(H\) from the observed/received spectrum \(Y\)? This is a classical estimation problem.

Usually a known sequence (a pilot sequence in OFDM, a training sequence in GSM, etc.) is sent across the channel, and from the received version the channel frequency response \(H\) or the impulse response \(h\) is estimated. This estimated channel response is then used to decipher the transmitted spectrum/sequence when receiving the actual data. This type of estimation is useful only if the channel response remains constant across the frequency band of interest (the channel is flat across the band of interest – “flat fading”). A sketch of this idea is given below.
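As a rough sketch of the idea (not tied to any particular standard), a flat-fading channel can be estimated by dividing the received pilot spectrum by the known transmitted pilot; all numerical values below are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 64                                                    # number of pilot subcarriers (illustrative)
X_pilot = np.exp(1j * np.pi/2 * rng.integers(0, 4, N))    # known QPSK pilot symbols
H_true = 0.8 * np.exp(1j * 0.3)                           # flat channel: same complex gain on every subcarrier
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * 0.05

Y = H_true * X_pilot + noise                              # received pilot spectrum: Y = HX + N

H_est = np.mean(Y / X_pilot)                              # per-pilot estimates, averaged across the band
print(H_true, H_est)                                      # the estimate is close to the true gain
```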

Okay !!! To estimate \(H\) in the presence of noise, we need some metric to quantify the accuracy of the estimation.

Line equations:

Consider a generic line equation \(y = mx+c\), where \(m\) is the slope of the line and \(c\) is the intercept; both \(m\) and \(c\) are constants. Since the highest degree of \(x\) is one, this is called a linear equation. If the equation looks like \(y = mx^2+kx+c\), the highest degree of \(x\) is 2 and it becomes a quadratic equation. Similarly, a highest-degree term \(x^3\) gives rise to the name cubic equation, \(x^4\) to quartic, and so on.

Naming a univariate polynomial equation by the highest degree of \(x\):

Degree 1 – Linear
Degree 2 – Quadratic
Degree 3 – Cubic
Degree 4 – Quartic
Degree 5 – Quintic

Coming back to the linear equation \(y = mx + c\): to simplify things, let’s assign \(m=2\) and \(c=0\) and generate values for \(y\) by varying \(x\) from 0 to 9 in steps of 1.

In the above equation, the input \(x\) gets transformed into output \(y\) by the transformation \(y = mx\). This can be considered analogous to a communication system in frequency domain, where the input \(X\) is transmitted, it gets transformed by a channel \(H\) and gives the output \(Y\).

$$ Y = HX $$

Frequency domain is considered because it has the same structure as the linear equation, whereas, in time domain the output of the channel is the convolution of the channel impulse response \(h\) and the input \(x\).

We can now consider that the channel impulse response in frequency domain \(H\) is equal to the constant \(m\) (flat fading assumption).

To make the channel look closer to a real one, we will add Additive White Gaussian Noise (AWGN) to the channel.

$$Y = HX + N $$

To represent this scenario in our line-fitting problem, the noise is represented as a set of uniformly distributed random numbers – ‘\(n\)’ – added to the ideal output. We call the result the “observed data”.

$$ y_1 = mx + n $$

Note: The term \(n\) above is not a constant but a random variable, whereas the term \( c \) is a constant (it can be thought of as a DC bias in the observed data, if present). I have generated the following table for illustration. For convenience, and to illustrate the importance of the MSE term, the noise values in the table are not drawn from a uniform set of random numbers; instead, they are chosen deliberately so that the total error term is zero.

The first column is the input \( x \); the second column is the ideal (actual) output \( y \) that follows the equation \( y = mx + c \) with \( c = 0 \); the third column is the noise term; and the fourth column contains the observed samples at the receiver after the ideal samples are corrupted by the noise, i.e., it represents the equation \( y_1 = mx + n \).

Now, our job is to estimate the constant \( m \) in the presence of noise. Can you think of a possible metric which could aid in this estimation problem? Given that known data \( x \) is transmitted, the obvious choice is to measure the average error between the observed data and the actual data and to use a brute-force search for \( m \): plug in various values for \(m\) and choose the one that gives the minimum error.

Selecting the “error” as a metric seems to be a simple and attractive approach, but there is a basic flaw in it. Consider the fifth column in the table, which measures the error between the observed and actual data. The noise terms in the third column were chosen such that the measured average error becomes zero. Even though the average error is zero, it is obvious that the observed data is far from the ideal one. This is the big drawback of the plain error metric: the positive and the negative errors cancel out. This can happen in a real scenario too, where the errors across the samples of observed data can cancel each other.

To circumvent this problem, let’s square the error terms (sixth column) and average them out. This metric is called the Mean Squared Error. Now, no matter what the sign of the error is, the squaring operation always maps it to a positive contribution, so the errors can no longer cancel each other. An estimation approach that attempts to minimize the mean square error is called a Minimum Mean Square Error (MMSE) estimator. A small sketch of this metric in action follows.
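The following sketch mirrors the discussion above: the noise values are hand-picked so that they sum to zero (as in the table), and the brute-force grid of candidate slopes is an arbitrary choice:

```python
import numpy as np

m_true = 2
x = np.arange(10)                    # x = 0, 1, ..., 9
y_ideal = m_true * x                 # ideal output, y = m x (with c = 0)

# Hand-picked noise whose entries cancel out (sum is zero)
n = np.array([3, -3, 5, -5, 2, -2, 4, -4, 1, -1])
y1 = y_ideal + n                     # observed data, y1 = m x + n

err = y_ideal - y1
print(np.mean(err))                  # 0.0  -> average error hides the corruption
print(np.mean(err**2))               # 11.0 -> mean square error exposes it

# Brute-force search: pick the slope that minimizes the MSE against observations
candidates = np.arange(0.0, 4.05, 0.05)
mse = [np.mean((y1 - m * x)**2) for m in candidates]
m_hat = candidates[np.argmin(mse)]
print(m_hat)                         # close to the true slope of 2
```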

I hope this text has helped in understanding the logic behind using the Mean Square Error as a metric for estimation problems. Comments/suggestions for improvements are welcome. The next post will focus on the Ordinary Least Squares (OLS) algorithm (which uses the mean square error metric) applied to a straight-line fitting problem.

Related Posts

[1]An Introduction to Estimation Theory
[2]Bias of an Estimator
[3]Minimum Variance Unbiased Estimators (MVUE)
[4]Maximum Likelihood Estimation
[5]Maximum Likelihood Decoding
[6]Probability and Random Process
[7]Likelihood Function and Maximum Likelihood Estimation (MLE)
[8]Score, Fisher Information and Estimator Sensitivity
[9]Introduction to Cramer Rao Lower Bound (CRLB)
[10]Cramer Rao Lower Bound for Scalar Parameter Estimation
[11]Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12]Efficient Estimators and CRLB
[13]Cramer Rao Lower Bound for Phase Estimation
[14]Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15]Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16]The Mean Square Error – Why do we use it for estimation problems
[17]How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18]Essential Preliminary Matrix Algebra for Signal Processing
[19]Why Cholesky Decomposition ? A sample case:
[20]Tests for Positive Definiteness of a Matrix
[21]Solving a Triangular Matrix using Forward & Backward Substitution
[22]Cholesky Factorization - Matlab and Python
[23]LTI system models for random signals – AR, MA and ARMA models
[24]Comparing AR and ARMA model - minimization of squared error
[25]Yule Walker Estimation
[26]AutoCorrelation (Correlogram) and persistence – Time series analysis
[27]Linear Models - Least Squares Estimator (LSE)
[28]Best Linear Unbiased Estimator (BLUE)

Normalized CRLB – an alternate form of CRLB

Key focus: Normalized CRLB (Cramér-Rao Lower bound) is an alternate form of CRLB. Let’s explore how normalized CRLB is related to estimator sensitivity.

The variance of an unbiased estimate is always greater than or equal to the Cramér-Rao Lower Bound of the estimate. The CRLB is in turn given by the inverse of the Fisher Information. The following equation concisely summarizes this point.

$$ var(\hat{\theta}) \geq CRLB = \frac{1}{I(\theta)} $$

The Fisher Information can be re-written as

$$ I(\theta) = E\left[ \left( \frac{\partial \; ln \; p(\mathbf{x};\theta)}{\partial \theta} \right)^2 \right] $$

Thus the variance of the estimate can be written as

$$ var(\hat{\theta}) \geq \frac{1}{E\left[ \left( \frac{\partial \; ln \; p(\mathbf{x};\theta)}{\partial \theta} \right)^2 \right]} $$

Consider an incremental change in \(\theta\), that is, \(\theta \rightarrow \theta + \Delta\theta\). This causes the PDF to change from \(p(\mathbf{x};\theta)\) to \(p(\mathbf{x};\theta+\Delta\theta)\). We wish to answer the following question: how sensitive is \(p(\mathbf{x};\theta)\) to that change? Sensitivity (denoted by \(\tilde{S}\)) is given by the ratio of the relative change in \(p(\mathbf{x};\theta)\) to the relative change in \(\theta\).

$$ \tilde{S} = \frac{\Delta p(\mathbf{x};\theta) / p(\mathbf{x};\theta)}{\Delta\theta / \theta} $$

Letting \(\Delta\theta \rightarrow 0\),

$$ \tilde{S} = \frac{\partial p(\mathbf{x};\theta) / p(\mathbf{x};\theta)}{\partial\theta / \theta} $$

From calculus, \( \partial \; ln \; p(\mathbf{x};\theta) = \partial p(\mathbf{x};\theta)/p(\mathbf{x};\theta) \) and \( \partial \; ln \; \theta = \partial\theta/\theta \). Thus the sensitivity is given by

$$ \tilde{S} = \frac{\partial \; ln \; p(\mathbf{x};\theta)}{\partial \; ln \; \theta} = \theta \; \frac{\partial \; ln \; p(\mathbf{x};\theta)}{\partial \theta} $$

The variance of the estimate can now be put in the following form.

$$ \frac{var(\hat{\theta})}{\theta^2} \geq \frac{1}{E\left[ \tilde{S}^2 \right]} $$

The above expression is the normalized version of the CRLB. It can be interpreted as saying that the normalized CRLB is equal to the inverse of the mean square sensitivity.


Cramer Rao Lower Bound for Phase Estimation

Key focus: Derive the Cramer-Rao lower bound for phase estimation applied to DSB transmission. Find out if an efficient estimator actually exists for phase estimation.

Problem formulation

Consider the DSB carrier frequency estimation problem given in the introductory chapter on estimation theory. A message is sent across a channel, modulated by a sinusoidal carrier with carrier frequency \(f_c\) and amplitude \(A\). The transmitted signal gets affected by zero-mean AWGN noise as it travels across the medium. The receiver receives the signal and digitizes it for further processing.

To recover the message at the receiver, one has to know every detail of the sinusoid:
1) Amplitude – \(A\)
2) Carrier frequency – \(f_c\)
3) Any uncertainty in its phase – \(\phi_c\)

Given a set of digitized samples \(x[n]\), and assuming that both the amplitude and the carrier frequency are known, we are tasked with estimating the phase of the embedded sinusoid (cosine wave). To analyze this scenario we need a model to begin with.

The digitized samples at the receiver are modeled as

$$ x[n] = A \; cos(2 \pi f_c n + \phi_c) + w[n], \;\;\; n = 0,1,\cdots,N-1 $$

Here \(A\) and \(f_c\) are assumed to be known, and \(w[n]\) is AWGN noise with mean \(\mu=0\) and variance \(\sigma^2\). We will use the CRLB and try to find an efficient estimator for the phase component.

CRLB for Phase Estimation:

As a pre-requisite to this article, readers are advised to go through the previous article on “Steps to find CRLB”

In order to derive the CRLB, we need a PDF (Probability Density Function) to begin with. Since the underlying noise is modeled as AWGN with mean \(\mu=0\) and variance \(\sigma^2\), the PDF of the observed samples is a multivariate Gaussian distribution function:

$$ p(\mathbf{x};\phi_c) = \frac{1}{\left( 2 \pi \sigma^2 \right)^{N/2}} exp\left[ -\frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \; cos(2 \pi f_c n + \phi_c) \right)^2 \right] $$

Here the mean of each observed sample \(x[n]\) is the noise-free sinusoid \(A \; cos(2 \pi f_c n + \phi_c)\).

Since the observed samples \(x[n]\) are fixed in the above equation, we will use the likelihood notation instead of the PDF notation; that is, \(p(\mathbf{x};\phi_c)\) is rewritten as the likelihood \(L(\mathbf{x};\phi_c)\). The log likelihood function is given by

$$ ln \; L(\mathbf{x};\phi_c) = -\frac{N}{2} ln \left( 2 \pi \sigma^2 \right) -\frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \; cos(2 \pi f_c n + \phi_c) \right)^2 $$

For simplicity, we will denote \(\phi_c\) as \(\phi\). Next, take the first partial derivative of the log likelihood function with respect to \(\phi\):

$$ \frac{\partial \; ln \; L(\mathbf{x};\phi)}{\partial \phi} = -\frac{A}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \; cos(2 \pi f_c n + \phi) \right) sin(2 \pi f_c n + \phi) $$

Taking the second partial derivative of the log likelihood function,

$$ \frac{\partial^2 \; ln \; L(\mathbf{x};\phi)}{\partial \phi^2} = -\frac{A}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] \; cos(2 \pi f_c n + \phi) - A \; cos\left( 2(2 \pi f_c n + \phi) \right) \right) $$

Since the above term still depends on the observed samples \(x[n]\), take the expectation of the entire equation to average out the variations. Using \(E\left[x[n]\right] = A \; cos(2 \pi f_c n + \phi)\) and the fact that \(\sum_{n} cos\left( 2(2 \pi f_c n + \phi) \right) \approx 0\) when \(f_c\) is not close to \(0\) or \(1/2\),

$$ E\left[ \frac{\partial^2 \; ln \; L(\mathbf{x};\phi)}{\partial \phi^2} \right] \approx -\frac{N A^2}{2 \sigma^2} $$

Let’s now derive the Fisher Information and the CRLB, and find out whether an efficient estimator can be found from these equations.

Fisher Information:

The Fisher Information for the given problem is

$$ I(\phi) = -E\left[ \frac{\partial^2 \; ln \; L(\mathbf{x};\phi)}{\partial \phi^2} \right] \approx \frac{N A^2}{2 \sigma^2} $$

Cramer Rao Lower Bound:

The CRLB is the reciprocal of the Fisher Information.

$$ CRLB = \frac{1}{I(\phi)} \approx \frac{2 \sigma^2}{N A^2} $$

The variance of any estimator estimating the phase of the carrier for the given problem will always be higher than this CRLB. That is,

$$ var(\hat{\phi}) \geq \frac{2 \sigma^2}{N A^2} $$

As we can see from the above result, the lower bound on the variance of the estimate decreases as the number of samples \(N\) increases, and a good phase estimator attains this bound only asymptotically, as \(N \rightarrow \infty\). Such estimators are called Asymptotically Efficient Estimators.

Figure 1: Asymptotically Efficient Estimator and the Cramer-Rao lower bound
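As a rough illustration (all numerical values below are arbitrary choices), the sketch forms the maximum-likelihood phase estimate from quadrature correlations and compares its empirical variance with the bound derived above; the variance should close in on the CRLB as \(N\) grows:

```python
import numpy as np

rng = np.random.default_rng(7)

A, fc, phi_true = 1.0, 0.08, 0.6        # amplitude, carrier frequency, true phase (illustrative)
sigma2 = 0.5                             # noise variance (illustrative)
trials = 5000

for N in (20, 100, 500):
    n = np.arange(N)
    crlb = 2 * sigma2 / (N * A**2)       # CRLB for the phase, as derived above
    phi_hat = np.empty(trials)
    for t in range(trials):
        x = A * np.cos(2*np.pi*fc*n + phi_true) + np.sqrt(sigma2) * rng.standard_normal(N)
        # ML phase estimate via correlation with in-phase and quadrature carriers
        I = np.sum(x * np.cos(2*np.pi*fc*n))
        Q = np.sum(x * np.sin(2*np.pi*fc*n))
        phi_hat[t] = np.arctan2(-Q, I)
    print(N, phi_hat.var(), crlb)        # empirical variance alongside the theoretical CRLB
```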

An efficient estimator exists if and only if the first partial derivative of the log likelihood function can be written in the form

$$ \frac{\partial \; ln \; L(\mathbf{x};\phi)}{\partial \phi} = I(\phi) \left( g(\mathbf{x}) - \phi \right) $$

Re-writing our earlier result,

$$ \frac{\partial \; ln \; L(\mathbf{x};\phi)}{\partial \phi} = -\frac{A}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \; cos(2 \pi f_c n + \phi) \right) sin(2 \pi f_c n + \phi) $$

We can clearly see that the above two equations do not have the same form. Thus, an efficient estimator does not exist for this problem.


For further reading

[1] Steven M. Kay, “Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory”, ISBN: 978-0133457117, Prentice Hall, Edition 1, 1993.↗


Efficient Estimators by applying CRLB

It has been reiterated that not all estimators are efficient; even Minimum Variance Unbiased Estimators (MVUE) are not always efficient. Then how do we determine whether the estimator we designed is efficient or not?

An efficient estimator is defined as the one that is
* Unbiased (mean of the estimate = true value of the parameter)
* Attains Cramer-Rao Lower Bound (CRLB).

How to Identify Efficient Estimators?

As mentioned in the previous article, the first partial derivative of the log likelihood function of the observed signal model may sometimes (not always) be written in the form

$$ \frac{\partial \; ln \; L(\mathbf{x};\theta)}{\partial \theta} = I(\theta) \left( g(\mathbf{x}) - \theta \right) $$

If the derivative can be written in the above form, then \(\hat{\theta} = g(\mathbf{x})\) is an efficient estimator.

Example:

In another previous article, the CRLB for an estimator of the DC component from a set of observed samples (affected by AWGN noise) was derived. The intermediate step that produced the above form for that scenario is

$$ \frac{\partial \; ln \; L(\mathbf{x};A)}{\partial A} = \frac{N}{\sigma^2} \left( \frac{1}{N} \sum_{n=0}^{N-1} x[n] - A \right) $$

From the above equation, it can be ascertained that an efficient estimator exists for this case and it is given by \( g(\mathbf{x}) = \frac{1}{N} \sum_{n=0}^{N-1} x[n] \). The efficient estimator is simply the sample mean of the observed samples.


Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)

It was mentioned in one of the earlier articles that the CRLB may provide a way to find an MVUE (Minimum Variance Unbiased Estimator).

Theorem:

There exists an unbiased estimator that attains the CRLB if and only if

$$ \frac{\partial \; ln \; L(\mathbf{x};\theta)}{\partial \theta} = I(\theta) \left( g(\mathbf{x}) - \theta \right) $$

Here \( ln \; L(\mathbf{x};\theta) \) is the log likelihood function of \(\mathbf{x}\) parameterized by \(\theta\) – the parameter to be estimated, \( I(\theta)\) is the Fisher Information and \( g(\mathbf{x})\) is some function of the observed samples.

Then, the estimator that attains the CRLB is given by

$$ \hat{\theta} = g(\mathbf{x}), \;\;\;\; var(\hat{\theta}) = \frac{1}{I(\theta)} $$

Steps to find MVUE using CRLB:

If we can write the first derivative of the log likelihood (as given above) in terms of the Fisher Information and some function \( g(\mathbf{x})\), then \(g(\mathbf{x})\) is a Minimum Variance Unbiased Estimator.
1) Given a signal model for \( \mathbf{x} \), compute \(\frac{\partial\;ln\;L(\mathbf{x};\theta) }{\partial \theta }\)
2) Check if the above computation can be put in the form given in the above theorem
3) Then \(g(\mathbf{x})\) gives an MVUE

Let’s look at how CRLB can be used to find an MVUE for a signal that has a DC component embedded in AWGN noise.

Finding a MVUE to estimate DC component embedded in noise:

Consider the signal model where a DC component \(A\) is embedded in AWGN noise with zero mean and variance \(\sigma^2\).
Our goal is to find an MVUE that can estimate the DC component from the observed samples \(x[n]\).

$$x[n] = A + w[n], \;\;\; n=0,1,2,\cdots,N-1 $$

We calculate the CRLB and see if it can help us find an MVUE.

From the previous derivation,

$$ \frac{\partial \; ln \; L(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right) = \frac{N}{\sigma^2} \left( \frac{1}{N}\sum_{n=0}^{N-1} x[n] - A \right) $$

From the above equation we can readily identify \( I(A)\) and \(g(\mathbf{x})\). Thus, the Fisher Information \(I(A)\) and the MVUE \(g(\mathbf{x})\) are given by

$$ I(A) = \frac{N}{\sigma^2}, \;\;\;\; g(\mathbf{x}) = \frac{1}{N}\sum_{n=0}^{N-1} x[n] $$

Thus for a signal model which has a DC component in AWGN, the sample mean of observed samples \(x[n]\) gives a Minimum Variance Unbiased Estimator to estimate the DC component.


Cramér-Rao Lower Bound (CRLB)-Scalar Parameter Estimation

Key focus: Discuss scalar parameter estimation using CRLB. Estimate DC component from observed data in the presence of AWGN noise.

Consider a set of observed data samples \( \mathbf{x} = \left\{ x[0], x[1], \cdots, x[N-1] \right\} \) and let \(\theta\) be the scalar parameter that is to be estimated from the observed samples. The accuracy of the estimate depends on how strongly the observed data is influenced by the parameter \(\theta\). The observed data is considered as random data whose PDF is influenced by \(\theta\); the PDF \( p(\mathbf{x};\theta) \) describes the dependence of \(\mathbf{x}\) on \(\theta\).

If the PDF depends weakly on \(\theta\), the estimates will be poor. If the PDF depends strongly on \(\theta\), the estimates will be good.

As seen in the previous section, the curvature of the likelihood function (the Fisher Information) is related to the concentration of the PDF: the greater the curvature, the more concentrated the PDF and the more accurate the estimates. The Fisher Information is calculated from the log likelihood function as

$$ I(\theta) = -E\left[ \frac{\partial^2 \; ln \; L(\mathbf{x};\theta)}{\partial \theta^2} \right] $$

under the regularity condition that the expected value of the score of the log likelihood function is zero,

$$ E\left[ \frac{\partial \; ln \; L(\mathbf{x};\theta)}{\partial \theta} \right] = 0 $$

The inverse of the Fisher Information gives the Cramér-Rao Lower Bound (CRLB):

$$ var(\hat{\theta}) \geq CRLB = \frac{1}{I(\theta)} $$

Theoretical method to find CRLB:

1) Given a model for the observed data samples – \( \mathbf{x} \), write the log likelihood function as a function of \(\theta\) – \( ln \; L(\mathbf{x};\theta) \)
2) Keep \( \mathbf{x} \) fixed and take the second partial derivative of the log likelihood function with respect to the parameter to be estimated – \( \frac{\partial^2 \; ln \; L(\mathbf{x};\theta)}{\partial \theta^2} \)

3) If the result depends on \( \mathbf{x} \), fix \(\theta\) and take the expected value with respect to \( \mathbf{x} \). This step can be skipped if the result does not depend on \( \mathbf{x} \).
4) If the result still depends on \(\theta\), evaluate it at the specific values of \(\theta\) that are of interest.
5) Take the reciprocal of the result and negate it.

Let’s see an example for scalar parameter estimation using CRLB.

Derivation of CRLB for an embedded DC component in AWGN Noise:

The observed samples are modeled as

$$ x[n] = A + w[n], \;\;\; n = 0,1,\cdots,N-1 $$

Here \(A\) is a constant DC value that has to be estimated from the observed data samples \(x[n]\), and \(w[n]\) is the AWGN noise with zero mean and variance \(\sigma^2\).

Given that the samples are affected by AWGN noise with zero mean and variance \(\sigma^2\), the likelihood function can be written as

$$ L(\mathbf{x};A) = \frac{1}{\left( 2 \pi \sigma^2 \right)^{N/2}} exp\left[ -\frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)^2 \right] $$

The log likelihood function is formed as

$$ ln \; L(\mathbf{x};A) = -\frac{N}{2} ln\left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)^2 $$

Taking the first partial derivative of the log likelihood function with respect to \(A\),

$$ \frac{\partial \; ln \; L(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right) $$

Computing the second partial derivative of the log likelihood function by differentiating one more time,

$$ \frac{\partial^2 \; ln \; L(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2} $$

The Fisher Information is given by taking the expectation and negating it (here the second derivative does not depend on the observed samples, so the expectation leaves it unchanged).

$$ I(A) = -E\left[ \frac{\partial^2 \; ln \; L(\mathbf{x};A)}{\partial A^2} \right] = \frac{N}{\sigma^2} $$

The Cramér-Rao Lower Bound is the reciprocal of the Fisher Information \(I(A)\):

$$ CRLB = \frac{1}{I(A)} = \frac{\sigma^2}{N} $$

The variance of any estimator that estimates the DC component from the given observed samples will always be greater than or equal to the CRLB. That is, the CRLB acts as the lower bound for the variance of the estimates. This can be conveniently represented as

$$ var(\hat{A}) \geq \frac{\sigma^2}{N} $$

Tweaking the CRLB:

Now that we have found an expression for the CRLB for the estimation of the DC component, we can look for factors that affect the CRLB. From the expression \(CRLB = \sigma^2 / N\), the following points can be inferred (a quick simulation illustrating points 2 and 3 is sketched below).

1) The CRLB does not depend on the parameter to be estimated (\(A\))
2) The CRLB increases linearly with the noise variance \(\sigma^2\)
3) The CRLB decreases inversely with the number of observed samples \(N\)
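A quick Monte Carlo sketch of points 2) and 3) above, using the sample mean as the estimator of the DC level (the DC value, noise variance and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

A = 1.0            # true DC level (arbitrary)
sigma2 = 2.0       # noise variance (arbitrary)
trials = 20000

for N in (10, 50, 100):
    x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
    A_hat = x.mean(axis=1)                 # sample-mean estimate for each trial
    print(N, A_hat.var(), sigma2 / N)      # empirical variance vs CRLB = sigma^2 / N
```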

For further reading

[1] Debrati et al,“A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems”, RADIOENGINEERING, VOL. 18, NO. 1, APRIL 2009.↗


Cramér-Rao Lower Bound: Introduction

Key concept: Cramér-Rao bound is the lower bound on variance of unbiased estimators that estimate deterministic parameters.

Introduction

The criterion for the existence of a Minimum Variance Unbiased Estimator (MVUE) was discussed in a previous article. To have an MVUE, it is necessary to have estimates that are unbiased and that give minimum variance (compared to the true parameter value). This is given by the unbiasedness condition \( E\left[ \hat{\theta} \right] = \theta \) together with the requirement that \( var(\hat{\theta}) \) be the minimum among all unbiased estimators.

For an MVUE, it is easy to verify the first criterion (unbiasedness), but verifying the second criterion (minimum variance) is tricky. We can calculate the variance of our estimator, but how can we be sure that it is “the minimum”? There may exist numerous other unbiased estimators (which we may not know about) that give lower variance. In other words, how do we make sure that our estimate is the best one out there? The Cramér-Rao Lower Bound (CRLB) may come to our rescue.

Cramér-Rao Lower Bound (CRLB):

Harald Cramér and Radhakrishna Rao derived a way to express the lower bound on the variance of unbiased estimators that estimate deterministic parameters. This lower bound is called the Cramér-Rao Lower Bound (CRLB).

If \(\hat{\theta}\) is an unbiased estimate of a deterministic parameter \(\theta\), then the relationship between the variance of the estimate, \( var(\hat{\theta}) \), and the CRLB can be expressed as

$$ var(\hat{\theta}) \geq CRLB = \frac{1}{I(\theta)} $$

The CRLB tells us the best (minimum) variance that we can expect to get from an unbiased estimator.

Applications of the CRLB include:

1) Making judgments on proposed estimators. Estimators whose variance is not close to the CRLB are considered inferior.
2) Feasibility studies of whether a particular estimator/system can meet given specifications. It is also used to rule out impossible estimators – no estimator can beat the CRLB (example: Figure 1).
3) A benchmark for comparing unbiased estimators.
4) It may sometimes provide an MVUE: if an unbiased estimator achieves the CRLB, it is an MVUE.

Figure 1: CRLB and the efficient estimator for phase estimation

Feasibility Studies :

Derivation of the CRLB for a particular scenario or proposed estimation algorithm is often found in research texts. The derived theoretical CRLB for a system/algorithm is compared with the actual variance of the implemented system and conclusions are drawn. For example, in the paper titled “A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems” [1], a frequency offset estimation algorithm was proposed for multi-band orthogonal frequency division multiplexing (MB-OFDM) systems. The performance of the algorithm was studied by BER analysis (Eb/N0 vs BER curves). Additionally, the estimator performance was further validated by comparing the simulated estimator variance with the derived theoretical CRLB for four UWB channel models.


Reference

[1] Debrati et al,“A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems”, RADIOENGINEERING, VOL. 18, NO. 1, APRIL 2009.↗


Score, Fisher Information and Estimator Sensitivity

As we have seen in the previous articles, the estimation of a parameter from a set of data samples depends strongly on the underlying PDF. The accuracy of the estimation is inversely proportional to the variance of the underlying PDF: the smaller the variance of the PDF, the higher the accuracy of estimation, and vice versa. In other words, the estimation accuracy depends on the sharpness of the PDF curve – the sharper the curve, the better the accuracy.

Gradient and score :

In geometry, given any curve, the gradient (also called slope) of the curve is zero at the maximum and minimum points of the curve. The gradient of a function (representing a curve) is calculated by its first derivative. The gradient of the log likelihood function is called the score and is used to find the Maximum Likelihood estimate of a parameter.

Figure: The gradient of log likelihood function is called score

Denoting the score as \(u(\theta)\),

$$ u(\theta) = \frac{\partial \; ln \; L(\mathbf{x};\theta)}{\partial \theta} $$

At the MLE point the gradient is zero. Thus equating the score to zero and solving for \(\theta\) gives the ML estimate of \(\theta\) (provided the log likelihood function is a concave curve).

Curvature and Fisher Information :

In geometry, the sharpness of a curve is measured by its curvature. The sharpness of a PDF curve is influenced by its variance: the larger the variance, the less the sharpness, and vice versa. The accuracy of the estimator is measured by the sharpness of the underlying PDF curve. In differential geometry, the curvature is related to the second derivative of a function.

The mean of the score, evaluated at the ML estimate (or the true value of the parameter) \(\theta\), is zero. This gives

$$ E\left[ u(\theta) \right] = E\left[ \frac{\partial \; ln \; L(\mathbf{x};\theta)}{\partial \theta} \right] = 0 $$

Under this regularity condition that the expectation of the score is zero, the variance of the score is called the Fisher Information. Equivalently, the negative expectation of the second derivative of the log likelihood function gives the Fisher Information. It measures the sharpness of the log likelihood function: the higher the Fisher Information, the sharper the curve, and vice versa. So if we can calculate the Fisher Information of a log likelihood function, we know more about the accuracy or sensitivity of the estimator with respect to the parameter to be estimated.

Figure 2: The variance of the score is called Fisher Information

The Fisher Information, denoted by \(I(\theta)\), is given by the variance of the score:

$$ I(\theta) = E\left[ u(\theta) \, u^{*}(\theta) \right] = -E\left[ \frac{\partial^2 \; ln \; L(\mathbf{x};\theta)}{\partial \theta^2} \right] $$

Here the \(*\) operator indicates the operation of taking the complex conjugate. The negative sign in the second form is introduced to bring the inverse relationship between variance and the Fisher Information (i.e., the Fisher Information will be high for log likelihood functions whose underlying PDF has low variance). As we can see from the above equation, the Fisher Information is related to the second derivative (the curvature or sharpness) of the log likelihood function. When the second derivative is evaluated from the observed data without taking the expectation, the resulting quantity is also called the Observed Fisher Information.
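To make the curvature interpretation concrete, the sketch below numerically estimates the curvature of the log likelihood of Gaussian samples (unknown mean, known variance) at its peak; for this model the Fisher Information is known to be \(N/\sigma^2\). All numerical values are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(42)

N, theta_true, sigma2 = 100, 3.0, 1.5
x = theta_true + np.sqrt(sigma2) * rng.standard_normal(N)

def log_likelihood(theta):
    # log of the Gaussian likelihood, dropping terms that do not depend on theta
    return -np.sum((x - theta) ** 2) / (2 * sigma2)

theta_ml = x.mean()          # ML estimate (the score is zero here)
h = 1e-3                     # finite-difference step

# Observed Fisher Information: negative second derivative of the log likelihood at the MLE
curvature = (log_likelihood(theta_ml + h) - 2 * log_likelihood(theta_ml)
             + log_likelihood(theta_ml - h)) / h**2
print(-curvature, N / sigma2)   # both are approximately 66.7
```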


For further reading

[1] Songfeng Zheng, “Fisher Information and Cramer-Rao Bound”, lecture notes, Statistical Theory II, Missouri State University.↗


Theoretical derivation of MLE for Gaussian Distribution:

As a pre-requisite, check out the previous article on the logic behind deriving the maximum likelihood estimator for a given PDF.

Let \( \mathbf{X}=(x_1,x_2,\cdots, x_N) \) be the samples taken from a Gaussian distribution given by

$$ f(x_i;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} exp\left( -\frac{(x_i-\mu)^2}{2 \sigma^2} \right) $$

Calculating the Likelihood

The likelihood is the product of the PDFs of the individual samples, and the log likelihood is its logarithm. Differentiating the log likelihood and equating it to zero to find the maximum (equivalently, equating the score to zero) yields the estimate. Thus the mean of the samples gives the MLE of the parameter \(\mu\); a sketch of the derivation is given below.
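A sketch of the standard derivation for the mean \(\mu\), assuming the variance \(\sigma^2\) is known:

$$ L(\mathbf{X};\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2 \pi \sigma^2}} exp\left( -\frac{(x_i-\mu)^2}{2 \sigma^2} \right) $$

$$ ln \; L(\mathbf{X};\mu) = -\frac{N}{2} ln\left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2 $$

$$ \frac{\partial \; ln \; L(\mathbf{X};\mu)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i - \mu) = 0 \;\; \Rightarrow \;\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i $$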

For the derivation of other PDFs see the following links
Theoretical derivation of Maximum Likelihood Estimator for Poisson PDF
Theoretical derivation of Maximum Likelihood Estimator for Exponential PDF


Theoretical derivation of MLE for Exponential Distribution:

As a pre-requisite, check out the previous article on the logic behind deriving the maximum likelihood estimator for a given PDF.

Let \( \mathbf{X}=(x_1,x_2,\cdots, x_N) \) be the samples taken from an Exponential distribution given by

$$ f(x_i;\lambda) = \lambda \; e^{-\lambda x_i}, \;\;\; x_i \geq 0 $$

Calculating the Likelihood

The log likelihood is the logarithm of the product of the individual sample PDFs. Differentiating it and equating to zero to find the maximum (equivalently, equating the score to zero) shows that the inverse of the mean of the samples gives the MLE of the parameter \(\lambda\); the derivation is sketched below.
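A sketch of the standard derivation, assuming the rate parameterization \( f(x;\lambda) = \lambda e^{-\lambda x} \) used above:

$$ ln \; L(\mathbf{X};\lambda) = \sum_{i=1}^{N} ln\left( \lambda \; e^{-\lambda x_i} \right) = N \; ln\,\lambda - \lambda \sum_{i=1}^{N} x_i $$

$$ \frac{\partial \; ln \; L(\mathbf{X};\lambda)}{\partial \lambda} = \frac{N}{\lambda} - \sum_{i=1}^{N} x_i = 0 \;\; \Rightarrow \;\; \hat{\lambda} = \frac{N}{\sum_{i=1}^{N} x_i} = \frac{1}{\bar{x}} $$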

For the derivation of other PDFs see the following links
Theoretical derivation of Maximum Likelihood Estimator for Poisson PDF
Theoretical derivation of Maximum Likelihood Estimator for Gaussian PDF
