Cramér-Rao Lower Bound (CRLB) for Scalar Parameter Estimation

Key focus: Discuss scalar parameter estimation using the CRLB. Estimate a DC component from observed data in the presence of AWGN.

Consider a set of observed data samples X = (x[0], x[1], …, x[N-1]) and let θ be the scalar parameter that is to be estimated from the observed samples. The accuracy of the estimate depends on how strongly the observed data is influenced by the parameter θ. The observed data is modeled as random, with a PDF that is influenced by θ; the PDF p(X;θ) describes the dependence of X on θ.

If the PDF depends only weakly on θ, then the estimates will be poor. If the PDF depends strongly on θ, then the estimates will be good.

As seen in the previous section, the curvature of the log likelihood function (the Fisher Information) is related to the concentration of the PDF: the greater the curvature, the more concentrated the PDF and the more accurate the estimates. The Fisher Information is calculated from the log likelihood function as

$latex I(\theta) = E\left[ \left( \frac{\partial \ln p(X;\theta)}{\partial \theta} \right)^2 \right]$

Under the regularity condition that the expected value of the score of the log likelihood function is zero,

$latex E\left[ \frac{\partial \ln p(X;\theta)}{\partial \theta} \right] = 0 \;\;\Rightarrow\;\; I(\theta) = -E\left[ \frac{\partial^2 \ln p(X;\theta)}{\partial \theta^2} \right]$

The inverse of the Fisher Information gives the Cramér-Rao Lower Bound (CRLB):

$latex CRLB = \frac{1}{I(\theta)} = \frac{1}{-E\left[ \dfrac{\partial^2 \ln p(X;\theta)}{\partial \theta^2} \right]}$

Theoretical method to find CRLB:

1) Given a model for the observed data samples, write the PDF p(x;θ) and form the log likelihood function as a function of θ – ln p(x;θ)
2) Keep x fixed and take the second partial derivative of the log likelihood function with respect to the parameter to be estimated – ∂²ln p(x;θ)/∂θ²
3) If the result still depends on x, fix θ and take the expected value with respect to x. This step can be skipped if the result does not depend on x.
4) If the result depends on θ, evaluate it at the specific value(s) of θ of interest.
5) Negate the result and take its reciprocal; this gives the CRLB.

Let’s work through an example of scalar parameter estimation using the CRLB.

Derivation of CRLB for an embedded DC component in AWGN:

The observed samples are modeled as

$latex x[n] = A + w[n], \quad n = 0,1,\cdots,N-1$

Here, A is the constant DC value that has to be estimated from the observed data samples x[n], and w[n] is the AWGN noise with zero mean and variance σ².

Given that the samples are affected by AWGN with zero mean and variance σ², the likelihood function can be written as

$latex p(\mathbf{x};A) = \frac{1}{\left( 2 \pi \sigma^2 \right)^{N/2}} \exp\left[ -\frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)^2 \right]$

The log likelihood function is formed as,

$latex \ln p(\mathbf{x};A) = -\frac{N}{2} \ln\left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)^2$

Taking the first partial derivative of the log likelihood function with respect to A,

$latex \frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)$

Computing the second partial derivative of the log likelihood function by differentiating one more time,

$latex \frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2}$

The Fisher Information is given by taking the expectation and negating it,

$latex I(A) = -E\left[ \frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} \right] = \frac{N}{\sigma^2}$

The Cramér-Rao Lower Bound is the reciprocal of the Fisher Information I(A),

$latex CRLB = \frac{1}{I(A)} = \frac{\sigma^2}{N}$

The variance of any unbiased estimator that estimates the DC component A from the given observed samples will always be greater than or equal to the CRLB. That is, the CRLB acts as a lower bound on the variance of the estimates. This can be conveniently represented as

$latex var(\hat{A}) \;\geq\; \frac{\sigma^2}{N}$
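To see the bound in action, here is a minimal Python sketch (in the spirit of the Matlab and Python demos later in this series) that simulates the sample-mean estimator of A over many independent records and compares its empirical variance with the CRLB σ²/N. The values of A, σ, N and the number of trials are arbitrary choices for the demonstration.

import numpy as np

# Monte-Carlo check: the variance of the sample-mean estimator of the DC level A
# in AWGN should be close to (and never below) the CRLB = sigma^2/N.
# A, sigma, N and n_trials are arbitrary values chosen for this demo.
np.random.seed(0)
A, sigma, N, n_trials = 1.3, 1.0, 50, 20000

x = A + sigma * np.random.randn(n_trials, N)   # n_trials independent data records
A_hat = x.mean(axis=1)                         # sample-mean estimate from each record

print("Empirical variance of A_hat :", A_hat.var())
print("CRLB = sigma^2 / N          :", sigma**2 / N)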

Tweaking the CRLB:

Now that we have found an expression for the CRLB for the estimation of the DC component, we can look for factors that affect it. From the expression CRLB = σ²/N, the following points can be inferred.

1) The CRLB does not depend on the parameter to be estimated (A)
2) The CRLB increases linearly with the noise variance σ²
3) The CRLB decreases inversely with the number of observed samples N

For further reading

[1] Debrati et al,“A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems”, RADIOENGINEERING, VOL. 18, NO. 1, APRIL 2009.↗

Similar topics:

[1]An Introduction to Estimation Theory
[2]Bias of an Estimator
[3]Minimum Variance Unbiased Estimators (MVUE)
[4]Maximum Likelihood Estimation
[5]Maximum Likelihood Decoding
[6]Probability and Random Process
[7]Likelihood Function and Maximum Likelihood Estimation (MLE)
[8]Score, Fisher Information and Estimator Sensitivity
[9]Introduction to Cramer Rao Lower Bound (CRLB)
[10]Cramer Rao Lower Bound for Scalar Parameter Estimation
[11]Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12]Efficient Estimators and CRLB
[13]Cramer Rao Lower Bound for Phase Estimation
[14]Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15]Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16]The Mean Square Error – Why do we use it for estimation problems
[17]How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18]Essential Preliminary Matrix Algebra for Signal Processing
[19]Why Cholesky Decomposition ? A sample case:
[20]Tests for Positive Definiteness of a Matrix
[21]Solving a Triangular Matrix using Forward & Backward Substitution
[22]Cholesky Factorization - Matlab and Python
[23]LTI system models for random signals – AR, MA and ARMA models
[24]Comparing AR and ARMA model - minimization of squared error
[25]Yule Walker Estimation
[26]AutoCorrelation (Correlogram) and persistence – Time series analysis
[27]Linear Models - Least Squares Estimator (LSE)
[28]Best Linear Unbiased Estimator (BLUE)


Cramér-Rao Lower Bound: Introduction

Key concept: Cramér-Rao bound is the lower bound on variance of unbiased estimators that estimate deterministic parameters.

Introduction

The criteria for the existence of a Minimum Variance Unbiased Estimator (MVUE) were discussed in a previous article. To have an MVUE, the estimates must be unbiased and must have the minimum variance among all unbiased estimators. This is given by the following two conditions

$latex E[\hat{\theta}] = \theta$

$latex var(\hat{\theta}) \leq var(\tilde{\theta}) \;\; \text{for every other unbiased estimator} \; \tilde{\theta}$

For an MVUE, it is easy to verify the first criterion (unbiasedness) using the first equation, but verifying the second criterion (minimum variance) is tricky. We can calculate the variance of a given estimator, but how can we be sure that it is "the minimum"? There may exist numerous other unbiased estimators (which we may not know about) that give a lower variance. In other words, how do we make sure that our estimator is really the MVUE? The Cramér-Rao Lower Bound (CRLB) may come to our rescue.

Cramér-Rao Lower Bound (CRLB):

Harald Cramér and Radhakrishna Rao derived a way to express the lower bound on the variance of unbiased estimators that estimate deterministic parameters. This lower bound is called the Cramér-Rao Lower Bound (CRLB).

If θ̂ is an unbiased estimate of a deterministic parameter θ, then the relationship between the variance of the estimates, var(θ̂), and the CRLB can be expressed as

$latex var(\hat{\theta}) \;\geq\; CRLB = \frac{1}{I(\theta)}$

The CRLB tells us the best (minimum) variance that we can expect to get from an unbiased estimator.

Applications of CRLB include:

1) Making judgments on proposed estimators. Estimators whose variance is not close to the CRLB are considered inferior.
2) Performing feasibility studies as to whether a particular estimator/system can meet given specifications. It is also used to rule out impossible estimators – no estimator can beat the CRLB (example: Figure 1).
3) Providing a benchmark for comparing unbiased estimators.
4) It may sometimes lead to the MVUE: if an unbiased estimator achieves the CRLB, then it is the MVUE (a standard condition for this is sketched right after this list).
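As a side note (a standard result found in estimation theory texts such as Kay's Fundamentals of Statistical Signal Processing, rather than derived in this article), an unbiased estimator attains the CRLB, and is therefore efficient and the MVUE, exactly when the score factors as

$latex \frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} = I(\theta) \left( g(\mathbf{x}) - \theta \right)$

in which case the efficient estimator is θ̂ = g(x) and its variance equals 1/I(θ).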

Figure 1: CRLB and the efficient estimator for phase estimation

Feasibility studies:

Derivation of the CRLB for a particular scenario or proposed estimation algorithm is often found in research texts. The derived theoretical CRLB for a system/algorithm is compared with the actual variance of the implemented system, and conclusions are drawn. For example, in the paper titled "A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems" [1], a frequency offset estimation algorithm was proposed for estimating frequency offsets in multi-band orthogonal frequency division multiplexing (MB-OFDM) systems. The performance of the algorithm was studied via BER analysis (Eb/N0 vs. BER curves). Additionally, the estimator performance was validated by comparing the simulated estimator variance with the derived theoretical CRLB for four UWB channel models.


Reference

[1] Debrati et al,“A Novel Frequency Synchronization Algorithm and its Cramer Rao Bound in Practical UWB Environment for MB-OFDM Systems”, RADIOENGINEERING, VOL. 18, NO. 1, APRIL 2009.↗



Score, Fisher Information and Estimator Sensitivity

As we have seen in the previous articles, the estimation of a parameter from a set of data samples depends strongly on the underlying PDF. The accuracy of the estimation is inversely related to the variance of the underlying PDF: the smaller the variance of the PDF, the higher the accuracy of the estimation, and vice versa. In other words, the estimation accuracy depends on the sharpness of the PDF curve; the sharper the PDF curve, the higher the accuracy.

Gradient and score:

In geometry, given any curve, the gradient (also called the slope) of the curve is zero at its maximum and minimum points. The gradient of a function (representing a curve) is given by its first derivative. The gradient of the log likelihood function is called the score, and it is used to find the maximum likelihood estimate of a parameter.

Figure: The gradient of log likelihood function is called score

Denoting the score as u(θ),

$latex u(\theta) = \frac{\partial \ln L(\theta;X)}{\partial \theta}$

At the MLE point θ = θ̂, the gradient of the log likelihood function is zero. Thus, equating the score to zero and finding the corresponding θ gives the ML estimate of θ (provided the log likelihood function is a concave curve).

Curvature and Fisher Information:

In geometry, the sharpness of a curve is measured by its curvature. The sharpness of a PDF curve is governed by its variance: the larger the variance, the less sharp the curve, and vice versa. The accuracy of the estimator is therefore reflected in the sharpness of the underlying PDF curve. In differential geometry, the curvature is related to the second derivative of a function.

The mean of the score, evaluated at the ML estimate (or at the true value of the parameter) θ, is zero. This gives,

$latex E\left[ u(\theta) \right] = E\left[ \frac{\partial \ln L(\theta;X)}{\partial \theta} \right] = 0$

Under this regularity condition that the expectation of the score is zero, the variance of the score is called the Fisher Information. Equivalently, the negative of the expected value of the second derivative of the log likelihood function is the Fisher Information. It measures the sharpness of the log likelihood function: the higher the Fisher Information, the sharper the curve, and vice versa. So, if we can calculate the Fisher Information of a log likelihood function, we know more about the accuracy or sensitivity of the estimator with respect to the parameter to be estimated.

Figure 2: The variance of the score is called Fisher Information

The Fisher Information, denoted by I(θ), is given by the variance of the score,

$latex I(\theta) = E\left[ u(\theta)\, u^{*}(\theta) \right] = -E\left[ \frac{\partial^2 \ln L(\theta;X)}{\partial \theta^2} \right]$

Here the operator (·)* indicates the operation of taking the complex conjugate. The negative sign in the above equation brings out the inverse relationship between variance and the Fisher Information (i.e., the Fisher Information will be high for log likelihood functions that have low variance). As we can see from the above equation, the Fisher Information is related to the second derivative (the curvature, or sharpness) of the log likelihood function. The I(θ) computed above is also called the Observed Fisher Information.
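As a quick numerical illustration (a sketch, not part of the original derivation), the following Python snippet estimates the Fisher Information for the DC-level-in-AWGN model by computing the variance of the score over many simulated records and comparing it with the analytical value N/σ². The chosen values of A, σ, N and the number of trials are arbitrary.

import numpy as np

# For x[n] = A + w[n], w[n] ~ N(0, sigma^2), the score is u(A) = (1/sigma^2)*sum(x[n]-A).
# Its mean should be ~0 (regularity condition) and its variance should match N/sigma^2.
np.random.seed(1)
A, sigma, N, n_trials = 2.0, 1.5, 40, 50000

x = A + sigma * np.random.randn(n_trials, N)
score = np.sum(x - A, axis=1) / sigma**2       # score evaluated at the true A

print("Mean of the score (should be ~0)        :", score.mean())
print("Variance of the score (empirical I(A))  :", score.var())
print("Analytical Fisher Information N/sigma^2 :", N / sigma**2)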


For further reading

[1] Songfeng Zheng, “Fisher Information and Cramer-Rao Bound”, lecture notes, Statistical Theory II, Missouri State University.↗



Theoretical derivation of MLE for Gaussian Distribution:

As a pre-requisite, check out the previous article on the logic behind deriving the maximum likelihood estimator for a given PDF.

Let X = (x1, x2, …, xN) be the samples taken from a Gaussian distribution given by

$latex f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left[ -\frac{(x-\mu)^2}{2 \sigma^2} \right]$

Calculating the Likelihood

$latex L(\mu;X) = \prod_{i=1}^{N} f(x_i;\mu,\sigma^2) = \frac{1}{\left( 2 \pi \sigma^2 \right)^{N/2}} \exp\left[ -\frac{1}{2 \sigma^2} \sum_{i=1}^{N} \left( x_i-\mu \right)^2 \right]$

The log likelihood is given by,

$latex \ln L(\mu;X) = -\frac{N}{2} \ln\left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} \left( x_i-\mu \right)^2$

Differentiating and equating to zero to find the maximum (in other words, equating the score to zero),

$latex \frac{\partial \ln L(\mu;X)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} \left( x_i-\mu \right) = 0 \;\;\Rightarrow\;\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Thus the mean of the samples gives the MLE of the parameter μ.
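This result can be checked numerically. The following Python sketch (with σ assumed known and all numerical values chosen arbitrarily for the demonstration) evaluates the Gaussian log likelihood over a grid of candidate means and confirms that the maximizer essentially coincides with the sample mean.

import numpy as np
from scipy.stats import norm

# Grid-search check: the Gaussian log-likelihood (sigma known) peaks near the sample mean.
np.random.seed(2)
x = norm.rvs(loc=4.2, scale=1.0, size=500)     # simulated samples; true mean 4.2 is arbitrary

mu_grid = np.linspace(3.0, 5.5, 251)
log_lik = [norm.logpdf(x, loc=mu, scale=1.0).sum() for mu in mu_grid]

print("Grid maximiser of the log likelihood :", mu_grid[np.argmax(log_lik)])
print("Sample mean                          :", x.mean())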

For the derivation of other PDFs see the following links
Theoretical derivation of Maximum Likelihood Estimator for Poisson PDF
Theoretical derivation of Maximum Likelihood Estimator for Exponential PDF

See also:

[1] An Introduction to Estimation Theory
[2] Bias of an Estimator
[3] Minimum Variance Unbiased Estimators (MVUE)
[4] Maximum Likelihood Estimation
[5] Maximum Likelihood Decoding
[6] Probability and Random Process
[7] Likelihood Function and Maximum Likelihood Estimation (MLE)

Theoretical derivation of MLE for Exponential Distribution:

As a pre-requisite, check out the previous article on the logic behind deriving the maximum likelihood estimator for a given PDF.

Let X = (x1, x2, …, xN) be the samples taken from an exponential distribution given by

$latex f(x;\lambda) = \lambda e^{-\lambda x}, \quad x \geq 0$

Calculating the Likelihood

$latex L(\lambda;X) = \prod_{i=1}^{N} f(x_i;\lambda) = \lambda^{N} \exp\left( -\lambda \sum_{i=1}^{N} x_i \right)$

The log likelihood is given by,

$latex \ln L(\lambda;X) = N \ln \lambda - \lambda \sum_{i=1}^{N} x_i$

Differentiating and equating to zero to find the maximum (in other words, equating the score to zero),

$latex \frac{\partial \ln L(\lambda;X)}{\partial \lambda} = \frac{N}{\lambda} - \sum_{i=1}^{N} x_i = 0 \;\;\Rightarrow\;\; \hat{\lambda} = \frac{N}{\sum_{i=1}^{N} x_i} = \frac{1}{\bar{x}}$

Thus the inverse of the mean of the samples gives the MLE of the parameter λ.
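Again, this can be checked numerically. The following Python sketch (all numerical values are arbitrary demo choices) evaluates the exponential log likelihood over a grid of candidate rates and confirms that the maximizer essentially coincides with the inverse of the sample mean.

import numpy as np

# Grid-search check: the exponential log-likelihood N*ln(lambda) - lambda*sum(x)
# peaks near 1/sample-mean.
np.random.seed(3)
lam_true = 2.5
x = np.random.exponential(scale=1/lam_true, size=10000)

lam_grid = np.linspace(0.5, 5.0, 451)
log_lik = [len(x)*np.log(lam) - lam*x.sum() for lam in lam_grid]

print("Grid maximiser of the log likelihood :", lam_grid[np.argmax(log_lik)])
print("1 / sample mean                      :", 1/x.mean())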

For the derivation of other PDFs see the following links
Theoretical derivation of Maximum Likelihood Estimator for Poisson PDF
Theoretical derivation of Maximum Likelihood Estimator for Gaussian PDF


Theoretical derivation of Maximum Likelihood Estimator for Poisson PDF:

Suppose X = (x1, x2, …, xN) are the samples taken from a random distribution whose PDF is parameterized by the parameter θ. If the PDF of the underlying distribution satisfies a regularity condition (the log of the PDF is differentiable), then the likelihood function is given by

$latex L(\theta;X) = \prod_{i=1}^{N} f_i(x_i;\theta)$

Here f_i(x_i;θ) is the PDF of the underlying distribution.

Hereafter, we will denote the likelihood function L(θ;X) simply as L(θ).

The maximum likelihood estimate of the unknown parameter θ can be found by selecting the value, say θ̂, for which the likelihood function attains its maximum. We usually use the log of the likelihood function to simplify multiplications into additions. So, restating this, the maximum likelihood estimate of the unknown parameter θ can be found by selecting the value θ̂ for which the log likelihood function attains its maximum.

The maximum of a function f(x) is found by taking its first derivative and equating it to zero. Similarly, the maximum likelihood estimate of a parameter θ is found by partially differentiating the likelihood function or the log likelihood function with respect to θ and equating it to zero.

The first partial derivative of the log likelihood function with respect to θ is called the score. The variance of the score (equivalently, the negative of the expected value of the partial derivative of the score with respect to θ) is known as the Fisher Information.

Calculating MLE for Poisson distribution:

Let X = (x1, x2, …, xN) be the samples taken from a Poisson distribution given by

$latex f(x;\lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0,1,2,\cdots$

Calculating the Likelihood

$latex L(\lambda;X) = \prod_{i=1}^{N} f(x_i;\lambda) = \frac{\lambda^{\sum_{i=1}^{N} x_i}\; e^{-N\lambda}}{\prod_{i=1}^{N} x_i!}$

The log likelihood is given by,

$latex \ln L(\lambda;X) = \left( \sum_{i=1}^{N} x_i \right) \ln \lambda \;-\; N \lambda \;-\; \sum_{i=1}^{N} \ln\left( x_i! \right)$

Differentiating and equating to zero to find the maximum (in other words, equating the score to zero),

$latex \frac{\partial \ln L(\lambda;X)}{\partial \lambda} = \frac{1}{\lambda}\sum_{i=1}^{N} x_i - N = 0 \;\;\Rightarrow\;\; \hat{\lambda} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Thus the mean of the samples gives the MLE of the parameter λ.
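As with the other distributions, the result can be verified numerically. The following Python sketch (all numerical values are arbitrary demo choices) evaluates the Poisson log likelihood over a grid of candidate rates and confirms that the maximizer essentially coincides with the sample mean.

import numpy as np
from scipy.stats import poisson

# Grid-search check: the Poisson log-likelihood peaks near the sample mean.
np.random.seed(4)
x = poisson.rvs(mu=3.7, size=10000)            # simulated samples; true rate 3.7 is arbitrary

lam_grid = np.linspace(2.0, 6.0, 401)
log_lik = [poisson.logpmf(x, mu=lam).sum() for lam in lam_grid]

print("Grid maximiser of the log likelihood :", lam_grid[np.argmax(log_lik)])
print("Sample mean                          :", x.mean())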


For the derivation of other PDFs see the following links
Theoretical derivation of Maximum Likelihood Estimator for Exponential PDF
Theoretical derivation of Maximum Likelihood Estimator for Gaussian PDF



Maximum Likelihood Estimation (MLE): Understand with example

Key focus: Understand maximum likelihood estimation (MLE) using hands-on example. Know the importance of log likelihood function and its use in estimation problems.

Likelihood Function:

Suppose X = (x1, x2, …, xN) are the samples taken from a random distribution whose PDF is parameterized by the parameter θ. The likelihood function is given by

$latex L(\theta;X) = \prod_{i=1}^{N} f_i(x_i;\theta) = f_1(x_1;\theta) \, f_2(x_2;\theta) \cdots f_N(x_N;\theta)$

Here fN(xN;θ) is the PDF of the underlying distribution.

The above expression differs from a joint probability calculation in its interpretation: there, θ would be treated as a random variable, whereas in the likelihood function above, θ is a deterministic but unknown parameter that is to be estimated from the fixed, observed samples.

Example:

Consider the DC estimation problem presented in the previous article, where a transmitter transmits a continuous stream of data samples representing a constant value – A. The data samples, sent via a communication channel, get added with white Gaussian noise – w[n] (with μ=0 and σ²=1). The receiver receives the samples, and its goal is to estimate the actual DC component – A – in the presence of noise.

Figure 1: The problem of DC estimation

Likelihood as an Estimation Metric:

Let’s use the likelihood function as the estimation metric. The estimation of A depends on the PDF of the underlying noise w[n], and the estimation accuracy depends on the variance of the noise: the larger the variance, the lower the accuracy of the estimation, and vice versa.

Let’s fix A=1.3 and generate 10 samples from the above model (use the Matlab script given below to test this; you may get a different set of numbers). Now we pretend that we do not know anything about the model, and all we want to do is estimate the DC component (the parameter to be estimated is θ=A) from the observed samples.

Assuming a variance of 1 for the underlying PDF, we will try a range of values for A from -2.0 to +5.0 in steps of 0.1 (the rangeA variable in the script below) and calculate the likelihood function for each value of A.

Matlab script:

% Demonstration of Maximum Likelihood Estimation in Matlab
%   Author: Mathuranathan (https://www.gaussianwaves.com)
%   License : creative commons : Attribution-NonCommercial-ShareAlike 3.0 Unported

A=1.3;
N=10; %Number of Samples to collect
x=A+randn(1,N);

s=1; %Assume standard deviation s=1

rangeA=-2:0.1:5; %Range of values of estimation parameter to test
L=zeros(1,length(rangeA)); %Place holder for likelihoods

for i=1:length(rangeA)
    %Calculate Likelihoods for each parameter value in the range
    L(i) = exp(-sum((x-rangeA(i)).^2)/(2*s^2)); %Neglect the constant term (1/(sqrt(2*pi)*sigma))^N as it will pull down the likelihood value towards zero for increasing values of N
end

[maxL,index]=max(L); %Select the parameter value with Maximum Likelihood
display('Maximum Likelihood of A');
display(rangeA(index));

%Plotting Commands
plot(rangeA,L);hold on;
stem(rangeA(index),L(index),'r'); %Point the Maximum Likelihood Estimate
displayText=['\leftarrow Likelihood of A=' num2str(rangeA(index))];
title('Maximum Likelihood Estimation of unknown Parameter A');
xlabel('\leftarrow A');
ylabel('Likelihood');
text(rangeA(index),L(index)/3,displayText,'HorizontalAlignment','left');

figure(2);
plot(rangeA,log(L));hold on;
YL = ylim;YMIN = YL(1);
plot([rangeA(index) rangeA(index)],[YMIN log(L(index))] ,'r'); %Point the Maximum Likelihood Estimate
title('Log Likelihood Function');
xlabel('\leftarrow A');
ylabel('Log Likelihood');
text([rangeA(index)],YMIN/2,displayText,'HorizontalAlignment','left');

Simulation Result:

For the above-mentioned 10 observed samples, the likelihood function over the range (-2:0.1:5) of DC component values is plotted below. The maximum of the likelihood occurs at A=1.4, as shown in the figure, so the estimated value of A is 1.4.

This estimation technique based on maximum likelihood of a parameter is called Maximum Likelihood Estimation (MLE). The estimation accuracy will increase if the number of samples for observation is increased. Try the simulation with the number of samples N set to 5000 or 10000 and observe the estimated value of A for each run.

Figure 2: Maximum likelihood estimation of unknown parameter A

Log Likelihood Function:

It is often useful to calculate the log likelihood function, as it reduces the above-mentioned equation to a series of additions instead of a multiplication of several terms. This is particularly useful when implementing the likelihood metric in digital signal processors. The log likelihood is simply calculated by taking the logarithm of the above-mentioned equation. The decision is again based on the maximum likelihood criterion.

$latex \begin{aligned} \ln \left[L(\theta;X)\right ] &= \sum_{i=1}^{N} \ln \left[f_i(x_i;\theta)\right ] \\ &= \ln\left[f_1(x_1;\theta) \right ]+\ln\left[f_2(x_2;\theta) \right ] + \cdots+ \ln\left[f_N(x_N;\theta) \right ] \end{aligned}$

The corresponding plot is given below

Figure 3: Maximum likelihood estimation using log likelihood function

Advantages of Maximum Likelihood Estimation:

* Asymptotically efficient – the estimator's variance attains the CRLB as the number of samples grows
* Asymptotically unbiased
* Consistent – the estimate converges to the true value as more samples are observed
* Easy to compute
* Requires no prior information about the parameter
* The estimates closely agree with the data

Disadvantages of Maximum Likelihood Estimation:

* Since the estimates closely agree with the data, MLE gives noisy estimates when the data is mixed with noise.
* It does not utilize any prior information about the parameter. In real-world scenarios, we often have some prior information about the parameter to be estimated, and we should use it to our advantage even though doing so may introduce bias in the estimates.


For further reading

[1] Steven M. Kay, “Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory”, ISBN: 978-0133457117, Prentice Hall, Edition 1, 1993.↗



Maximum Likelihood estimation

Keywords: maximum likelihood estimation, statistical method, probability distribution, MLE, models, practical applications, finance, economics, natural sciences.

Introduction

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution by finding the set of values that maximize the likelihood function of the observed data. In other words, MLE is a method of finding the most likely values of the unknown parameters that would have generated the observed data.

The likelihood function is a function that describes the probability of observing the data given the parameters of the probability distribution. The MLE method seeks to find the set of parameter values that maximizes this likelihood function.

For example, suppose we have a set of data that we believe to be normally distributed, but we do not know the mean or variance of the distribution. We can use MLE to estimate these parameters by finding the mean and variance that maximize the likelihood function of the observed data.
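As a quick illustration of that example, the following Python sketch fits a normal distribution to simulated data using scipy.stats.norm.fit, which returns the maximum likelihood estimates of the location (mean) and scale (standard deviation). The "true" values used to generate the data are arbitrary demo choices.

import numpy as np
from scipy.stats import norm

# MLE of the mean and variance of normally distributed data via scipy's norm.fit.
np.random.seed(5)
data = np.random.normal(loc=10.0, scale=2.0, size=5000)

mu_hat, sigma_hat = norm.fit(data)             # ML estimates of mean and std deviation
print("ML estimate of the mean     :", mu_hat)        # close to the sample mean
print("ML estimate of the variance :", sigma_hat**2)  # 1/N-normalized sample variance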

The MLE method is widely used in statistical inference, hypothesis testing, and model fitting in many areas, including economics, finance, engineering, and the natural sciences. MLE is a powerful and flexible method that can be applied to a wide range of statistical models, making it a valuable tool in data analysis and modeling.

Difference between MLE and MLD

Maximum likelihood estimation (MLE) and maximum likelihood decoding (MLD) are two different concepts used in different contexts.

Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on a set of observed data. The goal is to find the set of parameter values that maximize the likelihood function of the observed data. MLE is commonly used in statistical inference, hypothesis testing, and model fitting.

On the other hand, maximum likelihood decoding (MLD) is a method used in digital communications and signal processing to decode a received signal that has been transmitted through a noisy channel. The goal is to find the transmitted message that is most likely to have produced the received signal, based on a given probabilistic model of the channel.

In maximum likelihood decoding, the receiver calculates the likelihood of each possible transmitted message, given the received signal and the channel model. The maximum likelihood decoder then selects the transmitted message that has the highest likelihood as the decoded message.

While both MLE and MLD involve the concept of maximum likelihood, they are used in different contexts. MLE is used in statistical estimation, while MLD is used in digital communications and signal processing for decoding.
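To make the decoding side concrete, here is a minimal Python sketch of maximum likelihood decoding over a binary symmetric channel with a tiny, arbitrarily chosen codebook. For a crossover probability below 0.5, the ML decision reduces to picking the codeword at minimum Hamming distance from the received word.

import numpy as np

# Maximum likelihood decoding over a BSC: the likelihood of the received word given a
# codeword is p^d * (1-p)^(n-d), where d is the Hamming distance between them.
codebook = np.array([[0, 0, 0, 0, 0],
                     [1, 1, 1, 1, 1],
                     [1, 0, 1, 0, 1]])
received = np.array([1, 1, 0, 1, 1])
p = 0.1                                        # assumed BSC crossover probability

d = np.sum(codebook != received, axis=1)       # Hamming distances
likelihood = p**d * (1 - p)**(len(received) - d)

print("Hamming distances :", d)
print("Likelihoods       :", likelihood)
print("ML decision       :", codebook[np.argmax(likelihood)])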

MLE applied to communication systems

Maximum Likelihood estimation (MLE) is an important tool in determining the actual probabilities of the assumed model of communication.

In reality, a communication channel can be quite complex, and a model becomes necessary to simplify calculations at the decoder side. The model should closely approximate the complex communication channel. There exists a myriad of standard statistical models that can be employed for this task: Gaussian, binomial, exponential, geometric, Poisson, etc. A standard communication model is chosen based on empirical data.

Each model mentioned above has unique parameters that characterize it. Determination of these parameters for the chosen model is necessary to make it closely match the communication channel at hand.

If a binomial model is chosen (based on observation of the data) for the error events over a particular channel, it is essential to determine the probability of success (\(p\)) of the binomial model.

If a Gaussian model (normal distribution) is chosen for a particular channel, then estimating the mean (\(\mu\)) and variance (\(\sigma^{2}\)) is necessary so that they can be applied while computing the conditional probability p(y received | x sent).

Similarly estimating the mean number of events within a given interval of time or space (\(\lambda\)) is a necessity for a Poisson distribution model.

Maximum likelihood estimation is a method to determine these unknown parameters associated with the corresponding chosen models of the communication channel.

Python code example for MLE

The following program is an implementation of maximum likelihood estimation (MLE) for the binary symmetric channel (BSC) using the binomial probability mass function (PMF).

The goal of MLE is to estimate the value of an unknown parameter (in this case, the error probability \(p\)) based on observed data. The BSC is a simple channel model where each transmitted bit is flipped (with probability \(p\)) independently of other bits during transmission. The goal of the following program is to estimate the error probability \(p\) of the BSC based on a given binary data sequence.

import numpy as np
from scipy.optimize import minimize
from scipy.special import binom
import matplotlib.pyplot as plt

def BSC_MLE(data):
    """
    Maximum likelihood estimation (MLE) for the Binary Symmetric Channel (BSC).
    This function estimates the error probability p of the BSC based on the observed data.
    """
    
    # Negative log of the binomial likelihood (PMF); minimizing this maximizes the likelihood
    def binom_PMF(p):
        n = len(data)
        k = np.sum(data)
        p = np.clip(p, 1e-10, 1 - 1e-10)  # Regularization to avoid problems due to small estimation errors
        logprob = np.log(binom(n, k)) + k*np.log(p) + (n-k)*np.log(1-p)
        return -logprob
    
    # Use scipy.optimize.minimize to find the p that minimizes the negative log-likelihood
    # (i.e., maximizes the binomial likelihood). x0 is the initial guess for p; for the BSC, x0=0.5.
    # BFGS is the Broyden-Fletcher-Goldfarb-Shanno algorithm for unconstrained nonlinear optimization.
    res = minimize(lambda p: binom_PMF(p), x0=0.5, method='BFGS')
    p_est = res.x[0]

    # Plot the observed data as a histogram
    plt.hist(data, bins=2, density=True, alpha=0.5)
    plt.axvline(p_est, color='r', linestyle='--')
    plt.xlabel('Bit value')
    plt.ylabel('Frequency')
    plt.title('Observed data')
    plt.show()
    
    return p_est

data = np.random.randint(2, size=100)
p_est = BSC_MLE(data)
print('Estimated error probability: {:.4f}'.format(p_est))

The program first defines a function called BSC_MLE that takes a binary data sequence as input and returns the estimated error probability p_est. The BSC_MLE function defines the negative log of the binomial PMF, which represents the probability of observing a certain number of errors (i.e., bit flips) in the data sequence given a specific error probability p. This negative log-likelihood is then minimized using the minimize function from the scipy.optimize module, which is equivalent to finding the value of p that maximizes the likelihood of observing the data.

The program then generates a random binary data sequence of length 100 using the np.random.randint() function and calls the BSC_MLE function to estimate the error probability based on the observed data. Finally, the program prints the estimated error probability. Try increasing the sequence length to 1000 and observe the estimated error probability.

Figure 1: Maximum Likelihood Estimation (MLE) : Plotting the observed data as a histogram
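As a cross-check, the closed-form ML estimate for this binomial model is simply p̂ = k/n, the fraction of ones in the observed sequence; on the same data it should agree with the numerical estimate returned by BSC_MLE. A minimal sketch:

import numpy as np

# Closed-form MLE for the BSC/binomial model: p_hat = k/n = fraction of ones in the data.
data = np.random.randint(2, size=100)
print("Closed-form MLE p_hat :", data.mean())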

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell

