Maximum Likelihood Estimation

Keywords: maximum likelihood estimation, statistical method, probability distribution, MLE, models, practical applications, finance, economics, natural sciences.

Introduction

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution by finding the set of values that maximize the likelihood function of the observed data. In other words, MLE is a method of finding the most likely values of the unknown parameters that would have generated the observed data.

The likelihood function expresses the probability of observing the data as a function of the parameters of the probability distribution. The MLE method seeks the set of parameter values that maximizes this function.
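Formally, if \(L(\theta; x_1, \dots, x_n)\) denotes the likelihood of the parameter vector \(\theta\) given observations \(x_1, \dots, x_n\), the maximum likelihood estimate is

\[\hat{\theta}_{MLE} = \underset{\theta}{\arg\max}\; L(\theta; x_1, \dots, x_n)\]

In practice, the log-likelihood is maximized instead, since the logarithm turns products of probabilities into sums without changing the location of the maximum.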

For example, suppose we have a set of data that we believe to be normally distributed, but we do not know the mean or variance of the distribution. We can use MLE to estimate these parameters by finding the mean and variance that maximize the likelihood function of the observed data.
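For this Gaussian example, the maximization can be carried out in closed form: setting the derivatives of the log-likelihood with respect to \(\mu\) and \(\sigma^{2}\) to zero gives

\[\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^{2}\]

that is, the sample mean and the (biased) sample variance of the \(n\) observations.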

The MLE method is widely used in statistical inference, hypothesis testing, and model fitting in many areas, including economics, finance, engineering, and the natural sciences. MLE is a powerful and flexible method that can be applied to a wide range of statistical models, making it a valuable tool in data analysis and modeling.

Difference between MLE and MLD

Maximum likelihood estimation (MLE) and maximum likelihood decoding (MLD) are two different concepts used in different contexts.

Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on a set of observed data. The goal is to find the set of parameter values that maximize the likelihood function of the observed data. MLE is commonly used in statistical inference, hypothesis testing, and model fitting.

On the other hand, maximum likelihood decoding (MLD) is a method used in digital communications and signal processing to decode a received signal that has been transmitted through a noisy channel. The goal is to find the transmitted message that is most likely to have produced the received signal, based on a given probabilistic model of the channel.

In maximum likelihood decoding, the receiver calculates the likelihood of each possible transmitted message, given the received signal and the channel model. The maximum likelihood decoder then selects the transmitted message that has the highest likelihood as the decoded message.
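As a concrete illustration, here is a minimal Python sketch of maximum likelihood decoding over a binary symmetric channel (the channel model used later in this article). The function name bsc_mld, the toy repetition codebook and the crossover probability value are illustrative assumptions, not part of any standard library:

import numpy as np

def bsc_mld(received, codebook, p=0.1):
    """Return the codeword most likely to have produced `received`
    over a binary symmetric channel with crossover probability p."""
    received = np.asarray(received)
    best_codeword, best_loglik = None, -np.inf
    for c in codebook:
        d = np.sum(received != np.asarray(c))  # Hamming distance
        # log p(received | c) = d*log(p) + (n - d)*log(1 - p)
        loglik = d*np.log(p) + (len(received) - d)*np.log(1 - p)
        if loglik > best_loglik:
            best_codeword, best_loglik = c, loglik
    return best_codeword

# Repetition code {000, 111}; received 101 is decoded as 111
print(bsc_mld([1, 0, 1], [[0, 0, 0], [1, 1, 1]]))

For \(p < 0.5\), maximizing this likelihood is equivalent to picking the codeword at minimum Hamming distance from the received word.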

While both MLE and MLD involve the concept of maximum likelihood, they are used in different contexts. MLE is used in statistical estimation, while MLD is used in digital communications and signal processing for decoding.

MLE applied to communication systems

Maximum likelihood estimation (MLE) is an important tool for determining the parameters of the statistical model assumed for a communication channel, so that the model reflects the channel's actual behavior.

In reality, a communication channel can be quite complex, and a model becomes necessary to simplify calculations at the decoder side. The model should closely approximate the complex communication channel. A myriad of standard statistical models can be employed for this task: Gaussian, binomial, exponential, geometric, Poisson, etc. A standard model is chosen based on empirical data.

Each of the models mentioned above is characterized by its own unique parameters. Determining these parameters for the chosen model is necessary to make it closely match the communication channel at hand.

Suppose a binomial model is chosen (based on observation of the data) for the error events over a particular channel; it is then essential to determine the probability of success (\(p\)) of the binomial model.
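For this binomial case the maximum likelihood estimate has a simple closed form: if \(k\) errors are observed in \(n\) transmitted bits, then \(\hat{p} = k/n\). This is the value that the numerical optimization in the Python example below converges to.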

If a Gaussian model (i.e., a normal distribution) is chosen for a particular channel, then estimating the mean (\(\mu\)) and variance (\(\sigma^{2}\)) is necessary so that they can be applied while computing the conditional probability \(p(y \mid x)\) of receiving \(y\) given that \(x\) was sent.
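For instance, under the common additive white Gaussian noise assumption \(y = x + n\) with \(n \sim \mathcal{N}(\mu, \sigma^{2})\), this conditional probability takes the form

\[p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(y - x - \mu)^{2}}{2\sigma^{2}}}\]

which the decoder cannot evaluate without estimates of \(\mu\) and \(\sigma^{2}\).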

Similarly, estimating the mean number of events within a given interval of time or space (\(\lambda\)) is a necessity for a Poisson distribution model.
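Here too the maximum likelihood estimate is available in closed form: for observed counts \(x_1, \dots, x_n\), it is simply the sample mean, \(\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i\).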

Maximum likelihood estimation is a method to determine these unknown parameters of the chosen model of the communication channel.

Python code example for MLE

The following program is an implementation of maximum likelihood estimation (MLE) for the binary symmetric channel (BSC) using the binomial probability mass function (PMF).

The goal of MLE is to estimate the value of an unknown parameter (in this case, the error probability \(p\)) based on observed data. The BSC is a simple channel model where each transmitted bit is flipped (with probability \(p\)) independently of other bits during transmission. The goal of the following program is to estimate the error probability \(p\) of the BSC based on a given binary data sequence.

import numpy as np
from scipy.optimize import minimize
from scipy.special import binom
import matplotlib.pyplot as plt

def BSC_MLE(data):
    """
    Maximum likelihood estimation (MLE) for the Binary Symmetric Channel (BSC).
    This function estimates the error probability p of the BSC based on the observed data.
    """
    
    # Define the negative log of the binomial probability mass function (PMF)
    def binom_PMF(p):
        n = len(data)
        k = np.sum(data)
        p = np.clip(p, 1e-10, 1 - 1e-10)  # keep p strictly inside (0, 1) to avoid log(0)
        logprob = np.log(binom(n, k)) + k*np.log(p) + (n - k)*np.log(1 - p)
        return -np.squeeze(logprob)  # negated, so minimizing it maximizes the likelihood

    # Use the minimize function from scipy.optimize to find the value of p that
    # maximizes the binomial log-likelihood (i.e., minimizes its negative).
    # x0 specifies the initial guess for p; for a BSC, x0 = 0.5 is a natural start.
    # BFGS is the Broyden-Fletcher-Goldfarb-Shanno algorithm for unconstrained nonlinear optimization.
    res = minimize(binom_PMF, x0=0.5, method='BFGS')
    p_est = res.x[0]

    # Plot the observed data as a histogram
    plt.hist(data, bins=2, density=True, alpha=0.5)
    plt.axvline(p_est, color='r', linestyle='--')
    plt.xlabel('Bit value')
    plt.ylabel('Frequency')
    plt.title('Observed data')
    plt.show()
    
    return p_est

data = np.random.randint(2, size=100)  # random binary sequence of length 100
p_est = BSC_MLE(data)
print('Estimated error probability: {:.4f}'.format(p_est))

The program first defines a function called BSC_MLE that takes a binary data sequence as input and returns the estimated error probability p_est. Inside it, binom_PMF computes the negative log of the binomial PMF, which represents the probability of observing a certain number of errors (i.e., bit flips) in the data sequence given a specific error probability p. The likelihood is then maximized, by minimizing its negative logarithm with the minimize function from the scipy.optimize module, to find the value of p under which the observed data is most probable.

The program then generates a random binary data sequence of length 100 using the np.random.randint() function and calls the BSC_MLE function to estimate the error probability based on the observed data. Finally, the program prints the estimated error probability. Try increasing the sequence length to 1000 and observe the estimated error probability.
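As a quick sanity check (an addition to the original example, not part of the post's code), note that for the BSC the likelihood is maximized in closed form by the fraction of ones in the sequence, so the numerical estimate should agree with:

print('Closed-form MLE: {:.4f}'.format(np.sum(data)/len(data)))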

Figure 1: Maximum Likelihood Estimation (MLE): plotting the observed data as a histogram


Books by the author

Wireless Communication Systems in Matlab, Second Edition (PDF)
Digital Modulations using Python (PDF ebook)
Digital Modulations using Matlab (PDF ebook)

Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Related Topics:

[1] An Introduction to Estimation Theory
[2] Bias of an Estimator
[3] Minimum Variance Unbiased Estimators (MVUE)
[4] Maximum Likelihood Estimation
[5] Maximum Likelihood Decoding
[6] Probability and Random Process
[7] Likelihood Function and Maximum Likelihood Estimation (MLE)
[8] Score, Fisher Information and Estimator Sensitivity
[9] Introduction to Cramer Rao Lower Bound (CRLB)
[10] Cramer Rao Lower Bound for Scalar Parameter Estimation
[11] Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12] Efficient Estimators and CRLB
[13] Cramer Rao Lower Bound for Phase Estimation
[14] Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15] Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16] The Mean Square Error – Why do we use it for estimation problems
[17] How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18] Essential Preliminary Matrix Algebra for Signal Processing
[19] Why Cholesky Decomposition ? A sample case:
[20] Tests for Positive Definiteness of a Matrix
[21] Solving a Triangular Matrix using Forward & Backward Substitution
[22] Cholesky Factorization - Matlab and Python
[23] LTI system models for random signals – AR, MA and ARMA models
[24] Comparing AR and ARMA model - minimization of squared error
[25] Yule Walker Estimation
[26] AutoCorrelation (Correlogram) and persistence – Time series analysis
[27] Linear Models - Least Squares Estimator (LSE)
[28] Best Linear Unbiased Estimator (BLUE)
