Having covered the techniques of hard and soft decision decoding , its time to illustrate the most important concept of Maximum Likelihood Decoding.

### Maximum Likelihood Decoding:

Consider a set of possible codewords (valid codewords – set ‘Y’) generated by an encoder in the transmitter side. We pick one codeword out of this set ( call it ‘y’ ) and transmit it via a Binary Symmetric Channel (BSC) with probability of error p ( To know what is a BSC - click here ). At the receiver side we receive the distorted version of ‘y’ ( call this erroneous codeword ‘x’).

Maximum Likelihood Decoding chooses one codeword from ‘Y’ (the list of all possible codewords) which maximizes the following probability.

$\mathbb{P}(y\:sent\mid x\:received )$

Meaning that the receiver computes P(y1,x) , P(y2,x) , P(y3,x),…,P(yn,x). and chooses a codeword (y) which gives the maximum probability.

In practice we don’t know Y (at the receiver) but we know ‘x’. So how to compute the probability ? Maximum Likelihood Estimation (MLE) comes to our rescue. For a detailed explanation on MLE – refer here[1]
The aim of maximum likelihood estimation is to find the parameter value(s) that makes the observed data most likely. Understanding the difference between prediction and estimation is important at this point.

Estimation differs from prediction in the following way … In estimation problems, likelihood of the parameters is estimated based on given data/observation vector. In prediction problems, probability is used as a measure to predict the outcome from known parameters of a model.

### Examples for “Prediction” and “Estimation” :

1) Probability of getting a “Head” in a single toss of a fair coin is 0.5. The coin is tossed 100 times in a row.Prediction helps in predicting the outcome ( head or tail ) of the 101th toss based on the probability.
2) A coin is tossed 100 times and the data ( head or tail information) is recorded. Assuming the event follows Binomial distribution model, estimation helps in determining the probability of the event. The actual probability may or may not be 0.5.

Maximum Likelihood Estimation estimates the conditional probability based on the observed data ( received data – x) and an assumed model.

### Example of Maximum Likelihood Decoding:

Let y=11001001 and x=10011001 . Assuming Binomial distribution model for the event with probability of error 0.1 (i.e the reliability of the BSC is 1-p = 0.9), the distance between codewords is y-x = 2 . For binomial model,

$\mathbb{P}(y\:received\mid x\:sent ) = (1-p)^{n-d}.p^{d}$

where d=the hamming distance between the received and the sent codewords

n= number of bit sent

p= error probability of the BSC.
1-p = reliability of BSC

Substituting d=2, n=8 and p=0.1 , then P(y received | x sent) = 0.005314.

Note : Here, Hamming distance is used to compute the probability. So the decoding can be called as “minimum distance decoding” (which minimizes the Hamming distance) or “maximum likelihood decoding”. Euclidean distance may also be used to compute the conditional probability.

As mentioned earlier, in practice y is not known at the receiver. Lets see how to estimate P(y received | x sent) when y is unknown based on the binomial model.

Since the receiver is unaware of the particular y corresponding to the x received, the receiver computes P(y received | x sent) for each codeword in Y. The “y” which gives the maximum probability is concluded as the codeword that was sent.