Euclidean and Hamming distances

Key focus: Euclidean & Hamming distances are used to measure similarity or dissimilarity between two sequences. Used in Soft & Hard decision decoding.

Distance is a measure that indicates either similarity or dissimilarity between two words. Given a pair of words a=(a0,a1, … ,an-1) and b=(b0,b1,…,bn-1) , there are variety of ways one can characterize the distance, d(a,b), between the two words. Euclidean and Hamming distances are familiar ones. Euclidean distance is extensively applied in analysis of convolutional codes and Trellis codes. Hamming distance is frequently encountered in the analysis of block codes.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Euclidean distance

The Euclidean distance between the two words is defined as

\[d_{Euclidean}(\mathbf{a},\mathbf{b}) = \sqrt{(a_0-b_0)^2+(a_1-b_1)^2 + \cdots + (a_{n-1}-b_{n-1})^2}\]

Soft decision decoding

In contrast to classical hard-decision decoders (see below) which operate on binary values, a soft-decision decoder directly processes the unquantized (or quantized in more than two levels in practice) samples at the output of the matched filter for each bit-period, thereby avoiding the loss of information.

If the outputs of the matched filter at each bit-period are unquantized or quantized in more than two levels, the demodulator is said to make soft-decisions. The process of decoding the soft-decision received sequence is called soft-decision decoding. Since the decoder uses the additional information contained in the multi-level quantized or unquantized received sequence, soft-decision decoding provides better performance compared to hard-decision decoding. For soft-decision decoding, metrics like likelihood function, Euclidean distance and correlation are used.

For illustration purposes, we consider the communication system model shown in Figure 1. A block encoder encodes the information blocks m=(m1,m2,…,mk) and generates the corresponding codeword vector c=(c1,c2,…,cn). The codewords are modulated and sent across an AWGN channel. The received signal is passed through a matched filter and the multi-level quantizer outputs the soft-decision vector r .

Soft-decision receiver model for decoding linear block codes for AWGN channel (Euclidean Hamming distances)
Figure 1: Soft-decision receiver model for decoding linear block codes for AWGN channel

The goal of a decoder is to produce an estimate of the information sequence m based on the received sequence r. Equivalently, the information sequence m and the codeword c has one-to-one correspondence, the decoder can also produce an estimate ĉ of the codeword c. If the codeword c was transmitted, a decoding error occurs if ĉ ≠ c.

For equi-probable codewords, The decoder that selects a codeword that maximizes the conditional probability P(r, c). This is called a maximum lihelihood decoder (MLD).

For an AWGN channel with two-sided power spectral density N0/2, the conditional probability is given by

\[P\left(\mathbf{r} | \mathbf{c}\right) = \frac{1}{\left(\pi N_0\right)^{-n/2}} exp \left\lbrace – \sum_{i=1}^{n} \left[r_i – s_i\right]^2 \right\rbrace\]

The sum is the squared Euclidean distance between the received sequence r and the coded signal sequence s. We can note that the term is common for all codewords and n is a constant. This simplifies the MLD decoding rule where we select a codeword from the code dictionary that minimizes the Euclidean distance D(r, s)$.

Hamming distance

Hamming distance between two words a=(a0,a1, … ,an-1) and b=(b0,b1,…,bn-1) in Galois Field GF(2), is the number of coordinates in which the two blocks differ.

\[d_{Hamming} = d_H(\mathbf{a},\mathbf{b}) = \#\{j : a_j \neq b_j, j = 0,1,\cdots,n-1\}\]

For example, the Hamming distance between (0,0,0,1) and (1,0,1,0) in GF(2) is 3, since they differ in three digits. For an independent and identically distributed (i.i.d) error model with (discrete) uniform error amplitude distribution, the most appropriate measure is Hamming distance.

Minimum distance

The minimum distance of block code C, is the smallest distance between all distance pairs of code words in C. The minimum distance of a block code determines both its error-detecting ability and error-correcting ability. A large minimum distance guarantees reliability against random errors. The general relationship between a block code’s minimum distance and the error-detecting and error correcting capability is as follows.

● If dmin is the minimum Hamming distance of a block code, the code is guaranteed to detect up to e=dmin-1 errors. Consequently, let c1 and c2 be the two closest codewords in the codeword dictionary C. If c1 was transmitted and c2 is received, the error is undetectable.

● If dmin is the minimum Hamming distance of a block code and if the optimal decoding procedure of nearest-neighbor decoding is used at the receiver, the code is guaranteed to correct up to t=(dmin-1 )/2 errors.

Sub-optimal hard decision decoding

In soft-decision decoding, the bit samples to the decoder are either unquantized or quantized to multi-levels and the maximum likelihood decoder (MLD) needs to compute M correlation metrics, where M is the number of codewords in the codeword dictionary. Although this provides the best performance, the computational complexity of the decoder increases when the number of codewords M becomes large. To reduce the computational burden, the output of the matched filter at each bit-period can be quantized to only two levels, denoted as 0 and 1, that results in a hard-decision binary sequence. Then, the decoder processes this hard-decision sequence based on a specific decoding algorithm. This type of decoding, illustrated in Figure 1, is called hard-decision decoding.

Figure 2: Hard-decision receiver model for decoding linear block codes for AWGN channel

The hard-decision decoding methods use Hamming distance metric to decode the hard-decision received sequence to the closest codeword. The objective of such decoding methods is to choose a codeword that provides the minimum Hamming distance with respect to the hard-decision received sequence. Since the hard-decision samples are only quantized to two levels, resulting in loss of information, the hard-decision decoding results in performance degradation when compared to soft-decision decoding.

Decoding using standard array and syndrome decoding are popular hard-decision decoding methods encountered in practice.

Rate this article: Note: There is a rating embedded within this post, please visit this post to rate it.

For further reading

[1] I. Dokmanic, R. Parhizkar, J. Ranieri and M. Vetterli, “Euclidean Distance Matrices: Essential theory, algorithms, and applications,” in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 12-30, Nov. 2015, doi: 10.1109/MSP.2015.2398954.↗

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Maximum Likelihood Decoding

Keywords: maximum likelihood decoding, digital communication, data storage, noise, interference, wireless communication systems, optical communication systems, digital storage systems, probability, likelihood estimation, python

Introduction

Maximum likelihood decoding is a technique used to determine the most likely transmitted message in a digital communication system, based on the received signal and statistical models of noise and interference. The method uses maximum likelihood estimation to calculate the probability of each possible transmitted message and then selects the one with the highest probability.

To perform maximum likelihood decoding, the receiver uses a set of pre-defined models to estimate the likelihood of each possible transmitted message based on the received signal. The method is commonly used in various digital communication and data storage systems, such as wireless communication and digital storage. However, it can be complex and time-consuming, particularly in systems with large message spaces or complex noise and interference models.

Maximum Likelihood Decoding:

Consider a set of possible codewords (valid codewords – set \(Y\)) generated by an encoder in the transmitter side. We pick one codeword out of this set ( call it \(y\) ) and transmit it via a Binary Symmetric Channel (BSC) with probability of error \(p\) ( To know what is a BSC – click here ). At the receiver side we receive the distorted version of \(y\) ( call this erroneous codeword \(x\)).

Maximum Likelihood Decoding chooses one codeword from \(Y\) (the list of all possible codewords) which maximizes the following probability.

\[\mathbb{P}(y\;sent\mid x\;received )\]

Meaning that the receiver computes \(P(y_1,x) , P(y_2,x) , P(y_3,x),\cdots,P(y_n,x)\). and chooses a codeword (\(y\)) which gives the maximum probability.  In practice we don’t know \(y\) (at the receiver) but we know \(x\). So how to compute the probability ? Maximum Likelihood Estimation (MLE) comes to our rescue. For a detailed explanation on MLE – refer here[1] The aim of maximum likelihood estimation is to find the parameter value(s) that makes the observed data most likely. Understanding the difference between prediction and estimation is important at this point.   Estimation differs from prediction in the following way … In estimation problems, likelihood of the parameters is estimated based on given data/observation vector. In prediction problems, probability is used as a measure to predict the outcome from known parameters of a model.

Examples for “Prediction” and “Estimation” :

1) Probability of getting a “Head” in a single toss of a fair coin is \(0.5\). The coin is tossed 100 times in a row.Prediction helps in predicting the outcome ( head or tail ) of the \(101^{th}\) toss based on the probability.

2) A coin is tossed 100 times and the data ( head or tail information) is recorded. Assuming the event follows Binomial distribution model, estimation helps in determining the probability of the event. The actual probability may or may not be \(0.5\).   Maximum Likelihood Estimation estimates the conditional probability based on the observed data ( received data – \(x\)) and an assumed model.

Example of Maximum Likelihood Decoding:

Let \(y=11001001\) and \(x=10011001\) . Assuming Binomial distribution model for the event with probability of error \(0.1\) (i.e the reliability of the BSC is \(1-p = 0.9\)), the Hamming distance between codewords is \(2\) . For binomial model,

\[\mathbb{P}(y\;received\mid x\;sent ) = (1-p)^{n-d}.p^{d}\]

where \(d\) =the hamming distance between the received and the sent codewords n= number of bit sent
\(p\)= error probability of the BSC.
\(1-p\) = reliability of BSC

Substituting \(d=2, n=8\) and \(p=0.1\) , then \(P(y\;received \mid x\;sent) = 0.005314\).

Note : Here, Hamming distance is used to compute the probability. So the decoding can be called as “minimum distance decoding” (which minimizes the Hamming distance) or “maximum likelihood decoding”. Euclidean distance may also be used to compute the conditional probability.

As mentioned earlier, in practice \(y\) is not known at the receiver. Lets see how to estimate \(P(y \;received \mid x\; sent)\) when \(y\) is unknown based on the binomial model.

Since the receiver is unaware of the particular \(y\) corresponding to the \(x\) received, the receiver computes \(P(y\; received \mid x\; sent)\) for each codeword in \(Y\). The \(y\) which gives the maximum probability is concluded as the codeword that was sent.

Python code implementing Maximum Likelihood Decoding:

The following program for demonstrating the maximum likelihood decoding, involves generating a noisy signal from a transmitted message and then using maximum likelihood decoding to estimate the transmitted message from the noisy signal.

  1. The maximum_likelihood_decoding function takes three arguments: received_signal, noise_variance, and message_space.
  2. The calculate_probabilities function is called to calculate the probability of each possible message given the received signal, using the known noise variance.
  3. The probabilities are normalized so that they sum to 1.
  4. The maximum_likelihood_decoding function finds the index of the most likely message (i.e., the message with the highest probability).
  5. The function returns the most likely message.
  6. An example usage is demonstrated where a binary message space is defined ([0, 1]), along with a noise variance and a transmitted message.
  7. The transmitted message is added to noise to generate a noisy received signal.
  8. The maximum_likelihood_decoding function is called to decode the noisy signal.
  9. The transmitted message, received signal, and decoded message are printed to the console for evaluation.
import numpy as np
import matplotlib.pyplot as plt

# Define a function to calculate the probability of each possible message given the received signal
def calculate_probabilities(received_signal, noise_variance, message_space):
    probabilities = np.zeros(len(message_space))

    for i, message in enumerate(message_space):
        error = received_signal - message
        probabilities[i] = np.exp(-np.sum(error ** 2) / (2 * noise_variance))

    return probabilities / np.sum(probabilities)

# Define a function to perform maximum likelihood decoding
def maximum_likelihood_decoding(received_signal, noise_variance, message_space):
    probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
    most_likely_message_index = np.argmax(probabilities)
    return message_space[most_likely_message_index]

# Example usage
message_space = np.array([0, 1])
noise_variance = 0.4
transmitted_message = 1
received_signal = transmitted_message + np.sqrt(noise_variance) * np.random.randn()
decoded_message = maximum_likelihood_decoding(received_signal, noise_variance, message_space)

print('Transmitted message:', transmitted_message)
print('Received signal:', received_signal)
print('Decoded message:', decoded_message)

# Plot probability distribution
probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
plt.bar(message_space, probabilities)
plt.title('Probability Distribution for Received Signal = {}'.format(received_signal))
plt.xlabel('Transmitted Message')
plt.ylabel('Probability')
plt.ylim([0, 1])
plt.show()

The probability of the received signal given a specific transmitted message is calculated as follows:

  1. Compute the difference between the received signal and the transmitted message.
  2. Compute the sum of squares of this difference vector.
  3. Divide this sum by twice the known noise variance.
  4. Take the negative exponential of this value.

This results in a probability density function (PDF) for the received signal given the transmitted message, assuming that the noise is Gaussian and zero-mean.

The probabilities for each possible transmitted message are then normalized so that they sum to 1. This is done by dividing each individual probability by the sum of all probabilities.

The maximum_likelihood_decoding function determines the most likely transmitted message by selecting the message with the highest probability, which corresponds to the maximum likelihood estimate of the transmitted message given the received signal and the statistical model of the noise.

Sample outputs

Transmitted message: 1
Received signal: 0.21798306949364643
Decoded message: 0

Transmitted message: 1
Received signal: -0.5115453787966966
Decoded message: 0

Transmitted message: 1
Received signal: 0.8343088336355061
Decoded message: 1

Transmitted message: 1
Received signal: -0.5479891887158619
Decoded message: 0

The probability distribution for the last sample output is shown below

Figure: Probability distribution for a sample run of the code

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Related Topics:

[1]An Introduction to Estimation Theory
[2]Bias of an Estimator
[3]Minimum Variance Unbiased Estimators (MVUE)
[4]Maximum Likelihood Estimation
[5]Maximum Likelihood Decoding
[6]Probability and Random Process
[7]Likelihood Function and Maximum Likelihood Estimation (MLE)
[8]Score, Fisher Information and Estimator Sensitivity
[9]Introduction to Cramer Rao Lower Bound (CRLB)
[10]Cramer Rao Lower Bound for Scalar Parameter Estimation
[11]Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12]Efficient Estimators and CRLB
[13]Cramer Rao Lower Bound for Phase Estimation
[14]Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15]Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16]The Mean Square Error – Why do we use it for estimation problems
[17]How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18]Essential Preliminary Matrix Algebra for Signal Processing
[19]Why Cholesky Decomposition ? A sample case:
[20]Tests for Positive Definiteness of a Matrix
[21]Solving a Triangular Matrix using Forward & Backward Substitution
[22]Cholesky Factorization - Matlab and Python
[23]LTI system models for random signals – AR, MA and ARMA models
[24]Comparing AR and ARMA model - minimization of squared error
[25]Yule Walker Estimation
[26]AutoCorrelation (Correlogram) and persistence – Time series analysis
[27]Linear Models - Least Squares Estimator (LSE)
[28]Best Linear Unbiased Estimator (BLUE)

Hard and Soft decision decoding

What are hard and soft decision decoding

Hard decision decoding and soft decision decoding are two different methods used for decoding error-correcting codes.

With hard decision decoding, the received signal is compared to a set threshold value to determine whether the transmitted bit is a 0 or a 1. This is commonly used in digital communication systems that experience noise or interference, resulting in a low signal-to-noise ratio.

Soft decision decoding, on the other hand, treats the received signal as a probability distribution and calculates the likelihood of each possible transmitted bit based on the characteristics of the received signal. This approach is often used in modern digital communication and data storage systems where the signal-to-noise ratio is relatively high and there is a need for higher accuracy and reliability.

While soft decision decoding can achieve better error correction, it is more complex and computationally expensive than hard decision decoding.

More details

Let’s expatiate on the concepts of hard decision and soft decision decoding. Consider a simple even parity encoder given below.

Input Bit 1
Input Bit 2
Parity bit added by encoder
Codeword Generated
0
0
0
000
0
1
1
011
1
0
1
101
1
1
0
110

The set of all possible codewords generated by the encoder are 000,011,101 and 110.

Lets say we are want to transmit the message “01” through a communication channel.

Hard decision decoding

Case 1 : Assume that our communication model consists of a parity encoder, communication channel (attenuates the data randomly) and a hard decision decoder

The message bits “01” are applied to the parity encoder and we get “011” as the output codeword.

Figure 1: Hard decision decoding – a simple illustration

The output codeword “011” is transmitted through the channel. “0” is transmitted as “0 Volt and “1” as “1 Volt”. The channel attenuates the signal that is being transmitted and the receiver sees a distorted waveform ( “Red color waveform”). The hard decision decoder makes a decision based on the threshold voltage. In our case the threshold voltage is chosen as 0.5 Volt ( midway between “0” and “1” Volt ) . At each sampling instant in the receiver (as shown in the figure above) the hard decision detector determines the state of the bit to be “0” if the voltage level falls below the threshold and “1” if the voltage level is above the threshold. Therefore, the output of the hard decision block is “001”. Perhaps this “001” output is not a valid codeword ( compare this with the all possible codewords given in the table above) , which implies that the message bits cannot be recovered properly. The decoder compares the output of the hard decision block with the all possible codewords and computes the minimum Hamming distance for each case (as illustrated in the table below).

All possible Codewords
Hard decision output
Hamming distance
000
001
1
011
001
1
101
001
1
110
001
3

The decoder’s job is to choose a valid codeword which has the minimum Hamming distance. In our case, the minimum Hamming distance is “1” and there are 3 valid codewords with this distance. The decoder may choose any of the three possibility and the probability of getting the correct codeword (“001” – this is what we transmitted) is always 1/3. So when the hard decision decoding is employed the probability of recovering our data ( in this particular case) is 1/3. Lets see what “Soft decision decoding” offers …

Soft Decision Decoding

The difference between hard and soft decision decoder is as follows

  • In Hard decision decoding, the received codeword is compared with the all possible codewords and the codeword which gives the minimum Hamming distance is selected
  • In Soft decision decoding, the received codeword is compared with the all possible codewords and the codeword which gives the minimum Euclidean distance is selected. Thus the soft decision decoding improves the decision making process by supplying additional reliability information ( calculated Euclidean distance or calculated log-likelihood ratio)

For the same encoder and channel combination lets see the effect of replacing the hard decision block with a soft decision block.

Voltage levels of the received signal at each sampling instant are shown in the figure. The soft decision block calculates the Euclidean distance between the received signal and the all possible codewords.

Valid codewords
Voltage levels at each sampling instant of received waveform
Euclidean distance calculation
Euclidean distance
0 0 0
( 0V 0V 0V )
0.2V 0.4V 0.7V
(0-0.2)2+ (0-0.4)2+ (0-0.7)2
0.69
0 1 1
( 0V 1V 1V )
0.2V 0.4V 0.7V
(0-0.2)2+ (1-0.4)2+ (1-0.7)2
0.49
1 0 1
( 1V 0V 1V )
0.2V 0.4V 0.7V
(1-0.2)2+ (0-0.4)2+ (1-0.7)2
0.89
1 1 0
( 1V 1V 0V )
 
0.2V 0.4V 0.7V
(1-0.2)2+ (1-0.4)2+ (0-0.7)2
1.49

The minimum Euclidean distance is “0.49” corresponding to “0 1 1” codeword (which is what we transmitted). The decoder selects this codeword as the output. Even though the parity encoder cannot correct errors, the soft decision scheme helped in recovering the data in this case. This fact delineates the improvement that will be seen when this soft decision scheme is used in combination with forward error correcting (FEC) schemes like convolution codes , LDPC etc

From this illustration we can understand that the soft decision decoders uses all of the information ( voltage levels in this case) in the process of decision making whereas the hard decision decoders does not fully utilize the information available in the received signal (evident from calculating Hamming distance just by comparing the signal level with the threshold whereby neglecting the actual voltage levels).

Note: This is just to illustrate the concept of Soft decision and Hard decision decoding. Prudent souls will be quick enough to find that the parity code example will fail for other voltage levels (e.g. : 0.2V , 0.4 V and 0.6V) . This is because the parity encoders are not capable of correcting errors but are capable of detecting single bit errors.

Soft decision decoding scheme is often realized using Viterbi decoders. Such decoders utilize Soft Output Viterbi Algorithm (SOVA) which takes into account the apriori probabilities of the input symbols producing a soft output indicating the reliability of the decision.

Rate this article: Note: There is a rating embedded within this post, please visit this post to rate it.

For further reading

[1] I. Dokmanic, R. Parhizkar, J. Ranieri and M. Vetterli, “Euclidean Distance Matrices: Essential theory, algorithms, and applications,” in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 12-30, Nov. 2015, doi: 10.1109/MSP.2015.2398954.↗

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing