Euclidean and Hamming distances

Key focus: Euclidean and Hamming distances are used to measure similarity or dissimilarity between two sequences. They are used in soft-decision and hard-decision decoding respectively.

Distance is a measure that indicates either similarity or dissimilarity between two words. Given a pair of words a=(a0,a1,…,an-1) and b=(b0,b1,…,bn-1), there are a variety of ways one can characterize the distance, d(a,b), between the two words. Euclidean and Hamming distances are the familiar ones. Euclidean distance is extensively applied in the analysis of convolutional codes and Trellis codes, while Hamming distance is frequently encountered in the analysis of block codes.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Euclidean distance

The Euclidean distance between the two words is defined as

\[d_{Euclidean}(\mathbf{a},\mathbf{b}) = \sqrt{(a_0-b_0)^2+(a_1-b_1)^2 + \cdots + (a_{n-1}-b_{n-1})^2}\]
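The definition translates directly into code. The book's examples are in Matlab; the following is only an equivalent Python sketch of the same computation:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length words a and b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Example: two 4-dimensional words
d = euclidean_distance((0, 0, 0, 1), (1, 0, 1, 0))  # sqrt(1 + 0 + 1 + 1) = sqrt(3)
```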

Soft decision decoding

In contrast to classical hard-decision decoders (see below), which operate on binary values, a soft-decision decoder directly processes the unquantized samples (or, in practice, samples quantized to more than two levels) at the output of the matched filter for each bit-period, thereby avoiding loss of information.

If the outputs of the matched filter at each bit-period are unquantized or quantized in more than two levels, the demodulator is said to make soft-decisions. The process of decoding the soft-decision received sequence is called soft-decision decoding. Since the decoder uses the additional information contained in the multi-level quantized or unquantized received sequence, soft-decision decoding provides better performance compared to hard-decision decoding. For soft-decision decoding, metrics like likelihood function, Euclidean distance and correlation are used.

For illustration purposes, we consider the communication system model shown in Figure 1. A block encoder encodes the information blocks m=(m1,m2,…,mk) and generates the corresponding codeword vector c=(c1,c2,…,cn). The codewords are modulated and sent across an AWGN channel. The received signal is passed through a matched filter and the multi-level quantizer outputs the soft-decision vector r.

Figure 1: Soft-decision receiver model for decoding linear block codes for AWGN channel

The goal of a decoder is to produce an estimate of the information sequence m based on the received sequence r. Equivalently, since the information sequence m and the codeword c are in one-to-one correspondence, the decoder can instead produce an estimate ĉ of the codeword c. If the codeword c was transmitted, a decoding error occurs if ĉ ≠ c.

For equiprobable codewords, the optimal decoder selects the codeword that maximizes the conditional probability P(r | c). This is called a maximum likelihood decoder (MLD).

For an AWGN channel with two-sided power spectral density N0/2, the conditional probability is given by

\[P\left(\mathbf{r} | \mathbf{c}\right) = \left(\pi N_0\right)^{-n/2} \exp \left\lbrace - \frac{1}{N_0} \sum_{i=1}^{n} \left[r_i - s_i\right]^2 \right\rbrace\]

The sum \(D(\mathbf{r},\mathbf{s}) = \sum_{i=1}^{n} \left[r_i - s_i\right]^2\) is the squared Euclidean distance between the received sequence r and the coded signal sequence s. Note that the term \(\left(\pi N_0\right)^{-n/2}\) is common to all codewords and n is a constant, so maximizing \(P(\mathbf{r}|\mathbf{c})\) is equivalent to minimizing the Euclidean distance. This simplifies the MLD decoding rule: select the codeword from the code dictionary that minimizes \(D(\mathbf{r}, \mathbf{s})\).
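The minimum-Euclidean-distance rule can be sketched as a brute-force search over the code dictionary. This is an illustrative Python analogue (the book's code is in Matlab); the BPSK bit-to-signal mapping 0 → +1.0, 1 → −1.0 is an assumption for the example:

```python
def soft_decision_mld(r, codebook):
    """Maximum likelihood decoding over AWGN: pick the codeword whose
    modulated signal minimizes the squared Euclidean distance to r.
    Assumed BPSK mapping: bit 0 -> +1.0, bit 1 -> -1.0."""
    def signal(c):
        return [1.0 - 2.0 * bit for bit in c]

    def sq_dist(x, s):
        return sum((xi - si) ** 2 for xi, si in zip(x, s))

    return min(codebook, key=lambda c: sq_dist(r, signal(c)))

# Toy (3,1) repetition code; r is the matched filter output vector
best = soft_decision_mld([0.9, -0.2, 0.4], [(0, 0, 0), (1, 1, 1)])  # -> (0, 0, 0)
```

Note that one noisy sample (−0.2) does not flip the decision, because the decoder weighs the full real-valued vector rather than per-bit decisions.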

Hamming distance

The Hamming distance between two words a=(a0,a1,…,an-1) and b=(b0,b1,…,bn-1) in the Galois Field GF(2) is the number of coordinates in which the two words differ.

\[d_{Hamming} = d_H(\mathbf{a},\mathbf{b}) = \#\{j : a_j \neq b_j, j = 0,1,\cdots,n-1\}\]

For example, the Hamming distance between (0,0,0,1) and (1,0,1,0) in GF(2) is 3, since they differ in three digits. For an independent and identically distributed (i.i.d) error model with (discrete) uniform error amplitude distribution, the most appropriate measure is Hamming distance.
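The coordinate-counting definition is a one-liner in code. A Python sketch (the book itself uses Matlab):

```python
def hamming_distance(a, b):
    """Number of coordinate positions in which words a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

# The example from the text: the words differ in three digits
hamming_distance((0, 0, 0, 1), (1, 0, 1, 0))  # 3
```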

Minimum distance

The minimum distance of a block code C is the smallest Hamming distance between any pair of distinct codewords in C. The minimum distance of a block code determines both its error-detecting ability and error-correcting ability. A large minimum distance guarantees reliability against random errors. The general relationship between a block code’s minimum distance and its error-detecting and error-correcting capability is as follows.

● If dmin is the minimum Hamming distance of a block code, the code is guaranteed to detect up to e=dmin-1 errors. To see why, let c1 and c2 be the two closest codewords in the codeword dictionary C. If c1 was transmitted and dmin errors convert it into c2, the received word is itself a valid codeword and the error is undetectable.

● If dmin is the minimum Hamming distance of a block code and the optimal decoding procedure of nearest-neighbor decoding is used at the receiver, the code is guaranteed to correct up to t=⌊(dmin-1)/2⌋ errors.
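The minimum distance can be found by an exhaustive pairwise search over the code dictionary, and e and t follow from the bounds above. A Python sketch using a toy (3,1) repetition code as the example:

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of coordinate positions in which words a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def min_distance(code):
    """Smallest pairwise Hamming distance over all distinct codeword pairs."""
    return min(hamming_distance(c1, c2) for c1, c2 in combinations(code, 2))

# Toy (3,1) repetition code: dmin = 3
code = [(0, 0, 0), (1, 1, 1)]
dmin = min_distance(code)
e = dmin - 1          # errors guaranteed detectable: 2
t = (dmin - 1) // 2   # errors guaranteed correctable: 1
```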

Sub-optimal hard-decision decoding

In soft-decision decoding, the bit samples to the decoder are either unquantized or quantized to multiple levels, and the maximum likelihood decoder (MLD) needs to compute M correlation metrics, where M is the number of codewords in the codeword dictionary. Although this provides the best performance, the computational complexity of the decoder grows when the number of codewords M becomes large. To reduce the computational burden, the output of the matched filter at each bit-period can be quantized to only two levels, denoted as 0 and 1, resulting in a hard-decision binary sequence. The decoder then processes this hard-decision sequence using a specific decoding algorithm. This type of decoding, illustrated in Figure 2, is called hard-decision decoding.

Figure 2: Hard-decision receiver model for decoding linear block codes for AWGN channel

Hard-decision decoding methods use the Hamming distance metric to decode the hard-decision received sequence to the closest codeword: the objective is to choose the codeword with the minimum Hamming distance from the hard-decision received sequence. Since the hard-decision samples are quantized to only two levels, information is lost, and hard-decision decoding suffers a performance degradation compared to soft-decision decoding.

Decoding using standard array and syndrome decoding are popular hard-decision decoding methods encountered in practice.
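For small codes, the minimum-Hamming-distance rule can itself be sketched as a brute-force nearest-neighbor search (standard array and syndrome decoding are structured shortcuts to the same answer). An illustrative Python sketch:

```python
def hard_decision_decode(r_hard, codebook):
    """Nearest-neighbor decoding: return the codeword with minimum
    Hamming distance to the hard-decision received sequence r_hard."""
    def hamming(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    return min(codebook, key=lambda c: hamming(r_hard, c))

# (3,1) repetition code (t = 1) corrects the single bit error in (1, 0, 1)
hard_decision_decode((1, 0, 1), [(0, 0, 0), (1, 1, 1)])  # -> (1, 1, 1)
```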


