Convolutional Codes – sliding window and shift register

Key focus: Convolutional codes are error-correcting codes that are effective over noisy channels. Learn about the sliding window and shift registers and how they are used in a convolutional encoder.

Linear block codes

Linear block codes, such as Hamming codes, are typically decoded using hard decision decoding, which exploits the inherent algebraic structure of the code built on finite field properties.

Hard decision decoding of these codes creates a model resembling a binary symmetric channel. This model includes the essential components: the binary modulator, waveform channel, and optimal binary detector.

The decoder’s task for these codes is to find the codeword with the shortest Hamming distance from the binary symmetric channel’s output. When designing effective linear block codes, the primary aim is to discover the code with the highest minimum distance, considering the specific values of \(n\) and \(k\), where \(n\) is the number of output bits and \(k\) is the number of input bits.

As we saw, linear block codes are easily described in terms of algebraic equations. Convolutional codes are another class of error-correcting codes that are described using graphs and trellises.

Convolutional codes

Convolutional codes are a type of error-correcting code used in digital communication and data storage systems to detect and correct errors that may occur during transmission or storage. They are particularly well-suited for channels with random errors, such as noisy AWGN channels; for channels dominated by burst errors they are usually combined with interleaving or concatenated with block codes (see the two-level coding system discussed later in this article).

Unlike linear block codes, where both the message bits and the parity bits are transmitted, in the convolutional coding scheme we do not send the message bits; we send only the parity bits. Which message bits participate in the generation of the parity bits is determined by a sliding window.

Convolutional codes are defined by a set of shift registers and associated feedback connections. The fundamental principle involves encoding the input data via a sliding bit window (Figure 1), utilizing bitwise operations to generate the encoded result. The encoded bits are the parity bits: each encoded bit is a linear combination of the current input bits and the previous input bits held in the shift registers.

Let’s see some definitions with respect to convolutional codes:

Rate: Convolutional codes are often described by their rate, which is the ratio of the number of input bits to the number of output bits (encoded bits). A common notation for convolutional codes is given as \(n,k,m\) where \(n\) is the number of output bits (equivalently the number of generator polynomials), \(k\) is the number of input bits (equivalently the number of bits shifted by the shifter register at a time), and \(m\) is the number of shift registers. The code rate is given by \(R_c = k/n\).

Constraint Length: The constraint length of a convolutional code is determined by the number of shift register stages in the encoder. It determines the memory of the code and affects its error-correction capability. The larger the constraint length, the greater the memory of the code, and the better the error-correction capability it may provide.

Generator Polynomials: The feedback connections and the bitwise operations performed on the shift registers are defined by generator polynomials. These polynomials determine the encoding process and the redundancy (parity bits) introduced into the data. The code rate is simply calculated as \(R_c = k/n\), where \(n\) is the number of generator polynomials in the encoder structure and \(k\) is the number of input bits processed at a time. While more generator polynomials may improve the error-correction capability of the code, they also decrease the code rate (more overhead).

Encoding: Convolutional encoding involves shifting the input bits through the shift registers and applying bitwise operations based on the generator polynomials. The encoded bits are just parity bits. In the convolutional encoding scheme, we transmit only the parity bits and not the message itself. If we use multiple generator polynomials, we interleave the output bits of each generator and serialize them when transmitting.

Decoding: At the receiver end, a decoding algorithm (such as the Viterbi algorithm) is used to analyze the received bits and recover the original data. The decoding process tries to find the most likely sequence of input bits given the received encoded bits and the code’s properties. We will come to this process in a later post.

Figure 1 shows a \(n=2, k=1, m=3\) convolutional encoder. The symbol D represents the delay as the input bits are processed through the shift registers. The generator polynomials are specified as \(g_0 = [1,1,1]\) and \(g_1=[1,0,1]\). The generator polynomials connect the appropriate cells in the shift register and perform an XOR (modulo-2) operation on the bits stored in the shift register cells. This particular convolutional code generates 2 output bits for every input bit processed. The two output bit streams are designated as \(c_0\) and \(c_1\).

Since there are two generator polynomials, the code rate is \(R_c = 1/2\)

Given an input message \(x[n]\) the equation for generating the parity bit streams is given by

\[\begin{align} c_0[n] &= x[n] + x[n − 1] + x[n − 2] & \quad \quad (modulo-2) \\ c_1[n] &= x[n] + x[n − 2] & \quad \quad (modulo-2) \end{align}\]
Figure 1: \(n=2, k=1, m=3\) Convolutional encoder

Figure 2 shows the sliding window representation of the shift register arrangement of Figure 1: a 3-bit (\(m\)) sliding window shift register that advances 1 bit (\(k\)) at a time. The number of output bits (\(n\)) depends on how these bits are processed using the generator polynomials.

Figure 2: Generating convolutional codes using sliding window operation and shift register

Sample encoding operation

Figure 3 visualizes the encoding process for a \(n=2, k=1, m=3\) convolutional code. The generator polynomials are specified as \(g_0 =[1,1,1]\) and \(g_1 = [1, 0, 1]\). The generator polynomials connect the appropriate cells in the shift register and perform an XOR (modulo-2) operation on the bits in the shift register cells.

These two generator polynomials generate the corresponding output bits \(c_0\) and \(c_1\). The input message to encode is \([1,0,1,1,0,0]\). The output generated by this particular convolutional encoder is \([11,10,00,01,01,11]\).

Figure 3: Encoding process for convolutional codes
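
To make the sliding-window operation concrete, here is a minimal Python sketch of this \(n=2, k=1, m=3\) encoder (illustrative only; the function name and the all-zero starting state of the shift register are assumptions). It reproduces the output listed above for the message \([1,0,1,1,0,0]\).

def conv_encode(msg, g0=(1, 1, 1), g1=(1, 0, 1)):
    """Rate-1/2 convolutional encoder: m=3 sliding window advanced 1 bit at a time."""
    window = [0, 0, 0]                                   # [x[n], x[n-1], x[n-2]], initially zeros
    encoded = []
    for bit in msg:
        window = [bit] + window[:-1]                     # slide the window: new bit in, oldest out
        c0 = sum(g * x for g, x in zip(g0, window)) % 2  # modulo-2 sum of cells selected by g0
        c1 = sum(g * x for g, x in zip(g1, window)) % 2  # modulo-2 sum of cells selected by g1
        encoded.append((c0, c1))                         # the two parity streams, interleaved
    return encoded

print(conv_encode([1, 0, 1, 1, 0, 0]))
# [(1, 1), (1, 0), (0, 0), (0, 1), (0, 1), (1, 1)]  i.e. 11, 10, 00, 01, 01, 11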

Encoder state machine

As illustrated in Figure 2, the constraint length \(m\) determines the number of memory cells in the shift register structure. The input bit and the \(m-1\) bits of current state determine the encoder state on the next clock cycle.

Because the encoder’s output relies on both its input and its current state, the state diagram is a more concise representation than the trellis. Essentially, the state diagram is a graphical depiction of the encoder’s possible states and the possible transitions between these states.

From the state diagram, we can immediately recognize that the convolutional encoder is indeed a Markov chain.

Markov chains and hidden Markov models are discussed in later sections of this article.

Figure 4: Encoder state machine for convolutional code shown in Figure 1

Figure 4 depicts the state diagram for the convolutional encoder described in Figure 1. For a convolutional encoder with constraint length m, there will be \(2^{m-1}\) possible states. For this specific case, there are \(2^{3-1} = 4\) possible states.

The current state is represented by \(x[n-1]\) and \(x[n-2]\) (corresponding to the values of the memory cells \(D_1\) and \(D_2\)). The output of the encoder is determined by the current input bit \(x[n]\) and the current state. Since there are two generator polynomials \(g_0\) and \(g_1\), they generate two outputs \(c_0[n]\) and \(c_1[n]\) respectively. For an input bit \(x[n]\), the encoder transitions from one state to the next and emits \(c_0[n]\) and \(c_1[n]\) as output. This is depicted on the state diagram as \(x[n]/c_0[n]c_1[n]\).

Figure 5 shows the encoding process when the input message is \([1, 0, 1, 1, 0, 0]\). The encoder transitions through various states and generates the output \([11, 10, 00, 01, 01, 11]\).

Figure 5: Encoder state machine processing the input message [1,0,1,1,0,0]
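
Because the next state and the emitted output depend only on the current state and the input bit, the state diagram of Figure 4 can also be generated programmatically. Here is a minimal sketch (the helper name next_state_output is hypothetical) that prints every transition in the x[n]/c0c1 form used above:

def next_state_output(state, bit):
    """state = (x[n-1], x[n-2]); returns the next state and the output pair (c0, c1)."""
    x1, x2 = state
    c0 = (bit + x1 + x2) % 2        # g0 = [1,1,1]
    c1 = (bit + x2) % 2             # g1 = [1,0,1]
    return (bit, x1), (c0, c1)

for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:       # 2^(m-1) = 4 states
    for bit in (0, 1):
        nxt, (c0, c1) = next_state_output(state, bit)
        print(f"state {state}, input/output {bit}/{c0}{c1}, next state {nxt}")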

Euclidean and Hamming distances

Key focus: Euclidean & Hamming distances are used to measure similarity or dissimilarity between two sequences. Used in Soft & Hard decision decoding.

Distance is a measure that indicates either similarity or dissimilarity between two words. Given a pair of words \(\mathbf{a}=(a_0,a_1,\ldots,a_{n-1})\) and \(\mathbf{b}=(b_0,b_1,\ldots,b_{n-1})\), there are a variety of ways one can characterize the distance \(d(\mathbf{a},\mathbf{b})\) between the two words. Euclidean and Hamming distances are the familiar ones. Euclidean distance is extensively applied in the analysis of convolutional codes and trellis codes, while Hamming distance is frequently encountered in the analysis of block codes.


Euclidean distance

The Euclidean distance between the two words is defined as

\[d_{Euclidean}(\mathbf{a},\mathbf{b}) = \sqrt{(a_0-b_0)^2+(a_1-b_1)^2 + \cdots + (a_{n-1}-b_{n-1})^2}\]

Soft decision decoding

In contrast to classical hard-decision decoders (see below) which operate on binary values, a soft-decision decoder directly processes the unquantized (or quantized in more than two levels in practice) samples at the output of the matched filter for each bit-period, thereby avoiding the loss of information.

If the outputs of the matched filter at each bit-period are unquantized or quantized in more than two levels, the demodulator is said to make soft-decisions. The process of decoding the soft-decision received sequence is called soft-decision decoding. Since the decoder uses the additional information contained in the multi-level quantized or unquantized received sequence, soft-decision decoding provides better performance compared to hard-decision decoding. For soft-decision decoding, metrics like likelihood function, Euclidean distance and correlation are used.

For illustration purposes, we consider the communication system model shown in Figure 1. A block encoder encodes the information blocks \(\mathbf{m}=(m_1,m_2,\ldots,m_k)\) and generates the corresponding codeword vector \(\mathbf{c}=(c_1,c_2,\ldots,c_n)\). The codewords are modulated and sent across an AWGN channel. The received signal is passed through a matched filter and the multi-level quantizer outputs the soft-decision vector \(\mathbf{r}\).

Figure 1: Soft-decision receiver model for decoding linear block codes for AWGN channel

The goal of a decoder is to produce an estimate of the information sequence m based on the received sequence r. Equivalently, since the information sequence m and the codeword c have a one-to-one correspondence, the decoder can also produce an estimate ĉ of the codeword c. If the codeword c was transmitted, a decoding error occurs if ĉ ≠ c.

For equi-probable codewords, the decoder selects the codeword that maximizes the conditional probability P(r | c). This is called a maximum likelihood decoder (MLD).

For an AWGN channel with two-sided power spectral density N0/2, the conditional probability is given by

\[P\left(\mathbf{r} | \mathbf{c}\right) = \left(\pi N_0\right)^{-n/2} \exp \left\lbrace - \frac{1}{N_0} \sum_{i=1}^{n} \left[r_i - s_i\right]^2 \right\rbrace\]

The sum in the exponent is the squared Euclidean distance between the received sequence r and the coded signal sequence s. We note that the term \(\left(\pi N_0\right)^{-n/2}\) is common to all codewords and n is a constant. This simplifies the MLD decoding rule: we select the codeword from the code dictionary that minimizes the Euclidean distance \(D(\mathbf{r}, \mathbf{s})\).
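
As a rough sketch of this decoding rule (assuming a BPSK-style mapping of bit 0 to +1 and bit 1 to -1, and a small illustrative codeword dictionary; neither is taken from the text above), a brute-force soft-decision MLD scans the dictionary for the codeword whose signal sequence has the smallest squared Euclidean distance to the received samples:

import numpy as np

def mld_soft(r, codebook):
    """Return the codeword whose (assumed BPSK) signal is closest to r in Euclidean distance."""
    best, best_dist = None, float("inf")
    for c in codebook:
        s = 1.0 - 2.0 * np.array(c)            # assumed mapping: bit 0 -> +1, bit 1 -> -1
        dist = np.sum((np.array(r) - s) ** 2)  # squared Euclidean distance D(r, s)
        if dist < best_dist:
            best, best_dist = c, dist
    return best

codebook = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # illustrative dictionary
r = [0.9, -1.2, -0.4]                                    # unquantized matched filter outputs
print(mld_soft(r, codebook))                             # -> (0, 1, 1)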

Hamming distance

Hamming distance between two words \(\mathbf{a}=(a_0,a_1,\ldots,a_{n-1})\) and \(\mathbf{b}=(b_0,b_1,\ldots,b_{n-1})\) in Galois Field GF(2) is the number of coordinates in which the two blocks differ.

\[d_{Hamming} = d_H(\mathbf{a},\mathbf{b}) = \#\{j : a_j \neq b_j, j = 0,1,\cdots,n-1\}\]

For example, the Hamming distance between (0,0,0,1) and (1,0,1,0) in GF(2) is 3, since they differ in three digits. For an independent and identically distributed (i.i.d) error model with (discrete) uniform error amplitude distribution, the most appropriate measure is Hamming distance.
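
Both distance measures are easy to compute. A short sketch (plain Python, illustrative function names) that checks the example above:

import math

def hamming_distance(a, b):
    """Number of coordinates in which the two words differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(hamming_distance((0, 0, 0, 1), (1, 0, 1, 0)))    # 3, as in the example above
print(euclidean_distance((0, 0, 0, 1), (1, 0, 1, 0)))  # sqrt(3) ~ 1.732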

Minimum distance

The minimum distance of a block code C is the smallest distance over all distinct pairs of codewords in C. The minimum distance of a block code determines both its error-detecting ability and error-correcting ability. A large minimum distance guarantees reliability against random errors. The general relationship between a block code’s minimum distance and its error-detecting and error-correcting capability is as follows.

● If \(d_{min}\) is the minimum Hamming distance of a block code, the code is guaranteed to detect up to \(e = d_{min}-1\) errors. Consequently, let c1 and c2 be the two closest codewords in the codeword dictionary C. If c1 was transmitted and c2 is received, the error is undetectable.

● If \(d_{min}\) is the minimum Hamming distance of a block code and if the optimal decoding procedure of nearest-neighbor decoding is used at the receiver, the code is guaranteed to correct up to \(t = \lfloor (d_{min}-1)/2 \rfloor\) errors, as illustrated in the sketch below.
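
Here is a minimal sketch of both bounds (the four-codeword dictionary is purely illustrative): it computes the minimum Hamming distance of a code dictionary and the resulting detection and correction guarantees.

from itertools import combinations

def min_distance(code):
    """Smallest Hamming distance over all distinct pairs of codewords."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2))

code = [(0,0,0,0,0,0), (1,1,1,0,0,0), (0,0,1,1,1,0), (1,1,0,1,1,0)]   # illustrative dictionary
dmin = min_distance(code)
print("dmin =", dmin)                               # 3 for this dictionary
print("detects up to", dmin - 1, "errors")          # e = dmin - 1
print("corrects up to", (dmin - 1) // 2, "errors")  # t = floor((dmin - 1)/2)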

Sub-optimal hard decision decoding

In soft-decision decoding, the bit samples to the decoder are either unquantized or quantized to multi-levels and the maximum likelihood decoder (MLD) needs to compute M correlation metrics, where M is the number of codewords in the codeword dictionary. Although this provides the best performance, the computational complexity of the decoder increases when the number of codewords M becomes large. To reduce the computational burden, the output of the matched filter at each bit-period can be quantized to only two levels, denoted as 0 and 1, that results in a hard-decision binary sequence. Then, the decoder processes this hard-decision sequence based on a specific decoding algorithm. This type of decoding, illustrated in Figure 1, is called hard-decision decoding.

Figure 2: Hard-decision receiver model for decoding linear block codes for AWGN channel

The hard-decision decoding methods use Hamming distance metric to decode the hard-decision received sequence to the closest codeword. The objective of such decoding methods is to choose a codeword that provides the minimum Hamming distance with respect to the hard-decision received sequence. Since the hard-decision samples are only quantized to two levels, resulting in loss of information, the hard-decision decoding results in performance degradation when compared to soft-decision decoding.

Decoding using standard array and syndrome decoding are popular hard-decision decoding methods encountered in practice.


For further reading

[1] I. Dokmanic, R. Parhizkar, J. Ranieri and M. Vetterli, “Euclidean Distance Matrices: Essential theory, algorithms, and applications,” in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 12-30, Nov. 2015, doi: 10.1109/MSP.2015.2398954.↗


Hidden Markov Models (HMM) – Simplified !!!

Markov chains are useful in computing the probability of events that are observable. However, in many real world applications, the events that we are interested in are usually hidden, that is, we don’t observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language we only observe the words and characters directly. The parts-of-speech of the sentence are hidden; they have to be inferred. This brings us to the following topic – the hidden Markov models.

Hidden Markov models enable us to model both the observations and the associated hidden events. Let’s consider an example to understand the concept.

The cheating casino and the gullible gambler

Consider a dishonest casino that deceives its players by using two types of dice: a fair die (F) and a loaded die (L). For a fair die, each of the faces has the same probability of landing face up. For the loaded die, the probabilities of the faces are skewed, favoring certain faces.

When the gambler throws the die, a number lands face up. These are our observations at a given time t (denoted as Ot ∈ {1,2,3,4,5,6}). At any given time t, whether these numbers are rolled from a fair die (state St = F) or a loaded die (St = L) is unknown to an observer, and therefore the die in use is the hidden event.

Emission probabilities

The probabilities associated with the observations are the observation likelihoods, also called emission probabilities (B).

Initial probabilities

The initial probability of starting (at time t = 0) with either the fair die or the loaded die (hidden events) is 50% each.

Transition probabilities

The dishonest casino switches from the fair die to the loaded die with 10% probability, and switches back from the loaded die to the fair die with 5% probability.

The probabilities of transitioning from one hidden event to another are described by the transition probability matrix (A). The elements of the transition probability matrix are the transition probabilities (pij) of moving from one hidden state i to another hidden state j.

The transition probabilities from time t-1 to t for the hidden events are

P(St = F | St-1 = F) = 0.90,  P(St = L | St-1 = F) = 0.10
P(St = F | St-1 = L) = 0.05,  P(St = L | St-1 = L) = 0.95

Therefore, the transition probability matrix is

\[A = \begin{bmatrix} 0.90 & 0.10 \\ 0.05 & 0.95 \end{bmatrix}\]

Based on the given information so far, a probability model is constructed. This is the Hidden Markov Model (HMM) for the given problem.

Figure 1: Hidden Markov Model for the cheating Casino problem
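
The model can be written down compactly. A minimal NumPy sketch is given below; the initial and transition probabilities follow the text above and the fair-die row of the emission matrix is uniform, while the loaded-die row is an assumed example (the actual skewed values are given in the table of the original article).

import numpy as np

states = ["F", "L"]                  # hidden states: fair die, loaded die
faces = [1, 2, 3, 4, 5, 6]           # observable die faces

pi = np.array([0.5, 0.5])            # initial probabilities: 50% fair, 50% loaded

A = np.array([[0.90, 0.10],          # transition probabilities: F->F, F->L
              [0.05, 0.95]])         #                            L->F, L->L

B = np.array([[1/6] * 6,                        # fair die: uniform emission probabilities
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # loaded die (assumed): favors a six

assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)  # rows are distributions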

Assumptions

We saw, in the previous article, that Markov models come with assumptions. Similarly, HMMs also have such assumptions.

1. Assumption on probability of hidden states

In the model given here, the probability of a given hidden state depends only on the previous hidden state. This is a typical first order Markov chain assumption.

2. Assumption on Output

The probability of any observation (output) depends only on the hidden state that produced it, and not on any other hidden states or output observations.

Problems and Algorithms

Let’s briefly discuss the different problems and the related algorithms for HMMs. The algorithms will be explained in detail in the future articles.

In the dishonest casino, the gambler rolls the following numbers:

Figure 2: Sample Observations

1. Evaluation

Given the model of the dishonest casino, what is the probability of obtaining the above sequence? This is a typical evaluation problem in HMMs. The forward algorithm is applied for such evaluation problems.
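
As a preview of the forward algorithm, here is a minimal sketch (it reuses the kind of pi, A, B arrays defined in the earlier sketch; the roll sequence is illustrative) that sums the probability of the observations over all possible hidden state paths:

import numpy as np

pi = np.array([0.5, 0.5])                                 # initial state probabilities
A = np.array([[0.90, 0.10], [0.05, 0.95]])                # transition matrix
B = np.array([[1/6] * 6, [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]) # emission matrix (loaded row assumed)

def forward(obs, pi, A, B):
    """Forward algorithm: P(observation sequence | model), summed over all hidden paths."""
    alpha = pi * B[:, obs[0]]              # initialization with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # one transition step, then absorb the next observation
    return alpha.sum()                     # (scaling is usually added to avoid underflow)

rolls = [6, 6, 3, 6, 6, 1]                 # illustrative observation sequence
print(forward([r - 1 for r in rolls], pi, A, B))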

2. Decoding

What is the most likely sequence of dice (hidden states) given the above sequence? Such problems are addressed by Viterbi decoding.

What is the probability that the fourth roll came from the loaded die, given the above sequence? The forward-backward algorithm comes to our rescue.

3. Learning

Learning problems involve parameterization of the model. In learning problems, we attempt to find the various parameters (transition probabilities, emission probabilities) of the HMM, given the observations. The Baum-Welch algorithm helps us find the unknown parameters of an HMM.

Some real-life examples

Here are some real-life examples of HMM applications:

  1. Speech recognition: HMMs are widely used in speech recognition systems to model the variability of speech sounds. In this application, the observable events are the acoustic features of the speech signal, while the hidden states represent the phonemes or words that generate the speech signal.
  2. Handwriting recognition: HMMs can be used to recognize handwritten characters by modeling the temporal variability of the pen strokes. In this application, the observable events are the coordinates of the pen on the writing surface, while the hidden states represent the letters or symbols that generate the handwriting.
  3. Stock price prediction: HMMs can be used to model the behavior of stock prices and predict future price movements. In this application, the observable events are the daily price movements, while the hidden states represent the different market conditions that generate the price movements.
  4. Gene prediction: HMMs can be used to identify genes in DNA sequences. In this application, the observable events are the nucleotides in the DNA sequence, while the hidden states represent the different regions of the genome that generate the sequence.
  5. Natural language processing: HMMs are used in many natural language processing tasks, such as part-of-speech tagging and named entity recognition. In these applications, the observable events are the words in the text, while the hidden states represent the grammatical structures or semantic categories that generate the text.
  6. Image and video analysis: HMMs can be used to analyze images and videos, such as for object recognition and tracking. In this application, the observable events are the pixel values in the image or video, while the hidden states represent the object or motion that generates the pixel values.
  7. Bio-signal analysis: HMMs can be used to analyze physiological signals, such as electroencephalograms (EEGs) and electrocardiograms (ECGs). In this application, the observable events are the signal measurements, while the hidden states represent the physiological states that generate the signal.
  8. Radar signal processing: HMMs can be used to process radar signals and detect targets in noisy environments. In this application, the observable events are the radar measurements, while the hidden states represent the presence or absence of targets.




Markov Chains – Simplified !!

Key focus: Markov chains are probabilistic models that describe a sequence of observations whose occurrences are statistically dependent only on the previous ones. Typical examples of such sequential observations include:

● Time-series data like speech, stock price movements.
● Words in a sentence.
● Base pairs on the rung of a DNA ladder.

States and transitions

Assume that we want to model the behavior of a driver behind the wheel. The possible behaviors are

● accelerate (state 1)
● constant speed (state 2)
● idling – engine running but the vehicle is not moving (state 3)
● brake (state 4)

Let’s refer to each of these behaviors as a state. In the given example, there are N = 4 states; refer to them as Q = {q1,q2,q3,q4}.

We observe the following pattern in the driver’s behavior (Figure 1). That is, the driver operates the vehicle through a certain sequence of states. In the graph shown in Figure 1, the states are represented as nodes and the transitions as edges.

Figure 1: Driver’s behavior – operating the vehicle through a sequence of states

We see that, sometimes, the driver changes the state of the vehicle from one state to another and sometimes stays in the same state (as indicated by the arrows).

We also note that the vehicle either stays in the same state or changes to another state. Therefore, from this model, if we want to predict the future state, all that matters is the current state of the vehicle. The past states have no bearing on the future state except through the current state. Take note of this important assumption for now.

Probabilistic model

We know that we cannot be certain about the driver’s behavior at any given point in time. Therefore, to model this uncertainty, the model is turned into a probabilistic model. A probabilistic model allows us to account for the likelihood of the behaviors or change of states.

An example for a probabilistic model for the given problem is given in Figure 2.

Figure 2: Driver’s behavior – a probabilistic model (transition matrix shown)

In this probabilistic model, we have assigned probability values to the transitions. These probabilities are collectively called transition probabilities. For example, considering the state named “idling”, the probability that the car transitions from this state to the “accelerate” state is 0.8. In probability mathematics this is expressed as a conditional probability conditioned on the previous state.

p(state = “accelerate” | previous state = “idling” ) = 0.8

Usually, the transition probabilities are formulated in the form of a matrix called the transition probability matrix. The transition probability matrix is shown in Figure 2. In a transition matrix, denoted as A, each element aij represents the probability of transitioning from state i to state j. The elements of the transition matrix satisfy the following property:

\[\sum_{j=1}^{N} a_{ij} = 1, \quad i = 1, 2, \ldots, N\]

That is, the sum of transition probabilities leaving any given state is 1.

As we know, in this example, the driver cannot start the car in just any state (for example, it is impossible to start the car in the “constant speed” state). He can only start the car from rest (i.e., the brake state). To model this, we introduce πi – the probability that the Markov chain starts in a given state i. The set of starting probabilities for all the N states is called the initial probability distribution (π = π1, π2, …, πN). In Figure 3, the starting probabilities are denoted by green arrows.

Figure 3: Markov Chain model for driver’s behavior
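
A small simulation sketch of this chain is shown below (NumPy). Only the idling-to-accelerate probability of 0.8 and the “start from the brake state” constraint come from the text; the remaining transition values are assumptions chosen so that each row sums to 1.

import numpy as np

states = ["accelerate", "constant speed", "idling", "brake"]

# a[i][j] = P(next state j | current state i); only the idling->accelerate entry (0.8)
# is from the text, the other values are assumed for illustration.
A = np.array([[0.2, 0.5, 0.0, 0.3],
              [0.3, 0.5, 0.0, 0.2],
              [0.8, 0.0, 0.1, 0.1],
              [0.1, 0.0, 0.6, 0.3]])
pi = np.array([0.0, 0.0, 0.0, 1.0])        # the car always starts in the brake state

assert np.allclose(A.sum(axis=1), 1)       # transition probabilities leaving a state sum to 1

rng = np.random.default_rng(42)
s = rng.choice(4, p=pi)
trajectory = [states[s]]
for _ in range(10):                        # simulate 10 transitions of the chain
    s = rng.choice(4, p=A[s])
    trajectory.append(states[s])
print(" -> ".join(trajectory))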

Markov Assumption

As noted in the definition, the Markov chain in this example assumes that the occurrence of each event/observation is statistically dependent only on the previous one. This is a first order Markov chain (also termed a bigram language model in natural language processing applications). For the states Q = {q1, …, qn}, predicting the probability of a future state depends only on the current observation; all other previous observations do not matter. In probabilistic terms, this first order Markov chain assumption is denoted as

\[P(q_n | q_1, q_2, \ldots, q_{n-1}) = P(q_n | q_{n-1})\]

Extending the assumption to an mth order Markov chain, predicting the probability of a future observation depends only on the previous m observations:

\[P(q_n | q_1, \ldots, q_{n-1}) = P(q_n | q_{n-m}, \ldots, q_{n-1})\]

This corresponds to an (m+1)-gram language model.

Given a set of n arbitrary random variables/observations Q = {q1, …, qn}, their joint probability distribution is usually computed by applying the following chain rule:

\[P(q_1, q_2, \ldots, q_n) = \prod_{i=1}^{n} P(q_i | q_1, \ldots, q_{i-1})\]

However, if the random observations in Q are sequential in nature and follow the generic mth order Markov chain model, then the computation of the joint probability simplifies to

\[P(q_1, q_2, \ldots, q_n) = \prod_{i=1}^{n} P(q_i | q_{i-m}, \ldots, q_{i-1})\]

The Markov assumptions for first and second order Markov models are summarized in Figure 4.

Figure 4: Assumptions for 1st order and 2nd order Markov chains

Hidden Markov Model (HMM)

Markov chains are useful in computing the probability of events that are observable. However, in many real world applications, the events that we are interested in are usually hidden, that is we don’t observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language we only observe the words and characters directly. The parts-of-speech from the sentence are hidden, they have to be inferred. This brings us to the next topic of discussion – the hidden Markov models.




Shannon limit on power efficiency – demystified

The Shannon power efficiency limit is the limit of a band-limited system irrespective of modulation or coding scheme. It tells us the minimum energy per bit required at the transmitter for reliable communication. It is also called the unconstrained Shannon power efficiency limit. If we select a particular modulation scheme or an encoding scheme, we calculate the constrained Shannon limit for that scheme.

Before proceeding, I urge you to go through the fundamentals of Shannon Capacity theorem in this article.


Channel capacity and power efficiency

One of the objectives of a communication system design is to reliably send information at the lowest possible power level. The system should be able to provide acceptable bit-error-rate (BER) performance at the lowest possible power level. Often, this performance is charted in terms of BER vs. \(E_b/N_0\). The quantity \(E_b/N_0\) is called power efficiency. Power efficiency is defined as the ratio of signal energy per bit (\(E_b\)) to noise power spectral density (\(N_0\)) required at the receiver input to achieve a certain BER.

From equations (1) and (2) shown in this post, the condition for reliable transmission through a channel is given by

\[R \leq B \log_2 \left( 1 + \frac{E_b}{N_0}\frac{R}{B}\right)\]

Re-writing in terms of spectral efficiency \(\eta = R/B\), the Shannon limit on power efficiency for reliable communication is given by

\[\frac{E_b}{N_0} \geq \frac{2^{\eta}-1}{\eta} \quad \quad (3)\]

With this equation, we can calculate the minimum \(E_b/N_0\) required to achieve a certain spectral efficiency. As an example, let’s simulate and plot the relationship between \(E_b/N_0\) and spectral efficiency \(\eta\), as given in equation (3).

k=0.1:0.001:15; EbN0=(2.^k-1)./k; %spectral efficiency (eta) and Shannon limit on Eb/N0
semilogy(10*log10(EbN0),k);
xlabel('E_b/N_o (dB)');ylabel('Spectral Efficiency (\eta)');
title('Channel Capacity & Power efficiency limit')
hold on;grid on; xlim([-2 20]);ylim([0.1 10]);
yL = get(gca,'YLim');
line([-1.59 -1.59],yL,'Color','r','LineStyle','--');

The ultimate Shannon limit

From the plot in Fig. 1, we notice that the Shannon limit on \(E_b/N_0\) is a monotonic function of \(\eta\). For example, when \(\eta = 2\), the Shannon limit on \(E_b/N_0\) is 1.76 dB. If \(\eta = 1\), the limit is at 0 dB. When \(\eta \rightarrow 0\), the Shannon limit on \(E_b/N_0\) approaches -1.59 dB. This value is called the ultimate Shannon limit or, specifically, the absolute Shannon power efficiency limit. This limit tells us the minimum energy per bit required at the transmitter for reliable communication. It is one among the important measures in designing a coding scheme.

Figure 1: Shannon Power Efficiency Limit

The ultimate Shannon limit can be derived using L’Hospital’s rule as follows. The asymptotic value that we are seeking is the value of \(E_b/N_0\) as the spectral efficiency \(\eta\) approaches zero:

\[\lim_{\eta \rightarrow 0} \frac{E_b}{N_0} = \lim_{\eta \rightarrow 0} \frac{2^{\eta}-1}{\eta}\]

Let \(f(\eta) = 2^{\eta}-1\) and \(g(\eta) = \eta\). As \(\eta \rightarrow 0\), both \(f(\eta) \rightarrow 0\) and \(g(\eta) \rightarrow 0\), so the argument of the limit becomes indeterminate (\(0/0\)) and L’Hospital’s rule can be applied. According to L’Hospital’s rule, if \(\lim f(\eta)\) and \(\lim g(\eta)\) are both zero or both \(\pm \infty\), then

\[\lim_{\eta \rightarrow a} \frac{f(\eta)}{g(\eta)} = \lim_{\eta \rightarrow a} \frac{f'(\eta)}{g'(\eta)}\]

for any value of \(a\).

Thus, the next step boils down to finding the first derivatives of \(f(\eta)\) and \(g(\eta)\). Expressing \(f(\eta)\) in terms of the natural logarithm,

\[f(\eta) = 2^{\eta}-1 = e^{\eta \ln 2} - 1\]

Let \(u = \eta \ln 2\) and \(f = e^{u}-1\); then by the chain rule of differentiation,

\[f'(\eta) = \frac{df}{du}\frac{du}{d\eta} = e^{u}\ln 2 = 2^{\eta}\ln 2 \quad \quad (8)\]

Since \(g(\eta) = \eta\), the first derivative of \(g(\eta)\) is

\[g'(\eta) = 1 \quad \quad (9)\]

Using equations (8) and (9), and applying L’Hospital’s rule, the Shannon limit on \(E_b/N_0\) as \(\eta \rightarrow 0\) is given by

\[\lim_{\eta \rightarrow 0} \frac{E_b}{N_0} = \lim_{\eta \rightarrow 0} \frac{2^{\eta}\ln 2}{1} = \ln 2 \approx 0.693 \equiv -1.59 \; dB\]
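
A quick numerical check of this limit (a small Python sketch, independent of the Matlab snippet above) evaluates equation (3) for spectral efficiencies approaching zero:

import numpy as np

eta = np.array([2.0, 1.0, 0.1, 0.001])      # spectral efficiencies, tending towards zero
ebn0 = (2.0 ** eta - 1.0) / eta             # Shannon limit on Eb/N0, linear scale
print(10 * np.log10(ebn0))                  # ~ [1.76, 0.0, -1.44, -1.59] dB
print(10 * np.log10(np.log(2)))             # ultimate limit: 10*log10(ln 2) = -1.59 dB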

Unconstrained and constrained Shannon limit

The absolute Shannon power efficiency limit is the limit of a band-limited system irrespective of modulation or coding scheme. This is also called unconstrained Shannon power efficiency Limit. If we select a particular modulation scheme or an encoding scheme, we calculate the constrained Shannon limit for that scheme.

The Shannon power efficiency limit does not depend on the error probability. The Shannon limit tells us the minimum possible \(E_b/N_0\) required for achieving an arbitrarily small probability of error as \(M \rightarrow \infty\), where \(M\) is the number of signaling levels for the modulation technique (for BPSK \(M=2\), for QPSK \(M=4\) and so on). It gives the minimum possible \(E_b/N_0\) that satisfies the Shannon theorem. In other words, it gives the minimum possible \(E_b/N_0\) required to achieve maximum transmission capacity (\(R \leq C\), where \(R\) is the rate of transmission and \(C\) is the channel capacity). It will not specify the error probability at that limit, nor will it give any direction on the coding technique that can be used to achieve that limit. As the capacity is approached, the system complexity will increase drastically. So the aim of any system design is to approach that limit. For example, the error probability performance of Turbo codes is very close to the Shannon limit [1].

As an example, let’s evaluate the performance of a 2-PAM (Pulse Amplitude Modulation) system and determine the maximum possible coding gain that can be achieved by the most advanced coding scheme. The methodology for simulating the performance of a 2-PAM system is described in chapters 5 and 6. Using this methodology, the performance of a 2-PAM system is simulated and plotted in Figure 2. The absolute Shannon power efficiency limits when the spectral efficiency is \(\eta = 2\) and \(\eta \rightarrow 0\) are also referenced on the plot.

The spectral efficiency of an ideal 2-PAM system is \(\eta = 2\) bits/s/Hz. Hence, if the target bit error rate is, say, \(10^{-5}\), the uncoded 2-PAM system requires an \(E_b/N_0\) of about 9.6 dB, and a coding gain of roughly \(9.6 - 1.76 \approx 7.8\) dB can be achieved using powerful codes, if we have to maintain the nominal spectral efficiency at \(\eta = 2\).

If there is no limit on the spectral efficiency, then we can let \(\eta \rightarrow 0\). In this case, the absolute Shannon power efficiency limit is \(-1.59\) dB. Thus, at the same example bit error rate, a coding gain of approximately \(9.6 - (-1.59) \approx 11.2\) dB is possible with powerful codes if we let the spectral efficiency approach zero.


References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding: Turbocodes, IEEE Int. Conf. on Comm., (ICC ’93), 2:1064-1070, May 1993.↗



Block Interleaver Design for RS codes

Introduction:

A Reed Solomon (RS) encoder takes \(k\) user data symbols and converts them into an \(n\) symbol wide codeword by adding \(n-k\) parity symbols. The error-correcting capability of the RS code is \(t = (n-k)/2\). That is, an RS code with \(2t\) parity symbols can correct a burst error of up to \(t\) symbol errors.

Block Interleavers:

Suppose that the dominant error mechanism in a channel is of burst type. A burst of length \(b\) is defined as a string of \(b\) unreliable consecutive symbols. If the expected burst length \(b\) is less than or equal to \(t\) (the number of correctable symbol errors by RS coding), the code can be used as it is. However, if the burst length \(b > t\), the error-correcting code will fail. This is where interleaving comes to our rescue.

Let us assume that \(b > t\) and \(b \leq d \cdot t\), where \(d\) (the interleaving depth) is an integer. The Reed-Solomon code can be used if we can spread the burst error sequence over several code blocks so that each block has no more than \(t\) errors (which can then be corrected). This can be accomplished using block interleaving as follows. Instead of encoding blocks of \(k\) symbols and then sending the encoded symbols consecutively, we can interleave the encoded blocks and transmit the interleaved data. In the case of the (255,235) code, the data bytes output from the Reed-Solomon encoder would appear as shown below, where bytes numbered 0 to 234 are the data bytes and bytes 235 to 254 are the parity check bytes.

Code Structure for Reed Solomon (RS) Codes

Here, the data is written row by row and read back column by column. Consider now the effect of a burst error of length \(b \leq v \cdot d\) (where \(v \leq t\) is the number of correctable errors per block) on the received symbols in the table. Because of the order in which the symbols are sent, a burst of length less than or equal to \(v \cdot d\) will affect at most \(v+1\) consecutive columns of the table, depending on where the burst starts. Notice that any single row (which corresponds to a codeword) has no more than \(v\) errors in it. If \(v \leq t\), these errors are within the error correction capability of the code and can be corrected. In this case, \(d\) becomes the interleaving depth. The trade-off is that extra buffer space is required to store the interleaver table and additional delay is introduced. The worst case burst length determines the size of the table (and the interleaving depth), and the table size determines the amount of buffer space required and the delay.

Design Example:

Consider a (255,235) Reed Solomon coding system. This code can correct up to \(t = 10\) symbol errors. Let’s assume that the channel that we are going to use is expected to cause burst errors of up to \(b = 20\) symbols. Then the interleaver depth is calculated as

\[d = \left\lceil \frac{b}{t} \right\rceil + 1 = \left\lceil \frac{20}{10} \right\rceil + 1 = 3\]

In this case, an interleaver depth of 3 is enough to combat the burst errors introduced by the channel. The block interleaver dimensions would be 3 rows by 255 columns.

Matlab Code:

A sample Matlab code that simulates the above mentioned block interleaver design is given below. The input data is a repetitive stream of the following symbols – “THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_“. A (255,235) Reed Solomon code (with correction capability of 10 symbols) is assumed. We assume that the channel is expected to produce a maximum of \(b = 20\) consecutive symbols of burst error. The burst errors are denoted by ‘*‘.

%Demonstration of design of Block Interleaver for Reed Solomon Code
%Author : Mathuranathan for https://www.gaussianwaves.com
%License - Creative Commons - cc-by-nc-sa 3.0
clc;
clear;
%____________________________________________
%Input Parameters
%____________________________________________
%Interleaver Design for Reed-Solomon Codes
%RS code parameters
n=255;  %RS codeword length
k=235;  %Number of data symbols
b=20;  %Number of symbols that is expected to be corrupted by the channel
%____________________________________________

p=n-k;  %Number of parity symbols
t=p/2; %Error correction capability of RS code

fprintf('Given (%d,%d) Reed Solomon code can correct : %d symbols \n',n,k,fix(t));
fprintf('Given - expected burst error length from the channel : %d symbols \n',b);
disp('____________________________________________________________________________');
if(b>t)
    fprintf('Interleaving  MAY help in this scenario\n');
else
    fprintf('Interleaving will NOT help in this scenario\n');
end
disp('____________________________________________________________________________');
D=ceil(b/t)+1; %Interleaver depth

memory = zeros(D,n); %constructing block interleaver memory

data='THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_'; % A constant pattern used as a data

%If n>length(data) then repeat the pattern to construct a data of length 'n'
data=char([repmat(data,[1,fix(n/length(data))]),data(1:mod(n,length(data)))]);

%We send D blocks of similar data
intlvrInput=repmat(data(1:n),[1 D]);

fprintf('Input Data to the Interleaver -> \n');
disp(char(intlvrInput));
disp('____________________________________________________________________________');

%INTERLEAVER
%Writing into the interleaver row-by-row
for index=1:D
    memory(index,1:end)=intlvrInput((index-1)*n+1:index*n);
end
intlvrOutput=zeros(1,D*n);
%Reading from the interleaver column-by-column
for index=1:n
    intlvrOutput((index-1)*D+1:index*D)=memory(:,index);
end

%Create b symbols error at 25th Symbol location for test in the interleaved output
%'*' means error in this case
intlvrOutput(1,25:24+b)=zeros(1,b)+42;
fprintf('\nInterleaver Output after being corrupted by %d symbol burst error - marked by ''*''->\n',b);
disp(char(intlvrOutput));
disp('____________________________________________________________________________');

%Deinteleaver
deintlvrOutput=zeros(1,D*n);
%Writing into the deinterleaver column-by-column
for index=1:n
    memory(:,index)=intlvrOutput((index-1)*D+1:index*D)';
end

%Reading from the deinterleaver row-by-row
for index=1:D
    deintlvrOutput((index-1)*n+1:index*n)=memory(index,1:end);
end
fprintf('Deinterleaver Output->\n');
disp(char(deintlvrOutput));
disp('____________________________________________________________________________');

Simulation Result:

Given : (255,235) Reed Solomon code can correct : 10 symbols
Given : expected burst error length from the channel : 20 symbols


Interleaving MAY help in this scenario


Input Data to the Interleaver ->
THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_T
HE_LAZY_DOG_
THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_
JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUIC
K_BROWN_FOX_JUMPS_OVER_THE_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_
QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_
LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JU
MPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_THE_QUICK_BROWN_FOX
JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUIC
K_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY

DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_O
VER_THE_
____________________________________________________________________________

Interleaver Output after being corrupted by 20 symbols burst error – marked by ‘*‘->
TTTHHHEEE___QQQUUUIIICCC********************N___FFFOOOXXX___JJJUUUMMMPPPSSS___OOOVV
VEEERRR___TTTHHHEEE___LLLAAAZZZYYY___DDDOOOGGG___TTTHHHEEE___QQQUUUIIICCCKKK
___BBBRRROOOWWWNNN___FFFOOOXXX___JJJUUUMMMPPPSSS___OOOVVVEEERRR___TTTHHH
EEE___LLLAAAZZZYYY___DDDOOOGGG___TTTHHHEEE___QQQUUUIIICCCKKK___BBBRRROOOWWWN
NN___FFFOOOXXX___JJJUUUMMMPPPSSS___OOOVVVEEERRR___TTTHHHEEE___LLLAAAZZZYYY___
DDDOOOGGG___TTTHHHEEE___QQQUUUIIICCCKKK___BBBRRROOOWWWNNN___FFFOOOXXX___JJJ
UUUMMMPPPSSS___OOOVVVEEERRR___TTTHHHEEE___LLLAAAZZZYYY___DDDOOOGGG___TTTHHHEE
E___QQQUUUIIICCCKKK___BBBRRROOOWWWNNN___FFFOOOXXX___JJJUUUMMMPPPSSS___OOOV
VVEEERRR___TTTHHHEEE___LLLAAAZZZYYY___DDDOOOGGG___TTTHHHEEE___QQQUUUIIICCCKKK___
BBBRRROOOWWWNNN___FFFOOOXXX___JJJUUUMMMPPPSSS___OOOVVVEEERRR___TTTHHHEEE__
_
____________________________________________________________________________

Deinterleaver Output->
THE_QUIC*******_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_
LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUM
PS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BR
OWN_FOX_JUMPS_OVER_THE_THE_QUIC*******_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BRO
WN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_TH
E_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_
LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_THE_QUIC******N_FOX_JUMPS_OVER_THE_L
AZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS
_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN
_FOX_JUMPS_OVER_THE_LAZY_DOG_THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_
_______________
_____________________________________________________________

As we can see from the above simulation, even though the channel introduces 20 symbols of consecutive burst error (which is beyond the correction capability of the RS decoder), the interleaver/deinterleaver operation has effectively distributed the errors and reduced the maximum burst length to 7 symbols (which is easily corrected by the (255,235) Reed Solomon code).

See also:

[1] Introduction to Interleavers and deinterleavers
[2] Random Interleavers

Additional Resources:

[1] Notes on theory and construction of Reed Solomon Codes – Bernard Sklar
[2] Concatenation and Advanced Codes – Applications of interleavers- Stanford University

Viterbi Decoding of Convolutional codes

The Viterbi algorithm is used to decode convolutional codes. The decoding can be done using two approaches. One approach is called hard decision decoding, which uses Hamming distance as a metric to perform the decoding operation, whereas soft decision decoding uses Euclidean distance as a metric.

As stated in one of the previous posts the soft decision decoding improves the performance of the code to a significant extent compared to the hard decision decoding.

Viterbi Algorithm (VA) decoding involves three stages, namely,

1) Branch Metric Calculation
2) Path Metric Calculation
3) Trace back operation

Branch Metric Calculation :

The pair of received bits (pairs for \(n=2\); for \(n=3\) we call them triplets, etc.) is compared with the corresponding branches in the trellis and the distance metrics are calculated. For hard decision decoding, Hamming distances are calculated. For example, if the received pair of bits is ’11’, the Hamming distances to the branch outputs {’00’,’01’,’10’,’11’} of the trellis are 2, 1, 1 and 0 respectively.

For soft decision decoding, see previous article

Path Metric Calculation:

Path metrics are calculated using a procedure called ACS (Add-Compare-Select). This procedure is repeated for every encoder state.

1. Add – for a given state, we know two states on the previous step which can move to this state, and the output bit pairs that correspond to these transitions. To calculate new path metrics, we add the previous path metrics with the corresponding branch metrics.

2. Compare, select – we now have two paths ending in a given state. The one with the greater metric is dropped.

Trace back Operation :

Trace back operation is needed in hardware that generally has memory limitations, especially when the transmitted message is long compared to the available memory. It is also required to maintain a constant throughput at the output of the decoder.

Using soft decision decoding is recommended for Viterbi decoders, since it can give a gain of about 2 dB (that is, a system with a soft decision decoder can use 2 dB less transmitting power than a system with a hard decision decoder with the same error probability).
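
To tie the three stages together, here is a minimal hard-decision Viterbi sketch in Python (an illustration, not a production decoder) for the \(g_0=[1,1,1]\), \(g_1=[1,0,1]\) code used earlier in this article. For simplicity it stores complete survivor paths instead of performing a windowed trace back.

def viterbi_hard(received):
    """Hard-decision Viterbi decoding of the rate-1/2 code with g0=[1,1,1], g1=[1,0,1].
    received: list of (c0, c1) bit pairs; returns the most likely input bit sequence."""
    INF = float("inf")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]          # state = (x[n-1], x[n-2])
    metric = {s: (0 if s == (0, 0) else INF) for s in states}
    survivor = {s: [] for s in states}
    for r0, r1 in received:
        new_metric = {s: INF for s in states}
        new_survivor = {}
        for (x1, x2), pm in metric.items():
            if pm == INF:
                continue
            for bit in (0, 1):
                c0, c1 = (bit + x1 + x2) % 2, (bit + x2) % 2
                bm = (c0 != r0) + (c1 != r1)            # branch metric: Hamming distance
                nxt = (bit, x1)
                if pm + bm < new_metric[nxt]:           # add-compare-select
                    new_metric[nxt] = pm + bm
                    new_survivor[nxt] = survivor[(x1, x2)] + [bit]
        metric, survivor = new_metric, new_survivor
    best = min(metric, key=metric.get)                  # pick the state with the smallest metric
    return survivor[best]                               # read off its survivor path

rx = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 1), (1, 1)]   # encoder output from the earlier example
print(viterbi_hard(rx))                                 # -> [1, 0, 1, 1, 0, 0]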

Two Level Coding System:

Convolutional codes with Viterbi decoding are not good at burst error correction, but they are good at random error correction. On the contrary, Reed Solomon coding is good at burst error correction and not so good at random error correction. So the advantages of these two codes are exploited in many systems to provide good error correction capability against both random and burst errors.

In such systems, data are encoded firstly with a Reed-Solomon code, then they are processed by an interleaver (which places symbols from the same Reed-Solomon codeword far from each other), and then encoded with a convolutional code.

At the receiver, data are firstly processed by a Viterbi decoder. The bursts of errors from its output are then deinterleaved (with erroneous symbols from one burst getting into different Reed-Solomon codewords) and decoded with a Reed-Solomon decoder.

Two Level Coding System

Additional Resources:

[1] A graphical illustration of “How Viterbi Decoding works”


Maximum Likelihood estimation

Keywords: maximum likelihood estimation, statistical method, probability distribution, MLE, models, practical applications, finance, economics, natural sciences.

Introduction

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution by finding the set of values that maximize the likelihood function of the observed data. In other words, MLE is a method of finding the most likely values of the unknown parameters that would have generated the observed data.

The likelihood function is a function that describes the probability of observing the data given the parameters of the probability distribution. The MLE method seeks to find the set of parameter values that maximizes this likelihood function.

For example, suppose we have a set of data that we believe to be normally distributed, but we do not know the mean or variance of the distribution. We can use MLE to estimate these parameters by finding the mean and variance that maximize the likelihood function of the observed data.

The MLE method is widely used in statistical inference, hypothesis testing, and model fitting in many areas, including economics, finance, engineering, and the natural sciences. MLE is a powerful and flexible method that can be applied to a wide range of statistical models, making it a valuable tool in data analysis and modeling.

Difference between MLE and MLD

Maximum likelihood estimation (MLE) and maximum likelihood decoding (MLD) are two different concepts used in different contexts.

Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on a set of observed data. The goal is to find the set of parameter values that maximize the likelihood function of the observed data. MLE is commonly used in statistical inference, hypothesis testing, and model fitting.

On the other hand, maximum likelihood decoding (MLD) is a method used in digital communications and signal processing to decode a received signal that has been transmitted through a noisy channel. The goal is to find the transmitted message that is most likely to have produced the received signal, based on a given probabilistic model of the channel.

In maximum likelihood decoding, the receiver calculates the likelihood of each possible transmitted message, given the received signal and the channel model. The maximum likelihood decoder then selects the transmitted message that has the highest likelihood as the decoded message.

While both MLE and MLD involve the concept of maximum likelihood, they are used in different contexts. MLE is used in statistical estimation, while MLD is used in digital communications and signal processing for decoding.

MLE applied to communication systems

Maximum Likelihood estimation (MLE) is an important tool in determining the actual probabilities of the assumed model of communication.

In reality, a communication channel can be quite complex, and a model becomes necessary to simplify calculations at the decoder side. The model should closely approximate the complex communication channel. There exist a myriad of standard statistical models that can be employed for this task: Gaussian, Binomial, Exponential, Geometric, Poisson, etc. A standard channel model is chosen based on empirical data.

Each model mentioned above has unique parameters that characterizes them. Determination of these parameters for the chosen model is necessary to make them closely model the communication channel at hand.

If a binomial model is chosen (based on observation of the data) for the error events over a particular channel, it is essential to determine the probability of success (\(p\)) of the binomial model.

If a Gaussian model (normal distribution!!!) is chosen for a particular channel then estimating mean (\(\mu\)) and variance (\(\sigma^{2}\)) are necessary so that they can be applied while computing the conditional probability of p(y received | x sent)

Similarly estimating the mean number of events within a given interval of time or space (\(\lambda\)) is a necessity for a Poisson distribution model.

Maximum likelihood estimation is a method to determine these unknown parameters associated with the corresponding chosen models of the communication channel.
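
For the Gaussian model the MLE has a simple closed form: the estimates of the mean and variance are the sample mean and the (biased, divide-by-N) sample variance. A short NumPy sketch with assumed illustrative parameter values:

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=10000)   # assumed observations: mu = 1.5, sigma = 2.0

mu_hat = y.mean()                                # MLE of the mean
var_hat = ((y - mu_hat) ** 2).mean()             # MLE of the variance (divides by N, not N-1)

print(mu_hat, np.sqrt(var_hat))                  # close to the true values 1.5 and 2.0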

Python code example for MLE

The following program is an implementation of maximum likelihood estimation (MLE) for the binary symmetric channel (BSC) using the binomial probability mass function (PMF).

The goal of MLE is to estimate the value of an unknown parameter (in this case, the error probability \(p\)) based on observed data. The BSC is a simple channel model where each transmitted bit is flipped (with probability \(p\)) independently of other bits during transmission. The goal of the following program is to estimate the error probability \(p\) of the BSC based on a given binary data sequence.

import numpy as np
from scipy.optimize import minimize
from scipy.special import binom
import matplotlib.pyplot as plt

def BSC_MLE(data):
    """
    Maximum likelihood estimation (MLE) for the Binary Symmetric Channel (BSC).
    This function estimates the error probability p of the BSC based on the observed data.
    """
    
    # Define the negative log-likelihood based on the binomial PMF
    def binom_PMF(p):
        n = len(data)
        k = np.sum(data)
        p = np.clip(p, 1e-10, 1 - 1e-10)  # Clip p away from 0 and 1 to avoid log(0)
        logprob = np.log(binom(n, k)) + k*np.log(p) + (n-k)*np.log(1-p)
        return -logprob

    # Minimize the negative log-likelihood, i.e., maximize the binomial PMF
    # x0 specifies the initial guess for p; for a BSC, x0 = 0.5 is a natural starting point
    # BFGS is the Broyden-Fletcher-Goldfarb-Shanno algorithm for unconstrained nonlinear optimization
    res = minimize(lambda p: binom_PMF(p), x0=0.5, method='BFGS')
    p_est = res.x[0]

    # Plot the observed data as a histogram
    plt.hist(data, bins=2, density=True, alpha=0.5)
    plt.axvline(p_est, color='r', linestyle='--')
    plt.xlabel('Bit value')
    plt.ylabel('Frequency')
    plt.title('Observed data')
    plt.show()
    
    return p_est

# Generate a random binary sequence of length 100 (1 = error event, 0 = no error)
data = np.random.randint(2, size=100)
p_est = BSC_MLE(data)
print('Estimated error probability: {:.4f}'.format(p_est))

The program first defines a function called BSC_MLE that takes a binary data sequence as input and returns the estimated error probability p_est. Inside it, the binom_PMF helper evaluates the negative log of the binomial PMF, which represents the probability of observing a certain number of errors (i.e., bit flips) in the data sequence for a given error probability p. The likelihood is then maximized – by minimizing its negative logarithm with the minimize function from the scipy.optimize module – to find the value of p that makes the observed data most likely.

The program then generates a random binary data sequence of length 100 using the np.random.randint() function and calls the BSC_MLE function to estimate the error probability based on the observed data. Finally, the program prints the estimated error probability. Try increasing the sequence length to 1000 and observe the estimated error probability.
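
As a quick sanity check (a small addition, assuming the data array and the BSC_MLE function defined above are in scope), the numerically obtained estimate should agree with the closed-form binomial MLE \(\hat{p} = k/n\), which for binary data is simply the sample mean:

# Closed-form MLE for the BSC error probability: the fraction of 1s in the observed data
p_closed_form = np.mean(data)
print('Closed-form estimate: {:.4f}'.format(p_closed_form))
print('Numerical estimate  : {:.4f}'.format(p_est))  # should match to within optimizer tolerance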

Figure 1: Maximum Likelihood Estimation (MLE) : Plotting the observed data as a histogram

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell


Maximum Likelihood Decoding

Keywords: maximum likelihood decoding, digital communication, data storage, noise, interference, wireless communication systems, optical communication systems, digital storage systems, probability, likelihood estimation, python

Introduction

Maximum likelihood decoding is a technique used to determine the most likely transmitted message in a digital communication system, based on the received signal and statistical models of noise and interference. The method uses maximum likelihood estimation to calculate the probability of each possible transmitted message and then selects the one with the highest probability.

To perform maximum likelihood decoding, the receiver uses a set of pre-defined models to estimate the likelihood of each possible transmitted message based on the received signal. The method is commonly used in various digital communication and data storage systems, such as wireless communication and digital storage. However, it can be complex and time-consuming, particularly in systems with large message spaces or complex noise and interference models.

Maximum Likelihood Decoding:

Consider the set of possible codewords (valid codewords – set \(Y\)) generated by an encoder at the transmitter side. We pick one codeword out of this set (call it \(y\)) and transmit it via a Binary Symmetric Channel (BSC) with probability of error \(p\) (a BSC flips each transmitted bit independently with probability \(p\)). At the receiver side, we receive a distorted version of \(y\) (call this erroneous codeword \(x\)).

Maximum Likelihood Decoding chooses one codeword from \(Y\) (the list of all possible codewords) which maximizes the following probability:

\[\mathbb{P}(x\;received \mid y\;sent )\]

Meaning that the receiver computes \(P(x \mid y_1), P(x \mid y_2), P(x \mid y_3), \cdots, P(x \mid y_n)\) and chooses the codeword (\(y\)) that gives the maximum probability. In practice, we don’t know \(y\) at the receiver, but we do know \(x\). So how do we compute the probability? Maximum Likelihood Estimation (MLE) comes to our rescue. For a detailed explanation on MLE – refer here[1]. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely. Understanding the difference between prediction and estimation is important at this point. Estimation differs from prediction in the following way: in estimation problems, the likelihood of the parameters is evaluated based on a given data/observation vector; in prediction problems, probability is used as a measure to predict the outcome from known parameters of a model.

Examples of “Prediction” and “Estimation”:

1) The probability of getting a “head” in a single toss of a fair coin is \(0.5\). The coin is tossed 100 times in a row. Prediction helps in predicting the outcome (head or tail) of the \(101^{st}\) toss based on this probability.

2) A coin is tossed 100 times and the data (head or tail information) is recorded. Assuming the event follows a binomial distribution model, estimation helps in determining the probability of the event; the actual probability may or may not be \(0.5\). Maximum Likelihood Estimation estimates the conditional probability based on the observed data (the received data – \(x\)) and an assumed model.

Example of Maximum Likelihood Decoding:

Let \(y=11001001\) and \(x=10011001\). Assume a binomial distribution model for the bit-error events with probability of error \(p = 0.1\) (i.e., the reliability of the BSC is \(1-p = 0.9\)). The Hamming distance between the two codewords is \(2\). For the binomial model,

\[\mathbb{P}(x\;received\mid y\;sent ) = (1-p)^{n-d} \cdot p^{d}\]

where \(d\) = Hamming distance between the received and the sent codewords,
\(n\) = number of bits sent,
\(p\) = error probability of the BSC,
\(1-p\) = reliability of the BSC.

Substituting \(d=2\), \(n=8\) and \(p=0.1\), we get \(P(x\;received \mid y\;sent) = 0.9^{6} \times 0.1^{2} \approx 0.005314\).
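
This value can be checked with a couple of lines of Python (a minimal sketch, using nothing beyond the formula above):

# Likelihood of receiving x given that y was sent over a BSC, expressed via the Hamming distance d
p, n, d = 0.1, 8, 2
likelihood = (1 - p)**(n - d) * p**d
print(likelihood)  # 0.00531441, i.e., approximately 0.005314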

Note: Here, the Hamming distance is used to compute the probability. For \(p < 0.5\), maximizing this likelihood is equivalent to minimizing the Hamming distance, so the decoding can also be called “minimum distance decoding”. Euclidean distance may also be used to compute the conditional probability.

As mentioned earlier, in practice \(y\) is not known at the receiver. Let’s see how to compute \(P(x\;received \mid y\;sent)\) for the candidate codewords, based on the binomial model, when \(y\) is unknown.

Since the receiver is unaware of the particular \(y\) corresponding to the received \(x\), it computes \(P(x\;received \mid y\;sent)\) for each codeword in \(Y\). The \(y\) which gives the maximum probability is concluded to be the codeword that was sent.
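
As an illustration, here is a minimal sketch of this brute-force search under the binomial/BSC model above (the codebook and the received word are hypothetical, chosen only for demonstration). Maximizing the likelihood is the same as picking the codeword at minimum Hamming distance:

import numpy as np

def bsc_ml_decode(x, codewords, p=0.1):
    """Return the codeword y that maximizes P(x received | y sent) over a BSC with error probability p."""
    x = np.array(list(x), dtype=int)
    best_y, best_likelihood = None, -1.0
    for cw in codewords:
        y = np.array(list(cw), dtype=int)
        d = np.sum(x != y)                    # Hamming distance between received word and candidate
        n = len(y)
        likelihood = (1 - p)**(n - d) * p**d  # binomial/BSC likelihood
        if likelihood > best_likelihood:
            best_y, best_likelihood = cw, likelihood
    return best_y, best_likelihood

# Hypothetical codebook; the received word is the x from the example above
codewords = ['11001001', '00110110', '00000000', '11111111']
print(bsc_ml_decode('10011001', codewords, p=0.1))  # ('11001001', 0.0053...), the closest codeword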

Python code implementing Maximum Likelihood Decoding:

The following program demonstrates maximum likelihood decoding: it generates a noisy signal from a transmitted message and then uses maximum likelihood decoding to estimate the transmitted message from that noisy signal.

  1. The maximum_likelihood_decoding function takes three arguments: received_signal, noise_variance, and message_space.
  2. The calculate_probabilities function is called to calculate the probability of each possible message given the received signal, using the known noise variance.
  3. The probabilities are normalized so that they sum to 1.
  4. The maximum_likelihood_decoding function finds the index of the most likely message (i.e., the message with the highest probability).
  5. The function returns the most likely message.
  6. An example usage is demonstrated where a binary message space is defined ([0, 1]), along with a noise variance and a transmitted message.
  7. The transmitted message is added to noise to generate a noisy received signal.
  8. The maximum_likelihood_decoding function is called to decode the noisy signal.
  9. The transmitted message, received signal, and decoded message are printed to the console for evaluation.
import numpy as np
import matplotlib.pyplot as plt

# Define a function to calculate the probability of each possible message given the received signal
def calculate_probabilities(received_signal, noise_variance, message_space):
    probabilities = np.zeros(len(message_space))

    for i, message in enumerate(message_space):
        error = received_signal - message
        probabilities[i] = np.exp(-np.sum(error ** 2) / (2 * noise_variance))

    return probabilities / np.sum(probabilities)

# Define a function to perform maximum likelihood decoding
def maximum_likelihood_decoding(received_signal, noise_variance, message_space):
    probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
    most_likely_message_index = np.argmax(probabilities)
    return message_space[most_likely_message_index]

# Example usage
message_space = np.array([0, 1])
noise_variance = 0.4
transmitted_message = 1
received_signal = transmitted_message + np.sqrt(noise_variance) * np.random.randn()
decoded_message = maximum_likelihood_decoding(received_signal, noise_variance, message_space)

print('Transmitted message:', transmitted_message)
print('Received signal:', received_signal)
print('Decoded message:', decoded_message)

# Plot probability distribution
probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
plt.bar(message_space, probabilities)
plt.title('Probability Distribution for Received Signal = {}'.format(received_signal))
plt.xlabel('Transmitted Message')
plt.ylabel('Probability')
plt.ylim([0, 1])
plt.show()

The probability of the received signal given a specific transmitted message is calculated as follows:

  1. Compute the difference between the received signal and the transmitted message.
  2. Compute the sum of squares of this difference vector.
  3. Divide this sum by twice the known noise variance.
  4. Take the negative exponential of this value.

This results in a quantity proportional to the probability density function (PDF) of the received signal given the transmitted message, assuming zero-mean Gaussian noise; the constant factor \(1/\sqrt{2\pi\sigma^{2}}\) is omitted because it cancels in the normalization step that follows.
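
Written out explicitly (with \(x\) denoting the received signal, \(m\) a candidate transmitted message and \(\sigma^{2}\) the noise variance – notation introduced here only for clarity), the Gaussian likelihood assumed by the code is

\[p(x \mid m) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-m)^{2}}{2\sigma^{2}}\right) \;\propto\; \exp\left(-\frac{(x-m)^{2}}{2\sigma^{2}}\right)\]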

The probabilities for each possible transmitted message are then normalized so that they sum to 1. This is done by dividing each individual probability by the sum of all probabilities.

The maximum_likelihood_decoding function determines the most likely transmitted message by selecting the message with the highest probability, which corresponds to the maximum likelihood estimate of the transmitted message given the received signal and the statistical model of the noise.

Sample outputs

Transmitted message: 1
Received signal: 0.21798306949364643
Decoded message: 0

Transmitted message: 1
Received signal: -0.5115453787966966
Decoded message: 0

Transmitted message: 1
Received signal: 0.8343088336355061
Decoded message: 1

Transmitted message: 1
Received signal: -0.5479891887158619
Decoded message: 0

The probability distribution for the last sample output is shown below

Figure: Probability distribution for a sample run of the code

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell


Hard and Soft decision decoding

What are hard and soft decision decoding

Hard decision decoding and soft decision decoding are two different methods used for decoding error-correcting codes.

With hard decision decoding, the received signal is compared against a threshold to decide whether the transmitted bit is a 0 or a 1, and only these binary decisions are passed on to the decoder. It is widely used in digital communication systems because of its simplicity and low complexity.

Soft decision decoding, on the other hand, retains the received signal values (or a finely quantized version of them) and uses them as reliability information when calculating the likelihood of each possible transmitted bit. This approach is used in modern digital communication and data storage systems where higher accuracy and reliability are needed.

While soft decision decoding can achieve better error correction, it is more complex and computationally expensive than hard decision decoding.

More details

Let’s elaborate on the concepts of hard decision and soft decision decoding. Consider the simple even parity encoder given below.

Input Bit 1   Input Bit 2   Parity bit added by encoder   Codeword generated
0             0             0                             000
0             1             1                             011
1             0             1                             101
1             1             0                             110

The set of all possible codewords generated by the encoder is {000, 011, 101, 110}.

Let’s say we want to transmit the message “01” through a communication channel.

Hard decision decoding

Case 1: Assume that our communication model consists of a parity encoder, a communication channel (which attenuates the transmitted signal randomly) and a hard decision decoder.

The message bits “01” are applied to the parity encoder and we get “011” as the output codeword.

Figure 1: Hard decision decoding – a simple illustration

The output codeword “011” is transmitted through the channel. “0” is transmitted as “0 Volt” and “1” as “1 Volt”. The channel attenuates the signal that is being transmitted and the receiver sees a distorted waveform (the “red colored waveform”). The hard decision decoder makes a decision based on a threshold voltage. In our case the threshold voltage is chosen as 0.5 Volt (midway between “0” and “1” Volt). At each sampling instant in the receiver (as shown in the figure above) the hard decision detector declares the bit to be “0” if the voltage level falls below the threshold and “1” if the voltage level is above the threshold. Therefore, the output of the hard decision block is “001”. However, “001” is not a valid codeword (compare it with all the possible codewords given in the table above), which implies that the message bits cannot be recovered directly. The decoder therefore compares the output of the hard decision block with all the possible codewords and computes the Hamming distance for each case (as illustrated in the table below).

All possible codewords   Hard decision output   Hamming distance
000                      001                    1
011                      001                    1
101                      001                    1
110                      001                    3

The decoder’s job is to choose a valid codeword which has the minimum Hamming distance. In our case, the minimum Hamming distance is “1” and there are 3 valid codewords at this distance. The decoder may choose any of the three possibilities, so the probability of picking the correct codeword (“011” – this is what we transmitted) is only 1/3. So when hard decision decoding is employed, the probability of recovering our data (in this particular case) is 1/3. Let’s see what soft decision decoding offers …
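
A few lines of code reproduce this hard decision procedure (a minimal sketch; the sampled voltages 0.2 V, 0.4 V and 0.7 V are taken from the illustration above):

import numpy as np

codewords = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])  # valid even-parity codewords
received_voltages = np.array([0.2, 0.4, 0.7])                       # sampled received waveform

hard_bits = (received_voltages > 0.5).astype(int)          # threshold at 0.5 V -> [0, 0, 1]
hamming_distances = np.sum(codewords != hard_bits, axis=1)  # distance to each valid codeword
print(hard_bits, hamming_distances)                         # [0 0 1] [1 1 1 3] -> three-way tie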

Soft Decision Decoding

The difference between hard and soft decision decoder is as follows

  • In hard decision decoding, the received codeword is compared with all possible codewords and the codeword which gives the minimum Hamming distance is selected.
  • In soft decision decoding, the received signal is compared with all possible codewords and the codeword which gives the minimum Euclidean distance is selected. Thus, soft decision decoding improves the decision-making process by supplying additional reliability information (the calculated Euclidean distance or the calculated log-likelihood ratio).

For the same encoder and channel combination, let’s see the effect of replacing the hard decision block with a soft decision block.

Voltage levels of the received signal at each sampling instant are shown in the figure. The soft decision block calculates the Euclidean distance between the received signal and all possible codewords.

Valid codewords     Voltage levels at each sampling instant   Squared Euclidean distance calculation   Squared Euclidean distance
0 0 0  (0V 0V 0V)   0.2V 0.4V 0.7V                            (0-0.2)² + (0-0.4)² + (0-0.7)²           0.69
0 1 1  (0V 1V 1V)   0.2V 0.4V 0.7V                            (0-0.2)² + (1-0.4)² + (1-0.7)²           0.49
1 0 1  (1V 0V 1V)   0.2V 0.4V 0.7V                            (1-0.2)² + (0-0.4)² + (1-0.7)²           0.89
1 1 0  (1V 1V 0V)   0.2V 0.4V 0.7V                            (1-0.2)² + (1-0.4)² + (0-0.7)²           1.49

The minimum (squared) Euclidean distance is 0.49, corresponding to the codeword “0 1 1” (which is what we transmitted). The decoder selects this codeword as the output. Even though the parity encoder cannot correct errors, the soft decision scheme helped in recovering the data in this case. This illustrates the improvement that can be expected when a soft decision scheme is combined with forward error correcting (FEC) schemes like convolutional codes, LDPC codes, etc.
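
The table above can be reproduced with a short script (a minimal sketch; the codeword voltage levels and the received samples are those of the illustration above):

import numpy as np

codeword_voltages = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # in Volts
received_voltages = np.array([0.2, 0.4, 0.7])

# Squared Euclidean distance between the received samples and every valid codeword
sq_distances = np.sum((codeword_voltages - received_voltages)**2, axis=1)
best = np.argmin(sq_distances)
print(sq_distances)             # [0.69 0.49 0.89 1.49]
print(codeword_voltages[best])  # [0. 1. 1.] -> the transmitted codeword "011" is recovered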

From this illustration we can understand that the soft decision decoder uses all of the information available (the voltage levels, in this case) in the decision-making process, whereas the hard decision decoder does not fully utilize the information in the received signal (it computes the Hamming distance after comparing each sample with a threshold, thereby discarding the actual voltage levels).

Note: This example is only meant to illustrate the concept of soft decision and hard decision decoding. Prudent readers will be quick to notice that the parity code example fails for other voltage levels (e.g., 0.2 V, 0.4 V and 0.6 V, where two codewords end up equidistant from the received samples). This is because a parity encoder cannot correct errors; it can only detect single-bit errors.

Soft decision decoding schemes are often realized using Viterbi decoders. Such decoders utilize the Soft Output Viterbi Algorithm (SOVA), which takes into account the a priori probabilities of the input symbols and produces a soft output indicating the reliability of the decision.


For further reading

[1] I. Dokmanic, R. Parhizkar, J. Ranieri and M. Vetterli, “Euclidean Distance Matrices: Essential theory, algorithms, and applications,” in IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 12-30, Nov. 2015, doi: 10.1109/MSP.2015.2398954.↗
