Hidden Markov Models (HMM) – Simplified !!!

Markov chains are useful for computing the probability of events that are observable. However, in many real-world applications, the events we are interested in are hidden; that is, we do not observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language, we only observe the words and characters directly. The parts of speech of the sentence are hidden and have to be inferred. This brings us to the following topic: hidden Markov models.

Hidden Markov models enable us to model both the observations and the associated hidden events. Let’s consider an example to understand the concept.

The cheating casino and the gullible gambler

Consider a dishonest casino that deceives its players by using two types of dice: a fair die (F) and a loaded die (L). For the fair die, each face has the same probability of landing face up. For the loaded die, the probabilities of the faces are skewed as given next.

When the gambler throws the die, a number lands face up. These numbers are our observations at a given time t (denoted \(O_t\), taking values in {1,2,3,4,5,6}). At any given time t, whether the number was rolled with a fair die (state \(S_t\) = F) or a loaded die (\(S_t\) = L) is unknown to an observer; the die types are therefore the hidden events.

Emission probabilities

The probabilities associated with the observations are the observation likelihoods, also called emission probabilities (B).

Emission probabilities

Initial probabilities

The probability of starting (at time t = 0) with either the fair die or the loaded die (the hidden event) is 50% each.

Transition probabilities

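A gullible gambler switches from the fair die to the loaded die with 10% probability. He switches back from the loaded die to the fair die with 5% probability.

The probabilities of transitioning from one hidden event to another are described by the transition probability matrix (A). The elements of the transition probability matrix are the transition probabilities \(p_{ij}\) of moving from hidden state i to hidden state j.

The transition probabilities from time t-1 to t for the hidden events are

\[P(S_t = L \mid S_{t-1} = F) = 0.10, \quad P(S_t = F \mid S_{t-1} = F) = 0.90\]

\[P(S_t = F \mid S_{t-1} = L) = 0.05, \quad P(S_t = L \mid S_{t-1} = L) = 0.95\]

Therefore, with rows and columns ordered as (F, L), the transition probability matrix is

\[A = \begin{bmatrix} 0.90 & 0.10 \\ 0.05 & 0.95 \end{bmatrix}\]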

Based on the given information so far, a probability model is constructed. This is the Hidden Markov Model (HMM) for the given problem.
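The model parameters can be collected into three small arrays. The following is a minimal sketch in Python, not a definitive implementation: the names `pi`, `A` and `B` are chosen here for illustration, and the loaded-die emission row is an assumed example (a six comes up half the time), since the emission table is not reproduced in this text.

```python
# Dishonest-casino HMM parameters as plain Python lists.
states = ["F", "L"]          # hidden states: fair die, loaded die
faces = [1, 2, 3, 4, 5, 6]   # observable outcomes

# Initial probabilities: 50% fair, 50% loaded (from the text).
pi = [0.5, 0.5]

# Transition matrix: A[i][j] = P(S_t = j | S_{t-1} = i).
# F -> L is 10% and L -> F is 5% (from the text); the diagonal
# entries follow because each row must sum to 1.
A = [
    [0.90, 0.10],   # from F
    [0.05, 0.95],   # from L
]

# Emission matrix: B[i][k] = P(O_t = faces[k] | S_t = i).
# The fair row is uniform; the loaded row is ASSUMED for illustration.
B = [
    [1 / 6] * 6,
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
]

def is_stochastic(rows, tol=1e-9):
    """Return True if every row is a valid probability distribution."""
    return all(abs(sum(r) - 1.0) < tol and min(r) >= 0 for r in rows)

print(is_stochastic(A), is_stochastic(B), is_stochastic([pi]))
```

Checking that every row sums to one is a cheap way to catch typing mistakes before running any HMM algorithm on the model.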

Figure 1: Hidden Markov Model for the cheating Casino problem

Assumptions

We saw in the previous article that Markov models come with assumptions. HMMs come with similar assumptions.

1. Assumption on probability of hidden states

In the model given here, the probability of a given hidden state depends only on the previous hidden state. This is a typical first order Markov chain assumption.

2. Assumption on Output

The probability of any observation (output) depends only on the hidden state that produces it, and not on any other hidden state or observation.
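In symbols, the two assumptions can be written as follows. This is a sketch in the notation used above, with \(S_t\) the hidden state and \(O_t\) the observation at time t; the sequence length T is a symbol introduced here for illustration.

```latex
% 1. First-order Markov assumption on the hidden states
P(S_t \mid S_1, \ldots, S_{t-1}) = P(S_t \mid S_{t-1})

% 2. Output independence assumption
P(O_t \mid S_1, \ldots, S_T, O_1, \ldots, O_T) = P(O_t \mid S_t)
```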

Problems and Algorithms

Let’s briefly discuss the different problems and the related algorithms for HMMs. The algorithms will be explained in detail in future articles.

In the dishonest casino, the gambler rolls the following numbers:

Figure 2: Sample Observations

1. Evaluation

Given the model of the dishonest casino, what is the probability of obtaining the above sequence? This is a typical evaluation problem in HMMs. The forward algorithm is applied to such evaluation problems.
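To make the evaluation problem concrete, here is a minimal sketch of the forward algorithm in Python. It is an illustration under stated assumptions rather than a reference implementation: the loaded-die emission row and the roll sequence are hypothetical, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward(obs, pi, A, B):
    """Return P(obs | model) with the forward algorithm.

    obs is a list of 0-based symbol indices (face 1 -> index 0).
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over the final hidden state
    return sum(alpha)

pi = [0.5, 0.5]                       # from the text
A = [[0.90, 0.10], [0.05, 0.95]]      # from the text
B = [[1 / 6] * 6,                     # fair die
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]  # loaded die (ASSUMED values)

rolls = [1, 6, 6, 3, 6]               # hypothetical observations
obs = [r - 1 for r in rolls]
likelihood = forward(obs, pi, A, B)
print(likelihood)
```

The recursion costs on the order of N²T operations for N hidden states and T observations, instead of summing over all N^T hidden paths.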

2. Decoding

What is the most likely sequence of dice (hidden states) given the above sequence? Such problems are addressed by Viterbi decoding.
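A minimal sketch of Viterbi decoding in Python follows. It works in the log domain to avoid underflow and assumes all probabilities are strictly positive; the loaded-die emission row and the roll sequence are hypothetical values chosen for illustration.

```python
import math

def viterbi(obs, pi, A, B):
    """Return the most likely hidden state path (list of state indices)."""
    n = len(pi)
    # delta[i]: best log-probability of any path ending in state i
    delta = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + math.log(A[i][j]))
            ptr.append(best_i)
            delta.append(prev[best_i] + math.log(A[best_i][j]) + math.log(B[j][o]))
        back.append(ptr)
    # Backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

pi = [0.5, 0.5]                       # from the text
A = [[0.90, 0.10], [0.05, 0.95]]      # from the text
B = [[1 / 6] * 6,                     # fair die
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]  # loaded die (ASSUMED values)

rolls = [6, 6, 6, 6, 6]               # hypothetical observations
path = viterbi([r - 1 for r in rolls], pi, A, B)
print("".join("FL"[s] for s in path))
```

Unlike the forward algorithm, which sums over predecessors, Viterbi keeps only the best predecessor at each step and recovers the path by backtracking.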

What is the probability of the fourth die being loaded, given the above sequence? The forward-backward algorithm answers this question.
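The posterior probability of each hidden state at each time step can be sketched with the forward-backward algorithm as below. This is an unscaled illustration for short sequences only; the loaded-die emission row and the roll sequence are hypothetical.

```python
def posterior(obs, pi, A, B):
    """Return gamma, where gamma[t][i] = P(S_t = i | obs)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    # Forward pass
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    # Backward pass
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    evidence = sum(alpha[T - 1])  # P(obs | model)
    return [[alpha[t][i] * beta[t][i] / evidence for i in range(n)] for t in range(T)]

pi = [0.5, 0.5]                       # from the text
A = [[0.90, 0.10], [0.05, 0.95]]      # from the text
B = [[1 / 6] * 6,                     # fair die
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]  # loaded die (ASSUMED values)

rolls = [3, 6, 6, 6, 1, 6]            # hypothetical observations
gamma = posterior([r - 1 for r in rolls], pi, A, B)
print("P(fourth die is loaded | rolls) =", gamma[3][1])
```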

3. Learning

Learning problems involve parametrization of the model. In learning problems, we attempt to find the various parameters (transition probabilities, emission probabilities) of the HMM, given the observations. The Baum-Welch algorithm helps us find the unknown parameters of an HMM.

Some real-life examples

Here are some real-life examples of HMM applications:

  1. Speech recognition: HMMs are widely used in speech recognition systems to model the variability of speech sounds. In this application, the observable events are the acoustic features of the speech signal, while the hidden states represent the phonemes or words that generate the speech signal.
  2. Handwriting recognition: HMMs can be used to recognize handwritten characters by modeling the temporal variability of the pen strokes. In this application, the observable events are the coordinates of the pen on the writing surface, while the hidden states represent the letters or symbols that generate the handwriting.
  3. Stock price prediction: HMMs can be used to model the behavior of stock prices and predict future price movements. In this application, the observable events are the daily price movements, while the hidden states represent the different market conditions that generate the price movements.
  4. Gene prediction: HMMs can be used to identify genes in DNA sequences. In this application, the observable events are the nucleotides in the DNA sequence, while the hidden states represent the different regions of the genome that generate the sequence.
  5. Natural language processing: HMMs are used in many natural language processing tasks, such as part-of-speech tagging and named entity recognition. In these applications, the observable events are the words in the text, while the hidden states represent the grammatical structures or semantic categories that generate the text.
  6. Image and video analysis: HMMs can be used to analyze images and videos, such as for object recognition and tracking. In this application, the observable events are the pixel values in the image or video, while the hidden states represent the object or motion that generates the pixel values.
  7. Bio-signal analysis: HMMs can be used to analyze physiological signals, such as electroencephalograms (EEGs) and electrocardiograms (ECGs). In this application, the observable events are the signal measurements, while the hidden states represent the physiological states that generate the signal.
  8. Radar signal processing: HMMs can be used to process radar signals and detect targets in noisy environments. In this application, the observable events are the radar measurements, while the hidden states represent the presence or absence of targets.

Books by the author

Wireless Communication Systems in Matlab, Second Edition (PDF)
Digital Modulations using Python (PDF ebook)
Digital Modulations using Matlab (PDF ebook)

Markov Chains – Simplified !!

Key focus: Markov chains are probabilistic models that describe a sequence of observations in which each occurrence is statistically dependent only on the previous ones. Examples of such sequences include:

● Time-series data like speech, stock price movements.
● Words in a sentence.
● Base pairs on the rung of a DNA ladder.

States and transitions

Assume that we want to model the behavior of a driver behind the wheel. The possible behaviors are

● accelerate (state 1)
● constant speed (state 2)
● idling: engine running slowly but the vehicle is not moving (state 3)
● brake (state 4)

Let’s refer to each of these behaviors as a state. In the given example, there are N = 4 states, denoted Q = {q1, q2, q3, q4}.

We observe the following pattern in the driver’s behavior (Figure 1). That is, the driver operates the vehicle through a certain sequence of states. In the graph shown in Figure 1, the states are represented as nodes and the transitions as edges.

Figure 1: Driver’s behavior – operating the vehicle through a sequence of states

We see that, sometimes, the driver changes the state of the vehicle from one state to another and sometimes stays in the same state (as indicated by the arrows).

We also note that the vehicle either stays in the same state or changes to the next state. Therefore, from this model, if we want to predict the future state, all that matters is the current state of the vehicle. The past states have no bearing on the future state except through the current state. Take note of this important assumption for now.

Probabilistic model

We know that we cannot be certain about the driver’s behavior at any given point in time. Therefore, to model this uncertainty, the model is turned into a probabilistic model. A probabilistic model allows us to account for the likelihood of the behaviors or change of states.

An example of a probabilistic model for the given problem is shown in Figure 2.

Figure 2: Driver’s behavior – a probabilistic model (transition matrix shown)

In this probabilistic model, we have assigned probability values to the transitions. These probabilities are collectively called transition probabilities. For example, considering the state named “idling”, the probability that the car transitions from this state to the next state (accelerate) is 0.8. In probability notation, this is expressed as a conditional probability, conditioned on the previous state.

p(state = “accelerate” | previous state = “idling” ) = 0.8

Usually, the transition probabilities are arranged in a matrix called the transition probability matrix, shown in Figure 2. In a transition matrix, denoted A, each element \(a_{ij}\) represents the probability of transitioning from state i to state j. The elements of the transition matrix satisfy the following property:

\[\sum_{j=1}^{N} a_{ij} = 1 \quad \text{for all } i\]

That is, the sum of the transition probabilities leaving any given state is 1.
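The row-sum property is easy to check in code, and the same table can drive a simple simulation of the chain. The sketch below is in Python; only the idling-to-accelerate probability of 0.8 comes from the text, and every other number, as well as the cycle of "next" states, is an assumed value for illustration because the transition matrix of Figure 2 is not reproduced here.

```python
import random

# P[current][next] = transition probability.
# Only P["idling"]["accelerate"] = 0.8 is from the text; the rest is ASSUMED.
P = {
    "accelerate":     {"accelerate": 0.3, "constant speed": 0.7},
    "constant speed": {"constant speed": 0.4, "brake": 0.6},
    "idling":         {"idling": 0.2, "accelerate": 0.8},
    "brake":          {"brake": 0.5, "idling": 0.5},
}

def rows_sum_to_one(P, tol=1e-9):
    """Check that the probabilities leaving each state sum to 1."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in P.values())

def simulate(P, start, steps, rng):
    """Generate a state sequence by sampling one transition at a time."""
    path = [start]
    for _ in range(steps):
        row = P[path[-1]]
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(nxt)
    return path

print(rows_sum_to_one(P))
print(simulate(P, "brake", 10, random.Random(0)))
```

Each sampled step looks only at the current state, which is exactly the assumption noted earlier.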

As we know, in this example, the driver cannot start the car in just any state (for example, it is impossible to start the car in the “constant speed” state). He can only start the car from rest (i.e., the brake state). To model this, we introduce πi, the probability that the Markov chain starts in a given state i. The set of starting probabilities for all N states is called the initial probability distribution (π = π1, π2, …, πN). In Figure 3, the starting probabilities are denoted by green arrows.

Figure 3: Markov Chain model for driver’s behavior

Markov Assumption

As noted in the definition, the Markov chain in this example assumes that the occurrence of each event/observation is statistically dependent only on the previous one. This is a first order Markov chain (also termed a bigram language model in natural language processing applications). For the states Q = {q1, …, qn}, the probability of a future state depends only on the current state; all earlier states do not matter. In probabilistic terms, this first order Markov chain assumption is denoted as

\[P(q_i \mid q_1, \ldots, q_{i-1}) = P(q_i \mid q_{i-1})\]

Extending the assumption to an mth order Markov chain, the probability of a future observation depends only on the previous m observations. This is an (m+1)-gram model, consistent with the first order chain being a bigram model.

\[P(q_i \mid q_1, \ldots, q_{i-1}) = P(q_i \mid q_{i-m}, \ldots, q_{i-1})\]

Given a set of n arbitrary random variables/observations Q = {q1, …, qn}, their joint probability distribution is usually computed by applying the following chain rule.

\[P(q_1, \ldots, q_n) = \prod_{i=1}^{n} P(q_i \mid q_1, \ldots, q_{i-1})\]

However, if the observations in Q are sequential in nature and follow the generic mth order Markov chain model, then the computation of the joint probability simplifies to the following, where conditioning terms with an index below 1 are dropped.

\[P(q_1, \ldots, q_n) = \prod_{i=1}^{n} P(q_i \mid q_{i-m}, \ldots, q_{i-1})\]
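For a first order chain, the simplified joint probability is just the starting probability multiplied by one transition probability per step. A minimal Python sketch follows; apart from the idling-to-accelerate value of 0.8 and the fact that the car starts in the brake state, the numbers are assumed for illustration.

```python
# Initial distribution: the car always starts in the brake state (from the text).
pi = {"brake": 1.0}

# Transition probabilities; only idling -> accelerate = 0.8 is from the text,
# all other values are ASSUMED for illustration.
P = {
    "accelerate":     {"accelerate": 0.3, "constant speed": 0.7},
    "constant speed": {"constant speed": 0.4, "brake": 0.6},
    "idling":         {"idling": 0.2, "accelerate": 0.8},
    "brake":          {"brake": 0.5, "idling": 0.5},
}

def sequence_probability(seq, pi, P):
    """Joint probability of a state sequence under a first order Markov chain."""
    prob = pi.get(seq[0], 0.0)
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[prev].get(cur, 0.0)  # missing entries are impossible moves
    return prob

print(sequence_probability(["brake", "idling", "accelerate"], pi, P))
print(sequence_probability(["brake", "accelerate"], pi, P))
```

The first sequence multiplies the starting probability by two transition probabilities; the second contains a transition that does not exist in the table, so its probability is zero.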

The Markov assumptions for first and second order Markov models are summarized in Figure 4.

Figure 4: Assumptions for 1st order and 2nd order Markov chains

Hidden Markov Model (HMM)

Markov chains are useful for computing the probability of events that are observable. However, in many real-world applications, the events we are interested in are hidden; that is, we do not observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language, we only observe the words and characters directly. The parts of speech of the sentence are hidden and have to be inferred. This brings us to the next topic of discussion: hidden Markov models.


Shannon limit on power efficiency – demystified

The Shannon power efficiency limit is the limit of a band-limited system irrespective of modulation or coding scheme. It tells us the minimum energy per bit required at the transmitter for reliable communication. It is also called the unconstrained Shannon power efficiency limit. If we select a particular modulation scheme or encoding scheme, we calculate the constrained Shannon limit for that scheme.

Before proceeding, I urge you to go through the fundamentals of Shannon Capacity theorem in this article.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Channel capacity and power efficiency

One of the objectives of a communication system design is to reliably send information at the lowest possible power level. The system should be able to provide acceptable bit-error-rate (BER) performance at the lowest possible power level. Often, this performance is charted in terms of BER vs. \(E_b/N_0\). The quantity \(E_b/N_0\) is called power efficiency, denoted as \(\eta_P\). Power efficiency is defined as the ratio of signal energy per bit (\(E_b\)) to noise power spectral density (\(N_0\)) required at the receiver input to achieve a certain BER.

From equations (1) and (2) shown in this post, the condition for reliable transmission through a channel is given by

\[R < B \log_2 \left( 1 + \frac{E_b R}{N_0 B} \right)\]

Re-writing in terms of spectral efficiency \(\eta = R/B\), the Shannon limit on power efficiency for reliable communication is given by

\[\frac{E_b}{N_0} > \frac{2^{\eta} - 1}{\eta} \quad\quad (3)\]

With this equation, we can calculate the minimum \(E_b/N_0\) required to achieve a certain spectral efficiency. As an example, let’s simulate and plot the relationship between \(E_b/N_0\) and spectral efficiency \(\eta\), as given in equation (3).

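k = 0.1:0.001:15;                 % spectral efficiency (eta) values
EbN0 = (2.^k - 1)./k;             % Shannon limit on Eb/N0 (linear scale), equation (3)
semilogy(10*log10(EbN0), k);      % Eb/N0 in dB on x-axis, eta on log y-axis
xlabel('E_b/N_o (dB)'); ylabel('Spectral Efficiency (\eta)');
title('Channel Capacity & Power efficiency limit')
hold on; grid on; xlim([-2 20]); ylim([0.1 10]);
yL = get(gca,'YLim');
line([-1.59 -1.59], yL, 'Color','r','LineStyle','--');  % ultimate Shannon limit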

The ultimate Shannon limit

From the plot in Fig. 1, we notice that the Shannon limit on \(E_b/N_0\) is a monotonic function of \(\eta\). When \(\eta = 2\), the Shannon limit on \(E_b/N_0\) is equal to \(1.76\) dB. If \(\eta = 1\), the limit is at \(E_b/N_0 = 0\) dB. When \(\eta \to 0\), the Shannon limit on \(E_b/N_0\) approaches \(-1.59\) dB. This value is called the ultimate Shannon limit or, more specifically, the absolute Shannon power efficiency limit. This limit tells us the minimum energy per bit required at the transmitter for reliable communication. It is one of the important measures in designing a coding scheme.

Figure 1: Shannon Power Efficiency Limit

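The ultimate Shannon limit can be derived using L’Hospital’s rule as follows. The asymptotic value that we are seeking is the value of \(E_b/N_0\) as the spectral efficiency \(\eta\) approaches \(0\).

\[\lim_{\eta \to 0} \frac{E_b}{N_0} = \lim_{\eta \to 0} \frac{2^{\eta} - 1}{\eta}\]

Let \(f(\eta) = 2^{\eta} - 1\) and \(g(\eta) = \eta\). As \(f(0) = 0\) and \(g(0) = 0\), the argument of the limit becomes indeterminate (\(0/0\)), so L’Hospital’s rule can be applied in this case. According to L’Hospital’s rule, if \(\lim_{x \to k} f(x)\) and \(\lim_{x \to k} g(x)\) are both zero or are both \(\pm\infty\), then \(\lim_{x \to k} f(x)/g(x) = \lim_{x \to k} f'(x)/g'(x)\) for any value of \(k\).

Thus, the next step boils down to finding the first derivatives of \(f(\eta)\) and \(g(\eta)\). Expressing \(2^{\eta}\) in terms of the natural logarithm,

\[f(\eta) = 2^{\eta} - 1 = e^{\eta \ln 2} - 1\]

Let \(u = \eta \ln 2\) and \(y = e^{u}\), then by the chain rule of differentiation,

\[f'(\eta) = \frac{dy}{du} \cdot \frac{du}{d\eta} = e^{u} \ln 2 = 2^{\eta} \ln 2 \quad\quad (8)\]

Since \(g(\eta) = \eta\), the first derivative of \(g(\eta)\) is

\[g'(\eta) = 1 \quad\quad (9)\]

Using equations (8) and (9), and applying L’Hospital’s rule, the Shannon limit on \(E_b/N_0\) is given by

\[\lim_{\eta \to 0} \frac{E_b}{N_0} = \lim_{\eta \to 0} \frac{2^{\eta} \ln 2}{1} = \ln 2 \approx 0.693 = -1.59 \; dB\]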
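The limit can also be checked numerically. The short Python sketch below evaluates the bound (2^η − 1)/η, which is the expression computed by the Matlab snippet earlier in this article, for shrinking values of η and compares it with ln 2. The function names are chosen here for illustration.

```python
import math

def ebn0_limit(eta):
    """Shannon limit on Eb/N0 (linear scale) at spectral efficiency eta > 0."""
    return (2.0 ** eta - 1.0) / eta

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

for eta in (2.0, 1.0, 0.1, 0.001):
    print(f"eta = {eta:g}: Eb/N0 limit = {to_db(ebn0_limit(eta)):.2f} dB")

# The value the limit approaches as eta -> 0
print(f"ln 2 in dB = {to_db(math.log(2)):.2f} dB")
```

As η shrinks, the computed values settle towards 10·log10(ln 2), in agreement with the derivation.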

Unconstrained and constrained Shannon limit

The absolute Shannon power efficiency limit is the limit of a band-limited system irrespective of modulation or coding scheme. This is also called the unconstrained Shannon power efficiency limit. If we select a particular modulation scheme or encoding scheme, we calculate the constrained Shannon limit for that scheme.

The Shannon power efficiency limit does not depend on error probability. The Shannon limit tells us the minimum possible \(E_b/N_0\) required for achieving an arbitrarily small probability of error as \(M \to \infty\), where \(M\) is the number of signaling levels for the modulation technique (for BPSK \(M = 2\), for QPSK \(M = 4\), and so on). It gives the minimum possible \(E_b/N_0\) that satisfies the Shannon theorem. In other words, it gives the minimum possible \(E_b/N_0\) required to achieve maximum transmission capacity (\(R = C\), where \(R\) is the rate of transmission and \(C\) is the channel capacity). It does not specify the error probability at that limit, nor does it give any direction on the coding technique that can be used to achieve that limit. As the capacity is approached, the system complexity increases drastically. So the aim of any system design is to approach that limit. For example, the error probability performance of Turbo codes is very close to the Shannon limit [1].

As an example, let’s evaluate the performance of a 2-PAM (Pulse Amplitude Modulation) system and determine the maximum possible coding gain that can be achieved by the most advanced coding scheme. The methodology for simulating the performance of a 2-PAM system is described in chapter 5 and 6. Using this methodology, the performance of a 2-PAM system is simulated and plotted in Figure 2. The absolute Shannon power efficiency limits when the spectral efficiency is and are also referenced on the plot.

The spectral efficiency of an ideal 2-PAM system is . Hence, if the target bit error rate is , then a coding gain of can be achieved using powerful codes, if we have to maintain the nominal spectral efficiency at .

If there is no limit on the spectral efficiency, then we can let . In this case, the absolute Shannon power efficiency limit is when . Thus a coding gain of approximately is possible with powerful codes if we let the spectral efficiency approach zero.


References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding: Turbo-codes, IEEE Int. Conf. on Comm. (ICC ’93), 2:1064-1070, May 1993.


Performance comparison of Digital Modulation techniques

Key focus: Compare the performance and spectral efficiency of bandwidth-efficient digital modulation techniques (BPSK, QPSK and QAM) based on their theoretical BER over AWGN.

More detailed analysis of Shannon’s theorem and Channel capacity is available in the following book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Simulation of various digital modulation techniques are available in these books
Digital Modulations using Matlab : Build Simulation Models from Scratch, ISBN: 978-1521493885
Digital Modulations using Python ISBN: 978-1712321638

Let’s take up some bandwidth-efficient linear digital modulation techniques (BPSK, QPSK and QAM) and compare their performance based on their theoretical BER over AWGN. (Readers are encouraged to read the previous article on Shannon’s theorem and channel capacity.)

Table 1 summarizes the theoretical BER (given the SNR per bit ratio, Eb/N0) for various linear modulations. Note that the Eb/N0 values used in that table are in linear scale [to convert Eb/N0 in dB to linear scale, use Eb/N0(linear) = 10^(Eb/N0(dB)/10)]. A small script written in Matlab (given below) gives the following output.

Figure 1: Eb/N0 Vs. BER for various digital modulations over AWGN channel
Table 1: Theoretical BER over AWGN for various linear digital modulation techniques

The following table is obtained by extracting from Figure 1 the values of Eb/N0 required to achieve BER = \(10^{-6}\). (Table data is sorted by increasing values of Eb/N0.)
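Instead of reading the values off the plot, the required Eb/N0 can be found numerically. The Python sketch below uses the same theoretical BER expressions as the Matlab listing in this article (BPSK, M-PSK and M-QAM only) and a simple bisection search. The function names and the search interval are assumptions made for illustration.

```python
import math

def ber_bpsk(ebn0):
    """Theoretical BER of BPSK over AWGN; ebn0 is in linear scale."""
    return 0.5 * math.erfc(math.sqrt(ebn0))

def ber_mpsk(ebn0, M):
    """Approximate BER of M-PSK over AWGN (same expression as the Matlab code)."""
    k = math.log2(M)
    return (1.0 / k) * math.erfc(math.sqrt(ebn0 * k) * math.sin(math.pi / M))

def ber_mqam(ebn0, M):
    """Approximate BER of square M-QAM over AWGN (same expression as the Matlab code)."""
    k = math.log2(M)
    return (2.0 / k) * (1.0 - 1.0 / math.sqrt(M)) * math.erfc(
        math.sqrt(3.0 * ebn0 * k / (2.0 * (M - 1))))

def required_ebn0_db(ber_func, target=1e-6, lo_db=-4.0, hi_db=40.0):
    """Bisection on Eb/N0 in dB; assumes BER decreases as Eb/N0 increases."""
    for _ in range(60):
        mid = 0.5 * (lo_db + hi_db)
        if ber_func(10.0 ** (mid / 10.0)) > target:
            lo_db = mid   # BER still too high: need more Eb/N0
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

print("BPSK   :", round(required_ebn0_db(ber_bpsk), 2), "dB")
print("8-PSK  :", round(required_ebn0_db(lambda x: ber_mpsk(x, 8)), 2), "dB")
print("16-QAM :", round(required_ebn0_db(lambda x: ber_mqam(x, 16)), 2), "dB")
```

With these expressions, QPSK and 4-QAM reduce to the BPSK formula, so all three need the same Eb/N0 for a given BER.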

Table 2: Capacity of various modulations, their efficiency and channel bandwidth

where,

\(\eta_B = R_b / B_{min} = \log_2 M\) is the bandwidth efficiency for linear modulation with an M-point constellation, meaning that \(\eta_B\) bits can be stuffed in one symbol with \(R_b\) bits/sec data rate for a given minimum bandwidth.

\(B_{min} = R_b / \eta_B\) is the minimum bandwidth needed for an information rate of \(R_b\) bits/second. If a pulse shaping technique like raised cosine pulse [with roll-off factor (a)] is used, then \(B_{min}\) becomes

\[B_{min} = \frac{(1 + a) R_b}{\eta_B}\]

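Next, the data in Table 2 is plotted with Eb/N0 on the x-axis and η on the y-axis (see Figure 2), along with the well-known Shannon capacity equation over AWGN given by

\[C = B \log_2 \left( 1 + \frac{E_b}{N_0} \frac{C}{B} \right)\]

which can be represented as (refer [1])

\[\frac{E_b}{N_0} = \frac{2^{\eta} - 1}{\eta}, \quad \eta = \frac{C}{B}\]

Figure 2: Spectral efficiency vs Eb/N0 for various modulations at Pb = \(10^{-6}\)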


Matlab Code

EbN0dB=-4:1:24;
EbN0lin=10.^(EbN0dB/10);
colors={'b-*','g-o','r-h','c-s','m-d','y-*','k-p','b-->','g:<','r-.d'};
index=1;

%BPSK
BPSK = 0.5*erfc(sqrt(EbN0lin));
plotHandle=plot(EbN0dB,log10(BPSK),char(colors(index)));
set(plotHandle,'LineWidth',1.5);
hold on;

index=index+1;

%M-PSK
m=2:1:5;
M=2.^m;
for i=M,
    k=log2(i);
    berErr = 1/k*erfc(sqrt(EbN0lin*k)*sin(pi/i));
    plotHandle=plot(EbN0dB,log10(berErr),char(colors(index)));
    set(plotHandle,'LineWidth',1.5);
    index=index+1;
end

%Binary DPSK
Pb = 0.5*exp(-EbN0lin);
plotHandle = plot(EbN0dB,log10(Pb),char(colors(index)));
set(plotHandle,'LineWidth',1.5);
index=index+1;

%Differential QPSK
a=sqrt(2*EbN0lin*(1-sqrt(1/2)));
b=sqrt(2*EbN0lin*(1+sqrt(1/2)));
Pb = marcumq(a,b,1)-1/2.*besseli(0,a.*b).*exp(-1/2*(a.^2+b.^2));
plotHandle = plot(EbN0dB,log10(Pb),char(colors(index)));
set(plotHandle,'LineWidth',1.5);
index=index+1;

%M-QAM
m=2:2:6;
M=2.^m;

for i=M,
    k=log2(i);
    berErr = 2/k*(1-1/sqrt(i))*erfc(sqrt(3*EbN0lin*k/(2*(i-1))));
    plotHandle=plot(EbN0dB,log10(berErr),char(colors(index)));
    set(plotHandle,'LineWidth',1.5);
    index=index+1;
end

legend('BPSK','QPSK','8-PSK','16-PSK','32-PSK','D-BPSK','D-QPSK','4-QAM','16-QAM','64-QAM');
axis([-4 24 -8 0]);
set(gca,'XTick',-4:2:24); %re-name axis accordingly
ylabel('Probability of BER Error - log10(Pb)');
xlabel('Eb/N0 (dB)');
title('Probability of BER Error log10(Pb) Vs Eb/N0');
grid on;

Reference

[1] John G. Proakis, “Digital Communications”, Chapter 7: Channel Capacity and Coding.


Shannon theorem – demystified

Shannon theorem dictates the maximum data rate at which the information can be transmitted over a noisy band-limited channel. The maximum data rate is designated as channel capacity. The concept of channel capacity is discussed first, followed by an in-depth treatment of Shannon’s capacity for various channels.

Introduction

The main goal of a communication system design is to satisfy one or more of the following objectives.

● The transmitted signal should occupy the smallest bandwidth in the allocated spectrum, measured in terms of bandwidth efficiency, also called spectral efficiency – \(\eta_B\).
● The designed system should be able to reliably send information at the lowest practical power level. This is measured in terms of power efficiency – \(\eta_P\).
● Ability to transfer data at higher rates – \(R\) bits/second.
● The designed system should be robust to multipath effects and fading.
● The system should guard against interference from other sources operating in the same frequency – low carrier-to-cochannel signal interference ratio (CCI).
● Low adjacent channel interference from nearby channels – measured in terms of adjacent channel power ratio (ACPR).
● Easier to implement and lower operational costs.

Chapter 2 in my book ‘Wireless Communication Systems in Matlab’ describes the effect of the first three objectives when designing a communication system for a given channel. A great deal of information about these three factors can be obtained from Shannon’s noisy channel coding theorem.

Shannon’s noisy channel coding theorem

For any communication over a wireless link, one must ask the following fundamental question: what is the optimal performance achievable for a given channel? The performance over a communication link is measured in terms of capacity, which is defined as the maximum rate at which information can be transmitted over the channel with an arbitrarily small amount of error.

It was widely believed that the only way for reliable communication over a noisy channel is to reduce the error probability as small as possible, which in turn is achieved by reducing the data rate. This belief was changed in 1948 with the advent of Information theory by Claude E. Shannon. Shannon showed that it is in fact possible to communicate at a positive rate and at the same time maintain a low error probability as desired. However, the rate is limited by a maximum rate called the channel capacity. If one attempts to send data at rates above the channel capacity, it will be impossible to recover it from errors. This is called Shannon’s noisy channel coding theorem and it can be summarized as follows:

● A given communication system has a maximum rate of information – C, known as the channel capacity.
● If the transmission information rate R is less than C, then the data transmission in the presence of noise can be made to happen with arbitrarily small error probabilities by using intelligent coding techniques.
● To get lower error probabilities, the encoder has to work on longer blocks of signal data. This entails longer delays and higher computational requirements.

The theorem indicates that, with sufficiently advanced coding techniques, transmission close to the maximum channel capacity is possible with arbitrarily small errors. One can intuitively reason that, for a given communication system, as the information rate increases, the number of errors per second will also increase.

Shannon’s noisy channel coding theorem is a generic framework that can be applied to specific scenarios of communication. For example, communication through a band-limited channel in the presence of noise is a basic scenario one wishes to study. Therefore, the study of information capacity over an AWGN (additive white Gaussian noise) channel provides vital insights into the capacity of other types of wireless links, like fading channels.

Unconstrained capacity for band-limited AWGN channel

Real-world channels are essentially continuous in both time and signal space. Real physical channels have two fundamental limitations: they have limited bandwidth, and the power/energy of the input signal to such channels is also limited. Therefore, the application of information theory to such continuous channels should take these physical limitations into account. This will enable us to exploit such continuous channels for transmission of discrete information.

In this section, the focus is on a band-limited real AWGN channel, where the channel input and output are real and continuous in time. The capacity of a continuous AWGN channel that is bandwidth limited to \(B\) Hz and average received power constrained to \(P\) Watts, is given by

\[C_{awgn} \left( P,B\right) = B \log_2 \left( 1 + \frac{P}{N_0 B}\right) \quad \text{bits/s} \quad\quad (1)\]

Here, \(N_0/2\) is the two-sided power spectral density of the additive white Gaussian noise (so that the noise power within the bandwidth \(B\) is \(N_0 B\)) and \(P\) is the average power given by

\[P = E_b R \quad \quad (2) \]

where \(E_b\) is the average signal energy per information bit and \(R\) is the data transmission rate in bits per second. The ratio \(P/(N_0B)\) is the signal-to-noise ratio (SNR) per degree of freedom. Hence, equation (1) can be rewritten as

\[C_{awgn} \left( P,B\right) = B \log_2 \left( 1 + SNR \right) \quad \text{bits/s} \quad\quad (3)\]

Here, \(C_{awgn}\) is the capacity of the channel in bits/second, also called Shannon’s capacity limit for the given channel. It is the fundamental maximum transmission rate that can be achieved using the basic resources available in the channel, regardless of the details of the coding scheme or modulation. It is the best performance that we can hope to achieve for that channel. The above expression for the channel capacity makes intuitive sense:

● Bandwidth limits how fast the information symbols can be sent over the given channel.
● The SNR limits how much information we can squeeze into each transmitted symbol. Increasing the SNR makes the transmitted symbols more robust against noise. SNR represents the signal quality at the receiver front end and it depends on the input signal power and the noise characteristics of the channel.
● To increase the information rate, the signal-to-noise ratio and the allocated bandwidth have to be traded against each other.
● For a channel without noise, the signal-to-noise ratio becomes infinite and so an infinite information rate is possible at a very small bandwidth.
● We may trade off bandwidth for SNR. However, as the bandwidth B tends to infinity, the channel capacity does not become infinite, since the noise power also increases with the bandwidth.
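The last point can be checked numerically. The sketch below evaluates equation (1) for increasing bandwidth at fixed \(P\) and \(N_0\), and compares the result with the limiting value \((P/N_0)\log_2 e\) that the expression approaches as \(B \to \infty\). The function names and the example values of `P` and `N0` are assumptions made for illustration.

```python
import math

def awgn_capacity(P: float, B: float, N0: float) -> float:
    """Equation (1): capacity in bits/s of a band-limited AWGN channel.

    P  : average received power in Watts
    B  : bandwidth in Hz
    N0 : noise power spectral density parameter (noise power in B is N0*B)
    """
    if B <= 0 or N0 <= 0 or P < 0:
        raise ValueError("require B > 0, N0 > 0 and P >= 0")
    return B * math.log2(1.0 + P / (N0 * B))

def infinite_bandwidth_limit(P: float, N0: float) -> float:
    """Value that equation (1) approaches as B grows without bound."""
    return (P / N0) * math.log2(math.e)

if __name__ == "__main__":
    P, N0 = 1.0, 1e-3  # assumed example values
    print(f"limit as B -> infinity: {infinite_bandwidth_limit(P, N0):.1f} bits/s")
    for B in (1e2, 1e3, 1e4, 1e5, 1e6):
        print(f"B = {B:9.0f} Hz  ->  C = {awgn_capacity(P, B, N0):8.1f} bits/s")
```

The capacity keeps growing with bandwidth, but ever more slowly, and it never exceeds the limit: widening the band admits more noise power \(N_0 B\) while the signal power stays fixed.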

Shannon’s equation relies on two important concepts:
● That, in principle, a trade-off between SNR and bandwidth is possible
● That the information capacity depends on both SNR and bandwidth
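To make the trade-off concrete, equation (3) can be inverted to give the SNR needed to support a target rate in a given bandwidth: \(SNR = 2^{C/B} - 1\). The sketch below tabulates this for a fixed target rate. The helper names and the target rate are assumptions chosen for illustration, and the SNR is treated here as a free parameter rather than being tied to a particular \(P\) and \(N_0\).

```python
import math

def required_snr(C_target: float, B: float) -> float:
    """Invert equation (3): linear SNR needed for C_target bits/s in B Hz."""
    if B <= 0 or C_target < 0:
        raise ValueError("require B > 0 and C_target >= 0")
    return 2.0 ** (C_target / B) - 1.0

def to_db(x: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

if __name__ == "__main__":
    C_target = 1e6  # assumed target rate: 1 Mbit/s
    for B in (0.25e6, 0.5e6, 1e6, 2e6, 4e6):
        snr = required_snr(C_target, B)
        print(f"B = {B / 1e6:4.2f} MHz  ->  required SNR = {to_db(snr):6.2f} dB")
```

Halving the bandwidth requires a disproportionately larger SNR to hold the same rate, while doubling it relaxes the SNR requirement: the same capacity can be reached with many different combinations of the two resources.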

It is worth mentioning two important works by eminent scientists prior to Shannon’s paper [1]. Edwin Armstrong’s earlier work on Frequency Modulation (FM) is an excellent demonstration that SNR and bandwidth can be traded off against each other. He demonstrated in 1936 that it was possible to increase the SNR of a communication system by using FM, at the expense of allocating more bandwidth [2].

In 1903, W. M. Miner, in his patent (U.S. Patent 745,734 [3]), introduced the concept of increasing the capacity of transmission lines by using sampling and time division multiplexing techniques. In 1937, A. H. Reeves, in his French patent (French Patent 852,183, U.S. Patent 2,272,070 [4]), extended the system by incorporating a quantizer, thereby paving the way for the well-known technique of Pulse Code Modulation (PCM). He realized that he would require more bandwidth than the traditional transmission methods and used additional repeaters at suitable intervals to combat the transmission noise. With the goal of minimizing the quantization noise, he used a quantizer with a large number of quantization levels. Reeves’ patent relies on two important facts:

● One can represent an analog signal (like speech) with arbitrary accuracy by sampling it at a sufficiently high rate and quantizing each sample into one of a sufficiently large number of predetermined amplitude levels
● If the SNR is sufficiently large, then the quantized samples can be transmitted with arbitrarily small errors
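The first of these facts can be illustrated with a small experiment: quantize a sampled sine wave with a uniform quantizer and measure how the signal-to-quantization-noise ratio improves as the number of levels grows. This is only a sketch under stated assumptions (a full-scale sine test signal, a uniform mid-rise quantizer, and an arbitrary normalized test frequency); it is not a reconstruction of the system in Reeves’ patent.

```python
import math

def quantize(x: float, n_bits: int, x_max: float = 1.0) -> float:
    """Uniform mid-rise quantizer over [-x_max, x_max] with 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    # Index of the quantization cell containing x, clipped to the valid range.
    idx = min(levels - 1, max(0, math.floor((x + x_max) / step)))
    # Reconstruct at the centre of that cell.
    return -x_max + (idx + 0.5) * step

def sqnr_db(n_bits: int, n_samples: int = 10000) -> float:
    """Signal-to-quantization-noise ratio in dB for a full-scale sine wave."""
    freq = 0.01234567  # assumed normalized frequency in cycles per sample
    signal = [math.sin(2.0 * math.pi * freq * n) for n in range(n_samples)]
    error = [s - quantize(s, n_bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n_samples
    p_error = sum(e * e for e in error) / n_samples
    return 10.0 * math.log10(p_signal / p_error)

if __name__ == "__main__":
    for n_bits in (4, 6, 8, 10, 12):
        print(f"{n_bits:2d} bits ({2 ** n_bits:4d} levels): "
              f"SQNR = {sqnr_db(n_bits):.1f} dB")
```

Each additional bit doubles the number of levels and improves the measured ratio by roughly 6 dB, so the representation can be made as accurate as desired at the cost of more bits per sample, and hence more bandwidth.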

It is implicit in Reeves’ patent that an infinite amount of information can be transmitted on a noise-free channel of arbitrarily small bandwidth. This links the information rate with SNR and bandwidth.

Please refer to [1] and [5] for the actual proof by Shannon. A much simpler version of the proof (I would rather call it an illustration) can be found in [6].

Figure 1: Shannon Power Efficiency Limit

Continue reading on Shannon’s limit on power efficiency…

References:

[1] C. E. Shannon, “A Mathematical Theory of Communication”, Bell Syst. Tech. J., Vol. 27, pp. 379-423, 623-656, July, October 1948.
[2] E. H. Armstrong, “A Method of Reducing Disturbances in Radio Signaling by a System of Frequency Modulation”, Proc. IRE, Vol. 24, pp. 689-740, May 1936.
[3] Willard M. Miner, “Multiplex Telephony”, US Patent 745734, December 1903.
[4] A. H. Reeves, “Electric Signaling System”, US Patent 2272070, February 1942.
[5] C. E. Shannon, “Communication in the Presence of Noise”, Proc. IRE, Vol. 37, No. 1, pp. 10-21, January 1949.
[6] The Scots Guide to Electronics, “Information and Measurement”, University of St Andrews, School of Physics and Astronomy.

Related topics in this chapter

Introduction
Shannon’s noisy channel coding theorem
Unconstrained capacity for bandlimited AWGN channel
● Shannon’s limit on spectral efficiency
Shannon’s limit on power efficiency
● Generic capacity equation for discrete memoryless channel (DMC)
 □ Capacity over binary symmetric channel (BSC)
 □ Capacity over binary erasure channel (BEC)
● Constrained capacity of discrete input continuous output memoryless AWGN channel
● Ergodic capacity over a fading channel
