Chi square distribution – demystified


A random variable is always associated with a probability distribution. When the random variable undergoes a mathematical transformation, the underlying probability distribution no longer remains the same. Consider a random variable \(Z\) whose probability distribution function (PDF) is a standard normal distribution (\(\mu=0\) and \(\sigma^2=1\)). Now, if the random variable is squared (a mathematical transformation), then the PDF of \(Z^2\) is no longer a standard normal distribution. The new transformed distribution is called the Chi square distribution with \(1\) degree of freedom. The PDFs of \(Z\) and \(Z^2\) are plotted in Figure 1.

Figure 1: Transformation of Normal Distribution to Chi Square Distribution

The mean of the random variable \(Z\) is \(E[Z]=0\), and for the transformed variable \(Z^2\) the mean is \(E[Z^2]=1\). Similarly, the variance of the random variable \(Z\) is \(var(Z)=1\), whereas the variance of the transformed random variable \(Z^2\) is \(var(Z^2)=2\). In addition to the mean and variance, the shape of the distribution also changes. The distribution of the transformed variable is no longer symmetric; it is skewed to one side. Also, the random variable \(Z^2\) can take only non-negative values, whereas the random variable \(Z\) can take negative values too (note the x-axis in the plots above).
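The moments quoted above can be checked empirically. The following is a minimal sketch, assuming NumPy is available; the seed and the sample size are arbitrary choices made only for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen only for repeatability
z = rng.standard_normal(1_000_000)    # samples of Z ~ N(0, 1)
z2 = z**2                             # transformed variable Z^2

# Z is symmetric around zero; Z^2 is non-negative with mean 1 and variance 2
print("Z   : mean=%.3f var=%.3f min=%.3f" % (z.mean(), z.var(), z.min()))
print("Z^2 : mean=%.3f var=%.3f min=%.3f" % (z2.mean(), z2.var(), z2.min()))
```

The sample mean and variance of `z2` should land close to 1 and 2, and its minimum is never negative.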

Since the new transformation is based on only one parameter (\(Z\)), the degree of freedom for this transformation is \(1\). Therefore, the transformed random variable \(Z^2\) follows the “Chi-square distribution with \(1\) degree of freedom”.

Suppose \(Z_1, Z_2, \cdots, Z_k\) are independent random variables that follow the standard normal distribution (\(\mu=0\) and \(\sigma^2=1\)). Then the transformation

\[ \chi_k^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \]

follows a Chi square distribution with \(k\) degrees of freedom. Figure 2 illustrates the definition of the Chi square distribution as a transformation of the normal distribution for \(1\) degree of freedom and \(2\) degrees of freedom. In the same manner, the transformation can be extended to \(k\) degrees of freedom.

Figure 2: Illustration of Chi-square Distribution with 2 degrees of freedom

The above equation is derived from random variables that follow the standard normal distribution. For a standard normal distribution, the mean is \(\mu=0\). Therefore, the transformation is called the central Chi-square distribution. If the underlying random variables follow a normal distribution with non-zero mean, then the transformation is called the non-central Chi-square distribution [2]. In channel modeling, the central Chi-squared distribution is related to the Rayleigh fading scenario and the non-central Chi-square distribution is related to the Rician fading scenario.

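Mathematically, the PDF of the central Chi-squared distribution with \(k\) degrees of freedom is given by

\[ f(x;k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \quad x > 0 \]

The mean and variance of the central Chi-squared distributed random variable are given by

\[ \mu = k, \quad \sigma^2 = 2k \]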
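As a rough check of the definition, the sketch below builds Chi-square samples as a sum of \(k\) squared standard normal variables and compares their histogram with the standard closed-form PDF of the central Chi-square distribution. The helper name `chi2_pdf`, the choice \(k=4\) and the bin settings are illustrative assumptions.

```python
import math
import numpy as np

def chi2_pdf(x, k):
    """Closed-form central chi-square PDF with k degrees of freedom (x > 0)."""
    x = np.asarray(x, dtype=float)
    return x**(k / 2 - 1) * np.exp(-x / 2) / (2**(k / 2) * math.gamma(k / 2))

rng = np.random.default_rng(1)
k = 4                                                          # degrees of freedom (example value)
samples = (rng.standard_normal((500_000, k))**2).sum(axis=1)   # Z_1^2 + ... + Z_k^2

# Histogram-based PDF estimate: counts / (total samples * bin width)
counts, edges = np.histogram(samples, bins=60, range=(0.0, 15.0))
width = edges[1] - edges[0]
density = counts / (samples.size * width)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(density - chi2_pdf(centers, k)))

print("mean=%.3f (theory %d)" % (samples.mean(), k))
print("var =%.3f (theory %d)" % (samples.var(), 2 * k))
print("largest PDF deviation = %.4f" % max_err)
```

The counts are divided by the total number of samples rather than by the number of samples inside the plotted range, so the estimate is not inflated by the truncated tail.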

Relation to Rayleigh distribution

The connection between Chi square distribution and the Rayleigh distribution can be established as follows

  1. If a random variable \(R\) has the standard Rayleigh distribution, then the transformation \(R^2\) follows the chi-square distribution with \(2\) degrees of freedom.
  2. If a random variable \(C\) has the chi-square distribution with \(2\) degrees of freedom, then the transformation \(\sqrt{C}\) has the standard Rayleigh distribution.
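Both directions of this relation can be sanity-checked numerically. The sketch below assumes that "standard Rayleigh" means a scale parameter of 1, for which the mean is \(\sqrt{\pi/2}\) and the mean of the squared variable is 2; the seed and sample size are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

c = rng.chisquare(df=2, size=n)             # chi-square with 2 degrees of freedom
r_from_chi2 = np.sqrt(c)                    # should behave like a standard Rayleigh variable

r_direct = rng.rayleigh(scale=1.0, size=n)  # standard Rayleigh (scale = 1, assumed)
c_from_rayleigh = r_direct**2               # should behave like chi-square with 2 degrees of freedom

print("mean of sqrt(chi2_2) = %.4f, Rayleigh theory = %.4f"
      % (r_from_chi2.mean(), math.sqrt(math.pi / 2)))
print("mean of R^2 = %.4f, var of R^2 = %.4f (chi2_2 theory: 2 and 4)"
      % (c_from_rayleigh.mean(), c_from_rayleigh.var()))
```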


Chi-square distribution is used in hypothesis testing (to compare the observed data with expected data that follows a specific hypothesis) and in estimating variances of a parameter.

Matlab Simulation:

Check this book for full Matlab code.
Wireless Communication Systems using Matlab – by Mathuranathan Viswanathan

Figure 3: Simulated output – central Chi square Distribution with k degrees of freedom

Python Code

Python numpy package has a chisquare() generator, which can be used in a straightforward manner to obtain the Chi square distributed sequences.

#---------Chi square distribution
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline
plt.style.use('ggplot')

ks=np.arange(start=1,stop=6,step=1) #degrees of freedom to simulate
nSamp=1000000 #number of samples to generate

fig, ax = plt.subplots(ncols=1, nrows=1, constrained_layout=True)

for i,k in enumerate(ks):
    #Generate central Chi-square distributed random numbers
    X = np.random.chisquare(df=k, size = nSamp)
    #Plot the normalized histogram as an estimate of the PDF
    ax.hist(X,bins=500,density=True,label=r'$k$={}'.format(k),
            histtype='step',alpha=0.75, linewidth=3)

ax.set_title('PDFs of Chi square distribution')
ax.legend() #identify the curve for each degree of freedom
plt.show()


For further reading

[1] Ernie Croot, “Notes on Chi-squared distribution”, Georgia Institute of Technology, School of Mathematics, Oct 2005.


Uniform random variable


Uniform random variables are used to model scenarios where the expected outcomes are equi-probable. For example, in a communication system design, the set of all possible source symbols are considered equally probable and therefore modeled as a uniform random variable.

The uniform distribution is the underlying distribution for a uniform random variable. A continuous uniform random variable, denoted as \(X \sim U(a,b)\), takes continuous values within a given interval \((a,b)\) with equal probability. Therefore, the PDF of such a random variable is constant over the given interval:

$$  f_X(x) = \begin{cases}\frac{1}{b-a} & \text{when } a < x < b\\0 & \text{otherwise} \end{cases} $$

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

In Matlab, the rand function generates continuous uniform random numbers in the interval \((0,1)\); all the numbers in that interval are equally likely to occur. The command rand(n,m) generates a matrix of size \(n \times m\). To generate random numbers in the interval \((a,b)\), one can use the following expression.

a + (b-a)*rand(n,m); %Here nxm is the size of the output matrix

To test whether the numbers generated by the continuous uniform distribution are uniform in the interval \((a,b)\), one has to generate a very large number of values using the rand function and then plot the histogram. Figure 1 shows that the simulated PDF and the theoretical PDF are in agreement with each other.

a=2;b=10; %open interval (2,10)
X=a+(b-a)*rand(1,1000000);%simulate uniform RV
[p,edges]=histcounts(X,'Normalization','pdf');%estimated PDF
outcomes = 0.5*(edges(1:end-1) + edges(2:end));%possible outcomes
g=1/(b-a)*ones(1,length(outcomes)); %Theoretical PDF
bar(outcomes,p);hold on;plot(outcomes,g,'r-');
title('Probability Density Function');legend('simulated','theory');
xlabel('Possible outcomes');ylabel('Probability of outcomes');
Figure 1: Continuous uniform random variable : histogram and theoretical PDF
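For readers working in Python, the same scaling idea can be sketched with NumPy. This is an illustrative equivalent rather than a translation of the Matlab listing; the bin count and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, 10.0                              # interval used in the Matlab example
x = a + (b - a) * rng.random(1_000_000)       # scale and shift uniform samples from [0, 1)

# Estimated PDF: counts / (total samples * bin width)
counts, edges = np.histogram(x, bins=50, range=(a, b))
pdf_est = counts / (x.size * np.diff(edges))
pdf_theory = 1.0 / (b - a)                    # constant PDF 1/(b-a)
max_dev = np.max(np.abs(pdf_est - pdf_theory))

print("theoretical PDF = %.4f, largest deviation = %.4f" % (pdf_theory, max_dev))
```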

On the other hand, a discrete uniform random variable generates discrete values that are equally probable. The underlying discrete uniform distribution is denoted as \(X \sim U(S)\), where \(S = \{s_1,s_2,\ldots,s_n\}\) is a finite set of discrete elements that are equally probable, as described by the probability mass function (PMF)

$$ f_X(x)= \begin{cases}\frac{1}{n} & \text{where } x \in \{s_1,s_2,\ldots,s_n\} \\ 0 & \text{otherwise} \end{cases} $$

There exist several methods to generate discrete uniform random numbers and two of them are discussed here. The straightforward method is to use the randi function in Matlab, which can generate discrete uniform numbers in the integer set \(1,2,\cdots,n\). The second method is to use the rand function and ceil the result to discrete values. For example, the command to generate \(N\) uniformly distributed discrete numbers from the set \(\{1,2,\cdots,n\}\) is

X=ceil(n*rand(1,N)); %N discrete uniform numbers from {1,...,n}

The uniformity test for discrete uniform random numbers is very similar to the code shown for the continuous uniform random variable case. The only difference is the normalization term. The histogram values should not be normalized by the total area under the histogram curve, as in the case of continuous random variables. Rather, the histogram should be normalized by the total number of occurrences in all the bins. We cannot normalize based on the area under the curve, since the bin values are not dense enough (the bins are far from each other) for a proper calculation of the total area. The code snippet is given next. The resulting plot (Figure 2) shows a good match between the simulated and theoretical PMFs.

X=randi(6,100000,1); %Simulate throws of dice,S={1,2,3,4,5,6}
[pmf,edges]=histcounts(X,'Normalization','probability');%estimated PMF (counts/total occurrences)
outcomes = 0.5*(edges(1:end-1) + edges(2:end));%S={1,2,3,4,5,6}
g=1/6*ones(1,6); %Theoretical PMF
bar(outcomes,pmf);hold on;stem(outcomes,g,'r-');
title('Probability Mass Function');legend('simulated','theory');
xlabel('Possible outcomes');ylabel('Probability of outcomes');
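The two generation methods described above can also be sketched in Python. This is an illustrative NumPy counterpart, not the Matlab code itself; note that `1.0 - rng.random()` is used so the samples lie in \((0,1]\) and the ceil operation never returns 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n_faces, n_throws = 6, 100_000

# Method 1: draw integers directly (upper bound of rng.integers is exclusive)
x1 = rng.integers(1, n_faces + 1, size=n_throws)

# Method 2: scale uniform samples from (0, 1] and round up
u = 1.0 - rng.random(n_throws)
x2 = np.ceil(n_faces * u).astype(int)

# PMF estimate: occurrences of each face divided by the total number of throws
pmf1 = np.bincount(x1, minlength=n_faces + 1)[1:] / n_throws
pmf2 = np.bincount(x2, minlength=n_faces + 1)[1:] / n_throws

print("method 1 PMF:", np.round(pmf1, 4))
print("method 2 PMF:", np.round(pmf2, 4))
print("theory      :", round(1 / n_faces, 4))
```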

Derive BPSK BER – optimum receiver in AWGN channel

Key focus: Derive BPSK BER (bit error rate) for optimum receiver in AWGN channel. Explained intuitively step by step.

BPSK modulation is the simplest of all the M-PSK techniques. An insight into the derivation of the error rate performance of an optimum BPSK receiver is essential, as it serves as a stepping stone to understanding the derivation for other comparatively complex techniques like QPSK, 8-PSK etc.

Understanding the concept of Q function and error function is a pre-requisite for this section of article.

The ideal constellation diagram of a BPSK transmission (Figure 1) contains two constellation points located equidistant from the origin. Each constellation point is located at a distance \(\sqrt{E_s}\) from the origin, where \(E_s\) is the BPSK symbol energy. Since the number of bits in a BPSK symbol is always one, the notations symbol energy (\(E_s\)) and bit energy (\(E_b\)) can be used interchangeably (\(E_s=E_b\)).

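Assume that the BPSK symbols are transmitted through an AWGN channel characterized by noise variance \(N_0/2\) Watts. When 0 is transmitted, the received symbol is represented by a Gaussian random variable \(r\) with mean \(S_0\) and variance \(N_0/2\). When 1 is transmitted, the received symbol is represented by a Gaussian random variable \(r\) with mean \(S_1\) and variance \(N_0/2\). Here \(S_0\) and \(S_1=-S_0\) are the two constellation points located at \(\pm\sqrt{E_s}\). Hence the conditional density functions of the BPSK symbol (Figure 2) are given by

\[ f(r|0_T) = \frac{1}{\sqrt{\pi N_0}} \exp\left\{ -\frac{(r-S_0)^2}{N_0} \right\} \]

\[ f(r|1_T) = \frac{1}{\sqrt{\pi N_0}} \exp\left\{ -\frac{(r-S_1)^2}{N_0} \right\} \]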

Figure 1: BPSK – ideal constellation
Figure 2: Probability density function (PDF) for BPSK Symbols

 An optimum receiver for BPSK can be implemented using a correlation receiver or a matched filter receiver (Figure 3). Both these forms of implementations contain a decision making block that decides upon the bit/symbol that was transmitted based on the observed bits/symbols at its input.

Figure 3: Optimum Receiver for BPSK

When the BPSK symbols are transmitted over an AWGN channel, the symbols appear smeared/distorted in the constellation depending on the SNR condition of the channel. A matched filter or a correlator projects the received symbols onto the basis function that was previously used to construct the BPSK symbols at the transmitter. This process of projection is illustrated in Figure 4. Since the assumed channel is of Gaussian nature, the continuous density function of the projected bits will follow a Gaussian distribution. This is illustrated in Figure 5.

Figure 4: Role of correlation/Matched Filter

After the signal points are projected on the basis function axis, a decision maker/comparator acts on those projected bits and decides on the fate of those bits based on the threshold set. For a BPSK receiver, if the a priori probabilities of transmitted 0’s and 1’s are equal (P=0.5), then the decision boundary or threshold will pass through the origin. If the a priori probabilities are not equal, then the optimum threshold boundary will shift away from the origin.

Figure 5: Distribution of received symbols

Considering a binary symmetric channel, where the a priori probabilities of 0’s and 1’s are equal, the decision threshold can be conveniently set to T=0. The comparator decides whether the projected symbols fall in region A or region B (see Figure 4). If the symbols fall in region A, then it will decide that ‘1’ was transmitted. If they fall in region B, the decision will be in favor of ‘0’.

For deriving the performance of the receiver, the decision process made by the comparator is applied to the underlying distribution model (Figure 5). The symbols projected on the axis will follow a Gaussian distribution. The threshold for decision is set to T=0. A received bit is in error if the transmitted bit is ‘0’ and the decision output is ‘1’, or if the transmitted bit is ‘1’ and the decision output is ‘0’.

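This is expressed in terms of probability of error as,

\[ P(error) = P(1 \text{ decided}, 0 \text{ transmitted}) + P(0 \text{ decided}, 1 \text{ transmitted}) \]

Or equivalently,

\[ P(error) = P(1_D, 0_T) + P(0_D, 1_T) \]

where the subscripts \(T\) and \(D\) denote the transmitted bit and the decided bit respectively. By applying Bayes Theorem, the above equation is expressed in terms of conditional probabilities as given below,

\[ P(error) = P(0_T)\,P(1_D|0_T) + P(1_T)\,P(0_D|1_T) = P(0_T) \int_{A} f(r|0_T)\, dr + P(1_T) \int_{B} f(r|1_T)\, dr \]

where \(f(r|0_T)\) and \(f(r|1_T)\) are the conditional density functions of the received symbol, and \(A\) and \(B\) are the decision regions in which the comparator decides in favor of ‘1’ and ‘0’ respectively. Since the a priori probabilities are equal, \(P(0_T)=P(1_T)=0.5\), the equation can be re-written as

\[ P(error) = \frac{1}{2} \int_{A} f(r|0_T)\, dr + \frac{1}{2} \int_{B} f(r|1_T)\, dr \]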

Intuitively, the integrals represent the area of shaded curves as shown in Figure 6. From the previous article, we know that the area of the shaded region is given by Q function.

Figure 6a, 6b: Calculating Error Probability


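Each of the two shaded areas is the tail of a Gaussian density with standard deviation \(\sqrt{N_0/2}\), measured from a threshold that lies at a distance \(\sqrt{E_s}\) from the mean. Each integral therefore evaluates to \(Q\left(\sqrt{E_s}/\sqrt{N_0/2}\right)\), and

\[ P(error) = Q\left( \sqrt{\frac{2E_s}{N_0}} \right) \]

For BPSK, since \(E_s=E_b\), the probability of symbol error (\(P_s\)) and the probability of bit error (\(P_b\)) are the same. Therefore, expressing \(P_s\) and \(P_b\) in terms of the Q function and also in terms of the complementary error function:

\[ P_s = P_b = Q\left( \sqrt{\frac{2E_b}{N_0}} \right) = \frac{1}{2}\, erfc\left( \sqrt{\frac{E_b}{N_0}} \right) \]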
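The well-known closed form \(P_b = \frac{1}{2} erfc\left(\sqrt{E_b/N_0}\right)\) can be compared against a small Monte Carlo simulation. The sketch below is illustrative only: the function names, the unit bit energy, and the mapping of bit 0 to the negative constellation point are assumptions made for the example, and the result does not depend on which sign is assigned to which bit.

```python
import math
import numpy as np

def bpsk_ber_theory(ebn0_db):
    """Theoretical BPSK bit error probability, 0.5*erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def bpsk_ber_sim(ebn0_db, n_bits=1_000_000, seed=5):
    """Monte Carlo estimate with Eb = 1 and decision threshold T = 0."""
    rng = np.random.default_rng(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    bits = rng.integers(0, 2, size=n_bits)
    s = 2.0 * bits - 1.0                        # assumed mapping: 0 -> -1, 1 -> +1
    sigma = math.sqrt(1.0 / (2.0 * ebn0))       # noise variance N0/2 with Eb = 1
    r = s + sigma * rng.standard_normal(n_bits) # received samples after projection
    decided = (r > 0).astype(int)               # comparator with threshold at the origin
    return float(np.mean(decided != bits))

for ebn0_db in (0, 2, 4, 6):
    print("Eb/N0 = %d dB: simulated %.5f, theory %.5f"
          % (ebn0_db, bpsk_ber_sim(ebn0_db), bpsk_ber_theory(ebn0_db)))
```

At high Eb/N0 the error events become rare, so many more bits would be needed for a stable estimate than the one million used here.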



[1] Nguyen & Shwedyk, “A First Course in Digital Communications”, Cambridge University Press, 1st edition.


Q function and Error functions : demystified

In simple words, the Q-function gives the probability that a random variable from a standard normal distribution will exceed a certain threshold value. The erf function gives the probability that a zero-mean normally distributed variable with variance 1/2 will fall within the range \((-z, z)\).

Q function

Q functions are often encountered in the theoretical equations for Bit Error Rate (BER) involving AWGN channel. A brief discussion on Q function and its relation to erfc function is given here.

A Gaussian process is the underlying model for an AWGN channel. The probability density function of a Gaussian distribution is given by

\[p(x) = \displaystyle{ \frac{1}{ \sigma \sqrt{2 \pi}} e^{ - \frac{(x-\mu)^2}{2 \sigma^2}}}\quad\quad (1) \]

Generally, in BER derivations, the probability that a Gaussian Random Variable \(X \sim N ( \mu, \sigma^2) \) exceeds \(x_0\) is evaluated as the area of the shaded region as shown in Figure 1.

Figure 1: Gaussian PDF and illustration of Q function

Mathematically, the area of the shaded region is evaluated as,

\[Pr(X \geq x_0) =\displaystyle{ \int_{x_0}^{\infty} p(x) dx = \int_{x_0}^{\infty} \frac{1}{ \sigma \sqrt{2 \pi}} e^{ - \frac{(x-\mu)^2}{2 \sigma^2}} dx } \quad\quad (2) \]

The Gaussian density inside the above integral cannot be integrated in closed form. So, using the change of variables method, we substitute

\[\displaystyle{ y = \frac{x-\mu}{\sigma} }\]

Then equation (2) can be re-written as,

\[\displaystyle{ Pr\left( y > \frac{x_0-\mu}{\sigma} \right ) = \int_{ \left( \frac{x_{0} -\mu}{\sigma}\right)}^{\infty} \frac{1}{ \sqrt{2 \pi}} e^{- \frac{y^2}{2}} dy } \quad\quad (3) \]

Here the function inside the integral is the standard Gaussian probability density function of \(Y \sim N( 0, 1)\), that is, normalized to mean \(\mu=0\) and standard deviation \(\sigma=1\).

The integral on the right side can be termed as Q-function, which is given by,

\[\displaystyle{Q(z) = \int_{z}^{\infty}\frac{1}{ \sqrt{2 \pi}} e^{- \frac{y^2}{2}} dy } \quad\quad (4)\]

With \(z = \frac{x_0-\mu}{\sigma}\), the probability in equation (3) is related to the Q function as,

\[\displaystyle{ Pr\left( y > \frac{x_0-\mu}{\sigma} \right ) = Q\left(\frac{x_0-\mu}{\sigma} \right ) = Q(z)} \quad\quad (5)\]

Thus the Q function gives the area of the shaded region with the transformation \(y = \frac{x-\mu}{\sigma}\) applied to the Gaussian probability density function. Essentially, the Q function evaluates the tail probability of the normal distribution (the shaded area in the above figure).


Error function

The complementary error function \(erfc(z)\) represents the area under the two tails (\(|x| > z\)) of a zero mean Gaussian probability density function of variance \(\sigma^2 = 1/2\), that is, the probability that the variable lies outside the range \((-z, z)\). The error function \(erf(z)\) gives the probability that the variable lies within that range.

Therefore, the complementary error function is given by

\[\displaystyle{ erfc(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-x^2} dx } \quad\quad (6)\]

Hence, the error function is

\[erf(z) = 1 - erfc(z) \quad\quad (7)\]

or equivalently,

\[\displaystyle{ erf(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-x^2} dx } \quad\quad (8) \]


Q function and Complementary Error Function (erfc)

Comparing the integrals in equations (4) and (6), one can see that the Q function is directly related to the complementary error function (erfc). Substituting \(y = \sqrt{2}\,x\) in equation (4) and comparing the result with equation (6) gives the following relation.

\[\displaystyle{ Q(z) = \frac{1}{2} erfc \left( \frac{z}{\sqrt{2}}\right)} \quad\quad (9) \]
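Equation (9) gives a convenient way to compute the Q function with the `erfc` routine found in most numerical libraries. The sketch below uses Python's `math.erfc` and cross-checks it against a direct numerical integration of equation (4); the helper names and the truncation of the infinite upper limit at 12 are assumptions made for the example.

```python
import math
import numpy as np

def qfunc(z):
    """Q(z) computed through equation (9): 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def qfunc_numeric(z, upper=12.0, steps=200_000):
    """Trapezoidal approximation of equation (4), truncated at `upper`."""
    y = np.linspace(z, upper, steps + 1)
    f = np.exp(-y**2 / 2.0) / math.sqrt(2.0 * math.pi)   # standard Gaussian PDF
    h = (upper - z) / steps
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))

for z in (0.0, 1.0, 2.0, 3.0):
    print("z = %.1f: erfc-based %.8f, numerical %.8f" % (z, qfunc(z), qfunc_numeric(z)))
```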

Some important results

Keep a note of the following equations, which come in handy when deriving the probability of bit error for various scenarios. These equations are compiled here for easy reference.

If we have a normal variable \(X \sim N (\mu, \sigma^2)\), the probability that \(X > x\) is

\[\displaystyle{ Pr \left( X > x \right) = Q \left( \frac{x-\mu}{\sigma} \right ) } \quad\quad (10) \]

If we want to know the probability that \(X\) is away from the mean by an amount ‘a’ (on the left or right side of the mean), then

\[\displaystyle{ Pr \left( X > \mu+a \right) = Pr \left( X < \mu-a \right) = Q\left(\frac{a}{\sigma} \right ) } \quad\quad (11) \]

If we want to know the probability that \(X\) is away from the mean by more than an amount ‘a’ (on either side of the mean), then

\[\displaystyle{ Pr \left( |X-\mu| > a \right) = Pr \left( X < \mu-a \right) + Pr \left( X > \mu+a \right) = 2 Q\left(\frac{a}{\sigma} \right ) } \quad\quad (12)\]
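As a worked example of the one-sided and two-sided results, the sketch below compares \(Q(a/\sigma)\) and \(2Q(a/\sigma)\) with relative frequencies from simulated Gaussian samples. The values \(\mu=1\), \(\sigma=2\), \(a=3\) and the helper name `qfunc` are arbitrary choices for illustration.

```python
import math
import numpy as np

def qfunc(z):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma, a = 1.0, 2.0, 3.0                  # example values (assumed)
rng = np.random.default_rng(6)
x = mu + sigma * rng.standard_normal(2_000_000)

one_sided_sim = float(np.mean(x > mu + a))            # Pr(X > mu + a)
two_sided_sim = float(np.mean(np.abs(x - mu) > a))    # Pr(|X - mu| > a)

print("one-sided: simulated %.5f, Q(a/sigma)  = %.5f" % (one_sided_sim, qfunc(a / sigma)))
print("two-sided: simulated %.5f, 2Q(a/sigma) = %.5f" % (two_sided_sim, 2 * qfunc(a / sigma)))
```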

Application of Q function in computing the Bit Error Rate (BER) or probability of bit error will be the focus of our next article.


The Q-function and the error function (erf) are important mathematical functions that arise in many fields, including probability theory, statistics, signal processing, and communications engineering. Here are some reasons why these functions are important:

  1. Probability calculations: The Q-function and erf function are used in probability calculations involving Gaussian distributions. The Q-function gives the probability that a random variable from a normal distribution will exceed a certain threshold value. The erf function gives the probability that a normally distributed variable will fall within a certain range.
  2. Signal processing: In signal processing, the Q-function is used to calculate the probability of bit error in digital communication systems. This is important for designing communication systems that can reliably transmit data over noisy channels.
  3. Statistical analysis: The Q-function and erf function are used in statistical analysis to model data and estimate parameters. For example, in hypothesis testing, the Q-function can be used to calculate p-values.
  4. Mathematical modeling: The Q-function and erf function arise naturally in mathematical models for various phenomena. For example, the heat equation in physics and the Black-Scholes equation in finance both involve the erf function.
  5. Computational efficiency: In some cases, the Q-function and erf function provide a more efficient and accurate way of calculating certain probabilities and integrals than other methods.


Simulation of Rayleigh Fading ( Clarke’s Model – sum of sinusoids method)


A multipath fading channel can be modeled as an FIR (Finite Impulse Response) filter with the following impulse response.

$$ h( \tau ; t ) = h_{0}(t) \delta ( \tau - \tau_{0}(t)) + h_{1}(t) \delta ( \tau - \tau_{1}(t)) + \cdots + h_{L-1}(t) \delta ( \tau - \tau_{L-1}(t)) $$

where \(h(\tau;t)\) is the time varying impulse response of the multipath fading channel having \(L\) multipaths, and \(h_i(t)\) and \(\tau_i(t)\) denote the time varying complex gain and excess delay of the i-th path. The above mentioned impulse response can be implemented as an FIR filter as shown below:

Multipath Fading phenomena – modelled as a Time Varying FIR Filter

The channel under consideration can be modeled as a multipath fading channel in which the impulse response may follow distributions like the Rayleigh distribution (in which there is no Line of Sight (LOS) ray between transmitter and receiver), the Rician distribution (a dominant LOS path exists between transmitter and receiver), the Nakagami distribution, the Weibull distribution etc.

Different techniques have been proposed to simulate/model multipath channels. Some of the models include Clarke’s reference model, Jakes’ model, Young’s model, the filtered Gaussian noise model etc.

A Rayleigh fading channel (flat fading channel) is considered in this text. For simplicity, we fix the excess delays \(\tau_i(t)\) in the above equation and generate \(h_i(t)\) that follows the Rayleigh distribution. In this simulation, Clarke’s Rayleigh fading model is used. This model is also called the mathematical reference model and is commonly considered computationally inefficient compared to Jakes’ Rayleigh fading simulator.

Theory of Rayleigh Fading:

Let us denote the complex impulse response \(h(t)\) of the flat fading channel as follows:

$$ h(t) = h_{I}(t) + jh_{Q}(t) $$

where \(h_I(t)\) and \(h_Q(t)\) are zero mean Gaussian distributed. Therefore the fading envelope is Rayleigh distributed and is given by

$$ \left |h(t) \right | = \sqrt{\left |h_{I}(t) \right |^2 + \left |h_{Q}(t) \right |^2} $$

The probability density function (Rayleigh distribution) of the above mentioned amplitude response is given by

$$ f(z)=\frac{2z}{\sigma ^{2}}e^{-\frac{z^{2}}{\sigma ^{2}}}, \quad \text{where } \sigma ^{2} = E\left ( \left | h(t) \right |^{2} \right ) $$
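The link between the two Gaussian components and the Rayleigh envelope can be illustrated with a short sketch. It draws independent zero-mean Gaussian samples with variance 0.5 each (so that \(\sigma^2 = E(|h|^2) = 1\)), and compares the envelope histogram with the PDF above; the helper name `rayleigh_pdf` and the bin settings are assumptions for illustration.

```python
import math
import numpy as np

def rayleigh_pdf(z, sigma2):
    """Rayleigh PDF in the form used above: (2z/sigma^2) * exp(-z^2/sigma^2)."""
    return 2.0 * z / sigma2 * np.exp(-z**2 / sigma2)

rng = np.random.default_rng(7)
n = 1_000_000
# Independent zero-mean Gaussian I and Q parts, each with variance 0.5
h = math.sqrt(0.5) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
env = np.abs(h)                          # fading envelope |h|
sigma2 = float(np.mean(env**2))          # estimate of E(|h|^2)

counts, edges = np.histogram(env, bins=50, range=(0.0, 4.0))
pdf_est = counts / (n * (edges[1] - edges[0]))
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = float(np.max(np.abs(pdf_est - rayleigh_pdf(centers, 1.0))))

print("E|h|^2 = %.4f, mean envelope = %.4f (theory %.4f), max PDF error = %.4f"
      % (sigma2, env.mean(), math.sqrt(math.pi) / 2, max_err))
```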

We will use the Clarke’s Rayleigh Fading model (given below) and check the statistical properties of the random process generated by the model against the statistical properties of Rayleigh distribution (given above).

Clarke’s Rayleigh Fading model:

The random process of flat Rayleigh fading with M multipaths can be simulated with the sum-of-sinusoid method described as
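A commonly cited form of Clarke's reference model is shown here as an assumption, since the exact expression used by the Matlab function is not reproduced in this text:

\[ h(t) = \frac{1}{\sqrt{M}} \sum_{n=1}^{M} e^{\,j\left( 2\pi f_d t \cos\alpha_n + \phi_n \right)}, \quad \alpha_n = \frac{2\pi n + \theta_n}{M} \]

where \(\theta_n\) and \(\phi_n\) are independent and uniformly distributed over \([-\pi,\pi)\). A minimal Python sketch of this form follows. The function name `rayleigh_fading_sketch` and the seed are hypothetical, and the code is not the author's Matlab implementation.

```python
import numpy as np

def rayleigh_fading_sketch(M, N, fd, Ts, seed=None):
    """Sum-of-sinusoids fading samples (assumed form of Clarke's reference model).

    M: number of multipaths, N: number of samples,
    fd: maximum Doppler frequency in Hz, Ts: sampling period in seconds.
    """
    rng = np.random.default_rng(seed)
    n = np.arange(1, M + 1)
    theta = rng.uniform(-np.pi, np.pi, size=M)   # random offsets of the arrival angles
    phi = rng.uniform(-np.pi, np.pi, size=M)     # random initial phases
    alpha = (2 * np.pi * n + theta) / M          # angles of arrival
    t = np.arange(N) * Ts                        # sampling instants
    # Phase of each path at each instant, shape (N, M)
    phase = 2 * np.pi * fd * np.outer(t, np.cos(alpha)) + phi
    return np.exp(1j * phase).sum(axis=1) / np.sqrt(M)

h = rayleigh_fading_sketch(M=15, N=10**5, fd=100, Ts=1e-4, seed=10)
print("mean(real) = %.4f, mean(imag) = %.4f" % (h.real.mean(), h.imag.mean()))
print("var(real)  = %.4f, var(imag)  = %.4f" % (h.real.var(), h.imag.var()))
print("E|h|^2     = %.4f" % np.mean(np.abs(h)**2))
```

With a finite number of paths and a single realization, the time-averaged statistics only approximate the ideal values (zero mean, variance 0.5 per component), so some deviation from run to run is expected.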


1) The Rayleigh fading model is implemented as a function in Matlab with the following parameters:
M = number of multipaths in the fading channel, N = number of samples to generate, fd = maximum Doppler spread in Hz, Ts = sampling period.

function [h]=rayleighFading(M,N,fd,Ts)

% function to generate rayleigh Fading samples based on Clarke's model
% M = number of multipaths in the channel
% N = number of samples to generate
% fd = maximum Doppler frequency
% Ts = sampling period
% Author : Mathuranathan for
%Code available in the ebook - Simulation of Digital Communication Systems using Matlab

Check this book for full Matlab code.
Simulation of Digital Communication Systems Using Matlab – by Mathuranathan Viswanathan

2) The above mentioned function is used to generate Rayleigh fading samples with the following values for the function arguments: M=15; N=10^5; fd=100 Hz; Ts=0.0001 second.

Investigation of Statistical Properties of samples generated using Clarke’s model:

3) Mean and Variance of the real and imaginary parts of generated samples are
Mean of real part ~=0
Mean of imag part ~=0
Variance of real part = 0.4989 ~=0.5
Variance of imag part = 0.4989 ~=0.5

The results imply that the means of the real and imaginary parts are the same and are equal to zero. The variances of the real and imaginary parts are approximately equal to 0.5.

4) Next, the PDF of the real part of the simulated samples is plotted and compared against the PDF of the Gaussian distribution (with mean=0 and variance=0.5).

Real Part of simulated samples exhibiting Gaussian Distribution characteristics

5) The PDF of the generated Rayleigh fading samples is plotted and compared against the PDF of the Rayleigh distribution (with variance=1).

PDF of simulated Rayleigh Fading Samples

6) From 4) and 5) we confirm that the samples generated by Clarke’s model follow the Rayleigh distribution (with variance=1) and that the real and imaginary parts of the samples follow the Gaussian distribution (with mean=0 and variance=0.5).

7) The magnitude and phase response of the generated Rayleigh fading samples are plotted here.

The Magnitude and Phase response of the generated Rayleigh Fading samples

See also

[1]Eb/N0 Vs BER for BPSK over Rayleigh Channel and AWGN Channel
[2]Eb/N0 Vs BER for BPSK over Rician Fading Channel
[3]Performance comparison of Digital Modulation techniques
[4]BER Vs Eb/N0 for BPSK modulation over AWGN
[5]Rayleigh Fading Simulation – Young’s model
[6]Introduction to Fading Channels
[7] Chi-Squared distribution

External Resources

[1]Theoretical expressions for BER under various conditions

Fading channel – complex baseband equivalent models

Key focus: Fading channel models for simulation. Learn how fading channels can be modeled as FIR filters for simplified modulation & detection. Rayleigh/Rician fading.


A fading channel is a wireless communication channel in which the quality of the signal fluctuates over time due to changes in the transmission environment. These changes can be caused by different factors such as distance, obstacles, and interference, resulting in attenuation and phase shifting. The signal fluctuations can cause errors or loss of information during transmission.

Fading channels are categorized into slow fading and fast fading depending on the rate of channel variation. Slow fading occurs over long periods, while fast fading happens rapidly over short periods, typically due to multipath interference.

To overcome the negative effects of fading, various techniques are used, including diversity techniques, equalization, and channel coding.

Fading channel in frequency domain

With respect to the frequency domain characteristics, the fading channels can be classified into frequency selective and frequency-flat fading.

A frequency flat fading channel is a wireless communication channel where the attenuation and phase shift of the signal are constant across the entire frequency band. This means that the signal experiences the same amount of fading at all frequencies, and there is no frequency-dependent distortion of the signal.

In contrast, a frequency selective fading channel is a wireless communication channel where the attenuation and phase shift of the signal vary with frequency. This means that the signal experiences different levels of fading at different frequencies, resulting in a frequency-dependent distortion of the signal.

Frequency selective fading can occur due to various factors such as multipath interference and the presence of objects that scatter or absorb certain frequencies more than others. To mitigate the effects of frequency selective fading, various techniques can be used, such as equalization and frequency hopping.

The channel fading can be modeled with different statistics like Rayleigh, Rician, Nakagami fading. The fading channel models, in this section, are utilized for obtaining the simulated performance of various modulations over Rayleigh flat fading and Rician flat fading channels. Modeling of frequency selective fading channel is discussed in this article.

Linear time invariant channel model and FIR filters

The most significant feature of a real world channel is that the channel does not immediately respond to the input. Physically, this indicates some sort of inertia built into the channel/medium, that takes some time to respond. As a consequence, it may introduce distortion effects like inter-symbol interference (ISI) at the channel output. Such effects are best studied with the linear time invariant (LTI) channel model, given in Figure 1.

Figure 1: Complex baseband equivalent LTI channel model

In this model, the channel response to any input depends only on the channel impulse response (CIR) function of the channel. The CIR is usually defined for finite length \(L\) as \(\mathbf{h}=[h_0,h_1,h_2, \cdots,h_{L-1}]\) where \(h_0\) is the CIR at symbol sampling instant \(0T_{sym}\) and \(h_{L-1}\) is the CIR at symbol sampling instant \((L-1)T_{sym}\). Such a channel can be modeled as a tapped delay line (TDL) filter, otherwise called a finite impulse response (FIR) filter. Here, we only consider the CIR at symbol sampling instants. It is well known that the output of such a channel (\(\mathbf{r}\)) is given as the linear convolution of the input symbols (\(\mathbf{s}\)) and the CIR (\(\mathbf{h}\)) at symbol sampling instants. In addition, channel noise in the form of AWGN can also be included in the model. Therefore, the resulting output vector of the entire channel model is given as

\[\mathbf{r} = \mathbf{h} \ast \mathbf{s} +\mathbf{n} \quad\quad (1) \]
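Equation (1) can be sketched in a few lines of Python. The tap values, the BPSK input and the noise variance `sigma2` below are assumptions made for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Hypothetical 3-tap CIR sampled at symbol instants (L = 3)
h = np.array([0.9, 0.4 + 0.2j, 0.1j])

# BPSK symbols (+1/-1) as an example input sequence
s = 2.0 * rng.integers(0, 2, size=10) - 1.0

# Complex AWGN with an assumed total variance sigma2 (split equally over I and Q)
sigma2 = 0.01
n_len = len(s) + len(h) - 1  # length of the full linear convolution
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_len)
                           + 1j * rng.standard_normal(n_len))

# Equation (1): linear convolution of CIR and symbols, plus noise
r = np.convolve(h, s) + n

print('Number of input symbols :', len(s))
print('Number of output samples:', len(r))
```

Because the CIR has more than one tap, each output sample mixes the current symbol with the previous ones, which is the ISI mentioned above.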

This article is part of the following books
Digital Modulations using Matlab : Build Simulation Models from Scratch, ISBN: 978-1521493885
Digital Modulations using Python ISBN: 978-1712321638
Wireless communication systems in Matlab ISBN: 979-8648350779
All books available in ebook (PDF) and Paperback formats

Simulation model for detection in flat fading channel

A flat-fading (also called frequency-non-selective) channel is modeled with a single tap (\(L=1\)) FIR filter with the tap weight drawn from distributions like Rayleigh, Rician or Nakagami distributions. We will assume block fading, which implies that the fading process is approximately constant for a given transmission interval. For block fading, the random tap coefficient \(h=h[0]\) is a complex random variable (not a random process), and for each channel realization a new complex random value is drawn from a Rayleigh, Rician or Nakagami distribution according to the type of fading desired.

Figure 2: LTI channel viewed as tapped delay line filter

The simulation model for modulation and detection over a fading channel is shown in Figure 2. For a flat fading channel, the output of the channel can be expressed simply as the product of the time varying channel response and the input signal. Thus, equation (1) can be simplified (refer this article for derivation) as follows for the flat fading channel.

\[\mathbf{r} = h\mathbf{s} + \mathbf{n} \quad\quad (2) \]

Since the channel and noise are modeled as complex vectors, the detection of \(\mathbf{s}\) from the received signal is an estimation problem in the complex vector space.

Assuming perfect channel knowledge at the receiver and coherent detection, the receiver shown in Figure 3(a) performs matched filtering. The impulse response of the matched filter is matched to the impulse response of the flat-fading channel as \( h^{\ast}\). The output of the matched filter is scaled down by a factor of \(||h||^2\), which is the total energy contained in the impulse response of the flat-fading channel. The resulting decision vector \(\tilde{\mathbf{y}}\) serves as the sufficient statistic for the estimation of \(\mathbf{s}\) from the received signal \(\mathbf{r}\) (refer equation A.77 in reference [1])

\[\tilde{\mathbf{y}} = \frac{h^{\ast}}{||h||^2} \mathbf{r} = \frac{h^{\ast}}{||h||^2} h\mathbf{s} + \frac{h^{\ast}}{||h||^2} \mathbf{n} = \mathbf{s} + \tilde{\mathbf{w}} \quad\quad (3) \]

Since the absolute value \(|h|\) and the Euclidean norm \(||h||\) are related as \(|h|^2= \left\lVert h\right\rVert^2 = hh^{\ast}\), the model can be simplified further as given in Figure 3(b).
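Equations (2) and (3) can be sketched as follows for a single block-fading realization. BPSK signaling, the noise variance `sigma2` and the variable names are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only to make the sketch repeatable

# One block-fading realization of the complex tap h (held constant for the block)
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)

# BPSK symbols as an assumed modulation
bits = rng.integers(0, 2, size=1000)
s = 2.0 * bits - 1.0

# Complex AWGN with assumed variance sigma2
sigma2 = 0.05
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(s.size)
                           + 1j * rng.standard_normal(s.size))

r = h * s + n                          # equation (2): flat fading channel
y = np.conj(h) / np.abs(h) ** 2 * r    # equation (3): matched filter and scaling

# Coherent BPSK decision on the real part of the sufficient statistic
bits_hat = (y.real > 0).astype(int)
ber = np.mean(bits_hat != bits)

print('|h| for this realization:', np.abs(h))
print('Bit error rate          :', ber)
```

The scaling removes the channel gain and phase from the signal term, so the decision statistic is the transmitted symbol plus a scaled noise term. A small \(|h|\) (deep fade) amplifies that noise term, which is why the error rate depends strongly on the realization.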

To simulate flat fading, the values for the fading variable \(h\) are drawn from complex normal distribution

\[h= X + jY \quad\quad (4) \]

where, \(X,Y\) are statistically independent real valued normal random variables.

● If \(E[h]=0\), then \(|h|\) is Rayleigh distributed, resulting in a Rayleigh flat fading channel
● If \(E[h] \neq 0\), then \(|h|\) is Rician distributed, resulting in a Rician flat fading channel with the factor \(K=|E[h]|^2/\sigma^2_h\)
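The two cases can be sketched by drawing many realizations of \(h\). The sketch assumes that \(\sigma^2_h\) is the total variance of the scattered (zero-mean) part of \(h\), that the mean of \(h\) is real valued, and that the \(K\) factor is taken as \(|E[h]|^2/\sigma^2_h\). The names `sigma2_h`, `K` and `n_real` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # seeded only to make the sketch repeatable

n_real = 200_000   # number of channel realizations
sigma2_h = 1.0     # assumed variance of the scattered component of h

def scattered(size):
    # Zero-mean complex normal samples with total variance sigma2_h
    return np.sqrt(sigma2_h / 2) * (rng.standard_normal(size)
                                    + 1j * rng.standard_normal(size))

# Rayleigh flat fading: E[h] = 0
h_ray = scattered(n_real)

# Rician flat fading: E[h] != 0, with the mean set from an assumed K factor
K = 4.0
mu = np.sqrt(K * sigma2_h)       # real-valued mean (zero-phase assumption)
h_ric = mu + scattered(n_real)

# Estimate K back from the samples as |E[h]|^2 / var(h)
K_est = np.abs(h_ric.mean()) ** 2 / h_ric.var()

print('Mean of |h|, Rayleigh case:', np.abs(h_ray).mean())
print('Estimated K, Rician case  :', K_est)
```

For the Rayleigh case the sample mean of \(|h|\) should be close to \(\sqrt{\pi\sigma^2_h}/2\), the mean of a Rayleigh variable with per-dimension variance \(\sigma^2_h/2\).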


[1] D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005.

Central Limit Theorem – a demonstration

Central Limit Theorem – What is it ?

The central limit theorem (CLT) is a fundamental concept in statistics and probability theory that explains how the sum of independent and identically distributed random variables behaves. The theorem states that as the number of these variables increases, the distribution of their sum tends to become more like a normal distribution, even if the variables themselves are not normally distributed.

The CLT states that the sum of independent and identically distributed (i.i.d) random variables (with finite mean and variance), once suitably centered and scaled, approaches a normal distribution as the sample size \(N \rightarrow \infty\). In simpler terms, under certain general conditions, the sum of independent observations that follow the same underlying distribution is approximately normally distributed. The approximation steadily improves as the number of observations increases. The underlying distribution of the independent observations can be anything – binomial, Poisson, exponential, Chi-squared, etc.

Why CLT ?

CLT is an important concept in statistics because it allows us to make inferences about a population based on a sample, even if we do not know the distribution of the population. It is used in many statistical techniques, such as hypothesis testing and confidence intervals.

Applications of CLT

The central limit theorem (CLT) is applied in a vast range of applications including (but not limited to) signal processing, channel modeling, random processes, population statistics, engineering research, predicting confidence intervals, hypothesis testing, etc. One such application in signal processing is deriving the response of a cascaded series of low pass filters by applying the CLT. In the article titled 'The central limit theorem and low-pass filters', the author illustrates how the response of a cascaded series of low pass filters approaches a Gaussian shape as the number of filters in the series increases [1].

In digital communication, the effect of noise on a communication channel is modeled as additive Gaussian white noise. This follows from the fact that the noise from many physical channels can be considered approximately Gaussian. For example, the random movement of electrons in the semiconductor devices gives rise to shot noise whose effect can be approximated to Gaussian distribution by applying central limit theorem.

Law of large numbers and CLT

There is a connection between the central limit theorem and the law of large numbers.

The law of large numbers is another important theorem in probability theory, which states that as the number of independent and identically distributed (iid) random variables increases, the average of those variables converges to the expected value of the distribution. In other words, as the sample size increases, the sample mean becomes more and more representative of the true population mean.

The central limit theorem, on the other hand, describes the distribution of the sum of iid random variables, and shows that as the sample size increases, the distribution of the sum approaches a normal distribution.

Both the law of large numbers and the CLT deal with the behavior of the sum or average of iid random variables as the sample size gets larger. The law of large numbers describes the behavior of the sample mean, while the CLT describes the behavior of the sum of the variables.

In essence, the law of large numbers is a precursor to the central limit theorem, as it establishes the fact that the sample mean becomes more and more representative of the true population mean as the sample size increases, and the central limit theorem shows that the distribution of the sum of iid random variables approaches a normal distribution as the sample size gets larger.
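The two statements can be contrasted in a small numerical sketch using dice rolls. The trial count `n_trials` and the values of \(N\) are arbitrary choices for illustration. The spread of the sample mean illustrates the law of large numbers, while the fraction of standardized sums falling within \(\pm 1.96\) (about 0.95 for a standard normal variable) illustrates the CLT.

```python
import numpy as np

rng = np.random.default_rng(3)  # seeded only to make the sketch repeatable

k = 6                      # faces of a fair die
mu = (k + 1) / 2           # mean of one roll: 3.5
var = (k ** 2 - 1) / 12    # variance of one roll: 35/12
n_trials = 5000            # repetitions of each experiment (arbitrary choice)

results = {}
for N in (10, 100, 1000):
    rolls = rng.integers(1, k + 1, size=(n_trials, N))

    # Law of large numbers: the sample mean concentrates around mu as N grows
    sample_mean = rolls.mean(axis=1)

    # CLT: the centered and scaled sum behaves like a standard normal variable
    z = (rolls.sum(axis=1) - N * mu) / np.sqrt(N * var)

    results[N] = (sample_mean.std(), np.mean(np.abs(z) < 1.96))
    print('N={:5d}  std of sample mean={:.4f}  P(|z|<1.96)={:.3f}'.format(
        N, results[N][0], results[N][1]))
```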

Demonstration using Python

For Matlab code, please refer to the following book – Wireless communication systems in Matlab – by Mathuranathan Viswanathan

The following Python code illustrates how the theorem comes into play as the number of observations is increased, for two separate experiments: rolling \(N\) unbiased dice and tossing \(N\) unbiased coins. The code generates \(N\) i.i.d discrete uniform random variables, each drawn from the set \(\left\{1,2,\cdots,k\right\}\). In the case of the dice rolling experiment, \(k\) is set to \(6\), thus simulating a random pick from the sample space \(S=\left\{1,2,3,4,5,6\right\}\) with equal probability. For the coin tossing experiment, \(k\) is set to \(2\), thus simulating the sample space \(S=\left\{1,2\right\}\) representing head or tail events with equal probability. The rest of the code is self-explanatory.

Python code

#---------Central limit theorem - Author: Mathuranathan -----------------------
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline

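numIterations = np.asarray([1,2,5,10,50,100]) #number of i.i.d RVs
experiment = 'dice' #valid values: 'dice', 'coins'
maxNumForExperiment = {'dice':6,'coins':2} #max numbers represented on dice or coins
nSamp = 100000 #samples per subplot (assumed value; the definition is missing from the listing)

k = maxNumForExperiment[experiment]

fig, fig_axes = plt.subplots(ncols=3, nrows=2, constrained_layout=True)

for i,N in enumerate(numIterations):
    #sum of N i.i.d discrete uniform RVs on {1,...,k}, repeated nSamp times
    y = np.random.randint(low=1,high=k+1,size=(N,nSamp)).sum(axis=0)
    row = i//3; col = i%3
    #the listing is cut off here; the histogram call below is an assumed completion
    bins = np.arange(N, N*k+2) - 0.5 #one bin per attainable value of the sum
    fig_axes[row,col].hist(y, bins=bins, density=True)
    fig_axes[row,col].set_title('N={} {}'.format(N,experiment))
plt.show()

Figure 1: Demonstrating central limit theorem using N numbers of dice
Figure 2: Demonstrating central limit theorem using N numbers of coins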


[1] Engelberg, “The central limit theorem and low-pass filters”, Proceedings of the 2004 11th IEEE International Conference on Electronics, Circuits and Systems, 13-15 Dec. 2004, pp. 65-68.

Maximum Likelihood estimation

Keywords: maximum likelihood estimation, statistical method, probability distribution, MLE, models, practical applications, finance, economics, natural sciences.


Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution by finding the set of values that maximize the likelihood function of the observed data. In other words, MLE is a method of finding the most likely values of the unknown parameters that would have generated the observed data.

The likelihood function is a function that describes the probability of observing the data given the parameters of the probability distribution. The MLE method seeks to find the set of parameter values that maximizes this likelihood function.

For example, suppose we have a set of data that we believe to be normally distributed, but we do not know the mean or variance of the distribution. We can use MLE to estimate these parameters by finding the mean and variance that maximize the likelihood function of the observed data.
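For the normal distribution the maximization has a well-known closed-form solution: the MLE of the mean is the sample mean, and the MLE of the variance is the average squared deviation from that mean (dividing by \(N\), not \(N-1\)). The sketch below checks this on synthetic data. The true parameter values and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)  # seeded only to make the sketch repeatable

# Synthetic data from a normal distribution with assumed true parameters
true_mu, true_sigma = 2.0, 1.5
x = rng.normal(true_mu, true_sigma, size=5000)

# Closed-form maximum likelihood estimates for the normal model
mu_hat = x.mean()
var_hat = np.mean((x - mu_hat) ** 2)   # divides by N, not N-1

def log_likelihood(data, mu, var):
    # Log-likelihood of i.i.d. normal samples with mean mu and variance var
    return (-0.5 * len(data) * np.log(2 * np.pi * var)
            - np.sum((data - mu) ** 2) / (2 * var))

# The log-likelihood at the MLE should exceed its value at a nearby point
ll_at_mle = log_likelihood(x, mu_hat, var_hat)
ll_nearby = log_likelihood(x, mu_hat + 0.1, var_hat * 1.1)

print('Estimated mean    :', mu_hat)
print('Estimated variance:', var_hat)
print('Log-likelihood at MLE vs nearby point:', ll_at_mle, ll_nearby)
```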

The MLE method is widely used in statistical inference, hypothesis testing, and model fitting in many areas, including economics, finance, engineering, and the natural sciences. MLE is a powerful and flexible method that can be applied to a wide range of statistical models, making it a valuable tool in data analysis and modeling.

Difference between MLE and MLD

Maximum likelihood estimation (MLE) and maximum likelihood decoding (MLD) are two different concepts used in different contexts.

Maximum likelihood estimation is a statistical method used to estimate the parameters of a probability distribution based on a set of observed data. The goal is to find the set of parameter values that maximize the likelihood function of the observed data. MLE is commonly used in statistical inference, hypothesis testing, and model fitting.

On the other hand, maximum likelihood decoding (MLD) is a method used in digital communications and signal processing to decode a received signal that has been transmitted through a noisy channel. The goal is to find the transmitted message that is most likely to have produced the received signal, based on a given probabilistic model of the channel.

In maximum likelihood decoding, the receiver calculates the likelihood of each possible transmitted message, given the received signal and the channel model. The maximum likelihood decoder then selects the transmitted message that has the highest likelihood as the decoded message.

While both MLE and MLD involve the concept of maximum likelihood, they are used in different contexts. MLE is used in statistical estimation, while MLD is used in digital communications and signal processing for decoding.

MLE applied to communication systems

Maximum Likelihood estimation (MLE) is an important tool in determining the actual probabilities of the assumed model of communication.

In reality, a communication channel can be quite complex, and a model becomes necessary to simplify calculations at the decoder side. The model should closely approximate the complex communication channel. There exist a myriad of standard statistical models that can be employed for this task: Gaussian, Binomial, Exponential, Geometric, Poisson, etc. A standard communication model is chosen based on empirical data.

Each model mentioned above has unique parameters that characterize it. Determining these parameters for the chosen model is necessary to make it closely model the communication channel at hand.

Suppose a binomial model is chosen (based on observation of data) for the error events over a particular channel. It is then essential to determine the probability of success (\(p\)) of the binomial model.

If a Gaussian model (normal distribution) is chosen for a particular channel, then estimating the mean (\(\mu\)) and variance (\(\sigma^{2}\)) is necessary so that they can be applied while computing the conditional probability \(p(y \; received \mid x \; sent)\).

Similarly estimating the mean number of events within a given interval of time or space (\(\lambda\)) is a necessity for a Poisson distribution model.
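For the Poisson model, the maximum likelihood estimate of \(\lambda\) is simply the sample mean of the observed counts. The sketch below compares that closed-form value against a brute-force search over a grid of candidate values. The true rate, the sample size and the grid are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)  # seeded only to make the sketch repeatable

# Synthetic event counts per interval, drawn with an assumed true rate
true_lam = 3.2
counts = rng.poisson(true_lam, size=2000)

# Closed-form MLE of the Poisson rate: the sample mean
lam_hat = counts.mean()

# Brute-force check on a grid with step 0.01.
# Log-likelihood up to a constant that does not depend on lambda:
#   sum(k_i) * log(lambda) - n * lambda
lam_grid = np.linspace(0.5, 8.0, 751)
loglik = counts.sum() * np.log(lam_grid) - counts.size * lam_grid
lam_grid_best = lam_grid[np.argmax(loglik)]

print('Closed-form estimate:', lam_hat)
print('Grid-search estimate:', lam_grid_best)
```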

Maximum likelihood estimation is a method to determine these unknown parameters associated with the corresponding chosen models of the communication channel.

Python code example for MLE

The following program is an implementation of maximum likelihood estimation (MLE) for the binary symmetric channel (BSC) using the binomial probability mass function (PMF).

The goal of MLE is to estimate the value of an unknown parameter (in this case, the error probability \(p\)) based on observed data. The BSC is a simple channel model where each transmitted bit is flipped (with probability \(p\)) independently of other bits during transmission. The goal of the following program is to estimate the error probability \(p\) of the BSC based on a given binary data sequence.

import numpy as np
from scipy.optimize import minimize
from scipy.special import binom
import matplotlib.pyplot as plt

def BSC_MLE(data):
    """
    Maximum likelihood estimation (MLE) for the Binary Symmetric Channel (BSC).
    This function estimates the error probability p of the BSC based on the observed data.
    """
    # Define the binomial probability mass function
    def binom_PMF(p):
        n = len(data)
        k = np.sum(data)
        p = np.clip(p, 1e-10, 1 - 1e-10)  # Regularization to avoid problems due to small estimation errors
        logprob = np.log(binom(n, k)) + k*np.log(p) + (n-k)*np.log(1-p)
        return -logprob
    # Use the minimize function from scipy.optimize to find the value of p that maximizes the binomial PMF
    #x0 argument specifies the initial guess for the value of p that maximizes the binomial PMF. For BSC x0=0.5
    #BFGS is Broyden-Fletcher-Goldfarb-Shanno optimization algorithm used for unconstrained nonlinear optimization
    res = minimize(lambda p: binom_PMF(p), x0=0.5, method='BFGS')
    p_est = res.x[0]

    # Plot the observed data as a histogram
    plt.hist(data, bins=2, density=True, alpha=0.5)
    plt.axvline(p_est, color='r', linestyle='--')
    plt.xlabel('Bit value')
    plt.title('Observed data')
    return p_est

data = np.random.randint(2, size=1000)
p_est = BSC_MLE(data)
print('Estimated error probability: {:.4f}'.format(p_est))

The program first defines a function called BSC_MLE that takes a binary data sequence as input and returns the estimated error probability p_est. The BSC_MLE function defines the binomial PMF, which represents the probability of observing a certain number of errors (i.e., bit flips) in the data sequence given a specific error probability p. The inner function binom_PMF returns the negative logarithm of this PMF, so minimizing it with the minimize function from the scipy.optimize module yields the value of p that maximizes the likelihood of observing the data.

The program then generates a random binary data sequence of length 1000 using the np.random.randint() function and calls the BSC_MLE function to estimate the error probability based on the observed data. Finally, the program prints the estimated error probability. Try changing the sequence length and observe how the estimated error probability varies.
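For the binomial model the maximizer is also available in closed form: setting the derivative of \(k\log p + (n-k)\log(1-p)\) to zero gives \(\hat{p}=k/n\), the observed fraction of errors. The sketch below uses that closed form on a simulated BSC to show how the estimate tightens as the sequence grows. The crossover probability `p_true` and the helper `bsc` are assumptions made for this illustration and are not part of the program above.

```python
import numpy as np

rng = np.random.default_rng(6)  # seeded only to make the sketch repeatable

p_true = 0.1  # assumed crossover probability of the simulated BSC

def bsc(bits, p, rng):
    # Hypothetical helper: flip each bit independently with probability p
    flips = rng.random(bits.size) < p
    return bits ^ flips

estimates = {}
for n in (100, 1000, 100000):
    tx = rng.integers(0, 2, size=n)   # transmitted bits
    rx = bsc(tx, p_true, rng)         # received bits after the BSC
    errors = tx != rx                 # error indicator sequence
    estimates[n] = errors.mean()      # closed-form MLE: k/n
    print('n={:6d}  estimated p={:.4f}'.format(n, estimates[n]))
```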

Figure 1: Maximum Likelihood Estimation (MLE) : Plotting the observed data as a histogram

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell

Related Topics:

[1]An Introduction to Estimation Theory
[2]Bias of an Estimator
[3]Minimum Variance Unbiased Estimators (MVUE)
[4]Maximum Likelihood Estimation
[5]Maximum Likelihood Decoding
[6]Probability and Random Process
[7]Likelihood Function and Maximum Likelihood Estimation (MLE)
[8]Score, Fisher Information and Estimator Sensitivity
[9]Introduction to Cramer Rao Lower Bound (CRLB)
[10]Cramer Rao Lower Bound for Scalar Parameter Estimation
[11]Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12]Efficient Estimators and CRLB
[13]Cramer Rao Lower Bound for Phase Estimation
[14]Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15]Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16]The Mean Square Error – Why do we use it for estimation problems
[17]How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18]Essential Preliminary Matrix Algebra for Signal Processing
[19]Why Cholesky Decomposition ? A sample case:
[20]Tests for Positive Definiteness of a Matrix
[21]Solving a Triangular Matrix using Forward & Backward Substitution
[22]Cholesky Factorization - Matlab and Python
[23]LTI system models for random signals – AR, MA and ARMA models
[24]Comparing AR and ARMA model - minimization of squared error
[25]Yule Walker Estimation
[26]AutoCorrelation (Correlogram) and persistence – Time series analysis
[27]Linear Models - Least Squares Estimator (LSE)
[28]Best Linear Unbiased Estimator (BLUE)

Maximum Likelihood Decoding

Keywords: maximum likelihood decoding, digital communication, data storage, noise, interference, wireless communication systems, optical communication systems, digital storage systems, probability, likelihood estimation, python


Maximum likelihood decoding is a technique used to determine the most likely transmitted message in a digital communication system, based on the received signal and statistical models of noise and interference. The method uses maximum likelihood estimation to calculate the probability of each possible transmitted message and then selects the one with the highest probability.

To perform maximum likelihood decoding, the receiver uses a set of pre-defined models to estimate the likelihood of each possible transmitted message based on the received signal. The method is commonly used in various digital communication and data storage systems, such as wireless communication and digital storage. However, it can be complex and time-consuming, particularly in systems with large message spaces or complex noise and interference models.

Maximum Likelihood Decoding:

Consider a set of possible codewords (valid codewords – set \(Y\)) generated by an encoder on the transmitter side. We pick one codeword out of this set (call it \(y\)) and transmit it via a Binary Symmetric Channel (BSC) with probability of error \(p\). At the receiver side we receive a distorted version of \(y\) (call this erroneous codeword \(x\)).

Maximum Likelihood Decoding chooses one codeword from \(Y\) (the list of all possible codewords) which maximizes the following probability.

\[\mathbb{P}(y\;sent\mid x\;received )\]

This means that the receiver computes \(P(y_1 \mid x), P(y_2 \mid x), P(y_3 \mid x),\cdots,P(y_n \mid x)\) and chooses the codeword \(y\) that gives the maximum probability. In practice we do not know \(y\) (at the receiver) but we know \(x\). So how do we compute the probability? Maximum Likelihood Estimation (MLE) comes to our rescue. For a detailed explanation of MLE, refer to [1].

The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely. Understanding the difference between prediction and estimation is important at this point. In estimation problems, the likelihood of the parameters is estimated based on a given data/observation vector. In prediction problems, probability is used as a measure to predict the outcome from known parameters of a model.

Examples for “Prediction” and “Estimation” :

1) Probability of getting a “Head” in a single toss of a fair coin is \(0.5\). The coin is tossed 100 times in a row. Prediction helps in predicting the outcome (head or tail) of the \(101^{th}\) toss based on the probability.

2) A coin is tossed 100 times and the data (head or tail information) is recorded. Assuming the event follows a binomial distribution model, estimation helps in determining the probability of the event. The actual probability may or may not be \(0.5\).

Maximum Likelihood Estimation estimates the conditional probability based on the observed data (received data – \(x\)) and an assumed model.

Example of Maximum Likelihood Decoding:

Let \(y=11001001\) and \(x=10011001\). Assume a binomial model for the error events with probability of error \(p=0.1\) (i.e. the reliability of the BSC is \(1-p = 0.9\)). The Hamming distance between the two codewords is \(2\). When all codewords are equally likely to be sent, maximizing \(P(y\;sent \mid x\;received)\) is equivalent to maximizing the likelihood \(P(x\;received \mid y\;sent)\). For the binomial model,

\[\mathbb{P}(x\;received\mid y\;sent ) = (1-p)^{n-d}\,p^{d}\]

where
\(d\) = the Hamming distance between the received and the sent codewords
\(n\) = number of bits sent
\(p\) = error probability of the BSC
\(1-p\) = reliability of the BSC

Substituting \(d=2, n=8\) and \(p=0.1\) gives \(P(x\;received \mid y\;sent) = (0.9)^{6}(0.1)^{2} = 0.005314\).

Note : Here, the Hamming distance is used to compute the probability. So the decoding can be called “minimum distance decoding” (which minimizes the Hamming distance) or “maximum likelihood decoding”. For \(p < 0.5\) the two coincide, since the probability above decreases as \(d\) increases. Euclidean distance may also be used to compute the conditional probability.

As mentioned earlier, in practice \(y\) is not known at the receiver. Let's see how to estimate \(P(x \;received \mid y\; sent)\) when \(y\) is unknown, based on the binomial model.

Since the receiver is unaware of the particular \(y\) corresponding to the \(x\) received, the receiver computes \(P(x\; received \mid y\; sent)\) for each codeword in \(Y\). The \(y\) which gives the maximum probability is concluded to be the codeword that was sent.
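The procedure can be sketched for the BSC as follows. The codebook below is hypothetical: only its first entry and the received word come from the example, and the other three codewords are made up for illustration. The function names are likewise illustrative.

```python
import numpy as np

def to_bits(text):
    # Convert a string such as '1010' into an array of 0/1 integers
    return np.array([int(ch) for ch in text])

def bsc_likelihood(received, codeword, p):
    # (1-p)^(n-d) * p^d, where d is the Hamming distance to the candidate codeword
    d = int(np.sum(received != codeword))
    n = len(received)
    return (1 - p) ** (n - d) * p ** d

def ml_decode_bsc(received, codebook, p):
    # Evaluate the likelihood of every candidate and keep the largest
    likelihoods = [bsc_likelihood(received, c, p) for c in codebook]
    return codebook[int(np.argmax(likelihoods))], likelihoods

# Hypothetical codebook; only the first codeword appears in the example
codebook = [to_bits(c) for c in
            ('11001001', '00110110', '11110000', '00001111')]
x = to_bits('10011001')   # received word from the example
p = 0.1                   # error probability of the BSC

decoded, likelihoods = ml_decode_bsc(x, codebook, p)
decoded_text = ''.join(str(b) for b in decoded)

print('Likelihoods:', likelihoods)
print('Decoded codeword:', decoded_text)
```

With this codebook the candidate at Hamming distance 2 has the largest likelihood, so it is selected, which matches minimum distance decoding.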

Python code implementing Maximum Likelihood Decoding:

The following program demonstrates maximum likelihood decoding: it generates a noisy signal from a transmitted message and then uses maximum likelihood decoding to estimate the transmitted message from the noisy signal.

  1. The maximum_likelihood_decoding function takes three arguments: received_signal, noise_variance, and message_space.
  2. The calculate_probabilities function is called to calculate the probability of each possible message given the received signal, using the known noise variance.
  3. The probabilities are normalized so that they sum to 1.
  4. The maximum_likelihood_decoding function finds the index of the most likely message (i.e., the message with the highest probability).
  5. The function returns the most likely message.
  6. An example usage is demonstrated where a binary message space is defined ([0, 1]), along with a noise variance and a transmitted message.
  7. The transmitted message is added to noise to generate a noisy received signal.
  8. The maximum_likelihood_decoding function is called to decode the noisy signal.
  9. The transmitted message, received signal, and decoded message are printed to the console for evaluation.

import numpy as np
import matplotlib.pyplot as plt

# Define a function to calculate the probability of each possible message given the received signal
def calculate_probabilities(received_signal, noise_variance, message_space):
    probabilities = np.zeros(len(message_space))

    for i, message in enumerate(message_space):
        error = received_signal - message
        probabilities[i] = np.exp(-np.sum(error ** 2) / (2 * noise_variance))

    return probabilities / np.sum(probabilities)

# Define a function to perform maximum likelihood decoding
def maximum_likelihood_decoding(received_signal, noise_variance, message_space):
    probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
    most_likely_message_index = np.argmax(probabilities)
    return message_space[most_likely_message_index]

# Example usage
message_space = np.array([0, 1])
noise_variance = 0.4
transmitted_message = 1
received_signal = transmitted_message + np.sqrt(noise_variance) * np.random.randn()
decoded_message = maximum_likelihood_decoding(received_signal, noise_variance, message_space)

print('Transmitted message:', transmitted_message)
print('Received signal:', received_signal)
print('Decoded message:', decoded_message)

# Plot probability distribution
probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
# The plotting call is garbled in the listing; a bar chart is assumed here
plt.bar(message_space, probabilities)
plt.title('Probability Distribution for Received Signal = {}'.format(received_signal))
plt.xlabel('Transmitted Message')
plt.ylabel('Probability')
plt.ylim([0, 1])
plt.show()

The probability of the received signal given a specific transmitted message is calculated as follows:

  1. Compute the difference between the received signal and the transmitted message.
  2. Compute the sum of squares of this difference vector.
  3. Divide this sum by twice the known noise variance.
  4. Take the negative exponential of this value.

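This results in a value proportional to the probability density function (PDF) of the received signal given the transmitted message, assuming that the noise is Gaussian and zero-mean. The constant factor \(1/\sqrt{2\pi\sigma^2}\) is omitted because it is the same for every candidate message and cancels in the normalization.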

The probabilities for each possible transmitted message are then normalized so that they sum to 1. This is done by dividing each individual probability by the sum of all probabilities.

The maximum_likelihood_decoding function determines the most likely transmitted message by selecting the message with the highest probability, which corresponds to the maximum likelihood estimate of the transmitted message given the received signal and the statistical model of the noise.
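Since the exponential is a decreasing function of the squared error, picking the largest probability is the same as picking the candidate message closest to the received value. For the message space [0, 1] this reduces to comparing the received signal against the midpoint 0.5. The sketch below checks this equivalence over many received samples. It is a vectorized variant written for illustration and does not reuse the functions of the program above.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded only to make the sketch repeatable

message_space = np.array([0, 1])
noise_variance = 0.4

# Message 1 is transmitted 1000 times over an AWGN channel
received = 1 + np.sqrt(noise_variance) * rng.standard_normal(1000)

# Squared distance of every received sample to every candidate, shape (1000, 2)
sq_dist = (received[:, None] - message_space[None, :]) ** 2

# Likelihoods up to a common constant, then normalized per received sample
likelihood = np.exp(-sq_dist / (2 * noise_variance))
normalized = likelihood / likelihood.sum(axis=1, keepdims=True)

ml_decision = message_space[np.argmax(normalized, axis=1)]
min_dist_decision = message_space[np.argmin(sq_dist, axis=1)]
threshold_decision = (received > 0.5).astype(int)

error_rate = np.mean(ml_decision != 1)

print('ML equals minimum distance:', np.array_equal(ml_decision, min_dist_decision))
print('ML equals 0.5 threshold   :', np.array_equal(ml_decision, threshold_decision))
print('Fraction decoded wrongly  :', error_rate)
```

With a noise variance of 0.4 a noticeable fraction of the transmissions is decoded wrongly, which is consistent with decoding errors showing up in individual runs.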

Sample outputs

Transmitted message: 1
Received signal: 0.21798306949364643
Decoded message: 0

Transmitted message: 1
Received signal: -0.5115453787966966
Decoded message: 0

Transmitted message: 1
Received signal: 0.8343088336355061
Decoded message: 1

Transmitted message: 1
Received signal: -0.5479891887158619
Decoded message: 0

The probability distribution for the last sample output is shown below

Figure: Probability distribution for a sample run of the code

Reference :

[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell

Random Variables, CDF and PDF

Random Variable:

In a “coin-flipping” experiment, the outcome is not known before the experiment is performed; that is, we cannot predict it with certainty (the experiment is non-deterministic/stochastic). But we do know all the possible outcomes: Head or Tail. The set of all possible outcomes is called the “sample space”. Assign a real number to each outcome, say “0” to “Head” and “1” to “Tail”, and associate a variable “X” that can take these two values. This variable “X” is called a random variable, since it can randomly take either value, ‘0’ or ‘1’, and we cannot tell which until the experiment is actually performed.

Obviously, we do not want to wait until the coin-flipping experiment is done, because by then the outcome will have lost its significance. Instead, we want to associate a probability with each possible outcome beforehand. In the coin-flipping experiment, all outcomes are equally probable (given that the coin is fair and unbiased). This means that the probability of getting Head (our random variable X = 0) and the probability of getting Tail (X = 1) are both 0.5 (i.e. a 50-50 chance of getting Head/Tail).

This can be written as,

$$P(X = 0) = 0.5, \qquad P(X = 1) = 0.5$$

Cumulative Distribution Function:

Mathematically, a complete description of a random variable is given by the “Cumulative Distribution Function” (CDF), $F_X(x)$. Here “X” is the random variable and “x” is a dummy variable that acts as a placeholder for all possible outcomes (“0” and “1” in the coin-flipping experiment above). The Cumulative Distribution Function is defined as,

$$F_X(x) = P(X \leq x)$$

If we plot the CDF for our coin-flipping experiment, it looks like a staircase (see the accompanying figure): it is 0 for x < 0, jumps to 0.5 at x = 0, and jumps to 1 at x = 1.
The example above is discrete in nature, as the random variable takes only discrete values (either “0” or “1”); such a random variable is called a Discrete Random Variable.
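
The coin-flipping experiment and its staircase CDF can be sketched in a few lines of Python. This is only an illustrative simulation: the function names (`simulate_coin_flips`, `empirical_pmf`, `cdf`), the number of flips and the seed are assumptions made for this sketch, not part of the original text.

```python
import random

def simulate_coin_flips(num_flips, seed=0):
    """Return a list of outcomes of a fair coin: 0 for Head, 1 for Tail."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    return [rng.randint(0, 1) for _ in range(num_flips)]

def empirical_pmf(samples):
    """Estimate P(X = x) as the relative frequency of each observed value."""
    n = len(samples)
    return {x: samples.count(x) / n for x in sorted(set(samples))}

def cdf(pmf, x):
    """F_X(x) = P(X <= x): add the probabilities of all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

flips = simulate_coin_flips(100_000)
pmf_est = empirical_pmf(flips)
print("Estimated probabilities:", pmf_est)  # both close to 0.5

# CDF of the ideal fair coin, evaluated on both sides of each jump
fair_pmf = {0: 0.5, 1: 0.5}
for x in (-1, 0, 0.5, 1, 2):
    print(f"F_X({x}) = {cdf(fair_pmf, x)}")
```

The estimated probabilities approach 0.5 as the number of flips grows, and the CDF values change only at x = 0 and x = 1, which is the staircase shape described above.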

If the values taken by the random variable are continuous in nature (example: a temperature measurement), then the random variable is called a Continuous Random Variable, and the corresponding cumulative distribution function is a continuous curve without jumps.

Probability Distribution function :

Consider an experiment in which the probabilities of the events are as follows. The probabilities of getting the numbers 1, 2, 3, 4 individually are, say, 1/10, 2/10, 3/10, 4/10 respectively. It will be more convenient for us if we have an equation for this experiment which gives these values based on the events. For example, the equation for this experiment can be given by $f(x) = x/10$ where $x = 1, 2, 3, 4$. This equation (equivalently, a function) is called the probability distribution function.

Probability Density function (PDF) and Probability Mass Function(PMF):

It is more common to deal with the Probability Density Function (PDF) or the Probability Mass Function (PMF) than with the CDF.

The PDF (defined for Continuous Random Variables) is obtained by taking the first derivative of the CDF.

$$f_X(x) = \frac{d F_X(x)}{dx}$$

For a discrete random variable, which takes on discrete values, it is common to define the Probability Mass Function instead.

$$p_X(x) = P(X = x)$$

The previous example was simple. The problem becomes slightly more complex if we are asked to find the probability of getting a value less than or equal to 3. The straightforward approach is to add the probabilities of getting the values 1, 2 and 3, that is, P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3). This is exactly what the cumulative distribution function models: for a discrete random variable it is the sum of the probability distribution function over the limits 1 to 3, and for a continuous random variable the sum becomes an integral of the probability density function over the range of interest.
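
As a small worked sketch of this calculation, assume for illustration that the numbers 1, 2, 3, 4 occur with probabilities x/10 (this specific function is an assumption of the sketch, chosen so that the probabilities sum to 1). The probability of getting a value less than or equal to 3 is then a plain sum:

```python
from fractions import Fraction

def f(x):
    """Illustrative probability distribution function (assumed): f(x) = x/10."""
    return Fraction(x, 10) if x in (1, 2, 3, 4) else Fraction(0)

# A valid distribution must add up to 1 over all possible values
total = sum(f(x) for x in (1, 2, 3, 4))

# P(X <= 3) = f(1) + f(2) + f(3)
p_le_3 = sum(f(x) for x in (1, 2, 3))

print("Sum of all probabilities:", total)  # 1
print("P(X <= 3):", p_le_3)                # 3/5, i.e. 6/10
```

Exact fractions are used here only to avoid floating-point rounding in such a tiny example.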

Based on the form of the probability density function, or how the PDF graph looks, distributions fall into different categories such as the binomial distribution, uniform distribution, Gaussian distribution, Chi-square distribution, Rayleigh distribution, Rician distribution, etc. Of these, the Gaussian distribution (Gaussian random variable) is the one you will encounter most often in digital communication.


Mean:

The mean of a random variable is defined as the weighted average of all possible values the random variable can take. The probability of each outcome is used to weight each value when calculating the mean. The mean is also called the expectation (E[X]).

For a continuous random variable X with probability density function $f_X(x)$,

$$\mu = E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx$$

For a discrete random variable X, the mean is calculated as the weighted average of all possible values ($x_i$), each weighted by its individual probability ($p_i$),

$$\mu = E[X] = \sum_{i} x_i \, p_i$$

Variance :

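Variance measures the spread of a distribution around its mean $\mu = E[X]$. For a continuous random variable X, the variance is defined as

$$\sigma^2 = \mathrm{Var}[X] = E\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2 \, f_X(x) \, dx$$

For the discrete case, the variance is defined as

$$\sigma^2 = \mathrm{Var}[X] = \sum_{i} (x_i - \mu)^2 \, p_i$$

The Standard Deviation ($\sigma$) is defined as the square root of the variance,

$$\sigma = \sqrt{\mathrm{Var}[X]}$$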
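
The discrete-case definitions of mean, variance and standard deviation can be checked on the fair coin from the beginning of this article (X = 0 for Head, X = 1 for Tail, each with probability 0.5). The helper names below are hypothetical, introduced only for this sketch.

```python
import math

def mean(values, probs):
    """Weighted average: sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Spread around the mean: sum of (x_i - mu)^2 * p_i."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

coin_values = [0, 1]      # 0 = Head, 1 = Tail
coin_probs = [0.5, 0.5]   # fair coin

mu = mean(coin_values, coin_probs)
var = variance(coin_values, coin_probs)
sigma = math.sqrt(var)    # standard deviation = square root of variance

print(mu, var, sigma)     # 0.5 0.25 0.5
```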

Properties of Mean and Variance:

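For a constant “c”, the following properties hold true for the mean

$$E[X + c] = E[X] + c, \qquad E[cX] = c \, E[X]$$

For a constant “c”, the following properties hold true for the variance

$$\mathrm{Var}[X + c] = \mathrm{Var}[X], \qquad \mathrm{Var}[cX] = c^2 \, \mathrm{Var}[X]$$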
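
The standard shift and scale properties (adding a constant moves the mean but leaves the variance unchanged; multiplying by a constant scales the mean by c and the variance by c squared) can be verified numerically. The sketch below uses Gaussian samples with arbitrarily chosen parameters, and c = 5; all of these values are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed (assumption) for repeatability
x = [rng.gauss(2.0, 3.0) for _ in range(10_000)]  # arbitrary test samples
c = 5.0

mean_x = statistics.fmean(x)
var_x = statistics.pvariance(x)

shifted = [xi + c for xi in x]  # X + c
scaled = [c * xi for xi in x]   # c * X

mean_shift = statistics.fmean(shifted)
var_shift = statistics.pvariance(shifted)
mean_scale = statistics.fmean(scaled)
var_scale = statistics.pvariance(scaled)

print("E[X+c] - (E[X]+c)      =", mean_shift - (mean_x + c))
print("Var[X+c] - Var[X]      =", var_shift - var_x)
print("E[cX] - c*E[X]         =", mean_scale - c * mean_x)
print("Var[cX] - c^2*Var[X]   =", var_scale - c * c * var_x)
```

All four differences come out at the level of floating-point rounding, because the properties hold exactly for sample averages as well as for the true expectations.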

The PDF and the CDF each define a random variable completely. For example, if two random variables X and Y have the same PDF, then they have the same CDF, and therefore their means and variances are the same.
On the other hand, the mean and variance describe a random variable only partially. If two random variables X and Y have the same mean and variance, they may or may not have the same PDF or CDF.
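
A quick way to see that mean and variance give only a partial description is to compare two different distributions that share them. The sketch below (an illustration with assumed parameters, not taken from the original text) draws samples from a Gaussian distribution with mean 0 and variance 1 and from a uniform distribution on [-√3, √3], which also has mean 0 and variance 1, and then compares how often each exceeds 2 in magnitude.

```python
import math
import random

def sample_mean(samples):
    return sum(samples) / len(samples)

def sample_variance(samples):
    m = sample_mean(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

def tail_fraction(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for s in samples if abs(s) > threshold) / len(samples)

rng = random.Random(42)  # fixed seed (assumption) for repeatability
n = 100_000

gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Uniform on [-a, a] has variance a^2 / 3, so a = sqrt(3) gives variance 1
a = math.sqrt(3.0)
uniform = [rng.uniform(-a, a) for _ in range(n)]

for name, data in (("Gaussian", gaussian), ("Uniform", uniform)):
    print(name,
          "mean=%.3f" % sample_mean(data),
          "variance=%.3f" % sample_variance(data),
          "P(|X|>2)=%.4f" % tail_fraction(data, 2.0))
```

Both sets of samples have a mean near 0 and a variance near 1, yet the uniform samples never exceed 2 in magnitude while a few percent of the Gaussian samples do, so the underlying PDFs are clearly different.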

Gaussian Distribution :

The Gaussian PDF looks like a bell. It is the most widely used distribution in communication engineering. For example, channels are commonly modeled as Additive White Gaussian Noise (AWGN) channels. What is the reason behind it? For a fixed noise power, Gaussian noise gives the smallest channel capacity, which means that it represents the worst-case channel impairment. Coding schemes designed for this most adverse environment can therefore be expected to perform satisfactorily in real environments. For more information on “Gaussianity”, refer to [1].

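The PDF of the Gaussian Distribution (also called the Normal Distribution) is completely characterized by its mean ($\mu$) and variance ($\sigma^2$),

$$f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$$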

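Since the PDF is defined as the first derivative of the CDF, working backwards tells us that the CDF can be obtained by integrating the PDF.
Thus, the CDF of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$ is

$$F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]$$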
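
The relation between the Gaussian PDF and CDF can be checked numerically. The sketch below uses the standard closed forms (the CDF is expressed through the error function, available as `math.erf` in Python); the function names and the test point 0.7 are assumptions made for this illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian PDF with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF, i.e. the integral of the PDF from -infinity to x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# The PDF should equal the derivative of the CDF: compare at an arbitrary point
x0 = 0.7
h = 1e-5
numeric_derivative = (gaussian_cdf(x0 + h) - gaussian_cdf(x0 - h)) / (2 * h)

print("PDF at x0:            ", gaussian_pdf(x0))
print("d(CDF)/dx at x0:      ", numeric_derivative)
print("CDF at the mean (0.0):", gaussian_cdf(0.0))  # half the area lies below the mean
```

The central-difference estimate of the CDF's slope agrees with the PDF to many decimal places, which is the derivative/integral relationship described above.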

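Equations for the PDF and CDF of certain distributions are consolidated below

Probability Distribution | Probability Density Function (PDF) | Cumulative Distribution Function (CDF)
Gaussian/Normal Distribution, $\mathcal{N}(\mu, \sigma^2)$ | $f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$ | $F_X(x) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]$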

Reference :

[1] S.Pasupathy, “Glories of Gaussianity”, IEEE Communications magazine, Aug 1989 – 1, pp 38.

Topics in this chapter

Random Variables - Simulating Probabilistic Systems
● Introduction
Plotting the estimated PDF
● Univariate random variables
 □ Uniform random variable
 □ Bernoulli random variable
 □ Binomial random variable
 □ Exponential random variable
 □ Poisson process
 □ Gaussian random variable
 □ Chi-squared random variable
 □ Non-central Chi-Squared random variable
 □ Chi distributed random variable
 □ Rayleigh random variable
 □ Ricean random variable
 □ Nakagami-m distributed random variable
Central limit theorem - a demonstration
● Generating correlated random variables
 □ Generating two sequences of correlated random variables
 □ Generating multiple sequences of correlated random variables using Cholesky decomposition
Generating correlated Gaussian sequences
 □ Spectral factorization method
 □ Auto-Regressive (AR) model
