Matplotlib histogram and estimated PDF in Python

Key focus: Shown with examples: let’s estimate and plot the probability density function of a random variable using Python’s Matplotlib histogram function.

Generation of random variables with required probability distribution characteristic is of paramount importance in simulating a communication system. Let’s see how we can generate a simple random variable, estimate and plot the probability density function (PDF) from the generated data and then match it with the intended theoretical PDF. Normal random variable is considered here for illustration.

Step 1: Generate random samples

A survey of commonly used fundamental methods to generate a given random variable is given in [1]. For this demonstration, we will consider the normal random variable with the following parameters : μ – mean and σ – standard deviation. First generate a vector of randomly distributed random numbers of sufficient length (say 100000) with some valid values for μ and σ. There are more than one way to generate this. Two of them are given below.

● Method 1: Using the in-built numpy.random.normal() function (requires numpy package to be installed)

import numpy as np

mu=10;sigma=2.5 #mean=10,deviation=2.5
L=100000 #length of the random vector

#Random samples generated using numpy.random.normal()
samples_normal = np.random.normal(loc=mu,scale=sigma,size=(L,1)) #generate normally distributted samples

● Method 2: Box-Muller transformation [2] method produces a pair of normally distributed random numbers (Z1, Z2) by transforming a pair of uniformly distributed independent random samples (U1,U2). The algorithm for transformation is given by

#Samples generated using Box-Muller transformation

U1 = np.random.uniform(low=0,high=1,size=(L,1)) #uniformly distributed random numbers U(0,1)
U2 = np.random.uniform(low=0,high=1,size=(L,1)) #uniformly distributed random numbers U(0,1)

a = np.sqrt(-2*np.log(U1))
b = 2*np.pi*U2

Z = a*np.cos(b) #Standard Normal distributed numbers
samples_box_muller= Z*sigma+mu #Normal distribution with mean and sigma

Step 2: Plot the estimated histogram

Typically, if we have a vector of random numbers that is drawn from a distribution, we can estimate the PDF using the histogram tool. Matplotlib’s hist function can be used to compute and plot histograms. If the density argument is set to ‘True’, the hist function computes the normalized histogram such that the area under the histogram will sum to 1. Estimate and plot the normalized histogram using the hist function.

#For plotting
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

fig, ax0 = plt.subplots(ncols=1, nrows=1) #creating plot axes
(values, bins, _) = ax0.hist(samples_normal,bins=100,density=True,label="Histogram of samples") #Compute and plot histogram, return the computed values and bins

Step 3: Theoretical PDF:

And for verification, overlay the theoretical PDF for the intended distribution. The theoretical PDF of normally distributed random samples is given by

Theoretical PDF for normal distribution is readily obtained from stats.norm.pdf() function in the SciPy package.

from scipy import stats
bin_centers = 0.5*(bins[1:] + bins[:-1])
pdf = stats.norm.pdf(x = bin_centers, loc=mu, scale=sigma) #Compute probability density function
ax0.plot(bin_centers, pdf, label="PDF",color='black') #Plot PDF
ax0.legend()#Legend entries
ax0.set_title('PDF of samples from numpy.random.normal()');
PDF of samples using numpy random normal function
Figure 1: Estimated PDF (histogram) and the theoretical PDF for samples generated using numpy.random.normal() function

The histogram and theoretical PDF of random samples generated using Box-Muller transformation, can be plotted in a similar manner.

#Samples generated using Box-Muller transformation
from numpy.random import uniform
U1 = uniform(low=0,high=1,size=(L,1)) #uniformly distributed random numbers U(0,1)
U2 = uniform(low=0,high=1,size=(L,1)) #uniformly distributed random numbers U(0,1)
a = np.sqrt(-2*np.log(U1))
b = 2*np.pi*U2
Z = a*np.cos(b) #Standard Normal distribution
samples_box_muller = Z*sigma+mu #Normal distribution with mean and sigma

#Plotting
fig, ax1 = plt.subplots(ncols=1, nrows=1) #creating plot axes
(values, bins, _) = ax1.hist(samples_box_muller,bins=100,density=True,label="Histogram of samples") #Plot histogram
bin_centers = 0.5*(bins[1:] + bins[:-1])
pdf = stats.norm.pdf(x = bin_centers, loc=mu, scale=sigma) #Compute probability density function
ax1.plot(bin_centers, pdf, label="PDF",color='black') #Plot PDF
ax1.legend()#Legend entries
ax1.set_title('Box-Muller transformation');

References:

[1] John Mount, ‘Six Fundamental Methods to Generate a Random Variable’, January 20, 2012
[2] Thomas, D. B., Luk. W., Leong, P. H. W., and Villasenor, J. D. 2007. Gaussian random number generators. ACM Comput. Surv. 39, 4, Article 11 (October 2007), 38 pages DOI = 10.1145/1287620.1287622 http://doi.acm.org/10.1145/1287620.1287622

Rate this post: Note: There is a rating embedded within this post, please visit this post to rate it.

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Similar topics

Articles in this series
[1] Fibonacci series in python
[2] Central Limit Theorem – a demonstration
[3] Moving Average Filter in Python and Matlab
[4] How to plot FFT in Python – FFT of basic signals : Sine and Cosine waves
[5] How to plot audio files as time-series using Scipy python
[6] How to design a simple FIR filter to reject unwanted frequencies
[7] Analytic signal, Hilbert Transform and FFT
[8] Non-central Chi-squared Distribution
[9] Simulation of M-PSK modulation techniques in AWGN channel (in Matlab and Python)
[10] QPSK modulation and Demodulation (with Matlab and Python implementation)

Plot histogram and estimated PDF in Matlab

Key focus: With examples, let’s estimate and plot the probability density function of a random variable using Matlab histogram function.

Generation of random variables with required probability distribution characteristic is of paramount importance in simulating a communication system. Let’s see how we can generate a simple random variable, estimate and plot the probability density function (PDF) from the generated data and then match it with the intended theoretical PDF. Normal random variable is considered here for illustration. Other types of random variables like uniform, Bernoulli, binomial, Chi-squared, Nakagami-m are illustrated in the next section.

Note: If you are inclined towards programming in Python, visit this article

Step 1: Create the random variable

A survey of commonly used fundamental methods to generate a given random variable is given in [1]. For this demonstration, we will consider the normal random variable with the following parameters : – mean and – standard deviation. First generate a vector of randomly distributed random numbers of sufficient length (say 100000) with some valid values for and . There are more than one way to generate this. Some of them are given below.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

● Method 1: Using the in-built random function (requires statistics toolbox)

mu=0;sigma=1;%mean=0,deviation=1
L=100000; %length of the random vector
R = random('Normal',mu,sigma,L,1);%method 1

●  Method 2: Using randn function that generates normally distributed random numbers having and = 1

mu=0;sigma=1;%mean=0,deviation=1
L=100000; %length of the random vector
R = randn(L,1)*sigma + mu; %method 2

● Method 3: Box-Muller transformation [2] method using rand function that generates uniformly distributed random numbers

 mu=0;sigma=1;%mean=0,deviation=1
L=100000; %length of the random vector
U1 = rand(L,1); %uniformly distributed random numbers U(0,1)
U2 = rand(L,1); %uniformly distributed random numbers U(0,1)
Z = sqrt(-2log(U1)).cos(2piU2);%Standard Normal distribution
R = Z*sigma+mu;%Normal distribution with mean and sigma

Step 2: Plot the estimated histogram

Typically, if we have a vector of random numbers that is drawn from a distribution, we can estimate the PDF using the histogram tool.  Matlab supports two in-built functions to compute and plot histograms:

● hist – introduced before R2006a
● histogram – introduced in R2014b

Which one to use ? Matlab’s help page points that the hist function is not recommended for several reasons and the issue of inconsistency is one among them. The histogram function is the recommended function to use.

Estimate and plot the normalized histogram using the recommended ‘histogram’ function. And for verification, overlay the theoretical PDF for the intended distribution. When using the histogram function to plot the estimated PDF from the generated random data, use ‘pdf’ option for ‘Normalization’ option. Do not use the ‘probability’ option for ‘Normalization’ option, as it will not match the theoretical PDF curve.

histogram(R,'Normalization','pdf'); %plot estimated pdf from the generated data

X = -4:0.1:4; %range of x to compute the theoretical pdf
fx_theory = pdf('Normal',X,mu,sigma); %theoretical normal probability density
hold on; plot(X,fx_theory,'r'); %plot computed theoretical PDF
title('Probability Density Function'); xlabel('values - x'); ylabel('pdf - f(x)'); axis tight;
legend('simulated','theory');
Estimated PDF (using histogram function) and the theoretical PDF

However, if you do not have Matlab version that was released before R2014b, use the ‘hist’ function and get the histogram frequency counts () and the bin-centers (). Using these data, normalize the frequency counts using the overall area under the histogram. Plot this normalized histogram and overlay the theoretical PDF for the chosen parameters.

%For those who don't have access to 'histogram' function
%get un-normalized values from hist function with same number of bins as histogram function
numBins=50; %choose appropriately
[f,x]=hist(R,numBins); %use hist function and get unnormalized values
figure; plot(x,f/trapz(x,f),'b-*');%plot normalized histogram from the generated data

X = -4:0.1:4; %range of x to compute the theoretical pdf
fx_theory =   pdf('Normal',X,mu,sigma); %theoretical normal probability density
hold on; plot(X,fx_theory,'r'); %plot computed theoretical PDF
title('Probability Density Function'); xlabel('values - x'); ylabel('pdf - f(x)'); axis tight;
legend('simulated','theory');

Step 3: Theoretical PDF:

The given code snippets above,  already include the command to plot the theoretical PDF by using the ‘pdf’ function in Matlab. It you do not have access to this function, you could use the following equation for computing the theoretical PDF

The code snippet for that purpose is given next.

X = -4:0.1:4; %range of x to compute the theoretical pdf
fx_theory = 1/sqrt(2*pi*sigma^2)*exp(-0.5*(X-mu).^2./sigma^2);
plot(X,fx_theory,'k'); %plot computed theoretical PDF

Note:  The functions – ‘random’ and ‘pdf’ , requires statistics toolbox.

Rate this article: Note: There is a rating embedded within this post, please visit this post to rate it.

References:

[1] John Mount, ‘Six Fundamental Methods to Generate a Random Variable’, January 20, 2012.↗
[2] Thomas, D. B., Luk. W., Leong, P. H. W., and Villasenor, J. D. 2007. Gaussian random number generators. ACM Comput. Surv. 39, 4, Article 11 (October 2007), 38 pages DOI = 10.1145/1287620.1287622 http://doi.acm.org/10.1145/1287620.1287622.↗

Topics in this chapter

Random Variables - Simulating Probabilistic Systems
● Introduction
Plotting the estimated PDF
● Univariate random variables
 □ Uniform random variable
 □ Bernoulli random variable
 □ Binomial random variable
 □ Exponential random variable
 □ Poisson process
 □ Gaussian random variable
 □ Chi-squared random variable
 □ Non-central Chi-Squared random variable
 □ Chi distributed random variable
 □ Rayleigh random variable
 □ Ricean random variable
 □ Nakagami-m distributed random variable
Central limit theorem - a demonstration
● Generating correlated random variables
 □ Generating two sequences of correlated random variables
 □ Generating multiple sequences of correlated random variables using Cholesky decomposition
Generating correlated Gaussian sequences
 □ Spectral factorization method
 □ Auto-Regressive (AR) model

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Random Variables, CDF and PDF

Random Variable:

In a “coin-flipping” experiment, the outcome is not known prior to the experiment, that is we cannot predict it with certainty (non-deterministic/stochastic). But we know the all possible outcomes – Head or Tail. Assign real numbers to the all possible events (this is called “sample space”), say “0” to “Head” and “1” to “Tail”, and associate a variable “X” that could take these two values. This variable “X” is called a random variable, since it can randomly take any value ‘0’ or ‘1’ before performing the actual experiment.

Obviously, we do not want to wait till the coin-flipping experiment is done. Because the outcome will lose its significance, we want to associate some probability to each of the possible event. In the coin-flipping experiment, all outcomes are equally probable (given that the coin is fair and unbiased). This means that we can say that the probability of getting Head ( our random variable X = 0 ) as well that of getting Tail ( X =1 ) is 0.5 (i.e. 50-50 chance for getting Head/Tail).

This can be written as,

Cumulative Distribution Function:

Mathematically, a complete description of a random variable is given be “Cumulative Distribution Function”- FX(x). Here the bold faced “X” is a random variable and “x” is a dummy variable which is a place holder for all possible outcomes ( “0” and “1” in the above mentioned coin flipping experiment). The Cumulative Distribution Function is defined as,

If we plot the CDF for our coin-flipping experiment, it would look like the one shown in the figure on your right.
The example provided above is of discrete nature, as the values taken by the random variable are discrete (either “0” or “1”) and therefore the random variable is called Discrete Random Variable.

If the values taken by the random variables are of continuous nature (Example: Measurement of temperature), then the random variable is called Continuous Random Variable and the corresponding cumulative distribution function will be smoother without discontinuities.

Probability Distribution function :

Consider an experiment in which the probability of events are as follows. The probabilities of getting the numbers 1,2,3,4 individually are respectively. It will be more convenient for us if we have an equation for this experiment which will give these values based on the events. For example, the equation for this experiment can be given by where . This equation ( equivalently a function) is called probability distribution function.

Probability Density function (PDF) and Probability Mass Function(PMF):

Its more common deal with Probability Density Function (PDF)/Probability Mass Function (PMF) than CDF.

The PDF (defined for Continuous Random Variables) is given by taking the first derivate of CDF.

For discrete random variable that takes on discrete values, is it common to defined Probability Mass Function.

The previous example was simple. The problem becomes slightly complex if we are asked to find the probability of getting a value less than or equal to 3. Now the straight forward approach will be to add the probabilities of getting the values which comes out to be . This can be easily modeled as a probability density function which will be the integral of probability distribution function with limits 1 to 3.

Based on the probability density function or how the PDF graph looks, PDF fall into different categories like binomial distribution, Uniform distribution, Gaussian distribution, Chi-square distribution, Rayleigh distribution, Rician distribution etc. Out of these distributions, you will encounter Gaussian distribution or Gaussian Random variable in digital communication very often.

Mean:

The mean of a random variable is defined as the weighted average of all possible values the random variable can take. Probability of each outcome is used to weight each value when calculating the mean. Mean is also called expectation (E[X])

For continuos random variable X and probability density function fX(x)

For discrete random variable X, the mean is calculated as weighted average of all possible values (xi) weighted with individual probability (pi)

Variance :

Variance measures the spread of a distribution. For a continuous random variable X, the variance is defined as

For discrete case, the variance is defined as

Standard Deviation () is defined as the square root of variance

Properties of Mean and Variance:

For a constant – “c” following properties will hold true for mean

For a constant – “c” following properties will hold true for variance

PDF and CDF define a random variable completely. For example: If two random variables X and Y have the same PDF, then they will have the same CDF and therefore their mean and variance will be same.
On the otherhand, mean and variance describes a random variable only partially. If two random variables X and Y have the same mean and variance, they may or may not have the same PDF or CDF.

Gaussian Distribution :

Gaussian PDF looks like a bell. It is used most widely in communication engineering. For example , all channels are assumed to be Additive White Gaussian Noise channel. What is the reason behind it ? Gaussian noise gives the smallest channel capacity with fixed noise power. This means that it results in the worst channel impairment. So the coding designs done under this most adverse environment will give superior and satisfactory performance in real environments. For more information on “Gaussianity” refer [1]

The PDF of the Gaussian Distribution (also called as Normal Distribution) is completely characterized by its mean () and variance(),

Since PDF is defined as the first derivative of CDF, a reverse engineering tell us that CDF can be obtained by taking an integral of PDF.
Thus to get the CDF of the above given function,

Equations for PDF and CDF for certain distributions are consolidated below

Probability Distribution Probability Density Function(PDF) Cumulative Distribution Function (CDF)
Gaussian/Normal Distribution –

Reference :

[1] S.Pasupathy, “Glories of Gaussianity”, IEEE Communications magazine, Aug 1989 – 1, pp 38.

Topics in this chapter

Random Variables - Simulating Probabilistic Systems
● Introduction
Plotting the estimated PDF
● Univariate random variables
 □ Uniform random variable
 □ Bernoulli random variable
 □ Binomial random variable
 □ Exponential random variable
 □ Poisson process
 □ Gaussian random variable
 □ Chi-squared random variable
 □ Non-central Chi-Squared random variable
 □ Chi distributed random variable
 □ Rayleigh random variable
 □ Ricean random variable
 □ Nakagami-m distributed random variable
Central limit theorem - a demonstration
● Generating correlated random variables
 □ Generating two sequences of correlated random variables
 □ Generating multiple sequences of correlated random variables using Cholesky decomposition
Generating correlated Gaussian sequences
 □ Spectral factorization method
 □ Auto-Regressive (AR) model

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing