Random Variables, CDF and PDF

Random Variable:

A random variable is a mapping from the sample space $latex \Omega$ to the set of real numbers. What does this mean? Let's take the usual evergreen example of “flipping a coin”.

In a “coin-flipping” experiment, the outcome is not known prior to the experiment; that is, we cannot predict it with certainty (it is non-deterministic, or stochastic). But we do know all the possible outcomes: Head or Tail. The set of all possible outcomes is called the “sample space”. Assign a real number to each possible outcome, say “0” to “Head” and “1” to “Tail”, and associate a variable “X” that can take these two values. This variable “X” is called a random variable, since before the experiment is performed it can randomly take either value, ‘0’ or ‘1’.

Obviously, we do not want to wait until the coin-flipping experiment is done, because by then the outcome will have lost its significance. Instead, we want to associate a probability with each possible outcome. In the coin-flipping experiment, all outcomes are equally probable (given that the coin is fair and unbiased). This means that the probability of getting Head (our random variable X = 0), as well as that of getting Tail (X = 1), is 0.5 (i.e., a 50-50 chance of getting Head or Tail).

This can be written as,

$latex P(\mathbf{X}=0)=0.5\;\text{and}\;P(\mathbf{X}=1)=0.5 &s=1$
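As a quick sanity check, the fair-coin random variable can be simulated. This is a minimal sketch, assuming Python's standard `random` module as the source of randomness; the helper name `flip_coin`, the seed and the number of flips are illustrative choices, not part of the original text.

```python
import random

def flip_coin(rng):
    """Return the value of X for one flip: 0 for Head, 1 for Tail."""
    return rng.choice([0, 1])

rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
n_flips = 100_000        # number of simulated flips (arbitrary)
samples = [flip_coin(rng) for _ in range(n_flips)]

# Relative frequencies approximate P(X=0) and P(X=1)
p_head = samples.count(0) / n_flips
p_tail = samples.count(1) / n_flips
print(f"P(X=0) ~ {p_head:.3f}, P(X=1) ~ {p_tail:.3f}")
```

With a large number of flips, both relative frequencies should settle close to 0.5.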

Cumulative Distribution Function:

Mathematically, a complete description of a random variable is given by its “Cumulative Distribution Function”, $latex F_{\textbf{X}}(x)$. Here the bold-faced “X” is the random variable and “x” is a dummy variable that acts as a placeholder for the values the random variable can take (“0” and “1” in the coin-flipping experiment above). The Cumulative Distribution Function is defined as,

$latex F_{\textbf{X}}(x)= P(\textbf{X}\leq x) &s=1$

Figure: Cumulative Distribution Function (CDF)

If we plot the CDF for our coin-flipping experiment, we get a staircase, as shown in the figure: the CDF is 0 for x < 0, jumps to 0.5 at x = 0, and jumps to 1 at x = 1.
The example provided above is of a discrete nature, as the values taken by the random variable are discrete (either “0” or “1”), and therefore the random variable is called a Discrete Random Variable.
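The staircase shape of the coin-flip CDF can be reproduced directly from the definition $latex F_{\textbf{X}}(x)= P(\textbf{X}\leq x)$. The sketch below is illustrative only; the function name `coin_cdf` and the evaluation points are assumptions made for this example.

```python
def coin_cdf(x):
    """F_X(x) = P(X <= x) for X in {0, 1}, each with probability 0.5."""
    pmf = {0: 0.5, 1: 0.5}
    # Add up the probability of every value that does not exceed x
    return sum(p for value, p in pmf.items() if value <= x)

for x in (-1, 0, 0.5, 1, 2):
    print(f"F({x}) = {coin_cdf(x)}")
```

The function is flat between the two possible values and jumps by 0.5 at each of them, which is what makes the CDF of a discrete random variable a staircase.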

If the values taken by the random variable are of a continuous nature (example: a temperature measurement), then the random variable is called a Continuous Random Variable, and the corresponding cumulative distribution function is continuous, with no jumps.

Probability Distribution Function:

Consider an experiment in which the probabilities of the events are as follows: the probabilities of getting the numbers 1, 2, 3, 4 are $latex 1/10, 2/10, 3/10, 4/10$ respectively. It is more convenient to have an equation for this experiment that gives these values based on the events. For example, the equation for this experiment can be given by $latex f(x)=x/10$ where $latex x=1,2,3,4$. This equation (equivalently, a function) is called the probability distribution function.
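The function $latex f(x)=x/10$ can be written down and checked in a few lines. This is a sketch using exact fractions from Python's standard library; returning zero for values outside {1, 2, 3, 4} is an assumption added here so that the function is defined everywhere.

```python
from fractions import Fraction

def f(x):
    """Probability distribution function f(x) = x/10 for x in {1, 2, 3, 4}."""
    return Fraction(x, 10) if x in (1, 2, 3, 4) else Fraction(0)

probabilities = {x: f(x) for x in (1, 2, 3, 4)}
total = sum(probabilities.values())

print(probabilities[3])  # 3/10
print(total)             # 1
```

The probabilities add up to 1, as they must for any valid probability distribution.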

Probability Density Function (PDF) and Probability Mass Function (PMF):

It is more common to deal with the Probability Density Function (PDF) or Probability Mass Function (PMF) than with the CDF.

The PDF (defined for Continuous Random Variables) is obtained by taking the first derivative of the CDF.

$latex f_\textbf{X}(x)=\frac{dF_\textbf{X}(x)}{dx} &s=1$

For a discrete random variable, which takes on discrete values, it is common to define the Probability Mass Function instead.

$latex f_\textbf{X}(x)=P(\textbf{X}=x) &s=1$

The previous example was simple. The problem becomes slightly more complex if we are asked to find the probability of getting a value less than or equal to 3. The straightforward approach is to add the probabilities of getting the values $latex x=1,2,3$, which comes out to be $latex 1/10+2/10+3/10 =6/10$. This is the value of the CDF at $latex x=3$: for a discrete random variable, the CDF is the sum of the probability distribution function over all values up to x, and for a continuous random variable the sum becomes the integral of the PDF up to x.
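The “less than or equal to 3” calculation is a cumulative sum of the PMF. A minimal sketch, assuming the same four-outcome experiment with $latex f(x)=x/10$ (the names `pmf` and `cdf` are illustrative):

```python
from fractions import Fraction

VALUES = (1, 2, 3, 4)

def pmf(x):
    """P(X = x) = x/10 for x in {1, 2, 3, 4}, zero elsewhere."""
    return Fraction(x, 10) if x in VALUES else Fraction(0)

def cdf(x):
    """P(X <= x): add the PMF over every possible value not exceeding x."""
    return sum(pmf(k) for k in VALUES if k <= x)

print(cdf(3))  # 3/5, i.e. 6/10
print(cdf(4))  # 1
```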

Based on the shape of the PDF (or PMF), distributions fall into different families like the binomial distribution, uniform distribution, Gaussian distribution, Chi-square distribution, Rayleigh distribution, Rician distribution, etc. Of these, you will encounter the Gaussian distribution, or Gaussian random variable, very often in digital communication.

Mean:

The mean of a random variable is defined as the weighted average of all possible values the random variable can take, with the probability of each outcome used as its weight. The mean is also called the expectation, $latex E[X]$.

For a continuous random variable X with probability density function $latex f_X(x)$,

$latex E\left[X \right] = \int_{-\infty }^{\infty}xf_X(x)dx &s=1$

For a discrete random variable X, the mean is calculated as the weighted average of all possible values ($latex x_i$), each weighted by its individual probability ($latex p_i$):

$latex E\left[X \right] = \mu_X = \sum_{i=-\infty }^{\infty}x_{i}p_{i} &s=1$
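As a worked instance of the discrete formula, take the four-outcome experiment with probabilities 1/10, 2/10, 3/10 and 4/10 for the values 1 to 4. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# value x_i -> probability p_i for the experiment with f(x) = x/10
pmf = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}

# E[X] = sum of x_i * p_i = (1 + 4 + 9 + 16) / 10
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 3
```

The mean is pulled toward the larger values because they carry more probability.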

Variance:

Variance measures the spread of a distribution. For a continuous random variable X, the variance is defined as

$latex var \left[X\right] = \int_{-\infty }^{\infty} \left(x - E\left[X \right] \right)^2 f_X(x) dx &s=1$

For the discrete case, the variance is defined as

$latex var \left[X\right] = \sigma_X^2 = \sum_{i=-\infty }^{\infty} \left( x_i - \mu_X\right)^2 p_{i} &s=1$

Standard deviation ($latex \sigma_X$) is defined as the square root of the variance $latex \sigma_X^2$.
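The discrete variance formula can be applied to the same four-outcome experiment (values 1 to 4 with probabilities 1/10 to 4/10). This is an illustrative sketch; the variable names are assumptions.

```python
import math
from fractions import Fraction

# value x_i -> probability p_i for the experiment with f(x) = x/10
pmf = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}

mu = sum(x * p for x, p in pmf.items())                    # mean, equals 3
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())  # (4*1 + 1*2 + 0*3 + 1*4) / 10
std_dev = math.sqrt(variance)                              # square root of the variance

print(variance)  # 1
print(std_dev)   # 1.0
```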

Properties of Mean and Variance:

For a constant “c”, the following properties hold for the mean:

$latex E\left[cX\right] = c E\left[X\right] &s=1$

$latex E\left[X+c\right] = E\left[X\right]+c &s=1$

$latex E\left[c\right] = c &s=1$

For a constant “c”, the following properties hold for the variance:

$latex var\left[cX\right] = c^2 var\left[X\right] &s=1$

$latex var\left[X+c\right] = var\left[X\right] &s=1$

$latex var\left[c\right] = 0 &s=1$
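These properties can be checked numerically on a small example. The sketch below assumes the four-outcome distribution with probabilities x/10 and an arbitrary constant c = 5; the helper names `mean`, `variance` and `transform` are illustrative.

```python
from fractions import Fraction

# value -> probability for the experiment with f(x) = x/10
pmf = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, g):
    """Distribution of g(X); g is assumed one-to-one on the values of X."""
    return {g(x): p for x, p in dist.items()}

c = 5                                       # arbitrary constant for illustration
scaled = transform(pmf, lambda x: c * x)    # distribution of cX
shifted = transform(pmf, lambda x: x + c)   # distribution of X + c
constant = {c: Fraction(1)}                 # a variable that always equals c

print(mean(scaled), c * mean(pmf))               # 15 15
print(mean(shifted), mean(pmf) + c)              # 8 8
print(variance(scaled), c ** 2 * variance(pmf))  # 25 25
print(variance(shifted), variance(pmf))          # 1 1
print(mean(constant), variance(constant))        # 5 0
```

Scaling stretches the spread by c squared, while shifting moves the whole distribution without changing its spread.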

The PDF and CDF define a random variable completely. For example, if two random variables X and Y have the same PDF, then they have the same CDF, and therefore the same mean and variance.
On the other hand, mean and variance describe a random variable only partially. If two random variables X and Y have the same mean and variance, they may or may not have the same PDF or CDF.
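A small counterexample makes the last point concrete. The two distributions below are hypothetical, chosen only for illustration: X takes the values -1 and 1 with equal probability, while Y takes the values -2, 0 and 2 with probabilities 1/8, 3/4 and 1/8.

```python
from fractions import Fraction

# Two different PMFs (value -> probability), constructed for this example
pmf_x = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
pmf_y = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

print(mean(pmf_x), variance(pmf_x))  # 0 1
print(mean(pmf_y), variance(pmf_y))  # 0 1
print(pmf_x == pmf_y)                # False
```

Both variables have mean 0 and variance 1, yet their PMFs (and hence their CDFs) are different.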

Gaussian Distribution:

The Gaussian PDF looks like a bell. It is the most widely used distribution in communication engineering. For example, channels are very often modeled as Additive White Gaussian Noise (AWGN) channels. What is the reason behind it? For a fixed noise power, Gaussian noise gives the smallest channel capacity among additive noise distributions. This means that it results in the worst channel impairment. So coding designs done under this most adverse environment can be expected to perform satisfactorily in real environments. For more information on “Gaussianity”, refer to [1].

The PDF of the Gaussian distribution (also called the Normal distribution) is completely characterized by its mean ($latex \mu$) and variance ($latex \sigma^2$),

$latex f(x)=\frac{1}{\sqrt{2\pi \sigma ^{2}}}e^{-\frac{(x-\mu )^{2}}{2\sigma ^{2}}} &s=1$
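The claim that the mean and variance characterize the Gaussian PDF can be checked numerically. The sketch below evaluates the formula on a fine grid and recovers the area, mean and variance with a simple midpoint sum; the parameter values, the grid step and the integration range of 10 standard deviations on either side are arbitrary assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian PDF with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 1.0, 2.0   # illustrative parameters
dx = 0.001             # grid step
n = int(round(20 * sigma / dx))

# Midpoints of a grid covering mu - 10*sigma to mu + 10*sigma
xs = [mu - 10 * sigma + (k + 0.5) * dx for k in range(n)]
fs = [gaussian_pdf(x, mu, sigma) for x in xs]

area = sum(fs) * dx                                                   # should be close to 1
mean = sum(x * f for x, f in zip(xs, fs)) * dx                        # should be close to mu
variance = sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) * dx      # should be close to sigma^2

print(f"area={area:.6f}, mean={mean:.6f}, variance={variance:.6f}")
```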

Since the PDF is defined as the first derivative of the CDF, the CDF can be obtained by working backwards and integrating the PDF. Substituting $latex t=(u-\mu)/\sigma$ in the integral of $latex f(u)$ from $latex -\infty$ to $latex x$ gives the CDF of the Gaussian distribution,

$latex F_{\textbf{X}}(x;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}}\int_{-\infty }^{\frac{x-\mu}{ \sigma}}e^{\frac{-t^{2}}{2}}dt &s=1$
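The integral in the Gaussian CDF has no closed form, but it can be expressed through the error function as $latex F_{\textbf{X}}(x)=\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. The sketch below uses `math.erf` from the Python standard library and checks that a numerical derivative of the CDF gives back the PDF; the parameter values and the test point are arbitrary assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian PDF with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 1.0, 2.0   # illustrative parameters

# Half of the probability lies below the mean
print(gaussian_cdf(mu, mu, sigma))  # 0.5

# Probability of falling within one standard deviation of the mean
within_one_sigma = gaussian_cdf(mu + sigma, mu, sigma) - gaussian_cdf(mu - sigma, mu, sigma)
print(f"{within_one_sigma:.4f}")    # 0.6827

# A central-difference derivative of the CDF should match the PDF
x0, h = 2.5, 1e-5
derivative = (gaussian_cdf(x0 + h, mu, sigma) - gaussian_cdf(x0 - h, mu, sigma)) / (2 * h)
print(abs(derivative - gaussian_pdf(x0, mu, sigma)) < 1e-6)  # True
```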

Equations for PDF and CDF for certain distributions are consolidated below

| Probability Distribution | Probability Density Function (PDF) | Cumulative Distribution Function (CDF) |
| --- | --- | --- |
| Gaussian/Normal Distribution $latex \mathcal{N}(\mu,\sigma^{2})$ | $latex \displaystyle{f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu )^{2}}{2\sigma ^{2}}}}$ | $latex \displaystyle{F_{\textbf{X}}(x;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}}\int_{-\infty }^{\frac{x-\mu}{\sigma}}e^{\frac{-t^{2}}{2}}dt}$ |

Reference:

[1] S. Pasupathy, “Glories of Gaussianity”, IEEE Communications Magazine, Aug. 1989, p. 38.


10 thoughts on “Random Variables, CDF and PDF”

  1. Hello, your website is a great help for understanding topics and implementing them in our project.
    Sir, can you please give me the simplest MATLAB code for plotting PDF and CDF vs. capacity for a MIMO system without a channel matrix?
    I mean, I am using the Simulink platform, and the H matrix need not be in the code for the plot.

    Hoping for your fast reply at [email protected]

  2. Sir, please help me. I want to write a MATLAB script that generates N
    samples from a Rayleigh distribution and compares the sample histogram with the
    Rayleigh density function, but I want to use the script given below as the starting point:

    mu = 0; % mean (mu)
    sig = 2; % standard deviation (sigma)
    N = 1e5; % number of samples

    % Sample from Gaussian distribution %

    z = mu + sig*randn(1,N);

    % Plot sample histogram, scaling the vertical axis
    % to ensure the area under the histogram is 1
    dx = 0.5;
    % Bin centers: the mean, and 5 standard deviations either side
    x = mu-5*sig:dx:mu+5*sig;
    H = hist(z,x);
    area = sum(H*dx);
    H = H/area;
    bar(x,H)
    xlim([-5*sig,5*sig])

    % Overlay Gaussian density function
    hold on
    f = exp(-(x-mu).^2/(2*sig^2))/sqrt(2*pi*sig^2);
    plot(x,f,'r','LineWidth',3)
    hold off

