Generate multiple sequences of correlated random variables

In the previous post, a method for generating two sequences of correlated random variables was discussed. Generation of multiple sequences of correlated random variables, given a correlation matrix is discussed here.

Correlation Matrix

Correlation matrix defines correlation among N variables. It is a symmetric matrix with the element equal to the correlation coefficient between the and the variable. The diagonal elements (correlations of variables with themselves) are always equal to 1.

Sample problem:

Let’s say we would like to generate three sets of random sequences X,Y,Z with the following correlation relationships.

  1. Correlation co-efficient between X and Y is 0.5
  2. Correlation co-efficient between X and Z is 0.3
  3. Obviously the variable X  correlates with itself 100% – i.e, correlation-coefficient is 1

Putting all these relationships in a compact matrix form, gives the correlation matrix. We take arbitrary correlation value (0.3) for the relationship between Y and Z.

Now, the task is to generate three sets of random numbers X,Y and Z that follows the relationship above. The problem can be addressed in many ways. Two most common methods finding the solution are

  1. Cholesky Decomposition method
  2. Spectral Decomposition ( also called Eigen Vector Decomposition) method

The Cholesky Decomposition method is discussed here.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Generating Correlated random number using Cholesky Decomposition:

Cholesky decomposition is the matrix equivalent of taking square root operation on a given matrix. As with any scalar values, positive square root is only possible if the given number is a positive (Imaginary roots do exist otherwise). Similarly, if a matrix need to be decomposed into square-root equivalent, the matrix need to be positive definite.

The method discussed here, seeks to decompose the given correlation matrix using Cholesky decomposition.

where U and L are upper and lower triangular matrices. We will consider Upper triangular matrix here. Equivalently, lower triangular matrix can also be used, in which case the order of output needs to be reversed.

For this decomposition to work, the correlation matrix should be positive definite. The correlated random sequences (where X,Y,Z are column vectors) that follow the above relationship can be generated by multiplying the uncorrelated random numbers R  with U .

Steps to follow:

Generate three sequences of uncorrelated random numbers R – each drawn from a normal distribution. For this case, the R matrix will be of size where k is the number of  samples we wish to generate and we allocate the k samples in three columns, where the columns indicate the place holder for each variable X, Y and Z. Multiply this matrix with the Cholesky decomposed upper triangular version of the correlation matrix.

Python code

import numpy as np
from scipy.linalg import cholesky
from scipy.stats import pearsonr #to calculate correlation coefficient

#for plotting and visualization
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
import seaborn as sns

C = np.array([[1, -0.5, 0.3], 
              [-0.5, 1, 0.2],
              [0.3, 0.2, 1]]) #Construct correlation matrix
U = cholesky(C) #Cholesky decomposition
R = np.random.randn(10000,3) #Three uncorrelated sequences
Rc = R @ U #Array of correlated random sequences

#compute and display correlation coeff from generated sequences
def pearsonCorr(x, y, **kws): 
    (r, _) = pearsonr(x, y) #returns Pearson’s correlation coefficient, 2-tailed p-value)
    ax = plt.gca()
    ax.annotate("r = {:.2f} ".format(r),xy=(.7, .9), xycoords=ax.transAxes)
    
#Visualization
df = pd.DataFrame(data=Rc, columns=['X','Y','Z'])
graph = sns.pairplot(df)
graph.map(pearsonCorr)
Pairplot of correlated random variables generated using Cholesky decomposition (Python)
Figure 1: Pairplot of correlated random variables generated using Cholesky decomposition (Python)

Matlab code

x=[  1  0.5 0.3; 0.5  1  0.3; 0.3 0.3  1 ;]; %Correlation matrix
U=chol(x); %Cholesky decomposition 

R=randn(10000,3); %Random data in three columns each for X,Y and Z
Rc=R*U; %Correlated matrix Rc=[X Y Z]

%Verify Correlation-Coeffs of generated vectors
coeffMatrixXX=corrcoef(Rc(:,1),Rc(:,1));
coeffMatrixXY=corrcoef(Rc(:,1),Rc(:,2));
coeffMatrixXZ=corrcoef(Rc(:,1),Rc(:,3));

%Extract the required correlation coefficients
coeffXX=coeffMatrixXX(1,2) %Correlation Coeff for XX;
coeffXY=coeffMatrixXY(1,2) %Correlation Coeff for XY;
coeffXZ=coeffMatrixXZ(1,2) %Correlation Coeff for XZ;

%Scatterplots
subplot(3,1,1)
plot(Rc(:,1),Rc(:,1),'b.')
title(['Scatterd Plot - X and X calculated \rho=' num2str(coeffXX)])
xlabel('X')
ylabel('X')

subplot(3,1,2)
plot(Rc(:,1),Rc(:,2),'r.')
title(['Scatterd Plot - X and Y calculated \rho=' num2str(coeffXY)])
xlabel('X')
ylabel('Y')

subplot(3,1,3)
plot(Rc(:,1),Rc(:,3),'m.')
title(['Scatterd Plot - X and Z calculated \rho=' num2str(coeffXZ)])
xlabel('X')
ylabel('Z')

Scattered plots to verify the simulated data

Rate this article: Note: There is a rating embedded within this post, please visit this post to rate it.

Further reading

[1] Richard Taylor, “Interpretation of correlation coefficient: A basic review”, Journal of diagnostic medical sonography, Jan/Feb 1990.↗

Topics in this chapter

Random Variables - Simulating Probabilistic Systems
● Introduction
Plotting the estimated PDF
● Univariate random variables
 □ Uniform random variable
 □ Bernoulli random variable
 □ Binomial random variable
 □ Exponential random variable
 □ Poisson process
 □ Gaussian random variable
 □ Chi-squared random variable
 □ Non-central Chi-Squared random variable
 □ Chi distributed random variable
 □ Rayleigh random variable
 □ Ricean random variable
 □ Nakagami-m distributed random variable
Central limit theorem - a demonstration
● Generating correlated random variables
 □ Generating two sequences of correlated random variables
 □ Generating multiple sequences of correlated random variables using Cholesky decomposition
Generating correlated Gaussian sequences
 □ Spectral factorization method
 □ Auto-Regressive (AR) model

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing

White Noise : Simulation and Analysis using Matlab

Definition

A random process (or signal for your visualization) with a constant power spectral density (PSD) function is a white noise process.

Power Spectral Density

Power Spectral Density function (PSD) shows how much power is contained in each of the spectral component. For example, for a sine wave of fixed frequency, the PSD plot will contain only one spectral component present at the given frequency. PSD is an even function and so the frequency components will be mirrored across the Y-axis when plotted. Thus for a sine wave of fixed frequency, the double sided plot of PSD will have two components – one at +ve frequency and another at –ve frequency of the sine wave. (Know how to plot PSD/FFT in Python & in Matlab)

Gaussian and Uniform White Noise:

A white noise signal (process) is constituted by a set of independent and identically distributed (i.i.d) random variables. In discrete sense, the white noise signal constitutes a series of samples that are independent and generated from the same probability distribution. For example, you can generate a white noise signal using a random number generator in which all the samples follow a given Gaussian distribution. This is called White Gaussian Noise (WGN) or Gaussian White Noise. Similarly, a white noise signal generated from a Uniform distribution is called Uniform White Noise.

Gaussian Noise and Uniform Noise are frequently used in system modelling. In modelling/simulation, white noise can be generated using an appropriate random generator. White Gaussian Noise can be generated using randn function in Matlab which generates random numbers that follow a Gaussian distribution. Similarly, rand function can be used to generate Uniform White Noise in Matlab that follows a uniform distribution. When the random number generators are used, it generates a series of random numbers from the given distribution. Let’s take the example of generating a White Gaussian Noise of length 10 using randn function in Matlab – with zero mean and standard deviation=1.

>> mu=0;sigma=1;
>> noise= sigma *randn(1,10)+mu
noise =   -1.5121    0.7321   -0.1621    0.4651    1.4284    1.0955   -0.5586    1.4362   -0.8026    0.0949

What is i.i.d ?

This simply generates 10 random numbers from the standard normal distribution. As we know that a white process is seen as a random process composing several random variables following the same Probability Distribution Function (PDF). The 10 random numbers above are generated from the same PDF (standard normal distribution). This condition is called “identically distributed” condition. The individual samples given above are “independent” of each other. Furthermore, each sample can be viewed as a realization of one random variable. In effect, we have generated a random process that is composed of realizations of 10 random variables. Thus, the process above is constituted from “independent identically distributed” (i.i.d) random variables.

Strictly and weakly defined white noise:

Since the white noise process is constructed from i.i.d random variable/samples, all the samples follow the same underlying probability distribution function (PDF). Thus, the Joint Probability Distribution function of the process will not change with any shift in time. This is called a stationary process. Hence, this noise is a stationary process. As with a stationary process which can be classified as Strict Sense Stationary (SSS) and Wide Sense Stationary (WSS) processes, we can have white noise that is SSS and white noise that is WSS. Correspondingly they can be called strictly defined white noise signal and weakly defined white noise signal.

What’s with Covariance Function/Matrix ?

A white noise signal, denoted by \(x(t)\), is defined in weak sense is a more practical condition. Here, the samples are statistically uncorrelated and identically distributed with some variance equal to \(\sigma^2\). This condition is specified by using a covariance function as

\[COV \left(x_i, x_j \right) = \begin{cases} \sigma^2, & \quad i = j \\ 0, & \quad i \neq j \end{cases}\]

Why do we need a covariance function? Because, we are dealing with a random process that is composed of \(n\) random variables (10 variables in the modelling example above). Such a process is viewed as multivariate random vector or multivariate random variable.

For multivariate random variables, Covariance function specified how each of the \(n\) variables in the given random process behaves with respect to each other. Covariance function generalizes the notion of variance to multiple dimensions.

The above equation when represented in the matrix form gives the covariance matrix of the white noise random process. Since the random variables in this process are statistically uncorrelated, the covariance function contains values only along the diagonal.

\[C_{xx} = \begin{bmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \sigma^2 & \vdots \\ 0 & \cdots & \sigma^2\end{bmatrix} = \sigma^2 \mathbf{I} \]

The matrix above indicates that only the auto-correlation function exists for each random variable. The cross-correlation values are zero (samples/variables are statistically uncorrelated with respect to each other). The diagonal elements are equal to the variance and all other elements in the matrix are zero.The ensemble auto-correlation function of the weakly defined white noise is given by This indicates that the auto-correlation function of weakly defined white noise process is zero everywhere except at lag \(\tau=0\).

\[R_{xx}(\tau) = E \left[ x(t) x^*(t-\tau)\right] = \sigma^2 \delta (\tau)\]

Related topic: Constructing the auto-correlation matrix in Matlab

Frequency Domain Characteristics:

Wiener-Khintchine Theorem states that for Wide Sense Stationary Process (WSS), the power spectral density function \(S_{xx}(f)\) of a random process can be obtained by Fourier Transform of auto-correlation function of the random process. In continuous time domain, this is represented as

\[S_{xx}(f) = F \left[R_{xx}(\tau) \right] = \int_{-\infty}^{\infty} R_{xx} (\tau) e ^{- j 2 \pi f \tau} d \tau\]

For the weakly defined white noise process, we find that the mean is a constant and its covariance does not vary with respect to time. This is a sufficient condition for a WSS process. Thus we can apply Weiner-Khintchine Theorem. Therefore, the power spectral density of the weakly defined white noise process is constant (flat) across the entire frequency spectrum (Figure 1). The value of the constant is equal to the variance or power of the noise signal.

\[S_{xx}(f) = F \left[R_{xx}(\tau) \right] = \int_{-\infty}^{\infty} \sigma^2 \delta (\tau) e ^{- j 2 \pi f \tau} d \tau = \sigma^2 \int_{-\infty}^{\infty} \delta (\tau) e ^{- j 2 \pi f \tau} = \sigma^2\]
Figure 1: Weiner-Khintchine theorem illustrated

Testing the characteristics of White Gaussian Noise in Matlab:

Generate a Gaussian white noise signal of length \(L=100,000\) using the randn function in Matlab and plot it. Let’s assume that the pdf is a Gaussian pdf with mean \(\mu=0\) and standard deviation \(\sigma=2\). Thus the variance of the Gaussian pdf is \(\sigma^2=4\). The theoretical PDF of Gaussian random variable is given by

\[f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} \cdot exp \left[ – \frac{\left( x – \mu\right)^2}{2 \sigma^2} \right] \]

More simulation techniques available in the following ebooks
Digital Modulations using Matlab
Digital Modulations using Python
Wireless Communication systems in Matlab

clear all; clc; close all;
L=100000; %Sample length for the random signal
mu=0;
sigma=2;
X=sigma*randn(L,1)+mu;

figure();
subplot(2,1,1)
plot(X);
title(['White noise : \mu_x=',num2str(mu),' \sigma^2=',num2str(sigma^2)])
xlabel('Samples')
ylabel('Sample Values')
grid on;
Figure 2: Simulated noise samples

Plot the histogram of the generated noise signal and verify the histogram by plotting against the theoretical pdf of the Gaussian random variable.

If you are inclined towards programming in Python, go here to know about plotting histogram using Matplotlib package.

subplot(2,1,2)
n=100; %number of Histrogram bins
[f,x]=hist(X,n);
bar(x,f/trapz(x,f)); hold on;
%Theoretical PDF of Gaussian Random Variable
g=(1/(sqrt(2*pi)*sigma))*exp(-((x-mu).^2)/(2*sigma^2));
plot(x,g);hold off; grid on;
title('Theoretical PDF and Simulated Histogram of White Gaussian Noise');
legend('Histogram','Theoretical PDF');
xlabel('Bins');
ylabel('PDF f_x(x)');
Figure 3: Plot of simulated & theoretical PDF for Gaussian RV

Compute the auto-correlation function of the white noise. The computed auto-correlation function has to be scaled properly. If the ‘xcorr’ function (inbuilt in Matlab) is used for computing the auto-correlation function, use the ‘biased’ argument in the function to scale it properly.

figure();
Rxx=1/L*conv(flipud(X),X);
lags=(-L+1):1:(L-1);

%Alternative method
%[Rxx,lags] =xcorr(X,'biased'); 
%The argument 'biased' is used for proper scaling by 1/L
%Normalize auto-correlation with sample length for proper scaling

plot(lags,Rxx); 
title('Auto-correlation Function of white noise');
xlabel('Lags')
ylabel('Correlation')
grid on;
Figure 4: Autocorrelation function of generated noise

Simulating the PSD:

Simulating the Power Spectral Density (PSD) of the white noise is a little tricky business. There are two issues here 1) The generated samples are of finite length. This is synonymous to applying truncating an infinite series of random samples. This implies that the lags are defined over a fixed range. ( FFT and spectral leakage – an additional resource on this topic can be found here) 2) The random number generators used in simulations are pseudo-random generators. Due these two reasons, you will not get a flat spectrum of psd when you apply Fourier Transform over the generated auto-correlation values.The wavering effect of the psd can be minimized by generating sufficiently long random signal and averaging the psd over several realizations of the random signal.

Simulating Gaussian White Noise as a Multivariate Gaussian Random Vector:

To verify the power spectral density of the white noise, we will use the approach of envisaging the noise as a composite of \(N\) Gaussian random variables. We want to average the PSD over \(L\) such realizations. Since there are \(N\) Gaussian random variables (\(N\) individual samples) per realization, the covariance matrix \( C_{xx}\) will be of dimension \(N \times N\). The vector of mean for this multivariate case will be of dimension \(1 \times N\).

Cholesky decomposition of covariance matrix gives the equivalent standard deviation for the multivariate case. Cholesky decomposition can be viewed as square root operation. Matlab’s randn function is used here to generate the multi-dimensional Gaussian random process with the given mean matrix and covariance matrix.

%Verifying the constant PSD of White Gaussian Noise Process
%with arbitrary mean and standard deviation sigma

mu=0; %Mean of each realization of Noise Process
sigma=2; %Sigma of each realization of Noise Process

L = 1000; %Number of Random Signal realizations to average
N = 1024; %Sample length for each realization set as power of 2 for FFT

%Generating the Random Process - White Gaussian Noise process
MU=mu*ones(1,N); %Vector of mean for all realizations
Cxx=(sigma^2)*diag(ones(N,1)); %Covariance Matrix for the Random Process
R = chol(Cxx); %Cholesky of Covariance Matrix
%Generating a Multivariate Gaussian Distribution with given mean vector and
%Covariance Matrix Cxx
z = repmat(MU,L,1) + randn(L,N)*R;

Compute PSD of the above generated multi-dimensional process and average it to get a smooth plot.

%By default, FFT is done across each column - Normal command fft(z)
%Finding the FFT of the Multivariate Distribution across each row
%Command - fft(z,[],2)
Z = 1/sqrt(N)*fft(z,[],2); %Scaling by sqrt(N);
Pzavg = mean(Z.*conj(Z));%Computing the mean power from fft

normFreq=[-N/2:N/2-1]/N;
Pzavg=fftshift(Pzavg); %Shift zero-frequency component to center of spectrum
plot(normFreq,10*log10(Pzavg),'r');
axis([-0.5 0.5 0 10]); grid on;
ylabel('Power Spectral Density (dB/Hz)');
xlabel('Normalized Frequency');
title('Power spectral density of white noise');
Figure 5: Power spectral density of generated noise

The PSD plot of the generated noise shows almost fixed power in all the frequencies. In other words, for a white noise signal, the PSD is constant (flat) across all the frequencies (\(- \infty\) to \(+\infty\)). The y-axis in the above plot is expressed in dB/Hz unit. We can see from the plot that the \(constant \; power = 10 log_{10}(\sigma^2) = 10 log_{10}(4) = 6\; dB\).

Application

In channel modeling, we often come across additive white Gaussian noise (AWGN) channel. To know more about the channel model and its simulation, continue reading this article: Simulate AWGN channel in Matlab & Python.

Rate this article: Note: There is a rating embedded within this post, please visit this post to rate it.

References:

[1] Robert Grover Brown, Introduction to Random Signal Analysis and Kalman Filtering. John Wiley and Sons, 1983.↗
[2] Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. WCB/McGraw-Hill, 1991.↗

Books by the author


Wireless Communication Systems in Matlab
Second Edition(PDF)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Python
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart

Digital Modulations using Matlab
(PDF ebook)

Note: There is a rating embedded within this post, please visit this post to rate it.
Checkout Added to cart
Hand-picked Best books on Communication Engineering
Best books on Signal Processing