Generate multiple sequences of correlated random variables

In the previous post, a method for generating two sequences of correlated random variables was discussed. Generation of multiple sequences of correlated random variables, given a correlation matrix is discussed here.

Correlation Matrix

Correlation matrix defines correlation among N variables. It is a symmetric N \times N matrix with the (ij)^{th} element equal to the correlation coefficient r_{ij} between the i^{th} and the j^{th} variable. The diagonal elements (correlations of variables with themselves) are always equal to 1.

Sample problem:

Let’s say we would like to generate three sets of random sequences X,Y,Z with the following correlation relationships.

  1. Correlation co-efficient between X and Y is 0.5
  2. Correlation co-efficient between X and Z is 0.3
  3. Obviously the variable X  correlates with itself 100% – i.e, correlation-coefficient is 1

Putting all these relationships in a compact matrix form, gives the correlation matrix. We take arbitrary correlation value (0.3) for the relationship between Y and Z.

\bordermatrix{ & X & Y & Z \cr X & 1 & -0.5 & .3 \cr Y & -0.5 & 1 & .2 \cr Z & .3 & .2 & 1 }

Now, the task is to generate three sets of random numbers X,Y and Z that follows the relationship above. The problem can be addressed in many ways. Two most common methods finding the solution are

  1. Cholesky Decomposition method
  2. Spectral Decomposition ( also called Eigen Vector Decomposition) method

The Cholesky Decomposition method is discussed here.

This article is part of the book
Wireless Communication Systems in Matlab (second edition), ISBN: 979-8648350779 available in ebook (PDF) format and Paperback (hardcopy) format.

Generating Correlated random number using Cholesky Decomposition:

Cholesky decomposition is the matrix equivalent of taking square root operation on a given matrix. As with any scalar values, positive square root is only possible if the given number is a positive (Imaginary roots do exist otherwise). Similarly, if a matrix need to be decomposed into square-root equivalent, the matrix need to be positive definite.

The method discussed here, seeks to decompose the given correlation matrix using Cholesky decomposition.

C = U^TU = LL^T

where U and L are upper and lower triangular matrices. We will consider Upper triangular matrix here. Equivalently, lower triangular matrix can also be used, in which case the order of output needs to be reversed.

C = U^TU

For this decomposition to work, the correlation matrix should be positive definite. The correlated random sequences R_c=[X,Y, Z] (where X,Y,Z are column vectors) that follow the above relationship can be generated by multiplying the uncorrelated random numbers R  with U .

R_c = RU

Steps to follow:

Generate three sequences of uncorrelated random numbers R – each drawn from a normal distribution. For this case, the R matrix will be of size k \times 3 where k is the number of  samples we wish to generate and we allocate the k samples in three columns, where the columns indicate the place holder for each variable X, Y and Z. Multiply this matrix with the Cholesky decomposed upper triangular version of the correlation matrix.

Python code

import numpy as np
from scipy.linalg import cholesky
from scipy.stats import pearsonr #to calculate correlation coefficient

#for plotting and visualization
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
import seaborn as sns

C = np.array([[1, -0.5, 0.3], 
              [-0.5, 1, 0.2],
              [0.3, 0.2, 1]]) #Construct correlation matrix
U = cholesky(C) #Cholesky decomposition
R = np.random.randn(10000,3) #Three uncorrelated sequences
Rc = R @ U #Array of correlated random sequences

#compute and display correlation coeff from generated sequences
def pearsonCorr(x, y, **kws): 
    (r, _) = pearsonr(x, y) #returns Pearson’s correlation coefficient, 2-tailed p-value)
    ax = plt.gca()
    ax.annotate("r = {:.2f} ".format(r),xy=(.7, .9), xycoords=ax.transAxes)
    
#Visualization
df = pd.DataFrame(data=Rc, columns=['X','Y','Z'])
graph = sns.pairplot(df)
graph.map(pearsonCorr)
Pairplot of correlated random variables generated using Cholesky decomposition (Python)
Figure 1: Pairplot of correlated random variables generated using Cholesky decomposition (Python)

Matlab code

x=[  1  0.5 0.3; 0.5  1  0.3; 0.3 0.3  1 ;]; %Correlation matrix
U=chol(x); %Cholesky decomposition 

R=randn(10000,3); %Random data in three columns each for X,Y and Z
Rc=R*U; %Correlated matrix Rc=[X Y Z]

%Verify Correlation-Coeffs of generated vectors
coeffMatrixXX=corrcoef(Rc(:,1),Rc(:,1));
coeffMatrixXY=corrcoef(Rc(:,1),Rc(:,2));
coeffMatrixXZ=corrcoef(Rc(:,1),Rc(:,3));

%Extract the required correlation coefficients
coeffXX=coeffMatrixXX(1,2) %Correlation Coeff for XX;
coeffXY=coeffMatrixXY(1,2) %Correlation Coeff for XY;
coeffXZ=coeffMatrixXZ(1,2) %Correlation Coeff for XZ;

%Scatterplots
subplot(3,1,1)
plot(Rc(:,1),Rc(:,1),'b.')
title(['Scatterd Plot - X and X calculated \rho=' num2str(coeffXX)])
xlabel('X')
ylabel('X')

subplot(3,1,2)
plot(Rc(:,1),Rc(:,2),'r.')
title(['Scatterd Plot - X and Y calculated \rho=' num2str(coeffXY)])
xlabel('X')
ylabel('Y')

subplot(3,1,3)
plot(Rc(:,1),Rc(:,3),'m.')
title(['Scatterd Plot - X and Z calculated \rho=' num2str(coeffXZ)])
xlabel('X')
ylabel('Z')

Scattered plots to verify the simulated data

Multivariate Correlated Random Numbers Cholesky Matlab
Figure 2: Scatter plots showing the correlation trend between the three random variables (Matlab)

Rate this article: PoorBelow averageAverageGoodExcellent (10 votes, average: 4.10 out of 5)

Further reading

[1] Richard Taylor, “Interpretation of correlation coefficient: A basic review”, Journal of diagnostic medical sonography, Jan/Feb 1990.↗

Topics in this chapter

Random Variables - Simulating Probabilistic Systems
● Introduction
Plotting the estimated PDF
● Univariate random variables
 □ Uniform random variable
 □ Bernoulli random variable
 □ Binomial random variable
 □ Exponential random variable
 □ Poisson process
 □ Gaussian random variable
 □ Chi-squared random variable
 □ Non-central Chi-Squared random variable
 □ Chi distributed random variable
 □ Rayleigh random variable
 □ Ricean random variable
 □ Nakagami-m distributed random variable
Central limit theorem - a demonstration
● Generating correlated random variables
 □ Generating two sequences of correlated random variables
 □ Generating multiple sequences of correlated random variables using Cholesky decomposition
Generating correlated Gaussian sequences
 □ Spectral factorization method
 □ Auto-Regressive (AR) model

Books by the author

Wireless Communication Systems in Matlab
Wireless Communication Systems in Matlab
Second Edition(PDF)

PoorBelow averageAverageGoodExcellent (162 votes, average: 3.78 out of 5)

Digital modulations using Python
Digital Modulations using Python
(PDF ebook)

PoorBelow averageAverageGoodExcellent (123 votes, average: 3.60 out of 5)

digital_modulations_using_matlab_book_cover
Digital Modulations using Matlab
(PDF ebook)

PoorBelow averageAverageGoodExcellent (126 votes, average: 3.70 out of 5)

Hand-picked Best books on Communication Engineering
Best books on Signal Processing

Post your valuable comments !!!