As reiterated in the previous posts, a MIMO system is used to increase the capacity dramatically and also to improve the quality of a communication link. Increased capacity is obtained by spatial multiplexing, and increased quality is obtained by diversity techniques (space-time coding). Capacity equations of a MIMO system over a variety of channels (AWGN, fading channels) are of primary importance: it is desirable to know the capacity improvement offered by a MIMO system over the conventional SISO system. The capacity equations for a conventional SISO system over AWGN and fading channels were discussed here. As a prerequisite, readers are encouraged to go through the detailed discussion on channel capacity and Shannon's theorem too.

If you want to know about simulating a digital communication system in Matlab, check out this ebook.

For those who are directly jumping here (without reading the article on SISO channel capacity), a few definitions are given below. Others can conveniently skip these definitions.

## Entropy

The average amount of information per symbol (measured in bits/symbol) is called entropy. Given a set of \(N\) discrete information symbols – represented as a random variable \(\mathbf{X} \in \{x_1,x_2,…,x_{N} \} \) with probabilities given by the probability mass function \(p(x)=\{p_1,p_2,…,p_N \} \), the entropy of \(\mathbf{X}\) is given by

$$ h(\mathbf{X}) = \sum_{i=1}^{N} p_i log_2 \left[\frac{1}{p_i} \right] = \sum_{x \in X } p(x)log_2 \left[\frac{1}{p(x)} \right] = - \sum_{x \in X } p(x)log_2 p(x) \;\;\;\;\;\;\;\; (1)$$

Entropy is a measure of the uncertainty of a random variable \(\mathbf{X}\), and therefore reflects the amount of information required on average to describe the random variable. In general, it has the following bounds

$$ 0 \leq h(\mathbf{X}) \leq log_2(N) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (2)$$

Entropy hits the lower bound of zero (no uncertainty, therefore no information) for a completely deterministic system (one symbol has \(p_i=1\) and all others have zero probability). It reaches the upper bound when the input symbols \(x_i\) are equi-probable.
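As a quick numerical check of these bounds, the sketch below evaluates equation (1) at the two extremes (using NumPy; the four-symbol alphabet is an arbitrary choice for illustration):

```python
import numpy as np

def entropy(p):
    """Entropy in bits/symbol of a PMF given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # by convention 0*log2(0) = 0
    return -np.sum(p * np.log2(p))

N = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))  # deterministic source -> 0 bits
print(entropy(np.ones(N) / N))        # equiprobable source -> log2(4) = 2 bits
```

The deterministic PMF attains the lower bound of (2) and the uniform PMF attains the upper bound \(log_2(N)\).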

## Capacity and mutual information

The following figure represents a discrete memoryless channel (the noise term corrupts the input symbols independently), where the input and output are represented as random variables \(\mathbf{X}\) and \(\mathbf{Y}\) respectively. Statistically, such a channel can be expressed by transition or conditional probabilities. That is, given a set of inputs to the channel, the probability of observing the output of the channel is expressed as the conditional probability \(p(\mathbf{Y}|\mathbf{X})\).

For such a channel, the mutual information \(I(\mathbf{X};\mathbf{Y}) \) denotes the amount of information that one random variable contains about the other random variable

$$ I(\mathbf{X};\mathbf{Y}) = h(\mathbf{X}) - h(\mathbf{X}|\mathbf{Y}) = h(\mathbf{Y}) - h(\mathbf{Y}|\mathbf{X}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (3)$$

\(h(\mathbf{X})\) is the amount of information in \(\mathbf{X}\) before observing \(\mathbf{Y}\) and thus the above quantity can be seen as the reduction of uncertainty of \(\mathbf{X}\) from the observation of \(\mathbf{Y}\).

The information capacity \(C\) is obtained by maximizing this mutual information taken over all possible input distributions \(p(x)\) [1].

$$ C = \max_{p(x)} I(\mathbf{X};\mathbf{Y}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (4)$$
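To make the maximization in (4) concrete, the sketch below evaluates \(I(\mathbf{X};\mathbf{Y})\) for a binary symmetric channel over a grid of input distributions, using the form \(I = h(\mathbf{Y}) - h(\mathbf{Y}|\mathbf{X})\) from (3). The crossover probability of 0.1 is an arbitrary choice; the maximum occurs at the uniform input and matches the known BSC capacity \(1 - h_2(\epsilon)\):

```python
import numpy as np

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p*np.log2(p) - (1 - p)*np.log2(1 - p)

def mutual_info_bsc(q, eps):
    """I(X;Y) for a BSC with crossover eps and input distribution P(X=1)=q."""
    p_y1 = q*(1 - eps) + (1 - q)*eps   # output distribution
    return h2(p_y1) - h2(eps)          # I = h(Y) - h(Y|X)

eps = 0.1
qs = np.linspace(0.0, 1.0, 1001)       # grid over input distributions p(x)
C = max(mutual_info_bsc(q, eps) for q in qs)
print(C, 1 - h2(eps))                  # maximum (at q = 0.5) equals 1 - h2(eps)
```

The brute-force grid search stands in for the maximization over \(p(x)\) in (4); for the BSC the optimum input is uniform by symmetry.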

## MIMO flat fading Channel model

An \(N_t \times N_r\) MIMO system over a flat fading channel can be represented in the complex baseband notation as

$$ \mathbf{y} = \mathbf{H} \mathbf{x} + \mathbf{n} \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (5)$$

where,

\(\mathbf{y}\) – received response from the channel – dimension \((N_r \times 1)\)

\(\mathbf{H}\) – the complex channel matrix of dimension \((N_r \times N_t)\)

\(\mathbf{x}\) – vector representing the transmitted signal – dimension \((N_t \times 1)\). Assuming Gaussian signals, i.e., \(\mathbf{x} \sim \mathcal{N}(0,\mathbf{K}_x)\), where \(\mathbf{K}_x \triangleq E[\mathbf{x}\mathbf{x}^H]\) is the covariance matrix of the transmit vector \(\mathbf{x}\)

\(N_t\) – the number of transmit antennas

\(N_r\) – the number of receive antennas

\(\mathbf{n}\) – complex baseband additive white Gaussian noise vector of dimension \((N_r \times 1)\). It is assumed that the noise is spatially white, \( \mathbf{n} \sim \mathcal{N}(0,\mathbf{K}_n) \), where \(\mathbf{K}_n\) is the covariance matrix of the noise.

**Note:** The trace of the covariance matrix of the transmit vector gives the average transmit power, \(Tr(\mathbf{K}_x) = E[\left \| \mathbf{x} \right \| ^2] \leq P_t\), where \(P_t\) is the transmit power constraint applied at the transmitter.
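A minimal simulation of the model in equation (5), assuming i.i.d. Rayleigh fading entries for \(\mathbf{H}\) and spatially white Gaussian noise (the antenna counts and power levels below are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, Pt = 2, 3, 1.0

# Rayleigh flat-fading channel: i.i.d. unit-variance complex Gaussian entries
H = (rng.standard_normal((Nr, Nt)) + 1j*rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# Gaussian transmit vector with equal power per antenna (Kx = Pt/Nt * I)
x = np.sqrt(Pt/(2*Nt)) * (rng.standard_normal((Nt, 1)) + 1j*rng.standard_normal((Nt, 1)))

# Spatially white complex noise, variance sigma_n^2 per receive antenna
sigma_n2 = 0.1
n = np.sqrt(sigma_n2/2) * (rng.standard_normal((Nr, 1)) + 1j*rng.standard_normal((Nr, 1)))

y = H @ x + n     # equation (5): received vector, dimension (Nr x 1)
print(y.shape)
```

Note how the dimensions line up exactly as listed above: \((N_r \times N_t)(N_t \times 1) + (N_r \times 1) = (N_r \times 1)\).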

## Signal Covariance Matrices

It was assumed that the input signal vector \(\mathbf{x}\) and the noise vector \(\mathbf{n}\) are uncorrelated. Therefore, the covariance matrix of the received signal vector is given by

$$ \begin{align}
\mathbf{K}_y = E[\mathbf{Y}\mathbf{Y}^H] &= E[(\mathbf{H}\mathbf{X}+\mathbf{N})(\mathbf{H}\mathbf{X}+\mathbf{N})^H] \\
&= \mathbf{H} E[\mathbf{X}\mathbf{X}^H] \mathbf{H}^H + E[\mathbf{N}\mathbf{N}^H] \\
&= \mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (6)
\end{align} $$

In the above equation, the \(^H\) operator on the matrices denotes the Hermitian transpose operation. Thus, there are three covariance matrices involved here:

\( \mathbf{K}_x = E[\mathbf{X}\mathbf{X}^H] \) – Covariance matrix of input signal vector

\( \mathbf{K}_y = E[\mathbf{Y}\mathbf{Y}^H]\) – Covariance matrix of channel response vector

\( \mathbf{K}_n = E[\mathbf{N}\mathbf{N}^H]\) – Covariance matrix of noise vector
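The relation in equation (6) can be verified empirically by averaging over many channel uses for a fixed, known \(\mathbf{H}\). In the sketch below, the matrix entries, covariances, and sample count are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, Nr, M = 2, 2, 200000                       # M independent channel uses

H = np.array([[1.0+0.5j, 0.2-0.1j],
              [0.3+0.0j, 0.8+0.4j]])           # fixed, known channel matrix
Kx = 0.5*np.eye(Nt)                            # input covariance (white input)
Kn = 0.1*np.eye(Nr)                            # noise covariance (spatially white)

# Draw X and N as independent zero-mean complex Gaussians with those covariances
X = np.sqrt(0.5/2)*(rng.standard_normal((Nt, M)) + 1j*rng.standard_normal((Nt, M)))
N = np.sqrt(0.1/2)*(rng.standard_normal((Nr, M)) + 1j*rng.standard_normal((Nr, M)))
Y = H @ X + N

Ky_est = (Y @ Y.conj().T) / M                  # sample covariance of the output
Ky_theory = H @ Kx @ H.conj().T + Kn           # equation (6)
print(np.max(np.abs(Ky_est - Ky_theory)))      # small estimation error
```

The sample covariance converges to \(\mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n\) as the number of channel uses grows; the cross terms average to zero because \(\mathbf{X}\) and \(\mathbf{N}\) are uncorrelated.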

## Channel State Information

The knowledge of the channel matrix \( \mathbf{H} \) at the transmitter is called *Channel State Information at the Transmitter (CSIT)*. If the receiver knows the present state of the channel matrix, that knowledge is called *Channel State Information at the Receiver (CSIR)*. Click here for more information on CSI and another related concept called condition number. MIMO capacity for the cases where the CSIT is known and unknown at the transmitter will be discussed later.

## Capacity with transmit power constraint

Now, we would like to evaluate the capacity for the most practical scenario, where the average power that can be expended at the transmitter, given by \(P = Tr(\mathbf{K}_x) = E[\left \| \mathbf{x} \right \| ^2] \), is limited to \(P_t\). Thus, the channel capacity is now constrained by this average transmit power, given as

$$ C = \max_{p(x) \;,\; P \leq P_t} I(X;Y) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (7)$$

For the further derivations, we assume that the receiver possesses perfect knowledge about the channel. Furthermore, we assume that the input random variable \(X\) is independent of the noise \(N\) and the noise vector is zero mean Gaussian distributed with covariance matrix \( \mathbf{K}_n \) -i.e, \(N \sim \mathcal{N}(0, \mathbf{K}_n) \).

Note that both the input symbols in the vector \(\mathbf{x}\) and the output symbols in the vector \(\mathbf{y}\) take continuous values upon transmission and reception, while remaining discrete in time (continuous-input continuous-output discrete memoryless channel – CCMC). For such continuous random variables, the differential entropy \(h_d(.)\) is considered. Expressing the mutual information in terms of differential entropy,

$$ \begin{align}
I(\mathbf{X};\mathbf{Y}) &= h_d(\mathbf{Y}) - h_d(\mathbf{Y}|\mathbf{X}) \\
&= h_d(\mathbf{Y}) - h_d(\mathbf{H}\mathbf{X} + \mathbf{N}|\mathbf{X}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (8)
\end{align} $$

Since it is assumed that the channel is perfectly known at the receiver, the uncertainty of \(\mathbf{H}\mathbf{X}\) conditioned on \(\mathbf{X}\) is zero, i.e, \( h_d(\mathbf{H}\mathbf{X}|\mathbf{X}) = 0 \). Furthermore, it is assumed that the noise \(\mathbf{N}\) is independent of the input \(\mathbf{X}\), i.e, \(h_d(\mathbf{N}|\mathbf{X}) = h_d(\mathbf{N}) \). Thus, the mutual information is

$$ I(\mathbf{X};\mathbf{Y}) = h_d(\mathbf{Y}) - h_d(\mathbf{N}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (9)$$

Following the procedure laid out here, the differential entropy \( h_d (\mathbf{N})\) is calculated as

$$ h_d(\mathbf{N})= log_2(det[ \pi e \mathbf{K}_n ]) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (10) $$

Using \((6)\) and a procedure similar to that used for \(h_d(\mathbf{N})\) above, the differential entropy \( h_d (\mathbf{Y})\) is given by

$$h_d (\mathbf{Y}) = log_2(det[ \pi e (\mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n) ]) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (11) $$

Substituting equations \( (10) \) and \( (11) \) in \( (9) \), the capacity is given by

$$ \begin{align}
C &= h_d(\mathbf{Y}) - h_d(\mathbf{N}) \\
&= log_2 \left( det \left[ \pi e \left(\mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n \right) \right] \right) - log_2 \left( det \left[ \pi e \mathbf{K}_n \right] \right ) \\
&= log_2 \left( det \left[ \mathbf{H} \mathbf{K}_x \mathbf{H}^H +\mathbf{K}_n \right] \right) - log_2 \left(det \left[ \mathbf{K}_n \right] \right) \\
&= log_2 \left( det \left[ \left( \mathbf{H} \mathbf{K}_x \mathbf{H}^H +\mathbf{K}_n \right) \left( \mathbf{K}_n \right)^{-1} \right] \right) \\
&= log_2 \left( det \left[ \left( \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \left( \mathbf{K}_n \right)^{-1} + \mathbf{I}_{N_r} \right] \right) \\
&= log_2 \left( det \left[ \mathbf{I}_{N_r} + (\mathbf{K}_n)^{-1} (\mathbf{H} \mathbf{K}_x \mathbf{H}^H ) \right] \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (12)
\end{align} $$
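The chain of equalities leading to (12) can be checked numerically for a random channel realization: the difference of differential entropies in (10) and (11) equals the single determinant expression. A sketch, with arbitrary illustrative dimensions and covariances:

```python
import numpy as np

rng = np.random.default_rng(4)
Nr, Nt = 3, 2
H = (rng.standard_normal((Nr, Nt)) + 1j*rng.standard_normal((Nr, Nt))) / np.sqrt(2)
Kx = 0.5*np.eye(Nt)          # input covariance
Kn = 0.2*np.eye(Nr)          # noise covariance

def logdet(A):
    """log2 of the determinant of a positive-definite matrix."""
    return np.real(np.log2(np.linalg.det(A)))

hy = logdet(np.pi*np.e*(H @ Kx @ H.conj().T + Kn))    # equation (11)
hn = logdet(np.pi*np.e*Kn)                             # equation (10)
C1 = hy - hn                                           # equation (9)
C2 = logdet(np.eye(Nr) + np.linalg.inv(Kn) @ (H @ Kx @ H.conj().T))  # equation (12)
print(C1 - C2)               # ~0: the two forms agree
```

The \(\pi e\) factors cancel in the difference, and \(det(\mathbf{A}\mathbf{B}^{-1}) = det(\mathbf{A})/det(\mathbf{B})\) collapses the two determinants into one, exactly as in the derivation.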

For the case where the noise is uncorrelated (spatially white) between the antenna branches, \( \mathbf{K}_n = \sigma_n^2 \mathbf{I}_{N_r} \), so that \( \mathbf{K}_n^{-1} = \frac{1}{ \sigma_n^2 }\mathbf{I}_{N_r} \), where \( \mathbf{I}_{N_{r}} \) is the identity matrix of dimension \( N_r \times N_r \).

Thus the capacity for MIMO flat fading channel can be written as

$$

\boxed{C= log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right]}

\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (13)

$$

The capacity equation \((13)\) contains random variables (the entries of the channel matrix \(\mathbf{H}\)), and therefore the capacity is itself a random variable. To obtain meaningful results for fading channels, two different capacities can be defined.

If the CSIT is **UNKNOWN** at the transmitter, it is optimal to distribute the available transmit power evenly across the transmit antennas. That is, \( \mathbf{K}_x = \frac{P_t}{N_t} \mathbf{I}_{N_{t}} \), where \( \mathbf{I}_{N_{t}}\) is the identity matrix of dimension \(N_t \times N_t\).
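With spatially white noise and equal power allocation, equation (13) reduces to a single determinant evaluation. A minimal sketch (the channel realization, transmit power, and noise variance below are illustrative; for \(\mathbf{H} = \mathbf{I}\) the result collapses to a sum of per-antenna SISO capacities, which makes a handy sanity check):

```python
import numpy as np

def mimo_capacity(H, Pt, sigma_n2):
    """Capacity per equation (13) with Kx = Pt/Nt * I (CSIT unknown), in bits/s/Hz."""
    Nr, Nt = H.shape
    Kx = (Pt/Nt) * np.eye(Nt)
    M = np.eye(Nr) + (H @ Kx @ H.conj().T) / sigma_n2
    return np.real(np.log2(np.linalg.det(M)))

H = np.array([[1.0, 0.5j],
              [0.2, 0.9]])                 # one example 2x2 channel realization
print(mimo_capacity(H, Pt=1.0, sigma_n2=0.1))
```

For \(\mathbf{H} = \mathbf{I}_2\), \(P_t = 1\), \(\sigma_n^2 = 0.1\), the formula gives \(2 \, log_2(1 + 0.5/0.1) = 2 \, log_2(6)\), i.e., two parallel SISO channels each carrying half the power.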

## Ergodic Capacity

Ergodic capacity is defined as the statistical average of the mutual information, where the expectation is taken over \( \mathbf{H} \)

$$ \boxed{C = \mathbb{E} \left[ log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right] \right] } \;\;\;\;\;\;\;\;\;\;\;\; (14) $$
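Equation (14) has no simple closed form for Rayleigh fading, but it is easy to estimate by Monte Carlo averaging of (13) over channel realizations. A sketch, assuming i.i.d. Rayleigh fading and equal power allocation (antenna counts, SNR, and trial count are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, Nr, Pt, sigma_n2 = 2, 2, 1.0, 0.1    # 10 dB transmit SNR, illustrative
trials = 5000

caps = np.empty(trials)
for k in range(trials):
    # i.i.d. Rayleigh channel realization, unit average power per entry
    H = (rng.standard_normal((Nr, Nt)) + 1j*rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    M = np.eye(Nr) + (Pt/(Nt*sigma_n2)) * (H @ H.conj().T)
    caps[k] = np.real(np.log2(np.linalg.det(M)))

ergodic_C = caps.mean()    # equation (14): expectation over H by sample average
print(ergodic_C)
```

Increasing the trial count tightens the estimate; the sample mean converges to the expectation in (14).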

## Outage Capacity

The outage capacity \(C_{out,q}\) is defined as the information rate below which the instantaneous mutual information falls with a prescribed probability, expressed as a percentage \(q\).

$$ \boxed{P \left( log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right] < C_{out,q \%} \right) = q \% } \;\;\;\;\;\;\;\;\;\;\; (15) $$
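Numerically, the outage capacity is just the \(q\)-th percentile of the instantaneous capacity samples, so the same Monte Carlo loop used for the ergodic capacity yields it. A sketch with illustrative parameters (i.i.d. Rayleigh fading, equal power allocation, 10% outage):

```python
import numpy as np

rng = np.random.default_rng(3)
Nt, Nr, Pt, sigma_n2, q = 2, 2, 1.0, 0.1, 10   # q = 10% outage, illustrative
trials = 5000

caps = np.empty(trials)
for k in range(trials):
    # i.i.d. Rayleigh channel realization
    H = (rng.standard_normal((Nr, Nt)) + 1j*rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    M = np.eye(Nr) + (Pt/(Nt*sigma_n2)) * (H @ H.conj().T)
    caps[k] = np.real(np.log2(np.linalg.det(M)))

# Equation (15): the instantaneous capacity falls below C_out in q% of realizations
C_out = np.percentile(caps, q)
print(C_out)
```

By construction, the outage capacity at any \(q < 50\%\) sits below the ergodic (mean) capacity: it is the rate the fading channel can guarantee in all but \(q\%\) of realizations.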

## A word on capacity of a MIMO system over AWGN Channels

The capacity of a MIMO system over an AWGN channel can be derived in a very similar manner. The only difference is the channel matrix: for the AWGN channel, the channel matrix is a constant. The final capacity equation is very similar and follows the lines of the capacity of a SISO system over an AWGN channel.

## References:

[1] Andrea J. Goldsmith and Pravin P. Varaiya, "Capacity, mutual information, and coding for finite-state Markov channels," IEEE Transactions on Information Theory, vol. 42, no. 3, May 1996.