Capacity of a MIMO system over Fading Channels

As reiterated in the previous posts, a MIMO system is used to increase capacity dramatically and to improve the quality of a communication link. Increased capacity is obtained by spatial multiplexing, and increased quality is obtained by diversity techniques (space-time coding). The capacity equations of a MIMO system over a variety of channels (AWGN, fading channels) are of primary importance, because they quantify the capacity improvement that a MIMO system offers over the conventional SISO system. The capacity equations for a conventional SISO system over AWGN and fading channels were discussed in an earlier post. As a prerequisite, readers are encouraged to go through the detailed discussion on channel capacity and Shannon’s Theorem too.


For those who are directly jumping here (without reading the article on SISO channel capacity), a few definitions are given below. Others can conveniently skip these definitions.

Entropy

The average amount of information per symbol (measured in bits/symbol) is called Entropy. Given a set of $$N$$ discrete information symbols, represented as a random variable $$\mathbf{X} \in \{x_1,x_2,\ldots,x_{N} \}$$ with probabilities given by a Probability Mass Function $$p(x)=\{p_1,p_2,\ldots,p_N \}$$, the entropy of $$\mathbf{X}$$ is given by

$$h(\mathbf{X}) = \sum_{i=1}^{N} p_i log_2 \left[\frac{1}{p_i} \right] = \sum_{x \in X } p(x)log_2 \left[\frac{1}{p(x)} \right] = - \sum_{x \in X } p(x)log_2 p(x) \;\;\;\;\;\;\;\; (1)$$

Entropy is a measure of the uncertainty of a random variable $$\mathbf{X}$$ and therefore reflects the amount of information required, on average, to describe the random variable. In general, it has the following bounds

$$0 \leq h(\mathbf{X}) \leq log_2(N) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (2)$$

Entropy hits the lower bound of zero (no uncertainty, therefore no information) for a completely deterministic system (probability of correct transmission $$p_i=1$$). It reaches the upper bound when the input symbols $$x_i$$ are equi-probable.
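The bounds in equation (2) can be checked numerically. The following is a minimal Python sketch; the function name `entropy_bits` and the three example PMFs are illustrative choices, not part of the original discussion.

```python
import math

def entropy_bits(pmf):
    """Entropy in bits/symbol of a discrete PMF given as a list of probabilities."""
    if abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with p = 0 contribute nothing, since p*log2(1/p) -> 0 as p -> 0
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

N = 4
deterministic = [1.0, 0.0, 0.0, 0.0]    # one symbol is certain: no uncertainty
skewed = [0.5, 0.25, 0.125, 0.125]      # somewhere between the two bounds
uniform = [1.0 / N] * N                 # equi-probable symbols: maximum entropy

h_det = entropy_bits(deterministic)
h_skewed = entropy_bits(skewed)
h_uniform = entropy_bits(uniform)

print("deterministic:", h_det)
print("skewed       :", h_skewed)
print("uniform      :", h_uniform, "vs log2(N) =", math.log2(N))
```

The deterministic source sits on the lower bound, the equi-probable source reaches $$log_2(N)$$, and any other distribution falls in between.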

Capacity and mutual information

The following figure represents a discrete memoryless channel (the noise corrupts each input symbol independently), where the input and output are represented as random variables $$\mathbf{X}$$ and $$\mathbf{Y}$$ respectively. Statistically, such a channel can be described by transition (conditional) probabilities. That is, given an input to the channel, the probability of observing a particular output is expressed as the conditional probability $$p(\mathbf{Y}|\mathbf{X})$$

For such a channel, the mutual information $$I(\mathbf{X};\mathbf{Y})$$ denotes the amount of information that one random variable contains about the other random variable

$$I(\mathbf{X};\mathbf{Y}) = h(\mathbf{X}) - h(\mathbf{X}|\mathbf{Y}) = h(\mathbf{Y}) - h(\mathbf{Y}|\mathbf{X}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (3)$$

$$h(\mathbf{X})$$ is the amount of information in $$\mathbf{X}$$ before observing $$\mathbf{Y}$$ and thus the above quantity can be seen as the reduction of uncertainty of $$\mathbf{X}$$ from the observation of $$\mathbf{Y}$$.

The information capacity $$C$$ is obtained by maximizing this mutual information taken over all possible input distributions $$p(x)$$ [1].

$$C = \max_{p(x)} I(\mathbf{X};\mathbf{Y}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (4)$$
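As a small illustration of equation (4), the sketch below computes the mutual information of a binary symmetric channel (BSC) and finds its capacity by a brute-force search over the input distribution. The BSC, the crossover probability `eps = 0.1` and the grid resolution are assumptions made for this example only; they do not appear in the original discussion.

```python
import math

def mutual_information_bits(p_x, channel):
    """I(X;Y) in bits, where channel[i][j] = p(y_j | x_i)."""
    n_out = len(channel[0])
    # Output distribution: p(y) = sum_x p(x) p(y|x)
    p_y = [sum(p_x[i] * channel[i][j] for i in range(len(p_x)))
           for j in range(n_out)]
    total = 0.0
    for i, px in enumerate(p_x):
        for j in range(n_out):
            joint = px * channel[i][j]          # p(x, y)
            if joint > 0:
                total += joint * math.log2(joint / (px * p_y[j]))
    return total

eps = 0.1                                       # crossover probability (assumed)
bsc = [[1 - eps, eps],
       [eps, 1 - eps]]

# Brute-force maximization over the input distribution p(x) = [a, 1 - a]
grid = [k / 1000 for k in range(1001)]
best_a = max(grid, key=lambda a: mutual_information_bits([a, 1 - a], bsc))
capacity = mutual_information_bits([best_a, 1 - best_a], bsc)

# Well-known closed form for the BSC: C = 1 - H_b(eps)
binary_entropy = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
print("maximizing p(x=0):", best_a)
print("capacity (search):", capacity)
print("capacity (closed):", 1 - binary_entropy)
```

For this symmetric channel the search settles on equi-probable inputs, and the maximized mutual information agrees with the closed-form BSC capacity.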

A $$N_t \times N_r$$ MIMO system over a flat fading channel can be represented in the complex baseband notation.

$$\mathbf{y} = \mathbf{H} \mathbf{x} + \mathbf{n} \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (5)$$

where,
$$\mathbf{y}$$ – received response from the channel – dimension $$(N_r \times 1)$$
$$\mathbf{H}$$ – the complex channel matrix of dimension $$(N_r \times N_t)$$
$$\mathbf{x}$$ – vector representing transmitted signal – dimension $$(N_t \times 1)$$. Assuming Gaussian signals, i.e., $$\mathbf{x} \sim \mathcal{N}(0,\mathbf{K}_x)$$, where $$\mathbf{K}_x \triangleq E[\mathbf{x}\mathbf{x}^H]$$ is the covariance matrix of the transmit vector $$\mathbf{x}$$
$$N_t$$ – the number of transmit antennas
$$N_r$$ – the number of receive antennas
$$\mathbf{n}$$ – complex baseband additive white Gaussian noise vector of dimension $$(N_r \times 1)$$. The noise is assumed to be zero-mean Gaussian, $$\mathbf{n} \sim \mathcal{N}(0,\mathbf{K}_n)$$, where $$\mathbf{K}_n$$ is the covariance matrix of the noise. For spatially white noise, $$\mathbf{K}_n$$ is a scaled identity matrix.

Note: The trace of the covariance matrix of the transmit vector gives the average transmit power, $$Tr(\mathbf{K}_x) = E[\left \| \mathbf{x} \right \| ^2] \leq P_t$$, where $$P_t$$ is the transmit power constraint applied at the transmitter.

Signal Covariance Matrices

It is assumed that the input signal vector $$\mathbf{x}$$ and the noise vector $$\mathbf{n}$$ are uncorrelated. Therefore, the covariance matrix of the received signal vector is given by

\begin{align} E[\mathbf{Y}\mathbf{Y}^H] &= E[(\mathbf{H}\mathbf{X}+\mathbf{N})(\mathbf{H}\mathbf{X}+\mathbf{N})^H] \\ &= \mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (6) \end{align}

In the above equation, the $$^H$$ operator denotes the Hermitian (conjugate) transpose. Thus, there are three covariance matrices involved here

$$\mathbf{K}_x = E[\mathbf{X}\mathbf{X}^H]$$ – Covariance matrix of input signal vector
$$\mathbf{K}_y = E[\mathbf{Y}\mathbf{Y}^H]$$ – Covariance matrix of channel response vector
$$\mathbf{K}_n = E[\mathbf{N}\mathbf{N}^H]$$ – Covariance matrix of noise vector

Channel State Information

The knowledge of the channel matrix $$\mathbf{H}$$ at the transmitter is called Channel State Information at the Transmitter (CSIT). If the receiver knows the present state of the channel matrix, that knowledge is called Channel State Information at the Receiver (CSIR).

The MIMO capacity for the cases where the CSIT is known and unknown is discussed later.

Capacity with transmit power constraint

Now, we would like to evaluate capacity for the most practical scenario, where the average power, given by $$P = Tr(\mathbf{K}_x) = E[\left \| \mathbf{x} \right \| ^2]$$, that can be expended at the transmitter is limited to $$P_t$$. Thus, the channel capacity is now constrained by this average transmit power, given as

$$C = \max_{p(x) \;,\; P \leq P_t} I(X;Y) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (7)$$

For the further derivations, we assume that the receiver possesses perfect knowledge about the channel. Furthermore, we assume that the input random variable $$X$$ is independent of the noise $$N$$ and the noise vector is zero mean Gaussian distributed with covariance matrix $$\mathbf{K}_n$$ -i.e, $$N \sim \mathcal{N}(0, \mathbf{K}_n)$$.

Note that both the input symbols in the vector $$\mathbf{x}$$ and the output symbols in the vector $$\mathbf{y}$$ take continuous values upon transmission and reception and the values are discrete in time (Continuous input Continuous output discrete Memoryless Channel – CCMC). For such continuous random variable, differential entropy – $$h_d(.)$$ is considered. Expressing the mutual information in terms of differential entropy,

\begin{align} I(\mathbf{X};\mathbf{Y}) &=h_d(\mathbf{Y}) - h_d(\mathbf{Y}|\mathbf{X}) \\ &=h_d(\mathbf{Y}) - h_d(\mathbf{H}\mathbf{X} + \mathbf{N}|\mathbf{X}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (8) \end{align}

Since the channel is assumed to be perfectly known at the receiver, the term $$\mathbf{H}\mathbf{X}$$ is completely determined once $$\mathbf{X}$$ is given and contributes no uncertainty. Because differential entropy is unaffected by a known shift, $$h_d(\mathbf{H}\mathbf{X} + \mathbf{N}|\mathbf{X}) = h_d(\mathbf{N}|\mathbf{X})$$. Furthermore, the noise $$\mathbf{N}$$ is assumed to be independent of the input $$\mathbf{X}$$, i.e., $$h_d(\mathbf{N}|\mathbf{X}) = h_d(\mathbf{N})$$. Thus, the mutual information is

$$I(\mathbf{X};\mathbf{Y}) =h_d(\mathbf{Y}) - h_d(\mathbf{N}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (9)$$

Following the same procedure as in the SISO derivation, the differential entropy $$h_d (\mathbf{N})$$ is calculated as

$$h_d(\mathbf{N})= log_2(det[ \pi e \mathbf{K}_n ]) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (10)$$

Using $$(6)$$ and a procedure similar to the one used for $$h_d(\mathbf{N})$$ above, the differential entropy $$h_d (\mathbf{Y})$$ is given by

$$h_d (\mathbf{Y}) = log_2(det[ \pi e (\mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n) ]) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (11)$$

Substituting equations $$(10)$$ and $$(11)$$ in $$(9)$$, the capacity is given by

\begin{align} C & = h_d(\mathbf{Y}) - h_d(\mathbf{N}) \\ & = log_2 \left( det \left[ \pi e \left(\mathbf{H} \mathbf{K}_x \mathbf{H}^H + \mathbf{K}_n \right) \right] \right) - log_2 \left( det \left[ \pi e \mathbf{K}_n \right] \right ) \\ &=log_2 \left( det \left[ \mathbf{H} \mathbf{K}_x \mathbf{H}^H +\mathbf{K}_n \right] \right) -log_2 \left(det \left[ \mathbf{K}_n \right] \right) \\ &=log_2 \left( det \left[ \left( \mathbf{H} \mathbf{K}_x \mathbf{H}^H +\mathbf{K}_n \right) \left( \mathbf{K}_n \right)^{-1} \right] \right) \\ &=log_2 \left( det \left[ \left( \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \left( \mathbf{K}_n \right)^{-1} + \mathbf{I}_{N_r} \right] \right) \\ &=log_2 \left( det \left[ \mathbf{I}_{N_r} + (\mathbf{K}_n)^{-1} (\mathbf{H} \mathbf{K}_x \mathbf{H}^H ) \right] \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (12) \end{align}

For the case where the noise is uncorrelated (spatially white) between the antenna branches, $$\mathbf{K}_n = \sigma_n^2 \mathbf{I}_{N_r}$$, where $$\mathbf{I}_{N_{r}}$$ is the identity matrix of dimension $$N_r \times N_r$$, so that $$(\mathbf{K}_n)^{-1} = \frac{1}{ \sigma_n^2 }\mathbf{I}_{N_r}$$.

Thus the capacity for MIMO flat fading channel can be written as
$$\boxed{C= log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right]} \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (13)$$

The capacity equation $$(13)$$ contains the random channel matrix $$\mathbf{H}$$, and therefore the capacity is itself a random variable. To obtain meaningful results for fading channels, two different capacities can be defined.

If the channel is **unknown** at the transmitter (no CSIT), it is optimal to distribute the available transmit power evenly across the transmit antennas. That is, $$\mathbf{K}_x = \frac{P_t}{N_t} \mathbf{I}_{N_{t}}$$, where $$\mathbf{I}_{N_{t}}$$ is the identity matrix of dimension $$N_t \times N_t$$.
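With equal power allocation, equation (13) becomes $$C = log_2 \; det \left( \mathbf{I}_{N_r} + \frac{P_t}{N_t \sigma_n^2} \mathbf{H}\mathbf{H}^H \right)$$. The following Python sketch evaluates this for one given channel realization, using only the standard library. The helper names (`hermitian`, `matmul`, `det`, `mimo_capacity`) and the example channel matrix are illustrative assumptions, not taken from the original post.

```python
import math

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1 + 0j
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) == 0:
            return 0j
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d                       # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def mimo_capacity(h, p_t, sigma_n2):
    """log2 det(I + P_t/(N_t sigma_n^2) H H^H) in bits/s/Hz (equal power)."""
    n_r, n_t = len(h), len(h[0])
    hh = matmul(h, hermitian(h))         # N_r x N_r Hermitian matrix
    scale = p_t / (n_t * sigma_n2)
    m = [[(1 if i == j else 0) + scale * hh[i][j] for j in range(n_r)]
         for i in range(n_r)]
    return math.log2(det(m).real)        # determinant is real and >= 1 here

# One arbitrary 2x2 channel realization (values chosen only for illustration)
H = [[0.8 + 0.3j, -0.2 + 0.5j],
     [0.1 - 0.7j, 1.1 + 0.2j]]
print("2x2 capacity:", mimo_capacity(H, p_t=10.0, sigma_n2=1.0), "bits/s/Hz")
print("SISO (h = 1):", mimo_capacity([[1]], p_t=10.0, sigma_n2=1.0), "bits/s/Hz")
```

For $$\mathbf{H} = \mathbf{I}$$ the expression reduces to $$N_t$$ parallel SISO links, each carrying $$P_t/N_t$$ of the power, which is a convenient sanity check.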

Ergodic Capacity

Ergodic capacity is defined as the statistical average of the mutual information, where the expectation is taken over $$\mathbf{H}$$
$$\boxed{C = \mathbb{E} \left[ log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right] \right] } \;\;\;\;\;\;\;\;\;\;\;\; (14)$$

Outage Capacity

Outage capacity is defined as the information rate $$C_{out,q \%}$$ that the instantaneous mutual information falls below with a prescribed probability $$q \%$$.

$$\boxed{P \left( log_2 \left[ det \left( \mathbf{I}_{N_{r}} + \frac{1}{ \sigma_n^2 } \mathbf{H} \mathbf{K}_x \mathbf{H}^H \right) \right] < C_{out,q \%} \right) = q \% } \;\;\;\;\;\;\;\;\;\;\; (15)$$
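Both quantities can be estimated by Monte Carlo simulation. The sketch below does this for a $$2 \times 2$$ system with equal power allocation, assuming i.i.d. Rayleigh fading (each entry of $$\mathbf{H}$$ drawn as a zero-mean, unit-variance complex Gaussian). The fading model, the SNR of 10 dB, the seed and the sample count are assumptions made for illustration. For a $$2 \times 2$$ matrix the determinant is expanded by hand using $$det(\mathbf{I} + \mathbf{A}) = 1 + Tr(\mathbf{A}) + det(\mathbf{A})$$.

```python
import math
import random

def rayleigh_2x2(rng):
    """2x2 channel with i.i.d. zero-mean, unit-variance complex Gaussian entries."""
    s = math.sqrt(0.5)   # real and imaginary parts each carry half the variance
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]
            for _ in range(2)]

def capacity_2x2(h, snr):
    """log2 det(I + (snr/2) H H^H), with snr = P_t / sigma_n^2 and N_t = 2."""
    scale = snr / 2.0
    frob = sum(abs(x) ** 2 for row in h for x in row)   # Tr(H H^H)
    det_h = h[0][0] * h[1][1] - h[0][1] * h[1][0]       # det(H H^H) = |det H|^2
    return math.log2(1 + scale * frob + scale ** 2 * abs(det_h) ** 2)

rng = random.Random(1)        # fixed seed so the run is repeatable
snr = 10.0                    # linear scale, i.e. 10 dB
samples = sorted(capacity_2x2(rayleigh_2x2(rng), snr) for _ in range(20000))

# Ergodic capacity: average of the instantaneous capacity over H, as in (14)
ergodic = sum(samples) / len(samples)

# Outage capacity: empirical q-quantile of the instantaneous capacity, as in (15)
q = 0.10
outage = samples[int(q * len(samples))]

print(f"ergodic capacity    ~ {ergodic:.2f} bits/s/Hz")
print(f"10% outage capacity ~ {outage:.2f} bits/s/Hz")
```

The outage value is always below the ergodic value for small $$q$$, since it describes the rate supported in all but the worst $$q \%$$ of channel realizations rather than the average.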

A word on capacity of a MIMO system over AWGN Channels

The capacity of MIMO system over AWGN channel can be derived in a very similar manner. The only difference will be the channel matrix. For the AWGN channel, the channel matrix will be a constant. The final equation for capacity will be very similar and will follow the lines of capacity of SISO over AWGN channel.


References:

[1] Andrea J. Goldsmith and Pravin P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels”, IEEE Transactions on Information Theory, Vol. 42, No. 3, May 1996.

Ergodic Capacity of a SISO system over a Rayleigh Fading channel – Simulation in Matlab

In the previous post, derivation of SISO fading channel capacity was discussed. For a flat fading channel (model shown below), with the perfect knowledge of the channel at the receiver, the capacity of a SISO link was derived as

$$C = log_2 (1 + \gamma)=log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; (1)$$

where,  $$h$$ is flat fading complex channel impulse response that is held constant for each block of $$N$$ transmitted symbols, $$P_t$$ is the average input power at the transmit antenna, $$\gamma = \frac{P_t}{\sigma_n^2} \left | h \right |^2$$ is the signal-to-noise ratio (SNR) at the receiver input and $$\sigma_n^2$$ is the noise power of the channel.

Since the channel impulse response $$h$$ is a random variable, the channel capacity equation shown above is also random. To circumvent this, Ergodic channel capacity was defined along with outage capacity. The Ergodic channel capacity is defined as the statistical average of the mutual information, where the expectation is taken over $$\left | h \right |^2$$
$$C_{erg}=\mathbb{E} \left\{ log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) \right\} \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; (2)$$

Jensen’s inequality [1] states that for any concave function $$f(X)$$, where $$X$$ is a random variable,

$$E[f(X)] \leq f(E[X]) \;\;\;\;\;\;\;\;\;\;\;\; (3)$$

Applying Jensen’s inequality to Ergodic capacity in equation $$(2)$$,

$$\mathbb{E} \left[ log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) \right] \leq log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \mathbb{E} [\left | h \right |^2] \right ) \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; (4)$$

This implies that the Ergodic capacity of a fading channel cannot exceed that of an AWGN channel with the same average gain. The above relation is simulated in Matlab for a Rayleigh fading channel with $$\sigma_h^2=E[|h|^2]=1$$ and $$\sigma_n^2=1$$, and the plots are shown below.
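The same comparison can be sketched in a few lines of Python (used here instead of Matlab so that the example needs only a standard library). This is not the author's simulation code; the seed, the number of trials and the SNR points are assumed values.

```python
import math
import random

rng = random.Random(7)            # fixed seed for a repeatable run
n_trials = 50000
s = math.sqrt(0.5)

# |h|^2 for Rayleigh fading: h is complex Gaussian with E|h|^2 = sigma_h^2 = 1
gains = [rng.gauss(0, s) ** 2 + rng.gauss(0, s) ** 2 for _ in range(n_trials)]

results = []
for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)     # P_t / sigma_n^2 on a linear scale
    # Left side of (4): average of log2(1 + snr |h|^2) over the fading
    ergodic = sum(math.log2(1 + snr * g) for g in gains) / n_trials
    # Right side of (4): AWGN capacity with constant gain E|h|^2 = 1
    awgn = math.log2(1 + snr)
    results.append((snr_db, ergodic, awgn))
    print(f"SNR {snr_db:2d} dB: ergodic ~ {ergodic:.3f}, AWGN bound = {awgn:.3f} bits/s/Hz")
```

At every SNR point the simulated Ergodic capacity stays below the AWGN capacity, as Jensen's inequality predicts.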


References

[1] Konstantinos G. Derpanis, “Jensen’s Inequality”, Version 1.0, March 12, 2005.

Capacity of a SISO system over a fading channel

As reiterated in the previous posts, a MIMO system is used to increase capacity dramatically and to improve the quality of a communication link. Increased capacity is obtained by spatial multiplexing, and increased quality is obtained by diversity techniques (space-time coding). The capacity equations of a MIMO system over a variety of channels (AWGN, fading channels) are of primary importance, because they quantify the capacity improvement that a MIMO system offers over the conventional SISO system. To begin with, we will look into the capacity equations for a conventional SISO system over AWGN and fading channels, followed by the capacity equations for a MIMO system. As a prerequisite, readers are encouraged to go through the detailed discussion on channel capacity and Shannon’s Theorem.


To begin with, a few definitions are needed.

Entropy

The average amount of information per symbol (measured in bits/symbol) is called Entropy. Given a set of $$N$$ discrete information symbols, represented as a random variable $$X \in \{x_1,x_2,\ldots,x_{N} \}$$ with probabilities given by a Probability Mass Function $$p(x)=\{p_1,p_2,\ldots,p_N \}$$, the entropy of $$X$$ is given by

$$H(X) = \sum_{i=1}^{N} p_i log_2 \left[\frac{1}{p_i} \right] = \sum_{x \in X } p(x)log_2 \left[\frac{1}{p(x)} \right] = - \sum_{x \in X } p(x)log_2 p(x) \;\;\;\;\;\;\;\; (1)$$

Entropy is a measure of the uncertainty of a random variable $$X$$ and therefore reflects the amount of information required, on average, to describe the random variable. In general, it has the following bounds

$$0 \leq H(X) \leq log_2(N) \;\;\;\;\;\;\;\;\;\;\;\; (2)$$

Entropy hits the lower bound of zero (no uncertainty, therefore no information) for a completely deterministic system (probability of correct transmission $$p_i=1$$). It reaches the upper bound when the input symbols $$x_i$$ are equi-probable.

Capacity and mutual information

The following figure represents a discrete memoryless channel (the noise corrupts each input symbol independently), where the input and output are represented as random variables $$X$$ and $$Y$$ respectively. Statistically, such a channel can be described by transition (conditional) probabilities. That is, given an input to the channel, the probability of observing a particular output is expressed as the conditional probability $$p(Y|X)$$

For such a channel, the mutual information $$I(X;Y)$$ denotes the amount of information that one random variable contains about the other random variable

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) \;\;\;\;\;\;\;\;\;\;\;\; (3)$$

$$H(X)$$ is the amount of information in $$X$$ before observing $$Y$$ and thus the above quantity can be seen as the reduction of uncertainty of $$X$$ from the observation of $$Y$$.

The information capacity $$C$$ is obtained by maximizing this mutual information taken over all possible input distributions $$p(x)$$ [1].

$$C = \max_{p(x)} I(X;Y) \;\;\;\;\;\;\;\;\;\;\;\; (4)$$

A SISO fading channel can be represented as the convolution of the complex channel impulse response (represented as a random variable $$h$$) and the input $$x$$.
$$y = h \ast x + n \;\;\;\;\;\;\;\;\;\;\;\; (5)$$

Here, $$n$$ is complex baseband additive white Gaussian noise and the above equation is for a single realization of complex output $$y$$. If the channel is assumed to be flat fading or of block fading type (channel does not vary over a block of symbols), the above equation can be simply written without the convolution operation.
$$y = h x + n \;\;\;\;\;\;\;\;\;\;\;\; (6)$$

For different fading channels, the channel impulse response can be modeled using various statistical distributions. Common choices are the Rayleigh, Rician and Nakagami-m distributions.

Capacity with transmit power constraint

Now, we would like to evaluate capacity for the most practical scenario, where the average power, given by $$P = \sigma_x^2$$, that can be expended at the transmitter is limited to $$P_t$$. Thus, the channel capacity is now constrained by this average transmit power, given as

$$C = \max_{p(x) \;,\; P \leq P_t} I(X;Y) \;\;\;\;\;\;\;\;\;\;\;\; (7)$$

For the further derivations, we assume that the receiver possesses perfect knowledge about the channel. Furthermore, we assume that the input random variable $$X$$ is independent of the noise $$N$$ and the noise is zero mean Gaussian distributed with variance $$\sigma_n^2$$ -i.e, $$N \sim \mathcal{N}(0,\sigma_n^2)$$.

Note that both the input symbols $$x$$ and the output symbols $$y$$ take continuous values upon transmission and reception and the values are discrete in time (Continuous input Continuous output discrete Memoryless Channel, CCMC). For such continuous random variables, the differential entropy $$H_d(.)$$ is considered. Expressing the mutual information in terms of differential entropy,

\begin{align} I(X;Y) &=H_d(Y) - H_d(Y|X) \\ &=H_d(Y) - H_d(hX + N|X) \;\;\;\;\;\;\;\;\;\;\;\; (8) \end{align}

Mutual Information and differential entropy

Since the channel is assumed to be perfectly known at the receiver, the term $$hX$$ is completely determined once $$X$$ is given and contributes no uncertainty. Because differential entropy is unaffected by a known shift, $$H_d(hX + N|X) = H_d(N|X)$$. Furthermore, the noise $$N$$ is assumed to be independent of the input $$X$$, i.e., $$H_d(N|X) = H_d(N)$$. Thus, the mutual information is

$$I(X;Y) =H_d(Y) - H_d(N) \;\;\;\;\;\;\;\;\;\;\;\; (9)$$

For a complex Gaussian noise $$N$$ with mean $$\mu_n$$ and variance $$\sigma_n^2$$, i.e., $$N \sim \mathcal{N}(\mu_n,\sigma_n^2)$$, the PDF of the noise is given by

$$f_N(n) = \frac{1}{\pi \sigma_n^2} exp\left[ - \frac{\left | n - \mu_n \right |^2}{\sigma_n^2}\right] \;\;\;\;\;\;\;\;\;\;\;\; (10)$$

The differential entropy for the noise $$H_d(N)$$ is given by

\begin{align} H_d(N) & = -\int f_N(n) \; log_2[f_N(n)]\; dn\\ &=-\int f_N(n) \; log_2\left[\frac{1}{\pi \sigma_n^2} e^{ - \frac{\left | n - \mu_n \right |^2}{\sigma_n^2}}\right]\; dn\\ &=-\int f_N(n) \; \left[-log_2(\pi \sigma_n^2) - \frac{\left | n - \mu_n \right |^2}{\sigma_n^2}log_2(e)\right]\; dn\\ &=log_2(\pi \sigma_n^2) \int f_N(n) \;dn \; + \frac{log_2(e)}{\sigma_n^2} \int \left | n - \mu_n \right |^2 f_N(n) \; dn\\ &=log_2(\pi \sigma_n^2) \; + \frac{log_2(e)}{\sigma_n^2} \int \left | n - \mu_n \right |^2 f_N(n) \; dn \;\;\;\;\;\;\; ; since\; \int f_N(n) dn=1\\ &=log_2(\pi \sigma_n^2) + log_2(e) \;\;\;\;\;\; ; since \; \int \left | n - \mu_n \right |^2 f_N(n) \; dn=\sigma_n^2\\ &=log_2(\pi e \sigma_n^2) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (11) \end{align}
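The closed form in equation (11) can be cross-checked by estimating $$-E[log_2 f_N(N)]$$ from random samples. The sketch below is a rough Monte Carlo check; the function name `entropy_estimate_bits`, the sample count, the seed and the two example means are assumptions made for this illustration.

```python
import math
import random

def entropy_estimate_bits(mu, sigma2, n=100000, seed=3):
    """Monte Carlo estimate of -E[log2 f_N(N)] for complex Gaussian noise
    with mean mu and variance sigma2."""
    rng = random.Random(seed)
    s = math.sqrt(sigma2 / 2)      # real and imaginary parts share the variance
    total = 0.0
    for _ in range(n):
        sample = mu + complex(rng.gauss(0, s), rng.gauss(0, s))
        # Complex Gaussian PDF evaluated at the drawn sample
        pdf = math.exp(-abs(sample - mu) ** 2 / sigma2) / (math.pi * sigma2)
        total -= math.log2(pdf)
    return total / n

sigma2 = 2.0
exact = math.log2(math.pi * math.e * sigma2)        # closed form from (11)
est_zero_mean = entropy_estimate_bits(0j, sigma2)
est_shifted = entropy_estimate_bits(3 + 4j, sigma2)

print("closed form        :", exact)
print("estimate, mean 0   :", est_zero_mean)
print("estimate, mean 3+4j:", est_shifted)
```

Both estimates land close to $$log_2(\pi e \sigma_n^2)$$ whatever mean is chosen, since shifting the mean only translates the PDF.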

This shows that the differential entropy is not dependent on the mean of $$N$$. Therefore, it is immune to translations (shifting of mean value) of the PDF. For the problem of computation of capacity,

$$C = \max_{p(x) \;,\; P \leq P_t} I(X;Y) = \max_{p(x) \;,\; P \leq P_t} \left[H_d(Y) - H_d(N) \right] \;\;\;\;\;\;\;\;\;\;\;\; (12)$$

and given the differential entropy $$H_d(N)$$, which does not depend on the input, the mutual information $$I(X;Y)= H_d(Y) - H_d(N)$$ is maximized by maximizing the differential entropy $$H_d(Y)$$. For a given variance, a Gaussian random variable has the largest differential entropy. Therefore, the mutual information is maximized when the variable $$Y$$ is also Gaussian, in which case the differential entropy is $$H_d(Y)=log_2(\pi e \sigma_y^2)$$, where the received average power is given by

$$\sigma_y^2 = E[\left | Y \right |^2] = E[(hX+N)(hX+N)^*] = \sigma_x^2 \left | h \right |^2 + \sigma_n^2 \;\;\;\;\;\;\;\;\;\;\;\; (13)$$

Thus the capacity is given by
\begin{align} C &= H_d(Y) - H_d(N)\\ & = log_2(\pi e \sigma_y^2) - log_2(\pi e \sigma_n^2)\\ & = log_2\left(\pi e \left\{\sigma_x^2 \left | h \right |^2 + \sigma_n^2 \right\} \right) - log_2(\pi e \sigma_n^2) \\ & = log_2 \left ( 1 + \frac{\sigma_x^2}{\sigma_n^2} \left | h \right |^2 \right )\\ & = log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) \;\;\;\;\;\;\;\;\;\;\;\; (14) \end{align}

Representing the received signal-to-noise ratio as $$\gamma = \frac{P_t}{\sigma_n^2} \left | h \right |^2$$, the capacity of a SISO system over a fading channel is given by

$$\boxed{C=log_2(1+\gamma)=log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) }$$

For the fading channel considered above, the channel $$h$$ is modeled as a random variable, so the capacity in the above equation is also a random variable. For fading channels, two different capacities can therefore be defined.

Ergodic Capacity

Ergodic capacity is defined as the statistical average of the mutual information, where the expectation is taken over $$\left | h \right |^2$$
$$\boxed{C_{erg}=\mathbb{E} \left\{ log_2 \left ( 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ) \right\} }$$

Outage Capacity

Defined as the information rate $$C_{out,q \%}$$ that the instantaneous mutual information falls below with a prescribed probability $$q \%$$.

$$\boxed{ Pr \left( log_2 \left [ 1 + \frac{P_t}{\sigma_n^2} \left | h \right |^2 \right ] < C_{out,q \%} \right) = q\%}$$
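For Rayleigh fading with $$E[|h|^2]=1$$, the gain $$|h|^2$$ is exponentially distributed with unit mean, and the outage equation can then be solved in closed form: $$C_{out,q\%} = log_2 \left( 1 - \bar{\gamma} \; ln(1-q) \right)$$, where $$\bar{\gamma} = P_t/\sigma_n^2$$ and $$q$$ is written as a fraction. The Rayleigh assumption, the symbol $$\bar{\gamma}$$ and the numbers below are introduced here for illustration only. The Python sketch compares this closed form against an empirical quantile.

```python
import math
import random

def outage_capacity_rayleigh(avg_snr, q):
    """Closed-form q-outage capacity (bits/s/Hz) for Rayleigh fading, E|h|^2 = 1.

    Solves Pr(log2(1 + avg_snr * g) < C) = q for C, with g ~ Exp(1).
    """
    return math.log2(1 - avg_snr * math.log(1 - q))

rng = random.Random(11)       # fixed seed for a repeatable run
avg_snr = 10.0                # P_t / sigma_n^2 on a linear scale (10 dB)
q = 0.05                      # 5 % outage probability

# Instantaneous capacities for many independent fading realizations
caps = sorted(math.log2(1 + avg_snr * rng.expovariate(1.0))
              for _ in range(100000))

empirical = caps[int(q * len(caps))]     # empirical q-quantile
closed_form = outage_capacity_rayleigh(avg_snr, q)

print(f"closed form : {closed_form:.3f} bits/s/Hz")
print(f"Monte Carlo : {empirical:.3f} bits/s/Hz")
```

The outage capacity at small $$q$$ is far below $$log_2(1+\bar{\gamma})$$, which reflects how much deep fades hurt a SISO link that has no diversity.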


References:

[1] Andrea J. Goldsmith and Pravin P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels”, IEEE Transactions on Information Theory, Vol. 42, No. 3, May 1996.

Channel Capacity & Shannon’s theorem – demystified


Shannon’s Theorem

How much data will a channel/medium carry in one second or what is the data rate supported by the channel? Let’s discover the answer for this question in detail.

Any discussion about the design of a communication system will be incomplete without mentioning Shannon’s Theorem. Shannon’s information theory tells us the amount of information a channel can carry. In other words it specifies the capacity of the channel. The theorem can be stated in simple terms as follows

• A given communication system has a maximum rate of information C known as the channel capacity
• If the transmission information rate R is less than C, then the data transmission in the presence of noise can be made to happen with arbitrarily small error probabilities by using intelligent coding techniques
• To get lower error probabilities, the encoder has to work on longer blocks of signal data. This entails longer delays and higher computational requirements

The Shannon-Hartley theorem indicates that, with sufficiently advanced coding techniques, transmission at rates approaching the channel capacity is possible with arbitrarily small error probability. One can intuitively reason that, for a given communication system, as the information rate increases, the number of errors per second will also increase.

Shannon – Hartley Equation

The Shannon-Hartley equation gives the maximum capacity (transmission bit rate) that can be achieved over a given channel with certain noise characteristics and bandwidth. For an AWGN channel, the maximum capacity is given by (check Appendix A1 and A2 for the derivation of the Shannon-Hartley equation for the AWGN channel)

$$C = B \; log_2 \left( 1 + \frac{S}{N}\right)  \;\;\;\;\;\;\;\;\;\; \rightarrow (1)$$

Here $$C$$ is the maximum capacity of the channel in bits/second otherwise called Shannon’s capacity limit for the given channel, $$B$$ is the bandwidth of the channel in Hertz, $$S$$ is the signal power in Watts and $$N$$ is the noise power, also in Watts. The ratio $$S/N$$ is called Signal to Noise Ratio (SNR). It can be ascertained that the maximum rate at which we can transmit the information without any error, is limited by the bandwidth, the signal level, and the noise level. It tells how many bits can be transmitted per second without errors over a channel of bandwidth $$B \; Hz$$, when the signal power is limited to $$S \; Watts$$ and is exposed to Gaussian White (uncorrelated) Noise ($$N \; Watts$$) of additive nature.
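As a quick numerical illustration of equation (1), the sketch below evaluates the capacity for a channel of 3000 Hz bandwidth at an SNR of 30 dB. These two numbers and the function name `shannon_capacity` are assumed purely for the example.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity in bits/second for an AWGN channel."""
    snr = 10 ** (snr_db / 10)          # convert dB to a linear power ratio S/N
    return bandwidth_hz * math.log2(1 + snr)

c = shannon_capacity(3000, 30)         # assumed example: B = 3 kHz, SNR = 30 dB
print(f"C = {c:.0f} bits/second")

# At a fixed SNR, capacity grows linearly with bandwidth
c_double_bw = shannon_capacity(6000, 30)
print(f"Doubling B gives {c_double_bw / c:.1f} times the capacity")
```

With these values the capacity comes out at roughly 29.9 kbits/second. Note that the SNR is held fixed here; in practice a wider band also admits more noise, which is discussed further below.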

Shannon’s capacity limit is defined for the given channel. It is the fundamental maximum transmission capacity that can be achieved on a channel given any combination of any coding scheme, transmission or decoding scheme. It is the best performance limit that we hope to achieve for that channel.

The above expression for the channel capacity makes intuitive sense:

• Bandwidth limits how fast the information symbols can be sent over the given channel
• The SNR limits how much information we can squeeze into each transmitted symbol. Increasing the SNR makes the transmitted symbols more robust against noise. SNR is a function of signal quality, signal power and the characteristics of the channel. It is measured at the receiver’s front end
• To increase the information rate, the signal-to-noise ratio and the allocated bandwidth have to be traded against each other
• For no noise, the signal to noise ratio becomes infinite and so an infinite information rate is possible at a very small bandwidth

Thus we may trade off bandwidth for SNR. However, as the bandwidth $$B$$ tends to infinity, the channel capacity does not become infinite – since with an increase in bandwidth, the noise power also increases.
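To see this saturation numerically, write the noise power as $$N = N_0 B$$, where $$N_0$$ is the noise power spectral density (a symbol assumed here ahead of its use later in the post). The capacity $$C = B \; log_2(1 + \frac{S}{N_0 B})$$ then approaches $$\frac{S}{N_0 \; ln 2}$$ as $$B \to \infty$$. The signal power and noise density below are arbitrary illustrative values.

```python
import math

def capacity(b_hz, s_watts, n0):
    """Shannon-Hartley capacity when the noise power grows with bandwidth: N = N0 * B."""
    return b_hz * math.log2(1 + s_watts / (n0 * b_hz))

s_watts = 1.0      # signal power (assumed)
n0 = 1e-3          # noise power spectral density in Watts/Hz (assumed)

# Limiting capacity as the bandwidth tends to infinity
limit = s_watts / (n0 * math.log(2))

bandwidths = (1e2, 1e3, 1e4, 1e5, 1e6)
values = [capacity(b, s_watts, n0) for b in bandwidths]
for b, c in zip(bandwidths, values):
    print(f"B = {b:9.0f} Hz -> C = {c:8.1f} bits/second")
print(f"limit as B -> infinity: {limit:.1f} bits/second")
```

The capacity keeps increasing with bandwidth but flattens out below the limit, so beyond a point extra bandwidth buys very little.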

Shannon’s equation relies on two important concepts:

• That, in principle, a trade-off between SNR and bandwidth is possible
• That, the information capacity depends on both SNR and bandwidth

It is worth mentioning two important works by eminent scientists prior to Shannon’s paper [1].

Edwin Armstrong’s earlier work on Frequency Modulation (FM) is an excellent demonstration that SNR and bandwidth can be traded off against each other. He demonstrated in 1936 that it was possible to increase the SNR of a communication system by using FM, at the expense of allocating more bandwidth [2].

In 1903, W.M Miner in his patent (U. S. Patent 745,734 [3]) introduced the concept of increasing the capacity of transmission lines by using sampling and time division multiplexing techniques. In 1937, A.H Reeves in his French patent (French Patent 852,183, U.S Patent 2,272,070 [4]) extended the system by incorporating a quantizer, thereby paving the way for the well-known technique of Pulse Code Modulation (PCM). He realized that he would require more bandwidth than the traditional transmission methods and used additional repeaters at suitable intervals to combat the transmission noise. With the goal of minimizing the quantization noise, he used a quantizer with a large number of quantization levels. Reeves’s patent relies on two important facts:

• One can represent an analog signal (like speech) with arbitrary accuracy by using a sufficiently high sampling frequency and quantizing each sample into one of a sufficiently large number of pre-determined amplitude levels
• If the SNR is sufficiently large, then the quantized samples can be transmitted with arbitrarily small errors

It is implicit from Reeves’s patent that an infinite amount of information can be transmitted on a noise-free channel of arbitrarily small bandwidth. This links the information rate with SNR and bandwidth.

Please refer to [1] and [5] for the actual proof by Shannon. A much simpler version of the proof (I would rather call it an illustration) can be found at [6].

Unconstrained Shannon Limit for AWGN channel

Some general characteristics of the Gaussian channel can be demonstrated. Consider that we are sending binary digits across an AWGN channel at a transmission rate $$R$$ equal to the channel capacity $$C$$: $$R = C$$. If the average signal power is $$S$$, then the average energy per bit is $$E_b = S/C$$ (Joules per bit), since the bit duration is $$1/C$$ seconds. If the two-sided noise power spectral density is $$N_0/2$$ Watts/Hertz (power normalized to $$1 \; \Omega$$ resistance), then the total noise power in a bandwidth $$B$$ is $$N = N_0B$$ Watts. Substituting $$S = E_b C$$ and $$N = N_0 B$$, the Shannon-Hartley equation becomes

$$\frac{C}{B} = log_2 \left( 1 + \frac{E_b}{N_0} \frac{C}{B}\right) \;\;\;\;\;\;\;\;\;\; \rightarrow (2)$$

Rearranging the equation,

$$\frac{E_b}{N_0} = \frac{B}{C} \left( 2^{\frac{C}{B}} - 1 \right) \;\;\;\;\;\;\;\;\;\; \rightarrow (3)$$

Letting $$C/B = \eta$$ (the spectral efficiency in bits/second/Hz),

$$\frac{E_b}{N_0} = \frac{2^ \eta - 1}{\eta} \;\;\;\;\;\;\;\;\;\; \rightarrow (4)$$
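Equation (4) is easy to tabulate. The following Python sketch (standing in for the Matlab plot, with arbitrarily chosen values of $$\eta$$) prints the minimum $$E_b/N_0$$ in dB required at several spectral efficiencies; the function name `min_ebn0_db` is illustrative.

```python
import math

def min_ebn0_db(eta):
    """Minimum Eb/N0 in dB for spectral efficiency eta (bits/second/Hz), from (4)."""
    return 10 * math.log10((2 ** eta - 1) / eta)

etas = (0.01, 0.5, 1, 2, 4, 8)
table = [(eta, min_ebn0_db(eta)) for eta in etas]
for eta, ebn0 in table:
    print(f"eta = {eta:5.2f} bits/s/Hz -> Eb/N0 >= {ebn0:6.2f} dB")
```

The required $$E_b/N_0$$ rises steadily with spectral efficiency, and at very low $$\eta$$ it flattens out towards a floor slightly below $$-1.5 \; dB$$.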


In the following figure, the red dashed line represents the asymptote of $$E_b/N_0$$ as the bandwidth $$B$$ approaches infinity. The asymptote is at $$E_b/N_0 = ln(2)$$, which corresponds to $$-1.59 \; dB$$. This value is called Shannon’s Limit, or more specifically Shannon’s power efficiency limit.

Let’s derive the Shannon’s power efficiency limit and check if it is equal to $$-1.59 \;dB$$. The asymptotic value (say $$x$$), that we are seeking, is the value of $$E_b/N_0$$ as the spectral efficiency $$\eta$$ approaches $$0$$.

$$x=\lim_{\eta \to 0}\left ( \frac{E_b}{N_0} \right )=\lim_{\eta \to 0}\left ( \frac{2^\eta -1 }{\eta}\right ) \;\;\;\;\;\;\;\;\;\; \rightarrow (5)$$

Let $$f(\eta)=2^\eta-1$$ and $$g(\eta)= \eta$$. As $$f(0)=g(0)=0$$, the argument of the limit becomes indeterminate ($$0/0$$), so L’Hospital’s rule can be applied. According to L’Hospital’s rule, if $$\lim_{\eta \to k} f(\eta)$$ and $$\lim_{\eta \to k}g(\eta)$$ are both zero or are both $$\pm \infty$$, then for any value of $$k$$ (provided the limit on the right-hand side exists)

$$\lim_{\eta \to k} \left( \frac{f(\eta)}{g(\eta)} \right) = \lim_{\eta \to k} \left( \frac{f'(\eta)}{g'(\eta)} \right) \;\;\;\;\;\;\;\;\;\; \rightarrow (6)$$

Thus, the next step boils down to finding the first derivatives of $$f(\eta)$$ and $$g(\eta)$$. Expressing $$2^\eta$$ in terms of the natural logarithm,

\begin{gather}
2=e^{ln2} \nonumber \\
2^\eta=(e^{ln2} )^{\eta}=(e^{\eta ln2} ) \;\;\;\;\;\;\;\;\;\; \rightarrow (7)
\end{gather}

Let $$u= \eta ln(2)$$ and $$y=e^u$$, then by chain rule of differentiation,

$$f'(\eta) =\frac{d2^\eta}{d \eta} = \frac{dy}{d \eta} = \frac{dy}{du} \frac{du}{d \eta} = e^u ln(2) = e^{\eta ln 2} ln(2) \;\;\;\;\; \rightarrow (8)$$

Since $$g(\eta)=\eta$$, the first derivative of $$g(\eta)$$ is

$$g'(\eta) = 1 \;\;\;\;\;\;\;\;\;\; \rightarrow (9)$$

Using equations $$(8)$$ and $$(9)$$ and applying L’Hospital’s rule, the Shannon’s limit is given by

\begin{align}
x &=\lim_{\eta \to 0} \left ( \frac{f(\eta)}{g(\eta)} \right ) = \lim_{\eta \to 0} \left ( \frac{f'(\eta)}{g'(\eta)} \right ) \nonumber \\
& = \lim_{\eta \to 0} \left ( ln2\;e^{\eta ln 2} \right ) = ln(2)=0.6931 = -1.59 \; dB   \;\;\;\;\;\;\;\;\;\; \rightarrow (10)
\end{align}
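The limit in equation (10) can also be confirmed numerically by evaluating $$(2^\eta - 1)/\eta$$ for progressively smaller $$\eta$$. This is a small Python sketch; the helper name `ebn0_ratio` is illustrative, and `math.expm1` is used so that the subtraction stays accurate when $$\eta$$ is tiny.

```python
import math

def ebn0_ratio(eta):
    """(2^eta - 1) / eta, computed as expm1(eta * ln 2) / eta for accuracy near zero."""
    return math.expm1(eta * math.log(2)) / eta

for eta in (1.0, 0.1, 1e-3, 1e-6):
    print(f"eta = {eta:g}: Eb/N0 = {ebn0_ratio(eta):.6f}")

shannon_limit = math.log(2)                        # the limit from (10)
shannon_limit_db = 10 * math.log10(shannon_limit)  # same value in dB
print(f"ln(2) = {shannon_limit:.4f}, i.e. {shannon_limit_db:.2f} dB")
```

As $$\eta$$ shrinks, the ratio converges to $$ln(2)$$, in agreement with the result obtained through L’Hospital’s rule.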

Shannon’s power efficiency limit does not depend on BER. Shannon’s limit tells us the minimum possible $$E_b/N_0$$ required for achieving an arbitrarily small probability of error as $$M \to \infty$$. ($$M$$ is the number of signaling levels for the modulation technique, for BPSK $$M=2$$, QPSK $$M=4$$ and so on…).

It gives the minimum possible $$E_b/N_0$$ that satisfies the Shannon-Hartley law. In other words, it gives the minimum possible $$E_b/N_0$$ required to achieve the maximum transmission capacity ($$R=C$$, where $$R$$ is the rate of transmission and $$C$$ is the channel capacity). It does not specify the BER obtained at that limit, nor does it specify which coding technique to use to achieve the limit. As the capacity is approached, the system complexity increases drastically, so the aim of any system design is to get as close to that limit as is practical. As an example, a class of codes called Low-Density Parity-Check (LDPC) codes comes very close to the Shannon limit but cannot achieve it.

The Shannon limit derived above is called absolute Shannon power efficiency Limit. It is the limit of a band-limited system irrespective of modulation or coding scheme. This is also called unconstrained Shannon power efficiency Limit. If we select a particular modulation scheme or an encoding scheme, we can calculate the constrained Shannon limit for that scheme. We will see the generic form of Shannon Equation that applies to any channel and then later develop it to find the constrained capacities for AWGN channel.

In the next sections, generic forms of unconstrained Shannon equations are discussed for different types of generic channel models. These generic equations can be utilized to find the unconstrained capacity of a particular channel type, AWGN for example (check Appendix A1, A2 and A3 for details on how to extend the generic equation to a particular channel type). The derivation of the Ergodic capacity of a fading channel is covered in a separate post.


References :

[1] C. E. Shannon, “A Mathematical Theory of Communication”, Bell Syst. Techn. J., Vol. 27, pp. 379-423, 623-656, July, October 1948.
[2] E. H. Armstrong, “A Method of Reducing Disturbances in Radio Signaling by a System of Frequency-Modulation”, Proc. IRE, Vol. 24, pp. 689-740, May 1936.
[3] Willard M. Miner, “Multiplex telephony”, US Patent 745734, December 1903.
[4] A. H. Reeves, “Electric Signaling System”, US Patent 2272070, Feb 1942.
[5] C. E. Shannon, “Communication in the Presence of Noise”, Proc. IRE, Vol. 37, No. 1, pp. 10-21, January 1949.
[6] The Scots Guide to Electronics, “Information and Measurement”, University of St Andrews, School of Physics and Astronomy, andrews.ac.uk/~www_pa/Scots_Guide/iandm/part8/page1.html