Introduction to Information Theory: channel capacity and models
A.J. Han Vinck, University of Essen
May 2011
This lecture
- some channel models
- channel capacity
- Shannon channel coding theorem and its converse
some channel models
[Diagram: Input X → P(y|x) → Output Y; the P(y|x) are the channel transition probabilities]
memoryless:
- output at time i depends only on input at time i
- input and output alphabet finite
Example: binary symmetric channel (BSC)
[Diagram: an error source produces the error sequence E; the channel output is Y = X ⊕ E]
E is the binary error sequence s.t. P(1) = 1-P(0) = p
X is the binary information sequence
Y is the binary output sequence
[BSC diagram: 0→0 and 1→1 with probability 1−p; 0→1 and 1→0 with probability p]
from AWGN to BSC
Homework: calculate the capacity as a function of A and σ² (via the crossover probability p of the resulting BSC).
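A minimal numeric sketch of this homework, under the assumption of antipodal signalling with levels ±A, Gaussian noise of variance σ², and hard decisions at threshold 0, so the induced BSC has crossover probability p = Q(A/σ):

import math

def Q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity_from_awgn(A, sigma):
    p = Q(A / sigma)       # crossover probability of the induced BSC
    return 1 - h(p)        # C_BSC = 1 - h(p)

print(bsc_capacity_from_awgn(A=1.0, sigma=0.5))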
Other models
[Z-channel (optical) diagram: X → Y with P(X=0) = P0; 0 (light on) → 0 with probability 1; 1 (light off) → 1 with probability 1−p, → 0 with probability p]
[Erasure channel (MAC) diagram: X → Y with P(X=0) = P0; 0→0 and 1→1 with probability 1−e; 0→E and 1→E with probability e]
Erasure with errors
[Diagram: 0→0 and 1→1 with probability 1−p−e; 0→E and 1→E with probability e; 0→1 and 1→0 with probability p]
burst error model (Gilbert-Elliott)
Error source:
- Random error channel: outputs independent; P(0) = 1 − P(1)
- Burst error channel: outputs dependent; P(0 | state = bad) = P(1 | state = bad) = 1/2; P(0 | state = good) = 1 − P(1 | state = good) = 0.999
State info: good or bad
[Two-state Markov diagram: transition probabilities Pgb (good→bad) and Pbg (bad→good); self-transitions Pgg and Pbb]
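A minimal simulation sketch of this error source; the transition probabilities Pgb and Pbg are free parameters (values below are assumptions for illustration), the per-state error probabilities are the slide's numbers:

import random

def gilbert_elliott(n, Pgb=0.01, Pbg=0.1, p_good=0.001, p_bad=0.5):
    """Generate n error bits from a two-state Gilbert-Elliott source."""
    state, errors = "good", []
    for _ in range(n):
        p = p_bad if state == "bad" else p_good
        errors.append(1 if random.random() < p else 0)
        # state transition of the Markov chain
        if state == "good" and random.random() < Pgb:
            state = "bad"
        elif state == "bad" and random.random() < Pbg:
            state = "good"
    return errors

e = gilbert_elliott(100_000)
print("error rate:", sum(e) / len(e))   # errors appear clustered in bursts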
channel capacity:
I(X;Y) = H(X) - H(X|Y) = H(Y) – H(Y|X) (Shannon 1948)
Note: capacity depends on the input probabilities, because the transition probabilities are fixed.
[Diagram: X → channel → Y]
capacity: C = max over P(x) of I(X;Y)
Practical communication system design
[Block diagram: message → encoder → code word in (length n) → channel → receive → decoder → message estimate]
There are 2^k code words of length n; k is the number of information bits transmitted in n channel uses.
[The encoder uses a code book of 2^k words; the receiver sees the code book with errors]
Channel capacity
Definition:
The rate R of a code is the ratio k/n, where
k is the number of information bits transmitted in n channel uses
Shannon showed that:
for R ≤ C encoding methods exist with decoding error probability → 0 (as the code length n → ∞).
Encoding and decoding according to Shannon
Code: 2^k binary codewords, where P(0) = P(1) = ½
Channel errors: P(0→1) = P(1→0) = p, i.e. # error sequences ≈ 2^{nh(p)}
Decoder: search around the received sequence for a codeword with ≈ np differences
[Picture: space of 2^n binary sequences, with decoding regions around the codewords]
decoding error probability
1. For t errors with |t/n − p| > ε:
   probability → 0 for n → ∞ (law of large numbers)
2. More than 1 codeword in the decoding region (codewords random):
   P(> 1) ≈ (2^k − 1) · 2^{nh(p)} · 2^{−n} ≤ 2^{k + nh(p) − n} = 2^{−n(1 − h(p) − R)} = 2^{−n(C_BSC − R)}
   → 0 for R = k/n < 1 − h(p) and n → ∞
channel capacity: the BSC
[BSC diagram: X → Y; 0→0 and 1→1 with probability 1−p; 0→1 and 1→0 with probability p]
I(X;Y) = H(Y) − H(Y|X)
The maximum of H(Y) = 1, since Y is binary.
H(Y|X) = P(X=0) h(p) + P(X=1) h(p) = h(p)
Conclusion: the capacity for the BSC is C_BSC = 1 − h(p).
Homework: draw C_BSC; what happens for p > ½?
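A small sketch for this homework: tabulate C_BSC = 1 − h(p) on a grid of p (any plotting tool can then draw the curve from these values):

import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:4.2f}   C_BSC = {1 - h(p):.3f}")   # capacity is 0 at p = 1/2, 1 again at p = 1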
channel capacity: the BSC
[Plot: channel capacity C_BSC versus bit error probability p, for 0 ≤ p ≤ 1]
Explain the behaviour!
channel capacity: the Z-channel
Application in optical communications
[Z-channel diagram: X → Y; 0 (light on) → 0 with probability 1; 1 (light off) → 1 with probability 1−p, → 0 with probability p]
P(X=0) = P0
H(Y) = h( P0 + p(1 − P0) )
H(Y|X) = (1 − P0) h(p)
For capacity, maximize I(X;Y) over P0.
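A numeric sketch of this maximization by a simple grid search over P0 (a closed-form optimum exists, but the scan is enough to evaluate and plot the capacity):

import math

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def z_capacity(p, steps=100_000):
    """Capacity of the Z-channel with crossover p, by grid search over P0."""
    best = 0.0
    for i in range(steps + 1):
        P0 = i / steps
        I = h(P0 + p * (1 - P0)) - (1 - P0) * h(p)   # I(X;Y) from the slide
        best = max(best, I)
    return best

print(z_capacity(0.1))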
channel capacity: the erasure channel
Application: CDMA detection
[Erasure channel diagram: X → Y; 0→0 and 1→1 with probability 1−e; 0→E and 1→E with probability e]
P(X=0) = P0
I(X;Y) = H(X) − H(X|Y)
H(X) = h(P0)
H(X|Y) = e · h(P0)
Thus C_erasure = 1 − e.
(Check! Draw and compare with the BSC and the Z-channel.)
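A quick numeric check (function names are mine): maximizing I(X;Y) = (1 − e)·h(P0) over P0 indeed gives 1 − e, and a BSC whose crossover equals the same e is much worse:

import math

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def erasure_I(P0, e):
    # I(X;Y) = H(X) - H(X|Y) = h(P0) - e*h(P0) = (1 - e)*h(P0)
    return (1 - e) * h(P0)

e = 0.2
print(max(erasure_I(i / 1000, e) for i in range(1001)))   # -> 1 - e = 0.8
print(1 - h(e))                                           # BSC with p = e: about 0.28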
Erasure with errors: calculate the capacity!
[Diagram: inputs 0, 1; outputs 0, E, 1; 0→0 and 1→1 with probability 1−p−e; 0→E and 1→E with probability e; 0→1 and 1→0 with probability p]
example
Consider the following example
[Diagram: ternary channel with inputs {0, 1, 2} and outputs {0, 1, 2}; inputs 0 and 2 are received error-free; input 1 is received as 0, 1 or 2, each with probability 1/3]
For P(0) = P(2) = p, P(1) = 1 − 2p:
H(Y) = h(1/3 − 2p/3) + (2/3 + 2p/3); H(Y|X) = (1 − 2p) log2 3
Q: maximize H(Y) − H(Y|X) as a function of p.
Q: is this the capacity?
Hint, use the following: log2 x = ln x / ln 2; d ln x / dx = 1/x.
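A numeric sketch for the first question, scanning p over [0, ½] (the hint gives the route to the analytic answer):

import math

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I(p):
    q = 1/3 - 2*p/3                                  # q = P(Y = 1)
    return h(q) + (1 - q) - (1 - 2*p) * math.log2(3)  # H(Y) - H(Y|X)

best_p = max((i / 10_000 for i in range(5001)), key=I)
print(best_p, I(best_p))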
channel models: general diagram
[Diagram: inputs x1, x2, …, xn on the left, outputs y1, y2, …, ym on the right, connected by transition probabilities P1|1, P2|1, P1|2, P2|2, …, Pm|n]
Input alphabet X = {x1, x2, …, xn}
Output alphabet Y = {y1, y2, …, ym}
Pj|i = PY|X(yj|xi)
In general:
calculating the capacity needs more theory.
The statistical behavior of the channel is completely defined by the channel transition probabilities Pj|i = PY|X(yj|xi).
* Clue:
I(X;Y) is concave (∩-convex) in the input probabilities,
i.e. finding the maximum is simple.
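Because of this concavity, the capacity of an arbitrary DMC can be found iteratively. A minimal sketch of the Blahut-Arimoto algorithm (not part of these slides, but the standard method):

import numpy as np

def blahut_arimoto(P, iters=500):
    """Capacity (bits) of a DMC with transition matrix P[i, j] = P(y_j | x_i)."""
    n = P.shape[0]
    p = np.full(n, 1.0 / n)                 # input distribution, start uniform
    for _ in range(iters + 1):
        q = p @ P                           # output distribution q(y)
        logs = np.zeros_like(P)
        np.log2(P / q, where=(P > 0), out=logs)
        D = np.sum(P * logs, axis=1)        # D[i] = D( P(.|x_i) || q ) in bits
        p = p * np.exp2(D)                  # multiplicative update
        p /= p.sum()
    return float(p @ D)                     # at convergence: C = sum_x p(x) D[x]

# BSC with p = 0.1: expect 1 - h(0.1), about 0.531
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto(P))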
Channel capacity: converse
For R > C the decoding error probability > 0
[Plot: decoding error probability Pe versus rate k/n; Pe stays bounded away from 0 for k/n > C]
Converse: For a discrete memoryless channel:
I(X^n;Y^n) = H(Y^n) − H(Y^n|X^n) = H(Y^n) − Σ_{i=1}^n H(Y_i|X_i)
           ≤ Σ_{i=1}^n H(Y_i) − Σ_{i=1}^n H(Y_i|X_i) = Σ_{i=1}^n I(X_i;Y_i) ≤ nC
[Block diagram: source → m → encoder → X^n → channel → Y^n → decoder → m′]
The source generates one out of 2^k equiprobable messages.
Let Pe = probability that m′ ≠ m.
converse: R := k/n
Pe ≥ 1 − C/R − 1/(nR)
Hence: for large n and R > C, the probability of error Pe > 0.
Derivation:
k = H(M) = I(M;Y^n) + H(M|Y^n)
          ≤ I(X^n;Y^n) + 1 + k·Pe     (X^n is a function of M; Fano)
          ≤ nC + 1 + k·Pe
⇒ Pe ≥ 1 − C·n/k − 1/k
We used the data processing theorem.
Cascading of Channels
[Diagram: X → channel 1 → Y → channel 2 → Z]
The overall transmission rate I(X;Z) for the cascade cannot be larger than I(Y;Z), that is:
I(X;Z) ≤ I(Y;Z)
Appendix:
Assume:
a binary sequence with P(0) = 1 − P(1) = 1 − p;
t is the # of 1's in the sequence.
Then for n → ∞ and any ε > 0 (weak law of large numbers):
Probability( |t/n − p| > ε ) → 0,
i.e. we expect with high probability ≈ pn 1's.
Appendix:
Consequence:
n(p − ε) < t < n(p + ε) with high probability
1. Σ_{t=n(p−ε)}^{n(p+ε)} (n choose t) ≈ 2nε · (n choose pn)
2. (n choose pn) ≈ 2^{nh(p)}
3. lim_{n→∞} (1/n) · log2 [ 2nε · (n choose pn) ] = h(p)
where h(p) = −p log2 p − (1 − p) log2 (1 − p)
Homework: prove the approximation using ln N! ~ N lnN for N large.
Or use the Stirling approximation: N! ≈ √(2πN) · (N/e)^N.
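A numeric check of approximation 2 (the values of n and p are assumptions, chosen only for illustration):

import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.11
t = round(n * p)
exact = math.log2(math.comb(n, t)) / n   # (1/n) log2 of (n choose pn)
print(exact, h(p))                        # the two values agree closely for large n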
[Plot: the binary entropy function h(p) versus p, for 0 ≤ p ≤ 1]
Binary entropy: h(p) = −p log2 p − (1 − p) log2 (1 − p)
Note:
h(p) = h(1-p)
Capacity for Additive White Gaussian Noise
[Diagram: Input X → ⊕ (Noise) → Output Y]
Cap = sup over p(x) with σx² ≤ S/2W of [ H(Y) − H(Noise) ]
W is the (single-sided) bandwidth.
Input X is Gaussian with power spectral density (psd) ≤ S/2W;
Noise is Gaussian with psd σnoise²;
Output Y is Gaussian with psd σy² = S/2W + σnoise².
For Gaussian channels: σy² = σx² + σnoise².
For Gaussian Z with density p(z) = (1/√(2πσz²)) e^(−z²/(2σz²)): H(Z) = ½ log2(2πe σz²) bits.
Hence:
Cap = ½ log2( 2πe (σx² + σnoise²) ) − ½ log2( 2πe σnoise² ) = ½ log2( 1 + σx²/σnoise² ) bits/transmission
Cap = W log2( 1 + S/(2W σnoise²) ) bits/sec.
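A one-line numeric sketch of the resulting formula (parameter values are assumptions, chosen only for illustration):

import math

def awgn_capacity(S, W, sigma2_noise):
    """Cap = W log2(1 + S/(2 W sigma2_noise)) bits/sec, W = single-sided bandwidth."""
    return W * math.log2(1 + S / (2 * W * sigma2_noise))

print(awgn_capacity(S=1.0, W=1.0, sigma2_noise=0.05))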
Middleton type of burst channel model
Select channel k with probability Q(k); channel k has transition probability p(k).
[Diagram: a bank of binary symmetric channels 1, 2, …; the selected channel determines the transition probability]
Fritchman model: multiple good states G1 … Gn and only one bad state B;
closer to an actual real-world channel.
[Diagram: states G1 … Gn with error probability 0; state B with error probability h; transition from Gn to B with probability 1−p]
Interleaving: from bursty to random
[Diagram: message → encoder → interleaver → channel (bursty) → interleaver⁻¹ → decoder → message; after de-interleaving the decoder sees "random" errors]
Note: interleaving introduces encoding and decoding delay.
Homework: compare block and convolutional interleaving w.r.t. delay.
Interleaving: block
Channel models are difficult to derive:
- burst definition?
- random and burst errors?
For practical reasons: convert bursts into random errors.
Read in row-wise, transmit column-wise:
1 0 0 1 1
0 1 0 0 1
1 0 0 0 0
0 0 1 1 0
1 0 0 1 1
De-Interleaving: block
Read in column-wise. The burst of six channel errors (e) is now spread over the array:
1 0 0 e 1
0 1 e e 1
1 0 e 0 0
0 0 e 1 0
1 0 e 1 1
Read out row-wise: each row now contains at most two errors (e.g. the top row contains 1 error), which a random-error-correcting code can handle.
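A small end-to-end sketch of block (de-)interleaving with the 5×5 array above (function names are mine); a burst of six channel errors comes out spread over the rows:

def interleave(bits, rows, cols):
    """Write row-wise, transmit column-wise."""
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    """Write column-wise, read out row-wise."""
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = [1,0,0,1,1, 0,1,0,0,1, 1,0,0,0,0, 0,0,1,1,0, 1,0,0,1,1]
tx = interleave(data, 5, 5)
tx[11:17] = ["e"] * 6                  # burst of 6 channel errors
rx = deinterleave(tx, 5, 5)
for r in range(5):
    print(rx[r*5:(r+1)*5])             # each row holds at most 2 of the 6 errors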
Interleaving: convolutional
[Diagram: the input is distributed over m branches; input sequence 0 gets no delay, input sequence 1 a delay of b elements, …, input sequence m−1 a delay of (m−1)·b elements; the output commutator reassembles the stream]
Example: b = 5, m = 3.
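A minimal sketch of such a convolutional interleaver using shift registers (b = 5, m = 3 as in the example; the initial register contents are padding zeros, an assumption for illustration):

from collections import deque

def conv_interleave(symbols, b=5, m=3):
    """Branch i delays its symbols by i*b elements (FIFO shift registers)."""
    lines = [deque([0] * (i * b)) for i in range(m)]   # branch i: delay i*b
    out = []
    for t, s in enumerate(symbols):
        i = t % m                  # commutator selects branch i
        lines[i].append(s)
        out.append(lines[i].popleft())
    return out

print(conv_interleave(list(range(1, 16))))   # zeros are the register padding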