Information Theory


Transcript of Information Theory

Page 1: Information Theory

UNIT V

INTRODUCTION TO INFORMATION THEORY

Page 2: Information Theory

ERROR PROBABILITY

In any mode of communication, as long as channel noise exists, the communication cannot be error-free. However, in the case of digital systems, the accuracy can be improved by reducing the error probability P_e.

For all digital systems,

P_e = K e^(-kE_b),   where E_b = S_i / R_b

S_i = signal power,  E_b = bit energy,  R_b = bit rate

Page 3: Information Theory

Limitations on Minimising Error Probability

Increasing the bit energy E_b means:

Increasing the signal power S_i (for a given bit rate R_b)

OR

Decreasing the bit rate R_b (for a given signal power S_i)

OR

Both.

Due to physical limitations, S_i cannot be increased beyond a certain limit. Therefore, in order to reduce P_e further, we must reduce R_b, the rate of transmission of information bits. This implies that to obtain P_e -> 0, we must have R_b -> 0.

Thus it appears that, in the presence of channel noise, it is not possible to obtain error-free communication at any nonzero rate.

Page 4: Information Theory

Shannon Theory

Shannon, in 1948, showed that for a given channel, as long as the rate of transmission of information digits (R_b) is maintained within a certain limit (known as the Channel Capacity), it is possible to achieve error-free communication.

Therefore, in order to obtain P_e -> 0, it is not necessary to make R_b -> 0. It can be obtained by maintaining R_b < C,

where C = channel capacity (per second).

The presence of random disturbance in the channel does not, by itself, set any limit on the transmission accuracy. Instead, it sets a limit on the information rate R_b for which P_e -> 0 is achievable.

Page 5: Information Theory

Information Capacity: Hartley’s Law

Therefore, there is a theoretical limit to the rate at which information can be sent along a channel with a given bandwidth and signal-to-noise ratio.

The relationship between time, information capacity, and channel bandwidth is given by:

HARTLEY'S LAW:  I = k t B

where
I = amount of information to be sent
t = transmission time
B = channel bandwidth
k = a constant which depends on the type of data coding used and the signal-to-noise ratio of the channel

Page 6: Information Theory

Information Capacity: Shannon Hartley Law

Ignoring the noise, the theoretical limit to the amount of data that can be sent in a given bandwidth is given by the

Shannon-Hartley Theorem:  C = 2B log2 M

where
C = information capacity in bits per second
B = channel bandwidth in hertz
M = number of levels transmitted

Page 7: Information Theory

Shannon-Hartley Law : Explanation

Consider a channel that can pass all frequencies from 0 to B hertz. Suppose a simple binary signal with alternating 1's and 0's is transmitted. Such a signal would be a simple square wave with frequency equal to one-half the bit-rate.

Since it is a square wave, the signal will have harmonics at all odd multiples of its fundamental frequency, with declining amplitude as the frequency increases. At very low bit-rates, the output signal will be similar to the input signal even after passage through the channel, but as the bit-rate increases, the frequency of the signal also increases and more of its harmonics are filtered out, making the output more and more distorted.

Finally, for a bit-rate of 2B, the frequency of the input signal becomes B and only the fundamental frequency component of the input square-wave signal will pass through the channel.

Thus, with a binary input signal, the channel capacity would be:

C = 2B

[Figure: input binary signal between +1 V and -1 V, and the output signal at the maximum bit-rate 2B]

Page 8: Information Theory

Multilevel Signalling

The previously discussed idea can be extended to any number of levels

Consider an input signal with four voltage levels: -1 V, -0.5 V, 0.5 V and 1 V. Each voltage level would then correspond to two bits of information (a four-level code).

Therefore, we have managed to transmit twice as much information in the same bandwidth. However, the maximum frequency of the signal would not change.

[Figure: four-level code waveform between +1 V and -1 V, with the levels labelled 11, 10, 01, 00]

Page 9: Information Theory

Shannon Limit

From the previous discussion it seems that any amount of information can be transmitted through a given channel by simply increasing the number of levels. This is, however, not true because of noise.

As the number of levels increases, the probability of occurrence of error due to noise also increases. Therefore, for a given noise level, the maximum data rate is given by:

C = B log2(1 + S/N)

where
C = information capacity in bits per second
B = bandwidth in hertz
S/N = signal-to-noise ratio

Page 10: Information Theory

Example

Ques: A telephone line has a bandwidth of 3.2 kHz and a signal-to-noise ratio of 35 dB. A signal is transmitted down this line using a four-level code. Find the maximum theoretical data rate.

Ans: The maximum data rate ignoring noise is given as:

C = 2B log2 M = 2 x 3.2x10^3 x log2 4 = 12.8 kb/s

The maximum data rate using the Shannon limit is given as:

S/N = antilog10(35/10) = 3162

Therefore, C = B log2(1 + S/N) = 3.2x10^3 x log2(1 + 3162) = 37.2 kb/s

We take the lesser of the two results. This also implies that it is possible to increase the data rate over this channel by using more levels.
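The arithmetic above can be reproduced with a few lines of Python; this is only an illustrative check, and the function names are ours rather than part of the original material:

```python
import math

def nyquist_rate(bandwidth_hz, levels):
    """Maximum data rate ignoring noise: C = 2B log2(M)."""
    return 2 * bandwidth_hz * math.log2(levels)

def shannon_capacity(bandwidth_hz, snr_db):
    """Shannon limit: C = B log2(1 + S/N), with S/N given in dB."""
    snr_linear = 10 ** (snr_db / 10)        # antilog10(35/10) ~ 3162
    return bandwidth_hz * math.log2(1 + snr_linear)

B = 3.2e3                                   # 3.2 kHz telephone line
print(nyquist_rate(B, 4) / 1e3)             # ~12.8 kb/s with a four-level code
print(shannon_capacity(B, 35) / 1e3)        # ~37.2 kb/s
# The usable rate is the lesser of the two, so more levels could be used.
```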

Page 11: Information Theory

Measure of Information

Common-Sense Measure of Information

Consider the following three hypothetical headlines in a morning paper

Tomorrow the sun will rise in the east.
The United States invades Cuba.
Cuba invades the United States.

Now, from the point of view of common sense, the first headline conveys hardly any information, the second one conveys a large amount of information, and the third one conveys yet a larger amount of information.

If we look at the probabilities of occurrence of the above three events, then the probability of occurrence of the first event is unity, the probability of occurrence of the second event is very low, and the probability of occurrence of the third event is practically zero.

Page 12: Information Theory

Common-Sense Measure of Information contd…

Therefore, an event with lower probability of occurrence has a greater surprise element associated with it and hence conveys larger amount of information as compared to an event with greater probability of occurrence.

Mathematically, if

P = probability of occurrence of a message
I = information content of the message

then, for P -> 1, I -> 0, and for P -> 0, I -> infinity. These requirements are met by taking

I proportional to log(1/P)
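As a quick numeric illustration of this behaviour (a small sketch, using base-2 logarithms so the result is in bits):

```python
import math

def information_content(p):
    """I = log2(1/P) for a message of probability P."""
    return math.log2(1 / p)

for p in (1.0, 0.5, 0.01):
    print(p, information_content(p))   # 1.0 -> 0, 0.5 -> 1, 0.01 -> ~6.64
```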

Page 13: Information Theory

The Engineering Measure of Information

From the engineering point of view, the information in a message is directly proportional to the minimum time required to transmit the message.

It implies that a message with higher probability can be transmitted in shorter time than required for a message with lower probability.

For efficient transmission, shorter code words are assigned to the alphabets like a, e, o, t, etc., which occur more frequently, and longer code words are assigned to alphabets like x, z, k, q, etc., which occur less frequently. Therefore, the alphabets that occur more frequently in a message (i.e., those with a higher probability of occurrence) need a shorter time to transmit than those with a smaller probability of occurrence.

Therefore, the time required to transmit a symbol (or a message) with probability of occurrence P is proportional to log(1/P).

Page 14: Information Theory

The Engineering Measure of Information contd…

Let us consider two equiprobable binary messages m1 and m2. These two equiprobable messages require a minimum of one binary digit (which can assume two values).

Similarly, to encode 4 equiprobable messages, we require a minimum of 2 binary digits per message. Therefore, each of these four messages takes twice as much transmission time as that required by each of the two equiprobable messages and hence contains twice as much information.

Therefore, in general, in order to encode each of n equiprobable messages, the number of binary digits required is:

r = log2 n

Page 15: Information Theory

The Engineering Measure of Information contd…

Since all the messages are equiprobable, the number of binary digits required to encode each message of probability P is:

r = log2(1/P)

Therefore, the information content I of each message of probability P is given as:

I = K log2(1/P)

I = log2(1/P) bits   (for K = 1)

Similarly, for r-ary digits:

I = log_r(1/P)  r-ary units

Page 16: Information Theory

Units of Information

Now, since I = log_r(1/P) r-ary units = log2(1/P) bits, it is evident that:

1 r-ary unit = log2 r bits

and, in general,

1 r-ary unit = log_s r s-ary units

The 10-ary unit of information is called the hartley (in honour of R.V.L. Hartley):

1 hartley = log2 10 bits = 3.32 bits

Similarly, the unit based on the natural logarithm is the nat:

1 nat = log2 e bits = 1.44 bits

Page 17: Information Theory

AVERAGE INFORMATION PER MESSAGE: Entropy of a Source

Consider a memoryless source (i.e., each message emitted is independent of the previous message(s)) emitting messages m1, m2, ..., mn with probabilities P1, P2, ..., Pn respectively (P1 + P2 + ... + Pn = 1).

Now, the information content of message m_i is given by:

I_i = log2(1/P_i) bits

Thus, the average or mean information per message emitted by the source is given by:

H(m) = sum_{i=1..n} P_i I_i = sum_{i=1..n} P_i log2(1/P_i) bits  =  Entropy of the source
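This definition translates directly into code. A minimal sketch (the helper name entropy is ours):

```python
import math

def entropy(probabilities):
    """H(m) = sum_i P_i * log2(1/P_i), in bits per message."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Example: a binary memoryless source with P1 = 0.8, P2 = 0.2
print(entropy([0.8, 0.2]))   # ~0.722 bits per message
```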

Page 18: Information Theory

Maximum value of Entropy

Since entropy is the measure of uncertainty, the probability distribution that generates the maximum uncertainty will have the maximum entropy.

The maximum value of H(m) is obtained from:

dH(m)/dP_i = 0   for i = 1, 2, ..., n-1,   with P_n = 1 - (P_1 + P_2 + ... + P_{n-1})

Now,

H(m) = sum_{i=1..n} P_i log2(1/P_i) = - sum_{i != n} P_i log2 P_i - P_n log2 P_n

dH(m)/dP_i = -log2 P_i - log2 e + log2 P_n + log2 e     (since dP_n/dP_i = -1)
           = log2 P_n - log2 P_i

Setting this to zero:

log2 P_n - log2 P_i = 0   =>   log2(P_n / P_i) = 0   =>   P_i = P_n

Page 19: Information Theory

Maximum value of Entropy contd…

The previous equation is true if P_i = P_n for every i. Therefore, the maximum value of entropy occurs for equiprobable messages, i.e., when

P_1 = P_2 = P_3 = ... = P_n = 1/n

Then,

H(m)_max = sum_{i=1..n} (1/n) log2 n = log2 n

Therefore, the maximum value of entropy = log2 n = the minimum number of binary digits required to encode each of n equiprobable messages.
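A quick numerical check of this result (a sketch): for any n-message source, H(m) never exceeds log2 n, and equals it only for the uniform distribution.

```python
import math

def entropy(p_list):
    return sum(p * math.log2(1 / p) for p in p_list if p > 0)

n = 4
print(entropy([1 / n] * n), math.log2(n))   # 2.0 and 2.0: the maximum is reached
print(entropy([0.7, 0.1, 0.1, 0.1]))        # ~1.36 bits, below log2(4)
```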

Page 20: Information Theory

The Intuitive (Common Sense) and Engineering Interpretation of Entropy

From the engineering point of view, the information content of any message is equal to the minimum number of digits required to encode the message. Therefore,

Entropy = H(m) = The average value of the minimum number of digits required for encoding each message

Now, from the intuitive (common sense) point of view, information is considered synonymous with the uncertainty or the surprise element associated with a message; a message with a lower probability of occurrence has greater uncertainty and hence conveys a larger amount of information.

Therefore, if log(1/P) is taken as a measure of the uncertainty (unexpectedness) of a message of probability P, then

H(m) = sum_i P_i log(1/P_i) = average uncertainty per message of the source that generates the messages.

If the source is not memoryless (i.e., a message emitted at any time is not independent of the previously emitted messages), then the source entropy will be less than sum_i P_i log(1/P_i). This is because the dependence on the previous messages reduces the uncertainty.

Page 21: Information Theory

Source Encoding

We know that the minimum number of binary digits required to encode a message is equal to the source entropy H(m) if all the messages are equiprobable (with probability P). It can be shown that this result is true even for non-equiprobable messages.

Let a source emit messages m1, m2, ..., mn with probabilities P1, P2, ..., Pn respectively. Consider a sequence of N messages with N -> infinity. Let K_i be the number of times the message m_i occurs in this sequence. Then,

lim_{N -> infinity} (K_i / N) = P_i

Thus, the message m_i occurs NP_i times in the whole sequence of N messages.

Page 22: Information Theory

Source Encoding contd…

Now, consider a typical sequence S_N of N messages from the source.

Because the messages (of probabilities P1, P2, ..., Pn) occur NP1, NP2, ..., NPn times respectively, and because each message is independent, the probability of occurrence of a typical sequence of N messages is given by:

P(S_N) = (P1)^(NP1) (P2)^(NP2) ... (Pn)^(NPn)

Therefore, the number of binary digits required to encode such a sequence is:

L_N = log2[1 / P(S_N)] = N sum_i P_i log2(1/P_i)

L_N = N H(m) bits

Page 23: Information Theory

Source Encoding contd…

Therefore, the average number of digits required per message is given as:

L = L_N / N = H(m) bits

This shows that we can encode a sequence of non-equiprobable messages by using, on average, H(m) bits per message.

Page 24: Information Theory

Compact Codes

The source coding theorem states that in order to encode a source with entropy H(m), we need a minimum of H(m) binary digits per message, or H_r(m) r-ary digits per message.

Thus, the average wordlength of an optimum code is H(m), but to attain this length we have to encode a sequence of N messages (N -> infinity) at a time.

However, if we wish to encode each message directly without using longer sequences, then the average length of the code word per message will be > H(m).

In practice, it is not desirable to use long sequences, as they cause transmission delay and add to the equipment complexity. Therefore, it is preferred to encode messages directly, even if a price has to be paid in terms of increased wordlength.

Page 25: Information Theory

Huffman Code

Let us suppose that we are given a set of n messages (m1, m2, ..., mn with probabilities P1, P2, ..., Pn). Then, to find the Huffman Code:

Step 1 All the messages are arranged in the order of decreasing probability

Step 2 The last two messages (messages with least probabilities) are then combined into one message (i.e. their probabilities are added up)

Step 3 These messages are now again arranged in the decreasing order of probability

Step 4 The whole of the process is repeated until only two messages are left (in the case of binary digits coding) or r messages are left ( in the case of r-ary digits coding).

Step 5 In the case of binary digit coding, the two (reduced) messages are assigned 0 and 1 as their first digits in the code sequence (and in the case of r-ary digit coding, the reduced messages are assigned 0, 1, ..., r-1).

Page 26: Information Theory

Huffman Code contd…

Step 6 We then go back and assign the numbers 0 and 1 to the second digit for the two messages that were combined in the previous step. We keep regressing in this way until the first column is reached.

Step 7 The optimum (Huffman) code obtained this way is called the Compact Code.

The average length of the compact code is given as:

L = sum_{i=1..n} P_i L_i

This is compared with the source entropy given by:

H_r(m) = sum_{i=1..n} P_i log_r(1/P_i)
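The steps above can be turned into a short program. The sketch below handles the binary case only and uses names of our own choosing; it repeatedly merges the two least probable messages and back-assigns the digits, exactly as described in Steps 2 to 6.

```python
import heapq

def huffman_code(probabilities):
    """Binary Huffman code: returns one codeword per message,
    in the same order as the given probabilities."""
    # Heap entries: (probability, tie-breaker, indices of merged messages)
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    codes = [""] * len(probabilities)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)      # least probable set
        p2, _, group2 = heapq.heappop(heap)      # next least probable set
        for i in group1:                         # prefix a digit to every member
            codes[i] = "1" + codes[i]
        for i in group2:
            codes[i] = "0" + codes[i]
        heapq.heappush(heap, (p1 + p2, min(group1 + group2), group1 + group2))
    return codes

probs = [0.30, 0.25, 0.15, 0.12, 0.10, 0.08]
codes = huffman_code(probs)
L = sum(p * len(c) for p, c in zip(probs, codes))
print(codes, L)   # word lengths 2, 2, 3, 3, 3, 3 -> average length 2.45 digits
```

The individual codewords may differ from those in the worked table below (the 0/1 assignment at each merge is arbitrary), but the word lengths, and hence the average length, are the same.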

Page 27: Information Theory

Huffman Code contd…

Code Efficiency:  eta = H(m) / L

Redundancy:  1 - eta

Page 28: Information Theory

Huffman Code contd…

For an r-ary code, we will have exactly r messages left in the last reduced set if, and only if, the total number of original messages is equal to r + k(r-1), where k is an integer. This is because each reduction decreases the number of messages by (r-1). Therefore, if there are k reductions, there must be r + k(r-1) original messages.

In case the original number of messages does not satisfy this condition, we must add some dummy messages with zero probability of occurrence until this condition is satisfied

For e.g., if r = 4 and the number of messages n = 6, then we must add one dummy message with zero probability of occurrence to make the total number of messages 7, i.e., [4 + 1(4 – 1)]

Page 29: Information Theory

EXERCISE-1

Ques. 1 Obtain the compact (Huffman) Code for the following set of messages:

Messages Probabilities

m1 0.30

m2 0.15

m3 0.25

m4 0.08

m5 0.10

m6 0.12

Page 30: Information Theory

EXERCISE-1

Ans. 1 The optimum Huffman code is obtained as follows:

Messages  Probabilities        S1            S2            S3           S4
m1        0.30   00      0.30  00      0.30  00      0.43  1      0.57  0
m2        0.25   10      0.25  10      0.27  01      0.30  00     0.43  1
m3        0.15   010     0.18  11      0.25  10      0.27  01
m4        0.12   011     0.15  010     0.18  11
m5        0.10   110     0.12  011
m6        0.08   111

Page 31: Information Theory

Ans. 1 contd…

The average length of the compact code is given by:

L = sum_{i=1..n} P_i L_i = (0.30 x 2) + (0.25 x 2) + (0.15 x 3) + (0.12 x 3) + (0.10 x 3) + (0.08 x 3)
  = 0.60 + 0.50 + 0.45 + 0.36 + 0.30 + 0.24
  = 2.45 bits

The entropy H(m) of the source is given by:

H(m) = sum_{i=1..n} P_i log2(1/P_i)
     = 0.5210897 + 0.5 + 0.4105448 + 0.3670672 + 0.3321928 + 0.2915085
     = 2.422 bits (approx.)

Code Efficiency = H(m) / L = 2.422 / 2.45 = 0.989 (approx.)

Redundancy = 1 - eta = 0.011 (approx.)
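These figures can be checked directly from the word lengths in the table (a small sketch; the entropy evaluates to about 2.42 bits, giving an efficiency of roughly 0.99):

```python
import math

probs   = [0.30, 0.25, 0.15, 0.12, 0.10, 0.08]
lengths = [2, 2, 3, 3, 3, 3]              # code-word lengths from the table above

L = sum(p * l for p, l in zip(probs, lengths))
H = sum(p * math.log2(1 / p) for p in probs)
print(L, H, H / L, 1 - H / L)             # 2.45, ~2.422, ~0.989, ~0.011
```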

Page 32: Information Theory

EXERCISE-1 contd…

Ques. 2 A zero-memory source emits six messages as shown. Find the 4-ary (quaternary) Huffman Code. Determine its average wordlength, the efficiency and the redundancy.

Messages Probabilities

m1 0.30

m2 0.15

m3 0.25

m4 0.08

m5 0.10

m6 0.12

Page 33: Information Theory

EXERCISE-1 contd…

Ans. 2 The optimum Huffman code is obtained as follows:

Messages Probabilities S1

m1 0.30 0 0.30 0

m2 0.25 2 0.30 1

m3 0.15 3 0.25 2

m4 0.12 10 0.15 3

m5 0.10 11

m6 0.08 12

m7 0.0 13

Page 34: Information Theory

Ans. 2 contd…

The average length of the compact code is given by:

L = sum_{i=1..n} P_i L_i = (0.30 x 1) + (0.25 x 1) + (0.15 x 1) + (0.12 x 2) + (0.10 x 2) + (0.08 x 2)
  = 1.3 4-ary digits

The entropy of the source in 4-ary units is given by:

H_4(m) = sum_{i=1..n} P_i log_4(1/P_i) = 1.211 4-ary digits (approx.)

Code Efficiency = H_4(m) / L = 0.93 (approx.)

Redundancy = 1 - eta = 0.07 (approx.)

Page 35: Information Theory

EXERCISE-1 contd…

Ques. 3 A zero-memory source emits messages m1 and m2 with probabilities 0.8 and 0.2 respectively. Find the optimum binary code for the source as well as for its second and third order extensions (i.e., N = 2 and 3).

(Ans. For N=1 H(m) = 0.72 ; η = 0.72

For N=2 L’ = 1.56 ; η = 0.923

For N=3 L’’ = 2.184 ;η = 0.989 )

Page 36: Information Theory

Shannon Fano Code

An efficient code can be generated by the following simple procedure known as the Shannon-Fano algorithm:

Step 1 List the source symbols in order of decreasing probability.

Step 2 Partition the set into two sets that are as close to equiprobable as possible. Assign 0 to the upper set and 1 to the lower set.

Step 3 Continue this process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
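A compact recursive sketch of this procedure for the binary case (the partition point is chosen to keep the two subsets as nearly equiprobable as possible; all names are ours):

```python
def shannon_fano(probabilities):
    """Binary Shannon-Fano code for symbols already sorted in decreasing
    probability; returns one codeword per symbol, in the given order."""
    codes = [""] * len(probabilities)

    def split(indices):
        if len(indices) <= 1:
            return
        total = sum(probabilities[i] for i in indices)
        running, cut, best_diff = 0.0, 1, total
        for k in range(1, len(indices)):
            running += probabilities[indices[k - 1]]
            diff = abs(2 * running - total)        # |upper set - lower set|
            if diff < best_diff:
                best_diff, cut = diff, k
        for i in indices[:cut]:
            codes[i] += "0"                        # upper (more probable) set
        for i in indices[cut:]:
            codes[i] += "1"                        # lower set
        split(indices[:cut])
        split(indices[cut:])

    split(list(range(len(probabilities))))
    return codes

print(shannon_fano([0.30, 0.25, 0.20, 0.12, 0.08, 0.05]))
# -> codewords of lengths 2, 2, 2, 3, 4, 4
```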

Page 37: Information Theory

EXERCISE-2

Ques. 1 Obtain Shannon-Fano Code for the following given set of messages from a memory-less source

Messages Probabilities

X1 0.30

X2 0.25

X3 0.20

X4 0.12

X5 0.08

X6 0.05


Page 38: Information Theory

EXERCISE-2 contd…

Ans. 1 [Shannon-Fano partitioning table; the resulting code-word lengths for X1 to X6 are 2, 2, 2, 3, 4 and 4 digits respectively.]

Page 39: Information Theory

Ans. 1 contd…

The average wordlength of the code is:

L = (2 x 0.30) + (2 x 0.25) + (2 x 0.20) + (3 x 0.12) + (4 x 0.08) + (4 x 0.05) = 2.38 digits

H(m) = -(0.30 log2 0.30 + 0.25 log2 0.25 + 0.20 log2 0.20 + 0.12 log2 0.12 + 0.08 log2 0.08 + 0.05 log2 0.05) = 2.360 bits (approx.)

Efficiency = H(m) / L = 2.360 / 2.38 = 0.99 (approx.)

Redundancy = 1 - eta = 0.01 (approx.)
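Plugging the given probabilities and the resulting word lengths into the formulas confirms these figures (sketch):

```python
import math

probs   = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
lengths = [2, 2, 2, 3, 4, 4]

L = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print(L, H, H / L)   # 2.38, ~2.36, ~0.99
```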

Page 40: Information Theory

Some more examples

Ques. 2 A memory-less source emits messages x1, x2, x3, x4 and x5 with P(x1) = 0.4, P(x2) = 0.19, P(x3) = 0.16, P(x4) = 0.15 and P(x5) = 0.1.

(i) Construct a Shannon-Fano Code for X and calculate the efficiency of the code.

(ii) Repeat for the Huffman Code and compare the results.

Page 41: Information Theory

Ans. 2

The Shannon-Fano Code for x1, x2, x3, x4, x5 can be obtained as shown. [Shannon-Fano partitioning table; the resulting code-word lengths are 2, 2, 2, 3 and 3 digits.]

Page 42: Information Theory

Ans. 2 contd…

The average wordlength of the code is:

L = 2 x (0.40 + 0.19 + 0.16) + 3 x (0.15 + 0.10) = 2.25 digits

H(m) = -(0.40 log2 0.40 + 0.19 log2 0.19 + 0.16 log2 0.16 + 0.15 log2 0.15 + 0.10 log2 0.10) = 2.1498 bits (approx.)

Efficiency = H(m) / L = 0.9554 (approx.)

Redundancy = 1 - eta = 0.0446 (approx.)

Page 43: Information Theory

Ans. 2 contd…

Now, let us generate the Huffman Code and compare the results. Huffman code is generated as:

Messages Probabilities S1 S2 S3

m1 0.40 1 0.40 1 0.40 1 0.60 0

m2 0.19 000 0.25 01 0.35 00 0.40 1

m3 0.16 001 0.19 000 0.25 01

m4 0.15 010 0.16 001

m5 0.10 011

Page 44: Information Theory

Ans. 2 contd…

The average length of the compact code is given by:

L = sum_{i=1..n} P_i L_i = (0.40 x 1) + (0.19 + 0.16 + 0.15 + 0.10) x 3 = 2.2 bits

Efficiency = H(m) / L = 2.1498 / 2.2 = 0.9772 (approx.)

Therefore, we find that the efficiency of the Huffman Code is better than that of the Shannon-Fano Code.

Page 45: Information Theory

Some more examples contd…

Ques. 3 Construct the Shannon-Fano Code and the Huffman Code for five equiprobable messages emitted from a memory-less source, each with probability P = 0.2.

Ans. 3 [Shannon-Fano partitioning table; three messages receive 2-digit code words and two messages receive 3-digit code words.]

Page 46: Information Theory

Ans. 3 contd…

The average wordlength of the code is:

L = (0.2 x 2) x 3 + (0.2 x 3) x 2 = 2.4 digits

H(m) = 5 x 0.2 log2(1/0.2) = log2 5 = 2.3219 bits (approx.)

Efficiency = H(m) / L = 0.96747 (approx.)

Redundancy = 1 - eta = 0.03253 (approx.)

Page 47: Information Theory

Ans. 3 contd…

Now, Huffman code is generated as:

Messages Probabilities S1 S2 S3

m1 0.20 01 0.40 1 0.40 1 0.60 0

m2 0.20 000 0.20 01 0.40 00 0.40 1

m3 0.20 001 0.20 000 0.20 01

m4 0.20 10 0.20 001

m5 0.20 11

Page 48: Information Theory

Ans. 3 contd…

The average length of the compact code is given by:

L = sum_{i=1..n} P_i L_i = (0.2 x 3) x 2 + (0.2 x 2) x 3 = 2.4 bits

Efficiency = H(m) / L = 0.96747 (approx.)

The efficiency is the same for both the codes.

Page 49: Information Theory

Construct the Shannon-Fano Code and the Huffman Code for each of the following sources:

1. Messages m1, m2, m3, m4 with probabilities 0.50, 0.25, 0.125, 0.125
   Ans: For the Shannon-Fano Code: H(m) = 1.75; L = 1.75; eta = 1

2. Messages m1, m2, m3, m4, m5, m6 with probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/32
   Ans: For the Shannon-Fano Code: H(m) = 1.9375; L = 1.9375; eta = 1

3. Messages m1, m2, m3, m4, m5 with probabilities 1/3, 1/3, 1/9, 1/9, 1/9
   Ans: For the Shannon-Fano Code: H(m) = 4/3; L = 4/3; eta = 1

Page 50: Information Theory

CHANNEL CAPACITY OF A DISCRETE MEMORYLESS CHANNEL

Let a source emit symbols x1, x2, ..., xr. The receiver receives symbols y1, y2, ..., ys. The set of received symbols may or may not be identical to the transmitted set.

If the channel is noiseless, the reception of some symbol yj uniquely determines the message transmitted.

Because of noise, however, there is a certain amount of uncertainty regarding the transmitted symbol when yj is received.

If P(xi|yj) represents the conditional probability that xi was transmitted when yj is received, then there is an uncertainty of log[1/P(xi|yj)] about xi when yj is received.

Thus, the average loss of information over the transmitted symbol when yj is received is given as:

H(x|yj) = sum_i P(xi|yj) log[1/P(xi|yj)]   bits per symbol

Page 51: Information Theory

Contd…

When this uncertainty is averaged over all xi and yj, we obtain the average uncertainty about a transmitted symbol when a symbol is received, denoted H(x|y):

H(x|y) = sum_j P(yj) H(x|yj)   bits per symbol

Since P(yj) P(xi|yj) = P(xi, yj), the joint probability,

H(x|y) = sum_j sum_i P(xi, yj) log[1/P(xi|yj)]   bits per symbol

This uncertainty is caused by channel noise. Hence, it is the average loss of information about a transmitted symbol when a symbol is received. Therefore, H(x|y) is also called the equivocation of x with respect to y.

Page 52: Information Theory

Contd…

If the channel is noiseless, H(x|y) = 0.

P(yj|xi) = probability that yj is received when xi is transmitted. This is a characteristic of the channel and the receiver. Thus, a given channel (with its receiver) is specified by the channel matrix of transition probabilities P(yj|xi).

Page 53: Information Theory

Contd…

We can obtain the reverse conditional probabilities P(xi|yj) using Bayes' Rule: P(xi|yj) = P(yj|xi) P(xi) / P(yj).

Page 54: Information Theory

DISCRETE MEMORYLESS CHANNELS

Channel Representation

Page 55: Information Theory

Contd…

The figure shows a Discrete Memoryless Channel

It is a statistical model with an input X and output Y

During each unit of time (signaling interval), the channel accepts an input symbol from X, and in response it generates an output symbol from Y.

The channel is discrete when the alphabets of both X and Y are finite

It is memoryless when the current output depends on only the current input and not on any of the previous inputs.

Page 56: Information Theory

The Channel Matrix

As discussed earlier, a channel is completely specified by the complete set of transition probabilities. Accordingly, the DMC shown above is specified by the matrix of transition probabilities [P(Y|X)], whose (i, j)th element is P(yj|xi).

Since each input to the channel results in some output, each row of the channel matrix must add up to unity, i.e.,

sum_j P(yj|xi) = 1   for all i

Page 57: Information Theory

Contd…

If the input probabilities P(X) are represented by the row matrix

[P(X)] = [P(x1) P(x2) ... P(xm)]

then the output probabilities P(Y) are represented by the row matrix

[P(Y)] = [P(y1) P(y2) ... P(yn)] = [P(X)] [P(Y|X)]

Page 58: Information Theory

Contd…

If P(X) is represented as a diagonal matrix [P(X)]_d (with P(x1), ..., P(xm) on the diagonal), then

[P(X, Y)] = [P(X)]_d [P(Y|X)] = Joint Probability Matrix

The element P(xi, yj) is the joint probability of transmitting xi and receiving yj.
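These matrix relations are easy to check numerically. The sketch below uses an assumed 2x2 channel matrix purely for illustration (the matrices of the exercises that follow are given in figures not reproduced here):

```python
import numpy as np

# Assumed channel matrix [P(Y|X)]: row i holds P(y_j | x_i)
P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
P_X = np.array([0.5, 0.5])            # input probabilities as a row matrix

P_Y  = P_X @ P_Y_given_X              # [P(Y)] = [P(X)] [P(Y|X)]
P_XY = np.diag(P_X) @ P_Y_given_X     # [P(X,Y)] = [P(X)]_d [P(Y|X)]

print(P_Y_given_X.sum(axis=1))        # each row sums to 1
print(P_Y)                            # [0.55 0.45]
print(P_XY)                           # element (i, j) is P(x_i, y_j)
```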

Page 59: Information Theory

SPECIAL CHANNELS

Lossless Channel

A channel described by a channel matrix with only one non-zero element in each column is called a lossless channel

A lossless channel is represented as:

Therefore, in a lossless channel, no source information is lost in transmission.

Page 60: Information Theory

Deterministic Channel

A channel described by a channel matrix with only one non-zero element in each row is called a deterministic channel

A deterministic channel is represented as:

Since each row has only one non-zero element, therefore, this element must be unity. Thus, when a given symbol is sent in a deterministic channel, it is clear which output symbol will be received.

Page 61: Information Theory

Noiseless Channel

A channel is called noiseless if it is both lossless and deterministic.

A noiseless channel is represented as:

Therefore, the channel matrix has only one element in each row and in each column, and this element is unity. Also, the number of input and output symbols is the same, i.e., m = n, for a noiseless channel.

Page 62: Information Theory

Binary Symmetric Channel (BSC)

A binary symmetric channel is represented as:

This channel has two inputs (x1 = 0, x2 = 1) and two outputs (y1 = 0, y2 = 1).

It is a symmetric channel because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent.

This common transition probability is denoted by p.

Page 63: Information Theory

EXERCISE-3

Ques. 1 Consider a binary channel as shown:

(a) Find the channel matrix of the channel.
(b) Find P(y1) and P(y2) when P(x1) = P(x2) = 0.5.
(c) Find the joint probabilities P(x1, y2) and P(x2, y1).

Page 64: Information Theory

Ans. 1

The Channel Matrix can be given as:

The Output Probability Matrix is given as:

Now, the Joint Probability Matrix is given as:

Page 65: Information Theory

EXERCISE-3 contd…

Ques. 2 Two binary channels discussed in the previous question are connected in cascade as shown:

(a) Find the overall channel matrix of the resultant channel, and draw the equivalent channel diagram.
(b) Find P(z1) and P(z2) when P(x1) = P(x2) = 0.5.

Page 66: Information Theory

Ans. 2

We have

The resultant equivalent channel diagram is shown as:

Page 67: Information Theory

Ans. 2 contd…

Page 68: Information Theory

EXERCISE-3 contd…

Ques. 3 A channel has the following channel matrix:

(a) Draw the channel diagram.
(b) If the source has equally likely outputs, compute the probabilities associated with the channel outputs for p = 0.2.

Page 69: Information Theory

Ans. 3

Shown above is a Binary Erasure Channel with two inputs, x1 = 0 and x2 = 1, and three outputs, y1 = 0, y2 = e and y3 = 1, where e indicates an erasure, meaning that the output is in doubt and hence should be erased.

The output matrix for the above given channel at p=0.2 can be given as:

Page 70: Information Theory

ERROR-FREE COMMUNICATION OVER A NOISY CHANNEL

We know that messages from a source with entropy H(m) can be encoded by using an average of H(m) digits per message. This encoding has, however, zero redundancy.

Hence, if we transmit these coded messages over a noisy channel, some of the information will be received erroneously. Therefore, we cannot have error-free communication over a noisy channel when the messages are encoded with zero redundancy. Redundancy in general helps combat noise.

A simple example of the use of redundancy is the Single Parity Check Code in which an extra binary digit is added to each code word to ensure that the total number of 1’s in the resulting codeword is always even (or odd). If a single error occurs in the received code-word, the parity is violated, and the receiver requests retransmission.
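A minimal sketch of the single parity check idea (even parity; the helper names are ours):

```python
def add_even_parity(bits):
    """Append one check bit so the total number of 1's is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """True if the received word still has even parity."""
    return sum(word) % 2 == 0

c = add_even_parity([1, 0, 1, 1])    # -> [1, 0, 1, 1, 1]
r = c.copy()
r[2] ^= 1                            # a single channel error
print(parity_ok(c), parity_ok(r))    # True, False -> receiver requests retransmission
```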

Page 71: Information Theory

Error-Correcting Codes

The two important types of Error correcting codes include:

Block Codes Convolutional Codes

Page 72: Information Theory

Block and Convolutional codes

Block Codes

In block codes, a block of k data digits is encoded by a code word of n digits (n>k), i.e., for each sequence of k data digits, there is a distinct code word of n digits

In block codes, k data digits are accumulated and then encoded into a n-digit code word.

If k data digits are transmitted by a

code word of n digits, the number of check digits is m=n – k.

The Code Efficiency (also known as the code rate) = k/n.

Such a code is called an (n, k) code.

Convolutional Codes

In Convolutional (or recurrent) codes, the coded sequence of n digits depends not only on the k data digits but also on the previous N-1 data digits (N > 1). Hence, the coded sequence for a certain k data digits is not unique but depends on the N-1 earlier data digits.

In Convolutional codes, the coding is done on a continuous, or running, basis.

Page 73: Information Theory

LINEAR BLOCK CODES

A code word comprising of n-digits and a data word comprising of k digits can be represented by row matrices as:

c = (c1, c2, c3, ..., cn)        d = (d1, d2, d3, ..., dk)

Generally, in linear block codes, all the n digits of c are formed by linear combinations (modulo-2 additions) of the k data digits d1, d2, d3, ..., dk.

The special case where c1 = d1, c2 = d2, ..., ck = dk and the remaining digits c_{k+1} to c_n are linear combinations of d1, d2, d3, ..., dk is known as a systematic code.

Page 74: Information Theory

LINEAR BLOCK CODES

For linear Block Codes:

Minimum distance between code words: D_min

Number of errors that can be detected: D_min - 1

Number of errors that can be corrected:
(D_min - 1)/2  if D_min is odd
(D_min - 2)/2  if D_min is even
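These rules can be applied mechanically. The sketch below computes D_min of a code from a list of its code words and the resulting detection/correction capability (the code-word list is only illustrative):

```python
from itertools import combinations

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def d_min(codewords):
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

code = ["000000", "100101", "010011", "110110"]   # a few 6-digit code words
d = d_min(code)
detect  = d - 1
correct = (d - 1) // 2 if d % 2 else (d - 2) // 2
print(d, detect, correct)   # 3 -> detects 2 errors, corrects 1
```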

Page 75: Information Theory

In a systematic code, the first k digits of a code word are the data digits and the last m = n - k digits are the parity-check digits, formed by linear combinations of the data digits d1, d2, d3, ..., dk.

Page 76: Information Theory

Example 1:

For a (6, 3) code, the generator matrix is

For all eight possible data words find the corresponding code words, and verify that this code is a single-error correcting code.

Page 77: Information Theory

Solution

Page 78: Information Theory

Solution contd… :Decoding

Since the modulo-2 sum of any sequence with itself is zero, we get:

Page 79: Information Theory

Decoding

Page 80: Information Theory

Decoding contd…

But because of possible channel errors, rH^T is in general a non-zero row vector s, called the Syndrome.

Therefore, from the received word r we can get s and hence the error word e. But this procedure does not give a unique solution, because r can be expressed in terms of code words other than c_i.

Page 81: Information Theory

Decoding contd…

Since for k-dimensional data words there are 2^k code words, the equation s = eH^T is satisfied by 2^k error vectors.

For e .g,

If d = 100 corresponding to it c = 100101 and an error occurred in the third digit, then, r =101101 and e=001000

But in the case of c = 101011 and e = 000110 also, the received word would be r = 101101.

Similarly, for c = 110110 and e = 011011, again, the received word would be r =

101101

Therefore in the case of 3-bit data, there are 8 possible data words, 8 corresponding code words and hence 8 possible error vectors for each received word.

Maximum-likelihood Rule

If we receive r, then we decide in favour of that c for which r is most likely to have been received, i.e., the c corresponding to the e that represents the minimum number of bit errors.
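The whole decoding chain can be sketched in a few lines. The generator matrix of the (6, 3) example is not reproduced on these slides, so the G = [I | P] used below is inferred from the code words quoted above (d = 100 -> 100101, etc.) and should be treated as an assumption.

```python
import numpy as np

# Assumed generator matrix G = [I3 | P], consistent with the quoted code words
P = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
G = np.hstack([np.eye(3, dtype=int), P])
H = np.hstack([P.T, np.eye(3, dtype=int)])     # parity-check matrix, G @ H.T = 0 (mod 2)

def encode(d):
    return (np.array(d) @ G) % 2

def decode(r):
    """Maximum-likelihood rule: pick the minimum-weight e satisfying e H^T = s."""
    s = (np.array(r) @ H.T) % 2
    best = None
    for i in range(2 ** 6):                    # search all 64 error patterns
        e = np.array([(i >> b) & 1 for b in range(6)])
        if np.array_equal((e @ H.T) % 2, s) and (best is None or e.sum() < best.sum()):
            best = e
    c = (np.array(r) + best) % 2
    return c[:3]                               # first k digits are the data digits

print(encode([1, 0, 0]))            # [1 0 0 1 0 1], as quoted above
print(decode([1, 0, 1, 1, 0, 1]))   # r = 101101 -> single error corrected -> d = 100
```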

Page 82: Information Theory

Example-2 contd…

A (6, 3) code is generated according to the generating matrix in the

previous example. The receiver receives r = 100011. Determine the corresponding data word d.

Page 83: Information Theory

Solution

Page 84: Information Theory

Solution contd…

Page 85: Information Theory

CYCLIC CODES

Cyclic codes are a subclass of linear block codes.

In linear block codes, the procedure for selecting a generator matrix is relatively easy for single-error correcting codes. However, it cannot carry us very far in constructing higher-order error correcting codes. Cyclic codes have a fair amount of mathematical structure that permits the design of higher-order error correcting codes.

For Cyclic codes, encoding and syndrome calculations can be easily implemented using simple shift registers.

Page 86: Information Theory
Page 87: Information Theory

CYCLIC CODES contd…

One of the important properties of code polynomials is that when x c_i(x) is divided by (x^n + 1), the remainder is c_i^(1)(x), the cyclic shift of c_i(x).

This property can be easily verified.

Page 88: Information Theory

Proof:

Consider a polynomial

c(x) = d(x) g(x)   ---- (A)

where d(x) is the data polynomial of degree k-1 or less and g(x) is the generator polynomial of degree n-k. This is a polynomial of degree n-1 or less. There are a total of 2^k such polynomials, corresponding to the 2^k data vectors.

Thus, we obtain a linear (n, k) code generated by (A) .

Now, let us prove that this code generated is indeed cyclic

Page 89: Information Theory

Proof contd…

Page 90: Information Theory

EXERCISE-4

Ques. 1 Find a generator polynomial g(x) for a (7, 4) cyclic code and find code vectors for the following data vectors: 1010, 1111, 0001, and 1000.

Ans. 1 Now, in this case n = 7, n - k = 3, and

x^7 + 1 = (x + 1)(x^3 + x^2 + 1)(x^3 + x + 1)

The generator polynomial should be of order n - k = 3. Let us take:

g(x) = x^3 + x^2 + 1

For d = [1 0 1 0],

d(x) = x^3 + x
Page 91: Information Theory

Ans. 1 contd…

Page 92: Information Theory

SYSTEMATIC CYCLIC CODES

In the previous example, the first k digits were not necessarily the data digits. Therefore, it is not a systematic code.

In a systematic code, the first k digits are the data digits and the last n-k digits are the parity check digits

In a systematic cyclic code, the code word polynomial is given as:

c(x) = x^(n-k) d(x) + rho(x)   ---- (B)

where rho(x) is the remainder when x^(n-k) d(x) is divided by g(x).
Page 93: Information Theory

Example:

Construct a systematic (7, 4) cyclic code using a generator polynomial

Page 94: Information Theory

Example: Solution contd…

Page 95: Information Theory

Cyclic Code Generation

Coding and Decoding of Cyclic codes can be very easily implemented using Shift Registers and Modulo-2 adders

Systematic code generation involves division of x^(n-k) d(x) by g(x), which is implemented using a shift register with feedback connections according to the generator polynomial g(x).

An encoding circuit with n - k shift-register stages is shown in the figure.

Page 96: Information Theory

Cyclic Code Generation contd…

The k-data digits are shifted in one at a time at the input with the switch s held at position p1 . The symbol D represents one-digit delay

As the data digits move through the encoder, they are also shifted out onto the output line, because the first k digits of the code word are the data digits themselves

As soon as the last (or kth ) data digit clears the last [(n-k)th] register, all the registers

contain the parity check digits. The switch s is now thrown to position p2, and the parity check digits are shifted out one at a time onto the line.


Page 97: Information Theory

Cyclic Code Generation contd…

Every valid code polynomial c(x) is a multiple of g(x). In case of error during transmission , the received word polynomial r(x) will not be a multiple of g(x). Thus,

r(x) / g(x) = m(x) + s(x) / g(x)

where s(x) = Rem[ r(x) / g(x) ] is the syndrome polynomial (of degree n-k-1 or less).

Writing r(x) = c(x) + e(x), where e(x) = error polynomial, then, since c(x) is a multiple of g(x),

s(x) = Rem[ r(x) / g(x) ] = Rem[ e(x) / g(x) ]