Introduction to Information theory
A.J. Han Vinck, University of Duisburg-Essen
April 2012
content
Introduction
Entropy and some related properties
Source coding
Channel coding
First lecture
What is information theory about
Entropy, or shortest average presentation length
Some properties of entropy
Mutual information
Data processing theorem
Fano inequality
Field of Interest
Information theory deals with the problem of efficient and reliable transmission of information.
It specifically encompasses theoretical and applied aspects of
- coding, communications and communications networks
- complexity and cryptography
- detection and estimation
- learning, Shannon theory, and stochastic processes
Some of the successes of IT
• Satellite communications: Reed-Solomon codes (also in the CD player)
• Viterbi algorithm
• Public-key cryptosystems (Diffie-Hellman)
• Compression algorithms: Huffman, Lempel-Ziv, MP3, JPEG, MPEG
• Modem design with coded modulation (Ungerboeck)
• Codes for recording (CD, DVD)
OUR Definition of Information
Information is knowledge that can be used
i.e. data is not necessarily information
we:
1) specify a set of messages of interest to a receiver
2) and select a message to be transmitted
3) sender and receiver form a pair
Communication model
[Block diagram: source → analogue-to-digital conversion → compression/reduction → security → error protection → from bit to signal; everything after the conversion is digital.]
A generator of messages: the discrete source
source X: output x ∈ { finite set of messages }
Example:
binary source: x ∈ { 0, 1 } with P( x = 0 ) = p; P( x = 1 ) = 1 − p
M-ary source: x ∈ { 1, 2, …, M } with Σ Pᵢ = 1.
Express everything in bits: 0 and 1
Discrete finite ensemble:
a, b, c, d → 00, 01, 10, 11
in general: k binary digits specify 2^k messages
M messages need ⌈log2 M⌉ bits
Analogue signal: (problem is sampling speed)
1) sample and 2) represent sample value binary
[Figure: an analogue signal v(t) is sampled and each sample value is quantized to one of the 2-bit levels 00, 01, 10, 11.]
Output: 00, 10, 01, 01, 11
The entropy of a source: a fundamental quantity in Information theory
entropy
The minimum average number of binary digits needed to specify a source output (message) uniquely is called the “SOURCE ENTROPY”.
SHANNON (1948):
1) Source entropy := −Σ_{i=1}^{M} P(i) log2 P(i) ≤ L = Σ_{i=1}^{M} P(i) ℓ(i), the average representation length (ℓ(i) = length of the binary representation of message i)
2) minimum can be obtained !
QUESTION: how to represent a source output in digital form?
QUESTION: what is the source entropy of text, music, pictures?
QUESTION: are there algorithms that achieve this entropy?
http://www.youtube.com/watch?v=z7bVw7lMtUg
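As a quick illustration of this definition (my own sketch, not part of the slides; the example distribution is the one used later on the "Morse idea" slide):

```python
import math

def entropy(probs):
    """Source entropy H = -sum P(i) log2 P(i), in bits per source output."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example distribution (assumed): P = (1/2, 1/4, 1/8, 1/8)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
print(entropy([0.25] * 4))                 # 2.0 bits = log2(4), the maximum for M = 4
```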
Properties of entropy
A: For a source X with M different outputs: log2 M ≥ H(X) ≥ 0
the „worst“ we can do is just assign log2 M bits to each source output
B: For a source X „related“ to a source Y: H(X) ≥ H(X|Y)
Y gives additional info about X
when X and Y are independent, H(X) = H(X|Y)
Joint Entropy: H(X,Y) = H(X) + H(Y|X)
also H(X,Y) = H(Y) + H(X|Y)
intuition: first describe Y and then X given Y
from this: H(X) – H(X|Y) = H(Y) – H(Y|X)
Homework: check the formula
Cont.
As a formula:
H(X,Y) = −Σ_{x,y} P(x,y) log2 P(x,y) = −Σ_{x,y} P(x,y) log2 [ P(x) P(y|x) ]
= −Σ_x P(x) log2 P(x) Σ_y P(y|x) − Σ_{x,y} P(x,y) log2 P(y|x) = H(X) + H(Y|X)
where H(X|Y) := −Σ_{x,y} P(x,y) log2 P(x|y)
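A hedged numerical check of these chain-rule identities (my own code and an assumed example joint distribution, not from the slides):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed example joint distribution P(x, y) on {0,1} x {0,1}
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
Px = {x: sum(p for (xx, _), p in P.items() if xx == x) for x in (0, 1)}
Py = {y: sum(p for (_, yy), p in P.items() if yy == y) for y in (0, 1)}

H_XY = H(P.values())
H_Y_given_X = -sum(p * math.log2(p / Px[x]) for (x, y), p in P.items())  # H(Y|X)
H_X_given_Y = -sum(p * math.log2(p / Py[y]) for (x, y), p in P.items())  # H(X|Y)

# H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y): all three values agree
print(H_XY, H(Px.values()) + H_Y_given_X, H(Py.values()) + H_X_given_Y)
```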
Entropy: Proof of A
We use: ln x = y ⟺ x = e^y, and log2 x = ln x · log2 e (so log2 M = ln M · log2 e).
We use the following important inequalities: 1 − 1/M ≤ ln M ≤ M − 1
Homework: draw the inequalities (sketch M − 1, ln M and 1 − 1/M as functions of M).
Entropy: Proof of A
0)1M
1(elog
)1)x(MP
1)(x(Pelog
x2
x2
x
22 )x(MP
1ln)x(PelogMlog)X(H
Entropy: Proof of B
H(X|Y) − H(X) = Σ_{x,y} P(x,y) log2 [ P(x) / P(x|y) ] = log2 e · Σ_{x,y} P(x,y) ln [ P(x) / P(x|y) ]
≤ log2 e · Σ_{x,y} P(x,y) ( P(x)/P(x|y) − 1 ) = log2 e · ( Σ_{x,y} P(x) P(y) − 1 ) = 0
The connection between X and Y
[Channel diagram: input X ∈ {0, 1, …, M−1} with probabilities P(X=i), output Y ∈ {0, 1, …, N−1}, connected by transition probabilities P(Y=j | X=i).]
P(X=i, Y=j) = P(X=i) P(Y=j | X=i) = P(Y=j) P(X=i | Y=j)    (Bayes)
P(Y=j) = Σ_{i=0}^{M−1} P(X=i) P(Y=j | X=i)
Entropy: corollary
H(X,Y) = H(X) + H(Y|X)
= H(Y) + H(X|Y)
H(X,Y,Z) = H(X) + H(Y|X) + H(Z|XY)
≤ H(X) + H(Y) + H(Z)
Binary entropy
interpretation:
let a binary sequence of length n contain pn ones; then we can specify each such sequence with
log2 2^{n h(p)} = n h(p) bits
using the approximation C(n, pn) ≈ 2^{n h(p)}, i.e. lim_{n→∞} (1/n) log2 C(n, pn) = h(p)
Homework: prove the approximation using ln N! ≈ N ln N for N large.
Use also: log_a x = y ⟺ log_b x = y log_b a
The Stirling approximation: N! ≈ √(2πN) · N^N · e^{−N}
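A small numerical check (my own sketch, not from the slides) that (1/n) log2 C(n, pn) indeed tends to h(p):

```python
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3
for n in (10, 100, 1000, 10000):
    k = round(p * n)
    per_symbol = math.log(math.comb(n, k), 2) / n   # (1/n) log2 C(n, pn)
    print(n, per_symbol, h(k / n))                  # approaches h(p) as n grows
```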
[Plot of h(p) for 0 ≤ p ≤ 1 and 0 ≤ h ≤ 1; the maximum value h(½) = 1 is attained at p = ½.]
The Binary Entropy: h(p) = −p log2 p − (1−p) log2 (1−p)
Note:
h(p) = h(1-p)
homework
Consider the following figure
[Figure: a set of points in the (X, Y) plane with X ∈ {0, 1, 2, 3} and Y ∈ {1, 2, 3}.]
All points are equally likely. Calculate H(X), H(X|Y) and H(X,Y).
Source coding
Two principles:
data reduction: remove irrelevant data (lossy, gives errors)
data compression: present data in compact (short) way (lossless)
[Diagram, transmitter side: original data → remove irrelevance → relevant data → compact description; receiver side: „unpack“ → „original data“.]
Shannon's (1948) definition of transmission of information:
Reproducing at one point (in time or space), either exactly or approximately, a message selected at another point.
Shannon uses: binary digits (bits), 0 or 1
n bits specify M = 2^n different messages
OR
M messages are specified by n = ⌈log2 M⌉ bits
Example:
fixed length representation
00000 ↔ a … 11001 ↔ y
00001 ↔ b … 11010 ↔ z
- the alphabet: 26 letters, ⌈log2 26⌉ = 5 bits
- ASCII: 7 bits represents 128 characters
ASCII Table to transform our letters and signs into binary ( 7 bits = 128 messages)
ASCII stands for American Standard Code for Information Interchange
Example:
suppose we have a dictionary with 30,000 words
these can be numbered (encoded) with 15 bits (2^15 = 32,768 ≥ 30,000)
if the average word length is 5 letters, we need „on the average“ 3 bits per letter
01000100
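The arithmetic behind this, as a tiny sketch (assuming the 30,000-word dictionary and the average word length of 5 from the slide):

```python
import math

words = 30_000
bits_per_word = math.ceil(math.log2(words))   # 15, since 2**15 = 32768 >= 30000
average_word_length = 5                       # letters per word (from the slide)
print(bits_per_word, bits_per_word / average_word_length)  # 15 bits/word, 3 bits/letter
```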
another example
Source output a, b, or c; translate each output into binary:
a → 00
b → 01
c → 10
Efficiency = 2 bits/output symbol
To improve efficiency, encode blocks of 3 source symbols:
aaa → 00000, aab → 00001, aba → 00010, …, ccc → 11010
Efficiency = 5/3 bits/output symbol
Improve efficiency further?
Homework: calculate optimum efficiency
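A hedged sketch of the idea (not the official homework solution; it assumes the three source symbols are equally likely, which the slide does not state): encoding blocks of k symbols from {a, b, c} with fixed-length binary words of ⌈k log2 3⌉ bits gives an efficiency that approaches log2 3 ≈ 1.585 bits per source symbol:

```python
import math

# Blocks of k symbols from {a, b, c}: 3**k block values, fixed-length binary code
for k in (1, 2, 3, 5, 10, 20):
    bits = math.ceil(k * math.log2(3))
    print(k, bits, bits / k)    # bits per source symbol -> log2(3) ~ 1.585
```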
Source coding (Morse idea)
Example: A system generates
the symbols X, Y, Z, T
with probability P(X) = ½; P(Y) = ¼; P(Z) = P(T) = 1/8
Source encoder: X → 0; Y → 10; Z → 110; T → 111
Average transmission length = ½ × 1 + ¼ × 2 + 2 × ⅛ × 3 = 1¾ bits per source symbol.
A naive approach gives X → 00; Y → 10; Z → 11; T → 01,
with average transmission length 2 bits per source symbol.
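A minimal sketch (my own code, using the code table from this slide) that reproduces the average length of 1¾ bits per symbol, both exactly and on a simulated source sequence:

```python
import random

code = {'X': '0', 'Y': '10', 'Z': '110', 'T': '111'}
probs = {'X': 0.5, 'Y': 0.25, 'Z': 0.125, 'T': 0.125}

# Exact average codeword length: sum of P(s) * len(code(s))
print(sum(probs[s] * len(code[s]) for s in code))           # 1.75 bits per symbol

# Empirical check on a simulated source sequence
rng = random.Random(0)
symbols = rng.choices(list(probs), weights=list(probs.values()), k=100_000)
encoded = ''.join(code[s] for s in symbols)
print(len(encoded) / len(symbols))                          # close to 1.75
```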
Example: variable length representation of messages
C1    C2     letter   frequency of occurrence P(*)
00    1      e        0.5
01    01     a        0.25
10    000    x        0.125
11    001    q        0.125
C2 example: 0111001101000… ↔ aeeqea…
Note: C2 is uniquely decodable! (check!)
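One way to do the "check" (my own sketch): C2 is a prefix code, i.e. no codeword is a prefix of another, so the bit stream can be parsed greedily from left to right:

```python
# C2 from the table above: e -> 1, a -> 01, x -> 000, q -> 001 (a prefix code)
c2 = {'1': 'e', '01': 'a', '000': 'x', '001': 'q'}

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in c2:        # no codeword is a prefix of another, so this is unambiguous
            out.append(c2[buf])
            buf = ''
    return ''.join(out)

print(decode('0111001101000'))   # -> 'aeeqeax'
```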
Efficiency of C1 and C2
C2 is more efficient than C1
Average number of code symbols of C1: L(C1) = 2 × 0.5 + 2 × 0.25 + 2 × 0.125 + 2 × 0.125 = 2
Average number of code symbols of C2: L(C2) = 1 × 0.5 + 2 × 0.25 + 3 × 0.125 + 3 × 0.125 = 1.75
Source coding theorem
Shannon shows that source coding algorithms exist whose (uniquely decodable) average representation length approaches the entropy of the source.
We cannot do with less.
Basic idea of cryptography  http://www.youtube.com/watch?v=WJnzkXMk7is
[Diagram, send side (open → closed): message + secret key → operation → cryptogram; receive side (closed → open): cryptogram + secret key → operation → message.]
Source coding in Message encryption (1)
Part 1 | Part 2 | ••• | Part n    (for example, every part 56 bits)
Each part is enciphered separately with the key, producing n cryptograms; the receiver deciphers them back into Part 1, Part 2, …, Part n.
Attacker: n cryptograms to analyze for a particular message of n parts
dependency exists between the parts of the message
dependency exists between the cryptograms
Source coding in Message encryption (2)
Part 1 | Part 2 | ••• | Part n    (for example, every part 56 bits)
The message is first source encoded (n-to-1 compression) and then enciphered with the key, producing 1 cryptogram; the receiver deciphers and source decodes it back into Part 1, Part 2, …, Part n.
Attacker:
- 1 cryptogram to analyze for a particular message of n parts
- assume a data compression factor of n-to-1
Hence, less material for the same message!
Transmission of information
Mutual information definition
Capacity
Idea of error correction
Information processing
Fano inequality
mutual information I(X;Y):=
I(X;Y) := H(X) – H(X|Y)
= H(Y) – H(Y|X) ( homework: show this! )
i.e. the reduction in the description length of X given Y
note that I(X;Y) ≥ 0
or: the amount of information that Y gives about X
equivalently: I(X;Y|Z) = H(X|Z) – H(X|YZ)
the amount of information that Y gives about X given Z
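A hedged numerical illustration (my own code; the binary symmetric channel with crossover probability 0.1 and uniform input is an assumed example, not from this slide):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} P(x,y) log2[ P(x,y) / (P(x)P(y)) ] = H(X) - H(X|Y)."""
    Px, Py = {}, {}
    for (x, y), p in joint.items():
        Px[x] = Px.get(x, 0.0) + p
        Py[y] = Py.get(y, 0.0) + p
    return sum(p * math.log2(p / (Px[x] * Py[y]))
               for (x, y), p in joint.items() if p > 0)

# Assumed example: binary symmetric channel, crossover 0.1, uniform input
eps = 0.1
joint = {(0, 0): 0.5 * (1 - eps), (0, 1): 0.5 * eps,
         (1, 0): 0.5 * eps, (1, 1): 0.5 * (1 - eps)}
print(mutual_information(joint))    # 1 - h(0.1), about 0.531 bit per channel use
```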
3 classical channels
Binary symmetric erasure Z-channel
(satellite) (network) (optical)
[Channel diagrams: the binary symmetric channel (X ∈ {0,1} → Y ∈ {0,1}), the binary erasure channel (X ∈ {0,1} → Y ∈ {0, E, 1}), and the Z-channel (X ∈ {0,1} → Y ∈ {0,1}, where only one of the two inputs can be received in error).]
Homework:
find maximum H(X)-H(X|Y) and the corresponding input distribution
Example 1
Suppose that X ∈ { 000, 001, …, 111 } with H(X) = 3 bits
Channel: X → Y = parity of X
H(X|Y) = 2 bits: we transmitted H(X) − H(X|Y) = 1 bit of information!
We know that X|Y ∈ { 000, 011, 101, 110 } or X|Y ∈ { 001, 010, 100, 111 }
Homework: suppose the channel output gives the number of ones in X.What is then H(X) – H(X|Y)?
Transmission efficiency
Example: Erasure channel
[Diagram: binary erasure channel with input probabilities P(X=0) = P(X=1) = ½; each input is received correctly with probability 1−e and erased (Y = E) with probability e, so P(Y=0) = P(Y=1) = (1−e)/2 and P(Y=E) = e.]
H(X) = 1, H(X|Y) = e
H(X) − H(X|Y) = 1 − e = maximum!
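A small check of this example (my own sketch): computing I(X;Y) = H(X) − H(X|Y) from the joint distribution of the erasure channel with uniform input reproduces 1 − e:

```python
import math

def I_bec(e):
    """I(X;Y) for the binary erasure channel with uniform input and erasure prob. e."""
    joint = {(0, '0'): 0.5 * (1 - e), (0, 'E'): 0.5 * e,
             (1, '1'): 0.5 * (1 - e), (1, 'E'): 0.5 * e}
    Px, Py = {0: 0.5, 1: 0.5}, {}
    for (x, y), p in joint.items():
        Py[y] = Py.get(y, 0.0) + p
    return sum(p * math.log2(p / (Px[x] * Py[y]))
               for (x, y), p in joint.items() if p > 0)

for e in (0.0, 0.1, 0.25, 0.5):
    print(e, I_bec(e))   # equals 1 - e in every case
```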
Example 2
Suppose we have 2^n messages, specified by n bits
[Erasure channel: 0 → 0 and 1 → 1 with probability 1−e; 0, 1 → E with probability e.]
After n transmissions we are left with (on average) ne erasures. Thus the number of messages we cannot distinguish is 2^{ne}.
We transmitted n(1−e) bits of information over the channel!
Transmission efficiency
Easily obtainable with feedback!
Transmit 0 or 1; receive 0, 1, or E (erasure). A 0 or 1 is received correctly.
If an erasure occurs, repeat until received correctly.
R = 1/T = 1/(average time to transmit 1 correct bit)
= 1 / { (1−e) + 2e(1−e) + 3e²(1−e) + … } = 1 − e
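A quick simulation of this feedback scheme (my own sketch; the erasure probabilities are arbitrary example values):

```python
import random

def simulated_rate(e, n_bits=100_000, seed=1):
    """Erasure channel with feedback: retransmit each bit until it is not erased."""
    rng = random.Random(seed)
    transmissions = 0
    for _ in range(n_bits):
        transmissions += 1
        while rng.random() < e:      # erased -> repeat
            transmissions += 1
    return n_bits / transmissions    # correctly received bits per channel use

for e in (0.1, 0.3, 0.5):
    print(e, simulated_rate(e), 1 - e)   # simulation is close to 1 - e
```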
Transmission efficiency
I need on the average H(X) bits/source output to describe the source symbols X.
After observing Y, I need H(X|Y) bits/source output; H(X) ≥ H(X|Y).
The reduction in description length is called the transmitted information.
Transmitted R = H(X) − H(X|Y) = H(Y) − H(Y|X)    (from the earlier calculations)
We can maximize R by changing the input probabilities. The maximum is called the CAPACITY (Shannon 1948).
[X → channel → Y]
Transmission efficiency
Shannon shows that error correcting codes exist that have an efficiency k/n ≤ Capacity:
n channel uses for k information symbols, with decoding error probability → 0
when n is very large
Problem: how to find these codes
In practice:
Transmit 0 or 1, receive 0 or 1:
0 → 0  correct
0 → 1  incorrect
1 → 1  correct
1 → 0  incorrect
What can we do about it?
Reliable: 2 examples
Transmit A := 0 0, B := 1 1
Receive 0 0 or 1 1: OK
Receive 0 1 or 1 0: not OK, 1 error detected!
Transmit A := 0 0 0, B := 1 1 1
Receive 000, 001, 010, 100 → A
Receive 111, 110, 101, 011 → B
1 error corrected!
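A minimal sketch (my own code) of the second example: the length-3 repetition code with majority-vote decoding corrects any single error:

```python
def encode(bit):
    return [bit, bit, bit]                  # A := 0 0 0, B := 1 1 1

def decode(received):
    return 1 if sum(received) >= 2 else 0   # majority vote

# Every single-error pattern is decoded back to the transmitted bit
for bit in (0, 1):
    for position in range(3):
        word = encode(bit)
        word[position] ^= 1                 # introduce one error
        assert decode(word) == bit
print("all single errors corrected")
```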
Data processing (1)
Let X, Y and Z form a Markov chain: X → Y → Z,
and Z is independent of X given Y,
i.e. P(x,y,z) = P(x) P(y|x) P(z|y)
X → P(y|x) → Y → P(z|y) → Z
I(X;Y) ≥ I(X;Z)
Conclusion: processing destroys information
Data processing (2)
To show that: I(X;Y) ≥ I(X;Z)
Proof: I(X; (Y,Z)) = H(Y,Z) − H(Y,Z|X)
= H(Y) + H(Z|Y) − H(Y|X) − H(Z|YX)
= I(X;Y) + I(X;Z|Y)
I(X; (Y,Z)) = H(X) − H(X|YZ)
= H(X) − H(X|Z) + H(X|Z) − H(X|YZ)
= I(X;Z) + I(X;Y|Z)
now I(X;Z|Y) = 0 (conditional independence) and I(X;Y|Z) ≥ 0
Thus: I(X;Y) ≥ I(X;Z)
I(X;Y) ≥ I(X;Z)?
The question is: H(X) − H(X|Y) ≥ H(X) − H(X|Z), or equivalently H(X|Z) ≥ H(X|Y)?
Proof:
1) H(X|Z) − H(X|Y) ≥ H(X|ZY) − H(X|Y)    (extra conditioning can only make H smaller)
2) From P(x,y,z) = P(x)P(y|x)P(z|xy) = P(x)P(y|x)P(z|y) it follows that H(X|ZY) = H(X|Y)
3) Thus H(X|Z) − H(X|Y) ≥ H(X|ZY) − H(X|Y) = 0
Fano inequality (1)
Suppose we have the following situation: Y is the observation of X
X → p(y|x) → Y → decoder → X‘
Y determines a unique estimate X‘:
correct with probability 1-P;
incorrect with probability P
Fano inequality (2)
Since Y uniquely determines X‘, we have H(X|Y) = H(X|(Y,X‘)) ≤ H(X|X‘)
X‘ differs from X with probability P
Thus, for L experiments, we can describe X given X‘ by:
firstly: describe the positions where X‘ ≠ X, with Lh(P) bits
secondly: the positions where X‘ = X do not need extra bits;
for the LP positions where X‘ ≠ X we need log2(M−1) bits to specify X
Hence, normalized by L: H(X|Y) ≤ H(X|X‘) ≤ h(P) + P log2(M−1)
Fano inequality (3): H(X|Y) ≤ h(P) + P log2(M−1)
[Plot: the bound h(P) + P log2(M−1) as a function of P, rising from 0 at P = 0 to its maximum log2 M at P = (M−1)/M.]
Fano relates conditional entropy with the detection error probability
Practical importance: for a given channel with conditional entropy H(X|Y), the detection error probability has a lower bound: it cannot be smaller than this bound!
Fano inequality (3): example
X ∈ { 0, 1, 2, 3 }; P( X = 0, 1, 2, 3 ) = ( ¼, ¼, ¼, ¼ )
X can be observed as Y
Example 1: no observation of X: P = ¾; H(X|Y) = H(X) = 2 = h(¾) + ¾ log2 3
Example 2: transition probabilities 1/3 (each input reaches 3 of the 4 outputs): H(X|Y) = log2 3, and Fano gives P > 0.4 (approximately)
Example 3: transition probabilities 1/2 (each input reaches 2 of the 4 outputs): H(X|Y) = log2 2 = 1, and Fano gives P > 0.2 (approximately)
[Channel diagrams with x, y ∈ {0, 1, 2, 3} omitted.]
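A hedged numerical sketch (my own code) of where these lower bounds on P come from: find the smallest P for which h(P) + P log2(M−1) ≥ H(X|Y); the exact thresholds come out slightly below the rounded values quoted on the slide:

```python
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_lower_bound(H_X_given_Y, M, steps=100_000, tol=1e-12):
    """Smallest P (on a grid) with h(P) + P*log2(M-1) >= H(X|Y)."""
    for i in range(steps + 1):
        P = i / steps
        if h(P) + P * math.log2(M - 1) >= H_X_given_Y - tol:
            return P
    return 1.0

print(fano_lower_bound(2.0, 4))             # example 1: P = 0.75
print(fano_lower_bound(math.log2(3), 4))    # example 2: roughly 0.39 (slide: > 0.4)
print(fano_lower_bound(1.0, 4))             # example 3: roughly 0.19 (slide: > 0.2)
```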
List decoding
Suppose that the decoder forms a list of size L.
PL is the probability of being in the list
Then
H(X|Y) ≤ h(P_L) + P_L log2 L + (1−P_L) log2 (M−L)
The bound is not very tight, because of the P_L log2 L term.
Can you see why?
Fano ( http://www.youtube.com/watch?v=sjnmcKVnLi0 )
Shannon showed that it is possible to compress information. He produced examples of such codes which are now known as Shannon-Fano codes.
Robert Fano was an electrical engineer at MIT (the son of G. Fano, the Italian mathematician who pioneered the development of finite geometries and for whom the Fano Plane is named).
Robert Fano
Application of source coding: example MP3
Digital audio signals:
Without data reduction, 16-bit samples at a sampling rate of 44.1 kHz are used for Compact Discs. About 1.4 Mbit represent just one second of stereo music in CD quality.
With data reduction: MPEG audio coding is realized by perceptual coding techniques addressing the perception of sound waves by the human ear. It maintains a sound quality that is significantly better than what you get by just reducing the sampling rate and the resolution of the samples.
Using MPEG audio, one may achieve a typical data reduction of
1:4 by Layer 1 (corresponds to 384 kbps for a stereo signal),
1:6…1:8 by Layer 2 (corresponds to 256…192 kbps for a stereo signal),
1:10…1:12 by Layer 3 (corresponds to 128…112 kbps for a stereo signal).