
Lecture 7: Lossy Source Coding

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University

[email protected]

December 2, 2015


The Block-to-Block Source Coding Problem

[Block diagram: Source → s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination]

Recall: in Lecture 03, we investigated the fundamental limit of (almost) lossless block-to-block (or fixed-to-fixed) source coding.

The recovery criterion is vanishing probability of error:
$$\lim_{N\to\infty} \mathrm{P}\left\{ \hat{S}^N \neq S^N \right\} = 0.$$
The minimum compression ratio to fulfill lossless reconstruction is the entropy rate of the source:
$$R^* = H\left(\{S_i\}\right), \quad \text{for stationary and ergodic } \{S_i\}.$$


The Block-to-Block Source Coding Problem

[Block diagram: Source → s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination]

In this lecture, we turn our focus to lossy block-to-block source coding, where the setting is the same as before, except:

The recovery criterion is reconstruction to within a given distortion D:
$$\limsup_{N\to\infty} \mathrm{E}\left[ d\left( S^N, \hat{S}^N \right) \right] \le D.$$
The minimum compression ratio to fulfill reconstruction to within a given distortion D is the rate-distortion function:
$$R(D) = \min_{p_{\hat{S}|S}:\ \mathrm{E}[d(S,\hat{S})] \le D} I\left( S; \hat{S} \right), \quad \text{for a DMS } \{S_i\}.$$


Why lossy source coding?

Sometimes it might be too expensive to reconstruct the source in a lossless way.

Sometimes it is impossible to reconstruct the source losslessly. For example, if the source is continuous-valued, the entropy rate of the source is usually infinite!

Lossy source coding has a wide range of applications, including quantization/digitization of continuous-valued signals, image/video/audio compression, etc.

In this lecture, we first focus on discrete memoryless sources (DMS). Then, we employ the discretization technique to extend the coding theorems from the discrete-source case to the continuous-source case. In particular, Gaussian sources will be our main focus.


Lossless vs. Lossy Source Coding

The general lossy source coding problem involves quantizing all possible source sequences s^N ∈ S^N into 2^K reconstruction sequences ŝ^N ∈ Ŝ^N, which can be represented by K bits. The goal is to design the correspondence between s^N and ŝ^N so that the distortion (quantization error) is below a prescribed level D.

Lossy source coding has a couple of notable differences from lossless source coding:

The source alphabet S and the reconstruction alphabet Ŝ could be different in general.
Performance is determined by the chosen distortion measure.


Outline
1. Lossy Source Coding Theorem for Memoryless Sources
   Lossy Source Coding Theorem
   Rate Distortion Function
2. Proof of the Coding Theorem
   Converse Proof
   Achievability


Distortion Measures

We begin with the definition of the distortion measure per symbol.

Definition 1 (Distortion Measure). A per-symbol distortion measure is a mapping d(s, ŝ) from S × Ŝ to [0, ∞), understood as the cost of representing s by ŝ. For two length-N sequences s^N and ŝ^N, the distortion between them is defined as the average of the per-symbol distortion:
$$d\left( s^N, \hat{s}^N \right) \triangleq \frac{1}{N} \sum_{i=1}^{N} d\left( s_i, \hat{s}_i \right).$$

Examples: below are two widely used distortion measures.
Hamming distortion: Ŝ = S, d(s, ŝ) ≜ 1{s ≠ ŝ}.
Squared-error distortion: Ŝ = S = ℝ, d(s, ŝ) ≜ (s − ŝ)².
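To make the two measures above concrete, here is a minimal Python sketch (my own illustration; the example sequences are arbitrary) that evaluates the sequence-level distortion d(s^N, ŝ^N) as the average of per-symbol distortions, for both Hamming and squared-error distortion.

# Minimal sketch: sequence distortion as the average of per-symbol distortions.
# The example sequences below are arbitrary choices for illustration only.
def hamming(s, s_hat):
    return 1.0 if s != s_hat else 0.0

def squared_error(s, s_hat):
    return (s - s_hat) ** 2

def sequence_distortion(sN, sN_hat, d):
    """d(s^N, s_hat^N) = (1/N) * sum_i d(s_i, s_hat_i)."""
    assert len(sN) == len(sN_hat)
    return sum(d(s, sh) for s, sh in zip(sN, sN_hat)) / len(sN)

print(sequence_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming))                 # 0.25
print(sequence_distortion([0.2, -1.0, 3.1], [0.0, -1.0, 3.0], squared_error))   # ~0.0167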


Lossy Source Coding: Problem Setup

[Block diagram: Source → s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination]

1. A (2^{NR}, N) source code consists of
   an encoding function (encoder) enc_N : S^N → {0,1}^K that maps each source sequence s^N to a bit sequence b^K, where K ≜ ⌊NR⌋;
   a decoding function (decoder) dec_N : {0,1}^K → Ŝ^N that maps each bit sequence b^K to a reconstructed source sequence ŝ^N.
2. The expected distortion of the code is D^(N) ≜ E[d(S^N, Ŝ^N)].
3. A rate-distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that lim sup_{N→∞} D^(N) ≤ D.
The optimal compression rate is R(D) ≜ inf{R : (R, D) is achievable}.
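As a concrete, deliberately naive instance of these definitions, the following sketch (my own toy construction, assuming a Ber(1/2) source, Hamming distortion, and R = 1/2) builds a (2^{NR}, N) code that stores the first K = ⌊NR⌋ source bits verbatim and reconstructs the remaining bits as 0, then estimates its expected distortion D^(N) by Monte Carlo.

# Naive (2^{NR}, N) code, for illustration only: the encoder keeps the first
# K = floor(N*R) bits, the decoder pads the rest with zeros.  For a Ber(1/2)
# source with Hamming distortion this gives D^(N) = 0.5 * (N - K)/N.
import math, random

random.seed(0)
N, R = 20, 0.5
K = math.floor(N * R)

def enc(sN):                 # enc_N : S^N -> {0,1}^K
    return sN[:K]

def dec(bK):                 # dec_N : {0,1}^K -> S_hat^N
    return bK + [0] * (N - K)

trials, total = 10_000, 0.0
for _ in range(trials):
    sN = [random.randint(0, 1) for _ in range(N)]
    sN_hat = dec(enc(sN))
    total += sum(si != shi for si, shi in zip(sN, sN_hat)) / N
print(total / trials)        # ~ 0.25 = 0.5 * (N - K)/N
print(K / N)                 # rate used, in bits per source symbol

For comparison, Theorem 1 below implies that at rate 1/2 a Ber(1/2) source can be reconstructed with Hamming distortion about 0.11 (the D solving 1 − Hb(D) = 1/2), so this naive code is far from optimal.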


Rate Distortion Trade-off

$$D_{\min} \triangleq \min_{\hat{s}(\cdot)} \mathrm{E}\left[ d\left( S, \hat{s}(S) \right) \right]$$
It denotes the minimum possible target distortion so that the rate is finite. Even if the decoder knows the entire s^N and finds a best representative ŝ^N(s^N), the expected distortion is still Dmin.
$$D_{\max} \triangleq \min_{\hat{s}} \mathrm{E}\left[ d\left( S, \hat{s} \right) \right]$$
[Figure: the rate-distortion curve R(D) over D ∈ [Dmin, Dmax]; the region above the curve is achievable, the region below is not; R(Dmin) ≤ H(S) and R(D) = 0 for D ≥ Dmax.]

Let ŝ* ≜ arg min_{ŝ} E[d(S, ŝ)]. Then for target distortion D ≥ Dmax, we can use a single representative ŝ*^N ≜ (ŝ*, ŝ*, ..., ŝ*) to reconstruct all s^N ∈ S^N (the rate is 0!), and
$$D^{(N)} = \mathrm{E}\left[ d\left( S^N, \hat{s}^{*N} \right) \right] = \frac{1}{N} \sum_{i=1}^{N} \mathrm{E}\left[ d\left( S_i, \hat{s}^* \right) \right] = D_{\max} \le D.$$
Hence, R(D) = 0 for all D ≥ Dmax.
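Both quantities are straightforward to compute for any finite-alphabet source and distortion matrix; the sketch below (with a toy source and distortion of my own choosing) implements the two definitions directly: Dmin picks the best representative for each source symbol, Dmax picks one fixed representative for all of them.

# Compute D_min and D_max directly from their definitions, for a toy DMS and
# a toy distortion matrix chosen only for illustration.
import numpy as np

pS = np.array([0.25, 0.5, 0.25])        # source p.m.f. on S = {0, 1, 2}
# d[s, s_hat] with reconstruction alphabet S_hat = {0, 1}  (alphabets may differ)
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.5, 0.5]])

D_min = float(np.sum(pS * d.min(axis=1)))   # best representative per source symbol
D_max = float((pS @ d).min())               # one fixed representative for all symbols
print(D_min, D_max)                         # 0.125  0.375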


Lossy Source Coding Theorem

[Block diagram: Source → s[1:N] → Source Encoder → b[1:K] → Source Decoder → ŝ[1:N] → Destination]

Theorem 1 (A Lossy Source Coding Theorem for DMS). For a discrete memoryless source {Si | i ∈ ℕ},
$$R(D) = \min_{p_{\hat{S}|S}:\ \mathrm{E}[d(S,\hat{S})] \le D} I\left( S; \hat{S} \right). \qquad (1)$$

Interpretation:
$$H(S) - H\left( S \,\middle|\, \hat{S} \right) = I\left( S; \hat{S} \right),$$
that is, (uncertainty of the source S) − (uncertainty of S after learning Ŝ) = (the rate used in compressing S into Ŝ).

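To see the theorem in action, here is a small numerical sketch (my own illustration, not part of the slides) that solves the minimization in (1) by brute force over a grid of conditional p.m.f.s for a Bernoulli(p) source with Hamming distortion, and compares the result with the closed form Hb(p) − Hb(D) derived in Example 1 below.

# Brute-force the minimization in (1) for a Bernoulli(p) source with Hamming
# distortion and compare with the closed form Hb(p) - Hb(D) (Example 1 below).
# Illustration only: the grid resolution and the (p, D) values are my choices.
import numpy as np

def Hb(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def R_bruteforce(p, D, grid=201):
    pS = np.array([1 - p, p])                     # source p.m.f. on {0, 1}
    best = np.inf
    for a in np.linspace(0, 1, grid):             # a = P(S_hat = 1 | S = 0)
        for b in np.linspace(0, 1, grid):         # b = P(S_hat = 1 | S = 1)
            joint = pS[:, None] * np.array([[1 - a, a], [1 - b, b]])
            if joint[0, 1] + joint[1, 0] > D:     # expected Hamming distortion > D
                continue
            pSh = joint.sum(axis=0)
            mask = joint > 0
            I = np.sum(joint[mask] * np.log2(joint[mask] / np.outer(pS, pSh)[mask]))
            best = min(best, I)
    return best

p, D = 0.3, 0.1
print(R_bruteforce(p, D))         # ~ 0.413 (slightly above the true value, grid effect)
print(Hb(p) - Hb(D))              # 0.4123..., the closed form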


Properties of Rate Distortion Function

[Figure: the rate-distortion curve R(D); the region above the curve is achievable, the region below is not; R(Dmin) ≤ H(S), and R(D) = 0 for D ≥ Dmax.]

A rate distortion function R(D) satisfies the following properties:
1. Nonnegative.
2. Non-increasing in D.
3. Convex in D.
4. Continuous in D.
5. R(Dmin) ≤ H(S).
6. R(D) = 0 if D ≥ Dmax.
These properties are all quite intuitive. Below we sketch the proofs of these properties.


Monotonicity: clear from the definition, since increasing D enlarges the feasible set of p(ŝ|s) in (1), and the minimum cannot increase.

Convexity

The goal is to prove that for D1, D2 ≥ Dmin and λ ∈ (0, 1), with λ̄ ≜ 1 − λ,
$$R\left( \lambda D_1 + \bar{\lambda} D_2 \right) \le \lambda R(D_1) + \bar{\lambda} R(D_2).$$
Let $p_i(\hat{s}|s) \triangleq \arg\min_{p_{\hat{S}|S}:\ \mathrm{E}[d(S,\hat{S})] \le D_i} I( S; \hat{S} )$ be the optimizing conditional distribution that achieves distortion D_i, for i = 1, 2, and let p_λ ≜ λ p_1 + λ̄ p_2. Under p_λ(ŝ|s), the expected distortion between S and Ŝ is at most λD1 + λ̄D2, because
$$\mathrm{E}_{p(s)\,p_\lambda(\hat{s}|s)}\left[ d\left( S, \hat{S} \right) \right] = \sum_{s}\sum_{\hat{s}} p(s)\left[ \lambda p_1(\hat{s}|s) + \bar{\lambda} p_2(\hat{s}|s) \right] d(s,\hat{s}) = \lambda\, \mathrm{E}_{p_1}\left[ d\left( S, \hat{S} \right) \right] + \bar{\lambda}\, \mathrm{E}_{p_2}\left[ d\left( S, \hat{S} \right) \right] \le \lambda D_1 + \bar{\lambda} D_2.$$
The proof is complete since I(S; Ŝ) is convex in p_{Ŝ|S} for a fixed p_S:
$$R\left( \lambda D_1 + \bar{\lambda} D_2 \right) \le I\left( S; \hat{S} \right)_{p_\lambda} \le \lambda\, I\left( S; \hat{S} \right)_{p_1} + \bar{\lambda}\, I\left( S; \hat{S} \right)_{p_2} = \lambda R(D_1) + \bar{\lambda} R(D_2).$$


Nonnegativity: clear from the definition, since mutual information is nonnegative.

Continuity: it is well known that convexity within an open interval implies continuity within that open interval.

[Figures: two hypothetical R(D) curves; a jump in the interior of (Dmin, Dmax) would violate convexity, so a discontinuity is only possible at the boundary D = Dmin.]

Hence, the only point where R(D) might be discontinuous is at the boundary D = Dmin. The proof is technical and can be found in Gallager [2].


Example: Bernoulli Source with Hamming Distortion

Source (binary): Si ∈ S = {0, 1}, with Si i.i.d. ∼ Ber(p) for all i.
Distortion (Hamming): d(s, ŝ) = 1{s ≠ ŝ}.

Example 1. Derive the rate distortion function of the Bernoulli(p) source with Hamming distortion and show that it is given by
$$R(D) = \begin{cases} H_b(p) - H_b(D), & 0 \le D \le \min(p, 1-p) \\ 0, & D > \min(p, 1-p). \end{cases}$$

This is the first example of how to compute the rate distortion function, that is, how to solve (1) in the lossy source coding theorem.


sol. The first step is to identify Dmin and Dmax.
Dmin = 0, because one can choose ŝ(s) = s.
Dmax = min(p, 1 − p), because one can choose ŝ = 0 if p ≤ 1/2 and ŝ = 1 if p ≥ 1/2.
The next step is to lower bound I(S; Ŝ) = H(S) − H(S | Ŝ). It is equivalent to upper bounding H(S | Ŝ):
$$H\left( S \,\middle|\, \hat{S} \right) = H\left( S \oplus \hat{S} \,\middle|\, \hat{S} \right) \le H\left( S \oplus \hat{S} \right) = H_b(q),$$
where we assume that S ⊕ Ŝ ∼ Ber(q) for some q ∈ [0, 1].
Observe that d(S, Ŝ) ≡ S ⊕ Ŝ. Hence, E[d(S, Ŝ)] ≤ D implies q ≤ D. Since D ≤ Dmax ≤ 1/2, we see that Hb(q) is maximized when q = D.
Hence, I(S; Ŝ) ≥ Hb(p) − Hb(D).


Final step: show that the lower bound Hb(p) − Hb(D) can be attained. The goal is to find a probability transition matrix p(ŝ|s) such that
Ŝ ⊥⊥ (S ⊕ Ŝ), so that H(S ⊕ Ŝ | Ŝ) = H(S ⊕ Ŝ), and P{S ⊕ Ŝ = 1} = D.
At first glance this looks hard. The difficulty can be resolved via an auxiliary reverse channel.
Consider a channel with input Ŝ, output S, and additive noise Z ∼ Ber(D) independent of Ŝ:
$$S = \hat{S} \oplus Z \;\Longrightarrow\; Z = S \oplus \hat{S}.$$
The reverse channel specifies the joint distribution p(s, ŝ) and hence p(ŝ|s)!

[Figure: the reverse test channel, a binary symmetric channel with crossover probability D from Ŝ to S.] With Ŝ ∼ Ber(α) passed through the BSC(D), the output S is Ber(p) provided
$$p = (1 - \alpha) D + \alpha (1 - D) \;\Longrightarrow\; \alpha = \frac{p - D}{1 - 2D}.$$
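A quick numerical check of this construction (my own illustration; the values of p and D are arbitrary): build the joint p.m.f. from Ŝ ∼ Ber(α) and S = Ŝ ⊕ Z with Z ∼ Ber(D), then verify that the output marginal, the expected distortion, and the mutual information all come out as claimed.

# Numerical check of the reverse-channel construction (illustration only).
import numpy as np

def Hb(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p, D = 0.3, 0.1
alpha = (p - D) / (1 - 2 * D)                  # P{S_hat = 1}, from the slide
pSh = np.array([1 - alpha, alpha])
bsc = np.array([[1 - D, D], [D, 1 - D]])       # p(s | s_hat): BSC with crossover D
joint = pSh[:, None] * bsc                     # joint[s_hat, s]

pS = joint.sum(axis=0)
print(pS[1])                                   # 0.3 = p: output marginal matches
print(joint[0, 1] + joint[1, 0])               # 0.1 = D: expected Hamming distortion
I = sum(joint[a, b] * np.log2(joint[a, b] / (pSh[a] * pS[b]))
        for a in range(2) for b in range(2))
print(I, Hb(p) - Hb(D))                        # both 0.4123...: the lower bound is met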


Example: Gaussian Source with Squared Error Distortion

Source (Gaussian): Si ∈ S = ℝ, with Si i.i.d. ∼ N(µ, σ²) for all i.
Distortion (Squared Error): d(s, ŝ) = |s − ŝ|².

Example 2. Derive the rate distortion function of the Gaussian source with squared error distortion and show that it is given by
$$R(D) = \begin{cases} \frac{1}{2}\log\left( \frac{\sigma^2}{D} \right), & 0 \le D \le \sigma^2 \\ 0, & D > \sigma^2. \end{cases}$$

Remark: although the source is continuous, one can use weak typicality or the discretization method used in channel coding to extend the lossy source coding theorem from discrete memoryless sources to continuous ones.
Note: in particular, R(0) = ∞, which is quite intuitive!


sol. First step: identify Dmin and Dmax.
Dmin = 0, because one can choose ŝ(s) = s.
Dmax = σ², because one can choose ŝ = µ, the mean of S.
Next step: lower bound I(S; Ŝ) = h(S) − h(S | Ŝ). It is equivalent to upper bounding h(S | Ŝ):
$$h\left( S \,\middle|\, \hat{S} \right) = h\left( S - \hat{S} \,\middle|\, \hat{S} \right) \le h\left( S - \hat{S} \right) \le \tfrac{1}{2}\log\left( 2\pi e D \right),$$
where the last inequality holds since Var[S − Ŝ] ≤ E[|S − Ŝ|²] ≤ D.
Hence, I(S; Ŝ) ≥ ½ log(2πeσ²) − ½ log(2πeD) = ½ log(σ²/D).


Final step: show that the lower bound ½ log(σ²/D) can be attained. The goal is to find a conditional distribution p(ŝ|s) such that
Ŝ ⊥⊥ (S − Ŝ), so that h(S − Ŝ | Ŝ) = h(S − Ŝ), and (S − Ŝ) ∼ N(0, D).
Again, this can be done via an auxiliary reverse channel. Consider a channel with input Ŝ, output S, and additive noise Z ∼ N(0, D) independent of Ŝ:
$$S = \hat{S} + Z \;\Longrightarrow\; Z = S - \hat{S}.$$
The reverse channel specifies the joint distribution p(s, ŝ) and hence p(ŝ|s)!

[Figure: the reverse test channel. Ŝ ∼ N(µ, σ² − D) plus independent noise Z ∼ N(0, D) gives S = Ŝ + Z ∼ N(µ, σ²).]
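A small Monte Carlo sanity check of the Gaussian reverse channel (my own illustration; the parameter values are arbitrary): drawing Ŝ ∼ N(µ, σ² − D) and adding independent Z ∼ N(0, D) reproduces S ∼ N(µ, σ²) with squared-error distortion D, at rate R(D) = ½ log(σ²/D).

# Monte Carlo check of the Gaussian reverse channel (illustration only).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, D, n = 1.0, 4.0, 1.0, 1_000_000

S_hat = rng.normal(mu, np.sqrt(sigma2 - D), size=n)   # reconstruction ~ N(mu, sigma^2 - D)
Z = rng.normal(0.0, np.sqrt(D), size=n)               # reverse-channel noise ~ N(0, D)
S = S_hat + Z                                         # source ~ N(mu, sigma^2)

print(S.var())                       # ~ sigma^2 = 4.0
print(np.mean((S - S_hat) ** 2))     # ~ D = 1.0 (squared-error distortion)
print(0.5 * np.log2(sigma2 / D))     # R(D) = 1 bit per symbol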


Example: Source Alphabet ≠ Reconstruction Alphabet

Source (ternary): Si ∈ S = {0, ∗, 1}, with Si i.i.d. ∼ pS for all i, where pS(0) = pS(1) = ε ≤ 1/2.
Reconstruction (binary): Ŝ = {0, 1}.
Distortion: d(s, ŝ) = 1 if s ≠ ∗ and ŝ ≠ s, and d(s, ŝ) = 0 if s = ∗ or ŝ = s.
In other words, there is a don't-care symbol ∗, and S ≠ Ŝ.

Example 3 (HW5). Derive the rate distortion function and show that it is given by
$$R(D) = \begin{cases} 2\varepsilon \left( 1 - H_b\left( \frac{D}{2\varepsilon} \right) \right), & 0 \le D \le \varepsilon \\ 0, & D > \varepsilon. \end{cases}$$
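The derivation is left as homework, but the formula can be sanity-checked numerically; the sketch below (my own illustration, with ε, D, and the grid resolution chosen arbitrarily) grid-searches the minimization in (1) for this ternary source and compares it with the claimed closed form.

# Numerical sanity check of the claimed R(D) for the ternary don't-care source
# (illustration only; eps, D, and the grid resolution are my own choices).
import numpy as np
from itertools import product

def Hb(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

eps, D = 0.25, 0.1
pS = np.array([eps, 1 - 2 * eps, eps])       # p.m.f. on S = {0, *, 1}
d = np.array([[0.0, 1.0],                    # d(s, s_hat): cost 1 only when
              [0.0, 0.0],                    # s != * and s_hat != s
              [1.0, 0.0]])

best = np.inf
ts = np.linspace(0, 1, 41)
for a in product(ts, repeat=3):              # a[s] = P(S_hat = 1 | S = s)
    cond = np.column_stack([1 - np.array(a), np.array(a)])
    joint = pS[:, None] * cond
    if np.sum(joint * d) > D:                # expected distortion exceeds D
        continue
    pSh = joint.sum(axis=0)
    mask = joint > 0
    I = np.sum(joint[mask] * np.log2(joint[mask] / np.outer(pS, pSh)[mask]))
    best = min(best, I)

print(best)                                  # ~ 0.14 (grid approximation from above)
print(2 * eps * (1 - Hb(D / (2 * eps))))     # 0.139..., the claimed closed form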


Proof of the Converse of Theorem 1

We aim to show that for any sequence of (2^{NR}, N) source codes with lim sup_{N→∞} D^(N) ≤ D, the rate R must satisfy R ≥ R(D) (defined in (1)).

We begin with similar steps as in lossless source coding (cf. Lecture 03).

pf: Note that B^K is a r.v. because it is generated by another r.v., S^N.
$$NR \ge K \ge H\left( B^K \right) \ge I\left( B^K; \hat{S}^N \right) \overset{(a)}{\ge} I\left( S^N; \hat{S}^N \right) \overset{(b)}{=} \sum_{i=1}^{N} I\left( S_i; \hat{S}^N \,\middle|\, S^{i-1} \right) \overset{(c)}{=} \sum_{i=1}^{N} I\left( S_i; \hat{S}^N, S^{i-1} \right) \ge \sum_{i=1}^{N} I\left( S_i; \hat{S}_i \right).$$
(a) is due to S^N − B^K − Ŝ^N and the data processing inequality. (b) is due to the chain rule. (c) is due to Si ⊥⊥ S^{i−1} (memoryless source).

So far, we have not yet used the condition on distortion.


Further working on the inequality:
$$NR \ge \sum_{i=1}^{N} I\left( S_i; \hat{S}_i \right) \overset{(d)}{\ge} \sum_{i=1}^{N} R\left( \mathrm{E}\left[ d\left( S_i, \hat{S}_i \right) \right] \right) = N \sum_{i=1}^{N} \frac{1}{N} R\left( \mathrm{E}\left[ d\left( S_i, \hat{S}_i \right) \right] \right) \overset{(e)}{\ge} N R\left( \sum_{i=1}^{N} \frac{1}{N} \mathrm{E}\left[ d\left( S_i, \hat{S}_i \right) \right] \right) = N R\left( \mathrm{E}\left[ \frac{1}{N} \sum_{i=1}^{N} d\left( S_i, \hat{S}_i \right) \right] \right) = N R\left( \mathrm{E}\left[ d\left( S^N, \hat{S}^N \right) \right] \right) = N R\left( D^{(N)} \right).$$
(d) is due to the definition of R(D) in (1). (e) is due to the convexity of R(D) and Jensen's inequality.

Hence,
$$R \ge \limsup_{N\to\infty} R\left( D^{(N)} \right) \overset{(f)}{\ge} R\left( \limsup_{N\to\infty} D^{(N)} \right) \overset{(g)}{\ge} R(D).$$
(f) is due to the continuity of R(D). (g) is due to lim sup_{N→∞} D^(N) ≤ D and R(D) being non-increasing.


Remarks

You might note that the previous converse proof makes no use of lower bounds on the error probability such as Fano's inequality. This is because, in our formulation of the lossy source coding problem, the reconstruction criterion is placed on the expected distortion.

Instead of the criterion lim sup_{N→∞} D^(N) ≤ D, where D^(N) ≜ E[d(S^N, Ŝ^N)], we could use a stronger criterion as follows:
$$P_e^{(N,\delta)} \triangleq \mathrm{P}\left\{ d\left( S^N, \hat{S}^N \right) > D + \delta \right\}, \quad \delta > 0 \qquad \text{(Probability of Error)}$$
$$\lim_{N\to\infty} P_e^{(N,\delta)} = 0, \quad \forall\, \delta > 0 \qquad \text{(Reconstruction Criterion)}$$

Under this stronger criterion, we can then give a new operational definition of the rate distortion function. It turns out that Theorem 1 remains the same! (Its converse is implied by the converse above.)


Idea of Constructing Good Source Code

Key in source coding:
1. Find a good set of representatives (quantization codewords).
2. For each source sequence, determine which codeword to use.

Main tools we have used so far in developing the achievability of coding theorems:
1. Random coding: construct the codebook randomly and show that at least one realization can achieve the desired target performance.
2. Typicality: helps give bounds in the performance analysis.

In the following, we prove the achievability part of Theorem 1 by
1. Random coding – to show the existence of a good quantization codebook.
2. Typicality encoding – to determine which codeword to use.


[Figure: S^N is mapped to Ŝ^N; random coding generates the quantization codebook, and typicality encoding selects the codeword.]


Proof Program

1. Random Codebook Generation: generate a random ensemble of quantization codebooks, each of which contains 2^K codewords.
2. Analysis of Expected Distortion: the goal is to show that lim sup_{N→∞} E_{C,S^N}[d(S^N, Ŝ^N)] ≤ D, and conclude that there must exist a codebook c such that the expected distortion satisfies lim sup_{N→∞} E_{S^N}[d(S^N, Ŝ^N)] ≤ D.

Note that for a source sequence s^N, the optimal encoder chooses an index w ∈ [1 : 2^K], that is, a codeword ŝ^N(w) in the codebook, so that d(s^N, ŝ^N(w)) is minimized.

However, similar to ML decoding in channel coding, such an optimal encoder is hard to analyze. To simplify the analysis, we shall introduce a suboptimal encoder based on typicality.


Random Codebook Generation

Fix the conditional p.m.f. that attains R(D/(1+ε)):
$$q_{\hat{S}|S} = \arg\min_{p_{\hat{S}|S}:\ \mathrm{E}[d(S,\hat{S})] \le \frac{D}{1+\varepsilon}} I\left( S; \hat{S} \right). \qquad (2)$$
Based on the chosen q_{Ŝ|S} and the source distribution p_S, calculate p_Ŝ, the marginal distribution of the reconstruction Ŝ. Generate 2^K codewords {ŝ^N(w) | w = 1, 2, ..., 2^K}, i.i.d. according to
$$p\left( \hat{s}^N \right) = \prod_{i=1}^{N} p_{\hat{S}}(\hat{s}_i).$$
In other words, if we think of the quantization codebook as a 2^K × N matrix C, the elements of C will be i.i.d. according to p_Ŝ.

Remark: observe the resemblance with the channel coding achievability.
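A literal rendering of this step (a sketch only; the source p.m.f. and the conditional q_{Ŝ|S} below are toy placeholders of my own, not the actual minimizer in (2)):

# Random codebook generation (sketch).  p_S and q_Sh_given_S are toy
# placeholders; in the proof, q_Sh_given_S is the minimizer in (2).
import numpy as np

rng = np.random.default_rng(0)
N, R = 12, 0.5
K = int(np.floor(N * R))                        # K = floor(N*R) = 6

pS = np.array([0.3, 0.7])                       # source p.m.f. on {0, 1}
q_Sh_given_S = np.array([[0.9, 0.1],            # placeholder for q(s_hat | s)
                         [0.2, 0.8]])
p_Sh = pS @ q_Sh_given_S                        # marginal of S_hat: [0.41, 0.59]

# The codebook C is a 2^K x N matrix whose entries are i.i.d. ~ p_S_hat.
C = rng.choice(2, size=(2 ** K, N), p=p_Sh)
print(C.shape)                                  # (64, 12): 2^K codewords of length N
print(C.mean())                                 # ~ 0.59 = p_S_hat(1)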


Encoding and Decoding

Encoding: unlike channel coding, the encoding process in the source coding problem is usually much more involved. We use typicality encoding (resembling typicality decoding in channel coding):
Given a source sequence s^N, find an index w ∈ [1 : 2^K] such that (s^N, ŝ^N(w)) ∈ T_ε^(N)(p_{S,Ŝ}). Recall the joint distribution p_{S,Ŝ} = p_S × q_{Ŝ|S} as defined in (2).
If there is no such index, or more than one, randomly pick one w ∈ [1 : 2^K].
Send out the bit sequence that represents the chosen w.

Decoding: upon receiving the bit sequence representing w, generate the reconstruction ŝ^N(w) by looking up the quantization codebook.
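Here is what typicality encoding looks like in code: a self-contained sketch with its own toy source, a toy conditional standing in for (2), and a freshly drawn random codebook; the joint typicality test is the robust one, |π(s,ŝ) − p(s,ŝ)| ≤ ε·p(s,ŝ) for every symbol pair, with π the empirical p.m.f.

# Typicality encoding (self-contained toy sketch, not the lecture's code).
import numpy as np

rng = np.random.default_rng(1)
N, K, eps = 20, 10, 0.5
pS = np.array([0.5, 0.5])
q_Sh_given_S = np.array([[0.8, 0.2], [0.2, 0.8]])    # toy stand-in for (2)
p_joint = pS[:, None] * q_Sh_given_S                 # p_{S, S_hat} = p_S x q_{S_hat|S}
p_Sh = p_joint.sum(axis=0)
C = rng.choice(2, size=(2 ** K, N), p=p_Sh)          # random codebook, as before

def jointly_typical(sN, sN_hat, p, eps):
    pi = np.zeros_like(p)                            # empirical joint p.m.f.
    for s, sh in zip(sN, sN_hat):
        pi[s, sh] += 1.0 / len(sN)
    return bool(np.all(np.abs(pi - p) <= eps * p))

def typicality_encode(sN):
    typical = [w for w in range(C.shape[0])
               if jointly_typical(sN, C[w], p_joint, eps)]
    if typical:
        return int(rng.choice(typical))              # any jointly typical codeword
    return int(rng.integers(C.shape[0]))             # none found: pick arbitrarily

sN = rng.choice(2, size=N, p=pS)                     # source sequence
w = typicality_encode(sN)                            # index w, sent as K bits
sN_hat = C[w]                                        # decoder: codebook look-up
print(w, float(np.mean(sN != sN_hat)))               # index, Hamming distortion
                                                     # (~0.2 when a typical codeword exists)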


Analysis of Expected Distortion

Why the typicality encoder? Typical average lemma (Lemma 2, Lecture 04): for any nonnegative function g(x) on X, if x^n ∈ T_ε^(n)(X), then
$$(1 - \varepsilon)\, \mathrm{E}[g(X)] \le \frac{1}{n} \sum_{i=1}^{n} g(x_i) \le (1 + \varepsilon)\, \mathrm{E}[g(X)].$$
In analyzing E_{C,S^N}[d(S^N, Ŝ^N)], we can then distinguish two cases, E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)} and E^c ≜ {(S^N, Ŝ^N) ∈ T_ε^(N)}:
$$\mathrm{P}\{\mathcal{E}\}\, \mathrm{E}_{C,S^N}\left[ d\left( S^N, \hat{S}^N \right) \middle|\, \mathcal{E} \right] + \mathrm{P}\{\mathcal{E}^c\}\, \mathrm{E}_{C,S^N}\left[ d\left( S^N, \hat{S}^N \right) \middle|\, \mathcal{E}^c \right] \le \mathrm{P}\{\mathcal{E}\} \max_{s,\hat{s}} d(s,\hat{s}) + \mathrm{P}\{\mathcal{E}^c\}\, (1+\varepsilon) \frac{D}{1+\varepsilon} \le \mathrm{P}\{\mathcal{E}\} \max_{s,\hat{s}} d(s,\hat{s}) + D.$$
Hence, as long as P{E} vanishes as N → ∞, we are done.
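A quick numerical illustration of the typical average lemma quoted above (my own toy setup): draw x^n i.i.d., keep only the draws that are ε-typical, and check that their empirical average of a nonnegative g stays within (1 ± ε) E[g(X)].

# Illustration of the typical average lemma (toy setup of my own choosing).
import numpy as np

rng = np.random.default_rng(2)
pX = np.array([0.2, 0.5, 0.3])          # p.m.f. on X = {0, 1, 2}
g = np.array([1.0, 4.0, 2.0])           # a nonnegative function g(x)
Eg = float(pX @ g)                      # E[g(X)] = 2.8
n, eps = 2000, 0.1

for _ in range(5):
    xn = rng.choice(3, size=n, p=pX)
    emp = np.bincount(xn, minlength=3) / n
    if np.all(np.abs(emp - pX) <= eps * pX):            # x^n in T_eps^(n)(pX)
        avg = float(g[xn].mean())                       # (1/n) * sum_i g(x_i)
        print((1 - eps) * Eg <= avg <= (1 + eps) * Eg)  # always True when typical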


Analysis of Expected Distortion → Analysis of P{E}

With typicality encoding, the analysis of the expected distortion is made easy: we just need to control P{E}, where E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)}.

Let us look at the event E: it is the event that the reconstruction Ŝ^N is not jointly typical with S^N, which can only happen when none of the quantization codewords in the codebook is jointly typical with S^N.

Hence, E ⊆ ∩_{w=1}^{2^K} A_w^c, where A_w ≜ {(S^N, Ŝ^N(w)) ∈ T_ε^(N)}. This implies
$$\mathrm{P}\{\mathcal{E}\} \le \mathrm{P}\left\{ \bigcap_{w=1}^{2^K} A_w^c \right\}.$$
Unfortunately, the events {A_w^c | w = 1, ..., 2^K} may not be mutually independent, because they all involve a common random sequence S^N.

However, for fixed s^N, the events A_w^c(s^N) ≜ {(s^N, Ŝ^N(w)) ∉ T_ε^(N)}, w = 1, ..., 2^K, are indeed mutually independent!


Analysis of P{E}, where E ≜ {(S^N, Ŝ^N) ∉ T_ε^(N)}

Motivated by the above observation, we give an alternative upper bound:
$$\mathrm{P}\{\mathcal{E}\} \le \sum_{s^N \in \mathcal{S}^N} p\left( s^N \right) \mathrm{P}\left\{ \bigcap_{w=1}^{2^K} A_w^c\left( s^N \right) \right\} = \sum_{s^N \in \mathcal{S}^N} p\left( s^N \right) \prod_{w=1}^{2^K} \mathrm{P}\left\{ A_w^c\left( s^N \right) \right\} = \sum_{s^N \in \mathcal{S}^N} p\left( s^N \right) \prod_{w=1}^{2^K} \left( 1 - \mathrm{P}\left\{ A_w\left( s^N \right) \right\} \right).$$
Question: is there a way to lower bound
$$\mathrm{P}\left\{ A_w\left( s^N \right) \right\} \triangleq \mathrm{P}\left\{ \left( s^N, \hat{S}^N(w) \right) \in T_\varepsilon^{(N)}\left( p_{S,\hat{S}} \right) \right\}?$$
Yes – as long as s^N ∈ T_{ε'}^(N)(p_S) for some ε' < ε, Lemma 1 (next slide) guarantees that P{A_w(s^N)} ≥ 2^{−N(I(S;Ŝ)+δ(ε))} for sufficiently large N, where lim_{ε→0} δ(ε) = 0.


Joint Typicality Lemma

The following lemma formally states the bounds. (The proof is omitted; see Section 2.5 of El Gamal & Kim [6].)

Lemma 1 (Joint Typicality Lemma). Consider a joint p.m.f. p_{X,Y} = p_X · p_{Y|X} = p_Y · p_{X|Y}. Then there exists δ(ε) > 0 with lim_{ε→0} δ(ε) = 0 such that:
1. For an arbitrary sequence x^n and random Y^n ∼ ∏_{i=1}^{n} p_Y(y_i),
$$\mathrm{P}\left\{ (x^n, Y^n) \in T_\varepsilon^{(n)}(p_{X,Y}) \right\} \le 2^{-n(I(X;Y) - \delta(\varepsilon))}.$$
2. For an ε'-typical sequence x^n ∈ T_{ε'}^(n)(p_X) with ε' < ε, and random Y^n ∼ ∏_{i=1}^{n} p_Y(y_i), for sufficiently large n,
$$\mathrm{P}\left\{ (x^n, Y^n) \in T_\varepsilon^{(n)}(p_{X,Y}) \right\} \ge 2^{-n(I(X;Y) + \delta(\varepsilon))}.$$
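A Monte Carlo illustration of the lemma (my own toy setup, not from the slides): fix an exactly typical x^n, draw Y^n i.i.d. from p_Y, and compare the empirical probability of landing in the jointly typical set with the benchmark 2^{−n I(X;Y)}; the two agree only up to the 2^{±nδ(ε)} slack that the lemma allows.

# Monte Carlo illustration of the joint typicality lemma (toy setup; agreement
# is only up to the 2^{+/- n*delta(eps)} slack the lemma allows).
import numpy as np

rng = np.random.default_rng(3)
n, eps, trials = 40, 0.2, 100_000
pXY = np.array([[0.375, 0.125],               # X ~ Ber(1/2) through a BSC(1/4)
                [0.125, 0.375]])
pX, pY = pXY.sum(axis=1), pXY.sum(axis=0)
I = float(np.sum(pXY * np.log2(pXY / np.outer(pX, pY))))   # I(X;Y) ~ 0.189 bits

xn = np.array([0, 1] * (n // 2))              # an exactly typical x^n (half 0s, half 1s)
n0 = int(np.sum(xn == 0))

Y = (rng.random((trials, n)) < pY[1]).astype(int)   # each row: Y^n i.i.d. ~ p_Y
c01 = Y[:, xn == 0].sum(axis=1)               # per-trial count of (x, y) = (0, 1)
c11 = Y[:, xn == 1].sum(axis=1)               # per-trial count of (x, y) = (1, 1)
emp = np.stack([n0 - c01, c01, (n - n0) - c11, c11], axis=1) / n
target = pXY.reshape(-1)                      # [p(0,0), p(0,1), p(1,0), p(1,1)]
hits = np.all(np.abs(emp - target) <= eps * target, axis=1).sum()

print(hits / trials)                          # ~ 0.003: empirical P{(x^n, Y^n) in T_eps}
print(2.0 ** (-n * I))                        # ~ 0.005: benchmark 2^{-n*I(X;Y)}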


Finalizing the Proof

Invoking Lemma 1, the additional condition that s^N ∈ T_{ε'}^(N)(p_S) for some ε' < ε motivates us to split the upper bound on P{E} as follows:
$$\begin{aligned}
\mathrm{P}\{\mathcal{E}\} &\le \sum_{s^N \in \mathcal{S}^N} p\left( s^N \right) \prod_{w=1}^{2^K} \left( 1 - \mathrm{P}\left\{ A_w\left( s^N \right) \right\} \right) \\
&\le \sum_{s^N \notin T_{\varepsilon'}^{(N)}(p_S)} p\left( s^N \right) + \sum_{s^N \in T_{\varepsilon'}^{(N)}(p_S)} p\left( s^N \right) \prod_{w=1}^{2^K} \left( 1 - \mathrm{P}\left\{ A_w\left( s^N \right) \right\} \right) \\
&\le \mathrm{P}\left\{ S^N \notin T_{\varepsilon'}^{(N)}(p_S) \right\} + \sum_{s^N \in T_{\varepsilon'}^{(N)}(p_S)} p\left( s^N \right) \left( 1 - 2^{-N(I(S;\hat{S}) + \delta(\varepsilon))} \right)^{2^K} \\
&\le \mathrm{P}\left\{ S^N \notin T_{\varepsilon'}^{(N)}(p_S) \right\} + \left( 1 - 2^{-N(I(S;\hat{S}) + \delta(\varepsilon))} \right)^{2^K} \\
&\le \mathrm{P}\left\{ S^N \notin T_{\varepsilon'}^{(N)}(p_S) \right\} + \exp\left( -2^K \times 2^{-N(I(S;\hat{S}) + \delta(\varepsilon))} \right).
\end{aligned}$$
The last step is due to (1 − x)^r ≤ e^{−rx} for x ∈ [0, 1] and r ≥ 0.


We obtain a nice upper bound:
$$\mathrm{P}\{\mathcal{E}\} \le \mathrm{P}\left\{ S^N \notin T_{\varepsilon'}^{(N)}(p_S) \right\} + \exp\left( -2^K \times 2^{-N(I(S;\hat{S}) + \delta(\varepsilon))} \right).$$
The first term vanishes as N → ∞ due to the AEP. The second term vanishes as N → ∞ if
$$R > I\left( S; \hat{S} \right) + \delta(\varepsilon) = R\left( \tfrac{D}{1+\varepsilon} \right) + \delta(\varepsilon).$$
Hence, any rate R > R(D/(1+ε)) + δ(ε) can achieve average distortion at most D.

Finally, due to the continuity of the rate-distortion function, we take ε → 0 and complete the proof.
