
Transcript of Information and Communication Theory Lecture 4

Page 1: Information and Communication Theory Lecture 4

Information and Communication Theory

Lecture 4

Optimal Coding

Mario A. T. Figueiredo

DEEC, Instituto Superior Técnico, University of Lisbon, Portugal

2021


Page 2: Information and Communication Theory Lecture 4

Source Coding

[Figure: discrete source → encoder → error-free channel → decoder]

Lossless encoding: output of the decoder equal to that of the source.

Assumption: when encoding X_t, the distribution f_{X_t} is known:

• For memoryless sources, this is just f_X;

• For Markov sources, this is f_{X_t | X_{t−1}, ...}.

Without loss of generality, we simply write f_X.

Goal: economy, that is, use the channel as little as possible.


Page 3: Information and Communication Theory Lecture 4

Variable-Length Coding


Code uses D-ary alphabet D = {0, 1, ..., D − 1}.

Typically, binary coding, D = 2, D = {0, 1}.

Variable-length encoding: D∗ is the Kleene closure of D:

D∗ = {all finite strings of symbols of D} = ⋃_{n=0}^{∞} D^n

Example: for D = {0, 1},
D∗ = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, ...}


Page 4: Information and Communication Theory Lecture 4

Non-Singular and Uniquely Decodable Codes


For C⁻¹ to exist: non-singular code (C injective). For any x, y ∈ X,

x ≠ y ⇒ C(x) ≠ C(y)

To be useful for encoding sequences of symbols, non-singularity alone is not enough.

Example: {C(1) = 0, C(2) = 10, C(3) = 01} is non-singular

Received sequence: 010; is it C(1)C(2) or C(3)C(1)?

Impossible to know!

Codes that do not have this problem are called uniquely decodable.
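To see the ambiguity concretely, here is a small sketch (not from the slides) that enumerates every way the received string 010 can be split into codewords of the code above:

```python
def parses(s, code, prefix=()):
    """Enumerate all ways of splitting s into codewords of `code` (symbol -> word)."""
    if not s:
        yield prefix
    for sym, word in code.items():
        if s.startswith(word):
            yield from parses(s[len(word):], code, prefix + (sym,))

code = {1: "0", 2: "10", 3: "01"}     # non-singular, but not uniquely decodable
print(list(parses("010", code)))      # [(1, 2), (3, 1)]: two valid decodings
```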


Page 5: Information and Communication Theory Lecture 4

Instantaneous Codes

Consider a uniquely decodable code: {C(1) = 01, C(2) = 11, C(3) = 00, C(4) = 110}

How to decode the sequence 11 00...00 11, with n zeros in the middle?

• If n is even: C⁻¹(11 00...00 11) = 2 33...3 2, with n/2 threes.

• If n is odd: C⁻¹(11 00...00 11) = 4 33...3 2, with (n − 1)/2 threes.

To decode the first symbol, we may need to wait for many others.

A code that does not have this problem is called instantaneous.


Page 6: Information and Communication Theory Lecture 4

Instantaneous Codes

If no codeword is a prefix of another, decoding is instantaneous. Other names: prefix codes, prefix-free codes.

Length function:

l_C(x) = length(C(x))

Expected length:

L(C) = E[l_C(X)] = ∑_{x∈X} f_X(x) l_C(x)

Example:

L(C) = 7/4 bits/symbol
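The example code itself appears only as a table on the slide; a minimal sketch of the computation, assuming the classic example f_X = (1/2, 1/4, 1/8, 1/8) with codewords 0, 10, 110, 111 (an assumption, since the table is not in the transcript), reproduces the 7/4 bits/symbol:

```python
fX = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}          # assumed example distribution
C  = {1: "0", 2: "10", 3: "110", 4: "111"}     # assumed instantaneous code
L = sum(fX[x] * len(C[x]) for x in fX)         # L(C) = E[l_C(X)]
print(L)                                       # 1.75 = 7/4 bits/symbol
```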


Page 7: Information and Communication Theory Lecture 4

Instantaneous Codes: Tree Representation

Instantaneous code: no codeword is a prefix of another.

Decoding an instantaneous code: follow a path from the root to a leaf of a tree:

For D-ary codes: D-ary trees.

L(C) is the sum of the probabilities of the inner nodes. (Show why.)


Page 8: Information and Communication Theory Lecture 4

Instantaneous Codes: Kraft-McMillan Inequality

If C is a D-ary instantaneous code, it necessarily satisfies

∑_{x∈X} D^{−l_C(x)} ≤ 1        (KMI)

...i.e., if some words are short others have to be long!

Proof:

• Let l_max = max{l_C(1), ..., l_C(N)} (length of the longest word).

• There are D^{l_max} words of length l_max.

• For each codeword C(x), there are D^{l_max − l_C(x)} words of length l_max that have C(x) as a prefix.

• The sets of length-l_max words that have each codeword as a prefix are disjoint, therefore

∑_{x∈X} D^{l_max − l_C(x)} ≤ D^{l_max}, and dividing by D^{l_max} gives ∑_{x∈X} D^{−l_C(x)} ≤ 1.

Important: the KMI is a necessary, not sufficient, condition. (why?)
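A quick numerical check (a sketch, not from the slides): the first two sets of lengths satisfy the KMI, while the last violates it, so no instantaneous code can have lengths (1, 1, 2).

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft-McMillan inequality for a list of codeword lengths."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))      # 1.0    -> satisfies the KMI (with equality)
print(kraft_sum([2, 2, 2, 3, 4]))   # 0.9375 -> satisfies the KMI
print(kraft_sum([1, 1, 2]))         # 1.25   -> violates the KMI
```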


Page 9: Information and Communication Theory Lecture 4

Kraft-McMillan Inequality: Graphical Proof


Page 10: Information and Communication Theory Lecture 4

Exercises

For each of the following codes, say whether it is non-singular, uniquely decodable, or instantaneous. Compute the expected length of each of the codes.

Converse of the KMI: given a collection of N positive integers, l_1, ..., l_N, that satisfy

∑_{x=1}^{N} D^{−l_x} ≤ 1,

is it necessarily possible to construct an instantaneous code such that l_C(x) = l_x? Give an example.


Page 11: Information and Communication Theory Lecture 4

Exercises

Given the collection of numbers (2, 2, 2, 3, 4), is it possible to build an instantaneous code with these lengths? Why? If yes, give an example. Could the code be optimal, or can we build another instantaneous code with necessarily shorter expected code-length?

Repeat the previous question for the set of numbers (1, 2, 2, 3, 4).

List all possible distributions of lengths for instantaneous codes with the following numbers of words: N = 3, N = 4, and N = 5.


Page 12: Information and Communication Theory Lecture 4

Source Coding Theorem

Source X ∈ X = {1, ..., N} with probability mass function fX .

For any collection of N positive integers, l_1, ..., l_N,

∑_{x∈X} D^{−l_x} ≤ 1  (KMI)   ⇒   ∑_{x∈X} f_X(x) l_x ≥ H_D(X).

Proof: let q(x) = D^{−l_x} / A > 0, where A = ∑_{x∈X} D^{−l_x} ≤ 1; note that ∑_{x∈X} q(x) = 1.

0 ≤ D_KL(f_X ‖ q) = ∑_{x∈X} f_X(x) log_D (f_X(x) / q(x))
  = ∑_{x∈X} f_X(x) log_D f_X(x)  +  (log_D A) ∑_{x∈X} f_X(x)  +  ∑_{x∈X} f_X(x) l_x,

where the first term is −H_D(X) and log_D A ≤ 0; since ∑_{x∈X} f_X(x) = 1, this gives ∑_{x∈X} f_X(x) l_x ≥ H_D(X) − log_D A ≥ H_D(X).

...equality iff A = 1 and q(x) = f_X(x) ⇔ l_x = − log_D f_X(x) (only possible if these are integers).


Page 13: Information and Communication Theory Lecture 4

Source Coding Theorem

Source X ∈ X = {1, ..., N} with probability mass function fX .

Corollary of the result in previous slide:

C is instantaneous ⇒ KMI ⇒ ∑_{x∈X} f_X(x) l_C(x) ≥ H_D(X), where the left-hand side is the expected code-length L(C)

...with equality if and only if l_C(x) = − log_D f_X(x).

Equality is only possible if the − log_D f_X(x) are integers.

Shannon-Fano code (SFC): just take l_SFC(x) = ⌈− log_D f_X(x)⌉

Clearly, the SFC satisfies the KMI (since ⌈u⌉ ≥ u, for any u ∈ ℝ):

∑_{x∈X} D^{−l_SFC(x)} ≤ ∑_{x∈X} D^{log_D f_X(x)} = ∑_{x∈X} f_X(x) = 1
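A small sketch of the Shannon-Fano length assignment (assuming a binary code, D = 2, and using the distribution from one of the exercises below), checking the KMI and the bound H(X) ≤ L < H(X) + 1:

```python
import math

def shannon_fano_lengths(probs, D=2):
    """Shannon-Fano code lengths: l(x) = ceil(-log_D f_X(x))."""
    return [math.ceil(-math.log(p, D)) for p in probs]

probs = [1/3, 1/3, 1/9, 1/9, 1/9]
lengths = shannon_fano_lengths(probs)                 # [2, 2, 4, 4, 4]
H = -sum(p * math.log2(p) for p in probs)             # entropy, in bits
L = sum(p * l for p, l in zip(probs, lengths))        # expected code-length
print(lengths, sum(2 ** -l for l in lengths) <= 1, H <= L < H + 1)
```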


Page 14: Information and Communication Theory Lecture 4

Optimal Code

Source X ∈ X = {1, ..., N} with probability mass function fX .

Optimal code lengths:

(l_1^optimal, ..., l_N^optimal) = arg min_{(l_1, ..., l_N)} ∑_{x∈X} f_X(x) l_x

subject to: l_1, ..., l_N ∈ ℕ  and  ∑_{x∈X} D^{−l_x} ≤ 1

Optimal code: C_optimal is any instantaneous code with l_{C_optimal}(x) = l_x^optimal, for x ∈ X.

Because it satisfies the KMI, L(Coptimal) ≥ H(X).

Because it is optimal, L(Coptimal) ≤ L(CSF)
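For small alphabets, the optimization above can be solved by brute force (a sketch only; Huffman's algorithm, introduced later in these slides, does this efficiently):

```python
import itertools, math

def optimal_lengths(probs, D=2, max_len=8):
    """Brute-force search for integer code lengths minimizing the expected length
    subject to the Kraft-McMillan inequality (feasible only for small alphabets)."""
    best, best_L = None, math.inf
    for ls in itertools.product(range(1, max_len + 1), repeat=len(probs)):
        if sum(D ** (-l) for l in ls) <= 1:            # KMI constraint
            L = sum(p * l for p, l in zip(probs, ls))
            if L < best_L:
                best, best_L = ls, L
    return best, best_L

print(optimal_lengths([1/2, 1/4, 1/8, 1/8]))           # ((1, 2, 3, 3), 1.75)
```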


Page 15: Information and Communication Theory Lecture 4

Bounds on Optimal Code-length

Because it is optimal, L(C_optimal) ≤ L(C_SF).

Because ⌈u⌉ < u + 1, for any u ∈ ℝ,

L(C_optimal) ≤ L(C_SF) = ∑_{x∈X} f_X(x) ⌈− log_D f_X(x)⌉ < ∑_{x∈X} f_X(x) (− log_D f_X(x) + 1) = H(X) + 1

In summary: H(X) ≤ L(C_optimal) < H(X) + 1

Code efficiency: ρ_C = H(X) / L(C).

Ideal code: ρ_C = 1. Important: ideal ⇒ optimal, but optimal ⇏ ideal.


Page 16: Information and Communication Theory Lecture 4

Exercises

1. List all possible distributions of word-lengths of optimal binary codes for sources with N = 3, N = 4, and N = 5 symbols. (For N = 2, there is only one choice: (1, 1).)

2. For the source X ∈ X = {1, 2, 3, 4, 5} with f_X = (1/3, 1/3, 1/9, 1/9, 1/9), obtain a binary Shannon-Fano code and compute its expected length and efficiency. Is it an optimal code?

3. For the source X ∈ X = {1, 2, 3, 4} with f_X = (1/2, 1/4, 1/8, 1/8), obtain an optimal binary code and compute its expected length and efficiency. Is it an ideal code?

4. Repeat the two previous exercises, but now for ternary codes.

5. Show that there are sources and corresponding optimal codes such that L(C_optimal) is arbitrarily close to H(X) + 1.


Page 17: Information and Communication Theory Lecture 4

Coding With a Wrong Distribution

Source X ∈ X = {1, ..., N} with probability mass function fX .

Build a Shannon-Fano code assuming g_X: l_C(x) = ⌈− log g_X(x)⌉

Lower bound:

L(C) = ∑_{x∈X} ⌈− log g_X(x)⌉ f_X(x)
     ≥ − ∑_{x∈X} f_X(x) log g_X(x)
     = ∑_{x∈X} f_X(x) log [ f_X(x) / (g_X(x) f_X(x)) ]
     = − ∑_{x∈X} f_X(x) log f_X(x)  +  ∑_{x∈X} f_X(x) log (f_X(x) / g_X(x))
     =   H(X)                       +  D_KL(f_X ‖ g_X)


Page 18: Information and Communication Theory Lecture 4

Coding With a Wrong Distribution

Source X ∈ X = {1, ..., N} with probability mass function fX .

Build a Shannon-Fano code assuming g_X: l_C(x) = ⌈− log g_X(x)⌉

Upper bound:

L(C) = ∑_{x∈X} ⌈− log g_X(x)⌉ f_X(x) < ∑_{x∈X} f_X(x) (− log g_X(x) + 1) = H(X) + D_KL(f_X ‖ g_X) + 1

Summarizing: if C is built from g_X and the true distribution is f_X,

H(X) + D_KL(f_X ‖ g_X) ≤ L(C) < H(X) + D_KL(f_X ‖ g_X) + 1
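A numerical illustration of these bounds (a sketch; f and g are made-up example distributions, not from the slides):

```python
import math

def avg_length_sf(f, g):
    """Expected length of a binary Shannon-Fano code built from g when the true pmf is f."""
    return sum(p * math.ceil(-math.log2(q)) for p, q in zip(f, g))

f = [1/2, 1/4, 1/8, 1/8]                     # true distribution
g = [1/4, 1/4, 1/4, 1/4]                     # assumed (wrong) distribution
H  = -sum(p * math.log2(p) for p in f)       # H(X) = 1.75 bits
KL = sum(p * math.log2(p / q) for p, q in zip(f, g))   # D_KL(f || g) = 0.25 bits
L  = avg_length_sf(f, g)
print(H + KL <= L < H + KL + 1)              # True: the penalty is D_KL(f || g)
```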


Page 19: Information and Communication Theory Lecture 4

Approaching the Bound: Source Extension

Discrete stationary source Xt ∈ X = {1, ..., N}

Extension: group n consecutive symbols: (X1, ..., Xn) ∈ {1, ..., N}n.

The optimal code for the extended symbols (X_1, ..., X_n) satisfies

H(X_1, ..., X_n) ≤ L(C_n^optimal)  [bits/(n symbols)]  < H(X_1, ..., X_n) + 1

Memoryless source: H(X_1, ..., X_n) = nH(X_1), thus

L(C_n^optimal) < nH(X_1) + 1   ⇒   L(C_n^optimal)/n  [bits/symbol]  < H(X_1) + 1/n

...via extension, the expected code-length per symbol can be made arbitrarily close to the entropy.

Non-memoryless source: H(X_1, ..., X_n) < nH(X_1) and the result is even stronger.
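A sketch of the extension argument for an assumed memoryless binary source with f_X = (0.9, 0.1): coding n-blocks with Shannon-Fano lengths, the per-symbol expected length stays below H(X) + 1/n and approaches the entropy as n grows.

```python
import math, itertools

f = {0: 0.9, 1: 0.1}                                   # assumed memoryless source
H = -sum(p * math.log2(p) for p in f.values())         # ≈ 0.469 bits/symbol

for n in (1, 2, 4, 8):
    # probability of each n-block, using independence (memoryless source)
    blocks = {b: math.prod(f[s] for s in b)
              for b in itertools.product(f, repeat=n)}
    # Shannon-Fano lengths for the extended alphabet
    L_n = sum(p * math.ceil(-math.log2(p)) for p in blocks.values())
    print(n, L_n / n)                                   # bits/symbol, < H + 1/n
```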


Page 20: Information and Communication Theory Lecture 4

Exercises

1. For the source X ∈ X = {1, 2} with f_X = (7/8, 1/8), obtain a binary optimal code and compute its expected length and efficiency.

2. Find the optimal code for the order-2 extension of the source in the previous question.

3. Repeat the two previous questions for f_X = (1/2, 1/2).

4. Consider a stationary Markov source X_t ∈ X = {1, 2, 3}, with the following probability transition matrix:

P = [ 0    1/8  7/8
      7/8  0    1/8
      1/8  7/8  0  ]

Obtain the optimal coding for this source and for its order-2 extension.


Page 21: Information and Communication Theory Lecture 4

Huffman Codes

Huffman (1952) algorithm to obtain optimal codes.

Builds a D-ary tree, starting from the leaves, which are the symbols.

Algorithm (for D = 2; generalizing to D > 2 requires some care):

1 Input: a list of symbol probabilities (p1, ..., pN ).

2 Output: a binary tree with each symbol as a leaf.

3 Assign each symbol to a leaf of the tree.

4 Find the 2 smallest probabilities: pi and pj .

5 Create the parent node for nodes i and j with probability pi + pj .

6 Remove pi and pj from the list and insert pi + pj .

7 If the list still contains more than one probability, go back to step 4.

As seen before, a binary tree corresponds to an instantaneous code.
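A compact sketch of the algorithm above (binary case, using a heap for the repeated "two smallest probabilities" step; ties are broken arbitrarily, as the slides allow):

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code for a list of symbol probabilities.
    Returns a dict: symbol index -> codeword string."""
    if len(probs) == 1:
        return {0: "0"}
    # Heap entries: (probability, tie-breaking counter, symbol index or subtree)
    heap = [(p, i, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p_i, _, left = heapq.heappop(heap)    # two smallest probabilities
        p_j, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p_i + p_j, counter, (left, right)))
        counter += 1
    _, _, tree = heap[0]
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # inner node: recurse into both children
            walk(node[0], prefix + "0")       # edge labels 0/1 are arbitrary
            walk(node[1], prefix + "1")
        else:                                 # leaf: a symbol index
            code[node] = prefix
    walk(tree, "")
    return code

probs = [0.4, 0.1, 0.05, 0.25, 0.2]           # the slides' example
code = huffman_code(probs)
L = sum(p * len(code[i]) for i, p in enumerate(probs))
print(code, L)                                # expected length 2.1 bits/symbol
```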


Page 22: Information and Communication Theory Lecture 4

Huffman Codes

Illustration: probabilities (0.4, 0.1, 0.05, 0.25, 0.2)


Page 23: Information and Communication Theory Lecture 4

Huffman Codes

Illustration: probabilities (0.4, 0.1, 0.05, 0.25, 0.2)


Page 24: Information and Communication Theory Lecture 4

Huffman Codes

Illustration: probabilities (0.4, 0.1, 0.05, 0.25, 0.2)


Page 25: Information and Communication Theory Lecture 4

Huffman Codes

Illustration: probabilities (0.4, 0.1, 0.05, 0.25, 0.2)


Page 26: Information and Communication Theory Lecture 4

Huffman Codes

Illustration: probabilities (0.4, 0.1, 0.05, 0.25, 0.2)


Page 27: Information and Communication Theory Lecture 4

Huffman Codes

Label the edges (arbitrarily) to obtain the code words


Page 28: Information and Communication Theory Lecture 4

Huffman Codes

Expected code-length: sum of the inner node probabilities:
L(C) = 1 + 0.6 + 0.35 + 0.15 = 2.1 bits/symbol
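As a cross-check (a sketch, using the leaf depths 1, 2, 3, 4, 4 that the Huffman construction above produces for these probabilities), the same value follows directly from L(C) = ∑ f_X(x) l_C(x):

```python
probs   = [0.4, 0.25, 0.2, 0.1, 0.05]
lengths = [1, 2, 3, 4, 4]          # depths of the leaves in the Huffman tree
L = sum(p * l for p, l in zip(probs, lengths))
print(L)                           # 2.1 bits/symbol
```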


Page 29: Information and Communication Theory Lecture 4

Huffman Codes

Huffman codes are optimal; see proof in recommended reading.

The converse is not true:

Huffman code ⇒ optimal code, but optimal code ⇏ Huffman code.

Exercise: give an example of a source and an optimal code that is not a Huffman code.

In the case of ties, break them arbitrarily.

For D-ary codes, merge D symbols to build a D-ary tree.

For D-ary codes, optimality requires N = k(D − 1) + 1, where k ∈ ℕ; if this is not satisfied, just append zero-probability symbols.

Exercise: obtain a Huffman ternary code for a source with 4 equiprobable symbols.


Page 30: Information and Communication Theory Lecture 4

Exercises

Find a binary Huffman code for the source X ∈ X = {1, 2, 3, 4} with f_X = (0.27, 0.26, 0.24, 0.23). Are there optimal codes for this source that are not Huffman codes? If yes, how many?

Find a ternary Huffman code for the source in the previous question. Are there optimal ternary codes for this source that are not Huffman codes? If yes, how many?

Consider a coding scheme where the first symbol of the codeword is ternary (in {0, 1, 2}) and the remaining symbols are binary (in {0, 1}). That is, the codewords belong to {0, 1, 2} × {0, 1}∗. Find the optimal code with this structure for the source X ∈ X = {1, 2, 3, 4, 5, 6} with f_X = (17, 16, 8, 7, 6, 5)/59. How to express the expected length of this code?


Page 31: Information and Communication Theory Lecture 4

Exercises

Consider 9 apparently equal spheres, 8 of which weigh 1 kg and one weighs 1.05 kg. Suppose you have a balance scale.

What is the minimum number of weighings needed to identify the heavier sphere? Propose a non-sequential procedure (i.e., each weighing does not depend on the results of previous ones) to find the sphere in the minimum number of weighings.

You have 6 bottles of wine, but you know that one has gone bad (tastes like vinegar). By inspection, you determine that the probabilities of bottle i being the bad one are (8, 6, 4, 2, 2, 1)/23. In what order should you taste the bottles to find the bad one in the minimal number of tastings? What is the expected number of tastings? Can you do better if you are allowed to mix wines and taste the mixture?


Page 32: Information and Communication Theory Lecture 4

Elias Codes for Natural Numbers

Standard number representation is not uniquely decodable.

Binary representation of natural numbers is not uniquely decodable. Example: C(3) = 11, C(21) = 10101, but decoding 1110101 is impossible; it could be C(14)C(5) or C(58)C(1).

Length of the binary representation of x ∈ ℕ is ⌊log_2 x⌋ + 1. Example: C(13) = 1101 has length 4; ⌊log_2 13⌋ + 1 = ⌊3.70⌋ + 1 = 4.

Elias coding:

• instantaneous code for arbitrary natural numbers;

• length not much worse than ⌊log_2 x⌋ + 1.

Useful not only for ℕ, but also for large alphabets X = {1, ..., N} with large and unknown N.


Page 33: Information and Communication Theory Lecture 4

Elias Gamma Code

Length of the binary representation of x ∈ ℕ is ⌊log_2 x⌋ + 1.

Let C_2 denote the standard binary representation.

Elias gamma code: C_γ(x) = 00...0 C_2(x), with ⌊log_2 x⌋ zeros preceding C_2(x).

x : C_γ(x)
1 : 1
2 : 010
4 : 00100
5 : 00101
7 : 00111
9 : 0001001
10 : 0001010
...
19 : 000010011
...
147 : 000000010010011

Obviously instantaneous.

Length: l_{C_γ}(x) = 2⌊log_2 x⌋ + 1.

Twice as bad as C2.
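A direct sketch of the gamma encoder and its (instantaneous) decoder, reproducing the table above:

```python
def elias_gamma(x: int) -> str:
    """Elias gamma code: floor(log2 x) zeros followed by the binary representation of x."""
    assert x >= 1
    b = bin(x)[2:]               # standard binary representation C_2(x)
    return "0" * (len(b) - 1) + b

def elias_gamma_decode(bits: str) -> int:
    """Decode a single gamma codeword from the start of the string."""
    n = 0
    while bits[n] == "0":        # count the leading zeros
        n += 1
    return int(bits[n:2 * n + 1], 2)

for x in (1, 2, 4, 5, 7, 9, 10, 19, 147):
    print(x, elias_gamma(x))     # reproduces the table above
```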


Page 34: Information and Communication Theory Lecture 4

Elias Delta Code

Elias delta code: C_δ(x) = C_γ(⌊log_2 x⌋ + 1) C̄_2(x)

C̄_2 is C_2 without the leading 1 (e.g., C_2(9) = 1001, C̄_2(9) = 001).

Length: l_{C_δ}(x) = l_{C_γ}(⌊log_2 x⌋ + 1) + ⌊log_2 x⌋ = 2⌊log_2(⌊log_2 x⌋ + 1)⌋ + ⌊log_2 x⌋ + 1

x : C_δ(x)
1 : 1
2 : 0100
3 : 0101
4 : 01100
7 : 01111
8 : 00100000
10 : 00100010
...
19 : 001010011
...
147 : 00010000010011

Obviously instantaneous.

For x > 32, l_{C_δ}(x) < l_{C_γ}(x).

Approaches C_2 for large x:

lim_{x→∞} l_{C_δ}(x) / l_{C_2}(x) = 1
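A sketch of the delta encoder, reusing the elias_gamma function from the previous sketch:

```python
def elias_delta(x: int) -> str:
    """Elias delta code: gamma-code the length of C_2(x), then append C_2(x) without its leading 1."""
    assert x >= 1
    b = bin(x)[2:]                        # C_2(x)
    return elias_gamma(len(b)) + b[1:]    # len(b) = floor(log2 x) + 1

for x in (1, 2, 3, 4, 7, 8, 10, 19, 147):
    print(x, elias_delta(x))              # reproduces the table above
```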


Page 35: Information and Communication Theory Lecture 4

Recommended Reading

T. Cover and J. Thomas, "Elements of Information Theory", John Wiley & Sons, 2006 (Chapter 5).

M. Figueiredo, "Elias Coding for Arbitrary Natural Numbers", available at the course webpage in Fenix.
