Information Theory: Exercises
Mathias Winther Madsen
March 4, 2015
1 Wednesday 4 March

Entropy of a Categorical Variable  A random variable X is distributed according to the following table:
x           1     2     3     4     5
Pr(X = x)   1/3   1/4   1/6   1/6   1/12
1. Find H(X).
2. Construct a Huffman code for the variable.
3. Decode the message 00101100001 according to your code.
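As a numerical check on part 1, the entropy can be computed directly from the table (a minimal sketch; the variable names are illustrative only):

```python
from math import log2

# Distribution of X from the table above.
p = {1: 1/3, 2: 1/4, 3: 1/6, 4: 1/6, 5: 1/12}

# H(X) = - sum_x Pr(X = x) * log2 Pr(X = x)
H = -sum(q * log2(q) for q in p.values())
print(round(H, 2))
```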
Shuffling Cards (McKay, Ex. 6.10)  Roughly how many bits of entropy do you create by shuffling a deck of cards?
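Treating a shuffle as a uniform draw from the 52! possible orderings, the count can be checked in a couple of lines (a sketch, not part of the exercise):

```python
from math import factorial, log2

# Bits needed to specify one of the 52! equally likely orderings
bits = log2(factorial(52))
print(round(bits, 2))
```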
Entropy of a Die  Let X be a random variable distributed uniformly on {1, 2, 3, 4, 5, 6}.
1. Construct a Huffman code for the variable.
2. What is the average codeword length for your code? How does that compare with the entropy?
3. If you interpret a codeword of length k as a probability of 2^(−k), what is then the implicit distribution expressed by your code?
2 Monday 9 March

Tiny chess  What's the entropy rate of a knight walking on a 3 × 3 chessboard? What about a bishop? Think about how the answer depends on your assumptions about how the pieces select their next moves.
Palindrome Machine A palindrome function picks a width W according tothe geometric distribution
w            2     4     6     8     10     · · ·
Pr{W = w}    1/2   1/4   1/8   1/16  1/32   · · ·
It then prints a symmetric string of width W. Possible return values for this function are thus 00 and 0110, but not 10 or 0101.
str = ""
while flip():
    randbit = flip()
    append randbit to str
print str
print str in reverse
A machine repeatedly calls this function and prints the outputs. An output stream from this machine is thus a series of palindromes like
00 1001 11 110011 00 . . .
but without the disambiguating spaces. You start this machine and observe the output
1100 . . .
What is the probability that the next character is a 0?
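A runnable version of the pseudocode above might look as follows (Python is an assumption here; note that, read literally, the loop can exit immediately and emit an empty string, which is invisible in the output stream, and conditioning on a nonempty output recovers the width table above):

```python
import random

def flip():
    # fair coin
    return random.random() < 0.5

def palindrome():
    # Build a random half, then emit it followed by its mirror image.
    half = ""
    while flip():
        half += "1" if flip() else "0"
    return half + half[::-1]

samples = [palindrome() for _ in range(10000)]
```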
Random walk with gravity  A molecule moves around in a glass of water which we consider as divided up into three compartments. Whenever possible, the molecule moves one compartment down with probability 1/5, and one compartment up with probability 1/20.
1. Write down the transition probabilities associated with this system in an exhaustive and explicit fashion.
2. Find the associated equilibrium distribution.
3. What would you guess the equilibrium distribution would look like if we had started with k compartments instead of three?
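Parts 1 and 2 can be checked numerically. The matrix below encodes one reading of "whenever possible": the molecule otherwise stays put (this reading is an assumption):

```python
# Compartment 0 = bottom, 2 = top; rows are the current state.
P = [
    [0.95, 0.05, 0.00],  # bottom: stay, or move up with 1/20
    [0.20, 0.75, 0.05],  # middle: down 1/5, stay, up 1/20
    [0.00, 0.20, 0.80],  # top: move down with 1/5, or stay
]

# Power iteration: repeatedly apply the transition matrix.
pi = [1/3, 1/3, 1/3]
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print([round(x, 4) for x in pi])
```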
Straws (McKay, Ex. 15.4)  How can you use a fair coin to draw lots among three people? Come up with at least two different alternatives and compare them in terms of (1) fairness, and (2) expected number of coin flips.
3 Wednesday 11 March
Horse   1     2     3
Prob.   1/2   1/4   1/4
Odds    4     2     4
A horse race (Cover and Thomas, Ex. 6.5)  A horse race has probabilities and odds as shown in the table.

1. Find the doubling rate associated with this race.

2. Should you bet on this race?
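The doubling rate in part 1 can be sketched numerically, assuming proportional (Kelly) betting b = p:

```python
from math import log2

p = [1/2, 1/4, 1/4]  # win probabilities from the table
o = [4, 2, 4]        # odds (o-for-1)

# W = sum_i p_i * log2(b_i * o_i) with b = p
W = sum(pi * log2(pi * oi) for pi, oi in zip(p, o))
print(W)
```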
A Prediction Game  A person randomly selects one of the words tip, top, tap, sit, or sip. You can then bet on each letter of the word, one by one. Every time you guess a letter correctly, your capital is doubled.
If you were the bookmaker selecting the secret word, what probability distribution over the five words should you use to make the game as hard as possible? What would the corresponding doubling rate be?
P(X,Y)   X = 1   X = 2
Y = 1    0       1/2
Y = 2    1/4     1/4
Variational approximation  Two random variables X and Y interact according to the joint probability table above. We will call this probability distribution P and approximate it by a distribution Q which assumes that X and Y are independent.
1. Which distribution Q over independent X and Y minimizes D(P ||Q)?
2. Which distribution Q over independent X and Y minimizes D(Q ||P )?
Competitive Prediction  Two scientists compete about assigning good probability estimates to the outcomes of a random process. One scientist believes that the process is a series of coin flips with bias θ = 0.6, and the other believes that it is a series of coin flips with bias θ = 0.2. The process is in fact a coin flipping process, but the coin actually has a bias of θ = 0.5.
We measure the relative performance of the two scientists by looking at the likelihood ratio between their respective probability estimates,

Pr(x1, x2, ..., xk | θ = 0.6) / Pr(x1, x2, ..., xk | θ = 0.2).

We consider one scientist as substantially better than the other if this likelihood ratio exceeds 20/1 or drops below 1/20. Roughly how many coin flips should it take before this happens?
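The question can be turned into a short numerical estimate (a sketch; the 20/1 threshold is the one stated above):

```python
from math import log2

# Expected log-likelihood-ratio gain per flip under the true bias 0.5
drift = 0.5 * log2(0.6 / 0.2) + 0.5 * log2(0.4 / 0.8)

# Flips needed for the expected evidence to reach log2(20) bits
flips = log2(20) / drift
print(round(drift, 2), round(flips, 1))
```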
3
Codebreaking Crack the following substitution cipher:
GWAL VLITG IEW -- HLCLT ARHO UWF MWHE NTLBRGLMV -- UICRHEMRDDML WT HW AWHLV RH AV NYTGL, IHO HWDURHE NITDRBYMIT DWRHDLTLGD AL WH GUWTL, R DUWYEUD R FWYMO GIRM IPWYD I MRDDMLIHO GLL DUL FIDLTV NITD WS DUL FWTMO. RD RG I FIV R UICL WSOTRCRHE WSS DUL GNMLLH IHO TLEYMIDRHE DUL BRTBYMIDRWH.
Spaces and punctuation have been left unencrypted to make things easier. The underlying plaintext string is in capitalized English.
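A natural first step is a letter-frequency count. The snippet below runs on just the first line of the ciphertext, shortened here for illustration:

```python
from collections import Counter

# First line of the ciphertext above (the full text works the same way).
cipher = "GWAL VLITG IEW -- HLCLT ARHO UWF MWHE NTLBRGLMV"
counts = Counter(c for c in cipher if c.isalpha())
print(counts.most_common(3))
```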
4 Homework 1

Source coding (Cover and Thomas, Ex. 3.7)  An information source produces pixels X1, X2, X3, . . . with
Pr{Xi = WHITE} = 0.995
Pr{Xi = BLACK} = 0.005
You decide to brute-force encode outputs from this source, 100 pixels at a time, by means of a table of equally long codewords. You include all sequences with three or fewer black pixels in the table and accept that there will be an error in the remaining cases.
1. Compute the number of codewords you will need.
2. How many bits do you need in order to encode that many sequences? How does that compare to the theoretical minimum?
3. What are your options for improving this performance, theoretically and practically?
4. Find out how likely this encoding scheme is to encounter an untabulatedsequence.
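Parts 1, 2, and 4 reduce to binomial sums, which can be sketched as follows:

```python
from math import comb, log2

n, q = 100, 0.005  # block length and black-pixel probability

# 1. Number of sequences with at most three black pixels
N = sum(comb(n, k) for k in range(4))

# 2. Bits needed for a table of N equally long codewords
bits = log2(N)

# 4. Probability of a sequence that is not in the table
p_err = 1 - sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(4))
print(N, round(bits, 2), p_err)
```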
5 Homework 2

Morse Code (Cover and Thomas, Ex. 4.8)  An alphabet contains a dot which takes one unit of time to transmit, and a dash which takes two.
1. When the two symbols have probability p and 1 − p, what's the entropy rate of this process?
2. For which choice of p is this entropy rate the largest?
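Part 2 can be checked by brute force. Here p is taken to be the probability of the dot (one time unit), and the rate is normalized to entropy per unit of transmission time (both of these readings are assumptions):

```python
from math import log2

def rate(p):
    # Entropy per symbol divided by expected transmission time
    h = -(p * log2(p) + (1 - p) * log2(1 - p))
    return h / (p * 1 + (1 - p) * 2)

# Grid search over p in (0, 1)
best_rate, best_p = max((rate(i / 10000), i / 10000) for i in range(1, 10000))
print(round(best_p, 3), round(best_rate, 3))
```

The maximizing p comes out near the reciprocal of the golden ratio, with a rate of about 0.694 bits per unit time.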
Forwards and Backwards Prediction (McKay, Ex. 8.11)  Consider the following two tasks:
1. Guessing the next letter of a text given the preceding ones:
... re particularly impr_
2. Guessing the previous letter of a text given the following ones:
_onth following the c . . .
In general, which task is the more difficult, from a statistical perspective and from a cognitive one? Why?
Arithmetic Coding for a Bent Coin  Suppose we are going to do n = 2 flips of a bent coin with bias p = 1/4.
1. Construct the Shannon-Fano-Elias code for the outcomes of this experiment.
2. If k_i is the length of the ith codeword, what is Σ_i 2^(−k_i)?
3. How does that compare to the same sum for n = 1?
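A sketch of the Shannon-Fano-Elias construction; ordering the outcomes lexicographically and taking "1" to be the outcome with probability 1/4 are assumptions:

```python
from math import ceil, log2
from itertools import product

p1 = 1/4  # probability of a "1"

def pr(s):
    # Probability of a particular outcome string
    out = 1.0
    for c in s:
        out *= p1 if c == "1" else 1 - p1
    return out

codes = {}
F = 0.0  # cumulative probability before the current outcome
for s in ("".join(t) for t in product("01", repeat=2)):
    p = pr(s)
    Fbar = F + p / 2                 # midpoint of the interval for s
    k = ceil(log2(1 / p)) + 1        # SFE codeword length
    codes[s] = format(int(Fbar * 2**k), f"0{k}b")  # first k bits of Fbar
    F += p

print(codes)
kraft = sum(2.0 ** -len(c) for c in codes.values())
print(kraft)
```

The printed Kraft sum is the quantity asked for in part 2.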
Information Theory: Hints
Mathias Winther Madsen
March 4, 2015
1 Wednesday 4 March

Entropy of a Categorical Variable  For the computation, remember that log2 3 ≈ 1.58.
Shuffling Cards (McKay, Ex. 6.10) How many ways can they be sorted?
2 Monday 9 March

Tiny chess  Find out where you can place the pieces, and where they can walk from there. Then compute a stationary distribution.
Palindrome Machine  At each time step, the palindrome function either adds another bit to its list of bits to later print in reverse (with probability 1/2) or starts printing the contents of that memory (with probability 1/2). Looking at it this way, you can find the three possible memory states the machine might be in after having emitted 1100.
Random walk with gravity  Try plotting your solution in a logarithmic coordinate system; or think in terms of ratios.
Straws (McKay, Ex. 15.4)  Arithmetic coding gives you one suggestion. You can also design a method that is perfectly fair at the cost of not having a cap on its use of resources.
3 Wednesday 11 March

A Prediction Game  You should think in terms of the product rule (the chain rule) of probability theory.
Variational approximation  Since Q assumes that the two variables are independent, it can be specified completely by two parameters, α = Q(X = 1) and β = Q(Y = 1). Using high school calculus, you can then find the optimal setting of those parameters.
Competitive Prediction  What's the expected value of the logarithm of the likelihood ratio?
Codebreaking  You could, for instance, start by listing all the words of length 1 and 2.
4 Homework 1

Source coding (Cover and Thomas, Ex. 3.7)
1. This is a sum of binomial coefficients.
2. You should compare log N to H, where N is the number of tabulated codewords.
3. Theoretically: Think in terms of the proof of the source coding theorem. Practically: Think beyond brute force.
4. This is a cumulative binomial probability.
5 Homework 2

Morse Code (Cover and Thomas, Ex. 4.8)  An alphabet contains a dot which takes one unit of time to transmit, and a dash which takes two.
1. Draw a state diagram that describes what one is allowed to do in various situations; think about the transition probabilities that will produce the correct symbol probabilities; think about the entropy of the choice one is faced with at either state node.
2. Use high school calculus to do the maximization. Remember not to confuse log2 x and ln x.
Forwards and Backwards Prediction (McKay, Ex. 8.11)  For a statistical perspective, think in terms of the product rule (the chain rule) of probability theory.
Arithmetic Coding for a Bent Coin
1. Explicitly compute the cumulative probability distribution.
2. Think about how the codewords fill out the unit interval.
3. Solve the same exercise in the simpler case, using the same methods.
Information Theory: Solutions
Mathias Winther Madsen
March 4, 2015
1 Wednesday 4 March
Entropy of a Categorical Variable
1. H(X) ≈ 2.19.
2. One possibility is {00, 01, 10, 110, 111}.
3. With the code above, this decodes as
00 10 110 00 01 . . . = 1, 3, 4, 1, 2, . . .
Other choices may lead to different answers.
Shuffling Cards (McKay, Ex. 6.10)  Since shuffling a deck of cards is equivalent to picking one of the 52! possible orders of the cards in the deck, the shuffling creates log2(52!) ≈ 225.58 bits of uncertainty.
Entropy of a Die
1. One possibility: {00, 01, 100, 101, 110, 111}.
2. The average codeword length is

   (2/6) × 2 + (4/6) × 3 = 8/3 ≈ 2.67,

   whereas the entropy is log2 6 ≈ 2.58. The encoding thus exceeds the theoretical minimum by about 3.2%.
3. With the code above, {1/4, 1/4, 1/8, 1/8, 1/8, 1/8}.
2 Monday 9 March
Tiny chess  If the knight begins at one of the 8 boundary fields, it can move to exactly two other fields, and the entropy rate is H = 1 bit per move. If it begins on the center field, it can't move, and it isn't quite clear how to define an entropy rate for the piece.
There are two different bishops on the 3 × 3 chessboard:
1. a white bishop, which can only move around in circles
2. a black bishop, which can either stand on the center field or a corner field.
The white bishop can achieve an entropy rate of 1 bit per move, since it can always choose between exactly two distinct moves. The entropy rate associated with the black bishop depends on how often it walks from a corner field to the center field, rather than to the opposite corner:
[State diagram: from a corner, the bishop moves to the opposite corner with probability p and to the center with probability 1 − p; from the center, it moves to a corner with probability 1.]
With this notation, a bishop of this kind will spend an average of

α = (1 − p) / (2 − p)

of its time on the center field, where the entropy of the next step is log2 4 = 2. It will thus achieve an entropy rate of

H = [(1 − p) / (2 − p)] × 2 + [1 / (2 − p)] × H2(p),

where

H2(p) = p log2(1/p) + (1 − p) log2(1/(1 − p))

is the entropy of a coin flip with parameter p. By numerical inspection, we find that the entropy is highest when the probability of transitioning from a corner to a corner is about p ≈ 0.39. The corresponding entropy rate is then roughly H ≈ 1.36. By contrast, for p = 0 and p = 1/2, the corresponding rates are H = 1 and H = 4/3, both of which are smaller than 1.36.
[Plot: entropy rate as a function of p, rising from 1 at p = 0 to a maximum of about 1.36 at p ≈ 0.39.]
Palindrome Machine  The probability of a 0 is

1/4 × 1/2 + 1/4 × 1/4 + 1/2 × 3/4 = 9/16.
Random walk with gravity  Numbering the compartments from the bottom up: the molecule stays put with probability 0.95 in the bottom compartment, 0.75 in the middle, and 0.80 in the top; otherwise it moves down with probability 0.20 or up with probability 0.05, where possible. The equilibrium distribution is

(16/21, 4/21, 1/21),

from the bottom compartment up.
Straws (McKay, Ex. 15.4)  One option: Flip the coin twice and map the first three possible outcomes to three possible choices; if the fourth outcome comes up, try again. This will take an average of 8/3 ≈ 2.67 coin flips to yield a usable outcome.

Another option: Flip the coin exactly twice and map the four outcomes to three people such that the corresponding distribution assigns them probabilities (1/4, 1/4, 1/2).
3 Wednesday 11 March
A horse race (Cover and Thomas, Ex. 6.5)

1. W = 1/4.

2. Yes: 2^(−4) + 2^(−2) + 2^(−4) < 1, meaning that it has better than fair odds.
A Prediction Game  By the product rule (chain rule) of probability theory, this game is equivalent to a guessing game in which you can place bets on five different outcomes, getting a payoff of 2^3 = 8 times your bet on the correct guess.

The hardest version of this game is the one in which the five words are equally probable. In that version, its doubling rate is

W = log2 8 − log2 5 ≈ 0.678.
Variational approximation  Let α = Q(X = 1) and β = Q(Y = 1). Then the solutions are

1. α = 1/4, β = 3/4... no: α = 1/4, β = 1/2, the marginals of P, since minimizing D(P || Q) over independent Q matches the marginals.

2. Since P(X = 1, Y = 1) = 0, either α or β has to be 0; it turns out that the best option is to set α = 0, with β = 2/3 then minimizing the remaining divergence.
Competitive Prediction  The expected value of the logarithm of the likelihood ratio is

0.5 log2(0.6/0.2) + 0.5 log2(0.4/0.8) ≈ 0.29.

Every experiment will thus, on average, produce 0.29 bits of evidence in favor of the first scientist. Since log2 20 ≈ 4.32, and 4.32/0.29 ≈ 14.78, it will take about 15 coin flips before the evidence strongly favors the option θ = 0.6.
Codebreaking
SOME YEARS AGO -- NEVER MIND HOW LONG PRECISELY -- HAVING LITTLE OR NO MONEY IN MY PURSE, AND NOTHING PARTICULAR TO INTEREST ME ON SHORE, I THOUGHT I WOULD SAIL ABOUT A LITTLE AND SEE THE WATERY PART OF THE WORLD. IT IS A WAY I HAVE OF DRIVING OFF THE SPLEEN AND REGULATING THE CIRCULATION.