Transcript of EE514a – Information Theory I, Fall Quarter 2019, Lecture 4 (Oct 7th, 2019), Prof. Jeff Bilmes.

  • EE514a – Information Theory I, Fall Quarter 2019

    Prof. Jeff Bilmes

    University of Washington, Seattle
    Department of Electrical & Computer Engineering

    Fall Quarter, 2019
    https://class.ece.uw.edu/514/bilmes/ee514_fall_2019/

    Lecture 4 – Oct 7th, 2019

  • Class Road Map - IT-I

    L1 (9/25): Overview, Communications, Information, Entropy

    L2 (9/30): Entropy, Mutual Information, KL-Divergence

    L3 (10/2): More KL, Jensen, more Venn, Log Sum, Data Proc. Inequality

    L4 (10/7): Data Proc. Ineq., thermodynamics, Stats, Fano

    L5 (10/9): M. of Conv, AEP, Source Coding

    L6 (10/14):

    L7 (10/16):

    L8 (10/21):

    L9 (10/23):

    L10 (10/28):

    L11 (10/30):

    L12 (11/4):

    LXX (11/6): In class midterm exam

    LXX (11/11): Veterans Day holiday

    L13 (11/13):

    L14 (11/18):

    L15 (11/20):

    L16 (11/25):

    L17 (11/27):

    L18 (12/2):

    L19 (12/4):

    LXX (12/10): Final exam

    Finals Week: December 9th–13th.

  • Cumulative Outstanding Reading

    Read chapters 1 and 2 in our book (Cover & Thomas, “Information Theory”) (including Fano’s inequality).

    Read chapters 3 and 4 in our book (Cover & Thomas, “Information Theory”).

  • Homework

    Homework 1, on our assignment dropbox (https://canvas.uw.edu/courses/1319497/assignments), was due Tuesday, Oct 8th, 11:55pm.

    Homework 2 will be out by early next week.

  • Fano’s Inequality: Summary

    Consider the following situation where we send X through a noisy channel, receive Y, and do further processing:

    [Diagram: X → noisy channel p(y|x) → Y → processing g(·) → X̂]

    X̂ is an estimate of X.

    An error occurs if X ≠ X̂. How do we measure the error? With the probability Pe ≜ p(X ≠ X̂). Intuitively, conditional entropy should tell us something about the error possibilities; in fact, we have

    Theorem 4.2.7 (Fano’s Inequality)

    H(Pe) + Pe log(|X| − 1) ≥ H(X|X̂) ≥ H(X|Y)   (4.34)

  • Fano’s Inequality

    Theorem 4.2.7

    H(Pe) + Pe log(|X| − 1) ≥ H(X|X̂) ≥ H(X|Y)   (4.28)

    So Pe = 0 requires that H(X|Y) = 0! Note, the theorem simplifies to (and implies) 1 + Pe log(|X|) ≥ H(X|Y), or

    Pe ≥ (H(X|Y) − 1) / log|X|   (4.29)

    yielding a lower bound on the error.

    This will be used to prove the converse to Shannon’s coding theorem, i.e., that any code with probability of error → 0 as the block length increases must have a rate R < C = the capacity of the channel (to be defined).
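
    A quick numerical illustration of (4.29) (not from the slides — a minimal Python sketch with made-up values for H(X|Y) and the alphabet size):

        import math

        def fano_error_lower_bound(H_X_given_Y, alphabet_size):
            """Lower bound on Pe from the simplified Fano inequality:
            Pe >= (H(X|Y) - 1) / log2(|X|).  (A negative value means the bound is vacuous.)"""
            return (H_X_given_Y - 1.0) / math.log2(alphabet_size)

        # Hypothetical numbers: X over a 26-letter alphabet, H(X|Y) = 2.5 bits.
        print(fano_error_lower_bound(2.5, 26))   # ~0.319: any estimator X_hat errs at least ~32% of the time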

  • An interesting bound on probability of equality

    Lemma 4.3.1

    Let X, X′ be two independent r.v.s with X ∼ p(x) and X′ ∼ r(x), with x, x′ ∈ X (same alphabet). Lower bound on the cross collision probability:

    p(X = X′) ≥ max( 2^{−H(p)−D(p||r)}, 2^{−H(r)−D(r||p)} )   (4.1)

    Proof.

    2^{−H(p)−D(p||r)} = 2^{ ∑_x p(x) log p(x) + ∑_x p(x) log (r(x)/p(x)) }   (4.2)
                      = 2^{ ∑_x p(x) log r(x) }   (4.3)
                      ≤ ∑_x p(x) 2^{log r(x)}   (4.4)   (by Jensen’s inequality, since 2^y is convex)
                      = ∑_x p(x) r(x)   (4.5)
                      = p(X = X′)   (4.6)

    The second term in the max follows by symmetry, swapping the roles of p and r.
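
    A small numerical sanity check of Lemma 4.3.1 (not from the slides — a Python sketch with two arbitrary distributions p and r on the same 3-letter alphabet):

        import math

        p = [0.5, 0.25, 0.25]        # X  ~ p
        r = [0.1, 0.6, 0.3]          # X' ~ r  (same alphabet, independent of X)

        H = lambda q: -sum(qi * math.log2(qi) for qi in q if qi > 0)
        D = lambda a, b: sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

        collision = sum(pi * ri for pi, ri in zip(p, r))           # p(X = X')
        bound = max(2 ** (-H(p) - D(p, r)), 2 ** (-H(r) - D(r, p)))

        print(collision, bound, collision >= bound)                # 0.275 >= ~0.268 -> True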

  • An interesting bound on probability of equality

    Thus, taking p(x) = r(x), the probability that two i.i.d. random variables X, X′ are the same can be bounded as follows:

    P(X = X′) ≥ 2^{−H(p)}   (4.7)

    P(X = X′) = ∑_x p(x)² is known as the probability of collision.

    The above improves the standard collision probability bound (here n = |X|):

    ∑_x p(x)² = ∑_x [(p(x) − 1/n) + 1/n]²   (4.8)
              = 1/n + ∑_x [p(x) − 1/n]²  ≥  1/n   (4.9)

    so equality is achieved when p(x) = 1/n (uniform distribution), in which case also P(X = X′) = 2^{−H(p)} = 2^{−log n} = 1/n and the two bounds coincide. Since H(p) ≤ log n always, 2^{−H(p)} ≥ 1/n, which is why (4.7) improves on (4.9).

    Many other probabilistic quantities can be bounded in terms of entropic quantities, as we will see throughout the course.
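
    A sketch comparing these two lower bounds on the i.i.d. collision probability (the distribution below is an arbitrary example of mine):

        import math

        p = [0.7, 0.2, 0.1]                        # i.i.d. X, X' ~ p; alphabet size n = 3
        collision = sum(pi * pi for pi in p)        # P(X = X') = sum_x p(x)^2
        H = -sum(pi * math.log2(pi) for pi in p)

        print(collision)        # 0.54
        print(2 ** (-H))        # ~0.449  (entropy bound (4.7))
        print(1 / len(p))       # ~0.333  (standard 1/n bound (4.9)) -- weaker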

  • Weak Law of Large Numbers (WLLN)

    Let X1, X2, . . . be a sequence of i.i.d. r.v.s with EXi = µ < ∞, and let Sn = X1 + X2 + · · · + Xn. Then for any ε > 0, the WLLN says that

    p( | (1/n)Sn − µ | > ε ) → 0   as n → ∞   (4.10)

    Written as (1/n)Sn →p µ (converges in probability).

    (1/n)Sn gets as close to µ as we want, and the variance of (1/n)Sn gets arbitrarily small, if n gets big enough.

    SLLN: If all r.v.s have finite 2nd moments, i.e., E(Xi²) < ∞, then (1/n)Sn → µ almost surely (a stronger mode of convergence, defined on the next slide).
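
    A Monte Carlo sketch of (4.10) (my example: Bernoulli(0.3) trials), estimating the deviation probability for increasing n:

        import random

        p, eps, trials = 0.3, 0.05, 500
        for n in (10, 100, 1000, 10000):
            # Estimate p(|S_n/n - mu| > eps) by repeated simulation.
            bad = sum(abs(sum(random.random() < p for _ in range(n)) / n - p) > eps
                      for _ in range(trials))
            print(n, bad / trials)   # fraction of runs with a large deviation; shrinks toward 0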

  • Aside: modes of convergence

    Xn → X almost surely (Xn →a.s. X) if the set {ω ∈ Ω : Xn(ω) → X(ω) as n → ∞}   (4.11) is an event with probability 1. So all such events combined together have probability 1, and any event left out has probability zero, according to the underlying probability measure.

    Xn → X in the rth mean (r ≥ 1) (written Xn →r X) if E|Xn^r| < ∞ for all n and E( |Xn − X|^r ) → 0 as n → ∞.   (4.12)

    Xn → X in probability (written Xn →p X) if p( |Xn − X| > ε ) → 0 as n → ∞, for all ε > 0.   (4.13)

    Xn → X in distribution (written Xn →D X) if p(Xn ≤ x) → p(X ≤ x) as n → ∞   (4.14)

    for all points x at which FX(x) = p(X ≤ x) is continuous.

  • Aside: modes of convergence, expanded definition

    Expanded description of convergence in probability.

    Let X1, X2, . . . be a sequence of i.i.d. r.v.s with EX = µ < ∞, and let Sn = X1 + · · · + Xn. Then ∀ε > 0, ∀δ > 0, ∃n0 such that for all n > n0, we have

    p( | (1/n)Sn − µ | > ε ) < δ   for n > n0   (4.15)

    Equivalently,

    p( | (1/n)Sn − µ | ≤ ε ) > 1 − δ   for n > n0   (4.16)
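
    A sketch of how one might pick an n0 for given ε and δ in (4.15) via Chebyshev’s inequality, p(|Sn/n − µ| > ε) ≤ σ²/(nε²) (using Chebyshev here is my addition, not something the slides prescribe):

        import math

        def chebyshev_n0(sigma2, eps, delta):
            """Smallest n guaranteeing sigma2/(n*eps^2) < delta, hence p(|S_n/n - mu| > eps) < delta."""
            return math.floor(sigma2 / (delta * eps ** 2)) + 1

        # Hypothetical: Bernoulli(0.3) trials, so sigma^2 = 0.3 * 0.7 = 0.21
        print(chebyshev_n0(0.21, eps=0.05, delta=0.01))   # 8401: any n > 8400 works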

  • Aside: modes of convergence, implications

    Some modes of convergence are stronger than others. That is:

    (a.s.) ⇒ (p),   (r) ⇒ (p) for all r ≥ 1,   and (p) ⇒ (D).

    Also, if r > s ≥ 1, then (r) ⇒ (s).

    Different versions of things like the law of large numbers differ only in the strength of their required modes of convergence.

    Keep this in mind when considering the AEP, which we turn to next.

  • Asymptotic Equipartition Property (AEP)

    At first, we’ll emphasize mostly intuition.

    Please read chapter 3 if you haven’t yet. If you have read it, read it again.

    Consider blocks of outcomes of random variables (i.e., random vectors of length n). n = the block length.

    Let X1, X2, . . . , Xn be i.i.d. r.v.s all distributed according to p (we say Xi ∼ p(x)).

    As before, ∀i, Xi ∈ {a1, a2, . . . , aK}, so K possible symbols (alphabet or state space of size K).

    For the n random variables (X1, X2, . . . , Xn), there are K^n possible outcomes.

  • Towards AEP

    We wish to encode these K^n outcomes with binary digit strings (i.e., code words) of length m. Hence, there are M = 2^m possible code words.

    We can represent the encoder as follows:

    [Encoder diagram: source messages {X1, X2, . . . , Xn}, Xi ∈ {a1, a2, . . . , aK}, K^n possible messages, n source letters in each source msg → Encoder → code words {Y1, Y2, . . . , Ym}, Yi ∈ {0, 1}, 2^m possible messages, m total bits.]

    Example: English letters would have K = 26 (alphabet size K); a “source message” consists of n letters.

    We want to have a code word for every possible source message; what condition must hold?

    M = 2^m ≥ K^n  ⇒  m ≥ n log K   (4.17)
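
    A quick check of (4.17) for the English-letters example (K = 26 is from the slide; the block lengths n below are arbitrary):

        import math

        K = 26
        for n in (1, 10, 100):
            m = math.ceil(n * math.log2(K))      # bits needed to index all K^n source messages
            print(n, m, 2 ** m >= K ** n)        # e.g. n=10 -> m=48 bits, and the condition holds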

  • Towards AEP

    A question on rate: how many bits are used per source letter?

    R = rate = (log M)/n = m/n ≥ log K   bits per source letter   (4.18)

    Not surprising; e.g., for English we need ⌈log K⌉ = 5 bits.

    Question: can we use fewer bits than this per source letter (on average) and still have essentially no error? Yes.

    How? One way: some source messages would not have a code.

    [Diagram: source messages mapped to code words, with some source messages mapped to “garbage” (no code word).]

    I.e., code words are only assigned to a subset of the source messages!

  • Towards AEP

    Any source message assigned to garbage would, if we wish to send that message, result in an error.

    Alternatively, and perhaps less distressingly, rather than throw some messages into the trash, we could assign to them long code words, and assign the non-garbage messages short code words.

    [Diagram: source messages mapped either to short code words or to long code words.]

    In either case, if n gets big enough, we can make the code such that the probability of getting one of those error source messages (or long-code-word source messages) is very small!

  • Probability of Source Words

    The probability of a source word can be expressed as

    p(X1 = x1, X2 = x2, . . . , Xn = xn) = ∏_{i=1}^n p(Xi = xi)   (4.19)

    Recall, the Shannon/Hartley information about an event {X = x} is −log p(x) = I(x), so the information of the joint event {x1, x2, . . . , xn} is:

    I(x1, x2, . . . , xn) = −log p(x1, x2, . . . , xn)   (4.20)
                          = −log ∏_{i=1}^n p(xi) = ∑_{i=1}^n −log p(xi) = ∑_{i=1}^n I(xi)   (4.21)

    Also note that E I(X) = H(X).

    The WLLN says that (1/n)Sn →p µ, where Sn is the sum of i.i.d. r.v.s with mean µ = EXi.

    So, I(Xi) is also a random variable, with mean H(X).
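
    A small sketch of (4.19)–(4.21): the self-information of an i.i.d. sequence is the sum of the per-symbol informations (the toy distribution and sequence are my own):

        import math

        p = {'a': 0.5, 'b': 0.25, 'c': 0.25}            # toy source distribution
        seq = ['a', 'b', 'a', 'c', 'a']

        I = lambda x: -math.log2(p[x])                   # Shannon/Hartley information of one symbol
        joint_prob = math.prod(p[x] for x in seq)        # product form of (4.19)

        print(sum(I(x) for x in seq))                    # 7.0 bits
        print(-math.log2(joint_prob))                    # 7.0 bits -- the same thing, per (4.21)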

  • WLLN and entropy

    Combining the above, we get

    (1/n) ∑_{i=1}^n I(Xi) →p H(X)   as n → ∞   (4.22)

    Thus, if n is big enough, we have that (this is where it gets cool):

    (1/n) ∑_{i=1}^n I(xi) ≈ H(X)   when ∀i, xi ∼ p(x)   (4.23)
    ⇒ −(1/n) ∑_{i=1}^n log p(xi) ≈ H(X)   (4.24)
    ⇒ −log ∏_{i=1}^n p(xi) ≈ nH(X)   (4.25)
    ⇒ −log p(x1, x2, . . . , xn) ≈ nH(X)   (4.26)
    ⇒ p(x1, . . . , xn) ≈ 2^{−nH(X)}   (4.27)
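
    A Monte Carlo sketch of (4.22)–(4.27) for a Bernoulli(0.2) source (my example parameters): the per-symbol sample information −(1/n) log p(x1, . . . , xn) concentrates around H(X):

        import math, random

        p = 0.2
        H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))        # ~0.722 bits

        def sample_info_rate(n):
            xs = [random.random() < p for _ in range(n)]             # draw x_1, ..., x_n i.i.d.
            logp = sum(math.log2(p if x else 1 - p) for x in xs)     # log p(x_1, ..., x_n)
            return -logp / n

        for n in (10, 100, 1000, 10000):
            print(n, round(sample_info_rate(n), 3), "vs H =", round(H, 3))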

  • Towards AEP

    We repeat this last equation: when n is large enough, we have

    p(x1, . . . , xn) ≈ 2^{−nH(X)}   when ∀i, xi ∼ p(x)   (4.28)

    Note, perhaps somewhat startlingly, the r.h.s. is the probability, and it does not depend on the specific sequence instance x1, x2, . . . , xn!

    So, if n gets large enough, pretty much all sequences that happen have the same probability . . .

    . . . and that probability is equal to 2^{−nH}.

    Those sequences that have that probability (which means pretty much all of them that occur) are called the typical sequences, represented by the set A.

  • Said again: Almost all events are almost equally probable

    If X1, X2, . . . , Xn are i.i.d. and Xi ∼ p(x) for all i, and

    if n is large enough,

    then for any sample x1, x2, . . . , xn:

    The probability of the sample is essentially independent of the sample, i.e.,

    p(x1, . . . , xn) ≈ 2^{−nH(X)}   (4.29)

    where H(X) is the entropy of p(x).

    Thus, there can only be about 2^{nH} such samples, and it may be that 2^{nH} ≪ K^n.

    Those samples that will happen are called typical, and they are represented by A_ε^{(n)}.

    Thus, a large portion of X^n essentially won’t happen, i.e., it could be that 2^{nH} ≈ |A_ε^{(n)}| ≪ |X^n| = K^n.

  • The Typical Set

    Let A_ε^{(n)} be the set of typical sequences (i.e., those with probability ≈ 2^{−nH(X)}).

    If “all” events have the same probability p, then there are 1/p of them.

    So the number of typical sequences is

    |A_ε^{(n)}| ≈ 2^{nH(X)}.   (4.30)

    Thus, to represent (or code for) the typical sequences, we need only nH(X) bits, so we take

    m = nH(X)   (4.31)

    in the encoder model. Thus, the rate of the code is H(X).

    [Encoder diagram as before: source messages {X1, . . . , Xn}, Xi ∈ {a1, . . . , aK}, K^n possible messages, n source letters per message → Encoder → code words {Y1, . . . , Ym}, Yi ∈ {0, 1}, 2^m possible messages, m total bits.]
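
    A brute-force sketch (my addition) that enumerates all 2^n sequences of a Bernoulli(0.2) source for a small n, counts those meeting the usual ε-typicality condition |−(1/n) log p(x^n) − H| ≤ ε (formalized in chapter 3), and compares |A_ε^(n)| to 2^{nH}:

        import math
        from itertools import product

        p, n, eps = 0.2, 12, 0.1
        H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

        typical, prob_mass = 0, 0.0
        for xs in product((0, 1), repeat=n):
            k = sum(xs)
            prob = p ** k * (1 - p) ** (n - k)
            if abs(-math.log2(prob) / n - H) <= eps:
                typical += 1
                prob_mass += prob

        print(typical, 2 ** (n * H))   # |A_eps^(n)| vs 2^{nH}
        print(prob_mass)               # probability of the typical set (-> 1 as n grows, for fixed eps)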

  • Code Only The Typical Set

    If we take m = nH, then the average number of bits per source alphabet letter (i.e., the rate) becomes:

    m/n = H,   which could be ≤ log K   (4.32)

    So to summarize, we have three uses and interpretations of entropy here for source coding:

    1. The probability of a typical sequence is 2^{−nH(X)}.
    2. The number of typical sequences is 2^{nH(X)}.
    3. The number of bits per source symbol is H(X), when we code only for the typical set.

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    AEP setup

    Consider Bernoulli trials X1, X2, . . . , Xn i.i.d. with p(Xi = 1) = p = 1 − p(Xi = 0).

    The probability of a single sequence is

    p(x1, x2, . . . , xn) = ∏_{i=1}^{n} p^{x_i} (1 − p)^{1−x_i} = p^{∑_i x_i} (1 − p)^{n − ∑_i x_i} (4.33)

    There are 2^n possible sequences. Do they all have the same probability? No; consider when p = 0.1, (1 − p) = 0.9. The sequence of all zeros is much more likely.

    What is the most probable sequence? When p = 0.1, the sequence of all 0s.

    Do the sequences that collectively have “any” probability all have the same probability? Depends what we mean by “any”, but for small n, no. But as n gets large, something funny happens and “yes” becomes a more appropriate answer.

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F23/50(pg.84/244)
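A quick illustration in Python (assumed parameters, not from the slides): with p = 0.1, the single most probable sequence is all zeros, yet a sequence with about np ones has probability close to 2^{−nH(p)}.

```python
# Per-sequence probabilities for i.i.d. Bernoulli(p): the all-zeros sequence is
# the most probable one, but a sequence with ~np ones has probability ~2^{-nH(p)}.
import math

p, n = 0.1, 100
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def seq_prob(num_ones):
    """Probability of one particular sequence containing num_ones ones."""
    return (p ** num_ones) * ((1 - p) ** (n - num_ones))

print(f"all-zeros sequence : p = {seq_prob(0):.3e}")
print(f"one typical seq    : p = {seq_prob(int(n * p)):.3e}")
print(f"2^(-n H(p))        :     {2 ** (-n * H):.3e}")
```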

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    AEP setup

    Notational reminder: H = H(X) is the entropy of a single random variable distributed as p(x).

    Can we predict the probability that a particular sequence has a particular probability value? I.e.,

    Pr(p(X1, X2, . . . , Xn) = α) = ? (4.34)

    Note: “p(X1, X2, . . . , Xn)” is a random variable! It is a random probability, and it is a true random variable since it is a probability that is a function of a set of random variables.

    It turns out that

    Pr(p(X1, X2, . . . , Xn) ≈ 2^{−nH}) ≈ 1 (4.35)

    if n is large enough.

    In English, this can be read as: almost all events (that occur collectively with any appreciable probability) are all equally likely.

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F24/50(pg.93/244)
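A Monte Carlo sketch of (4.35) in Python (assumed source and parameters, not from the slides): estimate Pr(|−(1/n) log2 p(X1, . . . , Xn) − H| < ε) for an i.i.d. Bernoulli(p) source and watch it approach 1 as n grows.

```python
# Estimate Pr( p(X1,...,Xn) ~ 2^{-nH} ), i.e. Pr(|-(1/n) log2 p(X^n) - H| < eps),
# by sampling sequences from an i.i.d. Bernoulli(p) source.
import math
import random

p, eps, trials = 0.3, 0.05, 20000
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def is_close(n):
    k = sum(random.random() < p for _ in range(n))           # number of 1s in one sampled sequence
    log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
    return abs(-log_prob / n - H) < eps

for n in (10, 100, 1000):
    hits = sum(is_close(n) for _ in range(trials))
    print(f"n = {n:5d}:  Pr(p(X^n) ~ 2^(-nH)) ~= {hits / trials:.3f}")
```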

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    Ex: Bernoulli trials

    Let Sn ∼ Binomial(n, p) with Sn = X1 + X2 + · · · + Xn, Xi ∼ Bernoulli(p).

    Hence, E Sn = np and var(Sn) = npq, and

    p(Sn = k) = (n choose k) p^k q^{n−k} (4.36)

    Then, the expression 2^{−nH} can be viewed in an intuitive way. I.e.,

    2^{−nH(p)} = 2^{−n(−p log p − (1−p) log(1−p))} (4.37)
               = 2^{np log p + n(1−p) log(1−p)} (4.38)
               = p^{np} q^{nq},  where q = 1 − p (4.39)

    Here H = H(p) is the binary entropy with probability p.

    np is the expected number of 1’s.

    nq is the expected number of 0’s.

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F25/50(pg.98/244)
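A one-line numeric check of (4.37)–(4.39) in Python (illustrative p and n, not from the slides): a sequence with exactly np ones and nq zeros has probability 2^{−nH(p)}.

```python
# Check the identity 2^{-n H(p)} = p^{np} q^{nq} numerically.
import math

p, n = 0.2, 50
q = 1 - p
H = -p * math.log2(p) - q * math.log2(q)     # binary entropy H(p)

lhs = 2 ** (-n * H)
rhs = (p ** (n * p)) * (q ** (n * q))
print(f"2^(-nH(p))    = {lhs:.6e}")
print(f"p^(np) q^(nq) = {rhs:.6e}")
```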

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    Towards AEP

    In other words, all sequences that occur are the ones where the numbers of 1s and 0s are roughly equal to their expected values.

    No other sequences have any appreciable probability!

    The sequence X1, X2, . . . , Xn was assumed i.i.d., but this can be extended to Markov chains, and to ergodic stationary random processes.

    But before doing any of that, we need more formalism.

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F26/50(pg.106/244)

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    Asymptotic Equipartition Property (AEP)

    Theorem 4.5.1 (AEP)

    If X1, X2, . . . , Xn are i.i.d. and Xi ∼ p(x) for all i, then

    −(1/n) log p(X1, X2, . . . , Xn) −p→ H(X) (4.40)

    Proof.

    −(1/n) log p(X1, X2, . . . , Xn) = −(1/n) log ∏_{i=1}^{n} p(Xi) (4.41)
                                     = −(1/n) ∑_i log p(Xi) −p→ −E log p(X) (4.42)
                                     = H(X) (4.43)

    where the convergence in (4.42) follows from the weak law of large numbers applied to the i.i.d. terms log p(Xi).

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F27/50(pg.110/244)
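A small simulation sketch of Theorem 4.5.1 in Python (the three-letter distribution is an illustrative assumption, not from the slides): for an i.i.d. source, −(1/n) log2 p(X1, . . . , Xn) concentrates around H(X) as n grows.

```python
# Empirically show -(1/n) log2 p(X1,...,Xn) -> H(X) in probability (via WLLN).
import math
import random

pmf = {"a": 0.5, "b": 0.3, "c": 0.2}                        # assumed 3-letter source
H = -sum(q * math.log2(q) for q in pmf.values())

def normalized_neg_log_prob(n):
    xs = random.choices(list(pmf), weights=pmf.values(), k=n)
    return -sum(math.log2(pmf[x]) for x in xs) / n

print(f"H(X) = {H:.4f} bits")
for n in (10, 100, 1000, 10000):
    samples = [normalized_neg_log_prob(n) for _ in range(200)]
    spread = max(samples) - min(samples)
    print(f"n = {n:6d}: mean = {sum(samples)/len(samples):.4f}, spread = {spread:.4f}")
```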

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    Typical Set

    Definition 4.5.2 (Typical Set)

    The typical set A^{(n)}_ε w.r.t. p(x) is the set of sequences (x1, x2, . . . , xn) ∈ X^n with the property that

    2^{−n(H(X)+ε)} ≤ p(x1, x2, . . . , xn) ≤ 2^{−n(H(X)−ε)} (4.44)

    Equivalently, we may write A^{(n)}_ε as

    A^{(n)}_ε = { (x1, x2, . . . , xn) : | −(1/n) log p(x1, . . . , xn) − H | < ε } (4.45)

    The typical set consists of those sequences whose log probability lies within nε of −nH, i.e., in the interval [−n(H + ε), −n(H − ε)].

    A^{(n)}_ε has a number of interesting properties.

    Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F28/50(pg.116/244)
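A minimal Python sketch of Definition 4.5.2 (assuming an i.i.d. source given by a pmf dictionary; parameters are illustrative): test whether a particular sequence lies in A^{(n)}_ε.

```python
# Membership test for the typical set A_eps^(n) of an i.i.d. source.
import math

def in_typical_set(seq, pmf, eps):
    """True iff |-(1/n) log2 p(seq) - H(X)| < eps for the i.i.d. source pmf."""
    H = -sum(q * math.log2(q) for q in pmf.values())
    n = len(seq)
    log_prob = sum(math.log2(pmf[x]) for x in seq)
    return abs(-log_prob / n - H) < eps

pmf = {0: 0.7, 1: 0.3}
print(in_typical_set([0] * 20, pmf, eps=0.1))              # all zeros: False (too probable)
print(in_typical_set([1] * 6 + [0] * 14, pmf, eps=0.1))    # ~np ones: True
```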

  • Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding

    Typical Set

    Definition 4.5.2 (Typical Set)

    The typical set A(n)� w.r.t. p(x) is the set of sequences

    (x1,