Finite-blocklength schemes in information theory
Li Cheuk Ting
Department of Information Engineering, The Chinese University of Hong Kong
Part of this presentation is based on my lecture notes for Special Topics in Information Theory
Overview
• In this talk, we study an unconventional approach to code construction
  • An alternative to conventional random coding
• Gives tight one-shot/finite-blocklength/asymptotic results
• Very simple (the proof of Marton's inner bound for the broadcast channel can be written in one slide!)
• Applies to channel coding, channels with state, broadcast channels, multiple access channels, lossy source coding (with side information), etc.
How to measure information?
How to measure information?
• How many bits are needed to store a piece of information?
  • E.g., we can use one bit to represent whether it will rain tomorrow
• In general, to represent N possibilities, we need ⌈log₂ N⌉ bits
• How much information does "it will rain tomorrow" really contain?
  • For a place where it always rains, this contains no information
  • The less likely it is to rain, the more information ("surprisal") it contains
Self-information
• For a probability mass function p_X of a random variable X, the self-information of the value x is
  ι_X(x) = log(1/p_X(x))
• We use log to base 2 (the unit is bits)
• For the joint pmf p_{X,Y} of random variables X, Y,
  ι_{X,Y}(x, y) = log(1/p_{X,Y}(x, y))
Self-information
• E.g., in English text the most frequent letter is 'e' (13%) and the least frequent letter is 'z' (0.074%) (according to https://en.wikipedia.org/wiki/Letter_frequency)
• Let X ∈ {a, …, z} be a random letter
• We have
  ι_X(e) = log(1/0.13) ≈ 2.94 bits
  ι_X(z) = log(1/0.00074) ≈ 10.40 bits
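As a quick numerical check of the figures above, here is a minimal Python sketch (the 13% and 0.074% frequencies are the ones quoted on this slide):

```python
import math

def self_information(p):
    """Self-information iota(x) = log2(1/p_X(x)), in bits."""
    return math.log2(1 / p)

# Letter frequencies quoted above (Wikipedia's letter-frequency table)
print(self_information(0.13))     # 'e': about 2.94 bits
print(self_information(0.00074))  # 'z': about 10.40 bits
```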
Self-information - Properties
• ι_X(x) ≥ 0
• If p_X is the uniform distribution over [1..N], then ι_X(x) = log N for x ∈ [1..N]
• (Invariant under relabeling) If f is an injective function, then ι_{f(X)}(f(x)) = ι_X(x)
• (Additive) If X, Y are independent, then ι_{X,Y}(x, y) = ι_X(x) + ι_Y(y)
Information spectrum
• If X is a random variable, then ι_X(X) is random as well
• Some values of X may contain more information than others
• The distribution of ι_X(X) (or its cumulative distribution function) is called the information spectrum
• ι_X(X) is a constant if and only if X follows a uniform distribution
• The information spectrum is a probability distribution, which can be unwieldy
• We sometimes want a single number that summarizes the amount of information in X
Entropy
β’ The Shannon entropy
• The Shannon entropy
  H(X) = H(p_X) = E[ι_X(X)] = Σ_x p_X(x) log(1/p_X(x))
  is the average of the self-information
  • A number (not random) that roughly corresponds to the amount of information in X
• Treat 0 log(1/0) = 0
• Similarly, the joint entropy of X and Y is H(X, Y) = E[ι_{X,Y}(X, Y)]
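The definition above, including the 0·log(1/0) = 0 convention, translates directly into code; this is a small illustrative sketch (the example distributions are my own, not from the slides):

```python
import math

def entropy(pmf):
    """Shannon entropy H(p) = sum_x p(x) log2(1/p(x)); terms with p(x) = 0
    are skipped, implementing the convention 0 * log(1/0) = 0."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([1.0, 0.0]))  # constant: 0.0 bits
print(entropy([0.25] * 4))  # uniform over 4 values: 2.0 bits
```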
Entropy - Properties
• H(X) ≥ 0, and H(X) = 0 iff X is (almost surely) a constant
• If X ∈ [1..N], then H(X) ≤ log N
  • Equality iff X is uniform over [1..N]
  • Proof: Jensen's inequality on the concave function z ↦ z log(1/z)
• If f is a function, then H(f(X)) ≤ H(X)
  • If f is injective, equality holds (invariant under relabeling)
  • Consequences: H(X, Y) ≥ H(X), H(X, f(X)) = H(X)
• (Subadditive) H(X, Y) ≤ H(X) + H(Y)
  • Equality holds iff X, Y are independent (additive)
• H(X) is concave in p_X
A random English letter
• Self-information ranges from ι_X(e) ≈ 2.94 to ι_X(z) ≈ 10.40
• H(X) ≈ 4.18
(figure: plots of p_X(x) and ι_X(x) over the letters, according to https://en.wikipedia.org/wiki/Letter_frequency)
Why is entropy a reasonable measure of information?
• Axiomatic characterization: H(X) is the only measure that satisfies
  • Subadditivity: H(X, Y) ≤ H(X) + H(Y)
  • Additivity: H(X, Y) = H(X) + H(Y) if X, Y are independent
  • Invariance under relabeling and under adding a zero mass
  • H(X) is continuous in p_X
  • H(X) = 1 when X ~ Unif{0,1}
  [Aczél, J., Forte, B., & Ng, C. T. (1974). Why the Shannon and Hartley entropies are 'natural']
• Operational characterizations:
  • H(X) is approximately the number of coin flips needed to generate X [D. E. Knuth & A. C. Yao (1976). The complexity of nonuniform random number generation]
  • H(X) is approximately the number of bits needed to compress X
Information density
• The information density between two random variables X, Y is
  ι_{X;Y}(x; y) = ι_Y(y) − ι_{Y|X}(y|x)
              = log( p_{X,Y}(x, y) / (p_X(x) p_Y(y)) )
              = log( p_{Y|X}(y|x) / p_Y(y) )
• ι_Y(y) is the information of Y = y without knowing X = x
• ι_{Y|X}(y|x) is the information of Y = y after knowing X = x
• ι_{X;Y}(x; y) measures how much knowing X = x reduces the information of Y = y
• Can be positive/negative/zero
• Zero if X, Y are independent
Information density
• ι_{X;Y}(x; y) = ι_Y(y) − ι_{Y|X}(y|x) = log( p_{X,Y}(x, y) / (p_X(x) p_Y(y)) ) = log( p_{Y|X}(y|x) / p_Y(y) )
• E.g., X, Y are the indicators of whether it rains today/tomorrow respectively, with the following probability matrix:

|       | Y = 0 | Y = 1 |
|-------|-------|-------|
| X = 0 | 0.6   | 0.1   |
| X = 1 | 0.1   | 0.2   |

• ι_{X;Y}(1; 1) = log( 0.2 / (0.3 · 0.3) ) ≈ 1.15
  • Knowing it rains today decreases the information of "it will rain tomorrow"
• ι_{X;Y}(1; 0) = log( 0.1 / (0.3 · 0.7) ) ≈ −1.07
  • Knowing it rains today increases the information of "it will not rain tomorrow"
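The two values above can be reproduced from the probability matrix; a small sketch that computes the marginals from the joint pmf:

```python
import math

# Joint pmf from the table above: X = rain today, Y = rain tomorrow
p = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

def info_density(x, y):
    """iota_{X;Y}(x; y) = log2( p_{X,Y}(x,y) / (p_X(x) p_Y(y)) )."""
    return math.log2(p[(x, y)] / (p_x[x] * p_y[y]))

print(info_density(1, 1))  # about  1.15: rain today makes rain tomorrow less surprising
print(info_density(1, 0))  # about -1.07: rain today makes no-rain tomorrow more surprising
```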
Mutual information
• The mutual information between two random variables X, Y is
  I(X; Y) = E[ι_{X;Y}(X; Y)]
          = E[ log( p_{X,Y}(X, Y) / (p_X(X) p_Y(Y)) ) ]
          = H(Y) − H(Y|X)
          = H(X) + H(Y) − H(X, Y)
• Always nonnegative since H(Y) ≥ H(Y|X)
• Measures the dependency between X and Y
  • Zero iff X, Y are independent
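Continuing the rain example from the information-density slide, one can verify numerically that E[ι_{X;Y}(X;Y)] and H(X) + H(Y) − H(X,Y) agree (a sketch; the joint pmf is the one from that slide):

```python
import math

p = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
p_x, p_y = [0.7, 0.3], [0.7, 0.3]  # marginals of the joint pmf above

def H(pmf):
    return sum(v * math.log2(1 / v) for v in pmf if v > 0)

# I(X;Y) as the average information density E[iota_{X;Y}(X;Y)]
mi = sum(v * math.log2(v / (p_x[x] * p_y[y])) for (x, y), v in p.items())

# I(X;Y) via the entropy identity H(X) + H(Y) - H(X,Y)
mi_alt = H(p_x) + H(p_y) - H(p.values())

print(mi, mi_alt)  # both about 0.19 bits: a weak dependency
```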
Source coding & channel coding
• Source coding: compressing a source X ~ p_X

  X ~ p_X → Enc → W ∈ {1, …, L} → Dec → X̂

• Channel coding: transmitting a message M through a noisy channel

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂
One-shot channel coding
• Message M ~ Unif{1, …, L}
• The encoder maps the message to the channel input X = f(M)
  • The set C = {f(m) : m ∈ {1, …, L}} is the codebook
  • Its elements f(m) are called codewords
• The channel output Y follows the conditional distribution p_{Y|X}
• The decoder maps Y to the decoded message M̂ = g(Y)
• Goal: the error probability P(M̂ ≠ M) is small

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂
One-shot channel coding
• Want P(M̂ ≠ M) ≤ ε

Thm [Yassaee et al. 2013]. Fix any p_X. There exists a code with
  P(M̂ ≠ M) ≤ 1 − E[ 1 / (1 + L·2^{−ι_{X;Y}(X;Y)}) ]
            ≤ E[ min{ L·2^{−ι_{X;Y}(X;Y)}, 1 } ]
where (X, Y) ~ p_X p_{Y|X}

[Yassaee, Aref, and Gohari, "A technique for deriving one-shot achievability results in network information theory," ISIT 2013.]

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂
One-shot channel coding
• Random codebook generation: generate f(m) ~ p_X i.i.d. for m ∈ {1, …, L}

Given Y, the decoder:
• (Maximum likelihood decoder) Finds the M̂ that maximizes p_{Y|X}(Y | f(M̂))
  • Optimal: attains the lowest error probability for a fixed codebook
• (Stochastic likelihood decoder) Chooses M̂ with probability
  P(M̂ | Y) = p_{Y|X}(Y | f(M̂)) / Σ_{m'} p_{Y|X}(Y | f(m'))
            = 2^{ι_{X;Y}(f(M̂); Y)} / Σ_{m'} 2^{ι_{X;Y}(f(m'); Y)}

[Yassaee-Aref-Gohari 2013]
• P(M̂ | Y) = 2^{ι_{X;Y}(f(M̂); Y)} / Σ_{m'} 2^{ι_{X;Y}(f(m'); Y)}   [Yassaee-Aref-Gohari 2013]

P(M̂ = M)
= E_C[ (1/L) Σ_{m,y} p_{Y|X}(y|f(m)) · 2^{ι_{X;Y}(f(m); y)} / Σ_{m'} 2^{ι_{X;Y}(f(m'); y)} ]
= E_C[ Σ_y p_{Y|X}(y|f(1)) · 2^{ι_{X;Y}(f(1); y)} / Σ_{m'} 2^{ι_{X;Y}(f(m'); y)} ]   (symmetry)
= Σ_y E_{f(1)}[ E_{f(2),…,f(L)}[ p_{Y|X}(y|f(1)) · 2^{ι_{X;Y}(f(1); y)} / ( 2^{ι_{X;Y}(f(1); y)} + Σ_{m'≠1} 2^{ι_{X;Y}(f(m'); y)} ) ] ]
≥ Σ_y E_{f(1)}[ p_{Y|X}(y|f(1)) · 2^{ι_{X;Y}(f(1); y)} / ( 2^{ι_{X;Y}(f(1); y)} + L − 1 ) ]   (Jensen, using E[2^{ι_{X;Y}(f(m'); y)}] = 1)
≥ Σ_y E_{f(1)}[ p_{Y|X}(y|f(1)) · 1 / ( 1 + L·2^{−ι_{X;Y}(f(1); y)} ) ]
= Σ_y Σ_x p_X(x) p_{Y|X}(y|x) · 1 / ( 1 + L·2^{−ι_{X;Y}(x; y)} )
= E[ 1 / (1 + L·2^{−ι_{X;Y}(X;Y)}) ]
Asymptotic channel coding
• Memoryless: p_{Y^n|X^n}(y^n | x^n) = Π_{i=1}^n p_{Y|X}(y_i | x_i)
• Applying the one-shot bound:
  P_e = P(M̂ ≠ M) ≤ E[ min{ 2^{nR − Σ_{i=1}^n ι_{X;Y}(X_i; Y_i)}, 1 } ],
  where (X_i, Y_i) ~ p_X p_{Y|X} i.i.d. for i = 1, …, n
• Asymptotics (n → ∞): we have Σ_{i=1}^n ι_{X;Y}(X_i; Y_i) ≈ nI(X; Y) by the law of large numbers, so P_e → 0 if R < I(X; Y)
• Recovers the (achievability part of) Shannon's channel coding theorem: the channel capacity is
  C = max_{p_X} I(X; Y)

  M ~ Unif{1, …, 2^{nR}} → Enc → X^n → Channel p_{Y|X} → Y^n → Dec → M̂
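For a concrete channel, this finite-blocklength bound can be evaluated exactly. A sketch for the binary symmetric channel with a uniform input (the BSC and its parameters are my choice of example, not from the slides): with uniform p_X, ι_{X;Y} equals log₂(2(1−p)) on an unflipped symbol and log₂(2p) on a flipped one, so the sum of information densities depends only on the number of flips, which is Binomial(n, p).

```python
import math

def bsc_one_shot_bound(n, p, R):
    """E[min{2^(nR - sum_i iota(X_i;Y_i)), 1}] for n uses of BSC(p) with
    uniform inputs: the sum of information densities depends only on the
    number of flipped symbols k ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        info = (n - k) * math.log2(2 * (1 - p)) + k * math.log2(2 * p)
        prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
        expo = n * R - info
        total += prob * (1.0 if expo >= 0 else 2.0 ** expo)
    return total

# Capacity of BSC(0.11) is 1 - h(0.11), about 0.5 bit/use; at rate 0.3
# the bound on the error probability decays as the blocklength n grows
for n in (100, 500, 1000):
    print(n, bsc_one_shot_bound(n, 0.11, 0.3))
```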
Codebook as a black box
• Random codebook: C = {f(m)} ~ p_X i.i.d. for m ∈ {1, …, L}
• Decoder: Find M̂ = argmax_{m̂} p_{X|Y}(f(m̂) | Y) / p_X(f(m̂))
• Treat the codebook C as a box:
  • Operation 1: Query m, get X ~ p_X
  • Operation 2: Query the posterior distribution p_{X|Y}, get M̂

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂

  Box: m → X ~ p_X ; p_{X|Y} → M̂
A general black box
• Consider a random variable W
• Only one operation: Query a distribution Q, get W ~ Q
• Want the box to have "memory":
  • If we query the same Q twice, we should get the same W
  • If we query similar Q₁, Q₂, then W₁, W₂ are equal with high probability

  Magic box! Q → W ~ Q
Using the general black box
• Let W = (M, X)
• Encoding: Query P = δ_M × p_X (δ_M is the degenerate distribution with P(M̃ = M) = 1), get (M, X)
• Decoding: Query Q = Unif_M × p_{X|Y} (Unif_M is Unif{1, …, L}), get (M̂, X̂)
• Input partial knowledge into the box, get full knowledge

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂

  Magic box about (M, X): P = δ_M × p_X → X ~ p_X ; Q = Unif_M × p_{X|Y} → M̂
How to build the box
• Operation: Query a distribution Q, get W ~ Q
  • Memory: If we query similar Q₁, Q₂, then W₁, W₂ are equal with high probability
• Attempt 1: Generate W ~ Q afresh for each query?
  • Does not have memory!
• Attempt 2: Generate a random seed Z at the beginning, then use the same seed to generate all W ~ Q?
  • Only guarantees the same W for the same Q
  • No guarantee for similar but different Q₁, Q₂
• Need a way to generate W that is not sensitive to small changes in Q

  Magic box! Q → W ~ Q
How to build the box
• Generate a random seed Z at the beginning, then use the same seed to generate all W ~ Q?
• The exponential distribution with rate λ, Exp(λ), has probability density function
  f(z; λ) = λe^{−λz} for z ≥ 0
  • If T ~ Exp(λ), then aT ~ Exp(λ/a)
• For T_i ~ Exp(λ_i) independent for i = 1, …, n, we have
  P(argmin_i T_i = j) = λ_j / (λ₁ + ⋯ + λ_n)
• Let Z = (Z₁, …, Z_n) be the seed, Z_u ~ Exp(1) i.i.d.
• Query Q, output W = argmin_u Z_u / Q(u)

  Magic box! Q → W ~ Q

C. T. Li and A. El Gamal, "Strong functional representation lemma and applications to coding theorems," IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, 2018.
C. T. Li and V. Anantharam, "A unified framework for one-shot achievability via the Poisson matching lemma," IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.
How to build the box
• Let Z = (Z₁, …, Z_n) be the seed, Z_u ~ Exp(1) i.i.d.
• Query Q, output W = argmin_u Z_u / Q(u)
• Since Z_u / Q(u) ~ Exp(Q(u)),
  P(W = u) = Q(u) / (Q(1) + ⋯ + Q(n)) = Q(u)  OK!
• Gives the same W for the same Q, since W is a function of Q and Z (fixed at the beginning)  OK!
• Small changes to Q are unlikely to affect argmin_u Z_u / Q(u)  OK!

  Magic box! Q → W ~ Q
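The construction above is easy to simulate. A sketch (the alphabet size, query distributions, and trial count are arbitrary choices for illustration): fixing the seed Z once, the box answers every query with argmin_u Z_u/Q(u), which has the right marginal and rarely changes when Q is slightly perturbed.

```python
import random

def make_box(n, seed):
    """A box over {0, ..., n-1}: draw Z_u ~ Exp(1) once, then answer
    every query Q with W = argmin_u Z_u / Q(u)."""
    rng = random.Random(seed)
    z = [rng.expovariate(1.0) for _ in range(n)]
    return lambda q: min((u for u in range(n) if q[u] > 0), key=lambda u: z[u] / q[u])

q1 = [0.5, 0.3, 0.2]
q2 = [0.48, 0.32, 0.2]  # a slightly perturbed query
trials = 20000

counts = [0, 0, 0]
agree = 0
for s in range(trials):
    box = make_box(3, seed=s)       # fresh seed = fresh independent box
    w1, w2 = box(q1), box(q2)
    counts[w1] += 1                 # marginal check: P(W = u) should be q1[u]
    agree += (w1 == w2)             # memory check: same box, similar queries

print([c / trials for c in counts])  # close to [0.5, 0.3, 0.2]
print(agree / trials)                # close to 1
```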
How to build the box
• Let Z = (Z₁, …, Z_n) be the seed, Z_u ~ Exp(1) i.i.d.
• Query Q, output W = argmin_u Z_u / Q(u)
• If n = 2, then W = 1 iff
  Z₁/Q(1) < Z₂/Q(2) ⟺ Z₁/(Z₁ + Z₂) < Q(1)

(figure: the value Z₁/(Z₁ + Z₂) on the interval [0, 1]; W = 1 when it falls below Q(1) = P_{W~Q}(W = 1), and W = 2 otherwise)

  Magic box! Q → W ~ Q
Poisson matching lemma
• Let Z = (Z₁, …, Z_n) be the seed, Z_u ~ Exp(1) i.i.d.
• Query Q, output W_Q = argmin_u Z_u / Q(u)
• Poisson matching lemma [Li-Anantharam 2018]: If we query P, Q to get W_P, W_Q respectively, then
  P(W_P ≠ W_Q | W_P) ≤ P(W_P) / Q(W_P)

C. T. Li and A. El Gamal, "Strong functional representation lemma and applications to coding theorems," IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, 2018.
C. T. Li and V. Anantharam, "A unified framework for one-shot achievability via the Poisson matching lemma," IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.
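The lemma can be checked by simulation. A sketch with an arbitrary pair of distributions P, Q on a 3-letter alphabet (my choice for illustration): for each seed, query both P and Q with the same exponential race, and compare the conditional mismatch frequency with the bound P(w)/Q(w).

```python
import random

def race(z, q):
    """W_Q = argmin_u Z_u / Q(u) for a fixed seed vector z."""
    return min(range(len(q)), key=lambda u: z[u] / q[u])

P = [0.6, 0.2, 0.2]
Q = [0.1, 0.3, 0.6]
trials = 50000
rng = random.Random(1)

occur = [0, 0, 0]     # how often W_P = w
mismatch = [0, 0, 0]  # how often W_P = w but W_Q != w
for _ in range(trials):
    z = [rng.expovariate(1.0) for _ in range(3)]
    wp = race(z, P)
    occur[wp] += 1
    mismatch[wp] += (race(z, Q) != wp)

for w in range(3):
    # Poisson matching lemma: P(W_P != W_Q | W_P = w) <= P(w)/Q(w)
    print(w, mismatch[w] / occur[w], "<=", min(P[w] / Q[w], 1.0))
```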
A general black box
• Operation: Query a distribution Q, get W ~ Q
• Guarantee: If we query P, Q to get W_P, W_Q respectively, then
  P(W_P ≠ W_Q | W_P) ≤ P(W_P) / Q(W_P)
• We can use this box alone to prove many tight one-shot/finite-blocklength/asymptotic coding results

  Magic box! Q → W ~ Q
• Let W = (M, X)
• Encoding: Query P = δ_M × p_X, get (M, X)
• Decoding: Query Q = Unif_M × p_{X|Y}, get (M̂, X̂)
• Poisson matching lemma:
  P(M̂ ≠ M) ≤ E[ P(W_P ≠ W_Q | M, X, Y) ]
            ≤ E[ min{ (δ_M × p_X)(M, X) / (Unif_M × p_{X|Y})(M, X), 1 } ]
            = E[ min{ p_X(X) / (p_{X|Y}(X|Y)/L), 1 } ]
            = E[ min{ L·2^{−ι_{X;Y}(X;Y)}, 1 } ]

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂

  Magic box about (M, X): P = δ_M × p_X → X ~ p_X ; Q = Unif_M × p_{X|Y} → M̂

C. T. Li and V. Anantharam, "A unified framework for one-shot achievability via the Poisson matching lemma," IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.
Channel coding - removing the box
• The box contains a random seed in it
• In reality, the encoder and decoder cannot share common randomness
• P_e ≤ E[ min{ L·2^{−ι_{X;Y}(X;Y)}, 1 } ] averaged over choices of the seed
• There exists a fixed seed such that P_e ≤ E[ min{ L·2^{−ι_{X;Y}(X;Y)}, 1 } ]

  M ~ Unif{1, …, L} → Enc → X → Channel p_{Y|X} → Y → Dec → M̂

  Fixed box about (M, X): P = δ_M × p_X → X ~ p_X ; Q = Unif_M × p_{X|Y} → M̂
Second-order asymptotics
• P_e ≤ E[ min{ 2^{L − Σ_{i=1}^n ι_{X;Y}(X_i; Y_i)}, 1 } ], where (X_i, Y_i) ~ p_X p_{Y|X} i.i.d.
• P_e → 0 if L ≪ Σ_{i=1}^n ι_{X;Y}(X_i; Y_i); P_e → 1 if L ≫ Σ_{i=1}^n ι_{X;Y}(X_i; Y_i)
  • First-order: the optimal L ≈ nI(X; Y)
• Central limit theorem: Σ_{i=1}^n ι_{X;Y}(X_i; Y_i) approximately follows N(nI(X; Y), nV), where V = Var[ι_{X;Y}(X; Y)]
• For a fixed P_e = ε, the optimal L ≈ nI(X; Y) − √(nV)·Q⁻¹(ε), where Q⁻¹(ε) is the inverse of the Q-function (Q(γ) = 1 − Φ(γ), and Φ is the cdf of N(0,1))
• The V when p_X is the capacity-achieving distribution (that maximizes I(X; Y)) is called the channel dispersion

  M ~ Unif{1, …, 2^L} → Enc → X^n → Channel p_{Y|X} → Y^n → Dec → M̂

Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
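The normal approximation is easy to evaluate for a concrete channel. A sketch for the BSC with crossover 0.11 (the channel, blocklength, and ε are my illustrative choices; the capacity-achieving input is uniform, so the V computed here is the channel dispersion), with Q⁻¹ obtained by bisection on the complementary error function:

```python
import math

def Qfunc(g):
    """Q(gamma) = P(N(0,1) > gamma)."""
    return 0.5 * math.erfc(g / math.sqrt(2))

def Qinv(eps):
    """Invert the (decreasing) Q-function by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if Qfunc(mid) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def bsc_second_order(n, p, eps):
    """Normal approximation nI - sqrt(nV) * Qinv(eps) of the largest
    achievable log-message-size for n uses of BSC(p), uniform input."""
    i0, i1 = math.log2(2 * (1 - p)), math.log2(2 * p)  # iota on unflipped/flipped symbols
    I = (1 - p) * i0 + p * i1                          # I(X;Y) = 1 - h(p)
    V = (1 - p) * (i0 - I) ** 2 + p * (i1 - I) ** 2    # V = Var[iota]
    return n * I - math.sqrt(n * V) * Qinv(eps)

L = bsc_second_order(1000, 0.11, 1e-3)
print(L / 1000)  # about 0.41 bit/use, noticeably below capacity 1 - h(0.11) of about 0.5
```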
(figure: the distribution of Σ_{i=1}^n ι_{X;Y}(X_i; Y_i), centered around nI(X; Y) with spread of order √n; for a fixed error probability, the cutoff point L sits below nI(X; Y) by a second-order term, and the error probability ≈ P(Σ_{i=1}^n ι_{X;Y}(X_i; Y_i) ≤ L))