Finite-blocklength schemes in information theory


Transcript of Finite-blocklength schemes in information theory

Page 1

Finite-blocklength schemes in information theory

Li Cheuk Ting

Department of Information Engineering, The Chinese University of Hong Kong

[email protected]

Part of this presentation is based on my lecture notes for Special Topics in Information Theory

Page 2

Overview

β€’ In this talk, we study an unconventional approach to code construction

β€’ An alternative to conventional random coding

β€’ Gives tight one-shot/finite-blocklength/asymptotic results

β€’ Very simple (the proof of Marton’s inner bound for the broadcast channel can be written in one slide!)

β€’ Applies to channel coding, channels with state, broadcast channels, multiple access channels, lossy source coding (with side information), etc.

Page 3

How to measure information?

Page 4

How to measure information?

β€’ How many bits are needed to store a piece of information?

β€’ E.g. we can use one bit to represent whether it will rain tomorrow

β€’ In general, to represent $k$ possibilities, we need $\lceil \log_2 k \rceil$ bits

β€’ How much information does β€œit will rain tomorrow” really contain?

β€’ For a place where it always rains, this contains no information

β€’ The less likely it is to rain, the more information (β€œsurprisal”) it contains

Page 5

Page 6

Self-information

β€’ For a probability mass function $p_X$ of a random variable $X$, the self-information of the value $x$ is

$$\iota_X(x) = \log \frac{1}{p_X(x)}$$

β€’ We use log to base 2 (the unit is the bit)

β€’ For the joint pmf $p_{X,Y}$ of random variables $X, Y$,

$$\iota_{X,Y}(x, y) = \log \frac{1}{p_{X,Y}(x, y)}$$

Page 7

Self-information

β€’ E.g. in English text, the most frequent letter is β€œe” (13%), and the least frequent letter is β€œz” (0.074%) (according to https://en.wikipedia.org/wiki/Letter_frequency)

β€’ Let $X \in \{\text{a}, \ldots, \text{z}\}$ be a random letter

β€’ Have

$$\iota_X(\text{e}) = \log \frac{1}{0.13} \approx 2.94 \text{ bits}, \qquad \iota_X(\text{z}) = \log \frac{1}{0.00074} \approx 10.40 \text{ bits}$$
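These two values are easy to sanity-check numerically; a minimal Python sketch using the Wikipedia frequencies quoted above:

```python
import math

def self_information(p):
    """Self-information iota(x) = log2(1/p), in bits."""
    return math.log2(1 / p)

print(self_information(0.13))     # 'e': ~2.94 bits
print(self_information(0.00074))  # 'z': ~10.40 bits
```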

Page 8

Self-information - Properties

β€’ πœ„π‘‹ π‘₯ β‰₯ 0

β€’ If 𝑝𝑋 is the uniform distribution over [1. . π‘˜],πœ„π‘‹ π‘₯ = log π‘˜ for π‘₯ ∈ [1. . π‘˜]

β€’ (Invariant under relabeling) If 𝑓 is an injective function, then πœ„π‘“(𝑋) 𝑓(π‘₯) = πœ„π‘‹ π‘₯

β€’ (Additive) If 𝑋, π‘Œ are independent,πœ„π‘‹,π‘Œ π‘₯, 𝑦 = πœ„π‘‹ π‘₯ + πœ„π‘Œ 𝑦

Page 9

Information spectrum

β€’ If $X$ is a random variable, $\iota_X(X)$ is random as well

β€’ Some values of $X$ may contain more information than others

β€’ The distribution of $\iota_X(X)$ (or its cumulative distribution function) is called the information spectrum

β€’ $\iota_X(X)$ is a constant if and only if $X$ follows a uniform distribution

β€’ The information spectrum is a whole probability distribution, which can be unwieldy

β€’ We sometimes want a single number that summarizes the amount of information in $X$

Page 10

Entropy

β€’ The Shannon entropy

$$H(X) = H(p_X) = \mathbf{E}[\iota_X(X)] = \sum_x p_X(x) \log \frac{1}{p_X(x)}$$

is the average of the self-information

β€’ A number (not random) that roughly corresponds to the amount of information in $X$

β€’ Treat $0 \log(1/0) = 0$

β€’ Similarly, the joint entropy of $X$ and $Y$ is $H(X, Y) = \mathbf{E}[\iota_{X,Y}(X, Y)]$
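As a small illustration (not from the slides), the entropy of a pmf takes only a few lines of Python:

```python
import math

def entropy(pmf):
    """Shannon entropy H(p) = sum_x p(x) log2(1/p(x)), treating 0 log(1/0) = 0."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(entropy([1.0]))        # 0.0 (constant)
print(entropy([0.25] * 4))   # 2.0 bits (uniform over 4 values)
```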

Page 11

Entropy - Properties

β€’ $H(X) \ge 0$, and $H(X) = 0$ iff $X$ is (almost surely) a constant

β€’ If $X \in [1..k]$, then $H(X) \le \log k$
  β€’ Equality iff $X$ is uniform over $[1..k]$
  β€’ Proof: Jensen’s ineq. on the concave function $z \mapsto z \log(1/z)$

β€’ If $f$ is a function, then $H(f(X)) \le H(X)$
  β€’ If $f$ is injective, equality holds (invariant under relabeling)

β€’ Consequences: $H(X, Y) \ge H(X)$, $H(X, f(X)) = H(X)$

β€’ (Subadditive) $H(X, Y) \le H(X) + H(Y)$
  β€’ Equality holds iff $X, Y$ are independent (additive)

β€’ $H(X)$ is concave in $p_X$
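A quick numerical check of subadditivity (not from the slides), using the joint pmf of the rain example that appears on a later slide, whose marginals are $(0.7, 0.3)$ each:

```python
import math

def entropy(pmf):
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

joint = [0.6, 0.1, 0.1, 0.2]          # joint pmf of (X, Y) from the rain example
print(entropy(joint))                 # H(X,Y) ~ 1.57
print(entropy([0.7, 0.3]) * 2)        # H(X) + H(Y) ~ 1.76 >= H(X,Y)
```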

Page 12

A random English letter

β€’ Self-information ranges from $\iota_X(\text{e}) \approx 2.94$ to $\iota_X(\text{z}) \approx 10.40$

β€’ $H(X) \approx 4.18$ bits

[Chart: $p_X(x)$ and $\iota_X(x)$ for each letter $x$]

(according to https://en.wikipedia.org/wiki/Letter_frequency)

Page 13

Why is entropy a reasonable measure of information?

β€’ Axiomatic characterization: $H(X)$ is the only measure that satisfies
  β€’ Subadditivity: $H(X, Y) \le H(X) + H(Y)$
  β€’ Additivity: $H(X, Y) = H(X) + H(Y)$ if $X, Y$ are independent
  β€’ Invariance under relabeling and adding a zero mass
  β€’ $H(X)$ is continuous in $p_X$
  β€’ $H(X) = 1$ when $X \sim \mathrm{Unif}\{0, 1\}$

[AczΓ©l, J., Forte, B., & Ng, C. T. (1974). Why the Shannon and Hartley entropies are β€œnatural”]

β€’ Operational characterizations:
  β€’ $H(X)$ is approximately the number of coin flips needed to generate $X$ [D. E. Knuth & A. C. Yao (1976). The complexity of nonuniform random number generation]
  β€’ $H(X)$ is approximately the number of bits needed to compress $X$

Page 14

Information density

β€’ The information density between two random variables $X, Y$ is

$$\iota_{X;Y}(x; y) = \iota_Y(y) - \iota_{Y|X}(y|x) = \log \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)} = \log \frac{p_{Y|X}(y|x)}{p_Y(y)}$$

β€’ $\iota_Y(y)$ is the info of $Y = y$ without knowing $X = x$

β€’ $\iota_{Y|X}(y|x)$ is the info of $Y = y$ after knowing $X = x$

β€’ $\iota_{X;Y}(x; y)$ measures how much knowing $X = x$ reduces the info of $Y = y$

β€’ Can be positive/negative/zero

β€’ Zero if $X, Y$ are independent

Page 15

Information density

β€’ πœ„π‘‹;π‘Œ π‘₯; 𝑦 = πœ„π‘Œ 𝑦 βˆ’ πœ„π‘Œ|𝑋 𝑦 π‘₯

= log𝑝𝑋,π‘Œ π‘₯,𝑦

𝑝𝑋 π‘₯ π‘π‘Œ 𝑦= log

π‘π‘Œ|𝑋(𝑦|π‘₯)

π‘π‘Œ(𝑦)

β€’ E.g. 𝑋, π‘Œ are the indicators of whether it rains today/tomorrow resp., with the following prob. matrix

β€’ πœ„π‘‹;π‘Œ 1; 1 = log0.2

0.3β‹…0.3β‰ˆ 1.15

β€’ Knowing it rains today decreases the info of β€œtomorrow will rain”

β€’ πœ„π‘‹;π‘Œ 1; 0 = log0.1

0.3β‹…0.7β‰ˆ βˆ’1.07

β€’ Knowing it rains today increases the info of β€œtomorrow will not rain”

π‘Œ = 0 π‘Œ = 1

𝑋 = 0 0.6 0.1

𝑋 = 1 0.1 0.2
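The same numbers can be reproduced directly from the joint matrix; a minimal Python sketch:

```python
import math

# joint pmf of the rain example: p[(x, y)]
p = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
px = {x: sum(v for (a, b), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (a, b), v in p.items() if b == y) for y in (0, 1)}

def info_density(x, y):
    """iota_{X;Y}(x;y) = log2 of p(x,y) / (p(x) p(y))."""
    return math.log2(p[(x, y)] / (px[x] * py[y]))

print(info_density(1, 1))  # ~  1.15
print(info_density(1, 0))  # ~ -1.07
```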

Page 16

Mutual information

β€’ The mutual information between two random variables $X, Y$ is

$$I(X; Y) = \mathbf{E}[\iota_{X;Y}(X; Y)] = \mathbf{E}\left[\log \frac{p_{X,Y}(X, Y)}{p_X(X)\, p_Y(Y)}\right] = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y)$$

β€’ Always nonnegative since $H(Y) \ge H(Y|X)$

β€’ Measures the dependency between $X, Y$

β€’ Zero iff $X, Y$ are independent
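Continuing the rain example, mutual information is just the expectation of the information density over the joint pmf; a short Python check:

```python
import math

p = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
px = {0: 0.7, 1: 0.3}
py = {0: 0.7, 1: 0.3}

# I(X;Y) = E[iota_{X;Y}(X;Y)], averaging with the joint probabilities
I = sum(v * math.log2(v / (px[x] * py[y])) for (x, y), v in p.items())
print(I)  # ~ 0.19 bits: a weak but positive dependency
```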

Page 17

Source coding & channel coding

β€’ Source coding: compressing a source 𝑋~𝑝𝑋

β€’ Channel coding: transmitting a message 𝑀 through a noisy channel

[Diagram (channel coding): $M \sim \mathrm{Unif}\{1, \ldots, k\}$ β†’ Enc β†’ $X$ β†’ Channel $p_{Y|X}$ β†’ $Y$ β†’ Dec β†’ $\hat M$]

[Diagram (source coding): $X \sim p_X$ β†’ Enc β†’ $M \in \{1, \ldots, k\}$ β†’ Dec β†’ $\hat X$]

Page 18

One-shot channel coding

β€’ Message $M \sim \mathrm{Unif}\{1, \ldots, k\}$

β€’ Encoder maps the message to the channel input $X = f(M)$
  β€’ The set $\mathcal{C} = \{f(m) : m \in \{1, \ldots, k\}\}$ is the codebook
  β€’ Its elements $f(m)$ are called codewords

β€’ Channel output $Y$ follows the conditional distribution $p_{Y|X}$

β€’ Decoder maps $Y$ to the decoded message $\hat M = g(Y)$

β€’ Goal: the error probability $\mathbf{P}(\hat M \ne M)$ is small


Page 19

One-shot channel coding

β€’ Want $\mathbf{P}(\hat M \ne M) \le \epsilon$

Thm [Yassaee et al. 2013]. Fix any $p_X$. There exists a code with

$$\mathbf{P}(\hat M \ne M) \le 1 - \mathbf{E}\left[\frac{1}{1 + k 2^{-\iota_{X;Y}(X;Y)}}\right] \le \mathbf{E}\left[\min\{k 2^{-\iota_{X;Y}(X;Y)},\, 1\}\right]$$

where $(X, Y) \sim p_X p_{Y|X}$

[Yassaee, Aref, and Gohari, "A technique for deriving one-shot achievability results in network information theory," ISIT 2013.]


Page 20

One-shot channel coding

β€’ Random codebook generation: generate $f(m) \sim p_X$ i.i.d. for $m \in \{1, \ldots, k\}$

Given $Y$, the decoder:

β€’ (Maximum likelihood decoder) Find $\hat m$ that maximizes $p_{Y|X}(Y|f(\hat m))$
  β€’ Optimal – attains the lowest error prob. for a fixed $f$

β€’ (Stochastic likelihood decoder) Chooses $\hat m$ with prob.

$$\mathbf{P}(\hat m \mid Y) = \frac{p_{Y|X}(Y|f(\hat m))}{\sum_{m'} p_{Y|X}(Y|f(m'))} = \frac{2^{\iota_{X;Y}(f(\hat m); Y)}}{\sum_{m'} 2^{\iota_{X;Y}(f(m'); Y)}}$$

[Yassaee-Aref-Gohari 2013]
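To make the stochastic likelihood decoder concrete, here is a minimal Monte Carlo sketch over a binary symmetric channel; the blocklength, number of messages, and crossover probability are illustrative choices, not from the slides:

```python
import random

def trial(n=8, k=4, eps=0.1, rng=random):
    """One round of one-shot coding over BSC(eps) with a random i.i.d. codebook."""
    codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    m = rng.randrange(k)
    y = [b ^ (rng.random() < eps) for b in codebook[m]]  # channel flips each bit w.p. eps

    def likelihood(x):
        d = sum(xi != yi for xi, yi in zip(x, y))        # Hamming distance to y
        return eps ** d * (1 - eps) ** (n - d)

    # stochastic likelihood decoder: sample m_hat proportionally to the likelihood
    weights = [likelihood(x) for x in codebook]
    m_hat = rng.choices(range(k), weights=weights)[0]
    return m_hat == m

trials = 20000
print(sum(trial() for _ in range(trials)) / trials)  # empirical P(M_hat = M)
```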

Page 21

β€’ $\mathbf{P}(\hat m \mid Y) = \dfrac{2^{\iota_{X;Y}(f(\hat m); Y)}}{\sum_{m'} 2^{\iota_{X;Y}(f(m'); Y)}}$ [Yassaee-Aref-Gohari 2013]

$$\begin{aligned}
\mathbf{P}(\hat M = M)
&= \mathbf{E}_{\mathcal{C}}\left[\frac{1}{k} \sum_{m, y} p_{Y|X}(y|f(m))\, \frac{2^{\iota_{X;Y}(f(m); y)}}{\sum_{m'} 2^{\iota_{X;Y}(f(m'); y)}}\right] \\
&= \mathbf{E}_{\mathcal{C}}\left[\sum_{y} p_{Y|X}(y|f(1))\, \frac{2^{\iota_{X;Y}(f(1); y)}}{\sum_{m'} 2^{\iota_{X;Y}(f(m'); y)}}\right] && \text{(symmetry)} \\
&= \sum_{y} \mathbf{E}_{f(1)}\, \mathbf{E}_{f(2), \ldots, f(k)}\left[p_{Y|X}(y|f(1))\, \frac{2^{\iota_{X;Y}(f(1); y)}}{2^{\iota_{X;Y}(f(1); y)} + \sum_{m' \ne 1} 2^{\iota_{X;Y}(f(m'); y)}}\right] \\
&\ge \sum_{y} \mathbf{E}_{f(1)}\left[p_{Y|X}(y|f(1))\, \frac{2^{\iota_{X;Y}(f(1); y)}}{2^{\iota_{X;Y}(f(1); y)} + k - 1}\right] && \text{(Jensen, using } \mathbf{E}_{f(m')}[2^{\iota_{X;Y}(f(m'); y)}] = 1\text{)} \\
&\ge \sum_{y} \mathbf{E}_{f(1)}\left[p_{Y|X}(y|f(1))\, \frac{1}{1 + k 2^{-\iota_{X;Y}(f(1); y)}}\right] \\
&= \sum_{y} \sum_{x} p_X(x)\, p_{Y|X}(y|x)\, \frac{1}{1 + k 2^{-\iota_{X;Y}(x; y)}} \\
&= \mathbf{E}\left[\frac{1}{1 + k 2^{-\iota_{X;Y}(X;Y)}}\right]
\end{aligned}$$

Page 22

Asymptotic channel coding

β€’ Memoryless: $p_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$

β€’ Applying the one-shot bound with $k = 2^{nR}$:

$$P_e = \mathbf{P}(\hat M \ne M) \le \mathbf{E}\left[\min\{2^{nR - \sum_{i=1}^n \iota_{X;Y}(X_i; Y_i)},\, 1\}\right],$$

where $(X_i, Y_i) \sim p_X p_{Y|X}$ i.i.d. for $i = 1, \ldots, n$

β€’ Asymptotic ($n \to \infty$): have $\sum_{i=1}^n \iota_{X;Y}(X_i; Y_i) \approx n I(X; Y)$ by the law of large numbers, so $P_e \to 0$ if $R < I(X; Y)$

β€’ Recovers the (achievability part of) Shannon’s channel coding theorem: the channel capacity is

$$C = \max_{p_X} I(X; Y)$$

[Diagram: $M \sim \mathrm{Unif}\{1, \ldots, 2^{nR}\}$ β†’ Enc β†’ $X^n$ β†’ Channel $p_{Y|X}$ β†’ $Y^n$ β†’ Dec β†’ $\hat M$]

Page 23

Codebook as a black box

β€’ Random codebook: $\mathcal{C} = \{f(m)\}$, $f(m) \sim p_X$ i.i.d. for $m \in \{1, \ldots, k\}$

β€’ Decoder: Find $\hat m = \arg\max_{\hat m} p_{X|Y}(f(\hat m)|Y)/p_X(f(\hat m))$

β€’ Treat the codebook $\mathcal{C}$ as a box:
  β€’ Operation 1: Query $m$, get $X \sim p_X$
  β€’ Operation 2: Query the posterior distribution $p_{X|Y}$, get $\hat M$

[Diagram: channel coding as before, with the codebook drawn as a box: query $m$ to get $X \sim p_X$; query $p_{X|Y}$ to get $\hat M$]

Page 24

A general black box

β€’ Consider random variable π‘ˆ

β€’ Only one operation: Query distribution 𝑄, get π‘ˆ~𝑄

β€’ Want the box to have β€œmemory”
  β€’ If we query the same $Q$ twice, we should get the same $U$
  β€’ If we query similar $Q_1, Q_2$, then $U_1, U_2$ are equal with high probability

[Diagram: magic box: query $Q$, get $U \sim Q$]

Page 25

Using the general black box

β€’ Let π‘ˆ = (𝑋,𝑀)

β€’ Encoding: Query $Q = P_X \times \delta_m$ ($\delta_m$ is the degenerate distribution with $\mathbf{P}(M = m) = 1$), get $(X, m)$

β€’ Decoding: Query $Q = P_{X|Y} \times P_M$ ($P_M$ is $\mathrm{Unif}\{1, \ldots, k\}$), get $(\hat X, \hat m)$

β€’ Input partial knowledge into box, get full knowledge

[Diagram: magic box about $(X, M)$: encoder queries $Q = P_X \times \delta_m$ and gets $X \sim p_X$; decoder queries $Q = P_{X|Y} \times P_M$ and gets $\hat M$]

Page 26

How to build the box

β€’ Operation: Query distribution $Q$, get $U \sim Q$
  β€’ Memory: If we query similar $Q_1, Q_2$, then $U_1, U_2$ are equal with high probability

β€’ Attempt 1: Generate $U \sim Q$ afresh for each query?
  β€’ Does not have memory!

β€’ Attempt 2: Generate a random seed $Z$ at the beginning, then use the same seed to generate all $U \sim Q$?
  β€’ Only guarantees the same $U$ for the same $Q$
  β€’ No guarantee for similar but different $Q_1, Q_2$

β€’ Need a way to generate π‘ˆ that is not sensitive to small changes to 𝑄


Page 27

How to build the box

β€’ Generate a random seed $Z$ at the beginning, then use the same seed to generate all $U \sim Q$?

β€’ The exponential distribution with rate $\lambda$, $\mathrm{Exp}(\lambda)$, has probability density function $f(z; \lambda) = \lambda e^{-\lambda z}$ for $z \ge 0$
  β€’ If $Z \sim \mathrm{Exp}(\lambda)$, then $aZ \sim \mathrm{Exp}(\lambda/a)$

β€’ For $Z_i \sim \mathrm{Exp}(\lambda_i)$ indep. for $i = 1, \ldots, l$, have

$$\mathbf{P}\left(\arg\min_i Z_i = j\right) = \frac{\lambda_j}{\lambda_1 + \cdots + \lambda_l}$$

β€’ Let $Z = (Z_1, \ldots, Z_l)$ be the seed, $Z_u \sim \mathrm{Exp}(1)$ i.i.d.

β€’ Query $Q$, output $U = \arg\min_u \dfrac{Z_u}{Q(u)}$


C. T. Li and A. El Gamal, β€œStrong functional representation lemma and applications to coding theorems,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, 2018.

C. T. Li and V. Anantharam, β€œA unified framework for one-shot achievability via the Poisson matching lemma,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.

Page 28

How to build the box

β€’ Let $Z = (Z_1, \ldots, Z_l)$ be the seed, $Z_i \sim \mathrm{Exp}(1)$ i.i.d.

β€’ Query $Q$, output $U = \arg\min_u \dfrac{Z_u}{Q(u)}$

β€’ $\mathbf{P}(U = u) = \dfrac{Q(u)}{Q(1) + \cdots + Q(l)} = Q(u)$  OK!

β€’ Gives the same $U$ for the same $Q$, since $U$ is a function of $Q$ and $Z$ (fixed at the beginning)  OK!

β€’ Small changes to $Q$ are unlikely to affect $\arg\min_u \dfrac{Z_u}{Q(u)}$  OK!

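A minimal Python sketch of this construction (alphabet size and query distributions are illustrative choices): draw the Exp(1) seed once, answer every query with $\arg\min_u Z_u/Q(u)$, and check that the output frequencies match $Q$ over fresh seeds:

```python
import math, random

l = 4
seed = [random.expovariate(1.0) for _ in range(l)]  # Z_u ~ Exp(1), drawn once

def query(Q):
    """Return U = argmin_u Z_u / Q(u); the same fixed seed answers every query."""
    return min(range(l), key=lambda u: seed[u] / Q[u] if Q[u] > 0 else math.inf)

Q1 = [0.1, 0.2, 0.3, 0.4]
Q2 = [0.12, 0.18, 0.3, 0.4]             # a similar distribution
print(query(Q1), query(Q1), query(Q2))  # same Q gives same U; similar Q usually agrees

# sanity check: over fresh seeds, P(U = u) should be about Q(u)
counts = [0] * l
for _ in range(100_000):
    z = [random.expovariate(1.0) for _ in range(l)]
    counts[min(range(l), key=lambda u: z[u] / Q1[u])] += 1
print([c / 100_000 for c in counts])    # ~ [0.1, 0.2, 0.3, 0.4]
```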

Page 29

How to build the box

β€’ Let $Z = (Z_1, \ldots, Z_l)$ be the seed, $Z_i \sim \mathrm{Exp}(1)$ i.i.d.

β€’ Query $Q$, output $U = \arg\min_u \dfrac{Z_u}{Q(u)}$

β€’ If $l = 2$, then $U = 1$ iff

$$\frac{Z_1}{Q(1)} < \frac{Z_2}{Q(2)} \iff \frac{Z_1}{Z_1 + Z_2} < Q(1)$$

[Figure: the point $\frac{Z_1}{Z_1 + Z_2}$ on the interval $[0, 1]$; $U = 1$ iff it falls to the left of $Q(1) = \mathbf{P}_{X \sim Q}(X = 1)$, and $U = 2$ otherwise]

Page 30

Poisson matching lemma

β€’ Let $Z = (Z_1, \ldots, Z_l)$ be the seed, $Z_i \sim \mathrm{Exp}(1)$ i.i.d.

β€’ Query $Q$, output $U_Q = \arg\min_u \dfrac{Z_u}{Q(u)}$

β€’ Poisson matching lemma [Li-Anantharam 2018]: If we query $P, Q$ to get $U_P, U_Q$ respectively, then

$$\mathbf{P}(U_Q \ne U_P \mid U_P) \le \frac{P(U_P)}{Q(U_P)}$$

C. T. Li and A. El Gamal, β€œStrong functional representation lemma and applications to coding theorems,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 6967–6978, 2018.

C. T. Li and V. Anantharam, β€œA unified framework for one-shot achievability via the Poisson matching lemma,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.
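An empirical check of the lemma for two small hand-picked distributions (illustrative values): estimate $\mathbf{P}(U_Q \ne U_P \mid U_P = u)$ over shared seeds and compare it with $P(u)/Q(u)$:

```python
import random

P = [0.7, 0.2, 0.1]
Q = [0.4, 0.4, 0.2]
l, trials = len(P), 200_000
hits, mismatch = [0] * l, [0] * l

for _ in range(trials):
    z = [random.expovariate(1.0) for _ in range(l)]   # shared seed for both queries
    uP = min(range(l), key=lambda u: z[u] / P[u])
    uQ = min(range(l), key=lambda u: z[u] / Q[u])
    hits[uP] += 1
    mismatch[uP] += (uQ != uP)

for u in range(l):
    # empirical conditional mismatch probability vs. the lemma's bound P(u)/Q(u)
    print(u, mismatch[u] / hits[u], "<=", P[u] / Q[u])
```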

Page 31

A general black box

β€’ Operation: Query distribution 𝑄, get π‘ˆ~𝑄

β€’ Guarantee: If we query $P, Q$ to get $U_P, U_Q$ respectively, then

$$\mathbf{P}(U_Q \ne U_P \mid U_P) \le \frac{P(U_P)}{Q(U_P)}$$

β€’ We can use this box alone to prove many tight one-shot/finite-blocklength/asymptotic coding results


Page 32

β€’ Let π‘ˆ = (𝑋,𝑀)

β€’ Encoding: Query 𝑄 = 𝑃𝑋 Γ— 𝛿𝑀, get (𝑋,𝑀)

β€’ Decoding: Query 𝑄 = 𝑃𝑋|π‘Œ Γ— 𝑃𝑀, get ( 𝑋, 𝑀)

β€’ Poisson matching lemma:𝐏 𝑀 β‰  𝑀 ≀ 𝐄 𝐏 𝑀 β‰  𝑀 𝑀,𝑋, π‘Œ

≀ 𝐄 min(𝑃𝑋×𝛿𝑀)(𝑋,𝑀)

(𝑃𝑋|π‘ŒΓ—π‘ƒπ‘€)(𝑋,𝑀), 1

= 𝐄 min𝑃𝑋(𝑋)

𝑃𝑋|π‘Œ(𝑋|π‘Œ)/π‘˜, 1

= 𝐄 min π‘˜2βˆ’πœ„π‘‹;π‘Œ(𝑋;π‘Œ), 1


C. T. Li and V. Anantharam, β€œA unified framework for one-shot achievability via the Poisson matching lemma,” IEEE Trans. Inf. Theory, vol. 67, no. 5, pp. 2624–2651, 2021.

Page 33

Channel coding – removing the box

β€’ The box contains a random seed

β€’ In reality, the encoder and decoder cannot share common randomness

β€’ $P_e \le \mathbf{E}[\min\{k 2^{-\iota_{X;Y}(X;Y)},\, 1\}]$ holds averaged over choices of the seed

β€’ Hence there exists a fixed seed s.t. $P_e \le \mathbf{E}[\min\{k 2^{-\iota_{X;Y}(X;Y)},\, 1\}]$

[Diagram: channel coding as before, but with a fixed box about $(X, M)$: encoder queries $Q = P_X \times \delta_M$, decoder queries $Q = P_{X|Y} \times P_M$]

Page 34

Second-order asymptotics

β€’ $P_e \le \mathbf{E}\left[\min\{2^{L - \sum_{i=1}^n \iota_{X;Y}(X_i; Y_i)},\, 1\}\right]$, where $(X_i, Y_i) \sim p_X p_{Y|X}$ i.i.d. and $M \sim \mathrm{Unif}\{1, \ldots, 2^L\}$

β€’ $P_e \approx 0$ if $L \ll \sum_{i=1}^n \iota(X_i; Y_i)$, and $P_e \approx 1$ if $L \gg \sum_{i=1}^n \iota(X_i; Y_i)$
  β€’ First-order: optimal $L \approx n I(X; Y)$

β€’ Central limit theorem: $\sum_{i=1}^n \iota(X_i; Y_i)$ approximately follows $N(nI(X; Y), nV)$, where $V = \mathrm{Var}[\iota(X; Y)]$

β€’ For a fixed $P_e = \epsilon$, the optimal $L \approx n I(X; Y) - \sqrt{nV}\, Q^{-1}(\epsilon)$, where $Q^{-1}(\epsilon)$ is the inverse of the Q-function ($Q(\gamma) = 1 - \Phi(\gamma)$, $\Phi$ is the cdf of $N(0, 1)$)

β€’ The $V$ when $p_X$ is the capacity-achieving distribution (that maximizes $I(X; Y)$) is called the channel dispersion

[Diagram: $M \sim \mathrm{Unif}\{1, \ldots, 2^L\}$ β†’ Enc β†’ $X^n$ β†’ Channel $p_{Y|X}$ β†’ $Y^n$ β†’ Dec β†’ $\hat M$]

Y. Polyanskiy, H. V. Poor, and S. VerdΓΊ, β€œChannel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
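A small Python sketch of the normal approximation $L \approx nI - \sqrt{nV}\,Q^{-1}(\epsilon)$ for the binary symmetric channel with uniform (capacity-achieving) input; the crossover probability and target error below are illustrative choices:

```python
import math
from statistics import NormalDist

def bsc_second_order(n, p=0.11, eps=1e-3):
    """Normal-approximation estimate of the optimal log-number of messages L (bits)."""
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # binary entropy h(p)
    C = 1 - h                                              # capacity of BSC(p)
    V = p * (1 - p) * math.log2((1 - p) / p) ** 2          # dispersion of BSC(p)
    q_inv = NormalDist().inv_cdf(1 - eps)                  # Q^{-1}(eps)
    return n * C - math.sqrt(n * V) * q_inv

for n in (100, 500, 2000):
    print(n, bsc_second_order(n))  # backoff from nC shrinks relative to n
```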

Page 35

[Figure: the distribution of $\sum_{i=1}^n \iota(X_i; Y_i)$, approximately Gaussian with mean $nI(X; Y)$ and standard deviation $\sqrt{nV}$; for a fixed error probability, the cutoff point $L$ lies $\gamma\sqrt{nV}$ below the mean (second order); error prob. $\approx \mathbf{P}(\sum_{i=1}^n \iota(X_i; Y_i) \le L)$]