
Nonstochastic Information Theory for Feedback Control

Girish Nair

Department of Electrical and Electronic Engineering, University of Melbourne

Australia

Dutch Institute of Systems and Control Summer School, Zandvoort aan Zee, The Netherlands

June 2015


Outline

1 Basics of Shannon Information Theory

2 Overview of Capacity-Limited State Estimation/Control

3 Motivation for Nonstochastic Control

4 Uncertain Variables, Unrelatedness and Markovness

5 Nonstochastic Information

6 Channels and Coding Theorems

7 State Estimation and Control via Noisy Channels


What is Information?

Wikipedia: Information is that which informs, i.e. that from which data and knowledge can be derived (as data represents values attributed to parameters, while knowledge is acquired through understanding of real things or abstract concepts, in any particular field of study).

In [Shannon BSTJ 1948], information was somewhat more concretely defined, within a probability space.


Shannon Entropy

Prior uncertainty or entropy of a discrete random variable (rv) X ∼ p_X:

H[X] := E[ log2 (1/p_X(X)) ] = −∑_x p_X(x) log2 p_X(x) ≥ 0.

Minimum expected number of yes/no questions sufficient to determine X.

Joint (discrete) entropy H[X,Y] is defined by replacing p_X with p_{X,Y}.

Conditional entropy of X given Y is the average uncertainty in X given Y:

H[X|Y] := E[ log2 (1/p_{X|Y}(X|Y)) ] ≡ H[X,Y] − H[Y]   (≥ 0).
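These identities are easy to check numerically. Below is a minimal sketch (not from the slides) that computes H[X], H[X,Y] and H[X|Y] = H[X,Y] − H[Y] for a small assumed joint pmf, with outcomes and probabilities chosen purely for illustration.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Assumed toy joint pmf p_{X,Y}(x, y); marginals are obtained by summing out.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

H_x, H_y, H_xy = entropy(p_x), entropy(p_y), entropy(p_xy)
H_x_given_y = H_xy - H_y          # chain rule: H[X|Y] = H[X,Y] - H[Y]
print(H_x, H_xy, H_x_given_y)     # H[X|Y] <= H[X], with equality iff X, Y independent
```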


Shannon Information

Information gained about X from Y := reduction in uncertainty:

I[X;Y] := H[X] − H[X|Y].

Called mutual information since it is symmetric:

I[X;Y] = −∑_{x,y} p_{X,Y}(x,y) log2 ( p_X(x) p_Y(y) / p_{X,Y}(x,y) ) ≡ H[X] + H[Y] − H[X,Y].

For continuous X, Y, replace the pmf's p with the corresponding pdf's f.
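As a quick sanity check, here is a minimal sketch (using the same assumed toy joint pmf as above) that evaluates I[X;Y] directly from the definition and confirms it matches H[X] + H[Y] − H[X,Y].

```python
import math

# Assumed toy joint pmf p_{X,Y}(x, y), for illustration only.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in xs}
p_y = {y: sum(p for (_, b), p in p_xy.items() if b == y) for y in ys}

def H(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# I[X;Y] = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]
I_direct = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)
print(I_direct, H(p_x) + H(p_y) - H(p_xy))   # the two values agree; 0 iff independent
```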


Codes for Stationary Memoryless Random Channels

[Block diagram: a message M enters the coder, which emits channel inputs X_k; the stationary memoryless channel q(y_k | x_k) produces outputs Y_k, from which the decoder forms M̂.]

A block code is defined by
- an error tolerance ε > 0, a block length n+1 ∈ ℕ and a message-set cardinality μ ≥ 1;
- an encoder mapping γ s.t. for any independent, uniformly distributed message M ∈ {m_1, ..., m_μ},

X_{0:n} = γ(i) if M = m_i;

- and a decoder M̂ = δ(Y_{0:n}) s.t. Pr[M̂ ≠ M] ≤ ε.
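For concreteness, here is a minimal sketch of one such block code (an assumption chosen for illustration, not a construction from the slides): a repetition code for a one-bit message over a binary symmetric channel, with a majority-vote decoder, whose empirical error probability can be driven below any ε by increasing the block length (at the cost of rate → 0).

```python
import random

def encode(bit, block_len):
    return [bit] * block_len                      # encoder gamma: message -> X_{0:n}

def channel(x, crossover, rng):
    """Binary symmetric channel: each input bit is flipped with prob. `crossover`."""
    return [xi ^ (rng.random() < crossover) for xi in x]

def decode(y):
    return int(sum(y) > len(y) / 2)               # decoder delta: majority vote

rng = random.Random(0)
block_len, crossover, trials = 7, 0.1, 10_000
errors = 0
for _ in range(trials):
    m = rng.randint(0, 1)
    errors += decode(channel(encode(m, block_len), crossover, rng)) != m
print(errors / trials)    # empirical Pr[M_hat != M]; decreases as block_len grows
```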


Capacity and Information

Define (ordinary) capacity C operationally as the highest block-coding rate that yields vanishing error probability:

C := lim_{ε→0} sup_{n,μ∈ℕ, γ,δ} log2(μ)/(n+1) = lim_{ε→0} lim_{n→∞} sup_{μ∈ℕ, γ,δ} log2(μ)/(n+1).

Shannon showed that capacity can also be thought of intrinsically, as the maximum information rate across the channel:

Theorem (Shannon BSTJ 1948)

C = sup_{n≥0, p_{X_{0:n}}} I[X_{0:n};Y_{0:n}]/(n+1)   (= sup_{p_X} I[X;Y])
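For a binary symmetric channel the single-letter form sup_{p_X} I[X;Y] can be evaluated directly. A minimal sketch (the crossover probability below is an assumption for illustration): a crude grid search over input distributions recovers the closed-form value C = 1 − h2(ε_ch), attained at the uniform input.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi_bsc(p1, eps_ch):
    """I[X;Y] for a binary symmetric channel with crossover eps_ch and P(X=1) = p1."""
    py1 = p1 * (1 - eps_ch) + (1 - p1) * eps_ch
    return h2(py1) - h2(eps_ch)                  # I[X;Y] = H[Y] - H[Y|X]

eps_ch = 0.11                                    # assumed crossover probability
best = max(mi_bsc(k / 1000, eps_ch) for k in range(1001))   # grid search over p_X
print(best, 1 - h2(eps_ch))                      # both approximately equal C
```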


Networked State Estimation/Control

Classical assumption: controllers and estimators knew plant outputs perfectly.
Since the 60's this assumption has been challenged by:
- delays, due to latency and intermittent channel access, in controller area networks;
- quantisation errors in digital control;
- finite communication capacity per sensor in long-range radar surveillance networks.
The focus here is on limited quantiser resolution and capacity.


Estimation/Control over Communication Channels

[Figure: two networked setups sharing the plant model X_{k+1} = A X_k + B U_k + V_k, Y_k = G X_k + W_k, with noise signals (V_k, W_k). State estimation: the plant output Y_k is mapped by a quantiser/coder into channel symbols S_k; the channel delivers Q_k to a decoder/estimator, which produces the state estimate X̂_k. Control: the same quantiser/coder-channel path instead feeds a decoder/controller, which generates the plant input U_k, closing the loop over the channel.]


Additive Noise Model

Early work considered errorless digital channels and static quantisers, with uniform quantiser errors modelled as additive, uncorrelated noise [e.g. Curry 1970] with variance ∝ 2^(−2R) (R = bit rate).
Good approximation for stable plants and high R, and allows linear stochastic estimation/control theory to be applied.
However, for unstable plants it leads to conclusions that are wrong, e.g.
- if the plant is noiseless and unstable, then states/estimation errors cannot converge to 0;
- and if the plant is unstable, then mean-square-bounded states/estimation errors can always be achieved.
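The 2^(−2R) scaling is just the usual Δ²/12 variance of a uniform quantisation error. A minimal numerical check (assuming, for illustration, a uniform quantiser on [−1, 1] driven by uniformly distributed inputs):

```python
import random

def quantise(x, rate_bits):
    """Mid-point uniform quantiser on [-1, 1] with 2**rate_bits cells (assumed setup)."""
    levels = 2 ** rate_bits
    step = 2.0 / levels
    idx = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (idx + 0.5) * step

rng = random.Random(1)
for R in (4, 6, 8):
    errs = []
    for _ in range(100_000):
        x = rng.uniform(-1.0, 1.0)
        errs.append(x - quantise(x, R))
    var = sum(e * e for e in errs) / len(errs)
    # Theory: var ~ step^2 / 12 = (2 * 2**-R)**2 / 12, i.e. proportional to 2**(-2R)
    print(R, var, (2.0 * 2 ** -R) ** 2 / 12)
```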


Errorless Channels - Data Rate Theorem

In fact, coding-based analyses reveal that stable state estimation/control is possible iff

R > ∑_{|λ_i| ≥ 1} log2 |λ_i|,

where λ_1, ..., λ_n are the eigenvalues of the plant matrix A.

Holds under various assumptions and stability notions:
- Random initial state, noiseless plant; mean r-th power convergence to 0 [N.-Evans, Auto. 03]
- Bounded initial state, noiseless plant; uniform convergence to 0 [Tatikonda-Mitter, TAC 04]
- Random plant noise; mean-square boundedness [N.-Evans, SICON 04]
- Bounded plant noise; uniform boundedness [Tatikonda-Mitter, TAC 04]

Additive uncorrelated noise models of quantisation fail to capture the existence of such a threshold.
Necessity is typically proved using differential entropy power, quantisation theory or volume-partitioning bounds.
Sufficiency via explicit construction.
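Computing the threshold is a one-liner once the plant matrix is known. A minimal sketch (the matrix A below is an assumption for illustration only):

```python
import numpy as np

# Example plant matrix (assumed); its eigenvalues are 2, 0.5 and -1.5.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, -1.5]])

eigvals = np.linalg.eigvals(A)
threshold = sum(np.log2(abs(lam)) for lam in eigvals if abs(lam) >= 1)
print(threshold)   # bits/sample: log2(2) + log2(1.5) ~ 1.585; R must exceed this
```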


Noisy Channels

‘Stable’ states/estimation errors are possible iff a suitable channel figure-of-merit (FoM) satisfies

FoM > ∑_{|λ_i| ≥ 1} log2 |λ_i|,

where λ_1, ..., λ_n are the eigenvalues of the plant matrix A.

Unlike the noiseless-channel case, the FoM depends critically on the stability notion and noise model:
- FoM = C: states/est. errors → 0 almost surely (a.s.) [Matveev-Savkin SIAM07], or mean-square bounded (MSB) states over an AWGN channel [Braslavsky et al. TAC07]
- FoM = C_any (anytime capacity): MSB states over random discrete memoryless channels [Sahai-Mitter TIT06]
- FoM = C_0f (zero-error feedback capacity) for control, or C_0 (zero-error capacity) for state estimation, with a.s. bounded states/est. errors [Matveev-Savkin IJC07]

As C ≥ C_any ≥ C_0f ≥ C_0, these criteria generally do not coincide.


Missing Information

If the goal is MSB or a.s. convergence to 0 of states/estimation errors, then information theory is crucial for finding lower bounds.
However, when the goal is a.s. bounded states/errors, classical information theory has played no role so far in networked estimation/control.
Yet information in some sense must be flowing across the channel, even without a probabilistic model/objective.


Questions

Is there a meaningful theory of information for nonrandom variables?
Can we construct an information-theoretic basis for networked estimation/control with nonrandom noise?
Are there intrinsic, information-theoretic interpretations of C_0 and C_0f?


Why Nonstochastic Anyway?

Long tradition in control of treating noise as a nonrandom perturbation with bounded magnitude, energy or power:

Control systems usually have mechanical/chemical components, as well as electrical.
- Dominant disturbances may not be governed by known probability distributions.
- E.g. in mechanical systems, the main disturbance may be vibrations at resonant frequencies determined by machine dimensions and material properties.

In contrast, communication systems are mainly electrical/electromagnetic/optical.
- Dominant disturbances (thermal noise, shot noise, fading, etc.) are well modelled by probability distributions derived from statistical/quantum physics.


Why Nonstochastic Anyway? (cont.)

Related to the previous points:
- In most digital comm. systems, bit periods T_b ≈ 2×10^(−5) s or shorter.
  ⇒ Thermal and shot noise (σ ∝ √T_b) is noticeable compared to detected signal amplitudes (∝ T_b).
- Control systems typically operate with longer sample or bit periods, 10^(−2) or 10^(−3) s.
  ⇒ Thermal/shot noise is negligible compared to signal amplitudes.


Why Nonstochastic Anyway? (cont.)

For safety or mission-critical reasons, stability and performance guarantees are often required every time a control system is used, provided disturbances stay within rated bounds.
Especially if the plant is unstable or marginally stable.
Or if we wish to interconnect several control systems and still be sure of performance.
In contrast, most consumer-oriented communication requires good performance only on average, or with high probability.
Occasional violations of specifications are permitted, and cannot be prevented within a probabilistic framework.


Probability in Practice

‘If there’s a fifty-fifty chance that something can go wrong, nine out of ten times it will.’

(attrib. L. ‘Yogi’ Berra, former US baseball player)

(Photo from Wikipedia)


Uncertain Variable Formalism

Define an uncertain variable (uv) X to be a mapping from an underlying sample space Ω to a space 𝕏.
Each ω ∈ Ω may represent a specific combination of noise/input signals into a system, and X may represent a state/output variable.
For a given ω, x = X(ω) is the realisation of X.

Unlike probability theory, no σ-algebra ⊂ 2^Ω or measure on Ω is imposed.
Assume Ω is uncountable, to accommodate continuous 𝕏.


UV Formalism - Ranges and Conditioning

Marginal range ⟦X⟧ := {X(ω) : ω ∈ Ω} ⊆ 𝕏.
Joint range ⟦X,Y⟧ := {(X(ω), Y(ω)) : ω ∈ Ω} ⊆ 𝕏 × 𝕐.
Conditional range ⟦X|y⟧ := {X(ω) : Y(ω) = y, ω ∈ Ω}.

In the absence of statistical structure, the joint range fully characterises the relationship between X and Y. Note that

⟦X,Y⟧ = ⋃_{y ∈ ⟦Y⟧} ⟦X|y⟧ × {y},

i.e. the joint range is given by the conditional and marginal ranges, analogously to probability theory.


Independence Without Probability

Definition
The uv's X, Y are called (mutually) unrelated if

⟦X,Y⟧ = ⟦X⟧ × ⟦Y⟧,   (1)

denoted X ⊥ Y. Else they are called related.

Equivalent characterisation:

Proposition
The uv's X, Y are unrelated iff

⟦X|y⟧ = ⟦X⟧, ∀y ∈ ⟦Y⟧.   (2)

Unrelatedness is equivalent to X and Y inducing qualitatively independent [Rényi '70] partitions of Ω when Ω is finite.
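A minimal sketch of these definitions on a finite sample space (all names and values below are assumptions chosen for illustration): uv's are stored as plain dictionaries ω ↦ value, and unrelatedness is checked both via (1) and via (2).

```python
from itertools import product

# Assumed finite sample space and two uv's defined on it.
Omega = ["w1", "w2", "w3", "w4"]
X = {"w1": 0, "w2": 0, "w3": 1, "w4": 1}          # uv X: Omega -> {0, 1}
Y = {"w1": "a", "w2": "b", "w3": "a", "w4": "b"}  # uv Y: Omega -> {"a", "b"}

range_X = {X[w] for w in Omega}                                   # [[X]]
range_Y = {Y[w] for w in Omega}                                   # [[Y]]
range_XY = {(X[w], Y[w]) for w in Omega}                          # [[X,Y]]
cond_X = {y: {X[w] for w in Omega if Y[w] == y} for y in range_Y} # [[X|y]]

unrelated_by_1 = range_XY == set(product(range_X, range_Y))       # condition (1)
unrelated_by_2 = all(c == range_X for c in cond_X.values())       # condition (2)
print(unrelated_by_1, unrelated_by_2)   # both True for this example
```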


Examples of Relatedness and Unrelatedness

[Figure: two joint ranges ⟦X,Y⟧ sketched in the (x, y) plane. (a) X, Y related: for some conditioning values x', y', the conditional ranges are strict subsets, ⟦Y|x'⟧ ⊂ ⟦Y⟧ and ⟦X|y'⟧ ⊂ ⟦X⟧. (b) X, Y unrelated: ⟦Y|x'⟧ = ⟦Y⟧ and ⟦X|y'⟧ = ⟦X⟧ for every x', y', so the joint range is the full product ⟦X⟧ × ⟦Y⟧.]


Markovness without Probability

Definition
X, Y, Z are said to form a Markov uncertainty chain X − Y − Z if

⟦X|y,z⟧ = ⟦X|y⟧, ∀(y,z) ∈ ⟦Y,Z⟧.   (3)

Equivalent to

⟦X,Z|y⟧ = ⟦X|y⟧ × ⟦Z|y⟧, ∀y ∈ ⟦Y⟧,

i.e. X, Z are conditionally unrelated given Y, in other words X ⊥ Z | Y.

X, Y, Z are said to form a conditional Markov uncertainty chain given W if X − (Y,W) − Z.
Also written X − Y − Z | W or X ⊥ Z | (Y,W).
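Condition (3) is straightforward to verify mechanically on a finite sample space. A minimal sketch (the uv's below are assumptions for illustration; here X depends on Y only through its parity, so the chain holds):

```python
# Assumed finite sample space and three uv's on it.
Omega = list(range(8))
X = {w: w % 2 for w in Omega}     # X is a function of Y (its parity), so X - Y - Z holds
Y = {w: w % 4 for w in Omega}
Z = {w: w // 4 for w in Omega}

def cond_range(U, conditions):
    """[[U | conditions]]: realisations of U on the omegas matching all (uv, value) pairs."""
    return {U[w] for w in Omega if all(V[w] == v for V, v in conditions)}

range_YZ = {(Y[w], Z[w]) for w in Omega}
is_markov = all(cond_range(X, [(Y, y), (Z, z)]) == cond_range(X, [(Y, y)])
                for (y, z) in range_YZ)
print(is_markov)   # True: [[X|y,z]] = [[X|y]] for all (y, z) in [[Y,Z]]
```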


Information without Probability

Definition
Two points (x,y), (x',y') ∈ ⟦X,Y⟧ are called taxicab connected, written (x,y) ↔ (x',y'), if ∃ a sequence

(x,y) = (x_1,y_1), (x_2,y_2), ..., (x_{n−1},y_{n−1}), (x_n,y_n) = (x',y')

of points in ⟦X,Y⟧ such that each point differs in only one coordinate from its predecessor.

Not hard to see that ↔ is an equivalence relation on ⟦X,Y⟧.
Call its equivalence classes a taxicab partition T[X;Y] of ⟦X,Y⟧.
Define a nonstochastic information index

I*[X;Y] := log2 |T[X;Y]| ∈ [0,∞].   (4)
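A minimal sketch of how T[X;Y] and I* can be computed for a finite joint range (the joint range below is an assumption for illustration): treat points as nodes, join two points when they differ in exactly one coordinate, and count connected components.

```python
import math
from itertools import combinations

# Assumed finite joint range [[X,Y]].
joint_range = {(0, "a"), (0, "b"), (1, "b"), (2, "c"), (3, "c"), (3, "d")}

points = sorted(joint_range)
parent = {p: p for p in points}
def find(p):
    while parent[p] != p:
        p = parent[p]
    return p

# Join points that differ in exactly one coordinate; chains of such steps define
# taxicab connectivity, and the resulting components form T[X;Y].
for p, q in combinations(points, 2):
    if (p[0] == q[0]) != (p[1] == q[1]):
        parent[find(p)] = find(q)

classes = {find(p) for p in points}
print(len(classes), math.log2(len(classes)))   # |T[X;Y]| = 2 here, so I* = 1 bit
```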


Connection to Common Random Variables

T[X;Y] is also called the ergodic decomposition [Gács-Körner PCIT72].
For discrete X, Y, the elements of T[X;Y] are the connected components of [Wolf-Wullschleger itw04], which were shown there to be the maximal common rv Z*, i.e.
- Z* = f*(X) = g*(Y) under suitable mappings f*, g* (since points in distinct sets in T[X;Y] are not taxicab-connected);
- if another rv Z ≡ f(X) ≡ g(Y), then Z ≡ k(Z*) (since all points in the same set in T[X;Y] are taxicab-connected).

Not hard to see that Z* also has the largest number of distinct values of any common rv Z ≡ f(X) ≡ g(Y).
I*[X;Y] = Hartley entropy of Z*.
Maximal common rv's were first described in the brief paper ‘The lattice theory of information’ [Shannon TIT53].


Examples

[Figure: two discrete joint ranges. Left: the taxicab partition has |T| = 2 classes (labelled z = 0 and z = 1), i.e. 2 = the maximum number of distinct values that can always be agreed on from separate observations of X and Y. Right: |T| = 1 (a single class z = 0), so only one value can always be agreed on.]


Equivalent View via Overlap Partitions

As in probability, it is often easier to work with conditional rather than joint ranges.
Let ⟦X|Y⟧ := {⟦X|y⟧ : y ∈ ⟦Y⟧} be the conditional range family.

Definition
Two points x, x' are called ⟦X|Y⟧-overlap-connected if ∃ a sequence of sets B_1, ..., B_n ∈ ⟦X|Y⟧ s.t.
- x ∈ B_1 and x' ∈ B_n;
- B_i ∩ B_{i+1} ≠ ∅, ∀i ∈ [1 : n−1].

Overlap connectedness is an equivalence relation on ⟦X⟧, induced by ⟦X|Y⟧.
Let the overlap partition ⟦X|Y⟧* of ⟦X⟧ denote the equivalence classes.


Equivalent View via Overlap Partitions (cont.)

Proposition
For any uv's X, Y,

I*[X;Y] = log2 |⟦X|Y⟧*|.   (5)

Proof sketch:
- For any two points (x,y), (x',y') ∈ ⟦X,Y⟧, (x,y) ↔ (x',y') iff x and x' are ⟦X|Y⟧-overlap-connected.
- This allows us to set up a bijection between the partitions T[X;Y] and ⟦X|Y⟧*.
- ⇒ T[X;Y] and ⟦X|Y⟧* must have the same cardinality.
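Equation (5) can be verified numerically on the same assumed finite joint range used earlier: build the conditional range family ⟦X|Y⟧, merge overlapping sets until the family stabilises, and compare the number of resulting classes with the taxicab count (both are 2 here).

```python
import math

# Same assumed finite joint range as in the earlier taxicab example.
joint_range = {(0, "a"), (0, "b"), (1, "b"), (2, "c"), (3, "c"), (3, "d")}
ys = {y for _, y in joint_range}
cond_family = [{x for x, yy in joint_range if yy == y} for y in ys]   # [[X|y]] sets

# Repeatedly merge any two blocks that overlap; the fixed point is [[X|Y]]*.
blocks = [set(b) for b in cond_family]
merged = True
while merged:
    merged = False
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if blocks[i] & blocks[j]:
                blocks[i] |= blocks.pop(j)
                merged = True
                break
        if merged:
            break

print(len(blocks), math.log2(len(blocks)))   # 2 classes ({0,1} and {2,3}): I* = 1 bit
```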


Properties of I*

(Nonnegativity) I*[X;Y] ≥ 0 (obvious).
(Symmetry) I*[X;Y] = I*[Y;X]. Follows from the fact that

(x,y) ↔ (x',y') in ⟦X,Y⟧ ⟺ (y,x) ↔ (y',x') in ⟦Y,X⟧.   (6)

From this property and (5), knowing just one of the conditional range families ⟦X|Y⟧ or ⟦Y|X⟧ is enough to determine I*[X;Y].
Unlike ordinary mutual information.


Properties of I* (cont.)

Proposition (Monotonicity)
For any uv's X, Y, Z,

I*[X;Y,Z] ≥ I*[X;Y].   (7)

Proof: the idea is to find a surjection from ⟦X|Y,Z⟧* onto ⟦X|Y⟧*. This would automatically imply that the latter cannot have greater cardinality.
- Pick any set B ∈ ⟦X|Y,Z⟧* and choose a B' ∈ ⟦X|Y⟧* s.t. B ∩ B' ≠ ∅.
- At least one such B' exists, since ⟦X|Y⟧* covers ⟦X⟧ ⊇ B.


Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting B' ∈ ⟦X|Y⟧* exists for each B ∈ ⟦X|Y,Z⟧*, since B ⊆ B':
- By definition, any x ∈ B and x' ∈ B ∩ B' are connected by a sequence of successively overlapping sets in ⟦X|Y,Z⟧.
- As ⟦X|y,z⟧ ⊆ ⟦X|y⟧, x, x' are also connected by a sequence of successively overlapping sets in ⟦X|Y⟧.
- But B' = all points that are ⟦X|Y⟧-overlap-connected with the representative point x' ∈ B', so x ∈ B'.
- As x was arbitrary, B ⊆ B'.

Thus B ↦ B' is a well-defined map from ⟦X|Y,Z⟧* to ⟦X|Y⟧*.
It is also onto since, as noted before, every set B' ∈ ⟦X|Y⟧* intersects some B in ⟦X|Y,Z⟧*, which covers ⟦X⟧.
So B ↦ B' is the required surjection from ⟦X|Y,Z⟧* onto ⟦X|Y⟧*. ∎


Properties of I* (cont.)

Proposition (Data Processing)
For Markov uncertainty chains X − Y − Z (3),

I*[X;Z] ≤ I*[X;Y].

Proof:
By monotonicity and the overlap-partition characterisation of I*,

I*[X;Z] ≤ I*[X;Y,Z] = log2 |⟦X|Y,Z⟧*|,   (8)

where the inequality is (7) and the equality is (5).
- By Markovness (3), ⟦X|y,z⟧ = ⟦X|y⟧, ∀y ∈ ⟦Y⟧ and z ∈ ⟦Z|y⟧.
- ⇒ ⟦X|Y,Z⟧ = ⟦X|Y⟧.
- ⇒ ⟦X|Y,Z⟧* = ⟦X|Y⟧*.
- Substitute into the RHS of the equation above. ∎


Stationary Memoryless Uncertain Channels - Take 1

An uncertain signal X is a mapping from Ω to the space 𝕏^∞ of discrete-time sequences x = (x_i)_{i=0}^∞ in 𝕏.

A stationary memoryless uncertain channel may be defined in terms of
- input and output spaces 𝕏, 𝕐;
- a set-valued transition function T : 𝕏 → 2^𝕐;
- and the family of all uncertain input-output signal pairs (X, Y) s.t.

⟦Y_k | x_{0:k}, y_{0:k−1}⟧ = ⟦Y_k | x_k⟧ = T(x_k), k ∈ ℤ_{≥0}.   (9)

If the channel is ‘used without feedback’, then impose the extra constraint

⟦X_k | x_{0:k−1}, y_{0:k−1}⟧ = ⟦X_k | x_{0:k−1}⟧, k ∈ ℤ_{≥0},   (10)

on (X, Y).


Channel Noise?

The previous formulation parallels [Massey isit90] for stationary memoryless stochastic channels:

f_{Y_k | X_{0:k}, Y_{0:k−1}}(y_k | x_{0:k}, y_{0:k−1}) = f_{Y_k | X_k}(y_k | x_k) ≡ q(y_k, x_k).

In many cases, it is enough to think in terms of these conditional ranges, with the channel noise left implicit.
However, it is often convenient to model channel noise explicitly, e.g.
- when the transmitter has access to some function of past channel noise, not just past channel outputs,
- or when the channel is part of a larger system, with other input and noise signals.
In that case, the previous formulation would have to be changed to include the other terms in the conditioning arguments.


Channel as Noisy Function

Definition
A stationary memoryless uncertain channel (SMUC) consists of
- an unrelated, identically spread (uis) noise signal V = (V_k)_{k=0}^∞ taking values in a space 𝕍, i.e.

⟦V_k | v_{0:k−1}⟧ = ⟦V_k⟧ = 𝕍, ∀v_{0:k−1} ∈ 𝕍^k, k ∈ ℤ_{≥0};   (11)

- input and output spaces 𝕏, 𝕐, and a transition function t : 𝕏 × 𝕍 → 𝕐;
- and the family G of all uncertain input-output signal pairs (X, Y) s.t. ∀k ∈ ℤ_{≥0},
  - Y_k = t(X_k, V_k),
  - and X_{0:k} ⊥ V_k.

If the channel is used without feedback, then tighten the last condition so that X ⊥ V. This yields a smaller family G_nf ⊂ G.


Zero Error Coding in UV Framework (No Feedback)

[Block diagram: a message M enters the coder, which emits channel inputs X_k; the channel, driven by the noise V_k, produces outputs Y_k, from which the decoder recovers M.]

A zero-error code without feedback is defined by
- a block length n+1 ∈ ℕ;
- a message cardinality μ ≥ 1;
- and an encoder mapping γ : [1 : μ] → 𝕏^(n+1), s.t. for any M ⊥ V taking μ distinct values m_1, ..., m_μ,
  - X_{0:n} = γ(i) if M = m_i;
  - |⟦M | y_{0:n}⟧| = 1, ∀y_{0:n} ∈ ⟦Y_{0:n}⟧.

The last condition is equivalent to the existence of a decoder that always maps Y_{0:n} ↦ M, despite the channel noise.


Zero Error Capacity and $I_*$

Zero-error capacity $C_0$ is defined operationally, as the highest block-coding rate that yields zero errors:

$$C_0 := \sup_{n, \mu \in \mathbb{N},\, g} \frac{\log_2 \mu}{n+1} = \lim_{n \to \infty} \sup_{\mu \in \mathbb{N},\, g} \frac{\log_2 \mu}{n+1}. \qquad (12)$$

Theorem (after N. TAC13)

$$C_0 = \sup_{n \geq 0,\, (X,Y) \in \mathcal{G}_{nf}} \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1} = \lim_{n \to \infty} \left( \sup_{(X,Y) \in \mathcal{G}_{nf}} \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1} \right). \qquad (13)$$

In [Wolf-Wullschleger itw04], $C_0$ was characterised as the largest Shannon entropy rate of the maximal rv $Z_n$ common to discrete $X_{0:n}, Y_{0:n}$.
The key idea here is similar, but nonstochastic and applicable to continuous-valued $X, Y$.
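For finite ranges, $I_*[X_{0:n}; Y_{0:n}] = \log_2 |\llbracket X_{0:n} \,|\, Y_{0:n} \rrbracket_*|$ can be computed by counting the taxicab-connected components of the joint range: two points are linked if they share an input or an output value. A minimal sketch follows (the union-find bookkeeping and the example joint ranges, taken from the toy channel above, are illustrative choices, not from the slides).

from math import log2

def maximin_information(joint_range):
    """I_*[X;Y] = log2(# taxicab-connected components of the joint range [[X, Y]])."""
    pairs = list(joint_range)
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if pairs[i][0] == pairs[j][0] or pairs[i][1] == pairs[j][1]:
                parent[find(i)] = find(j)           # merge overlap-connected points

    return log2(len({find(i) for i in range(len(pairs))}))

# Full input range {0,1,2}: the middle input chains everything together, I_* = 0 bits.
print(maximin_information({(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)}))   # 0.0
# Input range restricted to {0,2}, as the sup over (X,Y) in G_nf allows: I_* = 1 bit.
print(maximin_information({(0, 0), (0, 1), (2, 2), (2, 3)}))                   # 1.0

The second call matches the single-use zero-error code found above, illustrating why the supremum over input signals appears in (13).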

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 37 / 68

Proof: ≥ (Construct a Code)

Pick any $(X, Y) \in \mathcal{G}_{nf}$, $n \in \mathbb{N}$. Let

$$\mu = |\llbracket X_{0:n} \,|\, Y_{0:n} \rrbracket_*| \equiv |\llbracket Y_{0:n} \,|\, X_{0:n} \rrbracket_*|,$$

and index the overlap partition sets:

$$\llbracket X_{0:n} \,|\, Y_{0:n} \rrbracket_* \equiv \{ P_X(z) : z \in [1:\mu] \}, \qquad (14)$$
$$\llbracket Y_{0:n} \,|\, X_{0:n} \rrbracket_* \equiv \{ P_Y(z) : z \in [1:\mu] \}. \qquad (15)$$

Define the uv $Z$ as the unique index s.t. $P_X(Z) \ni X_{0:n}$.
This is also the unique index s.t. $P_Y(Z) \ni Y_{0:n}$.
For each $z \in [1:\mu]$, pick an input sequence $x(z) \in P_X(z) \subseteq \llbracket X_{0:n} \rrbracket$ and define the coder map

$$g(z) = x(z) \in \llbracket X_{0:n} \rrbracket, \quad \forall z \in [1:\mu].$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 38 / 68

Proof: ≥ (cont.)

Now, consider any message $M \perp V$ that can take $\mu$ distinct values $m_1, \ldots, m_\mu$. Encode this message to give an input uv sequence

$$X'_{0:n} = x(i) \ \text{ if } M = m_i.$$

This yields an output sequence $Y'_{0:n}$, where

$$Y'_k = t(X'_k, V_k), \quad k \in [0:n].$$

As $M$ and $X_{0:n}$ are each $\perp V$, it follows that if $M = m_i$ then

$$\llbracket Y'_{0:n} \,|\, X'_{0:n} = x(i) \rrbracket = \llbracket Y_{0:n} \,|\, X_{0:n} = x(i) \rrbracket \subseteq P_Y(i).$$

The sets $P_Y(1), \ldots, P_Y(\mu)$ are disjoint since they form a partition
$\Rightarrow$ the message $M$ can be recovered from $Y'_{0:n}$ with this code.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 39 / 68

Proof: ≥ (cont.)

Thus

$$C_0 \geq \frac{\log_2 \mu}{n+1} = \frac{\log_2 |\llbracket X_{0:n} \,|\, Y_{0:n} \rrbracket_*|}{n+1} = \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1}.$$

As $(X, Y) \in \mathcal{G}_{nf}$ and $n$ were arbitrary,

$$C_0 \geq \sup_{n \geq 0,\, (X,Y) \in \mathcal{G}_{nf}} \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1}.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 40 / 68

Proof: ≤ (Construct $(X, Y) \in \mathcal{G}_{nf}$)

Select an arbitrary zero-error code $(n, \mu, g)$.
Pick a message uv $M \perp V$ taking distinct values $m_1, \ldots, m_\mu$.
Set

$$X_{0:n} = g(i) \ \text{ if } M = m_i,$$
$$X_k = X_n, \quad k > n,$$
$$Y_k = t(X_k, V_k), \quad k \in \mathbb{Z}_{\geq 0}.$$

As $X_{0:n}$ is a function of $M \perp V$, it follows that $X \perp V$.
Thus $(X, Y) \in \mathcal{G}_{nf}$.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 41 / 68

Proof: ≤ (cont.)

By the zero-error property, the sets $\llbracket Y_{0:n} \,|\, X_{0:n} = g(i) \rrbracket$, $i = 1, \ldots, \mu$, are disjoint, and therefore distinct.
Thus each partition set in $\llbracket Y_{0:n} \,|\, X_{0:n} \rrbracket_*$ contains exactly one of these sets:

- It includes at least one set $\llbracket Y_{0:n} \,|\, x_{0:n} \rrbracket$.
- If it included more than one such set then, by definition of the overlap partition, those sets would have to overlap, which is impossible.

$\Rightarrow \mu = |\llbracket Y_{0:n} \,|\, X_{0:n} \rrbracket_*|$.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 42 / 68

Proof: ≤ (cont.)

Thus

$$\frac{\log_2 \mu}{n+1} = \frac{\log_2 |\llbracket Y_{0:n} \,|\, X_{0:n} \rrbracket_*|}{n+1} \leq \sup_{n \geq 0,\, (X,Y) \in \mathcal{G}_{nf}} \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1}.$$

As the zero-error code $(n, \mu, g)$ was arbitrary, we can take a supremum on the LHS to get

$$C_0 \leq \sup_{n \geq 0,\, (X,Y) \in \mathcal{G}_{nf}} \frac{I_*[X_{0:n}; Y_{0:n}]}{n+1}.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 43 / 68

Conditional Maximin Information

Let $\mathcal{T}[X; Y \,|\, w] :=$ the taxicab partition of the conditional joint range $\llbracket X, Y \,|\, w \rrbracket$, given $W = w$.
Then define the conditional nonstochastic information

$$I_*[X; Y \,|\, W] := \min_{w \in \llbracket W \rrbracket} \log_2 |\mathcal{T}[X; Y \,|\, w]|.$$

= Log-cardinality of the most refined variable common to $(X, W)$ and $(Y, W)$ but unrelated to $W$.
I.e. if two agents each observe $X$ and $Y$ separately but also share $W$, then $I_*[X; Y \,|\, W]$ captures the most refined variable that is 'new' with respect to $W$ and on which they can both agree.
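As a small computational sketch for finite ranges (the (x, y, w) triples are hypothetical example data), one can group the joint range by $w$, count taxicab-connected components within each slice, and take the minimum:

from math import log2
from collections import defaultdict

def taxicab_components(pairs):
    """Number of taxicab-connected components of a finite set of (x, y) pairs."""
    pairs = list(pairs)
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if pairs[i][0] == pairs[j][0] or pairs[i][1] == pairs[j][1]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pairs))})

def conditional_maximin_information(triples):
    """I_*[X;Y|W] = min over w of log2 |T[X;Y|w]| for a finite range of (x, y, w)."""
    slices = defaultdict(set)
    for x, y, w in triples:
        slices[w].add((x, y))
    return min(log2(taxicab_components(s)) for s in slices.values())

# Hypothetical joint range: two bits are agreed on when W = 0, only one when W = 1.
triples = {(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 3, 0),
           (0, 0, 1), (1, 0, 1), (2, 2, 1), (3, 2, 1)}
print(conditional_maximin_information(triples))   # 1.0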

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 44 / 68

Zero Error Coding with Feedback

[Block diagram: Message $M$ → Coder → $X_k$ → Channel (noise $V_k$) → $Y_k$ → Decoder → $\hat{M}$, with $Y_{k-1}$ fed back to the Coder through a unit delay]

A zero-error code with feedback is defined by

- a block length $n+1 \in \mathbb{N}$;
- a message cardinality $\mu \geq 1$;
- and a sequence $g_{0:n}$ of encoder mappings s.t. for any message $M \perp V$ taking values $m_1, \ldots, m_\mu$,
  - $X_k = g_k(i, Y_{0:k-1})$ if $M = m_i$;
  - $|\llbracket M \,|\, y_{0:n} \rrbracket| = 1, \ \forall y_{0:n} \in \llbracket Y_{0:n} \rrbracket$.

The last condition is equivalent to the existence of a decoder that can reconstruct $M$ from $Y_{0:n}$ without error (a brute-force harness in the same spirit as before is sketched below).
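A minimal harness (hypothetical, reusing the toy additive-noise channel) showing how feedback coding functions $g_k(i, Y_{0:k-1})$ induce per-message output ranges, whose pairwise disjointness is exactly the zero-error condition; on this particular toy channel the feedback is not actually needed.

from itertools import product

V_RANGE = (0, 1)

def t(x, v):
    return x + v

def feedback_output_range(g_funcs, message_index):
    """All output sequences [[ Y_{0:n} | M = m_i ]] when X_k = g_k(i, Y_{0:k-1})."""
    outputs = set()
    for vs in product(V_RANGE, repeat=len(g_funcs)):
        y = []
        for k, g_k in enumerate(g_funcs):
            x_k = g_k(message_index, tuple(y))   # the coder sees past channel outputs
            y.append(t(x_k, vs[k]))
        outputs.add(tuple(y))
    return outputs

def is_zero_error_feedback(g_funcs, mu):
    ranges = [feedback_output_range(g_funcs, i) for i in range(mu)]
    return all(ranges[i].isdisjoint(ranges[j])
               for i in range(mu) for j in range(i + 1, mu))

# mu = 2, block length 2: send 0 or 2 at every step (feedback ignored in this toy code).
g0 = lambda i, past: 2 * i
g1 = lambda i, past: 2 * i
print(is_zero_error_feedback([g0, g1], mu=2))   # True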

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 45 / 68

C0f and Directed Nonstochastic Information

Zero-error feedback capacity $C_{0f}$ is defined operationally, as the highest feedback coding rate that yields zero errors:

$$C_{0f} := \sup_{n, \mu \in \mathbb{N},\, g_{0:n}} \frac{\log_2 \mu}{n+1} = \lim_{n \to \infty} \sup_{\mu \in \mathbb{N},\, g_{0:n}} \frac{\log_2 \mu}{n+1}. \qquad (16)$$

This is the growth rate of the maximum cardinality of sets of feedback coding functions that can be unambiguously determined from the channel outputs.
Define the directed nonstochastic information

$$I_*[X_{0:n} \to Y_{0:n}] := \sum_{k=0}^{n} I_*[X_{0:k}; Y_k \,|\, Y_{0:k-1}].$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 46 / 68

C0f in terms of Directed Nonstochastic Information

Theorem (N. cdc12)
For a stationary memoryless uncertain channel,

$$C_{0f} = \sup_{n \geq 0,\, (X,Y) \in \mathcal{G}} \frac{I_*[X_{0:n} \to Y_{0:n}]}{n+1}.$$

This parallels the characterisation in [Kim TIT08, Tatikonda-Mitter TIT09] of the ordinary feedback capacity $C_f$ of stochastic channels:

$$C_f = \sup_{n \geq 0,\; p_{X_k | X_{0:k-1}, Y_{0:k-1}},\, 0 \leq k \leq n} \frac{I[X_{0:n} \to Y_{0:n}]}{n+1},$$

where the Marko-Massey directed information $I[X_{0:n} \to Y_{0:n}] := \sum_{k=0}^{n} I[X_{0:k}; Y_k \,|\, Y_{0:k-1}]$, and the conditional information $I[X; Y | Z] := H[X|Z] - H[X|Y,Z]$.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 47 / 68

LTI State Estimation over Noisy Channels

[Block diagram: LTI plant $X_{k+1} = AX_k + BU_k + V_k$, $Y_k = GX_k + W_k$, driven by noise $V_k, W_k$; $Y_k$ → Quantiser/Coder → $S_k$ → Channel → $Q_k$ → Decoder/Estimator → $\hat{X}_k$]

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 48 / 68

LTI State Estimation - Disturbance-Free

Plant: LTI, noiseless, zero input:

$$X_{k+1} = AX_k, \quad Y_k = GX_k, \quad X_0 \text{ a uv}.$$

Coder: $Y_{0:k} \mapsto S_k$
Channel: stationary and memoryless, $Q_k = t(S_k, Z_k)$, where $Z$ = channel noise.
Estimator: $Q_{0:k} \mapsto \hat{X}_{k+1}$.
Objective: uniform $\rho$-exponential convergence from an $\ell$-ball. I.e. given $\rho, \ell > 0$, construct a coder-estimator s.t. for any uv $X_0$ with $\llbracket X_0 \rrbracket \subseteq B_\ell(0)$,

$$\lim_{k \to \infty} \sup_{\omega \in \Omega} \rho^{-k} \| X_k - \hat{X}_k \| = 0.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 49 / 68

Disturbance-Free State Estimation and C0

Assumptions:

DF1: $A$ has one or more eigenvalues with magnitude $> \rho$.
DF2: $(G, A_\rho)$ is observable, where $A_\rho := A$ restricted to the eigenspace governed by eigenvalues of magnitude $\geq \rho$.
DF3: $X_0 \perp Z$.

Theorem (N. TAC13)
If uniform $\rho$-exponential convergence is achieved from some $\ell$-ball, then

$$C_0 \geq \sum_{|\lambda_i| \geq \rho} \log_2 \left( \frac{|\lambda_i|}{\rho} \right). \qquad (17)$$

Conversely, if (17) holds strictly, then for any $\ell > 0$, a coder-estimator that achieves uniform $\rho$-exponential convergence from $B_\ell(0)$ can be constructed.
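The right-hand side of (17) depends only on the eigenvalues of $A$ and on $\rho$; a minimal numpy sketch (the plant matrix is a hypothetical example) computes it directly.

import numpy as np

def intrinsic_rate_bits(A, rho=1.0):
    """Sum of log2(|lambda_i| / rho) over eigenvalues with |lambda_i| >= rho,
    i.e. the right-hand side of (17)."""
    mags = np.abs(np.linalg.eigvals(np.asarray(A, dtype=float)))
    return float(np.sum(np.log2(mags[mags >= rho] / rho)))

# Hypothetical plant: modes at 2, -1.25 and 0.5.
A = np.diag([2.0, -1.25, 0.5])
print(intrinsic_rate_bits(A, rho=1.0))   # log2(2) + log2(1.25) ≈ 1.32 bits/sample
print(intrinsic_rate_bits(A, rho=1.5))   # only the mode at 2 counts: ≈ 0.415 bits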

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 50 / 68

Necessity Argument - Scalar Case

Pick an arbitrarily large $t \in \mathbb{N}$ and a small $\varepsilon \in \left( 0,\, 1 - \frac{\rho}{|\lambda|} \right)$.
Divide $[-\ell, \ell]$ into

$$\kappa := \left\lfloor \left| \frac{(1-\varepsilon)\lambda}{\rho} \right|^t \right\rfloor \ (\geq 1 \text{ by the choice of } \varepsilon)$$

equal intervals of length $2\ell/\kappa$.
Inside each interval construct a centred subinterval $I(s)$ of shorter length $\ell/\kappa$. Define the subinterval family

$$\mathcal{H} := \{ I(s) : s = 1, \ldots, \kappa \}, \qquad (18)$$

noting that subintervals $\in \mathcal{H}$ are separated by a gap $\geq \ell/\kappa$.
Set the initial state range $\llbracket X_0 \rrbracket = \bigcup_{H \in \mathcal{H}} H \subset [-\ell, \ell]$.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 51 / 68

Necessity Argument - Scalar Case (cont.)

Let $E_k := X_k - \hat{X}_k$. By hypothesis, $\exists \phi > 0$ s.t.

$$\phi \rho^k \geq \sup \llbracket |E_k| \rrbracket \geq 0.5\, \mathrm{diam}\, \llbracket E_k \rrbracket \qquad (19)$$
$$\geq 0.5\, \mathrm{diam}\, \llbracket E_k \,|\, q_{0:k-1} \rrbracket \qquad (20)$$
$$= 0.5\, \mathrm{diam}\, \llbracket \lambda^k X_0 - h_k(q_{0:k-1}) \,|\, q_{0:k-1} \rrbracket$$
$$= 0.5\, \mathrm{diam}\, \llbracket \lambda^k X_0 \,|\, q_{0:k-1} \rrbracket \qquad (21)$$
$$= 0.5\, |\lambda|^k\, \mathrm{diam}\, \llbracket X_0 \,|\, q_{0:k-1} \rrbracket. \qquad (22)$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 52 / 68

Necessity Argument - Scalar Case (cont.)

Next, show that for large $t$, no two sets in $\mathcal{H}$ (18) can be $\llbracket X_0 | Q_{0:t-1} \rrbracket$-overlap-connected.
Suppose in contradiction that $\exists H \in \mathcal{H}$ that is $\llbracket X_0 | Q_{0:t-1} \rrbracket$-overlap-connected with another set in $\mathcal{H}$.
$\Rightarrow \exists\, \llbracket X_0 \,|\, q_{0:t-1} \rrbracket$ containing both a point $u \in H$ and a point $v$ in some $H' \in \mathcal{H} \setminus \{H\}$
$\Rightarrow$

$$|u - v| \leq \mathrm{diam}\, \llbracket X_0 \,|\, q_{0:t-1} \rrbracket \overset{(22)}{\leq} \frac{2\phi \rho^t}{|\lambda|^t}. \qquad (23)$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 53 / 68

Necessity Argument - Scalar Case (cont.)

However, any two sets $\in \mathcal{H}$ are separated by a distance of at least $\ell/\kappa$. So

$$|u - v| \geq \frac{\ell}{\kappa} = \frac{\ell}{\left\lfloor \left( (1-\varepsilon)|\lambda|/\rho \right)^t \right\rfloor} \geq \frac{\ell}{\left( (1-\varepsilon)|\lambda|/\rho \right)^t} = \frac{\ell \rho^t}{\left( (1-\varepsilon)|\lambda| \right)^t}.$$

The RHS of this would exceed the RHS of (23) when $t$ is sufficiently large that $\left( \frac{1}{1-\varepsilon} \right)^t > 2\phi/\ell$, yielding a contradiction.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 54 / 68

Necessity Argument - Scalar Case (cont.)

So for large enough $t$, no two sets of $\mathcal{H}$ are $\llbracket X_0 | Q_{0:t-1} \rrbracket$-overlap-connected.
So

$$2^{I_*[X_0; Q_{0:t-1}]} \equiv |\llbracket X_0 \,|\, Q_{0:t-1} \rrbracket_*| \geq |\mathcal{H}| = \left\lfloor \left| \frac{(1-\varepsilon)\lambda}{\rho} \right|^t \right\rfloor \geq 0.5 \left| \frac{(1-\varepsilon)\lambda}{\rho} \right|^t, \qquad (24)$$

since $\lfloor x \rfloor > x/2$ for every $x \geq 1$.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 55 / 68

Necessity Argument - Scalar Case (cont.)

But $X_0 - S_{0:t-1} - Q_{0:t-1}$ is a Markov uncertainty chain, so

$$I_*[X_0; Q_{0:t-1}] \leq I_*[S_{0:t-1}; Q_{0:t-1}] \leq t C_0.$$

Substituting this into the LHS of (24), taking logarithms and dividing by $t$ gives

$$C_0 \geq \log_2(1-\varepsilon) + \log_2 \left| \frac{\lambda}{\rho} \right| - \frac{1}{t}.$$

Letting $t \to \infty$ yields

$$C_0 \geq \log_2(1-\varepsilon) + \log_2 \left| \frac{\lambda}{\rho} \right|.$$

As $\varepsilon$ can be made arbitrarily small, we are done. $\square$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 56 / 68

State Estimation with Plant Disturbances

Plant: LTI

$$X_{k+1} = AX_k + V_k, \quad Y_k = GX_k + W_k.$$

Coder: $Y_{0:k} \mapsto S_k$
Channel: stationary and memoryless, $Q_k = t(S_k, Z_k)$, where $Z$ = channel noise.
Estimator: $Q_{0:k} \mapsto \hat{X}_{k+1}$.
Objective: uniformly bounded estimation errors beginning from an $\ell$-ball. I.e. given $\ell > 0$, construct a coder-estimator s.t. for any initial state $X_0$ with $\llbracket X_0 \rrbracket \subseteq B_\ell(0)$,

$$\sup_{k \in \mathbb{Z}_{\geq 0},\, \omega \in \Omega} \| X_k - \hat{X}_k \| < \infty.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 57 / 68

Estimation with Disturbances and C0

Assumptions:

D1: $A$ has one or more eigenvalues with magnitude $\geq 1$.
D2: $(G, A_1)$ is observable, where $A_1 := A$ restricted to the eigenspace governed by eigenvalues of magnitude $\geq 1$.
D3: $\llbracket V_k \rrbracket$ and $\llbracket W_k \rrbracket$ are uniformly bounded over $k$.
D4: $X_0, V, W$ and $Z$ are mutually unrelated.
D5: The zero-noise sequence pair $(v, w) = (0, 0)$ is valid, i.e. $(0, 0) \in \llbracket V, W \rrbracket$.

Theorem (N. TAC13)
If uniformly bounded estimation errors are achieved from some $\ell$-ball, then

$$C_0 \geq \sum_{|\lambda_i| \geq 1} \log_2 |\lambda_i|. \qquad (25)$$

Conversely, if (25) holds strictly, then for any $\ell > 0$, a coder-estimator that achieves uniformly bounded estimation errors from $B_\ell(0)$ can be constructed.
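As a quick usage note (all numbers hypothetical), the bound (25) is the $\rho = 1$ case of the eigenvalue computation sketched earlier, so checking whether a given zero-error capacity can possibly suffice reduces to a few lines of numpy.

import numpy as np

def min_rate_bits(A):
    """Right-hand side of (25) (and of (27)): sum of log2|lambda_i| over |lambda_i| >= 1."""
    mags = np.abs(np.linalg.eigvals(np.asarray(A, dtype=float)))
    return float(np.sum(np.log2(mags[mags >= 1.0])))

# Hypothetical plant with eigenvalues 1.5 and 1.2: about 0.85 bits/sample are needed.
A = np.array([[1.5, 1.0],
              [0.0, 1.2]])
required = min_rate_bits(A)
for C0 in (1.0, 0.5):                    # hypothetical zero-error capacities
    print(C0, C0 > required)             # 1.0 True, 0.5 False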

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 58 / 68

Control over Noisy Channels

[Block diagram: LTI plant $X_{k+1} = AX_k + BU_k + V_k$, $Y_k = GX_k + W_k$, driven by noise $V_k, W_k$; $Y_k$ → Quantiser/Coder → $S_k$ → Channel → $Q_k$ → Decoder/Controller → $U_k$, which feeds back into the plant]

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 59 / 68

LTI Control - Disturbance-Free

Plant: LTI, noiseless:

$$X_{k+1} = AX_k + BU_k, \quad Y_k = GX_k, \quad X_0 \text{ a uv}.$$

Coder: $Y_{0:k} \mapsto S_k$
Channel: stationary and memoryless, $Q_k = t(S_k, Z_k)$, where $Z$ = channel noise.
Controller: $Q_{0:k} \mapsto U_k$.
Objective: uniform $\rho$-exponential stability on an $\ell$-ball. I.e. given $\rho, \ell > 0$, construct a coder-controller s.t. for any uv $X_0$ with $\llbracket X_0 \rrbracket \subseteq B_\ell(0)$,

$$\lim_{k \to \infty} \sup_{\omega \in \Omega} \rho^{-k} \| X_k \| = 0.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 60 / 68

Disturbance-Free Control and C0f

Assumptions:

DF1: $A$ has one or more eigenvalues with magnitude $> \rho$.
DF2: $(G, A_\rho)$ is observable and $(A_\rho, B)$ is controllable, where $A_\rho := A$ restricted to the eigenspace governed by eigenvalues of magnitude $\geq \rho$.
DF3: $X_0 \perp Z$.

Proposition
If uniform $\rho$-exponential stability is achieved on some $\ell$-ball, then

$$C_{0f} \geq \sum_{|\lambda_i| \geq \rho} \log_2 \left( \frac{|\lambda_i|}{\rho} \right). \qquad (26)$$

Conversely, if (26) holds strictly, then for any $\ell > 0$, a coder-controller that achieves uniform $\rho$-exponential stability on $B_\ell(0)$ can be constructed.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 61 / 68

Control with Plant Disturbances

Plant: LTI

$$X_{k+1} = AX_k + BU_k + V_k, \quad Y_k = GX_k + W_k.$$

Coder: $Y_{0:k} \mapsto S_k$
Channel: stationary and memoryless, $Q_k = t(S_k, Z_k)$, where $Z$ = channel noise.
Controller: $Q_{0:k} \mapsto U_k$.
Objective: uniformly bounded states beginning from an $\ell$-ball. I.e. given $\ell > 0$, construct a coder-controller s.t. for any initial state $X_0$ with $\llbracket X_0 \rrbracket \subseteq B_\ell(0)$,

$$\sup_{k \in \mathbb{Z}_{\geq 0},\, \omega \in \Omega} \| X_k \| < \infty.$$

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 62 / 68

Control with Disturbances and C0f

Assumptions:

D1: $A$ has one or more eigenvalues with magnitude $\geq 1$.
D2: $(G, A_1)$ is observable and $(A_1, B)$ is controllable, where $A_1 := A$ restricted to the eigenspace governed by eigenvalues of magnitude $\geq 1$.
D3: $\llbracket V_k \rrbracket$ and $\llbracket W_k \rrbracket$ are uniformly bounded over $k$.
D4: $X_0, V, W$ and $Z$ are mutually unrelated.
D5: The zero-noise sequence pair $(v, w) = (0, 0)$ is valid, i.e. $(0, 0) \in \llbracket V, W \rrbracket$.

Theorem (N. cdc12)
If uniformly bounded states are achieved from some $\ell$-ball, then

$$C_{0f} \geq \sum_{|\lambda_i| \geq 1} \log_2 |\lambda_i|. \qquad (27)$$

Conversely, if (27) holds strictly, then for any $\ell > 0$, a coder-controller that achieves uniformly bounded states from $B_\ell(0)$ can be constructed.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 63 / 68

What Does That Add to [Matveev-Savkin IJC07]?

Matveev and Savkin considered similar estimation and control problems, in a mixed formulation: the plant noise was a nonstochastic, bounded disturbance, while the initial state and channel were stochastic and independent.
The aim was to achieve a.s. boundedness for any plant noise.
Their necessity proof used the randomness of the initial state and channel to apply a law of large numbers; no information theory was involved.
Here, necessity is proved using data processing on Markov uncertainty chains and analysis of $I_*$ and directed $I_*$, with no statistical assumptions.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 64 / 68

Summary

This talk described:

- a nonstochastic theory of uncertainty and information, without assuming a probability space;
- intrinsic characterisations of the operational zero-error capacity and zero-error feedback capacity for stationary memoryless channels;
- an information-theoretic basis for analysing worst-case networked estimation/control with bounded noise.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 65 / 68

Outlook

The theory is still far from mature!

- Tractable algorithms to estimate C0 (perhaps Monte Carlo)?
- Disturbances with bounded energy or time-averages?
- C0f for channels with memory?
- Zero-error feedback capacity with imperfect channel feedback?
- Multi-agent systems...?

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 66 / 68

References

C.E. Shannon, “A mathematical theory of communication”, Bell Syst. Tech. Jour., vol. 27, pp. 379–423, 623–56, 1948.

G.N. Nair, “A nonstochastic information theory for communication and state estimation”, IEEE Trans. Automatic Control, vol. 58, no. 6, pp. 1497–510, 2013.

G.N. Nair, “A nonstochastic information theory for feedback”, Proc. 51st IEEE Conf. Decision and Control, Maui, USA, pp. 1343–8, 2012.

J. Baillieul, “Feedback designs in information-based control”, Stochastic Theory and Control: Proceedings of a Workshop held in Lawrence, Kansas, pp. 35–57, Springer, 2002.

S. Tatikonda and S. Mitter, “Control under communication constraints”, IEEE Trans. Automatic Control, vol. 49, no. 7, pp. 1056–68, July 2004.

G. N. Nair and R. J. Evans, “Stabilizability of stochastic linear systems with finite feedback data rates”, SIAM Jour. Control and Optimization, vol. 43, no. 2, pp. 413–36, July 2004.

A. S. Matveev and A. V. Savkin, “An analogue of Shannon information theory for detection and stabilization via noisy discrete communication channels”, SIAM Jour. Control and Optimization, vol. 46, no. 4, pp. 1323–67, 2007.

J. H. Braslavsky, R. H. Middleton and J. S. Freudenberg, “Feedback stabilization over signal-to-noise ratio constrained channels”, IEEE Trans. Automatic Control, vol. 52, no. 8, pp. 1391–403, 2007.

A. Sahai and S. Mitter, “The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link part 1: scalar systems”, IEEE Trans. Info. Theory, vol. 52, no. 8, pp. 3369–95, 2006.

A.S. Matveev and A.V. Savkin, “Shannon zero error capacity in the problems of state estimation and stabilization via noisy communication channels”, Int. Jour. Control, vol. 80, pp. 241–55, 2007.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 67 / 68

References (cont.)

C.E. Shannon, “The lattice theory of information”, Trans. IRE Professional Group on Info. Theory, vol. 1, iss. 1, pp. 105–8, Feb. 1953.

C.E. Shannon, “The zero-error capacity of a noisy channel”, IRE Trans. Info. Theory, vol. 2, pp. 8–19, 1956.

P. Gacs and J. Korner, “Common information is far less than mutual information”, Problems of Control and Information Theory, vol. 2, no. 2, pp. 149–62, 1973.

S. Wolf and J. Wullschleger, “Zero-error information and applications in cryptography”, in Proc. Info. Theory Workshop, San Antonio, USA, 2004, pp. 1–6.

J. Massey, “Causality, feedback and directed information”, in Proc. Int. Symp. Info. Theory App., Nov. 1990, pp. 1–6.

Y.H. Kim, “A coding theorem for a class of stationary channels with feedback”, IEEE Trans. Info. Theory, pp. 1488–99, 2008.

S. Tatikonda and S. Mitter, “The capacity of channels with feedback”, IEEE Trans. Info. Theory, pp. 323–49, 2009.

G. Klir, Uncertainty and Information: Foundations of Generalized Information Theory, Wiley, 2006, ch. 2.

H. Shingin and Y. Ohta, “Disturbance rejection with information constraints: performance limitations of a scalar system for bounded and Gaussian disturbances”, Automatica, vol. 48, no. 6, pp. 1111–6, 2012.

W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints I”, IEEE Trans. Automatic Control, vol. 42, pp. 1294–9, 1997.

W. S. Wong and R. W. Brockett, “Systems with finite communication bandwidth constraints II: stabilization with limited information feedback”, IEEE Trans. Automatic Control, vol. 44, pp. 1049–53, 1999.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 68 / 68