
Transcript of Nonstochastic Information Theory for Feedback Control (disc-cps15.imtlucca.it/pdf/Nair.pdf)

Page 1:

Nonstochastic Information Theory for Feedback Control

Girish Nair

Department of Electrical and Electronic Engineering, University of Melbourne

Australia

Dutch Institute of Systems and Control Summer School, Zandvoort aan Zee, The Netherlands

June 2015

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 1 / 68

Page 2:

Outline

1 Basics of Shannon Information Theory

2 Overview of Capacity-Limited State Estimation/Control

3 Motivation for Nonstochastic Control

4 Uncertain Variables, Unrelatedness and Markovness

5 Nonstochastic Information

6 Channels and Coding Theorems

7 State Estimation and Control via Noisy Channels

Page 9:

What is Information?

Wikipedia: Information is that which informs, i.e. that from which data and knowledge can be derived (as data represents values attributed to parameters, while knowledge is acquired through understanding of real things or abstract concepts, in any particular field of study).

In [Shannon BSTJ 1948], information was somewhat more concretely defined, within a probability space.


Page 13:

Shannon Entropy

Prior uncertainty or entropy of a discrete random variable (rv) X ~ p_X:

H[X] := E[ log2( 1 / p_X(X) ) ] = -∑_x p_X(x) log2 p_X(x) ≥ 0.

This is the minimum expected number of yes/no questions sufficient to determine X. The joint (discrete) entropy H[X,Y] is defined by replacing p_X with p_{X,Y}. The conditional entropy of X given Y is the average uncertainty in X given Y:

H[X|Y] := E[ log2( 1 / p_{X|Y}(X|Y) ) ] ≡ H[X,Y] - H[Y]  (≥ 0).
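These definitions are easy to check numerically. A minimal sketch (the joint pmf values below are my own illustration, not from the slides), computing H[X], H[X,Y] and H[X|Y] via the chain rule H[X|Y] = H[X,Y] - H[Y]:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint pmf p_xy[x, y], for illustration only
p_xy = np.array([[0.25, 0.25],
                 [0.00, 0.50]])

H_XY = entropy(p_xy)                  # joint entropy H[X,Y]
H_X  = entropy(p_xy.sum(axis=1))      # marginal entropy H[X]
H_Y  = entropy(p_xy.sum(axis=0))      # marginal entropy H[Y]
H_X_given_Y = H_XY - H_Y              # conditional entropy H[X|Y]
```

Note that conditioning cannot increase uncertainty: H[X|Y] ≤ H[X], with equality iff X and Y are independent.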


Page 16:

Shannon Information

Information gained about X from Y := reduction in uncertainty:

I[X;Y] := H[X] - H[X|Y].

Called mutual information since it is symmetric:

I[X;Y] = -∑_{x,y} p_{X,Y}(x,y) log2( p_X(x) p_Y(y) / p_{X,Y}(x,y) ) ≡ H[X] + H[Y] - H[X,Y].

For continuous X, Y, replace the pmfs p with the corresponding pdfs f.
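The symmetric form above can be computed directly from a joint pmf matrix; a small sketch (the pmf is a hypothetical example of mine) that also exhibits the symmetry I[X;Y] = I[Y;X] and that independence gives zero information:

```python
import numpy as np

def mutual_information(p_xy):
    """I[X;Y] in bits from a joint pmf matrix p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    px = p_xy.sum(axis=1, keepdims=True)    # marginal of X, column vector
    py = p_xy.sum(axis=0, keepdims=True)    # marginal of Y, row vector
    mask = p_xy > 0
    # sum p(x,y) log2( p(x,y) / (p(x) p(y)) ) over the support
    return float(np.sum(p_xy[mask] * np.log2((p_xy / (px @ py))[mask])))

# Hypothetical joint pmf, for illustration only
p_xy = np.array([[0.25, 0.25],
                 [0.00, 0.50]])
I = mutual_information(p_xy)   # equals H[X] + H[Y] - H[X,Y]
```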


Page 19:

Codes for Stationary Memoryless Random Channels

[Figure: Message M → Coder → X_0:n → Channel → Y_0:n → Decoder → M̂, where the stationary memoryless channel satisfies p_{Y_0:n | X_0:n}(y_0:n | x_0:n) = ∏_{k=0}^{n} q(y_k | x_k).]

A block code is defined by:
- an error tolerance ε > 0, block length n+1 ∈ ℕ and message-set cardinality µ ≥ 1;
- an encoder mapping γ such that, for any independent, uniformly distributed message M ∈ {m_1, ..., m_µ}, X_0:n = γ(i) if M = m_i;
- and a decoder M̂ = δ(Y_0:n) such that Pr[M̂ ≠ M] ≤ ε.
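A toy instance of this definition (my own illustration; all parameter values are assumptions): a µ = 2 repetition code of block length n+1 over a binary symmetric channel with crossover probability p, decoded by majority vote, which drives Pr[M̂ ≠ M] below any ε as n grows:

```python
import random

random.seed(0)

def encode(i, n):
    """Encoder gamma: message index i in {0, 1} -> codeword X_0:n (repetition)."""
    return [i] * (n + 1)

def bsc(x, p):
    """Stationary memoryless binary symmetric channel: flip each bit w.p. p."""
    return [b ^ (random.random() < p) for b in x]

def decode(y):
    """Decoder delta: majority vote over Y_0:n."""
    return int(sum(y) > len(y) / 2)

n, p, trials = 10, 0.1, 2000          # block length n+1 = 11
errors = 0
for _ in range(trials):
    m = random.randint(0, 1)          # uniformly distributed message
    errors += decode(bsc(encode(m, n), p)) != m
error_rate = errors / trials          # empirical Pr[decoded message != M]
```

The rate here is log2(2)/11 ≈ 0.09 bits/channel use, far below the BSC capacity 1 - H(0.1) ≈ 0.53; Shannon's theorem (next slide) says rates up to capacity are achievable with more sophisticated codes.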


Page 23:

Capacity and Information

Define (ordinary) capacity C operationally as the highest block-coding rate that yields vanishing error probability:

C := lim_{ε→0} sup_{n,µ ∈ ℕ, γ, δ} log2(µ) / (n+1) = lim_{ε→0} lim_{n→∞} sup_{µ ∈ ℕ, γ, δ} log2(µ) / (n+1).

Shannon showed that capacity can also be thought of intrinsically, as the maximum information rate across the channel:

Theorem (Shannon BSTJ 1948)

C = sup_{n ≥ 0, p_{X_0:n}} I[X_0:n ; Y_0:n] / (n+1)  ( = sup_{p_X} I[X;Y] for a memoryless channel ).
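The intrinsic characterisation sup_{p_X} I[X;Y] can be evaluated numerically for a simple memoryless channel. A sketch for a binary symmetric channel with crossover probability p (an example of mine, not from the slides): maximising I[X;Y] over the input distribution by grid search recovers the known closed form C = 1 - H(p), attained by the uniform input:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_mutual_info(q, p):
    """I[X;Y] for BSC(p) with input P(X=1) = q, via I = H[Y] - H[Y|X]."""
    py1 = q * (1 - p) + (1 - q) * p   # output marginal P(Y = 1)
    return h2(py1) - h2(p)            # H[Y|X] = h2(p) for every input

p = 0.1
qs = np.linspace(0.0, 1.0, 1001)      # grid over input distributions
C = max(bsc_mutual_info(q, p) for q in qs)   # approx. 1 - h2(0.1) ≈ 0.531
```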


Page 25:

Networked State Estimation/Control

Classical assumption: controllers and estimators know plant outputs perfectly. Since the 60s this assumption has been challenged by:
- delays, due to latency and intermittent channel access, in control area networks;
- quantisation errors in digital control;
- finite communication capacity per sensor in long-range radar surveillance networks.
The focus here is on limited quantiser resolution and capacity.


Page 26:

Estimation/Control over Communication Channels

[Figure: two feedback diagrams sharing the plant dynamics X_{k+1} = A X_k + B U_k + V_k, Y_k = G X_k + W_k, with noise V_k, W_k. Estimation: Y_k → Quantiser/Coder → S_k → Channel → Q_k → Decoder/Estimator → X̂_k. Control: Y_k → Quantiser/Coder → S_k → Channel → Q_k → Decoder/Controller → U_k, fed back to the plant.]


Page 27:

Additive Noise Model

Early work considered errorless digital channels and static quantisers, with uniform quantiser errors modelled as additive, uncorrelated noise [e.g. Curry 1970] with variance ∝ 2^{-2R} (R = bit rate). This is a good approximation for stable plants and high R, and allows linear stochastic estimation/control theory to be applied. However, for unstable plants it leads to conclusions that are wrong, e.g.
- if the plant is noiseless and unstable, then states/estimation errors cannot converge to 0;
- and if the plant is unstable, then mean-square-bounded states/estimation errors can always be achieved.
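The additive-noise approximation itself is easy to reproduce in simulation (parameter values are my own, illustrative): a uniform R-bit quantiser on [-1, 1] has step Δ = 2 · 2^{-R}, and for a well-exercised input its error is nearly uniform on [-Δ/2, Δ/2] with variance Δ²/12 ∝ 2^{-2R}:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 8                                  # bit rate (assumed value)
lo, hi = -1.0, 1.0
delta = (hi - lo) / 2**R               # quantiser step

x = rng.uniform(lo, hi, 100_000)
# Uniform quantiser: map each sample to the midpoint of its cell
xq = lo + delta * (np.floor((x - lo) / delta) + 0.5)
err = xq - x

model_var = delta**2 / 12              # classical additive-noise model
emp_var = err.var()                    # empirical quantiser-error variance
```

The match is excellent here (a bounded, persistently exciting input), but, as the slide notes, for unstable plants this model yields qualitatively wrong conclusions.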


Page 31:

Errorless Channels - Data Rate Theorem

In fact, coding-based analyses reveal that stable state estimation/control is possible iff

R > ∑_{|λ_i| ≥ 1} log2 |λ_i|,

where λ_1, ..., λ_n are the eigenvalues of the plant matrix A. This holds under various assumptions and stability notions:
- random initial state, noiseless plant; mean r-th power convergence to 0 [N.-Evans, Auto. 03];
- bounded initial state, noiseless plant; uniform convergence to 0 [Tatikonda-Mitter, TAC 04];
- random plant noise; mean-square boundedness [N.-Evans, SICON 04];
- bounded plant noise; uniform boundedness [Tatikonda-Mitter, TAC 04].
Additive uncorrelated noise models of quantisation fail to capture the existence of such a threshold. Necessity is typically proved using differential entropy power, quantisation theory or volume-partitioning bounds; sufficiency via explicit construction.
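The threshold is straightforward to compute for a given plant. A small sketch with a hypothetical A matrix of my choosing (eigenvalues 2 and 0.5, so only the unstable mode contributes):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 0.5]])            # hypothetical plant matrix

eigs = np.linalg.eigvals(A)
# Data rate theorem: R must exceed the sum of log2|lambda_i|
# over the unstable eigenvalues (|lambda_i| >= 1).
R_min = sum(np.log2(abs(l)) for l in eigs if abs(l) >= 1)
# Here R_min = log2(2) = 1 bit/sample: only the eigenvalue at 2 counts.
```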


Page 38:

Noisy Channels

'Stable' states/estimation errors are possible iff a suitable channel figure-of-merit (FoM) satisfies

FoM > ∑_{|λ_i| ≥ 1} log2 |λ_i|,

where λ_1, ..., λ_n are the eigenvalues of the plant matrix A. Unlike the noiseless-channel case, the FoM depends critically on the stability notion and noise model:
- FoM = C: states/est. errors → 0 almost surely (a.s.) [Matveev-Savkin SIAM 07], or mean-square-bounded (MSB) states over an AWGN channel [Braslavsky et al. TAC 07];
- FoM = C_any: MSB states over random discrete memoryless channels [Sahai-Mitter TIT 06];
- FoM = C_0f for control, or C_0 for state estimation, with a.s. bounded states/est. errors [Matveev-Savkin IJC 07].
As C ≥ C_any ≥ C_0f ≥ C_0, these criteria generally do not coincide.



Missing Information

If the goal is MSB or a.s. convergence → 0 of states/estimation errors, then information theory is crucial for finding lower bounds. However, when the goal is a.s. bounded states/errors, classical information theory has played no role so far in networked estimation/control. Yet information in some sense must be flowing across the channel, even without a probabilistic model/objective.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 13 / 68


Questions

Is there a meaningful theory of information for nonrandom variables?
Can we construct an information-theoretic basis for networked estimation/control with nonrandom noise?
Are there intrinsic, information-theoretic interpretations of C_0 and C_0f?

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 14 / 68


Why Nonstochastic Anyway?

Long tradition in control of treating noise as a nonrandom perturbation with bounded magnitude, energy or power:

Control systems usually have mechanical/chemical components, as well as electrical.

▶ Dominant disturbances may not be governed by known probability distributions.

▶ E.g. in mechanical systems, the main disturbance may be vibrations at resonant frequencies determined by machine dimensions and material properties.

In contrast, communication systems are mainly electrical/electromagnetic/optical.

▶ Dominant disturbances - thermal noise, shot noise, fading etc. - are well-modelled by probability distributions derived from statistical/quantum physics.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 15 / 68


Why Nonstochastic Anyway? (cont.)

Related to the previous points:

In most digital comm. systems, bit periods T_b ≈ 2×10⁻⁵ s or shorter.
⇒ Thermal and shot noise (σ ∝ √T_b) noticeable compared to detected signal amplitudes (∝ T_b).

Control systems typically operate with longer sample or bit periods, 10⁻² or 10⁻³ s.
⇒ Thermal/shot noise negligible compared to signal amplitudes.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 16 / 68


Why Nonstochastic Anyway? (cont.)

For safety or mission-critical reasons, stability and performance guarantees are often required every time a control system is used, if disturbances are within rated bounds.
Especially if the plant is unstable or marginally stable.
Or if we wish to interconnect several control systems and still be sure of performance.
In contrast, most consumer-oriented communications requires good performance only on average, or with high probability.
Occasional violations of specifications are permitted, and cannot be prevented within a probabilistic framework.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 17 / 68


Probability in Practice

'If there's a fifty-fifty chance that something can go wrong, nine out of ten times, it will.'

(attrib. L. ‘Yogi’ Berra, former US baseball player)

(Photo from Wikipedia)

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 18 / 68


Uncertain Variable Formalism

Define an uncertain variable (uv) X to be a mapping from an underlying sample space Ω to a space 𝕏.
Each ω ∈ Ω may represent a specific combination of noise/input signals into a system, and X may represent a state/output variable.
For a given ω, x = X(ω) is the realisation of X.

Unlike probability theory, no σ-algebra ⊂ 2^Ω or measure on Ω is imposed.
Assume Ω is uncountable, to accommodate continuous 𝕏.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 19 / 68


UV Formalism - Ranges and Conditioning

Marginal range ⟦X⟧ := {X(ω) : ω ∈ Ω} ⊆ 𝕏.
Joint range ⟦X,Y⟧ := {(X(ω), Y(ω)) : ω ∈ Ω} ⊆ 𝕏 × 𝕐.
Conditional range ⟦X|y⟧ := {X(ω) : Y(ω) = y, ω ∈ Ω}.

In the absence of statistical structure, the joint range fully characterises the relationship between X and Y. Note that

⟦X,Y⟧ = ⋃_{y ∈ ⟦Y⟧} ⟦X|y⟧ × {y},

i.e. the joint range is given by the conditional and marginal, analogously to probability theory.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 20 / 68
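For a finite Ω these ranges reduce to set comprehensions. A small sketch (the particular Ω, X and Y below are invented for illustration); it also checks the union identity above:

```python
def marginal(X, Omega):
    # [[X]] = {X(w) : w in Omega}
    return {X(w) for w in Omega}

def joint(X, Y, Omega):
    # [[X,Y]] = {(X(w), Y(w)) : w in Omega}
    return {(X(w), Y(w)) for w in Omega}

def conditional(X, Y, y, Omega):
    # [[X|y]] = {X(w) : Y(w) = y}
    return {X(w) for w in Omega if Y(w) == y}

Omega = range(4)
X = lambda w: w % 2   # low bit
Y = lambda w: w // 2  # high bit

# The joint range is the union over y of [[X|y]] x {y}:
union = {(x, y) for y in marginal(Y, Omega)
                for x in conditional(X, Y, y, Omega)}
assert union == joint(X, Y, Omega)
```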


Independence Without Probability

Definition
The uv's X, Y are called (mutually) unrelated, denoted X ⊥ Y, if

⟦X,Y⟧ = ⟦X⟧ × ⟦Y⟧. (1)

Otherwise they are called related.

Equivalent characterisation:

Proposition
The uv's X, Y are unrelated iff

⟦X|y⟧ = ⟦X⟧, ∀y ∈ ⟦Y⟧. (2)

Unrelatedness is equivalent to X and Y inducing qualitatively independent [Rényi'70] partitions of Ω when Ω is finite.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 21 / 68
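Definition (1) can be checked mechanically on a finite sample space. A sketch, with invented toy uv's:

```python
def unrelated(X, Y, Omega):
    """X and Y are unrelated iff [[X,Y]] = [[X]] x [[Y]]."""
    jx = {X(w) for w in Omega}
    jy = {Y(w) for w in Omega}
    jxy = {(X(w), Y(w)) for w in Omega}
    return jxy == {(x, y) for x in jx for y in jy}

Omega = range(4)
# Y reads the high bit of w, X the low bit: every (x, y) pair occurs.
assert unrelated(lambda w: w % 2, lambda w: w // 2, Omega)
# X = Y: the joint range is the diagonal, a strict subset of the product.
assert not unrelated(lambda w: w % 2, lambda w: w % 2, Omega)
```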


Examples of Relatedness and Unrelatedness

[Figure: two joint-range diagrams. (a) X, Y related: the conditional ranges vary with the conditioning value, e.g. ⟦X|y⟧ ⊂ ⟦X|y′⟧ and ⟦Y|x⟧ ⊂ ⟦Y|x′⟧. (b) X, Y unrelated: ⟦X,Y⟧ is a Cartesian product, so ⟦X|y⟧ = ⟦X|y′⟧ = ⟦X⟧ for all y, y′.]

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 22 / 68


Markovness without Probability

Definition
X, Y, Z are said to form a Markov uncertainty chain X - Y - Z if

⟦X|y,z⟧ = ⟦X|y⟧, ∀(y,z) ∈ ⟦Y,Z⟧. (3)

This is equivalent to

⟦X,Z|y⟧ = ⟦X|y⟧ × ⟦Z|y⟧, ∀y ∈ ⟦Y⟧,

i.e. X, Z are conditionally unrelated given Y; in other words, X ⊥ Z | Y.
X, Y, Z are said to form a conditional Markov uncertainty chain given W if X - (Y,W) - Z.
This is also written X - Y - Z | W, or X ⊥ Z | (Y,W).

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 23 / 68
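Condition (3) is likewise checkable by enumeration on a finite Ω. A sketch with invented toy uv's:

```python
def is_markov(X, Y, Z, Omega):
    """Check X - Y - Z: [[X|y,z]] = [[X|y]] for every (y,z) in [[Y,Z]]."""
    for w0 in Omega:
        y, z = Y(w0), Z(w0)
        x_given_yz = {X(w) for w in Omega if Y(w) == y and Z(w) == z}
        x_given_y = {X(w) for w in Omega if Y(w) == y}
        if x_given_yz != x_given_y:
            return False
    return True

Omega = [(a, b) for a in (0, 1) for b in (0, 1)]
X = lambda w: w[0]
# Y = X: then [[X|y,z]] = {y} = [[X|y]], so X - Y - Z holds for any Z.
assert is_markov(X, lambda w: w[0], lambda w: w[1], Omega)
# Y constant but Z = X: Z still narrows down X, so the chain fails.
assert not is_markov(X, lambda w: 0, lambda w: w[0], Omega)
```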


Information without Probability

Definition
Two points (x,y), (x′,y′) ∈ ⟦X,Y⟧ are called taxicab connected, written (x,y) ↔ (x′,y′), if ∃ a sequence

(x,y) = (x₁,y₁), (x₂,y₂), …, (xₙ₋₁,yₙ₋₁), (xₙ,yₙ) = (x′,y′)

of points in ⟦X,Y⟧ such that each point differs in only one coordinate from its predecessor.

It is not hard to see that ↔ is an equivalence relation on ⟦X,Y⟧.
Call its equivalence classes a taxicab partition T[X;Y] of ⟦X,Y⟧.
Define a nonstochastic information index

I*[X;Y] := log₂ |T[X;Y]| ∈ [0,∞]. (4)

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 24 / 68
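On a finite joint range, T[X;Y] can be computed by union-find, linking any two points that share a coordinate (such points differ in only the other coordinate). A sketch:

```python
import math

def taxicab_partition_size(joint_range):
    """Number of taxicab-connected classes of a finite joint range of pairs."""
    pts = list(joint_range)
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            # One shared coordinate => the points differ in only the other one.
            if pts[i][0] == pts[j][0] or pts[i][1] == pts[j][1]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pts))})

def I_star(joint_range):
    return math.log2(taxicab_partition_size(joint_range))

# Diagonal range: no two points share a coordinate, so each is its own class.
assert I_star({(0, 0), (1, 1)}) == 1.0
# Adding (0,1) taxicab-connects everything: I* drops to 0.
assert I_star({(0, 0), (0, 1), (1, 1)}) == 0.0
```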


Connection to Common Random Variables

T[X;Y] is also called the ergodic decomposition [Gács-Körner PCIT72].
For discrete X, Y, the elements of T[X;Y] are the connected components of [Wolf-Wullschleger ITW04], which were shown there to give the maximal common rv Z*, i.e.

▶ Z* = f*(X) = g*(Y) under suitable mappings f*, g*
(since points in distinct sets of T[X;Y] are not taxicab-connected)

▶ If another rv Z ≡ f(X) ≡ g(Y), then Z ≡ k(Z*)
(since all points in the same set of T[X;Y] are taxicab-connected)

It is not hard to see that Z* also has the largest number of distinct values of any common rv Z ≡ f(X) ≡ g(Y).
I*[X;Y] = the Hartley entropy of Z*.
Maximal common rv's were first described in the brief paper 'The lattice theory of information' [Shannon TIT53].

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 25 / 68


Examples

[Figure: two joint-range examples. Left: |T| = 2 (classes labelled z = 0 and z = 1) = max. # distinct values that can always be agreed on from separate observations of X & Y. Right: |T| = 1 (the single class z = 0) = max. # distinct values that can always be agreed on from separate observations of X & Y.]

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 26 / 68


Equivalent View via Overlap Partitions

As in probability, it is often easier to work with conditional rather than joint ranges.
Let ⟦X|Y⟧ := {⟦X|y⟧ : y ∈ ⟦Y⟧} be the conditional range family.

Definition
Two points x, x′ are called ⟦X|Y⟧-overlap-connected if ∃ a sequence of sets B₁, …, Bₙ ∈ ⟦X|Y⟧ s.t.

x ∈ B₁ and x′ ∈ Bₙ,
Bᵢ ∩ Bᵢ₊₁ ≠ ∅, ∀i ∈ [1 : n−1].

Overlap-connectedness is an equivalence relation on ⟦X⟧, induced by ⟦X|Y⟧.
Let the overlap partition ⟦X|Y⟧* of ⟦X⟧ denote the equivalence classes.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 27 / 68


Equivalent View via Overlap Partitions (cont.)

Proposition
For any uv's X, Y,

I*[X;Y] = log₂ |⟦X|Y⟧*|. (5)

Proof sketch:
For any two points (x,y), (x′,y′) ∈ ⟦X,Y⟧, (x,y) ↔ (x′,y′) iff x and x′ are ⟦X|Y⟧-overlap-connected.
This allows us to set up a bijection between the partitions T[X;Y] and ⟦X|Y⟧*.
⇒ T[X;Y] and ⟦X|Y⟧* must have the same cardinality.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 28 / 68
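The proposition can be sanity-checked numerically by counting overlap-connected classes of a conditional range family. A self-contained sketch (the example family is invented; it reproduces the count that the taxicab partition of the corresponding joint range would give):

```python
import math

def overlap_partition_size(cond_ranges):
    """Number of overlap-connected classes induced by a family of sets [[X|y]]."""
    sets = [frozenset(s) for s in cond_ranges]
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i] & sets[j]:  # B_i and B_j overlap
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(sets))})

# Joint range {(0,0), (0,1), (1,1), (2,2)}:
# [[X|0]] = {0}, [[X|1]] = {0,1}, [[X|2]] = {2}.
# {0} and {0,1} overlap, {2} stands alone -> 2 classes, so I* = 1 bit,
# matching the taxicab partition of the joint range.
assert math.log2(overlap_partition_size([{0}, {0, 1}, {2}])) == 1.0
```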


Properties of I*

(Nonnegativity) I*[X;Y] ≥ 0 (obvious).
(Symmetry) I*[X;Y] = I*[Y;X]. This follows from the fact that

(x,y) ↔ (x′,y′) in ⟦X,Y⟧ ⟺ (y,x) ↔ (y′,x′) in ⟦Y,X⟧. (6)

From this property and (5), knowing just one of the conditional range families ⟦X|Y⟧ or ⟦Y|X⟧ is enough to determine I*[X;Y].
Unlike ordinary mutual information.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 29 / 68


Properties of I* (cont.)

Proposition (Monotonicity)
For any uv's X, Y, Z,

I*[X;Y,Z] ≥ I*[X;Y]. (7)

Proof: The idea is to find a surjection from ⟦X|Y,Z⟧* onto ⟦X|Y⟧*. This would automatically imply that the latter cannot have greater cardinality.

Pick any set B ∈ ⟦X|Y,Z⟧* and choose a B′ ∈ ⟦X|Y⟧* s.t. B ∩ B′ ≠ ∅.
At least one such B′ exists, since ⟦X|Y⟧* covers ⟦X⟧ ⊇ B.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 30 / 68


Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting B′ ∈ ⟦X|Y⟧* exists for each B ∈ ⟦X|Y,Z⟧*, since B ⊆ B′:

▶ By definition, any x ∈ B and x′ ∈ B ∩ B′ are connected by a sequence of successively overlapping sets in ⟦X|Y,Z⟧.

▶ As ⟦X|y,z⟧ ⊆ ⟦X|y⟧, x, x′ are also connected by a sequence of successively overlapping sets in ⟦X|Y⟧.

▶ But B′ = all points that are ⟦X|Y⟧-overlap-connected with the representative point x′ ∈ B′, so x ∈ B′.

▶ As x was arbitrary, B ⊆ B′.

Thus B ↦ B′ is a well-defined map from ⟦X|Y,Z⟧* to ⟦X|Y⟧*.
It is also onto since, as noted before, every set B′ ∈ ⟦X|Y⟧* intersects some B in ⟦X|Y,Z⟧*, which covers ⟦X⟧.
So B ↦ B′ is the required surjection from ⟦X|Y,Z⟧* onto ⟦X|Y⟧*. ∎

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 31 / 68
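Monotonicity can also be observed on a toy example, reusing a taxicab-class count over joint ranges (the Ω, X, Y, Z below are invented: Y only coarsens X, while (Y,Z) determines it exactly):

```python
import math

def n_classes(pairs):
    """Taxicab-connected classes of a finite joint range of pairs."""
    pts = list(pairs)
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if pts[i][0] == pts[j][0] or pts[i][1] == pts[j][1]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pts))})

Omega = range(4)
X = lambda w: w
Y = lambda w: w // 2   # Y only reveals the high bit of X
Z = lambda w: w % 2    # Z reveals the low bit

jr_xy = {(X(w), Y(w)) for w in Omega}           # 2 classes: I*[X;Y]  = 1 bit
jr_xyz = {(X(w), (Y(w), Z(w))) for w in Omega}  # 4 classes: I*[X;Y,Z] = 2 bits
assert math.log2(n_classes(jr_xyz)) >= math.log2(n_classes(jr_xy))
```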

Page 79: Nonstochastic Information Theory for Feedback Controldisc-cps15.imtlucca.it/pdf/Nair.pdfk k k k k k k k X AX BU V Y GX W 1, Noise V k,W k kkkk kkk XAXBUV YGXW 1 Decoder/ U k,Y k Controller

Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting B0 2 JX |Y K⇤ exists foreach B 2 JX |Y ,Z K⇤, since B✓ B0:

I By definition, any x 2 B and x 0 2 B\B0 are connected by asequence of successively overlapping sets in JX |Y ,Z K.

I As JX |y ,zK ✓ JX |yK, x ,x 0 are also connected by a sequence ofsuccessively overlapping sets in JX |Y K.

I But B0 = all pts. that are JX |Y K-overlap connected with therepresentative pt. x 0 2 B0, so x 2 B0.

I As x was arbitrary, B✓ B0.

Thus B 7! B0 is a well-defined map from JX |Y ,Z K⇤ ! JX |Y K⇤.It is also onto since, as noted before, every set B0 2 JX |Y K⇤intersects some B in JX |Y ,Z K⇤, which covers JX K.So B 7! B0 is the required surjection from JX |Y ,Z K⇤ ! JX |Y K⇤. ⇤

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 31 / 68

Page 80: Nonstochastic Information Theory for Feedback Controldisc-cps15.imtlucca.it/pdf/Nair.pdfk k k k k k k k X AX BU V Y GX W 1, Noise V k,W k kkkk kkk XAXBUV YGXW 1 Decoder/ U k,Y k Controller

Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting B0 2 JX |Y K⇤ exists foreach B 2 JX |Y ,Z K⇤, since B✓ B0:

I By definition, any x 2 B and x 0 2 B\B0 are connected by asequence of successively overlapping sets in JX |Y ,Z K.

I As JX |y ,zK ✓ JX |yK, x ,x 0 are also connected by a sequence ofsuccessively overlapping sets in JX |Y K.

I But B0 = all pts. that are JX |Y K-overlap connected with therepresentative pt. x 0 2 B0, so x 2 B0.

I As x was arbitrary, B✓ B0.

Thus B 7! B0 is a well-defined map from JX |Y ,Z K⇤ ! JX |Y K⇤.It is also onto since, as noted before, every set B0 2 JX |Y K⇤intersects some B in JX |Y ,Z K⇤, which covers JX K.So B 7! B0 is the required surjection from JX |Y ,Z K⇤ ! JX |Y K⇤. ⇤

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 31 / 68

Properties of I* (cont.)

Proposition (Data Processing)
For Markov uncertainty chains X − Y − Z (3),

  I*[X;Z] ≤ I*[X;Y].

Proof:
By monotonicity and the overlap partition characterisation of I*,

  I*[X;Z] ≤(7) I*[X;Y,Z] =(5) log |⟦X|Y,Z⟧*|.   (8)

By Markovness (3), ⟦X|y,z⟧ = ⟦X|y⟧, ∀ y ∈ ⟦Y⟧ and z ∈ ⟦Z|y⟧.
⇒ ⟦X|Y,Z⟧ = ⟦X|Y⟧.
⇒ ⟦X|Y,Z⟧* = ⟦X|Y⟧*.
Substituting into the RHS of (8) completes the proof. □


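The overlap partition ⟦X|Y⟧* used in the proofs above can be computed mechanically for finite ranges. The following is a minimal Python sketch (the sets and values are hypothetical examples, not taken from the slides): it merges the conditional ranges ⟦X|y⟧ into overlap-connected components and evaluates I*[X;Y] = log₂|⟦X|Y⟧*|.

```python
from math import log2

def overlap_partition(family):
    """Group the sets in `family` into overlap-connected components:
    two sets are linked if they intersect, and components are the
    transitive closure of that relation (the overlap partition)."""
    comps = []
    for s in map(set, family):
        hits = [c for c in comps if c & s]   # components that s overlaps
        merged = set(s)
        for c in hits:
            merged |= c
            comps.remove(c)
        comps.append(merged)
    return comps

# Hypothetical conditional ranges [[X|y1]], [[X|y2]], [[X|y3]]:
cond_ranges = [{0, 1}, {1, 2}, {4, 5}]
parts = overlap_partition(cond_ranges)   # {0,1,2} and {4,5}
I_star = log2(len(parts))                # I*[X;Y] = log2 |[[X|Y]]*|
print(sorted(map(sorted, parts)), I_star)
```

Since ⟦X|y1⟧ and ⟦X|y2⟧ share the point 1 they fall in one component, while {4, 5} is isolated, so two agents observing X and Y separately can agree on exactly one bit.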
Stationary Memoryless Uncertain Channels - Take 1

An uncertain signal X is a mapping from Ω to the space X^∞ of discrete-time sequences x = (x_i)_{i=0}^∞ in X.

A stationary memoryless uncertain channel may be defined in terms of
- input and output spaces X, Y;
- a set-valued transition function T: X → 2^Y;
- and the family of all uncertain input-output signal pairs (X,Y) s.t.

  ⟦Y_k | x_{0:k}, y_{0:k−1}⟧ = ⟦Y_k | x_k⟧ = T(x_k), k ∈ Z_{≥0}.   (9)

If the channel is 'used without feedback', then impose the extra constraint

  ⟦X_k | x_{0:k−1}, y_{0:k−1}⟧ = ⟦X_k | x_{0:k−1}⟧, k ∈ Z_{≥0},   (10)

on (X,Y).


Channel Noise?

The previous formulation parallels [Massey isit90] for stationary memoryless stochastic channels:

  f_{Y_k | X_{0:k}, Y_{0:k−1}}(y_k | x_{0:k}, y_{0:k−1}) = f_{Y_k | X_k}(y_k | x_k) ≡ q(y_k, x_k).

In many cases it is enough to think in terms of these conditional ranges, with channel noise left implicit.
However, it is often convenient to model channel noise explicitly, e.g.
- when the transmitter has access to some function of past channel noise, not just past channel outputs;
- or when the channel is part of a larger system, with other input and noise signals.
In such cases, the previous formulation would have to be changed to include the other terms in the conditioning arguments.


Channel as Noisy Function

Definition
A stationary memoryless uncertain channel (SMUC) consists of
- an unrelated, identically spread (uis) noise signal V = (V_k)_{k=0}^∞ taking values in a space V, i.e.

  ⟦V_k | v_{0:k−1}⟧ = ⟦V_k⟧ = V, ∀ v_{0:k−1} ∈ V^k, k ∈ Z_{≥0};   (11)

- input and output spaces X, Y, and a transition function t: X × V → Y;
- and the family G of all uncertain input-output signal pairs (X,Y) s.t. ∀ k ∈ Z_{≥0},
  - Y_k = t(X_k, V_k),
  - and X_{0:k} ⊥ V_k.

If the channel is used w/o feedback, then tighten the last condition so that X ⊥ V. This yields a smaller family G_nf ⊂ G.
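To connect the two formulations, the Take-1 set-valued map T(x) can be recovered from the noisy-function form as T(x) = { t(x,v) : v ∈ V }. A tiny Python sketch, with a hypothetical binary channel (the alphabets and the flip rule are illustrative assumptions, not from the slides):

```python
# Hypothetical finite alphabets: a binary channel whose noise symbol
# v = 1 flips the input bit, and v = 0 leaves it alone.
X_ALPHABET = (0, 1)
V_SET = (0, 1)            # uis noise range [[V_k]] = V

def t(x, v):
    """Transition function t: X x V -> Y of the SMUC."""
    return x ^ v          # noise v = 1 flips the bit

def transition_set(x):
    """Recover the Take-1 set-valued map T(x) = { t(x,v) : v in V }."""
    return {t(x, v) for v in V_SET}

for x in X_ALPHABET:
    print(x, transition_set(x))
```

Here every input can produce every output, so the two inputs are fully confusable and this channel supports no zero-error communication at all.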

Zero Error Coding in UV Framework (No Feedback)

[Block diagram: Message M → Coder → X_k → Channel (noise V_k) → Y_k → Decoder → M]

A zero-error code w/o feedback is defined by
- a block length n+1 ∈ ℕ;
- a message cardinality μ ≥ 1;
- and an encoder mapping g: [1:μ] → X^{n+1}, s.t. for any M ⊥ V taking μ distinct values m_1, …, m_μ,
  - X_{0:n} = g(i) if M = m_i;
  - |⟦M | y_{0:n}⟧| = 1, ∀ y_{0:n} ∈ ⟦Y_{0:n}⟧.

The last condition is equivalent to the existence of a decoder that always maps Y_{0:n} ↦ M, despite channel noise.

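The decodability condition |⟦M|y_{0:n}⟧| = 1 is equivalent to the output ranges ⟦Y_{0:n} | X_{0:n} = g(i)⟧ being pairwise disjoint, which is directly checkable for finite alphabets. A minimal sketch, assuming a hypothetical 3-input channel given by its per-symbol transition sets T(x):

```python
from itertools import product

# Hypothetical per-symbol transition sets T(x) of a 3-input channel:
T = {0: {0, 1}, 1: {1, 2}, 2: {3}}

def output_range(codeword):
    """[[Y_0:n | X_0:n = codeword]]: all output sequences that the
    channel noise can produce from this input sequence."""
    return set(product(*(T[x] for x in codeword)))

def is_zero_error(encoder):
    """A code is zero-error iff the output ranges of distinct
    messages are pairwise disjoint, so Y_0:n pins down M."""
    ranges = [output_range(cw) for cw in encoder]
    return all(ranges[i].isdisjoint(ranges[j])
               for i in range(len(ranges))
               for j in range(i + 1, len(ranges)))

print(is_zero_error([(0,), (1,)]))   # inputs 0 and 1 share output 1
print(is_zero_error([(0,), (2,)]))   # {0,1} vs {3}: decodable
```

With block length 1, inputs 0 and 1 are confusable through the shared output 1, but {0, 2} gives a zero-error code of rate 1 bit per use.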

Zero Error Capacity and I*

Zero-error capacity C_0 is defined operationally, as the highest block-coding rate that yields zero errors:

  C_0 := sup_{n,μ∈ℕ, g} (log_2 μ)/(n+1) = lim_{n→∞} sup_{μ∈ℕ, g} (log_2 μ)/(n+1).   (12)

Theorem (after N. TAC13)

  C_0 = sup_{n≥0, (X,Y)∈G_nf} I*[X_{0:n}; Y_{0:n}]/(n+1) = lim_{n→∞} ( sup_{(X,Y)∈G_nf} I*[X_{0:n}; Y_{0:n}]/(n+1) ).   (13)

In [Wolf-Wullschleger itw04], C_0 was characterised as the largest Shannon entropy rate of the maximal rv Z_n common to discrete X_{0:n}, Y_{0:n}.
The key idea here is similar, but nonstochastic and applicable to continuous-valued X, Y.

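For finite alphabets the supremum in (12) can be brute-forced at small block lengths: the largest zero-error message count μ is the maximum number of pairwise non-confusable input blocks. The sketch below uses Shannon's pentagon channel C₅ as the worked example (a standard illustration, not from the slides), where input i can yield output i or i+1 (mod 5):

```python
from itertools import combinations, product
from math import log2

# Shannon's pentagon channel: per-symbol transition sets T(x).
T = {i: {i, (i + 1) % 5} for i in range(5)}

def confusable(a, b):
    # Two input blocks share a possible output sequence iff their
    # per-symbol output sets all intersect.
    return all(T[x] & T[y] for x, y in zip(a, b))

def alpha(n):
    """Max number of pairwise non-confusable length-n blocks, i.e.
    the largest zero-error message count mu at block length n."""
    blocks = list(product(range(5), repeat=n))
    best = 1
    for size in range(2, len(blocks) + 1):
        if any(all(not confusable(a, b) for a, b in combinations(c, 2))
               for c in combinations(blocks, size)):
            best = size
        else:
            break
    return best

print(alpha(1), log2(alpha(1)) / 1)   # 2 codewords: rate 1 bit/use
print(alpha(2), log2(alpha(2)) / 2)   # 5 codewords: rate log2(5)/2
```

Block length 2 already beats single-letter coding (log₂5/2 ≈ 1.161 > 1), which is why (12) and (13) need the supremum over n.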

Proof: ≥ (Construct a Code)

Pick any (X,Y) ∈ G_nf, n ∈ ℕ. Let

  μ = |⟦X_{0:n}; Y_{0:n}⟧*| ≡ |⟦Y_{0:n}; X_{0:n}⟧*|,

and index the overlap partition sets:

  ⟦X_{0:n} | Y_{0:n}⟧* ≡ {P_X(z) : z ∈ [1:μ]},   (14)
  ⟦Y_{0:n} | X_{0:n}⟧* ≡ {P_Y(z) : z ∈ [1:μ]}.   (15)

Define the uv Z as the unique index s.t. P_X(Z) ∋ X_{0:n}.
This is also the unique index s.t. P_Y(Z) ∋ Y_{0:n}.
For each z ∈ [1:μ], pick an input sequence x(z) ∈ P_X(z) ⊆ ⟦X_{0:n}⟧ and define the coder map

  g(z) = x(z) ∈ ⟦X_{0:n}⟧, ∀ z ∈ [1:μ].


Proof: ≥ (cont.)

Now consider any message M ⊥ V that can take μ distinct values m_1, …, m_μ. Encode this message to give an input uv sequence

  X′_{0:n} = x(i) if M = m_i.

This yields an output sequence Y′_{0:n}, where

  Y′_k = t(X′_k, V_k), k ∈ [0:n].

As M and X_{0:n} are each ⊥ V, it follows that if M = m_i then

  ⟦Y′_{0:n} | X′_{0:n} = x(i)⟧ = ⟦Y_{0:n} | X_{0:n} = x(i)⟧ ⊆ P_Y(i).

The sets P_Y(1), …, P_Y(μ) are disjoint since they form a partition.
⇒ The message M can be recovered from Y′_{0:n} with this code.


Proof: ≥ (cont.)

Thus

  C_0 ≥ (log_2 μ)/(n+1) = (log_2 |⟦X_{0:n}|Y_{0:n}⟧*|)/(n+1) = I*[X_{0:n}; Y_{0:n}]/(n+1).

As (X,Y) ∈ G_nf and n ∈ ℕ were arbitrary,

  C_0 ≥ sup_{n≥0, (X,Y)∈G_nf} I*[X_{0:n}; Y_{0:n}]/(n+1).


Proof: ≤ (Construct (X,Y) ∈ G_nf)

Select an arbitrary zero-error code (n, μ, g).
Pick a message uv M ⊥ V taking distinct values m_1, …, m_μ.
Set

  X_{0:n} = g(i) if M = m_i,
  X_k = X_n, k > n,
  Y_k = t(X_k, V_k), k ∈ Z_{≥0}.

As X_{0:n} is a function of M ⊥ V, it follows that X ⊥ V.
Thus (X,Y) ∈ G_nf.

Proof: ≤ (cont.)

By the zero-error property, the sets ⟦Y_{0:n} | X_{0:n} = g(i)⟧, i = 1, …, μ, are disjoint, and therefore distinct.
Thus each partition set in ⟦Y_{0:n} | X_{0:n}⟧* contains exactly one of these sets:
- It includes at least one set ⟦Y_{0:n} | x_{0:n}⟧.
- If it included more than one such set then, by definition of the overlap partition, they would have overlaps, which is impossible.

⇒ μ = |⟦Y_{0:n} | X_{0:n}⟧*|.

Proof: ≤ (cont.)

Thus

  (log_2 μ)/(n+1) = (log_2 |⟦Y_{0:n}|X_{0:n}⟧*|)/(n+1) ≤ sup_{n≥0, (X,Y)∈G_nf} I*[X_{0:n}; Y_{0:n}]/(n+1).

As the zero-error code (n, μ, g) was arbitrary, we can take a supremum on the LHS to get

  C_0 ≤ sup_{n≥0, (X,Y)∈G_nf} I*[X_{0:n}; Y_{0:n}]/(n+1).

Conditional Maximin Information

Let T[X;Y|w] := the taxicab partition of the conditional joint range ⟦X,Y|w⟧, given W = w.
Then define conditional nonstochastic information

  I*[X;Y|W] := min_{w∈⟦W⟧} log_2 |T[X;Y|w]|

= the log-cardinality of the most refined variable common to (X,W) and (Y,W) but unrelated to W.
I.e. if two agents each observe X, Y separately but also share W, then I*[X;Y|W] captures the most refined variable that is 'new' with respect to W and on which they can both agree.

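For finite ranges the taxicab partition is computable the same way as the overlap partition, except that two points of the joint range are linked when they share an x- or a y-coordinate. A small sketch (the conditional joint ranges below are hypothetical examples, not from the slides):

```python
from math import log2

def taxicab_partition(joint_range):
    """Split a finite joint range of (x, y) pairs into taxicab-connected
    components: points are linked if they share an x- or y-value."""
    comps = []
    for p in joint_range:
        hits = [c for c in comps
                if any(p[0] == q[0] or p[1] == q[1] for q in c)]
        merged = {p}
        for c in hits:
            merged |= c
            comps.remove(c)
        comps.append(merged)
    return comps

def conditional_I_star(ranges_by_w):
    """I*[X;Y|W] = min over w of log2 |T[X;Y|w]|."""
    return min(log2(len(taxicab_partition(r)))
               for r in ranges_by_w.values())

# Hypothetical conditional joint ranges [[X,Y|w]]:
ranges = {
    "w1": {(0, 0), (1, 1), (2, 2), (3, 3)},   # 4 isolated points
    "w2": {(0, 0), (0, 1), (1, 1), (2, 2)},   # chained via x=0, y=1
}
print(conditional_I_star(ranges))   # min(log2 4, log2 2) = 1.0
```

The minimum over w reflects the worst case: the agents can only agree on a variable that is resolvable for every realisation of the shared W.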

Zero Error Coding with Feedback

[Block diagram: Message M → Coder → X_k → Channel (noise V_k) → Y_k → Decoder → M, with Y_{k−1} fed back to the coder through a unit delay]

A zero-error code with feedback is defined by
- a block length n+1 ∈ ℕ;
- a message cardinality μ ≥ 1;
- and a sequence g_{0:n} of encoder mappings s.t. for any message M ⊥ V taking values m_1, …, m_μ,
  - X_k = g_k(i, Y_{0:k−1}) if M = m_i;
  - |⟦M | y_{0:n}⟧| = 1, ∀ y_{0:n} ∈ ⟦Y_{0:n}⟧.

The last condition is equivalent to the existence of a decoder that can reconstruct M from Y_{0:n} without error.

C_0f and Directed Nonstochastic Information

Zero-error feedback capacity C_0f is defined operationally, as the highest feedback coding rate that yields zero errors:

  C_0f := sup_{n,μ∈ℕ, g_{0:n}} (log_2 μ)/(n+1) = lim_{n→∞} sup_{μ∈ℕ, g_{0:n}} (log_2 μ)/(n+1).   (16)

= the growth rate of the maximum cardinality of sets of feedback coding functions that can be unambiguously determined from channel outputs.
Define directed nonstochastic information

  I*[X_{0:n} → Y_{0:n}] := Σ_{k=0}^n I*[X_{0:k}; Y_k | Y_{0:k−1}].

C_0f in terms of Directed Nonstochastic Information

Theorem (N. cdc12)
For a stationary memoryless uncertain channel,

  C_0f = sup_{n≥0, (X,Y)∈G} I*[X_{0:n} → Y_{0:n}]/(n+1).

This parallels the characterisation in [Kim TIT08, Tatikonda-Mitter TIT09] of the ordinary feedback capacity C_f of stochastic channels:

  C_f = sup_{n≥0, p_{X_k | X_{0:k−1}, Y_{0:k−1}}, 0≤k≤n} I[X_{0:n} → Y_{0:n}]/(n+1),

where the Marko-Massey directed information I[X_{0:n} → Y_{0:n}] := Σ_{k=0}^n I[X_{0:k}; Y_k | Y_{0:k−1}], and the conditional information I[X;Y|Z] := H[X|Z] − H[X|Y,Z].

LTI State Estimation over Noisy Channels

[Block diagram: plant X_{k+1} = A X_k + B U_k + V_k, Y_k = G X_k + W_k, with plant noises V_k, W_k; the measurement Y_k enters a Quantiser/Coder producing S_k, which passes through the Channel to give Q_k, which a Decoder/Estimator maps to the estimate X̂_k.]

LTI State Estimation - Disturbance-Free

Plant: LTI, noiseless, zero input:

  X_{k+1} = A X_k, Y_k = G X_k, X_0 a uv.

Coder: Y_{0:k} ↦ S_k.
Channel: stationary and memoryless, Q_k = t(S_k, Z_k), where Z = channel noise.
Estimator: Q_{0:k} ↦ X̂_{k+1}.
Objective: uniform ρ-exponential convergence from an ℓ-ball. I.e. given ρ, ℓ > 0, construct a coder-estimator s.t. for any uv X_0 with ⟦X_0⟧ ⊆ B_ℓ(0),

  lim_{k→∞} sup_{ω∈Ω} ρ^{−k} ‖X_k − X̂_k‖ = 0.


Disturbance-Free State Estimation and C_0

Assumptions:
DF1: A has one or more eigenvalues with magnitude > ρ.
DF2: (G, A_ρ) is observable, where A_ρ := A restricted to the eigenspace governed by eigenvalues of magnitude ≥ ρ.
DF3: X_0 ⊥ Z.

Theorem (N. TAC13)
If uniform ρ-exponential convergence is achieved from some ℓ-ball, then

  C_0 ≥ Σ_{|λ_i|≥ρ} log_2 (|λ_i|/ρ).   (17)

Conversely, if (17) holds strictly, then for any ℓ > 0, a coder-estimator that achieves uniform ρ-exponential convergence from B_ℓ(0) can be constructed.

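The right-hand side of (17) is a purely numerical quantity once the plant eigenvalues are known. A minimal sketch computing it (the eigenvalues below are hypothetical; for a general matrix A one would first compute its spectrum):

```python
from math import log2

def intrinsic_rate(eigenvalues, rho):
    """RHS of (17): sum of log2(|lambda_i|/rho) over the eigenvalues
    of A with |lambda_i| >= rho. The channel zero-error capacity C_0
    must be at least this for uniform rho-exponential estimation."""
    return sum(log2(abs(lam) / rho)
               for lam in eigenvalues if abs(lam) >= rho)

# Hypothetical plant with modes 2, 4 and 0.5; target decay rho = 1:
print(intrinsic_rate([2.0, 4.0, 0.5], 1.0))   # log2 2 + log2 4 = 3.0
```

The stable mode 0.5 contributes nothing: only modes expanding faster than the target rate ρ generate information that must cross the channel.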

Necessity Argument - Scalar Case

Pick an arbitrarily large τ ∈ ℕ and a small ε ∈ (0, 1 − ρ/|λ|).
Divide [−ℓ, ℓ] into

  κ := ⌊ |(1−ε)λ/ρ|^τ ⌋ − 1

equal intervals of length 2ℓ/κ.
Inside each interval construct a centred subinterval I(s) of shorter length ℓ/κ. Define the subinterval family

  H := {I(s) : s = 1, …, κ},   (18)

noting that the subintervals in H are separated by gaps ≥ ℓ/κ.
Set the initial state range ⟦X_0⟧ = ∪_{H∈H} H ⊂ [−ℓ, ℓ].
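The packing in (18) is easy to make concrete. A sketch with hypothetical parameter values (λ = 2, ρ = 1, ℓ = 1, τ = 3, ε = 0.1, chosen only for illustration) that builds the κ centred subintervals and confirms the ℓ/κ gap:

```python
from math import floor

def subinterval_family(lam, rho, ell, tau, eps):
    """Construct the family H of (18): kappa centred subintervals of
    length ell/kappa inside equal cells of [-ell, ell], pairwise
    separated by gaps >= ell/kappa."""
    kappa = floor(abs((1 - eps) * lam / rho) ** tau) - 1
    cell = 2 * ell / kappa     # width of each equal interval
    half = ell / (2 * kappa)   # half-width of each centred subinterval
    return kappa, [(-ell + (s + 0.5) * cell - half,
                    -ell + (s + 0.5) * cell + half)
                   for s in range(kappa)]

kappa, H = subinterval_family(lam=2.0, rho=1.0, ell=1.0, tau=3, eps=0.1)
gaps = [H[s + 1][0] - H[s][1] for s in range(kappa - 1)]
print(kappa, min(gaps) >= 1.0 / kappa - 1e-12)
```

Here κ = ⌊1.8³⌋ − 1 = 4, and each neighbouring pair of subintervals is indeed separated by exactly ℓ/κ, which is what later forces the conditional ranges to disconnect the family.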

Necessity Argument - Scalar Case (cont.)

Let E_k := X_k − X̂_k. By hypothesis, ∃ φ > 0 s.t.

  φρ^k ≥ sup ⟦|E_k|⟧ ≥ 0.5 diam ⟦E_k⟧   (19)
        ≥ 0.5 diam ⟦E_k | q_{0:k−1}⟧   (20)
        = 0.5 diam ⟦λ^k X_0 − h_k(q_{0:k−1}) | q_{0:k−1}⟧
        = 0.5 diam ⟦λ^k X_0 | q_{0:k−1}⟧   (21)
        = 0.5 |λ|^k diam ⟦X_0 | q_{0:k−1}⟧.   (22)


Necessity Argument - Scalar Case (cont.)

Next show that for large t, no two sets in ℋ (18) can be ⟦X_0|Q_{0:t−1}⟧-overlap-connected.
Suppose in contradiction that ∃ H ∈ ℋ that is ⟦X_0|Q_{0:t−1}⟧-overlap-connected with another set in ℋ.
⇒ ∃ a set ⟦X_0|q_{0:t−1}⟧ containing both a point u ∈ H and a point v in some H′ ∈ ℋ \ {H}
⇒

    |u − v| ≤ diam⟦X_0|q_{0:t−1}⟧ ≤ 2φρ^t / |λ|^t,   by (22).          (23)

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 53 / 68


Necessity Argument - Scalar Case (cont.)

However, any two sets in ℋ are separated by a distance of at least ℓ/κ. So

    |u − v| ≥ ℓ/κ = ℓ / ⌊((1−ε)|λ|/ρ)^t⌋ ≥ ℓ / ((1−ε)|λ|/ρ)^t = ℓρ^t / ((1−ε)|λ|)^t.

The RHS of this would exceed the RHS of (23) when t is sufficiently large that (1/(1−ε))^t > 2φ/ℓ, yielding a contradiction.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 54 / 68
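The contradiction can be checked with concrete numbers. A small Python sketch, where λ, ρ, ε, φ and ℓ are all assumed values chosen only to illustrate the threshold (1/(1−ε))^t > 2φ/ℓ:

```python
import math

# Sanity check of the contradiction step: the separation lower bound
# l*rho^t / ((1-eps)*|lam|)^t must eventually exceed the upper bound
# 2*phi*rho^t / |lam|^t from (23).  All constants are illustrative.
lam, rho, eps, phi, l = 2.0, 1.2, 0.1, 5.0, 1.0

def lower(t):   # separation between distinct sets in the family H
    return l * rho**t / ((1 - eps) * abs(lam))**t

def upper(t):   # bound (23) on |u - v| within one conditional range
    return 2 * phi * rho**t / abs(lam)**t

# The threshold from the slide: (1/(1-eps))^t > 2*phi/l.
t_star = math.ceil(math.log(2 * phi / l) / math.log(1 / (1 - eps)))
assert lower(t_star) > upper(t_star)           # contradiction reached
assert lower(t_star - 1) <= upper(t_star - 1)  # ...and not just before
```

Both bounds share the factor ρ^t/|λ|^t, so their ratio is (ℓ/2φ)(1−ε)^{−t}, which grows without bound in t; that is the whole content of the threshold condition.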

Necessity Argument - Scalar Case (cont.)

So for large enough t, no two sets of ℋ are ⟦X_0|Q_{0:t−1}⟧-overlap-connected. So

    2^{I∗[X_0;Q_{0:t−1}]} ≡ |⟦X_0|Q_{0:t−1}⟧∗| ≥ |ℋ| = ⌊((1−ε)|λ|/ρ)^t⌋ ≥ 0.5 ((1−ε)|λ|/ρ)^t,   (24)

since ⌊x⌋ > x/2 for every x ≥ 1.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 55 / 68


Necessity Argument - Scalar Case (cont.)

But X_0 → S_{0:t−1} → Q_{0:t−1} is a Markov uncertainty chain, so

    I∗[X_0;Q_{0:t−1}] ≤ I∗[S_{0:t−1};Q_{0:t−1}] ≤ tC_0.

Substitute into the LHS of (24), take logarithms and divide by t to get

    C_0 ≥ log₂(1−ε) + log₂|λ/ρ| − 1/t.

Letting t → ∞ yields

    C_0 ≥ log₂(1−ε) + log₂|λ/ρ|.

As ε can be made arbitrarily small, we are done. □

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 56 / 68
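The two limiting steps are easy to verify numerically. A sketch with assumed values λ = 2, ρ = 1.2:

```python
import math

# The finite-t bound: C0 >= log2(1-eps) + log2|lam/rho| - 1/t.
# Letting t -> inf and then eps -> 0 recovers C0 >= log2|lam/rho|.
# The constants lam and rho are illustrative.
lam, rho = 2.0, 1.2

def bound(eps, t):
    return math.log2(1 - eps) + math.log2(abs(lam / rho)) - 1 / t

target = math.log2(abs(lam / rho))   # the final capacity bound, in bits

# The bound increases in t towards its eps-limit from below...
assert bound(0.01, 10) < bound(0.01, 1000) < math.log2(0.99) + target
# ...and shrinking eps pushes it arbitrarily close to log2|lam/rho|.
assert abs(bound(1e-9, 10**9) - target) < 1e-6
```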


State Estimation with Plant Disturbances

Plant: LTI, X_{k+1} = A X_k + V_k, Y_k = G X_k + W_k.
Coder: Y_{0:k} ↦ S_k.
Channel: stationary and memoryless, Q_k = τ(S_k, Z_k), where Z = channel noise.
Estimator: Q_{0:k} ↦ X̂_{k+1}.
Objective: uniformly bounded estimation errors beginning from an ℓ-ball, i.e. given ℓ > 0, construct a coder-estimator s.t. for any initial state X_0 with ⟦X_0⟧ ⊆ B_ℓ(0),

    sup_{k ∈ Z≥0, ω ∈ Ω} ‖X_k − X̂_k‖ < ∞.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 57 / 68
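The coder-estimator loop above can be illustrated for a scalar plant over a noiseless R-bit channel. This is a toy sketch under assumed constants (A = 2, R = 2, ℓ = 1, disturbance bound 0.05), not the construction used in the paper; it only shows why the rate must dominate log₂|A| for the error to stay bounded:

```python
# A minimal scalar coder-estimator over a NOISELESS R-bit channel.
# Both coder and estimator track a box of radius `delta` guaranteed to
# contain the innovation; each sample the box shrinks by A/2**R and
# grows by the disturbance bound, so it stays bounded iff 2**R > |A|.
import random

A, R = 2.0, 2          # plant pole and bits per sample; here 2**R > |A|
l, vbar = 1.0, 0.05    # initial-ball radius and disturbance bound

random.seed(0)
x, xhat, delta = random.uniform(-l, l), 0.0, l
errs = []
for k in range(200):
    # Coder: quantise the innovation x - xhat into one of 2**R cells.
    cell = max(0, min(int((x - xhat + delta) / (2 * delta) * 2**R), 2**R - 1))
    # Estimator: decode the cell centre (the channel is noiseless here).
    xhat += -delta + (2 * cell + 1) * delta / 2**R
    errs.append(abs(x - xhat))
    # Plant propagates with a bounded disturbance; both sides update delta.
    v = random.uniform(-vbar, vbar)
    x, xhat = A * x + v, A * xhat
    delta = A * delta / 2**R + vbar

assert max(errs) <= l        # errors stay uniformly bounded
assert errs[-1] <= 0.05      # ...and settle near vbar / (1 - A/2**R) / 2**R
```

With R = 1 the box radius recursion delta ← A·delta/2 + vbar no longer contracts for A = 2, and the error bound diverges, matching the necessity direction of the theorems that follow.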


Estimation with Disturbances and C_0

Assumptions:
D1: A has one or more eigenvalues with magnitude ≥ 1.
D2: (G, A_1) is observable, where A_1 := A restricted to the eigenspace governed by eigenvalues of magnitude ≥ 1.
D3: ⟦V_k⟧ and ⟦W_k⟧ are uniformly bounded over k.
D4: X_0, V, W and Z are mutually unrelated.
D5: The zero-noise sequence pair (v, w) = (0, 0) is valid, i.e. (0, 0) ∈ ⟦V, W⟧.

Theorem (N. TAC'13)
If uniformly bounded estimation errors are achieved from some ℓ-ball, then

    C_0 ≥ Σ_{|λ_i| ≥ 1} log₂ |λ_i|.                                    (25)

Conversely, if (25) holds strictly, then for any ℓ > 0, a coder-estimator that achieves uniformly bounded estimation errors from B_ℓ(0) can be constructed.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 58 / 68
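Evaluating the right side of (25) is a direct computation on the spectrum of A. A sketch for an illustrative 2×2 plant (the matrix, and the helper names `eig2` and `min_capacity`, are this sketch's own, not from the slides):

```python
import cmath
import math

def eig2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial."""
    tr, det = a + d, a * d - b * c
    s = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + s) / 2, (tr - s) / 2)

def min_capacity(eigs):
    """Right side of (25): bits/sample of zero-error capacity required."""
    return sum(math.log2(abs(lam)) for lam in eigs if abs(lam) >= 1)

# A = [[2, 1], [0, 0.5]] has eigenvalues 2 (unstable) and 0.5 (stable);
# only the unstable mode contributes log2(2) = 1 bit per sample.
eigs = eig2(2.0, 1.0, 0.0, 0.5)
assert math.isclose(min_capacity(eigs), 1.0)
```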


Control over Noisy Channels

[Block diagram: plant X_{k+1} = A X_k + B U_k + V_k, Y_k = G X_k + W_k, driven by noise (V_k, W_k); a quantiser/coder maps Y_k to channel symbols S_k; the channel delivers Q_k to a decoder/controller, which closes the loop with U_k.]

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 59 / 68

LTI Control - Disturbance-Free

Plant: LTI, noiseless:

    X_{k+1} = A X_k + B U_k, Y_k = G X_k, X_0 a uv.

Coder: Y_{0:k} ↦ S_k.
Channel: stationary and memoryless, Q_k = τ(S_k, Z_k), where Z = channel noise.
Controller: Q_{0:k} ↦ U_k.
Objective: uniform ρ-exponential stability on an ℓ-ball, i.e. given ρ, ℓ > 0, construct a coder-controller s.t. for any uv X_0 with ⟦X_0⟧ ⊆ B_ℓ(0),

    lim_{k→∞} sup_{ω ∈ Ω} ρ^{−k} ‖X_k‖ = 0.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 60 / 68


Disturbance-Free Control and C_0f

Assumptions:
DF1: A has one or more eigenvalues with magnitude > ρ.
DF2: (G, A_ρ) is observable and (A_ρ, B) is controllable, where A_ρ := A restricted to the eigenspace governed by eigenvalues of magnitude ≥ ρ.
DF3: X_0 ⊥ Z.

Proposition
If uniform ρ-exponential stability is achieved on some ℓ-ball, then

    C_0f ≥ Σ_{|λ_i| ≥ ρ} log₂(|λ_i| / ρ).                              (26)

Conversely, if (26) holds strictly, then for any ℓ > 0, a coder-controller that achieves uniform ρ-exponential stability on B_ℓ(0) can be constructed.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 61 / 68
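Bound (26) discounts each unstable mode by the target decay rate ρ, so demanding faster decay costs rate. A sketch with illustrative eigenvalues (the helper name `min_feedback_capacity` is this sketch's own):

```python
import math

def min_feedback_capacity(eigs, rho):
    """Right side of (26): bits/sample of zero-error feedback capacity."""
    return sum(math.log2(abs(lam) / rho) for lam in eigs if abs(lam) >= rho)

eigs = (2.0, 0.5)   # illustrative spectrum: one unstable, one stable mode

# Mere boundedness (rho = 1) needs 1 bit/sample; demanding decay at
# rho = 0.5 makes both modes count and doubles the requirement.
assert math.isclose(min_feedback_capacity(eigs, 1.0), 1.0)
assert math.isclose(min_feedback_capacity(eigs, 0.5), 2.0)
```

At ρ = 1 the formula collapses to the bound (25) for uniformly bounded behaviour, which is consistent with the theorems on the neighbouring slides.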


Control with Plant Disturbances

Plant: LTI, X_{k+1} = A X_k + B U_k + V_k, Y_k = G X_k + W_k.
Coder: Y_{0:k} ↦ S_k.
Channel: stationary and memoryless, Q_k = τ(S_k, Z_k), where Z = channel noise.
Controller: Q_{0:k} ↦ U_k.
Objective: uniformly bounded states beginning from an ℓ-ball, i.e. given ℓ > 0, construct a coder-controller s.t. for any initial state X_0 with ⟦X_0⟧ ⊆ B_ℓ(0),

    sup_{k ∈ Z≥0, ω ∈ Ω} ‖X_k‖ < ∞.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 62 / 68


Control with Disturbances and C_0f

Assumptions:
D1: A has one or more eigenvalues with magnitude ≥ 1.
D2: (G, A_1) is observable and (A_1, B) is controllable, where A_1 := A restricted to the eigenspace governed by eigenvalues of magnitude ≥ 1.
D3: ⟦V_k⟧ and ⟦W_k⟧ are uniformly bounded over k.
D4: X_0, V, W and Z are mutually unrelated.
D5: The zero-noise sequence pair (v, w) = (0, 0) is valid, i.e. (0, 0) ∈ ⟦V, W⟧.

Theorem (N. CDC'12)
If uniformly bounded states are achieved from some ℓ-ball, then

    C_0f ≥ Σ_{|λ_i| ≥ 1} log₂ |λ_i|.                                   (27)

Conversely, if (27) holds strictly, then for any ℓ > 0, a coder-controller that achieves uniformly bounded states from B_ℓ(0) can be constructed.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 63 / 68


What Does That Add to [Matveev-Savkin IJC07]?

Matveev and Savkin considered similar estimation and control problems.
Mixed formulation: plant noise was a nonstochastic, bounded disturbance, while the initial state and channel were stochastic and independent.
The aim was to achieve a.s. boundedness for any plant noise.
Their proof of necessity used the randomness of the initial state and channel to apply a law of large numbers. No information theory.
Here, necessity is proved using data processing on Markov uncertainty chains, and by analysing I∗ and directed I∗. No statistical assumptions.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 64 / 68


Summary

This talk described:
A nonstochastic theory of uncertainty and information, without assuming a probability space.
Intrinsic characterisations of the operational zero-error capacity and zero-error feedback capacity for stationary memoryless channels.
An information-theoretic basis for analysing worst-case networked estimation/control with bounded noise.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 65 / 68

Outlook

The theory is still far from mature!
Tractable algorithms to estimate C_0 (perhaps Monte Carlo)?
Disturbances with bounded energy or time-averages?
C_0f for channels with memory?
Zero-error feedback capacity with imperfect channel feedback?
Multi-agent systems...?

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 66 / 68

References

C.E. Shannon, "A mathematical theory of communication", Bell Syst. Tech. Jour., vol. 27, pp. 379–423, 623–56, 1948.

G.N. Nair, "A nonstochastic information theory for communication and state estimation", IEEE Trans. Automatic Control, vol. 58, no. 6, pp. 1497–510, 2013.

G.N. Nair, "A nonstochastic information theory for feedback", Proc. 51st IEEE Conf. Decision and Control, Maui, USA, pp. 1343–8, 2012.

J. Baillieul, "Feedback designs in information-based control", Stochastic Theory and Control: Proceedings of a Workshop held in Lawrence, Kansas, pp. 35–57, Springer, 2002.

S. Tatikonda and S. Mitter, "Control under communication constraints", IEEE Trans. Automatic Control, vol. 49, no. 7, pp. 1056–68, July 2004.

G.N. Nair and R.J. Evans, "Stabilizability of stochastic linear systems with finite feedback data rates", SIAM Jour. Control and Optimization, vol. 43, no. 2, pp. 413–36, July 2004.

A.S. Matveev and A.V. Savkin, "An analogue of Shannon information theory for detection and stabilization via noisy discrete communication channels", SIAM Jour. Control and Optimization, vol. 46, no. 4, pp. 1323–67, 2007.

J.H. Braslavsky, R.H. Middleton and J.S. Freudenberg, "Feedback stabilization over signal-to-noise ratio constrained channels", IEEE Trans. Automatic Control, vol. 52, no. 8, pp. 1391–403, 2007.

A. Sahai and S. Mitter, "The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link — part 1: scalar systems", IEEE Trans. Info. Theory, vol. 52, no. 8, pp. 3369–95, 2006.

A.S. Matveev and A.V. Savkin, "Shannon zero error capacity in the problems of state estimation and stabilization via noisy communication channels", Int. Jour. Control, vol. 80, pp. 241–55, 2007.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 67 / 68

References (cont.)

C.E. Shannon, "The lattice theory of information", Trans. IRE Professional Group on Info. Theory, vol. 1, no. 1, pp. 105–8, Feb. 1953.

C.E. Shannon, "The zero-error capacity of a noisy channel", IRE Trans. Info. Theory, vol. 2, pp. 8–19, 1956.

P. Gacs and J. Korner, "Common information is far less than mutual information", Problems of Control and Information Theory, vol. 2, no. 2, pp. 149–62, 1973.

S. Wolf and J. Wullschleger, "Zero-error information and applications in cryptography", in Proc. Info. Theory Workshop, San Antonio, USA, pp. 1–6, 2004.

J. Massey, "Causality, feedback and directed information", in Proc. Int. Symp. Info. Theory App., pp. 1–6, Nov. 1990.

Y.H. Kim, "A coding theorem for a class of stationary channels with feedback", IEEE Trans. Info. Theory, pp. 1488–99, 2008.

S. Tatikonda and S. Mitter, "The capacity of channels with feedback", IEEE Trans. Info. Theory, pp. 323–49, 2009.

G. Klir, Uncertainty and Information: Foundations of Generalized Information Theory, Wiley, 2006, ch. 2.

H. Shingin and Y. Ohta, "Disturbance rejection with information constraints: performance limitations of a scalar system for bounded and Gaussian disturbances", Automatica, vol. 48, no. 6, pp. 1111–6, 2012.

W.S. Wong and R.W. Brockett, "Systems with finite communication bandwidth constraints I", IEEE Trans. Automatic Control, vol. 42, pp. 1294–9, 1997.

W.S. Wong and R.W. Brockett, "Systems with finite communication bandwidth constraints II: stabilization with limited information feedback", IEEE Trans. Automatic Control, vol. 44, pp. 1049–53, 1999.

Nair (Uni. Melbourne) Nonstochastic Information DISC School 2015 68 / 68