Markov jump linear systems Optimal Control Pantelis Sopasakis IMT Institute for Advanced Studies Lucca February 5, 2016


Source: dysco.imtlucca.it/atcs/course-material/mjls-optcontr.pdf


Abbreviations

1. MJLS: Markov Jump Linear Systems

2. FHOC: Finite Horizon Optimal Control

3. IHOC: Infinite Horizon Optimal Control

4. CARE: Coupled Algebraic Riccati Equations


Outline

1. LQR (deterministic case) – A quick revision

2. FHOC for MJLS

3. IHOC for MJLS (CARE)


I. Dynamic programming


Finite horizon optimal control

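We have a (deterministic) LTI system

\[ x(k+1) = Ax(k) + Bu(k), \]

with $x(0) = x_0$. For a given sequence of input values of length $N$, that is, $\pi_N = (u(0), u(1), \ldots, u(N-1))$, we define the cost function

\[ J_N(\pi_N; x_0) = \sum_{k=0}^{N-1} \ell(x(k), u(k)) + \ell_N(x(N)). \]

Assume

\[ \ell(x, u) = x'Qx + u'Ru, \quad \text{and} \quad \ell_N(x) = x'P_Nx, \]

for some $Q \in \mathbb{S}^n_{+}$, $P_N \in \mathbb{S}^n_{++}$, $R \in \mathbb{S}^m_{++}$, i.e. $Q$ is symmetric positive semidefinite and $P_N$, $R$ are symmetric positive definite.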
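The cost above can be evaluated for any given input sequence by rolling the dynamics forward. The following is a minimal sketch in Python with NumPy; the function name `finite_horizon_cost` and the double-integrator data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def finite_horizon_cost(A, B, Q, R, P_N, x0, inputs):
    """Simulate x(k+1) = A x(k) + B u(k) and accumulate J_N(pi_N; x0)."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for u in inputs:                      # u(0), ..., u(N-1)
        u = np.asarray(u, dtype=float)
        cost += x @ Q @ x + u @ R @ u     # stage cost l(x(k), u(k))
        x = A @ x + B @ u                 # system dynamics
    return cost + x @ P_N @ x             # terminal cost l_N(x(N))

# Illustrative data (assumed): a discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P_N = np.eye(2)
x0 = np.array([1.0, 0.0])

# Cost of applying zero input over a horizon of N = 3.
J_zero = finite_horizon_cost(A, B, Q, R, P_N, x0, [np.zeros(1)] * 3)
print(J_zero)
```

With this data and zero input the state stays at $x_0$, so the cost is three stage costs plus the terminal cost.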


Finite horizon optimal control

We need to determine a finite sequence $\pi_N$ that minimises $J_N(\pi_N; x_0)$:

\[ J_N^\star(x_0) = \min_{\pi_N} J_N(\pi_N; x_0) \]

subject to the system dynamics and $x(0) = x_0$. DP recursion¹:

\[ V_N(x(N)) = x(N)'P_Nx(N), \]

\[ V_k(x(k)) = \min_{u(k)} \left\{ \ell(x(k), u(k)) + V_{k+1}(x(k+1)) \right\}, \]

for $k = N-1, \ldots, 0$.

¹ See for instance: F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
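As a worked instance of one step of this recursion (a sketch that uses only the quadratic costs assumed earlier), take $k = N-1$, where $V_N$ is known. Writing $x$ for $x(N-1)$ and $u$ for $u(N-1)$:

\[ V_{N-1}(x) = \min_{u} \left\{ x'Qx + u'Ru + (Ax+Bu)'P_N(Ax+Bu) \right\}. \]

The objective is strictly convex in $u$ because $R$ is positive definite, so setting its gradient with respect to $u$ to zero gives the minimiser:

\[ 2Ru + 2B'P_N(Ax+Bu) = 0 \iff u = -(B'P_NB + R)^{-1}B'P_NAx. \]

The minimiser is linear in $x$, and substituting it back shows that $V_{N-1}$ is again a quadratic function of $x$. The same argument applies at every earlier step.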


Why DP?

DP facts:

- We may decompose a complex optimisation problem into simpler subproblems
- Here, we solve for one $u(k)$ at a time
- DP uses Bellman's principle of optimality
- It can be applied in the same way to stochastic optimal control problems
- It is a powerful tool to study the mean square stability (MSS) of MJLS and Markovian switching systems (next class)


Finite horizon optimal control

Let $\pi^\star(x_0)$ be the respective minimiser with

\[ \pi^\star(x_0) = \{u^\star(0), u^\star(1), \ldots, u^\star(N-1)\}. \]

Using DP we derive

\[ V_k(x) = x'P_kx, \qquad u^\star(k) = F(P_{k+1})x(k), \]

where $P_k$ is determined by the backward recursion

\[ P_k = A'P_{k+1}A + Q + A'P_{k+1}BF(P_{k+1}), \]

and

\[ F(P) = -(B'PB + R)^{-1}B'PA. \]
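The backward recursion can be sketched in a few lines of Python with NumPy. The function names `gain` and `riccati_recursion` and the numerical data are illustrative assumptions; the gain is taken as $F(P) = -(B'PB+R)^{-1}B'PA$ so that $u^\star(k) = F(P_{k+1})x(k)$ has the right dimensions. The sketch also checks numerically that the cost of the resulting feedback equals $x_0'P_0x_0$.

```python
import numpy as np

def gain(P, A, B, R):
    """F(P) = -(B'PB + R)^{-1} B'PA, computed with a linear solve."""
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

def riccati_recursion(A, B, Q, R, P_N, N):
    """Backward recursion; returns [P_0, ..., P_N] and [F_0, ..., F_{N-1}]."""
    P = [None] * (N + 1)
    F = [None] * N
    P[N] = P_N
    for k in range(N - 1, -1, -1):
        F[k] = gain(P[k + 1], A, B, R)            # u*(k) = F[k] x(k)
        P[k] = A.T @ P[k + 1] @ A + Q + A.T @ P[k + 1] @ B @ F[k]
    return P, F

# Illustrative data (assumed): a discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, P_N = np.eye(2), np.array([[1.0]]), np.eye(2)
x0, N = np.array([1.0, 0.0]), 5

P, F = riccati_recursion(A, B, Q, R, P_N, N)

# Roll the feedback forward and accumulate the cost J_N.
x, closed_loop_cost = x0.copy(), 0.0
for k in range(N):
    u = F[k] @ x
    closed_loop_cost += x @ Q @ x + u @ R @ u
    x = A @ x + B @ u
closed_loop_cost += x @ P_N @ x

predicted_cost = x0 @ P[0] @ x0                   # V_0(x0) = x0' P_0 x0
print(closed_loop_cost, predicted_cost)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a design choice for numerical robustness; it does not change the recursion.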


Infinite horizon optimal control

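What happens as $N \to \infty$? Let us define

\[ J_\infty(\pi; x_0) = \sum_{k=0}^{\infty} \ell(x(k), u(k)), \]

where $\pi$ is a sequence of inputs $\{u(k)\}_{k \in \mathbb{N}}$. For the series to converge it is of course required that

\[ \|x(k)\|^2 \to 0 \quad \text{and} \quad \|u(k)\|^2 \to 0, \quad \text{as } k \to \infty. \]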


Infinite horizon optimal control

We can show that, under certain conditions², the IHOC problem is solvable and

\[ J_\infty^\star(x) = x'P_\infty x, \qquad u^\star(k) = F(P_\infty)x(k), \]

where $P_\infty$ is a fixed point of the DP recursion of the FHOC problem (Algebraic Riccati Equation), that is

\[ P_\infty = A'P_\infty A + Q - A'P_\infty B(B'P_\infty B + R)^{-1}B'P_\infty A. \]

² Provided that $(A, B)$ is stabilisable and $(Q^{1/2}, A)$ is detectable. Then the matrix $A + BF(P_\infty)$ is stable. Proof: see D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
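One simple way to approximate $P_\infty$ numerically is to iterate the FHOC recursion until it stalls. The sketch below does this in Python with NumPy; it is an illustration under assumed data (a double integrator with $Q = I$, $R = 1$, which is stabilisable and detectable), not necessarily the method used in the course, and dedicated Riccati solvers are usually preferable in practice. The names `riccati_map` and `solve_are_by_iteration` are hypothetical.

```python
import numpy as np

def gain(P, A, B, R):
    """F(P) = -(B'PB + R)^{-1} B'PA."""
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

def riccati_map(P, A, B, Q, R):
    """One step of the FHOC recursion: P -> A'PA + Q + A'PB F(P)."""
    return A.T @ P @ A + Q + A.T @ P @ B @ gain(P, A, B, R)

def solve_are_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate the Riccati map from P = Q until successive iterates agree."""
    P = Q.copy()
    for _ in range(max_iter):
        P_next = riccati_map(P, A, B, Q, R)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

# Illustrative data (assumed): a discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P_inf = solve_are_by_iteration(A, B, Q, R)
F_inf = gain(P_inf, A, B, R)

# Residual of the algebraic Riccati equation and closed-loop spectral radius.
residual = np.max(np.abs(P_inf - riccati_map(P_inf, A, B, Q, R)))
rho = np.max(np.abs(np.linalg.eigvals(A + B @ F_inf)))
print(residual, rho)
```

A spectral radius below one for $A + BF(P_\infty)$ is what the footnote's stability statement predicts for this data.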


End of first section

- Revision of FHOC and DP
- We solved the LQR problem


II. FHOC for MJLS


FHOC for MJLS

Consider an MJLS

\[ x(k+1) = A_{\theta(k)}x(k) + B_{\theta(k)}u(k) + M_{\theta(k)}\underbrace{v(k)}_{\text{noise}}, \]

with $x(0) = x_0$, and let $z(k) = C_{\theta(k)}x(k) + D_{\theta(k)}u(k)$ be the quantity that will be penalised. We define the following cost functional

\[ J(\theta_0, x_0, \pi_N) := \sum_{k=0}^{N-1} \mathbb{E}\left[\|z(k)\|^2\right] + \mathbb{E}\left[x(N)'V_{\theta(N)}x(N)\right], \]

where $\pi_N$ is a policy $\pi_N = (u(0), \ldots, u(N-1))$ with

\[ u(k) = \mu_k(x(k), \theta(k)). \]
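For a fixed policy, the expectations in this cost can be approximated by averaging over simulated sample paths. The sketch below, in Python with NumPy, is only an illustration: the function `sample_cost`, the two-mode data, the zero policy and the choice of standard Gaussian noise are all assumptions that do not come from the slides.

```python
import numpy as np

def sample_cost(A, B, M, C, D, V, Pt, policy, x0, theta0, N, rng):
    """One sample of sum_k ||z(k)||^2 + x(N)' V_{theta(N)} x(N)."""
    x, theta, cost = np.asarray(x0, dtype=float), theta0, 0.0
    for k in range(N):
        u = policy(k, x, theta)                       # u(k) = mu_k(x(k), theta(k))
        z = C[theta] @ x + D[theta] @ u
        cost += z @ z
        v = rng.standard_normal(M[theta].shape[1])    # assumed: standard Gaussian noise
        x = A[theta] @ x + B[theta] @ u + M[theta] @ v
        theta = rng.choice(len(A), p=Pt[theta])       # Markov transition of the mode
    return cost + x @ V[theta] @ x

# Illustrative two-mode data (assumed).
A = [np.array([[1.0, 1.0], [0.0, 1.0]]), np.array([[0.9, 0.0], [0.2, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
M = [0.1 * np.eye(2), 0.1 * np.eye(2)]
C = [np.vstack([np.eye(2), np.zeros((1, 2))])] * 2    # z stacks x on top of u
D = [np.vstack([np.zeros((2, 1)), np.eye(1)])] * 2
V = [np.eye(2), np.eye(2)]
Pt = np.array([[0.9, 0.1], [0.3, 0.7]])               # transition probabilities p_ij

def zero_policy(k, x, theta):
    """A trivial policy used only to exercise the simulator."""
    return np.zeros(1)

rng = np.random.default_rng(0)
samples = [sample_cost(A, B, M, C, D, V, Pt, zero_policy, [1.0, 0.0], 0, 10, rng)
           for _ in range(2000)]
estimate = float(np.mean(samples))
print(estimate)
```

The policy is passed as a function of $(k, x, \theta)$ so that any mode-dependent feedback can be plugged in without changing the simulator.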


FHOC assumptions

Let $\mathcal{G}_k$ be the σ-algebra generated by $\{x(t), \theta(t);\ t = 0, \ldots, k\}$.

Assumptions on $v$:

1. $v(k)$ are random variables with $\mathbb{E}\left[v(k)v(k)'1_{\{\theta(k)=i\}}\right] = \Xi_i(k)$
2. For every $f$, $g$, $f(v(k))$ and $g(\theta(k))$ are independent w.r.t. $\mathcal{G}_k$
3. $\mathbb{E}\left[v(0)x(0)'1_{\{\theta(0)=i\}}\right] = 0$

Assumptions on $z(k)$:

1. $C_i(k)'D_i(k) = 0$, so there are no penalties of the form $x(k)'S_{\theta(k)}u(k)$
2. $D_i(k)'D_i(k) \succ 0$
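To see what the two assumptions on $z(k)$ buy us, expand the stage cost when $\theta(k) = i$ (time arguments of $C_i$ and $D_i$ are dropped for brevity):

\[ \|z(k)\|^2 = (C_ix(k) + D_iu(k))'(C_ix(k) + D_iu(k)) = x(k)'C_i'C_ix(k) + 2x(k)'C_i'D_iu(k) + u(k)'D_i'D_iu(k). \]

With $C_i'D_i = 0$ the cross term vanishes, leaving

\[ \|z(k)\|^2 = x(k)'C_i'C_ix(k) + u(k)'D_i'D_iu(k), \]

so $C_i'C_i$ plays the role of $Q$ and $D_i'D_i \succ 0$ plays the role of $R$ in the deterministic LQR stage cost $x'Qx + u'Ru$.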


Control laws and policies for MJLS

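A measurable function

\[ \mu : \mathbb{R}^n \times \mathcal{N} \to \mathbb{R}^m \]

is called a control law; its second argument is the mode $\theta(k) \in \mathcal{N}$.

A (finite or infinite) sequence of control laws

\[ \pi = \{\mu_0, \mu_1, \ldots\}, \]

where $\mu_k$ is $\mathcal{G}_k$-measurable, is called a control policy.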


FHOC – Dynamic programming recursion

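To perform DP we introduce the cost functional

\[ J_\kappa(\theta(\kappa), x(\kappa), u_\kappa) := \sum_{k=\kappa}^{N-1} \mathbb{E}\left[\|z(k)\|^2 \mid \mathcal{G}_\kappa\right] + \mathbb{E}\left[x(N)'V_{\theta(N)}x(N) \mid \mathcal{G}_\kappa\right], \]

for $\kappa \in \{0, \ldots, N-1\}$, where $u_\kappa = (u(\kappa), \ldots, u(N-1))$ is such that each $u(k)$ is $\mathcal{G}_k$-measurable. The optimal value of $J_\kappa(\theta(\kappa), x(\kappa), u_\kappa)$ is then given by

\[ J_\kappa^\star(i, x) = x'X_i(\kappa)x + \alpha(\kappa), \]

where $X_i$ is given by a Riccati-like equation.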


FHOC – Dynamic programming recursion

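We have

\[ J_\kappa^\star(i, x) = x'X_i(\kappa)x + \alpha(\kappa), \]

where

\[ X_i(N) = V_i, \]

\[ X_i(k) = A_i'\mathcal{E}_i(X(k+1))A_i + A_i'\mathcal{E}_i(X(k+1))B_iF_i(X(k+1)) + C_i'C_i, \]

where $\mathcal{E}_i(X) = \sum_{j=1}^{N} p_{ij}X_j$ (in this sum the upper limit $N$ is the number of modes, not the horizon), $R_i(X) := D_i'D_i + B_i'\mathcal{E}_i(X)B_i$ and

\[ F_i(X) := -R_i(X)^{-1}B_i'\mathcal{E}_i(X)A_i. \]

The respective optimisers are given by

\[ u^\star(k) = F_{\theta(k)}(X(k+1))x(k). \]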
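The coupled recursion can be sketched in Python with NumPy as follows. This is an illustration under assumptions: the helper names (`expect`, `mode_gain`, `coupled_riccati_recursion`) and the two-mode data are hypothetical, the matrices are taken to be time-invariant, and the constant term $\alpha(\kappa)$ is not computed because the slides do not give its expression. The update uses the identity $A_i'\mathcal{E}_iA_i + A_i'\mathcal{E}_iB_iF_i = A_i'\mathcal{E}_i(A_i + B_iF_i)$.

```python
import numpy as np

def expect(X, Pt, i):
    """E_i(X) = sum_j p_ij X_j."""
    return sum(Pt[i, j] * X[j] for j in range(len(X)))

def mode_gain(X, Pt, i, A, B, D):
    """F_i(X) = -R_i(X)^{-1} B_i' E_i(X) A_i, R_i(X) = D_i'D_i + B_i' E_i(X) B_i."""
    E = expect(X, Pt, i)
    R = D[i].T @ D[i] + B[i].T @ E @ B[i]
    return -np.linalg.solve(R, B[i].T @ E @ A[i])

def coupled_riccati_recursion(A, B, C, D, V, Pt, horizon):
    """Return X[k][i] for k = 0..horizon and gains F[k][i] for k = 0..horizon-1."""
    modes = range(len(A))
    X = [None] * (horizon + 1)
    F = [None] * horizon
    X[horizon] = list(V)                               # X_i(N) = V_i
    for k in range(horizon - 1, -1, -1):
        F[k] = [mode_gain(X[k + 1], Pt, i, A, B, D) for i in modes]
        X[k] = [A[i].T @ expect(X[k + 1], Pt, i) @ (A[i] + B[i] @ F[k][i])
                + C[i].T @ C[i] for i in modes]
    return X, F

# Illustrative two-mode data (assumed); C_i' D_i = 0 and D_i' D_i = 1.
A = [np.array([[1.0, 1.0], [0.0, 1.0]]), np.array([[0.9, 0.0], [0.2, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
C = [np.vstack([np.eye(2), np.zeros((1, 2))])] * 2
D = [np.vstack([np.zeros((2, 1)), np.eye(1)])] * 2
V = [np.eye(2), np.eye(2)]
Pt = np.array([[0.9, 0.1], [0.3, 0.7]])

X, F = coupled_riccati_recursion(A, B, C, D, V, Pt, horizon=5)
u0 = F[0][0] @ np.array([1.0, 0.0])                    # u*(0) when theta(0) is mode 0
print(u0)
```

With a single mode and $p_{11} = 1$ the recursion reduces to the deterministic Riccati recursion with $Q = C'C$ and $R = D'D$, which is a convenient sanity check.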


End of second section

- Formulation of FHOC for MJLS, also considering an additive noise term
- Control policies and control laws
- Solution of FHOC: piecewise linear control laws (linear in $x$ for each mode)

\[ u^\star(k) = \kappa(x(k), \theta(k)) = F_{\theta(k)}x(k). \]


III. IHOC for MJLS and MSS


IHOC for MJLS

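Consider an MJLS without additive noise

\[ x(k+1) = A_{\theta(k)}x(k) + B_{\theta(k)}u(k), \]

with $x(0) = x_0$, and let $z(k) = C_{\theta(k)}x(k) + D_{\theta(k)}u(k)$ be the quantity that will be penalised. We are now looking for sequences $\pi = \{u(k)\}_{k \in \mathbb{N}}$ in

\[ \mathcal{U} = \left\{ \pi \;\middle|\; u(k) \text{ is } \mathcal{G}_k\text{-measurable } \forall k \in \mathbb{N}, \ \lim_{k \to \infty} \mathbb{E}\left[\|x(k)\|^2\right] = 0 \right\}. \]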


IHOC for MJLS

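With $\pi \in \mathcal{U}$ the following is a well-defined infinite horizon cost function

\[ J(\theta_0, x_0, \pi) := \sum_{k=0}^{\infty} \mathbb{E}\left[\|z(k)\|^2\right], \]

and the IHOC problem amounts to determining

\[ J^\star(\theta_0, x_0) := \inf_{\pi \in \mathcal{U}} J(\theta_0, x_0, \pi). \]

We define $\pi^\star$ to be the respective optimiser, with elements

\[ u^\star(k) = \psi_k(\theta(k), x(k)). \]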


Objectives

1. Under what conditions does the IHOC problem have a solution?

2. How can this solution be determined?

3. Can we derive an MS-stabilising controller by solving the IHOC problem?


Control CARE

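Assume that there is $X \in \mathbb{H}^{n+}$ satisfying the control CARE:

\[ X_i = A_i'\mathcal{E}_i(X)A_i - A_i'\mathcal{E}_i(X)B_i\left(D_i'D_i + B_i'\mathcal{E}_i(X)B_i\right)^{-1}B_i'\mathcal{E}_i(X)A_i + C_i'C_i, \]

and let

\[ F_i(X) := -\left(D_i'D_i + B_i'\mathcal{E}_i(X)B_i\right)^{-1}B_i'\mathcal{E}_i(X)A_i. \]

The IHOC problem solution is given by

\[ u^\star(k) = F_{\theta(k)}(X)x(k) \]

and the value function is

\[ J^\star(\theta_0, x_0) = \mathbb{E}\left[x_0'X_{\theta_0}x_0\right]. \]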
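A simple way to look for a solution of the control CARE numerically is to iterate its right-hand side from $X = 0$ until the iterates stop changing. The Python sketch below is an assumption-laden illustration, not the method prescribed by the slides: the names `care_map` and `solve_care_by_iteration` and the scalar two-mode data are hypothetical, and convergence is checked numerically rather than guaranteed by the code.

```python
import numpy as np

def care_map(X, A, B, C, D, Pt):
    """Right-hand side of the control CARE, evaluated mode by mode."""
    out = []
    for i in range(len(A)):
        E = sum(Pt[i, j] * X[j] for j in range(len(X)))      # E_i(X)
        R = D[i].T @ D[i] + B[i].T @ E @ B[i]
        G = np.linalg.solve(R, B[i].T @ E @ A[i])            # equals -F_i(X)
        out.append(A[i].T @ E @ A[i] - A[i].T @ E @ B[i] @ G + C[i].T @ C[i])
    return out

def solve_care_by_iteration(A, B, C, D, Pt, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration X <- care_map(X), started from X_i = 0."""
    X = [np.zeros_like(Ai) for Ai in A]
    for _ in range(max_iter):
        X_next = care_map(X, A, B, C, D, Pt)
        if max(np.max(np.abs(Xn - Xo)) for Xn, Xo in zip(X_next, X)) < tol:
            return X_next
        X = X_next
    raise RuntimeError("CARE iteration did not converge")

# Illustrative scalar two-mode data (assumed); mode 0 is open-loop unstable.
A = [np.array([[1.2]]), np.array([[0.8]])]
B = [np.array([[1.0]]), np.array([[1.0]])]
C = [np.array([[1.0], [0.0]])] * 2                           # C_i' D_i = 0
D = [np.array([[0.0], [1.0]])] * 2                           # D_i' D_i = 1
Pt = np.array([[0.9, 0.1], [0.3, 0.7]])

X = solve_care_by_iteration(A, B, C, D, Pt)
residual = max(np.max(np.abs(Xi - Ri))
               for Xi, Ri in zip(X, care_map(X, A, B, C, D, Pt)))
print([Xi.item() for Xi in X], residual)
```

The residual measures how well the returned tuple satisfies the CARE; the mode-dependent gains then follow from the definition of $F_i(X)$.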


Control CARE ⇒ MSS

The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system

\[ x(k+1) = \left(A_{\theta(k)} + B_{\theta(k)}F_{\theta(k)}(X)\right)x(k) \]

is mean square stable.
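Mean square stability of a given closed loop can be checked numerically. The sketch below relies on a standard result that is not stated on this slide (see Costa et al., 2005, Chapter 3): with real closed-loop matrices $\Gamma_i = A_i + B_iF_i(X)$, the system is MSS if and only if the spectral radius of $(P' \otimes I)\,\mathrm{blkdiag}(\Gamma_i \otimes \Gamma_i)$ is below one, where $P = (p_{ij})$. The function names and the scalar data are illustrative assumptions.

```python
import numpy as np

def second_moment_matrix(Gammas, Pt):
    """(Pt' kron I) @ blkdiag(Gamma_i kron Gamma_i) for real Gamma_i."""
    n, modes = Gammas[0].shape[0], len(Gammas)
    blocks = np.zeros((modes * n * n, modes * n * n))
    for i, G in enumerate(Gammas):
        s = slice(i * n * n, (i + 1) * n * n)
        blocks[s, s] = np.kron(G, G)
    return np.kron(Pt.T, np.eye(n * n)) @ blocks

def mss_spectral_radius(Gammas, Pt):
    """Spectral radius of the second-moment matrix; MSS iff it is below 1."""
    eigenvalues = np.linalg.eigvals(second_moment_matrix(Gammas, Pt))
    return float(np.max(np.abs(eigenvalues)))

# Illustrative scalar closed loops (assumed) with i.i.d. mode switching.
Pt = np.array([[0.5, 0.5], [0.5, 0.5]])
rho_stable = mss_spectral_radius([np.array([[1.2]]), np.array([[0.5]])], Pt)
rho_unstable = mss_spectral_radius([np.array([[1.5]]), np.array([[0.5]])], Pt)
print(rho_stable < 1.0, rho_unstable < 1.0)
```

In the first case the closed loop is MSS even though one mode is unstable on its own, because the second moment shrinks on average (0.5 · 1.44 + 0.5 · 0.25 = 0.845); in the second case the average growth factor is 1.25, so the test fails.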


Solvability conditions

The following conditions entail the solvability of the control CARE:

1. $(A, B)$, with $A \in \mathbb{H}^n$ and $B \in \mathbb{H}^{n,m}$, is stabilisable,
2. $(C, A)$, with $C \in \mathbb{H}^{n,n_z}$, is detectable.

Proof. See Costa et al., 2005, Corollary A.16.


End of third section

- We formulated the infinite horizon optimal control problem
- The solution of IHOC produces an MS-stabilising control law
- IHOC is solved by a CARE, which can be formulated as an LMI
- Solvability conditions: $(A, B)$ is stabilisable, $(C, A)$ is detectable


References

1. For an introduction to DP: D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2nd ed., 2000.

2. Chapter 4 of: O.L.V. Costa, M.D. Fragoso and R.P. Marques, Discrete-time Markov Jump Linear Systems, Springer, 2005.

3. Chapter 6 of: M.H.A. Davis and R.B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.

4. M.D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.
