Markov Jump Linear Systems: Optimal Control
Pantelis Sopasakis
IMT Institute for Advanced Studies Lucca
February 5, 2016
Abbreviations
1. MJLS: Markov Jump Linear Systems
2. FHOC: Finite Horizon Optimal Control
3. IHOC: Infinite Horizon Optimal Control
4. CARE: Coupled Algebraic Riccati Equations
1 / 26
Outline
1. LQR (deterministic case) – A quick revision
2. FHOC for MJLS
3. IHOC for MJLS (CARE)
2 / 26
I. Dynamic programming
3 / 26
Finite horizon optimal control
We have a (deterministic) LTI system

x(k + 1) = Ax(k) + Bu(k),

with x(0) = x_0. For a given sequence of input values of length N, that is, π_N = (u(0), u(1), . . . , u(N − 1)), we define the cost function

J_N(π_N; x_0) = ∑_{k=0}^{N−1} ℓ(x(k), u(k)) + ℓ_N(x(N)).

Assume ℓ(x, u) = x′Qx + u′Ru and ℓ_N(x) = x′P_N x,

for some Q ∈ S^n_+, P_N ∈ S^n_{++}, R ∈ S^m_{++}.
4 / 26
Finite horizon optimal control
We need to determine a finite sequence π_N that minimises J_N(π_N; x_0):

J⋆_N(x_0) = min_{π_N} J_N(π_N; x_0)

subject to the system dynamics and x(0) = x_0. DP recursion¹:

V_N(x(N)) = x(N)′P_N x(N),
V_k(x(k)) = min_{u(k)} { ℓ(x(k), u(k)) + V_{k+1}(x(k + 1)) },

for k = N − 1, . . . , 0.
¹See for instance: F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
5 / 26
Why DP?
DP facts:
I We may decompose a complex optimisation problem into simpler subproblems
I Here, we solve for one u(k) at a time
I DP uses Bellman's principle of optimality
I It can be applied in the same way to stochastic optimal control problems
I It is a powerful tool to study the MSS of MJLS and Markovian switching systems (next class)
6 / 26
Finite horizon optimal control
Let π⋆(x_0) be the respective minimiser, with

π⋆(x_0) = (u⋆(0), u⋆(1), . . . , u⋆(N − 1)).

Using DP we derive

V_k(x) = x′P_k x,
u⋆(k) = F(P_{k+1})x(k),

where P_k is determined by the backward recursion

P_k = A′P_{k+1}A + Q + A′P_{k+1}B F(P_{k+1}),

and

F(P) = −(B′PB + R)⁻¹B′PA.
7 / 26
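As a sanity check on the backward recursion above, here is a minimal NumPy sketch; the system matrices are illustrative assumptions, not from the slides. It computes P_k backwards, simulates the closed loop under u⋆(k) = F(P_{k+1})x(k), and confirms that the accumulated cost equals x_0′P_0 x_0.

```python
import numpy as np

# Assumed example data (not from the slides): a discretised double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)           # stage weight on x
R = np.array([[0.1]])   # stage weight on u
PN = np.eye(2)          # terminal weight P_N
N = 20

def F(P):
    # F(P) = -(B'PB + R)^{-1} B'PA
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

# Backward recursion: P_k = A'P_{k+1}A + Q + A'P_{k+1}B F(P_{k+1})
P = [None] * (N + 1)
P[N] = PN
for k in range(N - 1, -1, -1):
    P[k] = A.T @ P[k + 1] @ A + Q + A.T @ P[k + 1] @ B @ F(P[k + 1])

# Forward simulation under u*(k) = F(P_{k+1}) x(k): the accumulated cost
# matches the value function x0' P_0 x0.
x0 = np.array([[1.0], [0.5]])
x, cost = x0.copy(), 0.0
for k in range(N):
    u = F(P[k + 1]) @ x
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ PN @ x)
print(cost - float(x0.T @ P[0] @ x0))  # ≈ 0
```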
Infinite horizon optimal control
What happens as N → ∞? Let us define

J_∞(π; x_0) = ∑_{k=0}^{∞} ℓ(x(k), u(k)),

where π is a sequence of inputs {u(k)}_{k∈N}. For the series to converge it is of course required that

‖x(k)‖₂, ‖u(k)‖₂ → 0, as k → ∞.
8 / 26
Infinite horizon optimal control
We can show that – under certain conditions² – the IHOC problem is solvable and

J⋆_∞(x) = x′P_∞x,
u⋆(k) = F(P_∞)x(k),

where P_∞ is a fixed point of the DP recursion of the FHOC problem (Algebraic Riccati Equation), that is,

P_∞ = A′P_∞A + Q − A′P_∞B(B′P_∞B + R)⁻¹B′P_∞A.

²Provided that (A, B) is stabilisable and (Q^{1/2}, A) is detectable. Then the matrix A + BF(P_∞) is stable. Proof: see D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
9 / 26
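One way to compute P_∞ in practice is to iterate the finite-horizon Riccati recursion until it stops changing. A short NumPy sketch, with the same assumed example system as before:

```python
import numpy as np

# Assumed example data: a discretised double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Iterate P <- A'PA + Q - A'PB(B'PB + R)^{-1}B'PA to a fixed point.
P = np.eye(2)
for _ in range(10_000):
    Pn = A.T @ P @ A + Q \
         - A.T @ P @ B @ np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    if np.max(np.abs(Pn - P)) < 1e-12:
        P = Pn
        break
    P = Pn

# The closed loop A + B F(P_inf) should be stable (spectral radius < 1).
Finf = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A + B @ Finf)))
print(rho)
```

Since (A, B) is controllable here and Q > 0, the solvability conditions in the footnote hold and the iteration converges.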
End of first section
I Revision of FHOC and DP
I We solved the LQR problem
10 / 26
II. FHOC for MJLS
11 / 26
FHOC for MJLS
Consider a MJLS

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k) + M_{θ(k)}v(k),

where v(k) is a noise term, with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We define the following cost functional

J(θ_0, x_0, π_N) := ∑_{k=0}^{N−1} E[‖z(k)‖²] + E[x(N)′V_{θ(N)}x(N)],

where π_N is a policy π_N = (u(0), . . . , u(N − 1)) with

u(k) = µ_k(x(k), θ(k)).
12 / 26
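To make the setup concrete, here is a short simulation sketch of such a system. The matrices, the transition matrix and the zero-input choice are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed two-mode example: A_i, noise gains M_i, and transition matrix
# Ptrans with entries p_ij = P(θ(k+1) = j | θ(k) = i).
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
     np.array([[0.5, 0.0], [0.2, 0.7]])]
M = [0.10 * np.eye(2), 0.05 * np.eye(2)]
Ptrans = np.array([[0.9, 0.1],
                   [0.3, 0.7]])

# Simulate x(k+1) = A_{θ(k)} x(k) + M_{θ(k)} v(k) (zero input for brevity).
x, theta = np.array([1.0, -1.0]), 0
traj = [x.copy()]
for k in range(50):
    v = rng.standard_normal(2)                # noise v(k)
    x = A[theta] @ x + M[theta] @ v
    theta = rng.choice(2, p=Ptrans[theta])    # next mode drawn from row θ(k)
    traj.append(x.copy())
print(len(traj))  # 51: states x(0) through x(50)
```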
FHOC assumptions
Let G_k be the σ-algebra generated by {x(t), θ(t); t = 0, . . . , k}.

Assumptions on v:

1. v(k) are random variables with E[v(k)v(k)′ 1_{θ(k)=i}] = Ξ_i(k)
2. For every f, g, f(v(k)) and g(θ(k)) are independent w.r.t. G_k
3. E[v(0)x(0)′ 1_{θ(0)=i}] = 0

Assumptions on z(k):

1. C_i(k)′D_i(k) = 0 – no penalties of the form x(k)′S_{θ(k)}u(k)
2. D_i(k)′D_i(k) > 0
13 / 26
Control laws and policies for MJLS
A measurable function

µ : ℝ^n × N → ℝ^m

is called a control law.

A (finite or infinite) sequence of control laws

π = {µ_0, µ_1, . . .},

where µ_k is G_k-measurable, is called a control policy.
14 / 26
FHOC – Dynamic programming recursion
To perform DP we introduce the cost functional

J_κ(θ(κ), x(κ), u_κ) := ∑_{k=κ}^{N−1} E[‖z(k)‖² | G_κ] + E[x(N)′V_{θ(N)}x(N) | G_κ],

for κ ∈ {0, . . . , N − 1}, where u_κ = (u(κ), . . . , u(N − 1)) and each u(k) is G_k-measurable. The optimal value of J_κ(θ(κ), x(κ), u_κ) is then given by

J⋆_κ(i, x) = x′X_i(κ)x + α(κ),

where X_i is given by a Riccati-like equation.
15 / 26
FHOC – Dynamic programming recursion
We have J⋆_κ(i, x) = x′X_i(κ)x + α(κ), where

X_i(N) = V_i,
X_i(k) = A_i′E_i(X(k + 1))A_i + A_i′E_i(X(k + 1))B_i F_i(X(k + 1)) + C_i′C_i,

with E_i(X) := ∑_j p_{ij} X_j, R_i(X) := D_i′D_i + B_i′E_i(X)B_i and

F_i(X) := −R_i(X)⁻¹ B_i′E_i(X)A_i.

The respective optimisers are given by

u⋆(k) = F_{θ(k)}(X(k + 1))x(k).
16 / 26
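The coupled backward recursion can be sketched in a few lines of NumPy; the two-mode data below are assumptions for illustration, not from the slides:

```python
import numpy as np

# Assumed two-mode example data.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.8, 0.0], [0.1, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
C = [np.eye(2), np.eye(2)]                   # so C_i'C_i = I
D = [np.array([[0.3]]), np.array([[0.3]])]   # D_i'D_i > 0
V = [np.eye(2), np.eye(2)]                   # terminal weights V_i
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
N, modes = 15, range(2)

def E(i, X):
    # E_i(X) = sum_j p_ij X_j
    return sum(Ptrans[i, j] * X[j] for j in modes)

def Fgain(i, X):
    # F_i(X) = -(D_i'D_i + B_i'E_i(X)B_i)^{-1} B_i'E_i(X)A_i
    Ei = E(i, X)
    return -np.linalg.solve(D[i].T @ D[i] + B[i].T @ Ei @ B[i],
                            B[i].T @ Ei @ A[i])

# Backward recursion from X_i(N) = V_i down to X_i(0); at each step the
# whole list X plays the role of X(k+1).
X = list(V)
for k in range(N - 1, -1, -1):
    X = [A[i].T @ E(i, X) @ A[i]
         + A[i].T @ E(i, X) @ B[i] @ Fgain(i, X)
         + C[i].T @ C[i] for i in modes]
```

Each X_i(k) stays symmetric positive definite along the recursion, which the test below spot-checks at k = 0.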
End of second section
I Formulation of FHOC for MJLS, considering also an additive noise term
I Control policies and control laws
I Solution of FHOC: mode-dependent linear control laws

u⋆(k) = µ_k(x(k), θ(k)) = F_{θ(k)}(X(k + 1))x(k).
17 / 26
III. IHOC for MJLS and MSS
18 / 26
IHOC for MJLS
Consider a MJLS without additive noise

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k),

with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We are now looking for sequences π = {u(k)}_{k∈N} in

U = { π | u(k) is G_k-measurable for all k ∈ N, and lim_{k→∞} E[‖x(k)‖²] = 0 }.
19 / 26
IHOC for MJLS
With π ∈ U, the following is a well-defined infinite horizon cost function

J(θ_0, x_0, π) := ∑_{k=0}^{∞} E[‖z(k)‖²],

and the IHOC problem amounts to determining

J⋆(θ_0, x_0) := inf_{π∈U} J(θ_0, x_0, π).

We define π⋆ to be the respective optimiser, with elements

u⋆(k) = ψ_k(θ(k), x(k)).
20 / 26
Objectives
1. Under what conditions does the IHOC problem have a solution?
2. How can this solution be determined?
3. Can we derive a MS-stabilising controller by solving the IHOCP?
21 / 26
Control CARE
Assume that there is X ∈ H^n_+ satisfying the control CARE:

X_i = A_i′E_i(X)A_i − A_i′E_i(X)B_i(D_i′D_i + B_i′E_i(X)B_i)⁻¹B_i′E_i(X)A_i + C_i′C_i,

and let

F_i(X) := −(D_i′D_i + B_i′E_i(X)B_i)⁻¹B_i′E_i(X)A_i.

The IHOC problem solution is given by

u⋆(k) = F_{θ(k)}(X)x(k)

and the value function is

J⋆(θ_0, x_0) = E[x_0′X_{θ_0}x_0].
22 / 26
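In the same spirit as the finite-horizon case, one can look for a solution of the control CARE by iterating the coupled Riccati map to a fixed point. A sketch with assumed two-mode data (each (A_i, B_i) controllable and C_i′C_i = I, so the solvability conditions below hold):

```python
import numpy as np

# Assumed two-mode example data.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.8, 0.0], [0.1, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
C = [np.eye(2), np.eye(2)]
D = [np.array([[0.3]]), np.array([[0.3]])]
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
modes = range(2)

def E(i, X):
    return sum(Ptrans[i, j] * X[j] for j in modes)

def care_map(X):
    # Right-hand side of the control CARE, evaluated mode by mode.
    out = []
    for i in modes:
        Ei = E(i, X)
        Ri = D[i].T @ D[i] + B[i].T @ Ei @ B[i]
        out.append(A[i].T @ Ei @ A[i]
                   - A[i].T @ Ei @ B[i]
                     @ np.linalg.solve(Ri, B[i].T @ Ei @ A[i])
                   + C[i].T @ C[i])
    return out

# Fixed-point iteration X <- care_map(X).
X = [np.eye(2), np.eye(2)]
for _ in range(20_000):
    Xn = care_map(X)
    done = max(np.max(np.abs(Xn[i] - X[i])) for i in modes) < 1e-12
    X = Xn
    if done:
        break

residual = max(np.max(np.abs(care_map(X)[i] - X[i])) for i in modes)
print(residual)  # small: X satisfies the CARE up to numerical precision
```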
Control CARE ⇒ MSS
The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system

x(k + 1) = (A_{θ(k)} + B_{θ(k)}F_{θ(k)}(X))x(k)

is mean square stable.
23 / 26
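Mean square stability of such a closed loop can be checked numerically: a known result (Costa et al., 2005) is that x(k + 1) = Ā_{θ(k)}x(k) is MSS iff the linear operator X ↦ (∑_i p_{ij} Ā_i X_i Ā_i′)_j has spectral radius less than 1. The closed-loop matrices below are stand-in assumptions, not gains computed from a CARE:

```python
import numpy as np

# Assumed closed-loop matrices Abar_i = A_i + B_i F_i(X) (stand-ins).
Abar = [np.array([[0.5, 0.1], [0.0, 0.6]]),
        np.array([[0.7, 0.0], [0.2, 0.4]])]
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
n, nmodes = 2, 2

# Matrix of the operator X -> (sum_i p_ij Abar_i X_i Abar_i')_j, using
# vec(A X A') = kron(A, A) vec(X): block (j, i) is p_ij * kron(Abar_i, Abar_i).
L = np.zeros((nmodes * n * n, nmodes * n * n))
for j in range(nmodes):
    for i in range(nmodes):
        L[j*n*n:(j+1)*n*n, i*n*n:(i+1)*n*n] = \
            Ptrans[i, j] * np.kron(Abar[i], Abar[i])

rho = max(abs(np.linalg.eigvals(L)))
print(rho < 1.0)  # mean square stable
```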
Solvability conditions
The following conditions entail the solvability of the control CARE:

1. (A, B) – with A ∈ H^n and B ∈ H^{n,m} – is stabilisable,
2. (C, A) – with C ∈ H^{n,nz} – is detectable.

Proof: see Costa et al., 2005, Corollary A.16.
24 / 26
End of third section
I We formulated the infinite horizon optimal control problem
I The solution of IHOC produces an MS-stabilising control law
I IHOC is solved by a CARE which can be formulated as an LMI
I Solvability conditions: (A,B) is stabilisable, (C,A) is detectable
25 / 26
References
1. For an introduction to DP: D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2nd ed., 2000.
2. Chapter 4 of: O.L.V. Costa, M.D. Fragoso and R.P. Marques, Discrete-time Markov Jump Linear Systems, Springer, 2005.
3. Chapter 6 of: M.H.A. Davis and R.B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.
4. M.D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.
26 / 26