Markov Jump Linear Systems: Optimal Control
Pantelis Sopasakis
IMT Institute for Advanced Studies Lucca
February 5, 2016
Abbreviations
1. MJLS: Markov Jump Linear Systems
2. FHOC: Finite Horizon Optimal Control
3. IHOC: Infinite Horizon Optimal Control
4. CARE: Coupled Algebraic Riccati Equations
1 / 26
Outline
1. LQR (deterministic case) – A quick revision
2. FHOC for MJLS
3. IHOC for MJLS (CARE)
2 / 26
I. Dynamic programming
3 / 26
Finite horizon optimal control
We have a (deterministic) LTI system

x(k + 1) = Ax(k) + Bu(k),

with x(0) = x_0. For a given sequence of input values of length N, that is, π_N = (u(0), u(1), . . . , u(N − 1)), we define the cost function

J_N(π_N; x_0) = ∑_{k=0}^{N−1} ℓ(x(k), u(k)) + ℓ_N(x(N)).

Assume ℓ(x, u) = x′Qx + u′Ru and ℓ_N(x) = x′P_N x,

for some Q ∈ S^n_+, P_N ∈ S^n_{++}, R ∈ S^m_{++}.
4 / 26
Finite horizon optimal control
We need to determine a finite sequence π_N that minimises J_N(π_N; x_0):

J⋆_N(x_0) = min_{π_N} J_N(π_N; x_0)

subject to the system dynamics and x(0) = x_0. DP recursion¹:

V_N(x(N)) = x(N)′P_N x(N),
V_k(x(k)) = min_{u(k)} { ℓ(x(k), u(k)) + V_{k+1}(x(k + 1)) },

for k = N − 1, . . . , 0.
¹See for instance: F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
5 / 26
Why DP?
DP facts:
I We may decompose a complex optimisation problem into simpler subproblems
I Here, we solve for one u(k) at a time
I DP uses Bellman's principle of optimality
I It can be applied in the same way to stochastic optimal control problems
I It is a powerful tool to study the MSS of MJLS and Markovian switching systems (next class)
6 / 26
Finite horizon optimal control
Let π⋆(x_0) be the respective minimiser, with

π⋆(x_0) = (u⋆(0), u⋆(1), . . . , u⋆(N − 1)).

Using DP we derive

V_k(x) = x′P_k x,
u⋆(k) = F(P_{k+1})x(k),

where P_k is determined by the backward recursion

P_k = A′P_{k+1}A + Q + A′P_{k+1}B F(P_{k+1}),

and

F(P) = −(B′PB + R)⁻¹B′PA.
7 / 26
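As a sanity check on the backward recursion above, here is a minimal NumPy sketch; the system matrices are illustrative assumptions, not from the slides. It computes P_k backwards, simulates the closed loop under u⋆(k) = F(P_{k+1})x(k), and confirms that the accumulated cost equals x_0′P_0 x_0.

```python
import numpy as np

# Assumed example data (not from the slides): a discretised double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)           # stage weight on x
R = np.array([[0.1]])   # stage weight on u
PN = np.eye(2)          # terminal weight P_N
N = 20

def F(P):
    # F(P) = -(B'PB + R)^{-1} B'PA
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

# Backward recursion: P_k = A'P_{k+1}A + Q + A'P_{k+1}B F(P_{k+1})
P = [None] * (N + 1)
P[N] = PN
for k in range(N - 1, -1, -1):
    P[k] = A.T @ P[k + 1] @ A + Q + A.T @ P[k + 1] @ B @ F(P[k + 1])

# Forward simulation under u*(k) = F(P_{k+1}) x(k): the accumulated cost
# matches the value function x0' P_0 x0.
x0 = np.array([[1.0], [0.5]])
x, cost = x0.copy(), 0.0
for k in range(N):
    u = F(P[k + 1]) @ x
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ PN @ x)
print(cost - float(x0.T @ P[0] @ x0))  # ≈ 0
```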
Infinite horizon optimal control
What happens as N → ∞? Let us define

J_∞(π; x_0) = ∑_{k=0}^{∞} ℓ(x(k), u(k)),

where π is a sequence of inputs {u(k)}_{k∈N}. For the series to converge it is of course required that

‖x(k)‖₂, ‖u(k)‖₂ → 0, as k → ∞.
8 / 26
Infinite horizon optimal control
We can show that – under certain conditions² – the IHOC problem is solvable and

J⋆_∞(x) = x′P_∞x,
u⋆(k) = F(P_∞)x(k),

where P_∞ is a fixed point of the DP recursion of the FHOC problem (Algebraic Riccati Equation), that is,

P_∞ = A′P_∞A + Q − A′P_∞B(B′P_∞B + R)⁻¹B′P_∞A.

²Provided that (A, B) is stabilisable and (Q^{1/2}, A) is detectable. Then the matrix A + BF(P_∞) is stable. Proof: see D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
9 / 26
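One way to compute P_∞ in practice is to iterate the finite-horizon Riccati recursion until it stops changing. A short NumPy sketch, with the same assumed example system as before:

```python
import numpy as np

# Assumed example data: a discretised double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Iterate P <- A'PA + Q - A'PB(B'PB + R)^{-1}B'PA to a fixed point.
P = np.eye(2)
for _ in range(10_000):
    Pn = A.T @ P @ A + Q \
         - A.T @ P @ B @ np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    if np.max(np.abs(Pn - P)) < 1e-12:
        P = Pn
        break
    P = Pn

# The closed loop A + B F(P_inf) should be stable (spectral radius < 1).
Finf = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A + B @ Finf)))
print(rho)
```

Since (A, B) is controllable here and Q > 0, the solvability conditions in the footnote hold and the iteration converges.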
End of first section
I Revision of FHOC and DP
I We solved the LQR problem
10 / 26
II. FHOC for MJLS
11 / 26
FHOC for MJLS
Consider a MJLS

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k) + M_{θ(k)}v(k),

where v(k) is a noise term, with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We define the following cost functional

J(θ_0, x_0, π_N) := ∑_{k=0}^{N−1} E[‖z(k)‖²] + E[x(N)′V_{θ(N)}x(N)],

where π_N is a policy π_N = (u(0), . . . , u(N − 1)) with

u(k) = µ_k(x(k), θ(k)).
12 / 26
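To make the setup concrete, here is a short simulation sketch of such a system. The matrices, the transition matrix and the zero-input choice are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed two-mode example: A_i, noise gains M_i, and transition matrix
# Ptrans with entries p_ij = P(θ(k+1) = j | θ(k) = i).
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
     np.array([[0.5, 0.0], [0.2, 0.7]])]
M = [0.10 * np.eye(2), 0.05 * np.eye(2)]
Ptrans = np.array([[0.9, 0.1],
                   [0.3, 0.7]])

# Simulate x(k+1) = A_{θ(k)} x(k) + M_{θ(k)} v(k) (zero input for brevity).
x, theta = np.array([1.0, -1.0]), 0
traj = [x.copy()]
for k in range(50):
    v = rng.standard_normal(2)                # noise v(k)
    x = A[theta] @ x + M[theta] @ v
    theta = rng.choice(2, p=Ptrans[theta])    # next mode drawn from row θ(k)
    traj.append(x.copy())
print(len(traj))  # 51: states x(0) through x(50)
```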
FHOC assumptions
Let G_k be the σ-algebra generated by {x(t), θ(t); t = 0, . . . , k}.

Assumptions on v:

1. v(k) are random variables with E[v(k)v(k)′ 1_{θ(k)=i}] = Ξ_i(k)
2. For every f, g, f(v(k)) and g(θ(k)) are independent w.r.t. G_k
3. E[v(0)x(0)′ 1_{θ(0)=i}] = 0

Assumptions on z(k):

1. C_i(k)′D_i(k) = 0 – no penalties of the form x(k)′S_{θ(k)}u(k)
2. D_i(k)′D_i(k) > 0
13 / 26
Control laws and policies for MJLS
A measurable function

µ : ℝ^n × N → ℝ^m

is called a control law.

A (finite or infinite) sequence of control laws

π = {µ_0, µ_1, . . .},

where µ_k is G_k-measurable, is called a control policy.
14 / 26
FHOC – Dynamic programming recursion
To perform DP we introduce the cost functional

J_κ(θ(κ), x(κ), u_κ) := ∑_{k=κ}^{N−1} E[‖z(k)‖² | G_κ] + E[x(N)′V_{θ(N)}x(N) | G_κ],

for κ ∈ {0, . . . , N − 1}, where u_κ = (u(κ), . . . , u(N − 1)) and each u(k) is G_k-measurable. The optimal value of J_κ(θ(κ), x(κ), u_κ) is then given by

J⋆_κ(i, x) = x′X_i(κ)x + α(κ),

where X_i is given by a Riccati-like equation.
15 / 26
FHOC – Dynamic programming recursion
We have J⋆_κ(i, x) = x′X_i(κ)x + α(κ), where

X_i(N) = V_i,
X_i(k) = A_i′E_i(X(k + 1))A_i + A_i′E_i(X(k + 1))B_i F_i(X(k + 1)) + C_i′C_i,

with E_i(X) := ∑_j p_{ij} X_j, R_i(X) := D_i′D_i + B_i′E_i(X)B_i and

F_i(X) := −R_i(X)⁻¹ B_i′E_i(X)A_i.

The respective optimisers are given by

u⋆(k) = F_{θ(k)}(X(k + 1))x(k).
16 / 26
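The coupled backward recursion can be sketched in a few lines of NumPy; the two-mode data below are assumptions for illustration, not from the slides:

```python
import numpy as np

# Assumed two-mode example data.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.8, 0.0], [0.1, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
C = [np.eye(2), np.eye(2)]                   # so C_i'C_i = I
D = [np.array([[0.3]]), np.array([[0.3]])]   # D_i'D_i > 0
V = [np.eye(2), np.eye(2)]                   # terminal weights V_i
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
N, modes = 15, range(2)

def E(i, X):
    # E_i(X) = sum_j p_ij X_j
    return sum(Ptrans[i, j] * X[j] for j in modes)

def Fgain(i, X):
    # F_i(X) = -(D_i'D_i + B_i'E_i(X)B_i)^{-1} B_i'E_i(X)A_i
    Ei = E(i, X)
    return -np.linalg.solve(D[i].T @ D[i] + B[i].T @ Ei @ B[i],
                            B[i].T @ Ei @ A[i])

# Backward recursion from X_i(N) = V_i down to X_i(0); at each step the
# whole list X plays the role of X(k+1).
X = list(V)
for k in range(N - 1, -1, -1):
    X = [A[i].T @ E(i, X) @ A[i]
         + A[i].T @ E(i, X) @ B[i] @ Fgain(i, X)
         + C[i].T @ C[i] for i in modes]
```

Each X_i(k) stays symmetric positive definite along the recursion, which the test below spot-checks at k = 0.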
End of second section
I Formulation of FHOC for MJLS, considering also an additive noise term
I Control policies and control laws
I Solution of FHOC: mode-dependent linear control laws

u⋆(k) = µ_k(x(k), θ(k)) = F_{θ(k)}(X(k + 1))x(k).
17 / 26
III. IHOC for MJLS and MSS
18 / 26
IHOC for MJLS
Consider a MJLS without additive noise

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k),

with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We are now looking for sequences π = {u(k)}_{k∈N} in

U = { π | u(k) is G_k-measurable for all k ∈ N, and lim_{k→∞} E[‖x(k)‖²] = 0 }.
19 / 26
IHOC for MJLS
With π ∈ U, the following is a well-defined infinite horizon cost function

J(θ_0, x_0, π) := ∑_{k=0}^{∞} E[‖z(k)‖²],

and the IHOC problem amounts to determining

J⋆(θ_0, x_0) := inf_{π∈U} J(θ_0, x_0, π).

We define π⋆ to be the respective optimiser, with elements

u⋆(k) = ψ_k(θ(k), x(k)).
20 / 26
Objectives
1. Under what conditions does the IHOC problem have a solution?
2. How can this solution be determined?
3. Can we derive a MS-stabilising controller by solving the IHOCP?
21 / 26
Control CARE
Assume that there is X ∈ H^n_+ satisfying the control CARE:

X_i = A_i′E_i(X)A_i − A_i′E_i(X)B_i(D_i′D_i + B_i′E_i(X)B_i)⁻¹B_i′E_i(X)A_i + C_i′C_i,

and let

F_i(X) := −(D_i′D_i + B_i′E_i(X)B_i)⁻¹B_i′E_i(X)A_i.

The IHOC problem solution is given by

u⋆(k) = F_{θ(k)}(X)x(k)

and the value function is

J⋆(θ_0, x_0) = E[x_0′X_{θ_0}x_0].
22 / 26
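In the same spirit as the finite-horizon case, one can look for a solution of the control CARE by iterating the coupled Riccati map to a fixed point. A sketch with assumed two-mode data (each (A_i, B_i) controllable and C_i′C_i = I, so the solvability conditions below hold):

```python
import numpy as np

# Assumed two-mode example data.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.8, 0.0], [0.1, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
C = [np.eye(2), np.eye(2)]
D = [np.array([[0.3]]), np.array([[0.3]])]
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
modes = range(2)

def E(i, X):
    return sum(Ptrans[i, j] * X[j] for j in modes)

def care_map(X):
    # Right-hand side of the control CARE, evaluated mode by mode.
    out = []
    for i in modes:
        Ei = E(i, X)
        Ri = D[i].T @ D[i] + B[i].T @ Ei @ B[i]
        out.append(A[i].T @ Ei @ A[i]
                   - A[i].T @ Ei @ B[i]
                     @ np.linalg.solve(Ri, B[i].T @ Ei @ A[i])
                   + C[i].T @ C[i])
    return out

# Fixed-point iteration X <- care_map(X).
X = [np.eye(2), np.eye(2)]
for _ in range(20_000):
    Xn = care_map(X)
    done = max(np.max(np.abs(Xn[i] - X[i])) for i in modes) < 1e-12
    X = Xn
    if done:
        break

residual = max(np.max(np.abs(care_map(X)[i] - X[i])) for i in modes)
print(residual)  # small: X satisfies the CARE up to numerical precision
```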
Control CARE ⇒ MSS
The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system

x(k + 1) = (A_{θ(k)} + B_{θ(k)}F_{θ(k)}(X))x(k)

is mean square stable.
23 / 26
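Mean square stability of such a closed loop can be checked numerically: a known result (Costa et al., 2005) is that x(k + 1) = Ā_{θ(k)}x(k) is MSS iff the linear operator X ↦ (∑_i p_{ij} Ā_i X_i Ā_i′)_j has spectral radius less than 1. The closed-loop matrices below are stand-in assumptions, not gains computed from a CARE:

```python
import numpy as np

# Assumed closed-loop matrices Abar_i = A_i + B_i F_i(X) (stand-ins).
Abar = [np.array([[0.5, 0.1], [0.0, 0.6]]),
        np.array([[0.7, 0.0], [0.2, 0.4]])]
Ptrans = np.array([[0.9, 0.1], [0.3, 0.7]])
n, nmodes = 2, 2

# Matrix of the operator X -> (sum_i p_ij Abar_i X_i Abar_i')_j, using
# vec(A X A') = kron(A, A) vec(X): block (j, i) is p_ij * kron(Abar_i, Abar_i).
L = np.zeros((nmodes * n * n, nmodes * n * n))
for j in range(nmodes):
    for i in range(nmodes):
        L[j*n*n:(j+1)*n*n, i*n*n:(i+1)*n*n] = \
            Ptrans[i, j] * np.kron(Abar[i], Abar[i])

rho = max(abs(np.linalg.eigvals(L)))
print(rho < 1.0)  # mean square stable
```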
Solvability conditions
The following conditions entail the solvability of the control CARE:

1. (A, B) – with A ∈ H^n and B ∈ H^{n,m} – is stabilisable,
2. (C, A) – with C ∈ H^{n,nz} – is detectable.

Proof: see Costa et al., 2005, Corollary A.16.
24 / 26
End of third section
I We formulated the infinite horizon optimal control problem
I The solution of IHOC produces an MS-stabilising control law
I IHOC is solved by a CARE which can be formulated as an LMI
I Solvability conditions: (A,B) is stabilisable, (C,A) is detectable
25 / 26
References
1. For an introduction to DP: D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2nd ed., 2000.
2. Chapter 4 of: O.L.V. Costa, M.D. Fragoso and R.P. Marques, Discrete-time Markov Jump Linear Systems, Springer, 2005.
3. Chapter 6 of: M.H.A. Davis and R.B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.
4. M.D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.
26 / 26