Lecture 17: Maximum Principle
Optimal Control Problem
For the control system ẋ(t) = f(x(t), u(t), t), x(0) = x0, solve

min_{u(t), t ∈ [0, tf]} J(u) = ∫_0^{tf} F(x, u, t) dt + g(x(tf))

subject to ẋ = f(x, u, t), ∀t ∈ [0, tf]; x(0) = x0
• Solve for optimal control u∗ and optimal cost J∗
• Running cost F (x , u, t) ≥ 0 and terminal cost g(·) ≥ 0
• Free terminal time problem: tf may also need to be optimized
• Possible additional constraints:
  • State constraint x(t) ∈ Ωx(t) and control constraint u(t) ∈ Ωu(t)
  • Terminal state constraint x(tf) ∈ Sf
Examples
Example 1 (LQR)
• Dynamics ẋ = Ax + Bu, with x(0) = x0, x(tf) ∈ Sf
• Cost function ∫_0^{tf} (x^T Q x + u^T R u) dt
Example 2 (Brachistochrone Problem) Find the slide with the fastest descent time:

τ_descent = (1/√(2g)) ∫_0^{y0} √((1 + (dx/dy)^2) / y) dy

• Dynamics dx/dy = u, with fixed x(0) = 0, x(y0) = x0
• Cost function (1/√(2g)) ∫_0^{y0} √((1 + u^2) / y) dy
Dynamic Programming Approach
Value function V(x, t) is the optimal cost over the time interval [t, tf] starting from x(t) = x:

V(x, t) := min_{u(τ), τ ∈ [t, tf]} ∫_t^{tf} F(x, u, τ) dτ + g(x(tf))

subject to ẋ = f(x, u, τ), ∀τ ∈ [t, tf]; x(t) = x
Optimality Principle: assuming u(·) ≡ v in [t, t + δ] for some small δ > 0,

V(x, t) = min_v [ ∫_t^{t+δ} F(x, v, τ) dτ + V(x(t + δ), t + δ) ] + o(δ)

• x(t + δ) = x(t) + f(x(t), v, t) δ + o(δ)
• Taylor series expansion of V (·, ·) at (x(t), t)
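The optimality principle above can be checked numerically by backward dynamic programming on a grid. The sketch below (assuming NumPy, with an illustrative scalar problem that is not from the slides) uses ẋ = u, F = x² + u², tf = 1, g ≡ 0, whose exact value function V(x, t) = tanh(tf − t)·x² follows from the Riccati equation derived on a later slide:

```python
import numpy as np

# Backward dynamic programming on a grid for an illustrative scalar problem
# (not from the slides): dx/dt = u, F = x^2 + u^2, tf = 1, g = 0.
# Its exact value function is V(x, t) = tanh(tf - t) * x^2, from the
# Riccati equation -p' = 1 - p^2, p(tf) = 0.
dt = 0.002
xs = np.arange(-4.0, 4.0 + 1e-9, 0.01)   # state grid
us = np.arange(-3.0, 3.0 + 1e-9, 0.05)   # candidate controls v
V = np.zeros_like(xs)                     # terminal condition V(x, tf) = 0
for _ in range(int(1.0 / dt)):            # march backward from tf to 0
    xnext = np.clip(xs[:, None] + us[None, :] * dt, xs[0], xs[-1])
    Vnext = np.interp(xnext.ravel(), xs, V).reshape(xnext.shape)
    stage = (xs[:, None] ** 2 + us[None, :] ** 2) * dt
    V = np.min(stage + Vnext, axis=1)     # Bellman update over v
i = np.argmin(np.abs(xs - 1.0))           # grid index closest to x = 1
print(V[i])                               # close to tanh(1) ~ 0.762
```

The grid value at x = 1, t = 0 comes out close to tanh(1) x², confirming the backward recursion V(x, t) = min_v [F δ + V(x + fδ, t + δ)] up to discretization error.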
Hamilton-Jacobi-Bellman equation
V (x , t) satisfies the Hamilton-Jacobi-Bellman (HJB) equation:
min_u [ F(x, u, t) + ∇x V(x, t) · f(x, u, t) ] = −∇t V(x, t)
with boundary condition: V (·, tf ) = g(·)
• A partial differential equation, typically solved backward in time
• Optimal control u∗ is the one achieving minimum above
• V may not be differentiable everywhere (viscosity solution)
Linear Quadratic Regulation
Continuous-time LQR problem:
min_{u(t), t ∈ [0, tf]} ∫_0^{tf} (x^T Q x + u^T R u) dt + x(tf)^T Qf x(tf)

subject to ẋ = Ax + Bu, x(0) = x0
• Value function is quadratic: V(x, t) = x^T P(t) x, with P(tf) = Qf
• HJB equation implies P(·) satisfies the Riccati differential equation
min_u [ x^T Q x + u^T R u + 2 x^T P (Ax + Bu) ] = −x^T Ṗ x

⇒ min_u [x; u]^T [ Q + P A + A^T P    P B
                    B^T P              R   ] [x; u] = −x^T Ṗ x

⇒ −Ṗ = Q + P A + A^T P − P B R^{-1} B^T P
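The Riccati differential equation can be integrated backward in time from P(tf) = Qf. A minimal NumPy sketch, with assumed system matrices (a double integrator, not from the slides); over a long horizon P(0) approaches the algebraic Riccati equation solution, which the residual check at the end confirms:

```python
import numpy as np

# Integrate the Riccati ODE  -dP/dt = Q + P A + A^T P - P B R^{-1} B^T P
# backward from P(tf) = Qf, for an illustrative double integrator (values
# are assumptions, not from the slides).  Over a long horizon, P(0)
# approaches the algebraic Riccati equation (ARE) solution.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.array([[1.0]])        # R = I, so R^{-1} = I
tf, dt = 20.0, 1e-3

P = np.zeros((2, 2))            # P(tf) = Qf = 0 (no terminal cost)
for _ in range(int(tf / dt)):
    # One Euler step backward in time: P(t - dt) = P(t) + dt * (-dP/dt)
    P = P + dt * (Q + P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P)

residual = Q + P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P
print(np.linalg.norm(residual))  # tiny: P(0) has converged to the ARE solution
```

The Euler recursion and the ARE share the same fixed point, so the residual at t = 0 is essentially zero once the backward integration has converged.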
Pontryagin Maximum Principle
Suppose u∗(·), x∗(·) are a solution to the optimal control problem
min_u ∫_0^{tf} F(x, u, t) dt + g(x(tf))

s.t. ẋ = f(x, u, t), t ∈ [0, tf]; x(0) = x0

Then there exists a co-state λ∗(·) ∈ R^n such that

ẋ∗ = ∇λ H(x∗, u∗, λ∗, t)
λ̇∗ = −∇x H(x∗, u∗, λ∗, t) (adjoint equation)

where H(x, u, λ, t) := F(x, u, t) + λ^T f(x, u, t) is the Hamiltonian
Optimal control u∗ satisfies ∇uH(x∗, u, λ∗, t) = 0, or more generally
H(x∗, u∗, λ∗, t) = inf_u H(x∗, u, λ∗, t)
L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, Interscience, 1962.
Connection with Dynamic Programming
• Via dynamic programming, optimal control is decided from

u∗ = arg min_u [ F(x∗, u, t) + ∇x V(x∗, t) · f(x∗, u, t) ]

• Via maximum principle, optimal control is decided from

u∗ = arg min_u H(x∗, u, λ∗, t) = arg min_u [ F(x∗, u, t) + λ∗ · f(x∗, u, t) ]

• Indeed, the co-state is the gradient of the value function w.r.t. the state:

λ∗(t) = ∇x V(x∗(t), t), ∀t ∈ [t0, tf]
Calculus of Variations
Perturb the control u(·) to u(·) + δu(·).

The state is perturbed from x(·) to x(·) + δx(·), with, to first order,

δẋ = ∇x f(x, u, t) δx + ∇u f(x, u, t) δu

(linearized dynamics around x(·))

Optimality condition (assume g ≡ 0): the constrained optimization problem

min_{δu, δx} δJ = ∫_0^{tf} [ ∇x F(x, u, t) δx + ∇u F(x, u, t) δu ] dt

s.t. δẋ = ∇x f(x, u, t) δx + ∇u f(x, u, t) δu, ∀t ∈ [0, tf]

achieves its minimum value of 0 at δu = 0 and δx = 0
Unconstrained Optimization
By introducing the Lagrange multiplier function λ(·), the constrained optimization problem is converted to an unconstrained one with Lagrangian

∫_0^{tf} [ (∇x H(x, u, λ, t) + λ̇^T) δx + ∇u H(x, u, λ, t) δu ] dt − λ^T(tf) δx(tf)

where H = F + λ^T f is the Hamiltonian defined before.

To achieve the minimum at δx = 0 and δu = 0, the coefficients of δx and δu should be zero, which implies the maximum principle.

Further, we require λ^T(tf) δx(tf) = 0:

1. If x(tf) is fixed, λ(tf) is unconstrained
2. If x(tf) is unconstrained, λ(tf) = 0
3. If x(tf) ∈ Sf, then λ(tf) ⊥ tangent space of Sf at x(tf) (transversality condition)

Extension to the g ≢ 0 case is straightforward (replace λ(tf) with λ(tf) − ∇x g(x(tf)))
Example
• Dynamics ẋ1 = u1, ẋ2 = u2, i.e., f(x, u, t) = [u1; u2]
• x(0) = x0, x(tf) ∈ Sf
• Cost ∫_0^{tf} (u1^2 + u2^2) dt, i.e., F(x, u, t) = u1^2 + u2^2
Solution via maximum principle
• Co-state λ = [λ1 λ2]^T
• Hamiltonian H(x, u, λ, t) = F + λ^T f = u1^2 + u2^2 + λ1 u1 + λ2 u2
• Co-state dynamics: λ̇∗ = −∇x H(x∗, u∗, λ∗, t) = 0; thus λ∗ is constant
• Optimal control: ∂H/∂ui = 2 ui∗ + λi∗ = 0; hence u∗ = −λ∗/2 is constant
• Transversality condition: u∗ (hence the optimal path) ⊥ Sf
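The conclusion that u∗ is constant and perpendicular to Sf can be cross-checked by discretizing the control and taking the minimum-norm solution of the terminal constraint. A sketch assuming NumPy, with illustrative data x0 = (1, 2), Sf = {x : x2 = 0}, tf = 1 (these values are assumptions, not from the slides):

```python
import numpy as np

# Discretized check of the example: dynamics x' = u, cost the integral of
# |u|^2, terminal set Sf = {x : x2 = 0}.  Illustrative data (assumptions):
# x0 = (1, 2), tf = 1, N piecewise-constant control pieces.  The minimum-
# norm control meeting the terminal constraint should be constant and
# perpendicular to Sf, as the maximum principle predicts.
x0 = np.array([1.0, 2.0])
tf, N = 1.0, 50
dt = tf / N
# Terminal constraint on x2 only: x0[1] + dt * sum_k u2_k = 0.
# u1 is unconstrained, so its minimum-norm value is identically 0.
Arow = dt * np.ones((1, N))
b = np.array([-x0[1]])
u2 = Arow.T @ np.linalg.solve(Arow @ Arow.T, b)   # minimum-norm solution
print(u2[:3])   # all pieces equal: the constant control -x0[1]/tf
```

The least-squares minimum-norm solution makes every control piece identical, u2 ≡ −x0,2/tf with u1 ≡ 0, i.e. a straight-line path orthogonal to Sf.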
Dubins Path
Vehicle dynamics:

ẋ = v0 cos θ
ẏ = v0 sin θ
θ̇ = u

with fixed x(0) = x0 and x(tf) = xf
• Constant speed v0 and bounded turn rate u ∈ [−1, 1]
• Cost J(u) = tf = ∫_0^{tf} 1 dt, i.e., F ≡ 1 (free terminal time problem)
• Shortest curve with bounded curvature connecting x0 and xf
Solution via maximum principle
• Hamiltonian H = 1 + v0(λ1 cos θ + λ2 sin θ) + λ3u
• Co-state dynamics:

  λ̇1 = 0 (i.e., λ1 is constant)
  λ̇2 = 0 (i.e., λ2 is constant)
  λ̇3 = v0 (λ1 sin θ − λ2 cos θ)

  and the optimal control is

  u∗ = +1 if λ3 < 0, u∗ = −1 if λ3 > 0, u∗ undetermined (singular) if λ3 = 0
Dubins Path

Fact: the optimal path is a concatenation of no more than three motion primitives
• S (straight): u ≡ 0
• L (left turn): u ≡ 1
• R (right turn): u ≡ −1
Further, the only possible combinations are LRL, RLR, LSL, LSR, RSL, RSR
“On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents”, L. E. Dubins, American Journal of Mathematics, 79:497–516, 1957.
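Each motion primitive is straightforward to simulate. A small sketch (assuming NumPy, with illustrative values v0 = 1, u ≡ +1) integrates the vehicle dynamics through a quarter left turn and lands on the exact unit-radius circular arc:

```python
import numpy as np

# Simulate one motion primitive of the Dubins car (assumed values v0 = 1,
# u = +1, i.e. an L segment).  A quarter left turn from (0, 0, 0) should
# trace the unit-radius circle and end near x = 1, y = 1, theta = pi/2.
v0, dt = 1.0, 1e-4
x, y, th = 0.0, 0.0, 0.0
for _ in range(int((np.pi / 2) / dt)):    # Euler integration of the dynamics
    x += v0 * np.cos(th) * dt
    y += v0 * np.sin(th) * dt
    th += 1.0 * dt                        # u = +1 (left turn)
print(x, y, th)                           # approximately (1, 1, pi/2)
```

Chaining three such segments with u ∈ {−1, 0, +1} produces any of the six candidate words above; the shortest word is then selected by comparing their total durations.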
A relook at LQR Problem
Continuous-time LQR problem:
minimize J(x, u) = (1/2) ∫_0^{tf} (x^T Q x + u^T R u) dt

subject to ẋ = Ax + Bu, t ∈ [0, tf], x(0) = x0
Hamiltonian function: H(x, u, λ) = (1/2)(x^T Q x + u^T R u) + λ^T (Ax + Bu)
By the Maximum Principle, the optimal u, x, λ satisfy

H_u = R u + B^T λ = 0 ⇒ u∗ = −R^{-1} B^T λ∗
ẋ∗ = H_λ = A x∗ − B R^{-1} B^T λ∗
λ̇∗ = −H_x = −Q x∗ − A^T λ∗

⇒ d/dt [x∗; λ∗] = [ A     −B R^{-1} B^T
                     −Q    −A^T         ] [x∗; λ∗]

with two-point boundary conditions x(0) = x0, λ(tf) = 0
Connection to the value function V(x, t) = x^T P(t) x: λ(t) = P(t) x(t)
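For the scalar case A = 0, B = Q = R = 1 (assumed values, not from the slides), the two-point boundary value problem can be solved in closed form via the matrix exponential of the Hamiltonian matrix, and the shooting value λ(0) matches the Riccati prediction λ(0) = P(0) x0 with P(0) = tanh(tf):

```python
import numpy as np

# Scalar LQR via the maximum principle, with assumed values A = 0 and
# B = Q = R = 1, so the Hamiltonian system is d/dt [x; lam] = M [x; lam]
# with M = [[0, -1], [-1, 0]].  Shooting on lam(0) to enforce lam(tf) = 0
# recovers the Riccati prediction lam(0) = P(0) x0 = tanh(tf) x0.
M = np.array([[0.0, -1.0], [-1.0, 0.0]])
tf, x0 = 2.0, 1.5
w, U = np.linalg.eigh(M)                  # M is symmetric
expM = U @ np.diag(np.exp(w * tf)) @ U.T  # matrix exponential e^{M tf}
# lam(tf) = expM[1,0] * x0 + expM[1,1] * lam0 = 0, solved for lam0:
lam0 = -expM[1, 0] * x0 / expM[1, 1]
print(lam0, np.tanh(tf) * x0)             # the two values agree
```

Here e^{M tf} = [[cosh tf, −sinh tf], [−sinh tf, cosh tf]], so the boundary condition λ(tf) = 0 gives λ(0) = tanh(tf) x0 exactly, confirming λ(t) = P(t) x(t).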
Optimal Control of Hybrid System

• Two modes 1 and 2 with domains D1 = {x : x2 ≥ 0} and D2 = {x : x2 ≤ 0}
• Identical dynamics f1 = f2, given by ẋ1 = u1, ẋ2 = u2
• Switching surface (guard) D1 ∩ D2
• Trivial reset condition

Suppose the two modes have different running costs:

F1(x, u, t) = u1^2 + u2^2,   F2(x, u, t) = 2 u1^2 + 2 u2^2
Optimal control problem: Among all solutions that start from x0 ∈ D1 at time 0, switch exactly once from mode 1 to mode 2, and end at xf ∈ D2 at a fixed terminal time tf, find the one with the least cost.
• With a switching time t1 ∈ (0, tf), the cost is

∫_0^{t1} F1(x, u, t) dt + ∫_{t1}^{tf} F2(x, u, t) dt
Variational Method
To see if u is optimal, perturb it to u + δu
• Switching time t1 is perturbed to t1 + δt1
• State trajectory is perturbed from x to x + δx (two segments)
• Cost J is perturbed to J + δJ
Optimality condition: For u, x to be optimal, the following problem should have optimal solution δx = 0 and δu = 0:

min_{δu} δJ subject to ODE(δx, δu), δx(0) = δx(tf) = 0

• Introduce a Lagrange multiplier (co-state) λ(t), t ∈ [0, tf], to convert the above constrained problem to an unconstrained one
• Integrate by parts and set the coefficients of δu and δx to zero
Optimality Condition
(Hybrid) Hamiltonian function H(x , u, λ, t) is defined as
H(x, u, λ, t) := F1(x, u, t) + λ^T f1(x, u, t) = u1^2 + u2^2 + λ1 u1 + λ2 u2, if x ∈ D1
H(x, u, λ, t) := F2(x, u, t) + λ^T f2(x, u, t) = 2 u1^2 + 2 u2^2 + λ1 u1 + λ2 u2, if x ∈ D2
Suppose u∗, x∗ are an optimal solution with switching time t1 ∈ (0, tf). Then there exists a co-state λ∗(t), t ∈ [0, tf], such that

ẋ∗ = H_λ(x∗, u∗, λ∗, t), t ∈ [0, t1) ∪ (t1, tf]
λ̇∗ = −H_x(x∗, u∗, λ∗, t), t ∈ [0, t1) ∪ (t1, tf]

Moreover, u∗ satisfies ∇u H(x∗, u∗, λ∗, t) = 0, and the transversality condition

(λ(t1+) − λ(t1−)) ⊥ T_{x∗(t1)}(D1 ∩ D2)
Solving Optimality Condition
• As H does not depend on x, λ̇∗ = −H_x = 0 implies

λ∗(t) = λ⁻ for t ∈ [0, t1), λ∗(t) = λ⁺ for t ∈ (t1, tf]

• The transversality condition (λ(t1+) − λ(t1−)) ⊥ T_{x∗(t1)}(D1 ∩ D2) implies λ1⁻ = λ1⁺
• ∇u H(x∗, u∗, λ∗, t) = 0 implies u∗ is constant in each mode:

u∗ = −λ⁻/2 for t ∈ [0, t1), u∗ = −λ⁺/4 for t ∈ (t1, tf]
Finding Optimal Solution
Optimal u∗, x∗ are specified by the two vectors λ⁻, λ⁺ satisfying

λ1⁻ = λ1⁺
(λ2⁻/2) · t1 = x0,2 (the vertical coordinate of x0, so that the mode-1 segment reaches the switching surface x2 = 0)
−(λ⁻/2) · t1 − (λ⁺/4) · (tf − t1) = xf − x0

• Four unknowns and four equations
• In general admits a unique solution
• Solution determines u∗, x∗ (and t1)
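These conditions can be cross-checked by brute force: with constant control in each mode, the total cost depends only on the switching point (s, 0) and the switching time t1, and grid-minimizing it reproduces the transversality condition λ1⁻ = λ1⁺. A sketch assuming NumPy, with illustrative data x0 = (0, 1), xf = (2, −1), tf = 2 (assumptions, not from the slides):

```python
import numpy as np

# Brute-force check of the hybrid example, with assumed data x0 = (0, 1),
# xf = (2, -1), tf = 2.  With constant control in each mode, the cost of
# switching at point (s, 0) at time t1 is
#   J(s, t1) = |(s,0) - x0|^2 / t1 + 2 |xf - (s,0)|^2 / (tf - t1).
# Grid-minimizing J should reproduce lam1- = lam1+, i.e. the horizontal
# control in mode 1 equals twice that in mode 2.
x0, xf, tf = np.array([0.0, 1.0]), np.array([2.0, -1.0]), 2.0
s = np.arange(-1.0, 3.0, 0.005)           # candidate switching abscissas
t1 = np.arange(0.05, tf - 0.05, 0.005)    # candidate switching times
S, T = np.meshgrid(s, t1)
J = ((S - x0[0])**2 + x0[1]**2) / T + 2 * ((xf[0] - S)**2 + xf[1]**2) / (tf - T)
i, j = np.unravel_index(np.argmin(J), J.shape)
u1_minus = (S[i, j] - x0[0]) / T[i, j]          # mode-1 control, x1 component
u1_plus = (xf[0] - S[i, j]) / (tf - T[i, j])    # mode-2 control, x1 component
print(u1_minus, 2 * u1_plus)                    # approximately equal
```

Since λ1⁻ = −2 u1⁻ and λ1⁺ = −4 u1⁺, the continuity λ1⁻ = λ1⁺ is equivalent to u1⁻ = 2 u1⁺, which is exactly the stationarity of J in s at the grid minimizer.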
Snell’s Law
A light ray passing through a boundary between two isotropic media:

• v1, v2: velocities of light
• n1, n2: refractive indices

Snell's Law: sin θ1 / sin θ2 = v1 / v2 = n2 / n1
Hybrid Maximum Principle
A general theory of maximum principle for hybrid systems:
• H. J. Sussmann, A maximum principle for hybrid optimal control problems, CDC, pp. 425–430, 1999.
Notable references on optimal control of hybrid systems:
• S. C. Bengea and R. A. DeCarlo, Optimal control of switching systems, Automatica, 2005.
• X. Xu and P. Antsaklis, Optimal control of switched systems based on parameterization of the switching instants, TAC, 2004.
• M. Egerstedt, Y. Wardi and H. Axelsson, Transition-time optimization for switched-mode dynamical systems, TAC, 2006.
Embedding Technique
Optimal control problem for a switched system with σ(t) ∈ {1, . . . , m}:

min_{u(·), σ(·)} J = ∫_0^{tf} F_{σ(t)}(x(t), u(t), t) dt + g(x(tf), tf)

subject to ẋ(t) = f_{σ(t)}(x(t), u(t), t), ∀t ∈ [0, tf]; x(0) = x0    (1)

Idea: solve the optimal control problem for a non-switched system:

min_{u(·), Δ(·)} J = ∫_0^{tf} Σ_{i=1}^{m} Δi(t) · Fi(x(t), u(t), t) dt + g(x(tf), tf)

subject to ẋ(t) = Σ_{i=1}^{m} Δi(t) · fi(x(t), u(t), t)    (2)

• Non-switched system (2) has inputs u(·) and [Δ1(t) · · · Δm(t)] taking values in the m-simplex: Δi(t) ≥ 0, Σ_{i=1}^{m} Δi(t) = 1
• Under some mild conditions, solutions of (1) are dense in the solutions of (2)
• Under some mild conditions, solutions of (1) are dense in solutions of (2)
Embedding Technique
Hamiltonian of the embedded system:

H(x, u, Δ, λ, t) = Σ_{i=1}^{m} Δi · Hi(x, u, λ, t)

where Hi(x, u, λ, t) = Fi + λ^T fi is the Hamiltonian of the i-th subsystem
Optimal solution of the embedded system:

ẋ = H_λ = Σ_{i=1}^{m} Δi · fi(x, u, t),    λ̇ = −Σ_{i=1}^{m} Δi · ∂Hi/∂x (x, u, λ, t)

Optimal u and Δ: H(x, u∗, Δ∗, λ, t) = min_{i,u} Hi(x, u, λ, t)
• Generally, the optimal Δ∗ takes values at a corner of the m-simplex
• In some cases, Δ∗ takes interior values and the optimal costs of (1) and (2) differ
“Optimal control of switching systems”, S. C. Bengea and R. A. DeCarlo,Automatica, 2005.