Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear...

Neighboring-Optimal Control via Linear-Quadratic (LQ) Feedback

Robert Stengel Optimal Control and Estimation, MAE 546

Princeton University, 2018

•  Linearization of nonlinear dynamic models–  Nominal trajectory–  Perturbations about the nominal trajectory–  Linear, time-invariant dynamic models–  Examples

•  Continuous-time, time-varying optimal LQ feedback control

•  Discrete-time and sampled-data linear dynamic models

•  Dynamic programming approach to optimal LQ sampled-data control

Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.http://www.princeton.edu/~stengel/MAE546.html

http://www.princeton.edu/~stengel/OptConEst.html

1

Neighboring Trajectories Satisfy Same Dynamic Equations

•  Neighboring-trajectory dynamic model is the same as the nominal dynamic model

!xN (t) = f[xN (t),uN (t),wN (t)], xN to( ) given

!x(t) = f[x(t),u(t),w(t)], x to( ) given

Δx(to ) = x(to )− xN (to )Δx(t) = x(t)− xN (t)Δ!x(t) = !x(t)− !xN (t)

!xN (t) = f[xN (t),uN (t),wN (t),t]!x(t) = !xN (t)+ Δ!x(t) = f[xN (t)+ Δx(t),uN (t)+ Δu(t),wN (t)+ Δw(t),t]

Δu(t) = u(t)− uN (t)Δw(t) = w(t)−wN (t)

2

Solve for the Nominal and Perturbation Trajectories Separately

Jacobian matrices of the linear model are evaluated along the nominal trajectory

!xN (t) = f[xN (t),uN (t),wN (t),t],xN to( ) given

Δ!x(t) ≈ ∂ f∂x

t( )Δx(t)+ ∂ f∂u

t( )Δu(t)+ ∂ f∂w

t( )Δw(t),

Δx to( ) given

F(t) ! ∂ f∂x x=xN (t )

u=uN (t )w=wN (t )

; G(t) ! ∂ f∂u x=xN (t )

u=uN (t )w=wN (t )

; L(t) ! ∂ f∂w x=xN (t )

u=uN (t )w=wN (t )

Δx(t) = F(t)Δx(t) +G(t)Δu(t) + L(t)Δw(t), Δx to( ) given3

!x(t) = !xN (t)+ Δ!x(t)≈ f[xN (t),uN (t),wN (t),t]

+ ∂ f∂x

Δx(t)+ ∂ f∂u

Δu(t)+ ∂ f∂w

Δw(t),

x(t0 ) = xN (t0 )+ Δx(to ) given

Linearization Example

4

Cubic Springs

Force is a nonlinear function of deflection

f x( ) = −k1x ± k2x3

moment θ( ) ≈ −k1θ + k2θ3

Example: Expand gsinθ in a power series

5

Stiffening Cubic Spring Example2nd-order nonlinear dynamic model

!x1(t) = f1 x(t)[ ] = x2 (t)!x2 (t) = f2 x(t)[ ] = −10x1(t)−10x1

3(t)− x2 (t)Integrate equations to produce nominal path

x1(0)x2 (0)

⎡

⎣⎢⎢

⎤

⎦⎥⎥→

f1N x(t)[ ]f2N x(t)[ ]

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥dt→

0

t f

∫x1N (t)

x2N (t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥

in 0,t f⎡⎣ ⎤⎦

Evaluate partial derivatives of the Jacobian matrices

∂ f1∂ x1

= 0; ∂ f1∂ x2

= 1

∂ f2∂ x1

= −10 − 30x1N2 (t); ∂ f2

∂ x2= −1

∂ f1∂u = 0;

∂ f1∂w = 0

∂ f2∂u = 0;

∂ f2∂w = 0

6

Nominal and Perturbation Dynamic Equations

!x1N (t) = x2N (t)

!x2N (t) = −10x1N (t)−10x1N3(t)− x2N (t)

Δx1(t)Δx2 (t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

0 1− 10 + 30x1N

2 (t)( ) −1

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx1(t)Δx2 (t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥

x1N (0)

x2N (0)

⎡

⎣⎢⎢

⎤

⎦⎥⎥= 0

9⎡

⎣⎢

⎤

⎦⎥

Nonlinear, Time-Invariant (NTI) Equation

Linear, Time-Varying (LTV) Equation

Initial Conditions for Nonlinear and Linear Models

7

Δx1(0)Δx2 (0)

⎡

⎣⎢⎢

⎤

⎦⎥⎥= 0

1⎡

⎣⎢

⎤

⎦⎥

Comparison of Approximate and Exact Solutions

xN (t)Δx(t)xN (t)+ Δx(t)x(t)

x2N (0) = 9Δx2 (0) = 1x2N (t)+ Δx2 (t) = 10x2 (t) = 10

!xN (t)Δ!x(t)!xN (t)+ Δ!x(t)!x t( )

8

Euler-Lagrange Equations for Minimizing Variational

Cost Function

9

Optimize Nominal and Neighboring Dynamic Models Separately

Linear, Neighboring-Optimal Control Added to Nominal Control

Δu(t) = −C(t)Δx(t)= −C(t) x(t)− x*(t)[ ]

Nominal, Nonlinear, Open-Loop Optimal Control

u(t) = uopt (t) ! u*(t)

10

u(t) = u*(t)+ Δu(t)

Expand Optimal Control Function

J x*(t0 )+ Δx(t0 )[ ], x*(t f )+ Δx(t f )⎡⎣ ⎤⎦{ } !J * x*(t0 ),x*(t f )⎡⎣ ⎤⎦ + ΔJ Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦ + Δ2J Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦

Nominal optimized cost, plus nonlinear dynamic constraint

Expand optimized cost function to second degree

J * x*(t0 ),x*(t f )⎡⎣ ⎤⎦ = φ x*(t f )⎡⎣ ⎤⎦ + L x*(t),u*(t)[ ]t0

t f

∫ dt

subject to nonlinear dynamic equation!x*(t) = f x*(t),u*(t)[ ], x(t0 ) = x0

= J * x *(t0 ),x *(t f )⎡⎣ ⎤⎦ + Δ2J Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦because First Variation, ΔJ Δx(t0 ),Δx(t f )⎡⎣ ⎤⎦ = 0

11

2nd Variation of the Cost Function

Cost weighting matrices expressed as

Objective: Given optimal nominal solution, minimize 2nd-variational cost subject to linear dynamic constraint

minΔu

Δ2J = 12ΔxT (t f )φxx (t f )Δx(t f )+

12

ΔxT (t) ΔuT (t)⎡⎣

⎤⎦

Lxx (t) Lxu(t)Lux (t) Luu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

P(t f ) ! φxx (t f ) =∂2φ∂x2

(t f )

Q(t) M(t)MT (t) R(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥!

Lxx (t) Lxu(t)Lux (t) Luu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥

dim P(t f )⎡⎣ ⎤⎦ = dim Q(t)[ ] = n × ndim R(t)[ ] = m ×mdim M(t)[ ] = n ×m

subject to perturbation dynamicsΔ!x(t) = F(t)Δx(t)+G(t)Δu(t), Δx(t0 ) = Δxo

12

2nd Variational Hamiltonian

Δ2J = 12ΔxT (t f )P(t f )Δx(t f )+

12


⎤⎦


⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦ = L Δx(t),Δu(t)[ ]+ ΔλT t( )f Δx(t),Δu(t)[ ]= 12

ΔxT (t)Q(t)Δx(t)+ 2ΔxT (t)M(t)Δu(t)+ ΔuT (t)R(t)Δu(t)⎡⎣ ⎤⎦

+ ΔλT t( ) F(t)Δx(t)+G(t)Δu(t)[ ]

Variational Lagrangian plus adjoined dynamic constraint

Variational cost function

= 12ΔxT (t f )P(t f )Δx(t f )+

12

ΔxT (t)Q(t)Δx(t)+ 2ΔxT (t)M(t)Δu(t)+ ΔuT (t)R(t)Δu(t)⎡⎣ ⎤⎦dtt0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

13

2nd Variational Euler-Lagrange Equations

Δλ t f( ) = φxx (t f )Δx(t f ) = P(t f )Δx(t f )

Terminal condition, solution for adjoint vector, and optimality condition

Δ λ t( ) = −

∂H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦∂Δx

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

T

= −Q(t)Δx(t) −M(t)Δu(t) − FT (t)Δλ t( )

∂H Δx(t),Δu(t),Δλ t( )⎡⎣ ⎤⎦∂Δu

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

T

=MT (t)Δx(t) + R(t)Δu(t) −GT (t)Δλ t( ) = 0

14Can explicitly solve for control

E-L #1

E-L #2

E-L #3

Two-Point Boundary-Value Problem

Δx(t) = F(t)Δx(t) +G(t)Δu(t)

Δλ t( ) = −Q(t)Δx(t) −M(t)Δu(t) − FT (t)Δλ t( )

Δλ t f( ) = P(t f )Δx(t f )

Δx(t0 ) = Δx0

State Equation

Adjoint Vector Equation

15

Δu(t) = −R−1(t) MT (t)Δx(t) +GT (t)Δλ t( )⎡⎣ ⎤⎦

From Hu = 0

Use Control Law to Solve the LTV Two-Point Boundary-Value Problem

Δu(t) = −R−1(t) MT (t)Δx(t) +GT (t)Δλ t( )⎡⎣ ⎤⎦

Δx(t)Δ λ t( )

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ −G(t)R−1(t)GT (t)

−Q(t) +M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦T

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Δx(t)Δλ t( )

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t0 )

Δλ t f( )⎡

⎣

⎢⎢

⎤

⎦

⎥⎥=

Δx0PfΔx f

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Perturbation state vectorPerturbation adjoint vector

From Hu = 0

Substitute for control in system and adjoint equations

Control law that feeds back state and adjoint vectors

Adjoint relationship at end point

16

Use Control Law to Solve the LTV Two-Point Boundary-Value Problem

Assume the adjoint relationship between state and control applies over the entire interval

Δλ t( ) = P t( )Δx t( )Substitute for adjoint: Control law feeds back state alone

Δu*(t)= –R–1(t) MT(t)Δx(t)+GT(t)P t( )Δx t( )⎡⎣ ⎤⎦= –R–1(t) MT(t)+GT(t)P t( )⎡⎣ ⎤⎦Δx t( )

dim C( ) = m × n

17 Δu*(t) ! argmin

Δu t( ) in t0 , t f( )Δ2J Δx Δu( )⎡⎣ ⎤⎦

Δu*(t) = –C(t)Δx t( )

Time-Varying Linear-Quadratic (LQ) Optimal Control Gain Matrix

•  Properties of feedback gain matrix–  Full state feedback (m x n)–  Time-varying matrix

•  R, G, and M given•  Control weighting matrix, R•  State-control weighting matrix, M•  Control effect matrix, G

Δu(t) = −C(t)Δx t( )

C(t) = R−1(t) GT (t)P t( ) +MT (t)⎡⎣ ⎤⎦

Optimal feedback gain matrix

18P(t) remains to be determined

Solution for the Adjoining Matrix, P(t)

Δλ t( ) = P t( )Δx t( ) + P t( )Δx t( )

P t( )Δx t( ) = Δ λ t( ) − P t( )Δx t( )

Time-derivative of adjoint vector

Rearrange

!P t( )Δx t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TΔλ t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )−G(t)R−1(t)GT (t)Δλ t( ){ }

Recall coupled state/adjoint equation

Δx(t)Δ λ t( )

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ −G(t)R−1(t)GT (t)

−Q(t) +M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t) −G(t)R−1(t)MT (t)⎡⎣ ⎤⎦T

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Δx(t)Δλ t( )

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Substitute in adjoint matrix equation

19

Solution for the Adjoining Matrix, P(t)

Substitute for adjoint vector

Δλ t( ) = P t( )Δx t( )

!P t( )Δx t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦

TP t( )Δx t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦Δx t( )−G(t)R−1(t)GT (t)P t( )Δx t( ){ }

... and eliminate state vector

20

Matrix Riccati Equation for P(t)

!P t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦ − F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TP t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦ + P t( )G(t)R−1(t)GT (t)P t( )P t f( ) = φxx t f( )

Nonlinear, ordinary differential equation for the Riccati Matrix, P(t), with terminal boundary conditions

21

Characteristics of the Adjoining ( Riccati) Matrix, P(t)

•  P(tf) is symmetric, n x n, and typically positive semi-definite

•  Matrix Riccati equation is symmetric•  Therefore, P(t) is symmetric and positive semi-definite

throughout•  Once P(t) has been determined, optimal feedback

control gain matrix, C(t) can be calculated

22

Example of Neighboring-Optimal Control: Improved Infection Treatment via

Feedback

23

Natural Response to Pathogen Assault (No Therapy)

24

�

J = 12s11x1 f

2 + s44x4 f

2( ) + 12

q11x12 + q44x4

2 + ru2( )dtto

t f

∫

Tradeoff Between Rate of Killing Pathogen, Preservation of Organ

Health, and Drug Use

Optimal Open-Loop Control

25

u(t) = u*(t)−C(t)Δx(t)= u*(t)−C(t)[x(t)− x*(t)]

50% Increased Infection: Nominal and Neighboring-Optimal Control (u1)

26

Optimal Control of Linear Systems

27

Time-Varying Linear-Quadratic (LQ) Feedback Control

LTV Continuous-time linear dynamic system

LTV optimal control law

Linear dynamic system with LTV optimal feedback control


Δu(t) = −R−1(t) MT (t)+GT (t)P t( )⎡⎣ ⎤⎦Δx t( ) −C(t)Δx t( )


= F(t)Δx(t)+G(t) −C(t)Δx t( )⎡⎣ ⎤⎦= F(t)−G(t)C(t)[ ]Δx(t)

28

LTI System with Time-Varying LQ Feedback Control

Open-loop system is LTI, but optimal control gain matrix is time-varying

Closed-loop system Δu(t) = −R−1 MT +GTP t( )⎡⎣ ⎤⎦Δx t( ) −C(t)Δx t( )

Δ!x(t) = FΔx(t)+G −C(t)Δx t( )⎡⎣ ⎤⎦= F −GC(t)[ ]Δx(t) = Fclosed−loop t( )Δx(t)

29

LTI dynamic system

Δx(t) = FΔx(t) +GΔu(t)

Example: LTV Optimal Control of a LTI First-Order System

Δx = fΔx + gΔu

Δu = −r−1 gp t( )⎡⎣ ⎤⎦Δx t( )

= −gp t( )r

Δx

!p t( ) = −q − 2 fp t( ) + g2p2 t( )r

p t f( ) = pf

Δ2J = 12pfΔx

2 (t f )+12

qΔx2 + 0( )ΔxΔu + rΔu2( )dtt0

t f

∫

30

!p t( ) = −1+ 2p t( ) + p2 t( )p t f( ) = 1

Δx = −Δx + Δu; x 0( ) = 1

Δu = − p t( )Δx

Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

Example: LTV Optimal Control of a Stable First-Order Systemf = −1; g = 1

q = r = 1

Control gain = p t( )

Δxopen−loop t( )

Δxclosed−loop t( )

p t( )

Δu t( )

31

Δu = − p t( )Δx

Δx = 1− p t( )⎡⎣ ⎤⎦Δx

Example: LTV Optimal Control of an Unstable First-Order Systemf = +1; g = 1

Control gain = p t( )

Δx = Δx + Δu; x 0( ) = 1

!p t( ) = −1− 2p t( ) + p2 t( )p t f( ) = 1

Δxopen−loop t( )

Δxclosed−loop t( )

p t( )

Δu t( )

32

Discrete-Time and Sampled-Data Linear, Time-Invariant

(LTI) Systems

33

Continuous- and Discrete-Time LTI System Models

Δ!x(t) = FΔx(t)+GΔu(t)+LΔw(t)Δx(t0 ) given

Continuous-time (“analog”) model is based on an ordinary differential equation

Dynamic Process

Observation Process

34

Δx(tk+1) = ΦΔx(tk )+ ΓΔu(tk )+ ΛΔw(tk )Δx(t0 ) given

Dynamic Process

Observation Process

Discrete-time (“digital”) model is based on an ordinary difference equation

Δy(t) = HxΔx(t)+HuΔu(t)+HwΔw(t)

Δy(tk ) = HxΔx(tk )+HuΔu(tk )+HwΔw(tk )

Digital Control Systems Use Sampled Data

Δxk = Δx tk( ) = Δx kΔt( )

•  Periodic sequence

•  Sampler is an analog-to-digital (A/D) converter•  Reconstructor is a digital-to-analog (D/A) converter

35

Solution of a LTI dynamic model

Δ!x(t) = FΔx(t)+GΔu(t)+LΔw(t), Δx(t0 ) given

Δx(t) = Δx(t0 )+ FΔx(τ )+GΔu(τ )+LΔw(τ )[ ]dτt0

t

∫•  ... has two parts

–  Unforced (homogeneous) response to initial conditions–  Forced response to control and disturbance inputs

LTI System Response to Inputs and Initial Conditions

36

Initial condition response

Step input response

Unforced Response to Initial Conditions

The state transition matrix propagates the state from t0 to t by a single multiplication

Δx(t) = Δx(t0 )+ FΔx(τ )[ ]dτt0

t

∫ = Φ t,t0( )Δx(t0 )

Neglecting forcing functions

The state transition matrix is the matrix exponential

Δx(t) = Δx(t0 )+ FΔx(τ )[ ]dτt0

t

∫= Φ t − t0( )Δx(t0 ) = eF t−t0( )Δx(t0 ) 37

LTI State Transition Matrix is the Matrix Exponential

eF t−t0( ) ! eFδ t =Matrix Exponential! Φ δ t( ) = State Transition Matrix

= I+ Fδ t + 12!Fδ t[ ]2 + 1

3!Fδ t[ ]3 + ...

See pages 79-84 of Optimal Control and Estimation for a description of how the State Transition Matrix is calculated for LTV

and LTI systems

38

Response to Inputs

Solution of the LTI model with piecewise-constant

forcing functions

Δx(tk ) = Δx(tk−1) + FΔx(τ ) +GΔu(τ ) + LΔw(τ )[ ]dτtk−1

tk

∫

Δx(tk ) = eFδ tΔx(tk−1)+ e

Fδ t e−F τ −tk−1( )⎡⎣ ⎤⎦dτtk−1

tk

∫ GΔu(tk−1)+LΔw(tk−1)[ ]= ΦΔx(tk−1)+ ΓΔu(tk−1)+ ΛΔw(tk−1)

39

δ t ! tk − tk−1 = constant

Φ ! eFδ t

Γ ! eFδ t I− e−Fδ t( )F−1G = eFδ t − I( )F−1G

Λ ! eFδ t I− e−Fδ t( )F−1L = eFδ t − I( )F−1L

Discrete-Time LTI System Response to Step Input

Δx(t1) = ΦΔx(t0 )+ ΓΔu(t0 )+ ΛΔw(t0 )Δx(t2 ) = ΦΔx(t1)+ ΓΔu(t1)+ ΛΔw(t1)Δx(t3) = ΦΔx(t2 )+ ΓΔu(t2 )+ ΛΔw(t2 )

!

Step Input: Discrete-time solution is exact at sampling instants

Propagation of Δx tk( )with constant δ t, Φ, Γ, and Λ

40

As time interval becomes very small, discrete-time model approaches

continuous-time model

Φ δ t→0⎯ →⎯⎯ I + Fδt( )Γ δ t→0⎯ →⎯⎯ GδtΛ δ t→0⎯ →⎯⎯ Lδt

Series Expressions for Discrete-Time LTI System Matrices

Γ = eFδ t − I( )F−1G = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Gδt

Λ = eFδ t − I( )F−1L = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Lδt

41

Φ = e Fδt = I + F δt( ) + 1

2!F δt( )⎡⎣ ⎤⎦

2+ 13!F δt( )⎡⎣ ⎤⎦

3+!

Sampled-Data Cost Function

minΔu t( )

Δ2J = minΔu t( )

12ΔxT (t f )P(t f )Δx(t f )+

12


⎤⎦

Q MMT R

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Minimize subject to sampled-data dynamic constraint

Δx(tk+1) = Φ δt( )Δx(tk ) + Γ δt( )Δu(tk )

Sampled-Data Cost Function: a Discrete-Time Cost Function that accounts for system response between sampling instants

minΔu t( )

Δ2J = minΔu t( )

12Δxk f

TPk fΔxk f +12


⎤⎦

Q MMT R

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑

Sum integrals over short time intervals, (tk, tk+1)

42

Integrand of Sampled-Data Cost Function

Use dynamic equation ...

12


⎤⎦

Q MMT R

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑

...to express the integrand in the sampling interval, (tk,tk+1)

Δx t( ) = Φ t,tk( )Δx tk( ) + Γ t,tk( )Δu tk( )! Φ t,tk( )Δxk + Γ t,tk( )Δuk

43

Integrand of Sampled-Data Cost Function

12


⎤⎦

Q MMT R

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

tk

tk+1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑

=12

ΔxkT Δuk

T⎡⎣

⎤⎦

ΦT t,tk( )QΦ t,tk( ) ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦

QΓ t,tk( ) +M⎡⎣ ⎤⎦TΦ t,tk( ) R + ΓT t,tk( )M +MTΓ t,tk( ) + ΓT t,tk( )QΓ t,tk( )⎡⎣ ⎤⎦

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥tk

tk+1

∫k

dtΔxkΔuk

⎡

⎣⎢⎢

⎤

⎦⎥⎥

⎧

⎨⎪

⎩⎪

⎫

⎬⎪

⎭⎪k=0

k f −1

∑

= 12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎡

⎣⎢⎢

⎤

⎦⎥⎥k

ΔxkΔuk

⎡

⎣⎢⎢

⎤

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑

Bring state and control out of integralAssume control is constant in sampling interval

Integration has been replaced by summation

Berman, Gran, J. Aircraft, 1974 44

Sampled-Data Cost Function Weighting Matrices

Q̂ = ΦT t,tk( )QΦ t,tk( )dttk

tk+1

∫

M̂ = ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦dttk

tk+1

∫

R̂ = R + ΓT t,tk( )M +MTΓ t,tk( ) + ΓT t,tk( )QΓ t,tk( )⎡⎣ ⎤⎦dttk

tk+1

∫

The integrand accounts for continuous-time variations of the LTI system between sampling instants (“Inter-sample ripple”)

Assume Q, M, and R are constant in the integration interval

Φ t,tk( ) and Γ t,tk( ) vary in the integration interval

45

Dynamic Programming Approach to Sampled-Data

Optimal Control

46

Discrete-Time Hamilton-Jacobi-Bellman Equation

V (t0 ) =ϕk f+ Lk

k=0

k f −1

∑

Value Function at toDiscrete HJB equation

•  Begin at terminal point with optimal value function•  Working backward, add minimum value function

increment (Bellman’s Principle of Optimality)

Vk*= −minΔuk

Lk +Vk+1 *{ }= −min

ΔukHk , V * Δxk f *⎡⎣ ⎤⎦ = given

… optimal policy … whatever the initial state and initial decision …remaining decisions must constitute an optimal policy with regard to the current state

47

subject toΔxk+1 = ΦΔxk + ΓΔuk

Sampled-Data Cost Function Contains Terminal and Summation Costs

Integral cost has been replaced by a summation costTerminal cost is the same

minΔuk

Jsampled

= minΔuk

12Δxk f

TPk fΔxk f +12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎡

⎣⎢⎢

⎤

⎦⎥⎥k

ΔxkΔuk

⎡

⎣⎢⎢

⎤

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

subject toΔxk+1 = ΦΔxk + ΓΔuk

48

Dynamic Programming Approach to Sampled-Data LQ Control

V (t0 ) =12Δxk f

TPk fΔxk f +12

ΔxkT Δuk

T⎡⎣

⎤⎦

Q̂ M̂M̂T R̂

⎡

⎣⎢⎢

⎤

⎦⎥⎥

ΔxkΔuk

⎡

⎣⎢⎢

⎤

⎦⎥⎥

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪k=0

k f −1

∑

Vk*= −minΔuk

12

Δxk *T Q̂Δxk *+2Δxk *T M̂Δuk + ΔukT R̂Δuk⎡⎣ ⎤⎦ +Vk+1 *⎧

⎨⎩

⎫⎬⎭

= −minΔuk

Hk , V * Δxk f *⎡⎣ ⎤⎦ = Δxk f *T Pk fΔxk f *T

subject to Δxk+1 = ΦΔxk + ΓΔuk

Quadratic Value Function at to

Discrete HJB equation

49

Optimality Condition

Vk =12Δxk

TPkΔxk ; Vk+1 =12Δxk+1

TPk+1Δxk+1

Assume value function takes a quadratic form

Optimality condition∂Hk

∂Δuk= Δxk

TM̂ + ΔukT R̂⎡⎣ ⎤⎦ +

∂Vk+1∂Δuk

= 0

∂Vk+1∂Δuk

=∂ 12Δxk+1

TPk+1Δxk+1⎡⎣⎢

⎤⎦⎥

∂Δuk= ΦΔxk + ΓΔuk[ ]T Pk+1Γ

ΔxkTM̂ + Δuk

T R̂⎡⎣ ⎤⎦ + ΔxkTΦT + Δuk

TΓT⎡⎣ ⎤⎦Pk+1Γ = 0

where

hence50

Minimizing Value of Control

Δuk = − R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦

−1M̂T + ΓTPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Must find Pk in 0,k f( )Use definitions of V * and Δu in HJB equation

∂Hk

∂Δuk= Δxk

T M̂ +ΦTPk+1Γ⎡⎣ ⎤⎦ + ΔukT R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦ = 0

51

Solution for Pk

12Δxk

TPkΔxk =

−minΔuk

12

ΔxkTQ̂Δxk + 2Δxk

TM̂ −CkΔxk( ) + −CkΔxk( )T R̂ −CkΔxk( ) + Δxk+1TPk+1Δxk+1⎡

⎣⎤⎦

⎧⎨⎩

⎫⎬⎭

Vk =12Δxk

TPkΔxk ; Vk+1 =12Δxk+1

TPk+1Δxk+1

Pk = Q̂+Φ TPk+1Φ − M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦TR̂ + Γ TPk+1Γ⎡⎣ ⎤⎦

−1M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦

Pk f given

Δuk = − R̂ + ΓTPk+1Γ⎡⎣ ⎤⎦

−1M̂T + ΓTPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Substitute for Vk ,Vk+1, and Δuk in discrete-time HJB equation

Rearrange and cancel Δxk on both sides of the equation to yield thediscrete-time Riccati equation

52

Discrete-Time System with Linear-Quadratic Feedback Control

Dynamic System

Control law

Dynamic system with LQ feedback control

Δxk+1 = ΦΔxk + ΓΔuk

Δuk = − R̂ + Γ TPk+1Γ⎡⎣ ⎤⎦

−1M̂ T + Γ TPk+1Φ⎡⎣ ⎤⎦Δxk −CkΔxk

Δxk+1 = ΦΔxk + ΓΔuk= ΦΔxk + Γ −CkΔx( )k= Φ − ΓCk( )Δxk

53

Next Time:Dynamic System Stability

ReadingOCE: Section 2.5

54

Supplemental Material

55

Neighboring Trajectories•  Nominal (or reference) trajectory and control history

xN (t), uN (t),wN (t){ } for t in [t0,t f ]

•  Trajectory perturbed by–  Small initial condition variation–  Small control variation–  Small disturbance variation

x(t), u(t),w(t){ } for t in [t0,t f ]

= xN (t)+ Δx(t), uN (t)+ Δu(t),wN (t)+ Δw(t){ }

�

x : dynamic stateu : control inputw : disturbance input

56

Example: Separate Solutions for Nominal and Perturbation Trajectories

Δ!x1Δ!x2Δ!x3

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

0 1 0−2a1 x3N − x1N( ) −a2 a2 + 2a1 x3N − x1N( )c1 + b3u1N( ) c1 3c2x3N

2

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

Δx1Δx2Δx3

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

+0 0b1 b2b3x1N 0

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

Δu1Δu2

⎡

⎣⎢⎢

⎤

⎦⎥⎥+

d00

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥Δw1 ;

Δx1(to )Δx2 (to )Δx3(to )

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥given

Linear, time-varying equation describes perturbation dynamics

Original nonlinear equation describes nominal dynamics

xN =

x1Nx2Nx3N

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=

x2N + dw1N

a2 x3N − x2N( ) + a1 x3N − x1N( )2 + b1u1Nc2x3N

3 + c1 x1N + x2N( ) + b3x1N u1N+ b2u2N

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

,

x1Nx2Nx3N

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

given

57

!p t( ) = −q + 2p t( ) + p2 t( ); q = 1 or 100

p t f( ) = 1 Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

Example: LQ Optimal Control, Stable First-Order System, “White-Noise” Disturbance

q = 1 q = 100Open-Loop Response

Open- and Closed-Loop Responses

Δxopen−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

p t( )

Δu t( )

58

!p t( ) = −1+ 2p t( ) + p2 t( )p t f( ) = 1 or 1000

Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

pf = 1 pf = 1000Open-Loop Response

Open- and Closed-Loop Responses

Δxopen−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

p t( )

Δu t( )

59


!p t( ) = −100 + 2p t( ) + p2 t( )p t f( ) = 1 or 0

Δx = −Δx + Δu + Δw; x 0( ) = 0

Δu = − p t( )Δx Δx = − 1+ p t( )⎡⎣ ⎤⎦Δx

pf = 1 pf = 0Δxopen−loop t( )

Δxopen−loop t( ), Δxclosed−loop t( )

p t( )

Δu t( )

Δxopen−loop t( )

Δxopen−loop t( ), Δxclosed−loop t( )

p t( )

Δu t( )

60


Multivariable LQ Optimal Control with Cross Weighting, M, = 0

!P t( ) = −Q[ ]− F[ ]T P t( )− P t( ) F[ ]+ P t( )GR−1GTP t( )P t f( ) = Pf

Δu(t) = −R−1 GTP t( )⎡⎣ ⎤⎦Δx t( )

Δ2J = 12ΔxT (t f )P(t f )Δx(t f )

+ 12


⎤⎦Q 00 R

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

t0

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

No state/control coupling in cost function

Δx(t) = FΔx(t) +GΔu(t)

61

Nonlinear System with Neighboring-Optimal Feedback Control

Nonlinear dynamic system

x(t) = f x t( ),u(t)⎡⎣ ⎤⎦

u(t) = u * (t) −C(t)Δx t( ) = u * (t) −C(t) x t( ) − x * t( )⎡⎣ ⎤⎦

x(t) = f x t( ), u * (t) −C(t) x t( ) − x * t( )⎡⎣ ⎤⎦⎡⎣ ⎤⎦{ }

Neighboring-optimal control law

Nonlinear dynamic system with neighboring-optimal feedback control

62

Development of Neighboring-Optimal Therapy

x(t) = x * (t) + Δx(t)u(t) = u * (t) + Δu(t)

•  Expand dynamic equation to first degree

•  Compute nominal optimal control history using original nonlinear dynamic model

•  Compute optimal perturbation control using locally linearized dynamic model

•  Sum the two for neighboring-optimal control of the dynamic system

Δu(t) = −C(t) x(t)− xopt (t)⎡⎣ ⎤⎦u(t) = uopt (t)+ Δu(t)

u(t) = uopt (t)

63

First-Order LQ Example Code% First-Order LQ Example% Rob Stengel % 2/23/2011 clear global tp p q tw w xo = 0; to = 0; tf = 10; tspanx = [to tf]; tw = [0:0.01:10]; for k = 1:length(tw) w(k) = randn(1); end [tx,x] = ode15s('First',tspanx,xo); pf = 0; q = 100; tspanp = [tf to]; [tp,p] = ode15s('FirstRiccati',tspanp,pf); [tc,xc] = ode15s('FirstCL',tspanx,xo); u = interp1(tp,p,tc).*xc; figure subplot(4,1,1) plot(tx,x),grid,xlabel('Time'),ylabel('x-open-loop') subplot(4,1,2) plot(tp,p,'r'),grid,xlabel('Time'),ylabel('p') subplot(4,1,3) plot(tc,u),grid, xlabel('Time'),ylabel('u') subplot(4,1,4) plot(tc,xc,'r',tx,x,'b'),grid,xlabel('Time'),ylabel('x- closed-loop')

function xdot = First(t,x)global tp p tw w wt = interp1(tw,w,t,'nearest');xdot = -x + wt;

function pdot = FirstRiccati(t,p)

global qpdot = -q + 2*p + p^2;

function xdot = FirstCL(tc,xc);

global tp p tw w wt = interp1(tw,w,tc,'nearest');pt = interp1(tp,p,tc);xdot = -(1 + pt)*xc + wt;

64

Initial-Condition Response via State Transition

Φ = I+ F δ t( ) + 12!F δ t( )⎡⎣ ⎤⎦

2

+ 13!F δ t( )⎡⎣ ⎤⎦

3+ ...

Δx(t1) = Φ t1 − to( )Δx(t0 )Δx(t2 ) = Φ t2 − t1( )Δx(t1)Δx(t3) = Φ t3 − t2( )Δx(t2 )…

Δx(t1) = Φ δ t( )Δx(t0 ) = ΦΔx(t0 )Δx(t2 ) = ΦΔx(t1) = Φ2Δx(t0 )Δx(t3) = ΦΔx(t2 ) = Φ3Δx(t0 )

…

Propagation of Δx tk( ) in LTI system

State transition matrix is constant if tk − tk−1( ) = δ t = constant

65

Example: Equivalent Continuous-Time and Discrete-Time System Matrices

F = −1.2794 −7.98561 −1.2709

⎡

⎣⎢

⎤

⎦⎥

G = −9.0690

⎡

⎣⎢

⎤

⎦⎥

L = −7.9856−1.2709

⎡

⎣⎢

⎤

⎦⎥

Φ = 0.845 −0.69360.0869 0.8457

⎡

⎣⎢

⎤

⎦⎥

Γ = −0.8404−0.0414

⎡

⎣⎢

⎤

⎦⎥

Λ = −0.6936−0.1543

⎡

⎣⎢

⎤

⎦⎥

Φ = 0.0823 −1.47510.1847 0.0839

⎡

⎣⎢

⎤

⎦⎥

Γ = −2.4923−0.6429

⎡

⎣⎢

⎤

⎦⎥

Λ = −1.4751−0.9161

⎡

⎣⎢

⎤

⎦⎥

Continuous-time (“analog”) system

Discrete-time (“digital”) system

Time interval has a large

effect on the discrete-time

matrices

δt = 0.1s

δt = 0.5s

66

Evaluating Sampled-Data Weighting Matrices

Φ t,tk( ) and Γ t,tk( ) vary in the integration interval

Break interval into smaller intervals, and approximate as sum of short rectangular integration steps

Q̂ = ΦT t,0( )QΦ t,0( )dt0

Δt

∫

! ΦT tk−1,0( )QΦ tk−1,0( )δ t⎡⎣ ⎤⎦k=1

100

∑ , δ t = Δt 100, tk = kδ t

! eFT tk−1QeFtk−1δ t⎡⎣ ⎤⎦

k=1

100

∑

67

Q̂,M̂, and R̂ evaluated just once for LTI system

Γ = eFδ t − I( )F−1G = I + 12!F δt( )⎡⎣ ⎤⎦ +

13!F δt( )⎡⎣ ⎤⎦

2 + ...⎛⎝⎜

⎞⎠⎟Gδt

Evaluating Sampled-Data Weighting Matrices

M̂ = ΦT t,0( ) QΓ t,0( ) +M⎡⎣ ⎤⎦dt0

Δt

∫

! eFT tk−1Q I+ 1

2!Ftk−1[ ]+ 13! Ftk−1[ ]2 + ...⎛

⎝⎜⎞⎠⎟Gtk−1

⎡⎣⎢

⎤⎦⎥+M

⎧⎨⎩

⎫⎬⎭δ t

k=1

100

∑

R̂ ! R + ΓT tk−1( )M +MTΓ tk−1( ) + ΓT tk−1( )QΓ tk−1( )⎡⎣ ⎤⎦δ tk=1

100

∑68

Sampled-Data Cost Function Weighting Always Includes

State-Control Weighting

M̂ = ΦT t,tk( ) QΓ t,tk( ) +M⎡⎣ ⎤⎦dttk

tk+1

∫

= ΦT t,tk( )QΓ t,tk( )dttk

tk+1

∫ even if M = 0

Lk =12

ΔxkTQ̂Δxk + 2Δxk

TM̂Δuk + ΔukT R̂Δuk⎡⎣ ⎤⎦

Sampled-Data Lagrangian

69

Continuous-Time LQ Optimization via Dynamic

Programming

70

Dynamic Programming Approach to Continuous-

Time LQ Control

V (t0 ) =12ΔxT (t f )P(t f ) f Δx(t f )

+ 12


⎤⎦


⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

to

t f

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

V (t1) =12ΔxT (t f )P(t f )Δx(t f )

− 12


⎤⎦


⎡

⎣⎢⎢

⎤

⎦⎥⎥

Δx(t)Δu(t)

⎡

⎣⎢⎢

⎤

⎦⎥⎥dt

tf

t1

∫⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪

Value Function at to

Value Function at t1

71

Dynamic Programming Approach to Continous-

Time LQ Control

∂V * Δx*(t1)[ ]∂t

=

−minΔu

12

Δx*T (t1)Q(t1)Δx*(t1)+ 2Δx*T (t1)M(t1)Δu(t1)+ ΔuT (t1)R(t1)Δu(t1)⎡⎣ ⎤⎦

+∂V * Δx*(t1)[ ]

∂ΔxF(t1)Δx*(t1)+G(t1)Δu(t1)[ ]

⎧

⎨⎪⎪

⎩⎪⎪

⎫

⎬⎪⎪

⎭⎪⎪

Time Derivative of Value Function

72

Dynamic Programming Approach to Continuous-

Time LQ Control

∂V * Δx*[ ]∂t

= −minΔu

H Δx*,Δu, ∂V *∂Δx

⎡⎣⎢

⎤⎦⎥,

V * Δx(t f )⎡⎣ ⎤⎦ = Δx*T (t f )P(t f )Δx*T (t f )

Hamiltonian

H Δx*,Δu, ∂V *∂Δx

⎡⎣⎢

⎤⎦⎥!

12

Δx*T QΔx*+2Δx*T MΔu+ ΔuTRΔu⎡⎣ ⎤⎦ +∂V * Δx*[ ]

∂ΔxFΔx*+GΔu[ ]

HJB Equation

73

Plausible Form for the Value Function

V * Δx * (t)[ ] = Δx *T (t)P(t)Δx *T (t)

∂V *∂t

= −12

Δx *T (t1) P(t1)Δx * (t1)⎡⎣ ⎤⎦

∂V *∂Δx

= Δx *T (t)P(t)

Time Derivative of the Value Function

Gradient of the Value Function with respect to the state

Quadratic Function of State Perturbation

74

Optimal Control Law and HJB Equation

∂H∂u

= ΔxTM + ΔuTR + ΔxTPG = 0

Δu(t) = −R−1 GTP+MT( )Δx(t)

ΔxT !PΔx =

ΔxT −Q+MR−1MT⎡⎣ ⎤⎦ − F −GR−1MT⎡⎣ ⎤⎦TP− P F −GR−1MT⎡⎣ ⎤⎦ + PGR

−1GTP{ }Δx

Optimal control law

Incorporate Value Function Model in HJB equation

Δx t( ) can be cancelled on left and right 75

Matrix Riccati Equation

!P t( ) = −Q(t)+M(t)R−1(t)MT (t)⎡⎣ ⎤⎦

− F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦TP t( )

−P t( ) F(t)−G(t)R−1(t)MT (t)⎡⎣ ⎤⎦+P t( )G(t)R−1(t)GT (t)P t( )

P t f( ) = φxx t f( )

76

Example: 1st-Order System with LQ Feedback Control

1st-order discrete-time dynamic system

LQ optimal control law

Dynamic system with LQ feedback controlΔxk+1 = φΔxk + γΔuk

= φΔxk + γ −ckΔx( )k= φ −γ ck( )Δxk

Δxk+1 = φΔxk + γΔuk

Δuk = −

m + φγ pk+1r + γ 2 pk+1

Δxk −ckΔxk pk = q + φ2 pk+1 −

m + φγ pk+1( )2

r + γ 2 pk+1

, pkf given

77

Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear...

Documents

Transcript of Neighboring-Optimal Control via Linear-Quadratic (LQ) FeedbackCubic Springs Force is a nonlinear...