
Stochastic and Adaptive Optimal Control

Robert Stengel
Optimal Control and Estimation, MAE 546
Princeton University, 2018

Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

• Nonlinear systems with random inputs and perfect measurements
• Stochastic neighboring-optimal control
• Linear-quadratic (LQ) optimal control with perfect measurements
• Adaptive Critic Control
• Information Sets and Expected Cost

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$\dot{x}(t) = f[x(t), u(t), w(t), t]$

$z(t) = x(t)$

$E[x(0)] = \bar{x}(0), \quad E\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\} = 0$

Assume that random disturbance effects are small and additive:

$\dot{x}(t) = f[x(t), u(t), t] + L(t)\,w(t)$

$E[w(t)] = 0$

$E[w(t)\,w^T(\tau)] = W(t)\,\delta(t - \tau)$
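In discrete time, the delta-correlated disturbance above corresponds to noise samples with covariance $W/\Delta t$ per step. A minimal Euler-Maruyama sketch illustrates this scaling; the pendulum-like dynamics f, the matrices L and W, and the step size are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of xdot = f[x,u,t] + L w(t), where w is
# zero-mean white noise with E[w(t) w^T(tau)] = W delta(t - tau).
# Discretized over a step dt, each noise sample has covariance W/dt.
# The dynamics f, L, W, and step size are illustrative assumptions.
rng = np.random.default_rng(0)
dt, tf = 0.01, 10.0
W = np.array([[0.04]])          # white-noise spectral density (assumed)
L = np.array([[0.0], [1.0]])    # disturbance input matrix (assumed)

def f(x, u, t):                 # example nonlinear (pendulum-like) dynamics
    return np.array([x[1], -np.sin(x[0]) + u])

x, u = np.array([0.1, 0.0]), 0.0
for k in range(int(tf / dt)):
    w = rng.multivariate_normal(np.zeros(1), W / dt)  # E[w w^T] dt -> W
    x = x + (f(x, u, k * dt) + L @ w) * dt
print("final state:", x)
```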


Cost Must Be an Expected Value

• Deterministic cost function cannot be minimized because
  – disturbance effect on the state cannot be predicted
  – state and control are random variables

$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt$

• However, the expected value of a deterministic cost function can be minimized:

$\min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\,dt \right\}$

Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and the expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) $E[\lambda(t_f)] = E\left\{ \dfrac{\partial \phi[x(t_f)]}{\partial x} \right\}^T$

2) $E[\dot{\lambda}(t)] = -E\left\{ \dfrac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right\}^T$

3) $E\left\{ \dfrac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right\} = 0$


Stochastic Value Function for a Nonlinear System

Expected values of terminal and integral cost are well defined. Apply the Hamilton-Jacobi-Bellman (HJB) equation to expected terminal and integral costs.

Principle of Optimality: optimal expected value function at $t_1$:

$V^*(t_1) = E\left\{ \phi[x^*(t_f)] - \int_{t_f}^{t_1} L[x^*(\tau), u^*(\tau)]\,d\tau \right\}$

$\qquad = \min_u E\left\{ \phi[x^*(t_f)] - \int_{t_f}^{t_1} L[x^*(\tau), u(\tau)]\,d\tau \right\}$

Rate of Change of the Value Function

Total time derivative of V*:

$\left.\dfrac{dV^*}{dt}\right|_{t=t_1} = -E\{L[x^*(t_1), u^*(t_1)]\}$

x(t) and u(t) can be known precisely; therefore

$\left.\dfrac{dV^*}{dt}\right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$


Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$\dfrac{dV^*}{dt} = E\left[ \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}\dot{x} \right]$

Incremental change in the value function, expanded to second degree:

$\Delta V^* = \dfrac{dV^*}{dt}\Delta t = E\left[ \dfrac{\partial V^*}{\partial t}\Delta t + \dfrac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \dfrac{1}{2}\dot{x}^T \dfrac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots \right]$

$= E\left\{ \dfrac{\partial V^*}{\partial t}\Delta t + \dfrac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)]\Delta t + \dfrac{1}{2}[f(\cdot) + Lw(\cdot)]^T \dfrac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)]\,\Delta t^2 + \cdots \right\}$

Next: cancel $\Delta t$.

Introduction of the Trace

The trace of a matrix product is a scalar and is invariant under cyclic permutation:

$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$

$\mathrm{Tr}(x^T Q x) = \mathrm{Tr}(x x^T Q) = \mathrm{Tr}(Q x x^T), \quad \dim[\mathrm{Tr}(\bullet)] = 1 \times 1$

$\dfrac{dV^*}{dt} \approx E\left\{ \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \dfrac{1}{2}\mathrm{Tr}\left[ [f(\cdot) + Lw(\cdot)]^T \dfrac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)] \right]\Delta t \right\}$

$= E\left\{ \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right]\Delta t \right\}$
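The cyclic-permutation and quadratic-form identities are easy to confirm numerically; the matrices below are arbitrary random examples.

```python
import numpy as np

# Numerical check of the trace identities used above.
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
x = rng.standard_normal((3, 1))
Q = rng.standard_normal((3, 3))

# Cyclic permutation invariance: Tr(ABC) = Tr(CAB) = Tr(BCA)
print(np.trace(A @ B @ C), np.trace(C @ A @ B), np.trace(B @ C @ A))

# Scalar quadratic form: x^T Q x = Tr(x x^T Q) = Tr(Q x x^T)
print((x.T @ Q @ x).item(), np.trace(x @ x.T @ Q), np.trace(Q @ x @ x.T))
```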


Toward the Stochastic HJB Equation

$\dfrac{dV^*}{dt} = E\left\{ \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}[f(\cdot) + Lw(\cdot)] + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right]\Delta t \right\}$

$= \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}f(\cdot) + E\left\{ \dfrac{\partial V^*}{\partial x}Lw(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}[f(\cdot) + Lw(\cdot)][f(\cdot) + Lw(\cdot)]^T \right]\Delta t \right\}$

Because x(t) and u(t) can be measured, the first two terms can be taken outside the expectation.

Toward the Stochastic HJB Equation

The disturbance is assumed to be zero-mean white noise:

$E[w(t)] = 0, \quad E[f(\cdot)\,w^T(\tau)] = 0$

$E[w(t)\,w^T(\tau)] = W(t)\,\delta(t - \tau), \quad E[w(t)\,w^T(\tau)]\,\Delta t \xrightarrow{\ \Delta t \to 0\ } W(t)$

$\dfrac{dV^*}{dt} = \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}f(\cdot) + \dfrac{1}{2}\lim_{\Delta t \to 0} \mathrm{Tr}\left\{ \dfrac{\partial^2 V^*}{\partial x^2}\left[ E[f(\cdot)f^T(\cdot)]\Delta t + L\,E[w(t)w^T(\tau)]\,L^T \right]\Delta t \right\}$

$= \dfrac{\partial V^*}{\partial t}(t) + \dfrac{\partial V^*}{\partial x}(t)\,f(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t) \right]$

The uncertain disturbance input can only increase the value function's rate of change.


Stochastic Principle of Optimality (Perfect Measurements)

• Substitute for the total derivative, $dV^*/dt = -L(x^*, u^*)$
• Solve for the partial derivative, $\partial V^*/\partial t$
• Expected value of HJB equation → stochastic HJB equation

With

$\dfrac{dV^*}{dt} = \dfrac{\partial V^*}{\partial t}(t) + \dfrac{\partial V^*}{\partial x}(t)\,f(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t) \right]$

$\dfrac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ -\dfrac{dV^*}{dt} + \dfrac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t) \right] \right\}$

$= -\min_u E\left\{ L[x^*(t), u(t), t] + \dfrac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t) \right] \right\}$

Boundary (terminal) condition: $V^*(t_f) = E[\phi(t_f)]$

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

$\dfrac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ L[x^*(t), u(t), t] + \dfrac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[ \dfrac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t) \right] \right\}$

• Control has no effect on the disturbance input
• Criterion for optimality is the same as for the deterministic case
• Disturbance uncertainty increases the magnitude of the total optimal value function, E[V*(0)]


Adaptive Critic Controller


Adaptive Critic Controller

• Nonlinear control law, c, takes the general form

$u(t) = c[x(t), a, y^*(t)]$

where x(t) is the state, a is a set of parameters defining an operating point, and y*(t) is the command input.

• On-line adaptive critic controller
  – Nonlinear control law ("action network")
  – "Criticizes" non-optimal performance via "critic network"
• Adapts control gains to improve performance, respond to failures, and accommodate parameter variation
• Adapts cost model to improve performance evaluation


Overview of Adaptive Critic Designs

[Figure: Ferrari, Stengel, "Model-Based Adaptive Critic Designs," in Learning and Approximate Dynamic Programming, Si et al., eds., 2004]

Adaptive Critic Iteration Cycle

[Figure]

Modular Description of Typical Iteration Cycle

[Figure]

Gain Scheduled Controller

• Design PI-LQ controllers (LQR with integral compensation) that satisfy requirements at N operating points
• Scheduling variable, a, identifies each operating point

$u(t_k) = C_F(a_i)\,y^*(t_k) + C_B(a_i)\,\Delta x(t_k) + C_I(a_i)\int_{t_{k-1}}^{t_k} \Delta y(\tau)\,d\tau \approx c[x(t_k), a_i, y^*(t_k)], \quad i = 1, \ldots, N$
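A minimal sketch of one gain-scheduled control update; the two-state plant dimensions, the gain tables, and the nearest-point scheduling rule are illustrative assumptions (the slides do not specify the interpolation scheme).

```python
import numpy as np

# Sketch of one gain-scheduled PI-LQ control update,
# u_k = CF(a_i) y*_k + CB(a_i) dx_k + CI(a_i) * integral(dy).
# Gain values and the scheduling rule are illustrative placeholders.
gains = {  # operating point a_i -> (CF, CB, CI), designed off-line
    0.0: (np.eye(2) * 1.0, np.eye(2) * 2.0, np.eye(2) * 0.5),
    1.0: (np.eye(2) * 1.5, np.eye(2) * 2.5, np.eye(2) * 0.8),
}

def control(a, y_star, dx, dy_integral):
    # Select the nearest scheduled operating point (simple scheduling rule)
    a_i = min(gains, key=lambda g: abs(g - a))
    CF, CB, CI = gains[a_i]
    return CF @ y_star + CB @ dx + CI @ dy_integral

u = control(0.3, np.array([1.0, 0.0]), np.zeros(2), np.zeros(2))
print(u)
```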


Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$u(t_k) = NN_F[y^*(t_k), a(t_k)] + NN_B[x(t_k), a(t_k)] + NN_I\left[ \int \Delta y(\tau)\,d\tau,\ a(t_k) \right] = c[x(t_k), a, y^*(t_k)]$
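A minimal sketch of that structure, assuming one-hidden-layer tanh networks with random placeholder weights rather than the algebraically trained values discussed on the next slides.

```python
import numpy as np

# Sketch: replace each gain matrix with a one-hidden-layer sigmoidal
# network. Weights here are random placeholders, not trained values.
rng = np.random.default_rng(3)

def make_nn(n_in, n_hidden, n_out):
    W1, b1 = rng.standard_normal((n_hidden, n_in)), rng.standard_normal(n_hidden)
    W2 = rng.standard_normal((n_out, n_hidden))
    return lambda r: W2 @ np.tanh(W1 @ r + b1)

n_x, n_u = 2, 2
NN_F, NN_B, NN_I = (make_nn(n_x + 1, 4, n_u) for _ in range(3))

def control(x, a, y_star, dy_integral):
    # u = NN_F[y*, a] + NN_B[x, a] + NN_I[int(dy), a]
    return (NN_F(np.append(y_star, a)) + NN_B(np.append(x, a))
            + NN_I(np.append(dy_integral, a)))

print(control(np.zeros(2), 0.5, np.array([1.0, 0.0]), np.zeros(2)))
```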

Sigmoidal Neuron Input-Output Characteristic

Logistic sigmoid function:

$u = s(r) = \dfrac{1}{1 + e^{-r}}$  or  $u = s(r) = \dfrac{e^r - 1}{e^r + 1} = \tanh\dfrac{r}{2}$

Sigmoid with two inputs, one output:

$u = s(r) = \dfrac{1}{1 + e^{-(w_1 r_1 + w_2 r_2 + b)}}$
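Both forms in code, with a numerical check that $(e^r - 1)/(e^r + 1) = \tanh(r/2)$; the weights and inputs are illustrative.

```python
import numpy as np

# The two sigmoid forms from the slide.
def logistic(r):
    return 1.0 / (1.0 + np.exp(-r))

def bipolar(r):
    return (np.exp(r) - 1.0) / (np.exp(r) + 1.0)

r = np.linspace(-4, 4, 9)
print(np.allclose(bipolar(r), np.tanh(r / 2)))   # True

# Two-input neuron with weights w1, w2 and bias b (illustrative values)
w1, w2, b = 0.8, -0.5, 0.1
r1, r2 = 1.0, 2.0
print(logistic(w1 * r1 + w2 * r2 + b))
```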


Algebraically Trained Neural Control Law

• Algebraic training of neural networks produces an exact fit of the PI-LQ control gains and trim conditions at N operating points
  – Interpolation and gain scheduling via neural networks
  – One node per operating point in each neural network
  – See Ferrari, Stengel, JGCD, 2002 [on Blackboard]

On-line Optimization of Adaptive Critic Neural Network Controller

The critic adapts the neural network weights to improve performance, using approximate dynamic programming. (Ferrari, Stengel, JGCD, 2004)


Heuristic Dynamic Programming Adaptive Critic

• Dual heuristic dynamic programming adaptive critic for the receding-horizon optimization problem:

$V[x_a(t_k)] = L[x_a(t_k), u(t_k)] + V[x_a(t_{k+1})]$

Optimality condition, with the value function evaluated on the 2nd time step:

$\dfrac{\partial V}{\partial u} = \dfrac{\partial L}{\partial u} + \dfrac{\partial V}{\partial x_a}\dfrac{\partial x_a}{\partial u} = 0$

Critic network approximates the cost gradient:

$\dfrac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C[x_a(t), a(t)]$

• Critic and Action (i.e., Control) networks adapted concurrently
• LQ-PI cost function applied to nonlinear problem
• Modified resilient backpropagation for neural network training

(Ferrari, Stengel, JGCD, 2004)
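A minimal heuristic-dynamic-programming sketch of the recursion $V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]$, with scalar quadratic approximators standing in for the neural networks; the plant, cost weights, and learning rates are illustrative assumptions, not the aircraft problem of the paper.

```python
import numpy as np

# Minimal HDP sketch: critic V(x) ~ wc x^2, action u(x) ~ wa x,
# adapted concurrently on the recursion V[x_k] = L[x_k,u_k] + V[x_{k+1}].
def f(x, u):            # example linear plant, x_{k+1} = 0.9 x + 0.1 u
    return 0.9 * x + 0.1 * u

def L(x, u):            # LQ utility (stage cost)
    return 0.5 * (x**2 + 0.1 * u**2)

rng = np.random.default_rng(4)
wc, wa = 1.0, 0.0
for it in range(5000):
    x = rng.uniform(-1, 1)
    u = wa * x
    x1 = f(x, u)
    # Critic update: move V(x) toward the target L(x,u) + V(x1)
    target = L(x, u) + wc * x1**2
    wc += 0.05 * (target - wc * x**2) * x**2
    # Action update: descend the gradient of [L + V(x1)] with respect to u
    dJ_du = 0.1 * u + 2.0 * wc * x1 * 0.1
    wa -= 0.05 * dJ_du * x
print("critic weight:", wc, "action gain:", wa)
```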

Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: $x_a(t)$ and $a(t)$ drive the action network $NN_A$ and an aircraft model (transition matrices, state prediction); utility function derivatives and the critic $NN_C$ enter the optimality condition, whose target generation produces the $NN_A$ training target.]


Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Block diagram: $x_a(t)$ and $a(t)$ drive $NN_A$, the aircraft model (transition matrices, state prediction), and the previous critic $NN_C(\mathrm{old})$; utility function derivatives and the target cost gradient feed target generation for the updated $NN_C$.]

Adaptive Critic Flight Simulation

[Simulation results: 70-deg bank angle command; multiple failures (thrust, stabilator, rudder). Ferrari, Stengel, JGCD, 2004]


Stochastic Linear-Quadratic Optimal Control


Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

$V(t_o) = E\left\{ \phi[x(t_f)] - \int_{t_f}^{t_o} L[x(\tau), u(\tau)]\,d\tau \right\}$

$= \dfrac{1}{2}E\left\{ x^T(t_f)S(t_f)x(t_f) - \int_{t_f}^{t_o} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}$

Linear dynamic constraint:

$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$


Components of the LQ Value Function

The quadratic value function has two parts:

$V(t) = \dfrac{1}{2}x^T(t)S(t)x(t) + v(t)$

Certainty-equivalent value function:

$V_{CE}(t) \triangleq \dfrac{1}{2}x^T(t)S(t)x(t)$

Stochastic value function increment:

$v(t) = \dfrac{1}{2}\int_t^{t_f} \mathrm{Tr}\left[ S(\tau)L(\tau)W(\tau)L^T(\tau) \right] d\tau$

Value Function Gradient and Hessian

Certainty-equivalent value function:

$V_{CE}(t) \triangleq \dfrac{1}{2}x^T(t)S(t)x(t)$

Gradient with respect to the state:

$\dfrac{\partial V}{\partial x}(t) = x^T(t)S(t)$

Hessian with respect to the state:

$\dfrac{\partial^2 V}{\partial x^2}(t) = S(t)$


Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

$\dfrac{\partial V^*}{\partial t} = -\min_u \dfrac{1}{2} E\left[ \left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + 2x^{*T}S(Fx^* + Gu) + \mathrm{Tr}(SLWL^T) \right]$

$= -\min_u \dfrac{1}{2}\left[ \left( x^{*T}Qx^* + 2x^{*T}Mu + u^TRu \right) + 2x^{*T}S(Fx^* + Gu) + \mathrm{Tr}(SLWL^T) \right]$

Terminal condition:

$V(t_f) = \dfrac{1}{2}x^T(t_f)S(t_f)x(t_f)$

Optimal Control Law (Perfect Measurements)

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

$\dfrac{\partial(\partial V/\partial t)}{\partial u} = 0 = \left( x^TM + u^TR \right) + x^TSG$

Solve for u, obtaining the feedback control law:

$u(t) = -R^{-1}(t)\left[ G^T(t)S(t) + M^T(t) \right]x(t) \triangleq -C(t)x(t)$


LQ Optimal Control Law (Perfect Measurements)

$u(t) = -R^{-1}(t)\left[ G^T(t)S(t) + M^T(t) \right]x(t) \triangleq -C(t)x(t)$

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation for Control

Substitute the optimal control law, $u(t) = -R^{-1}(t)[G^T(t)S(t) + M^T(t)]x(t)$, into the HJB equation:

$\dfrac{1}{2}x^T\dot{S}x + \dot{v} = \dfrac{1}{2}x^T\left[ \left( -Q + MR^{-1}M^T \right) - \left( F - GR^{-1}M^T \right)^T S - S\left( F - GR^{-1}M^T \right) + SGR^{-1}G^TS \right]x + \dfrac{1}{2}\mathrm{Tr}(SLWL^T)$

The matrix Riccati equation provides S(t):

$\dot{S}(t) = -\left[ Q(t) - M(t)R^{-1}(t)M^T(t) \right] - \left[ F(t) - G(t)R^{-1}(t)M^T(t) \right]^T S(t) - S(t)\left[ F(t) - G(t)R^{-1}(t)M^T(t) \right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \quad S(t_f) = \phi_{xx}(t_f)$

• The stochastic value function increases the cost due to the disturbance:

$\dot{v} = \dfrac{1}{2}\mathrm{Tr}(SLWL^T)$

• However, its calculation is independent of the Riccati equation.
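A sketch of the backward Riccati sweep, the resulting gain $C = R^{-1}(G^TS + M^T)$, and the accumulated stochastic increment; the system matrices are illustrative constants, and simple Euler steps stand in for a real ODE solver.

```python
import numpy as np

# Sketch: integrate the control Riccati equation backward from S(tf),
# form the LQ gain, and accumulate v(t0) = 1/2 int Tr(S L W L^T) dt.
# All matrices below are illustrative assumptions.
n, m, dt, tf = 2, 1, 0.001, 5.0
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
Q, R = np.diag([1.0, 0.1]), np.array([[0.1]])
M = np.zeros((n, m))                      # no cross weighting here
L, W = np.array([[0.0], [1.0]]), np.array([[0.01]])

S = np.eye(n)                             # terminal condition S(tf)
v = 0.0                                   # stochastic value increment
Ri = np.linalg.inv(R)
for k in range(int(tf / dt)):             # backward sweep
    Fp = F - G @ Ri @ M.T
    Sdot = -(Q - M @ Ri @ M.T) - Fp.T @ S - S @ Fp + S @ G @ Ri @ G.T @ S
    S = S - Sdot * dt                     # step backward in time
    v += 0.5 * np.trace(S @ L @ W @ L.T) * dt
C = Ri @ (G.T @ S + M.T)
print("gain C(t0):", C, "\nstochastic increment v(t0):", v)
```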


Evaluation of the Total Cost (Imperfect Measurements)

• Stochastic quadratic cost function, neglecting cross terms:

$J = \dfrac{1}{2}\mathrm{Tr}\left( E\left[ x^T(t_f)S(t_f)x(t_f) \right] + E\int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right)$

$= \dfrac{1}{2}\mathrm{Tr}\left( S(t_f)E\left[ x(t_f)x^T(t_f) \right] + \int_{t_o}^{t_f} \left\{ Q(t)E\left[ x(t)x^T(t) \right] + R(t)E\left[ u(t)u^T(t) \right] \right\} dt \right)$

or

$J = \dfrac{1}{2}\mathrm{Tr}\left( S(t_f)P(t_f) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) \right] dt \right)$

where

$P(t) \triangleq E\left[ x(t)x^T(t) \right], \quad U(t) \triangleq E\left[ u(t)u^T(t) \right]$

Optimal Control Covariance

Optimal control vector:

$u(t) = -C(t)\,\hat{x}(t)$

Optimal control covariance:

$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)\,P(t)\,S(t)G(t)R^{-1}(t)$


Express Cost with State and Adjoint Covariance Dynamics

Integration by parts expresses $S(t_f)P(t_f)$:

$\left. S(t)P(t) \right|_{t_o}^{t_f} = \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$

$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt$

Rewrite the cost function to incorporate the initial cost:

$J = \dfrac{1}{2}\mathrm{Tr}\left( S(t_o)P(t_o) + \int_{t_o}^{t_f} \left[ Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t) \right] dt \right)$

Evolution of State and Adjoint Covariance Matrices (No Control)

$u(t) = 0; \quad U(t) = 0$

State covariance response to random disturbance and initial uncertainty:

$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \quad P(t_o)\ \text{given}$

Adjoint covariance response to state weighting and terminal cost weight:

$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \quad S(t_f)\ \text{given}$
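A sketch of the two propagations for the uncontrolled case, using the same illustrative matrices as the Riccati sketch above: P(t) runs forward from P(t_o), S(t) runs backward from S(t_f).

```python
import numpy as np

# Sketch: forward state-covariance and backward adjoint propagation
# for the uncontrolled system; matrices are illustrative assumptions.
n, dt, tf = 2, 0.001, 5.0
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
L, W = np.array([[0.0], [1.0]]), np.array([[0.01]])
Q = np.diag([1.0, 0.1])

P = 0.1 * np.eye(n)                       # P(t0): initial state covariance
for k in range(int(tf / dt)):             # forward in time
    P = P + (F @ P + P @ F.T + L @ W @ L.T) * dt

S = np.eye(n)                             # S(tf): terminal cost weight
for k in range(int(tf / dt)):             # backward in time
    S = S - (-F.T @ S - S @ F - Q) * dt
print("P(tf):\n", P, "\nS(t0):\n", S)
```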


Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to random disturbance and initial uncertainty (dependent on S(t) through the gain C):

$\dot{P}(t) = \left[ F(t) - G(t)C(t) \right]P(t) + P(t)\left[ F(t) - G(t)C(t) \right]^T + L(t)W(t)L^T(t)$

Adjoint covariance response to state weighting and terminal cost weight (independent of P(t)); this is the Riccati equation:

$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t)$

Total Cost With and Without Control

With no control,

$J_{no\ control} = \dfrac{1}{2}\mathrm{Tr}\left[ S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt \right]$

With optimal control, the equation for the cost is the same,

$J_{optimal\ control} = \dfrac{1}{2}\mathrm{Tr}\left[ S(t_o)P(t_o) + \int_{t_o}^{t_f} S(t)L(t)W(t)L^T(t)\,dt \right]$

... but the evolutions of S(t), and hence S(t_o), are different in each case.
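A sketch comparing the two costs: the S(t) sweep uses the Lyapunov equation without control and the Riccati equation with optimal control, while the disturbance integral accumulates along the way (same illustrative system as above).

```python
import numpy as np

# Sketch: compare J = 1/2 Tr[S(t0)P(t0) + int S L W L^T dt] with and
# without control; all matrices are illustrative assumptions.
n, dt, tf = 2, 0.001, 5.0
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
Q, R = np.diag([1.0, 0.1]), np.array([[0.1]])
L, W = np.array([[0.0], [1.0]]), np.array([[0.01]])
P0, Sf = 0.1 * np.eye(n), np.eye(n)

def total_cost(riccati):
    S, integral = Sf.copy(), 0.0
    for k in range(int(tf / dt)):         # backward sweep for S(t)
        Sdot = -F.T @ S - S @ F - Q
        if riccati:                       # optimal control: Riccati term
            Sdot += S @ G @ np.linalg.inv(R) @ G.T @ S
        S = S - Sdot * dt
        integral += np.trace(S @ L @ W @ L.T) * dt
    return 0.5 * (np.trace(S @ P0) + integral)

print("J (no control):     ", total_cost(False))
print("J (optimal control):", total_cost(True))
```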


Information Sets and Expected Cost


The Information Set, I

• Sigma algebra (Wikipedia definitions)
  – The collection of sets over which a measure is defined
  – The collection of events that can be assigned probabilities
  – A measurable space
• Information available at current time, $t_1$
  – All measurements from initial time, $t_o$
  – All control commands from initial time

$I[t_o, t_1] = \left\{ z[t_o, t_1],\ u[t_o, t_1] \right\}$

• Plus available model structure, parameters, and statistics:

$I[t_o, t_1] = \left\{ z[t_o, t_1],\ u[t_o, t_1],\ f(\bullet),\ Q,\ R,\ \ldots \right\}$


A Derived Information Set, I_D

• Measurements may be directly useful, e.g.,
  – Displays
  – Simple feedback control
• ... or they may require processing, e.g.,
  – Transformation
  – Estimation
• Example of a derived information set
  – History of mean and covariance from a state estimator:

$I_D[t_o, t_1] = \left\{ \hat{x}[t_o, t_1],\ P[t_o, t_1],\ u[t_o, t_1] \right\}$

Additional Derived Information Sets

• Markov derived information set
  – Most current mean and covariance from a state estimator:

$I_{MD}(t_1) = \left\{ \hat{x}(t_1),\ P(t_1),\ u(t_1) \right\}$

• Multiple-model derived information set
  – Parallel estimates of current mean, covariance, and hypothesis probability mass function:

$I_{MM}(t_1) = \left\{ \left[ \hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A) \right],\ \left[ \hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B) \right],\ \ldots \right\}$


Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time
  – Hence, it requires the entire information set, extending from $t_o$ to $t_f$:

$I[t_o, t_f]$

• Separate the information set into knowable and future sets:

$I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]$

• Knowable set has been received
• Future set is based on current knowledge of the model, current estimates, and statistics of future uncertainties

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

$E[x(t)\,|\,I_D] = \hat{x}(t)$

$E\left\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T\,|\,I_D \right\} = P(t)$

... where the conditional expected values are estimates from an optimal filter.


Dependence of the Stochastic Cost Function on the Information Set

$J = \dfrac{1}{2}E\left\{ E\left[ \mathrm{Tr}\left( S(t_f)x(t_f)x^T(t_f) \right)|\,I_D \right] + \int_{t_o}^{t_f} E\left\{ \mathrm{Tr}\left[ Qx(t)x^T(t) \right] \right\} dt + \int_{t_o}^{t_f} E\left\{ \mathrm{Tr}\left[ Ru(t)u^T(t) \right] \right\} dt \right\}$

Expand the state covariance:

$P(t) = E\left\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T\,|\,I_D \right\} = E\left\{ \left[ x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t) \right]|\,I_D \right\}$

$E\left\{ x(t)\hat{x}^T(t)\,|\,I_D \right\} = E\left\{ \hat{x}(t)x^T(t)\,|\,I_D \right\} = \hat{x}(t)\hat{x}^T(t)$

$P(t) = E\left\{ x(t)x^T(t)\,|\,I_D \right\} - \hat{x}(t)\hat{x}^T(t)$

or

$E\left\{ x(t)x^T(t)\,|\,I_D \right\} = P(t) + \hat{x}(t)\hat{x}^T(t)$

... where the conditional expected values are obtained from an optimal filter.
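A quick Monte Carlo check of $E\{xx^T | I_D\} = P + \hat{x}\hat{x}^T$, using an illustrative Gaussian estimate.

```python
import numpy as np

# Monte Carlo check of E[x x^T | I_D] = P + xhat xhat^T.
# The estimate xhat and covariance P are illustrative values.
rng = np.random.default_rng(2)
xhat = np.array([1.0, -2.0])
P = np.array([[0.5, 0.1], [0.1, 0.3]])
xs = rng.multivariate_normal(xhat, P, size=200_000)
second_moment = xs.T @ xs / len(xs)       # sample E[x x^T]
print(np.allclose(second_moment, P + np.outer(xhat, xhat), atol=1e-2))
```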

Certainty-Equivalent and Stochastic Incremental Costs

$J = \dfrac{1}{2}E\left\{ \mathrm{Tr}\left[ S(t_f)\left( P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f) \right) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\left( P(t) + \hat{x}(t)\hat{x}^T(t) \right) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Ru(t)u^T(t) \right] dt \right\} \triangleq J_{CE} + J_S$

• The cost function has two parts
  – Certainty-equivalent cost
  – Stochastic increment cost

$J_{CE} = \dfrac{1}{2}E\left\{ \mathrm{Tr}\left[ S(t_f)\hat{x}(t_f)\hat{x}^T(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Q\hat{x}(t)\hat{x}^T(t) \right] dt + \int_{t_o}^{t_f} \mathrm{Tr}\left[ Ru(t)u^T(t) \right] dt \right\}$

$J_S = \dfrac{1}{2}E\left\{ \mathrm{Tr}\left[ S(t_f)P(t_f) \right] + \int_{t_o}^{t_f} \mathrm{Tr}\left[ QP(t) \right] dt \right\}$


Expected Cost of the Trajectory

Optimized cost function:

$V^*(t_o) \triangleq J^* = E\left\{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau \right\}$

Law of total expectation:

$E(\xi) = E\left( \xi\,|\,I[t_o, t_1] \right)\Pr\left\{ I[t_o, t_1] \right\} + E\left( \xi\,|\,I[t_1, t_f] \right)\Pr\left\{ I[t_1, t_f] \right\} = E\left[ E(\xi\,|\,I) \right]$

Because the past is established at $t_1$:

$E(J^*) = E\left( J^*\,|\,I[t_o, t_1] \right)[1] + E\left( J^*\,|\,I[t_1, t_f] \right)\Pr\left\{ I[t_1, t_f] \right\}$

$= E\left( J^*\,|\,I[t_o, t_1] \right) + E\left( J^*\,|\,I[t_1, t_f] \right)\Pr\left\{ I[t_1, t_f] \right\}$

Expected Cost of the Trajectory

For planning and real-time control, $t_1 \le t_f$:

$E(J^*) = \underbrace{E\left( J^*\,|\,I(t_o, t_1) \right)}_{\text{known}} + \underbrace{E\left[ J^*\,|\,I(t_1, t_f) \right]\Pr\left\{ I(t_1, t_f) \right\}}_{\text{estimated}}$

For post-trajectory analysis (with perfect measurements):

$E(J^*) = E\left[ J^*\,|\,I(t_o, t_f) \right]\Pr\left\{ I(t_o, t_f) \right\} = E\left( J^*\,|\,I(t_o, t_f) \right)[1]$


Dual Control (Fel'dbaum, 1965)

• Nonlinear system
  – Uncertain system parameters to be estimated
  – Parameter estimation can be aided by test inputs
• Approach: minimize a value function with three increments
  – Nominal control
  – Cautious control
  – Probing control

$\min_u V^* = \min_u \left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)$

Estimation and control calculations are coupled and necessarily recursive.

Next Time: Linear-Quadratic-Gaussian Regulators