
Ch 17. Optimal control theory and the linear Bellman equation

HJ Kappen

BTSM Seminar, 12.07.19 (Thu)

Summarized by Joon Shik Kim

Introduction

• Optimising a sequence of actions to attain some future goal is the general topic of control theory.

• In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost that consists of two terms.

• The first is a path cost that specifies the energy consumption to contract the muscles.

• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.

• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

Discrete Time Control (1/3)

• Consider the discrete-time dynamics

  $x_{t+1} = x_t + f(t, x_t, u_t), \quad t = 0, 1, \ldots, T-1,$

  where $x_t$ is an n-dimensional vector describing the state of the system and $u_t$ is an m-dimensional vector that specifies the control or action at time t.

• A cost function assigns a cost to each sequence of controls:

  $C(x_0, u_{0:T-1}) = \phi(x_T) + \sum_{t=0}^{T-1} R(t, x_t, u_t),$

  where $R(t, x, u)$ is the cost associated with taking action u at time t in state x, and $\phi(x_T)$ is the cost associated with ending up in state $x_T$ at time T. (A minimal cost-evaluation sketch follows below.)
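Since the slides only state the formulas, here is a minimal Python sketch of evaluating this cost by rolling the dynamics forward; the function names (step_f, stage_cost, end_cost) are placeholders introduced for illustration, not from the chapter.

import numpy as np

def rollout_cost(x0, u_seq, step_f, stage_cost, end_cost):
    # Roll x_{t+1} = x_t + f(t, x_t, u_t) forward and accumulate
    # C(x_0, u_{0:T-1}) = phi(x_T) + sum_t R(t, x_t, u_t).
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for t, u in enumerate(u_seq):
        total += stage_cost(t, x, u)   # R(t, x_t, u_t)
        x = x + step_f(t, x, u)        # x_{t+1} = x_t + f(t, x_t, u_t)
    return total + end_cost(x)         # + phi(x_T)

For example, step_f = lambda t, x, u: u together with stage_cost = lambda t, x, u: 0.5 * u ** 2 and end_cost = lambda x: 0.5 * x ** 2 gives a simple single-integrator instance.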

Discrete Time Control (2/3)

• The problem of optimal control is to find the sequence $u_{0:T-1}$ that minimises $C(x_0, u_{0:T-1})$.

• The optimal cost-to-go:

  $J(t, x_t) = \min_{u_{t:T-1}} \left( \phi(x_T) + \sum_{s=t}^{T-1} R(s, x_s, u_s) \right)$

  $\qquad\quad\; = \min_{u_t} \big( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) \big).$

Discrete Time Control (3/3)

• The algorithm to compute the optimal control, trajectory, and cost is given by the following (a minimal Python sketch follows below):

• 1. Initialization: $J(T, x) = \phi(x).$

• 2. Backwards: For t = T-1, ..., 0 and for all x compute

  $u_t^*(x) = \arg\min_u \{ R(t, x, u) + J(t+1, x + f(t, x, u)) \},$

  $J(t, x) = R(t, x, u_t^*) + J(t+1, x + f(t, x, u_t^*)).$

• 3. Forwards: For t = 0, ..., T-1 compute

  $x_{t+1}^* = x_t^* + f(t, x_t^*, u_t^*(x_t^*)).$
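A minimal dynamic-programming sketch of the backward and forward passes above, under illustrative assumptions not made in the chapter: a scalar state discretised on a grid, a finite set of candidate controls, and a crude nearest-grid-point lookup for x + f(t, x, u).

import numpy as np

def dp_control(states, controls, f, R, phi, T):
    # Tabulate J(t, x) and u*(t, x) for x_{t+1} = x_t + f(t, x_t, u_t)
    # on a finite state grid.
    states = np.asarray(states, dtype=float)
    nearest = lambda x: int(np.abs(states - x).argmin())
    J = {T: np.array([phi(x) for x in states])}   # 1. initialization J(T, x) = phi(x)
    policy = {}
    for t in range(T - 1, -1, -1):                # 2. backward pass
        Jt, ut = np.empty(len(states)), np.empty(len(states))
        for i, x in enumerate(states):
            q = [R(t, x, u) + J[t + 1][nearest(x + f(t, x, u))] for u in controls]
            ut[i], Jt[i] = controls[int(np.argmin(q))], np.min(q)
        J[t], policy[t] = Jt, ut
    def forward(x0):                              # 3. forward pass
        traj, x = [x0], x0
        for t in range(T):
            u = policy[t][nearest(x)]
            x = x + f(t, x, u)
            traj.append(x)
        return traj
    return J, policy, forward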

The HJB Equation (1/2)

• In continuous time, the cost-to-go satisfies

  $J(t, x) = \min_u \big( R(t, x, u)\,dt + J(t+dt, x + f(x, u, t)\,dt) \big)$

  $\qquad\;\; \approx \min_u \big( R(t, x, u)\,dt + J(t, x) + \partial_t J(t, x)\,dt + \partial_x J(t, x)\, f(x, u, t)\,dt \big),$

  which yields the Hamilton-Jacobi-Bellman equation

  $-\partial_t J(t, x) = \min_u \big( R(t, x, u) + f(x, u, t)\,\partial_x J(t, x) \big).$

• The optimal control at the current x, t is given by

  $u(x, t) = \arg\min_u \big( R(x, u, t) + f(x, u, t)\,\partial_x J(t, x) \big).$

• The boundary condition is $J(x, T) = \phi(x).$ (A small worked special case follows below.)
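As a small worked instance (an illustration added here, not from the slides): take $f(x, u, t) = u$ and $R(t, x, u) = \tfrac{1}{2}u^2$. Minimising over u in the HJB equation gives

$u(x, t) = -\partial_x J(t, x), \qquad -\partial_t J = \min_u \big( \tfrac{1}{2}u^2 + u\,\partial_x J \big) = -\tfrac{1}{2}\big(\partial_x J\big)^2,$

i.e. $\partial_t J = \tfrac{1}{2}(\partial_x J)^2$ with boundary condition $J(x, T) = \phi(x).$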

The HJB Equation (2/2): Optimal control of a mass on a spring (worked example; figure not reproduced in this transcript)

Stochastic Differential Equations (1/2)

• Consider the random walk on the line

  $x_{t+1} = x_t + \xi_t,$

  with $x_0 = 0$ and $\xi_t$ independent zero-mean increments of variance $\nu$.

• In closed form, $x_t = \sum_{i=1}^{t} \xi_i$, so that $\langle x_t \rangle = 0$ and $\langle x_t^2 \rangle = \nu t.$ (A quick numerical check follows below.)

• In the continuous time limit we define

  $dx = x_{t+dt} - x_t = d\xi$ (Wiener process).

• The conditional probability distribution is

  $\rho(x, t \mid x_0, 0) = \frac{1}{\sqrt{2\pi\nu t}} \exp\left( -\frac{(x - x_0)^2}{2\nu t} \right).$
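A quick numerical check of the scaling above; this is an added illustration, and nu, the horizon, and the number of sample paths are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
nu, T, n_paths = 0.5, 1000, 20000            # variance per step, horizon, samples

# x_t = sum of t independent zero-mean increments with variance nu
xi = rng.normal(0.0, np.sqrt(nu), size=(n_paths, T))
x = np.cumsum(xi, axis=1)

print("mean of x_T:   ", x[:, -1].mean())                       # should be close to 0
print("mean of x_T^2: ", (x[:, -1] ** 2).mean(), " expected:", nu * T)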

Stochastic Optimal Control Theory (2/2)

• Consider the stochastic dynamics

  $dx = f(x(t), u(t), t)\,dt + d\xi,$

  where $d\xi$ is a Wiener process with $\langle d\xi_i\, d\xi_j \rangle = \nu_{ij}(t, x, u)\,dt.$

• Here $\langle dx \rangle = f(x, u, t)\,dt$ is the drift and $\langle dx^2 \rangle = \nu(t, x, u)\,dt$ the diffusion.

• Since $\langle dx^2 \rangle$ is of order dt, we must make a Taylor expansion of J up to order $dx^2$. This yields the stochastic Hamilton-Jacobi-Bellman equation

  $-\partial_t J(t, x) = \min_u \left( R(t, x, u) + f(x, u, t)\,\partial_x J(x, t) + \frac{1}{2}\,\nu(t, x, u)\,\partial_x^2 J(x, t) \right).$

Path Integral Control (1/2)

• For the class of problems in which the control acts linearly on the dynamics and the control cost is quadratic, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go,

  $J(x, t) = -\lambda \log \psi(x, t).$

• The HJB equation becomes

  $\partial_t \psi(x, t) = \left( \frac{V}{\lambda} - f^T \partial_x - \frac{1}{2}\,\mathrm{Tr}\left( g g^T \partial_x^2 \right) \right) \psi(x, t).$

  (A one-dimensional derivation sketch follows below.)
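A hedged one-dimensional sketch of why the log transform linearises the equation, assuming (as a simplification of Kappen's setting, with g = 1 so that $\nu$ plays the role of $g g^T$) dynamics $dx = (f + u)\,dt + d\xi$ with $\langle d\xi^2 \rangle = \nu\,dt$ and control cost $\frac{\lambda}{2\nu} u^2$:

$-\partial_t J = \min_u \Big( V + \tfrac{\lambda}{2\nu} u^2 + (f + u)\,\partial_x J + \tfrac{\nu}{2}\,\partial_x^2 J \Big), \qquad u^* = -\tfrac{\nu}{\lambda}\,\partial_x J,$

$-\partial_t J = V + f\,\partial_x J - \tfrac{\nu}{2\lambda}\,(\partial_x J)^2 + \tfrac{\nu}{2}\,\partial_x^2 J.$

Substituting $J = -\lambda \log \psi$, the terms quadratic in $\partial_x \psi$ cancel and one is left with the linear equation $\partial_t \psi = \big( V/\lambda - f\,\partial_x - \tfrac{\nu}{2}\,\partial_x^2 \big)\psi.$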

Path Integral Control (2/2)

• Let $\rho(y, \tau \mid x, t)$ describe a diffusion process for $\tau > t$, defined by the Fokker-Planck equation

  $\partial_\tau \rho = -\frac{V}{\lambda}\,\rho - \partial_y^T (f \rho) + \frac{1}{2}\,\mathrm{Tr}\left( \partial_y^2 \left( g g^T \rho \right) \right).$

• Then

  $\psi(x, t) = \int dy\, \rho(y, T \mid x, t)\, \exp(-\phi(y)/\lambda).$  (1)

The Diffusion Process as a Path Integral (1/2)

• Consider the first term of the Fokker-Planck equation on the previous slide. It describes a process that kills a sample trajectory at a rate V(x, t) dt / λ.

• Sampling process and Monte Carlo estimate (a runnable sketch follows below): simulate

  $dx = f(x, t)\,dt + g(x, t)\,d\xi,$

  and set $x \to x + dx$ with probability $1 - V(x, t)\,dt/\lambda$, or kill the path ($x \to \dagger$) with probability $V(x, t)\,dt/\lambda$. Then

  $\psi(x, t) = \int dy\, \rho(y, T \mid x, t)\, \exp(-\phi(y)/\lambda) \approx \frac{1}{N} \sum_{i \in \mathrm{alive}} \exp(-\phi(x_i(T))/\lambda).$
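A minimal Monte Carlo sketch of this killed-diffusion estimator of ψ(x, t); the choices of f, g, V, phi and the numerical parameters below are hypothetical, for illustration only.

import numpy as np

def estimate_psi(x0, t, T, f, g, V, phi, lam, dt=0.01, n_paths=5000, seed=0):
    # Estimate psi(x0, t) as the average of exp(-phi(x_T)/lam) over
    # diffusion paths that survive killing at rate V(x, s) dt / lam.
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)
    s = t
    while s < T:
        # kill each surviving path with probability V(x, s) dt / lam
        kill = rng.random(n_paths) < V(x, s) * dt / lam
        alive &= ~kill
        # Euler-Maruyama step dx = f dt + g dxi for the survivors
        dxi = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = np.where(alive, x + f(x, s) * dt + g(x, s) * dxi, x)
        s += dt
    # killed paths contribute zero; divide by the total number of paths N
    return np.exp(-phi(x[alive]) / lam).sum() / n_paths

psi = estimate_psi(0.0, 0.0, 1.0,
                   f=lambda x, s: 0.0 * x, g=lambda x, s: 1.0 + 0.0 * x,
                   V=lambda x, s: 0.5 * x ** 2, phi=lambda x: 0.5 * x ** 2,
                   lam=1.0)
print(psi)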

The Diffusion Process as a Path Integral (2/2)

• The resulting path distribution is

  $p(x(t \to T) \mid x, t) = \frac{1}{\psi(x, t)}\, \exp\left( -\frac{S(x(t \to T))}{\lambda} \right),$

  where ψ is a partition function, J is a free energy, S is the energy of a path, and λ the temperature.

Discussion

• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other, to maximise a common reward function.

• The path integral method has great potential for application in robotics.