Reinforcement Learning in Control Theory

36
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie

Transcript of Reinforcement Learning in Control Theory

Page 1: Reinforcement Learning in Control Theory

Reinforcement Learning in

Control TheoryFarnaz Adib Yaghmaie

Page 2: Reinforcement Learning in Control Theory

1 Introduction to RL

2 RL for Continuous-time Systems

3 Simulation results

4 Open Problems

Page 3: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 2 / 34

Introduction to RL

Page 4: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 3 / 34

Machine Learning

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

Finding suitable actions to take in a given

situation in order to maximize a reward 1.

1Richard S Sutton & Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT

press Cambridge, 1998.

Page 5: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 4 / 34

Beating Atari champions2

2Mnih et al. Playing Atari with Deep Reinforcement Learning, arXiv preprint, 2013.

Page 6: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 5 / 34

Make a Robot Walk 3

3Schulman et al. Trust Region Policy Optimization, arXiv preprint, 2017.

Page 7: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 6 / 34

A graphical representation

Photo Credit: @en.wikipedia.org/wiki/Reinforcement_learning

Page 8: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 7 / 34

Optimality or adaptivity or both?

Optimal Design

• Offline• Model-base

Adaptive Design

• Online• Non-optimal• Model-free

Reinforcement Learning: Optimality +Adaptivity 4.

4Lewis et al. Reinforcement learning and feedback control, IEEE Control Systems Magazine,

2012.

Page 9: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 8 / 34

Markov Decision Process (MDP)

• States

• Actions

• Transition

probability

• Reward

• Discount factor

Photo Credit: @ https://en.wikipedia.org/wiki/Markov_decision_process

Page 10: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 9 / 34

RL for Continuous-timeSystems

Page 11: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 10 / 34

Optimal control problem

• Agent

u = k(x).• Environment

x = f(x) + g(x)u = F (x, u).• Average value function

Va(x(t), u(t)) = limT →∞

1T

∫ t+T

tr(x(τ), u(τ))dτ.

Optimal average cost

λ∗.

Page 12: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 11 / 34

Optimal control problem

Optimize the value function

Va(x(t), u(t)) = limT →∞

1T

∫ t+T

tr(x(τ), u(τ))dτ.

with respect to

x = f(x) + g(x)u = F (x, u).

Page 13: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 12 / 34

Do I sell optimal control problem again with the fancy

name RL???

Page 14: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 13 / 34

Why to use RL at all?

• No analytical solution in general

• Difficult to solve

• Computation is offline

• Exact knowledge of dynamics is required

Page 15: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 14 / 34

The key to RL

Bellman Principle ofOptimality

&Finding Fixed point equations!

λ∗ = minu

[r(x(t), u(t)) + ∇V ∗T∞ F (x, u)].

Page 16: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 15 / 34

Integral Reinforcement Learning (IRL)

Bellman in integral form 5 6

V ∗(x(t)) = minu

∫ t+T

tr(x(τ), u(τ)) − λ∗dτ + V ∗(x(t + T ))).

No dynamics is involved.

5Vrabie et al. Adaptive optimal control for continuous-time linear systems based on policy

iteration, Automatica, 20096Adib Yaghmaie et al. Reinforcement Learning for Average Cost Optimal Control Problem with

an Application in Tracking, TAC, Under revision, 2018.

Page 17: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 16 / 34

Methods to solve Bellman equation

• Monte-Carlo

• Temporal Difference Learning

• Least squares Temporal Difference Learning

• Their variations/combinations

Page 18: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 17 / 34

Algorithm 1 Policy Iteration Algoritm

1: Initialize: u(0), k = 0.2: repeat

3: Step 1: Policy evaluation

V (k)∞ (x(t)) =

∫ t+∆T

tQ(x) + u(k)T Ru(k)dτ

+ V (k)∞ (x(t + ∆T )) − λ(k)∆T.

4: Step 2: Policy improvement

u(k+1) = −12R−1gT ∇V (k)

∞ .

5: until Convergence

Page 19: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 18 / 34

Simulation results

Page 20: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 19 / 34

Tracking problem

• System

xs = fs(xs) + gs(xs)u.

• Reference

xr = fr(xr).

• Design u such that

xs − xr → 0.

Page 21: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 20 / 34

Solution to tracking problem

Theorem 3.1 If a feedback controller ufb stabilizes

the system and

fr(xr) = fs(xr) + gs(xr)uff (xr),

then the following controller solves the tracking

problem

u(xr, xr) = uff (xr) + ufb(xs − xr).

Page 22: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 21 / 34

System Model

System 6

xs =[

0 1−5 −0.5

]xs +

[01

]u.

Reference

xr =[

0 1−5 0

]xr, xr0 = [0, 0.5

√5]T .

Tracking problem

limt→∞

e = limt→∞

(xs − xr) → 0.

6Adib Yaghmaie et al. Reinforcement Learning for Average Cost Optimal Control Problem with

an Application in Tracking, TAC, Under revision, 2018.

Page 23: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 22 / 34

Define x = [eT , xTr ]T

x =

0 1 0 0

−5 −0.5 0 −0.50 0 0 10 0 −5 0

x +

0100

u.

Running cost

r(x, u) = xT

100 0 0 00 100 0 00 0 0 00 0 0 0

x + u2.

Page 24: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 23 / 34

Analytical solution

u = kfbe + kff xr.

Page 25: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 24 / 34

Offline solution-Feedback gain

ARE([

0 1−5 −0.5

],

[01

],

[100 00 100

], 1)

Feedback controller

ufb(e) = kfbe =[−6.18 −10.11

]e.

Page 26: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 25 / 34

Offline solution-Feedforward gain

Tracking condition

fr(xr) = fs(xr) + gs(xr)uff (xr).

Feedforward controller

uff (xr) = kff xr =[0 0.5

]xr.

Page 27: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 26 / 34

Offline solution-Tracking controller

u =[−6.18 −10.11

]e +

[0 0.5

]xr

Page 28: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 27 / 34

RL framework

Optimize the value function

Va(x(t), u(t)) = limT →∞

1T

∫ t+T

teT Qe + u2dτ.

with respect to

[exr

]=

0 1 0 0

−5 −0.5 0 −0.50 0 0 10 0 −5 0

[

exr

]+

0100

u

which is unknown.

Page 29: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 28 / 34

RL framework

Value function V ∗∞ = wT Φ(x)

Φ(x) = [x21, x1x2, x1x3, x1x4, x2

2, x2x3, x2x4,

x23, x3x4, x2

4].

Optimal controller

u∗ = −12R−1gT ∇Φw.

Page 30: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 29 / 34

RL solution and comparison with offline method

Analytical

w∗ = [116.14 12.36 # #10.11 # # #

# #]

K∗ = [-6.18 -10.11 0 0.5]

λ∗ = 0.1563

RL Algorithm

w(9) = [116.14 12.36 -4.51 -0.06

10.11 0.034 -0.96 0

-0.12 0]

K(9) = [-6.18 -10.11 -0.01 0.48]

λ(9) = 0.1559

Page 31: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 30 / 34

State and control

0 5 10 15 20 25 30 35 40

t

-5

-4

-3

-2

-1

0

1

2

3

4

x

e1

e2

Figure: state e

0 5 10 15 20 25 30 35 40

t

-200

-150

-100

-50

0

50

100

150

200

250

u

Figure: Control

Page 32: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 31 / 34

Weights

Figure: The weight w(k)

0 5 10 15 20 25 30 35 40

t

0

10

20

30

40

50

60

70

80

35 400.1559250.15593

Figure: The average cost λ(k)

Page 33: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 32 / 34

Tracking from random initial condition

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

x1

-1.5

-1

-0.5

0

0.5

1

1.5

2x 2

Page 34: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 33 / 34

Open Problems

Page 35: Reinforcement Learning in Control Theory

Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 34 / 34

• Stability of dynamic system

• Really model-free?

• Deep RL in control?