![Slide 1](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/1.jpg)
Optimal Control, Trajectory Optimization, and Planning
CS 294-112: Deep Reinforcement Learning
Week 2, Lecture 2
Sergey Levine
![Slide 2](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/2.jpg)
Announcements
1. Assignment 1 will be out next week
2. Friday section
• Review of automatic differentiation, SGD, and training neural nets
• Try the MNIST TensorFlow tutorial; if you’re having trouble, come to the section!
• Fri 1/27 at 10 am
• Sutardja Dai Hall 240
• Chelsea Finn will teach the section
![Slide 3](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/3.jpg)
Overview
1. Last lecture: imitation learning from a human teacher
2. Today: can the machine make its own decisions?
a. How can we choose actions under perfect knowledge of the system dynamics?
b. Optimal control, trajectory optimization, planning
3. Next week: how can we learn unknown dynamics?
4. How can we then also learn policies? (e.g., by imitating optimal control)
Diagram labels: policy, system dynamics
![Slide 4](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/4.jpg)
Today’s Lecture
1. Making decisions under known dynamics
• Definitions & problem statement
2. Trajectory optimization: backpropagation through dynamical systems
3. Linear dynamics: linear-quadratic regulator (LQR)
4. Nonlinear dynamics: differential dynamic programming (DDP) & iterative LQR
5. Discrete systems: Monte Carlo tree search (MCTS)
6. Case study: imitation learning from MCTS
• Goals:
• Understand the terminology and formalisms of optimal control
• Understand some standard optimal control & planning algorithms
![Slide 5](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/5.jpg)
Terminology & notation
1. run away
2. ignore
3. pet
![Slide 6](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/6.jpg)
Trajectory optimization
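The problem can be written out using the notation common in optimal control. Here $x_t$ is the state, $u_t$ the action, $f$ the known deterministic dynamics, and $c$ the cost to be minimized. This notation is assumed, since the slide's own equations exist only in the image. Trajectory optimization can then be stated in two equivalent ways: with the dynamics as constraints, or by substituting them into the cost.

```latex
\min_{u_1,\ldots,u_T} \sum_{t=1}^{T} c(x_t, u_t)
  \quad \text{s.t.} \quad x_t = f(x_{t-1}, u_{t-1})

\min_{u_1,\ldots,u_T} \; c(x_1, u_1) + c\big(f(x_1, u_1), u_2\big) + \cdots
  + c\big(f(f(\cdots)\cdots), u_T\big)
```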
![Slide 7](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/7.jpg)
Shooting methods vs collocation
shooting method: optimize over actions only
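Here is a minimal sketch of a shooting method. It assumes a hypothetical scalar linear system $x_{t+1} = a x_t + b u_t$ with a quadratic cost. Only the actions are optimized. Their gradient is obtained by backpropagating through the dynamics (a backward, adjoint-style pass) and then followed with plain gradient descent. The system, cost weights, step size and iteration count are illustrative assumptions, not values from the lecture.

```python
def rollout(x0, u, a, b):
    """Simulate x_{t+1} = a*x_t + b*u_t; returns the states x_0 .. x_T."""
    xs = [x0]
    for ut in u:
        xs.append(a * xs[-1] + b * ut)
    return xs


def cost(xs, u, q, r):
    """Quadratic cost: q*x_t^2 for every state (x_0 is a constant) plus r*u_t^2."""
    return sum(q * x * x for x in xs) + sum(r * ut * ut for ut in u)


def cost_gradient(x0, u, a, b, q, r):
    """dJ/du via a backward pass, i.e. backpropagation through the dynamics."""
    xs = rollout(x0, u, a, b)
    T = len(u)
    grad = [0.0] * T
    lam = 2.0 * q * xs[T]  # dJ/dx_T
    for t in range(T - 1, -1, -1):
        # u_t affects J directly (r*u_t^2) and through x_{t+1} = a*x_t + b*u_t
        grad[t] = 2.0 * r * u[t] + b * lam
        # dJ/dx_t: direct cost term plus its effect through x_{t+1}
        lam = 2.0 * q * xs[t] + a * lam
    return grad


def shooting(x0=1.0, T=10, a=1.0, b=1.0, q=1.0, r=0.1, lr=0.01, iters=3000):
    """Shooting: the decision variables are the actions only."""
    u = [0.0] * T
    for _ in range(iters):
        g = cost_gradient(x0, u, a, b, q, r)
        u = [ut - lr * gt for ut, gt in zip(u, g)]
    return u
```

Every state depends on all earlier actions, so the gradient for the first actions accumulates effects along the whole trajectory. This is one reason shooting problems can be poorly conditioned, and it is why the step size here has to be small.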
![Slide 8](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/8.jpg)
Shooting methods vs collocation
collocation method: optimize over actions and states, with constraints
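A collocation method also treats the states as decision variables and turns the dynamics into equality constraints. The minimal sketch below assumes hypothetical dynamics $x_{t+1} = x_t + 0.1(\sin x_t + u_t)$ and a quadratic cost. It only builds the objective and the constraint residuals ("defects"). Handing them to a constrained solver (for example, an SQP method) is left out. The `rollout` helper is included to show that any action sequence, rolled out through the dynamics, gives a feasible trajectory with zero defects.

```python
import math

DT = 0.1  # hypothetical time step


def f(x, u):
    """Hypothetical nonlinear dynamics: x_{t+1} = x_t + dt*(sin(x_t) + u_t)."""
    return x + DT * (math.sin(x) + u)


def objective(xs, us, q=1.0, r=0.1):
    """Cost over the free variables: states x_1..x_T and actions u_0..u_{T-1}."""
    return sum(q * x * x for x in xs) + sum(r * u * u for u in us)


def defects(x0, xs, us):
    """Equality constraints h_t = x_{t+1} - f(x_t, u_t); feasible iff all are zero."""
    full = [x0] + list(xs)  # x_0 is fixed, x_1..x_T are decision variables
    return [full[t + 1] - f(full[t], us[t]) for t in range(len(us))]


def rollout(x0, us):
    """Shooting-style rollout: the states implied by an action sequence."""
    xs, x = [], x0
    for u in us:
        x = f(x, u)
        xs.append(x)
    return xs
```

Because the states are free, an optimizer may pass through dynamically infeasible trajectories on the way to a solution. Only the final answer must satisfy the constraints.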
![Slide 9](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/9.jpg)
Linear case: LQR
Slide annotations: linear (dynamics), quadratic (cost)
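In the linear-quadratic case, the dynamics are linear and the cost is quadratic in the stacked state-action vector. One common parameterization is shown below. The names $F_t, f_t, C_t, c_t$ are assumed here, because the slide's equations are only in the image.

```latex
f(x_t, u_t) = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t

c(x_t, u_t) = \frac{1}{2}
  \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} C_t
  \begin{bmatrix} x_t \\ u_t \end{bmatrix}
  + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} c_t
```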
![Slide 10](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/10.jpg)
Linear case: LQR
![Slide 11](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/11.jpg)
Linear case: LQR
![Slide 12](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/12.jpg)
Linear case: LQR
Slide annotations: linear, linear, quadratic
![Slide 13](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/13.jpg)
Linear case: LQR
Slide annotations: linear, linear, quadratic
![Slide 14](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/14.jpg)
Linear case: LQR
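The LQR recipe can be sketched for a hypothetical scalar system with linear dynamics $x_{t+1} = F_x x_t + F_u u_t + f$ and a quadratic-plus-linear cost. The backward pass goes from $t = T$ down to $1$. It builds the quadratic Q-function coefficients, minimizes over $u_t$ to get $u_t = K_t x_t + k_t$, and substitutes back to get the value-function coefficients $V_t, v_t$. The forward pass then applies the gains from $x_1$. The scalar setting and the parameter values are illustrative assumptions. With matrices, the divisions become solves with $Q_{uu}$.

```python
# Hypothetical scalar LQR problem (illustrative values, not from the slides):
#   dynamics: x_{t+1} = Fx*x_t + Fu*u_t + f
#   cost:     0.5*(Cxx*x^2 + 2*Cxu*x*u + Cuu*u^2) + cx*x + cu*u
PARAMS = dict(Fx=1.0, Fu=1.0, f=0.1, Cxx=2.0, Cxu=0.0, Cuu=0.2, cx=0.0, cu=0.0)


def lqr_backward(T, p):
    """Backward recursion; returns the gains (K_t, k_t) for t = 1..T."""
    V, v = 0.0, 0.0  # no cost after the horizon: V_{T+1} = 0, v_{T+1} = 0
    gains = []
    for _ in range(T):
        # Q-function coefficients: Q_t = C_t + F^T V F, q_t = c_t + F^T V f + F^T v
        Qxx = p["Cxx"] + p["Fx"] * V * p["Fx"]
        Qxu = p["Cxu"] + p["Fx"] * V * p["Fu"]
        Quu = p["Cuu"] + p["Fu"] * V * p["Fu"]
        qx = p["cx"] + p["Fx"] * V * p["f"] + p["Fx"] * v
        qu = p["cu"] + p["Fu"] * V * p["f"] + p["Fu"] * v
        # minimizing Q over u gives u_t = K_t x_t + k_t
        K = -Qxu / Quu
        k = -qu / Quu
        # value function at time t: substitute u_t back into Q
        V = Qxx + 2.0 * Qxu * K + K * Quu * K
        v = qx + Qxu * k + K * qu + K * Quu * k
        gains.append((K, k))
    gains.reverse()  # produced from t = T down to t = 1
    return gains


def lqr_forward(x1, gains, p):
    """Forward pass: apply u_t = K_t x_t + k_t and roll the dynamics forward."""
    xs, us = [x1], []
    for K, k in gains:
        u = K * xs[-1] + k
        us.append(u)
        xs.append(p["Fx"] * xs[-1] + p["Fu"] * u + p["f"])
    return xs, us


def total_cost(x1, us, p):
    """Sum of the per-step costs c(x_t, u_t) for t = 1..T along the rollout."""
    x, J = x1, 0.0
    for u in us:
        J += 0.5 * (p["Cxx"] * x * x + 2 * p["Cxu"] * x * u + p["Cuu"] * u * u)
        J += p["cx"] * x + p["cu"] * u
        x = p["Fx"] * x + p["Fu"] * u + p["f"]
    return J
```

For example, `xs, us = lqr_forward(1.0, lqr_backward(10, PARAMS), PARAMS)` gives the optimal 10-step trajectory from $x_1 = 1$. Since the system is deterministic, these actions also minimize `total_cost` over all open-loop action sequences.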
![Slide 15](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/15.jpg)
Some useful definitions
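The standard definitions of the Q-function and value function, and the LQR recursion written in those terms, are given below. They assume linear dynamics $f(x_t,u_t) = F_t [x_t; u_t] + f_t$ and a quadratic cost with matrix $C_t$ and vector $c_t$. The symbol names are assumed, since the slide's own equations are only in the image. Subscripts such as $Q_{u_t,x_t}$ denote blocks of $Q_t$.

```latex
Q(x_t, u_t) = c(x_t, u_t) + V(x_{t+1}), \qquad V(x_t) = \min_{u_t} Q(x_t, u_t)

Q(x_t, u_t) = \text{const} + \frac{1}{2}
  \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} Q_t
  \begin{bmatrix} x_t \\ u_t \end{bmatrix}
  + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} q_t,
\qquad
V(x_t) = \text{const} + \frac{1}{2} x_t^{\top} V_t x_t + x_t^{\top} v_t

Q_t = C_t + F_t^{\top} V_{t+1} F_t, \qquad
q_t = c_t + F_t^{\top} V_{t+1} f_t + F_t^{\top} v_{t+1}

u_t = K_t x_t + k_t, \qquad
K_t = -Q_{u_t,u_t}^{-1} Q_{u_t,x_t}, \qquad
k_t = -Q_{u_t,u_t}^{-1} q_{u_t}

V_t = Q_{x_t,x_t} + Q_{x_t,u_t} K_t + K_t^{\top} Q_{u_t,x_t} + K_t^{\top} Q_{u_t,u_t} K_t

v_t = q_{x_t} + Q_{x_t,u_t} k_t + K_t^{\top} q_{u_t} + K_t^{\top} Q_{u_t,u_t} k_t
```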
![Slide 16](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/16.jpg)
Stochastic dynamics
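One common stochastic extension keeps the linear dynamics but adds zero-mean Gaussian noise with covariance $\Sigma_t$. The symbols $F_t$, $f_t$, $V_{t+1}$ and $v_{t+1}$ are assumed to follow the linear-quadratic setup: $F_t, f_t$ define the dynamics, and $V_{t+1}, v_{t+1}$ are the value-function coefficients. The expected value of the next state then differs from the deterministic case only by a trace term that does not depend on $u_t$. As a result, the optimal controller is still $u_t = K_t x_t + k_t$ with the same gains. It must now be applied as feedback on the actually observed states.

```latex
x_{t+1} \sim p(x_{t+1} \mid x_t, u_t)
  = \mathcal{N}\!\left(\mu_{t+1},\; \Sigma_t\right),
\qquad \mu_{t+1} = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t

\mathbb{E}\left[\tfrac{1}{2} x_{t+1}^{\top} V_{t+1} x_{t+1} + x_{t+1}^{\top} v_{t+1}\right]
  = \tfrac{1}{2} \mu_{t+1}^{\top} V_{t+1} \mu_{t+1} + \mu_{t+1}^{\top} v_{t+1}
  + \tfrac{1}{2} \operatorname{tr}\!\left(V_{t+1} \Sigma_t\right)
```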
![Slide 17](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/17.jpg)
Nonlinear case: DDP/iterative LQR
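Iterative LQR handles nonlinear dynamics by repeatedly approximating the problem around the current nominal trajectory $(\hat{x}_t, \hat{u}_t)$. It linearizes the dynamics and quadratizes the cost, then runs LQR on the deviations with $F_t = \nabla f$, $c_t = \nabla c$ and $C_t = \nabla^2 c$. DDP additionally keeps a second-order term of the dynamics expansion. The standard expansions are given below; the slide's exact form is only in the image.

```latex
\delta x_t = x_t - \hat{x}_t, \qquad \delta u_t = u_t - \hat{u}_t

f(x_t, u_t) \approx f(\hat{x}_t, \hat{u}_t)
  + \nabla_{x_t, u_t} f(\hat{x}_t, \hat{u}_t)
    \begin{bmatrix} \delta x_t \\ \delta u_t \end{bmatrix}

c(x_t, u_t) \approx c(\hat{x}_t, \hat{u}_t)
  + \nabla_{x_t, u_t} c(\hat{x}_t, \hat{u}_t)^{\top}
    \begin{bmatrix} \delta x_t \\ \delta u_t \end{bmatrix}
  + \frac{1}{2}
    \begin{bmatrix} \delta x_t \\ \delta u_t \end{bmatrix}^{\top}
    \nabla^2_{x_t, u_t} c(\hat{x}_t, \hat{u}_t)
    \begin{bmatrix} \delta x_t \\ \delta u_t \end{bmatrix}
```

After LQR gives $\delta u_t = K_t \delta x_t + k_t$, the new trajectory is obtained by running the true dynamics with $u_t = K_t(x_t - \hat{x}_t) + \alpha k_t + \hat{u}_t$. Here $\alpha$ is chosen by a line search.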
![Slide 18](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/18.jpg)
Nonlinear case: DDP/iterative LQR
![Slide 19](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/19.jpg)
Nonlinear case: DDP/iterative LQR
![Slide 20](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/20.jpg)
Nonlinear case: DDP/iterative LQR
![Slide 21](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/21.jpg)
Nonlinear case: DDP/iterative LQR
![Slide 22](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/22.jpg)
Nonlinear case: DDP/iterative LQR
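Here is a minimal sketch of the iLQR loop for a hypothetical scalar nonlinear system $x_{t+1} = x_t + 0.5(\sin x_t + u_t)$ with cost $x_t^2 + 0.1 u_t^2$. Each iteration does three things. It runs an LQR backward pass on the deviations, with derivatives taken along the current trajectory. It runs a forward pass through the true dynamics with $u_t = \hat{u}_t + \alpha k_t + K_t(x_t - \hat{x}_t)$. Finally, a backtracking line search halves $\alpha$ until the cost decreases. The dynamics, cost, horizon and iteration limits are illustrative assumptions. Second-order dynamics terms (which DDP would add) are omitted.

```python
import math

DT = 0.5  # hypothetical time step


def f(x, u):
    """Hypothetical nonlinear dynamics: x_{t+1} = x_t + dt*(sin(x_t) + u_t)."""
    return x + DT * (math.sin(x) + u)


def c(x, u):
    """Hypothetical cost: drive the state to zero with a small action penalty."""
    return x * x + 0.1 * u * u


def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs  # x_0 .. x_T


def total_cost(xs, us):
    return sum(c(x, u) for x, u in zip(xs, us))  # x_T carries no cost


def backward_pass(xs, us):
    """LQR on deviations around (xs, us); returns gains K_t, k_t."""
    V, v = 0.0, 0.0
    K, k = [0.0] * len(us), [0.0] * len(us)
    for t in reversed(range(len(us))):
        x, u = xs[t], us[t]
        Fx, Fu = 1.0 + DT * math.cos(x), DT           # gradient of f
        cx, cu, Cxx, Cuu, Cxu = 2.0 * x, 0.2 * u, 2.0, 0.2, 0.0  # derivatives of c
        # deviation dynamics have no offset term, since x_hat_{t+1} = f(x_hat_t, u_hat_t)
        Qxx, Quu, Qxu = Cxx + Fx * V * Fx, Cuu + Fu * V * Fu, Cxu + Fx * V * Fu
        qx, qu = cx + Fx * v, cu + Fu * v
        K[t], k[t] = -Qxu / Quu, -qu / Quu
        V = Qxx + 2.0 * Qxu * K[t] + K[t] * Quu * K[t]
        v = qx + Qxu * k[t] + K[t] * qu + K[t] * Quu * k[t]
    return K, k


def forward_pass(x0, xs, us, K, k, alpha):
    """Run the true dynamics with u_t = u_hat_t + alpha*k_t + K_t*(x_t - x_hat_t)."""
    new_xs, new_us = [x0], []
    for t in range(len(us)):
        u = us[t] + alpha * k[t] + K[t] * (new_xs[t] - xs[t])
        new_us.append(u)
        new_xs.append(f(new_xs[t], u))
    return new_xs, new_us


def ilqr(x0=1.0, T=20, iters=50):
    us = [0.0] * T
    xs = rollout(x0, us)
    J = total_cost(xs, us)
    for _ in range(iters):
        K, k = backward_pass(xs, us)
        alpha = 1.0
        for _ in range(10):  # backtracking line search on alpha
            new_xs, new_us = forward_pass(x0, xs, us, K, k, alpha)
            new_J = total_cost(new_xs, new_us)
            if new_J < J:
                xs, us, J = new_xs, new_us, new_J
                break
            alpha *= 0.5
        else:
            break  # no improving step found: treat as converged
    return xs, us, J
```

Accepting only cost-decreasing steps makes the cost monotonically non-increasing across iterations.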
![Slide 23](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/23.jpg)
Additional reading
1. Mayne, Jacobson. (1970). Differential dynamic programming.
• Original differential dynamic programming algorithm.
2. Tassa, Erez, Todorov. (2012). Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization.
• Practical guide for implementing nonlinear iterative LQR.
3. Levine, Abbeel. (2014). Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics.
• Probabilistic formulation and trust region alternative to deterministic line search.
![Slide 24](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/24.jpg)
Case study: nonlinear model-predictive control
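Model-predictive control replans at every time step: observe the current state, optimize a plan over a finite horizon, execute only its first action, and discard the rest. The minimal sketch below makes three assumptions. The model is a hypothetical scalar linear one. A simple finite-horizon LQR planner stands in for iLQR. The real system has a constant disturbance that the model does not know about.

```python
def plan_lqr(x, horizon, a, b, q, r):
    """Finite-horizon scalar LQR plan from state x (cost sum q*x^2 + r*u^2)."""
    P = 0.0  # cost-to-go coefficient beyond the horizon
    gains = []
    for _ in range(horizon):
        K = -(a * b * P) / (r + b * b * P)               # uses P_{t+1}
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)  # Riccati update to P_t
        gains.append(K)
    gains.reverse()  # gains[0] is the gain for the first step of the plan
    plan = []
    for K in gains:
        u = K * x
        plan.append(u)
        x = a * x + b * u  # prediction with the (imperfect) model
    return plan


def mpc(x0=2.0, steps=30, horizon=10, a=1.0, b=1.0, q=1.0, r=0.1, disturbance=0.05):
    """Receding-horizon loop: plan, execute the first action, observe, replan."""
    x = x0
    history = [x]
    for _ in range(steps):
        plan = plan_lqr(x, horizon, a, b, q, r)  # 1. plan from the observed state
        u = plan[0]                              # 2. execute only the first action
        x = a * x + b * u + disturbance          # 3. the real system differs from the model
        history.append(x)                        # 4. observe and replan next step
    return history
```

Executing the whole first plan open loop would let the disturbance accumulate step after step. Replanning from the measured state keeps the error small and bounded.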
![Slide 25](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/25.jpg)
![Slide 26](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/26.jpg)
Discrete case: Monte Carlo tree search (MCTS)
![Slide 27](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/27.jpg)
Discrete case: Monte Carlo tree search (MCTS)
e.g., random policy
![Slide 28](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/28.jpg)
Discrete case: Monte Carlo tree search (MCTS)
Figure annotations: rollout returns +10 and +15
![Slide 29](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/29.jpg)
Discrete case: Monte Carlo tree search (MCTS)
Figure: search tree with each node labelled by its total return Q and visit count N. Leaf rollouts (Q = 10, 12, 16, 10, 12, 8, each with N = 1) are summed up the tree: Q = 22, N = 2 nodes aggregate rollouts of 10 and 12, and Q = 38, N = 3 and Q = 30, N = 3 nodes add a further rollout of 16 and 8, respectively.
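The generic MCTS loop has three steps: descend with a tree policy to a leaf, evaluate the leaf with a default (e.g., random) rollout policy, and add the result to every node between the root and the leaf. It is sketched below with the UCT tree policy. UCT expands an untried action if there is one; otherwise it picks the child maximizing $Q(s_t)/N(s_t) + 2C\sqrt{2\ln N(s_{t-1})/N(s_t)}$. The toy problem is a hypothetical assumption, not from the lecture: a walk on the integers where each step moves by ±1 and earns the new position. The constant $C$ and the iteration count are also illustrative.

```python
import math
import random

# Hypothetical toy problem: start at 0, take HORIZON steps of -1 or +1,
# and receive the new position as reward after every step.
ACTIONS = (-1, +1)
HORIZON = 4


def step(pos, action):
    """Known, deterministic dynamics and reward."""
    new_pos = pos + action
    return new_pos, float(new_pos)


class Node:
    def __init__(self, pos, depth):
        self.pos, self.depth = pos, depth
        self.children = {}  # action -> Node
        self.Q = 0.0        # sum of returns of rollouts through this node
        self.N = 0          # number of such rollouts


def score(parent, child, c):
    """UCT score: average return plus an exploration bonus."""
    return child.Q / child.N + 2 * c * math.sqrt(2 * math.log(parent.N) / child.N)


def tree_policy(root, c, rng):
    """Descend via UCT; expand one untried action when a node has any left."""
    node, path, ret = root, [root], 0.0
    while node.depth < HORIZON:
        untried = [a for a in ACTIONS if a not in node.children]
        if untried:
            a = rng.choice(untried)
            pos, r = step(node.pos, a)
            node.children[a] = Node(pos, node.depth + 1)
            path.append(node.children[a])
            return path, ret + r
        parent = node
        a = max(parent.children, key=lambda act: score(parent, parent.children[act], c))
        _, r = step(node.pos, a)
        node = node.children[a]
        path.append(node)
        ret += r
    return path, ret


def default_policy(node, rng):
    """Evaluate a leaf with a random rollout to the horizon."""
    pos, ret = node.pos, 0.0
    for _ in range(HORIZON - node.depth):
        pos, r = step(pos, rng.choice(ACTIONS))
        ret += r
    return ret


def mcts(iterations=500, c=1.0, seed=0):
    rng = random.Random(seed)
    root = Node(0, 0)
    for _ in range(iterations):
        path, ret = tree_policy(root, c, rng)
        ret += default_policy(path[-1], rng)
        for node in path:  # update all values between the root and the leaf
            node.Q += ret
            node.N += 1
    best = max(root.children, key=lambda a: root.children[a].Q / root.children[a].N)
    return best, root
```

In this toy problem, any continuation after a first move of +1 earns 8 more than the same continuation after -1. The root's best action by average return should therefore be +1. Choosing by visit count instead is another common rule.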
![Slide 30](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/30.jpg)
Additional reading
1. Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree Search Methods.
• Survey of MCTS methods and basic summary.
![Slide 31](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/31.jpg)
Case study: imitation learning from MCTS
![Slide 32](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/32.jpg)
Case study: imitation learning from MCTS
Why train a policy?
• In this case, MCTS is too slow for real-time play
• Other reasons – perception, generalization, etc.: more on this later
![Slide 33](https://reader033.fdocuments.in/reader033/viewer/2022052801/5f13c9c4f04d9f11495f0788/html5/thumbnails/33.jpg)
What’s wrong with known dynamics?
Next time: learning the dynamics model