Transcript of: rll.berkeley.edu/.../week_3_lecture_1_dynamics_learning.pdf
Learning Dynamical System Models from Data
CS 294-112: Deep Reinforcement Learning
Week 3, Lecture 1
Sergey Levine
1. Before: learning to act by imitating a human
2. Last lecture: choose good actions autonomously by backpropagating (or planning) through known system dynamics (e.g. known physics)
3. Today: what do we do if the dynamics are unknown?
a. Fitting global dynamics models (“model-based RL”)
b. Fitting local dynamics models
4. Wednesday: putting it all together to learn to “imitate” without a human (e.g. by imitating optimal control), so that we can train deep network policies autonomously
Overview
What’s wrong with known dynamics?
This lecture: learning the dynamics model
1. Overview of model-based RL
• Learn only the model
• Learn model & policy
2. What kind of models can we use?
3. Global models and local models
4. Learning with local models and trust regions
• Goals:
• Understand the terminology and formalism of model-based RL
• Understand the options for models we can use in model-based RL
• Understand practical considerations of model learning
Today’s Lecture
Why learn the model?
Does it work? Yes!
• Essentially how system identification works in classical robotics
• Some care should be taken to design a good base policy
• Particularly effective if we can hand-engineer a dynamics representation using our knowledge of physics, and fit just a few parameters
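When the dynamics are linear in the unknown parameters, this kind of system identification is just linear regression. A minimal sketch of the idea (the 2-D double-integrator system, the random base policy, and all names here are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" (unknown) linear dynamics: s' = A s + B u
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])
B_true = np.array([[0.0],
                   [0.1]])

# Run a random base policy to collect transitions (s, u, s')
S, U, S_next = [], [], []
s = np.zeros(2)
for _ in range(200):
    u = rng.normal(size=1)
    s_next = A_true @ s + B_true @ u
    S.append(s); U.append(u); S_next.append(s_next)
    s = s_next

# Fit [A B] jointly by least squares: s' ≈ [s, u] @ theta, theta = [A B]^T
X = np.hstack([np.array(S), np.array(U)])      # (N, 3)
Y = np.array(S_next)                           # (N, 2)
theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat, B_hat = theta[:2].T, theta[2:].T
```

With noise-free linear data the fit recovers the true parameters exactly; hand-engineering the representation (as on the slide) means `theta` has only a few entries to estimate.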
Does it work? No!
• Distribution mismatch problem becomes exacerbated as we use more expressive model classes
(figure annotation: “go right to get higher!”)
Can we do better?
What if we make a mistake?
Can we do better? (every N steps)
That seems like a lot of work… (every N steps)
Backpropagate directly into the policy?
easy for deterministic policies, but also possible for stochastic policy (more on this later)
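The deterministic case can be checked on a one-dimensional toy system: a linear policy u = k·s unrolled through known scalar dynamics, with the gradient of the total cost w.r.t. the policy parameter computed by a hand-written reverse (backprop) pass. All constants here are illustrative, not from the lecture:

```python
# Toy check of backprop-through-dynamics for a deterministic policy.
# Scalar dynamics s' = a*s + b*u, linear policy u = k*s, cost = sum of s_t^2.

a, b, T, s0 = 0.9, 0.5, 10, 1.0

def rollout_cost(k):
    s, J = s0, 0.0
    for _ in range(T):
        s = a * s + b * (k * s)      # step through the (known) dynamics
        J += s ** 2
    return J

def policy_grad_backprop(k):
    # forward pass, storing states
    states = [s0]
    for _ in range(T):
        states.append((a + b * k) * states[-1])
    # reverse pass: lam is dJ/ds_{t+1}; accumulate dJ/dk along the way
    lam, dk = 0.0, 0.0
    for t in range(T - 1, -1, -1):
        lam = 2.0 * states[t + 1] + (a + b * k) * lam
        dk += lam * b * states[t]
    return dk

# sanity check against finite differences
k = -0.3
fd = (rollout_cost(k + 1e-6) - rollout_cost(k - 1e-6)) / 2e-6
assert abs(policy_grad_backprop(k) - fd) < 1e-4
```

The reverse pass is exactly what an autodiff framework does when the rollout is built as a computation graph; for a stochastic policy one needs reparameterization or likelihood-ratio tricks (the "more on this later" above).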
Summary
• Version 0.5: collect random samples, train dynamics, plan
  • Pro: simple, no iterative procedure
  • Con: distribution mismatch problem
• Version 1.0: iteratively collect data, replan, collect data
  • Pro: simple, solves distribution mismatch
  • Con: open-loop plan might perform poorly, esp. in stochastic domains
• Version 1.5: iteratively collect data using MPC (replan at each step)
  • Pro: robust to small model errors
  • Con: computationally expensive, but have a planning algorithm available
• Version 2.0: backpropagate directly into policy
  • Pro: computationally cheap at runtime
  • Con: can be numerically unstable, especially in stochastic domains (more on this later)
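The version 1.5 loop can be sketched end-to-end on a toy 1-D task: fit a model from random interaction, then replan with MPC at every step. The random-shooting planner, the dynamics, and all constants below are illustrative choices, not the lecture's specific method:

```python
import numpy as np

rng = np.random.default_rng(0)
goal, horizon, n_cand = 2.0, 5, 1000

def true_step(s, u):                 # the real dynamics (unknown to the agent)
    return s + 0.5 * u

# 1. collect transitions with random actions, fit the model s' ≈ w0*s + w1*u
S = rng.uniform(-3, 3, 200)
U = rng.uniform(-1, 1, 200)
X = np.stack([S, U], axis=1)
w, *_ = np.linalg.lstsq(X, true_step(S, U), rcond=None)

# 2. MPC: sample candidate action sequences, score them under the model,
#    execute only the first action of the best sequence, then replan
s = 0.0
for _ in range(20):
    seqs = rng.uniform(-1, 1, (n_cand, horizon))
    sim = np.full(n_cand, s)
    costs = np.zeros(n_cand)
    for t in range(horizon):
        sim = w[0] * sim + w[1] * seqs[:, t]   # roll out under the learned model
        costs += np.abs(sim - goal)
    s = true_step(s, seqs[np.argmin(costs), 0])
# s now tracks the goal; replanning every step absorbs small model errors
```

Replacing the random-shooting planner with iLQR, and re-fitting `w` inside the outer loop, turns this into the full iterative scheme from the slides.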
Case study: model-based policy search with GPs
What kind of models can we use?
• Gaussian process
• neural network (image: Punjani & Abbeel ’14)
• other: video prediction? (more on this later in the course)
Case study: dynamics with recurrent networks
Other related work on learning human dynamics
• Conditional Restricted Boltzmann Machines (Taylor et al.)
• GPs and GPLVMs (Wang et al.)
• Linear and switching linear dynamical systems (Hsu & Popovic)
• Many others…
• Will compare:
• ERD (this work)
• LSTM with three layers
• CRBM (probabilistic model trained with contrastive divergence)
• Simple n-gram baseline
• GP model
The trouble with global models
• Planner will seek out regions where the model is erroneously optimistic
• Need to find a very good model in most of the state space to converge on a good solution
• In some tasks, the model is much more complex than the policy
Local models
What controller to execute?
How to fit the dynamics?
Can we do better?
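One common choice is to fit a separate affine model s' ≈ A_t s + B_t u + c_t at each time step by regressing over a batch of rollouts. A scalar sketch (the controller, dynamics, and constants are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 10                       # number of rollouts, horizon

def step(s, u):                     # nonlinear "real" dynamics
    return s + 0.1 * np.sin(s) + 0.2 * u

# collect N rollouts from a noisy controller around a nominal trajectory
S = np.zeros((N, T + 1))
U = np.zeros((N, T))
S[:, 0] = 1.0 + 0.05 * rng.normal(size=N)
for t in range(T):
    U[:, t] = -0.5 * S[:, t] + 0.05 * rng.normal(size=N)
    S[:, t + 1] = step(S[:, t], U[:, t])

# fit a separate affine model at each time step with least squares
models = []
for t in range(T):
    X = np.stack([S[:, t], U[:, t], np.ones(N)], axis=1)
    theta, *_ = np.linalg.lstsq(X, S[:, t + 1], rcond=None)
    models.append(theta)            # theta = (A_t, B_t, c_t)

# the local model is accurate near the data it was fit to
A0, B0, c0 = models[0]
assert np.max(np.abs(A0 * S[:, 0] + B0 * U[:, 0] + c0 - S[:, 1])) < 1e-2
```

The "can we do better" of the slide points at adding priors (e.g. a Bayesian/global-model prior) so that far fewer than N samples per step are needed.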
What if we go too far?
How to stay close to old controller?
KL-divergences between trajectories
• Not just for trajectory optimization – really important for model-free policy search too! More on this in later lectures
dynamics & initial state are the same!
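The cancellation referred to here can be written out explicitly (a reconstruction in standard notation; p and p̄ are the trajectory distributions induced by the new controller π and the old controller π̄ under the same dynamics and initial state):

```latex
% Trajectory distribution under controller \pi:
p(\tau) = p(s_1) \prod_{t=1}^{T} \pi(u_t \mid s_t)\, p(s_{t+1} \mid s_t, u_t)

% KL between trajectory distributions: the initial-state and dynamics
% terms are identical and cancel, leaving only the controller terms:
D_{\mathrm{KL}}\big(p(\tau)\,\|\,\bar p(\tau)\big)
  = E_{p(\tau)}\!\left[\sum_{t=1}^{T} \log \pi(u_t \mid s_t) - \log \bar\pi(u_t \mid s_t)\right]
  = \sum_{t=1}^{T} E_{p(s_t)}\Big[D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\|\,\bar\pi(\cdot \mid s_t)\big)\Big]

% Per step this splits into a cross term and a negative-entropy term:
E_{p(s_t)}\Big[D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\|\,\bar\pi(\cdot \mid s_t)\big)\Big]
  = E_{p(s_t, u_t)}\big[-\log \bar\pi(u_t \mid s_t)\big]
    - E_{p(s_t)}\big[\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\big]
```

The negative-entropy term is what makes the constrained objective fit naturally into the LQR machinery later.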
Digression: dual gradient descent
how to maximize? Compute the gradient!
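The procedure is easy to verify on a toy equality-constrained problem (the problem and step size below are chosen purely for illustration):

```python
# Dual gradient descent on:  min_x x^2   s.t.   x - 1 = 0
# Lagrangian L(x, lam) = x**2 + lam * (x - 1); argmin_x L is closed-form here.

lam = 0.0
for _ in range(100):
    x = -lam / 2.0            # 1. minimize L(x, lam) over x
    dg = x - 1.0              # 2. dg/dlam equals the constraint at the minimizer
    lam = lam + 0.5 * dg      # 3. gradient ascent step on the dual

# converges to the constrained optimum x = 1 with multiplier lam = -2
```

In the trajectory-optimization setting, step 1 is the expensive part (an iterative LQR solve), while steps 2 and 3 are cheap scalar updates.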
DGD with iterative LQR
this is the hard part, everything else is easy!
Trust regions & trajectory distributions
• Bounding the KL-divergence between two policies or controllers, whether linear-Gaussian or more complex (e.g. neural networks), is really useful
• Bounding the KL-divergence between policies is equivalent to bounding the KL-divergence between trajectory distributions
• We’ll use this later in the course in model-free RL too!
Case study: local models & iterative LQR
Case study: combining global and local models