Robotics: Science and Systems (wcms.inf.ed.ac.uk/ipab/rss/lecture-notes-2018-2019/17 RSS...)
Machine Learning for Robot Control
Robotics: Science and Systems
Zhibin (Alex) Li
School of Informatics
University of Edinburgh
Content
● Why learning?
● Common learning paradigms
● Deep learning for robot control
● A brief summary
* This lecture covers only the machine learning cases for robot control.
Why learning?
What is missing in the DARPA Robotics Challenge (DRC)?
1. Operators manually place footsteps for the robots, which introduces human error
2. No use of hands during locomotion
3. No reaction while falling, no autonomous behaviors
4. ⋮⋮
© DARPAtv
Lesson learned from DRC: Robots need to be more autonomous
How to fill the gap?
[Diagram: high-level human supervision and low-level control for task execution, connected by limited communication; machine learning fills the gap between them.]
Why learning?
Suitable for handling cases that involve:
1. stochasticity and uncertainty;
2. a system or process that is hard to model, strongly nonlinear, or state-dependent (time-varying);
3. multiple scenarios (difficult to enumerate) and multi-modality;
4. a high-dimensional action space (multiple actions can lead towards the same goal, and the future needs to be predicted).
Why learning?
Advantages:
1. automates the process of designing policies, freeing people from tedious coding routines;
2. explores new solutions, uncovers our blind spots, and creates more inventions;
3. can potentially facilitate the progress of science and technology, if used appropriately.
Common learning paradigms
● Supervised learning: we know how to classify the data (labeled), and train a neural network to learn that classification.
● Unsupervised learning: we do not know how to classify the data (unlabeled), and train a neural network to find the underlying principles for classifying it.
● Reinforcement learning: we do not know how to classify the data, but a reward is given if the end result is good, or a penalty if the result is bad (e.g., Atari video games).
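The reinforcement learning bullet can be made concrete with a toy sketch. Nothing below comes from the lecture: the action names, reward values, and step size are hypothetical, and the exploration rule (try each action once, then exploit) is the simplest one that keeps the example deterministic. The point is only that the learner never sees a label, just a reward or a penalty.

```python
# Toy illustration of the reinforcement learning signal: the learner is never
# told the "correct" action, it only receives a reward or a penalty.
# All names and numbers here are hypothetical.

ACTIONS = ["lean_forward", "stay_upright", "lean_backward"]


def environment(action: str) -> float:
    """Return +1 (reward) for the good outcome and -1 (penalty) otherwise."""
    return 1.0 if action == "stay_upright" else -1.0


def learn(episodes: int = 30, step_size: float = 0.5) -> dict:
    """Estimate the value of each action from rewards alone."""
    value = {a: 0.0 for a in ACTIONS}
    for episode in range(episodes):
        if episode < len(ACTIONS):
            # Exploration: try every action once.
            action = ACTIONS[episode]
        else:
            # Exploitation: repeat the action with the highest estimate.
            action = max(value, key=value.get)
        reward = environment(action)
        # Move the estimate towards the observed reward.
        value[action] += step_size * (reward - value[action])
    return value


values = learn()
best_action = max(values, key=values.get)
print(best_action, values)
```

In a supervised setting the same problem would come with the label "stay_upright" attached to the data; here that preference has to be inferred from the reward signal.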
Deep learning for robot control
Going beyond video games: a case study of using deep learning for robot balancing and locomotion control.
The example uses Deep Deterministic Policy Gradient (DDPG), a model-free, actor-critic reinforcement learning algorithm.
Chuanyu Yang, Taku Komura, Zhibin Li, “Emergence of Human-Comparable Balancing Behaviors by Deep Reinforcement Learning,” in IEEE-RAS International Conference on Humanoid Robots, 2017.
Deep learning
It consists of two networks:
1. The actor network learns the policy that generates the optimal action;
2. The critic network evaluates the performance of the policy learnt by the actor network.
[Figure: actor network and critic network]
Deep Deterministic Policy Gradient (DDPG)
[Diagram: the actor network takes the state as input and outputs an action; the critic network takes the state and the action as inputs and outputs a Q-value.]
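The input/output structure of the two networks can be sketched in plain Python. This is a minimal illustration, not the networks used in the cited work: the layer sizes, the tanh activations, and the weight initialisation are assumptions, and a full DDPG implementation also needs a replay buffer, slowly updated target copies of both networks, exploration noise, and gradient updates, all of which are omitted here.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 8


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Fully connected layer: activation(W x + b)."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class Actor:
    """Deterministic policy: state -> action, each component in [-1, 1]."""

    def __init__(self, rng):
        self.l1 = init_layer(STATE_DIM, HIDDEN, rng)
        self.l2 = init_layer(HIDDEN, ACTION_DIM, rng)

    def __call__(self, state):
        return dense(dense(state, self.l1, math.tanh), self.l2, math.tanh)


class Critic:
    """Action-value function: (state, action) -> scalar Q-value."""

    def __init__(self, rng):
        self.l1 = init_layer(STATE_DIM + ACTION_DIM, HIDDEN, rng)
        self.l2 = init_layer(HIDDEN, 1, rng)

    def __call__(self, state, action):
        # State and action are concatenated before entering the critic.
        return dense(dense(state + action, self.l1, math.tanh), self.l2)[0]


def td_target(reward, next_state, actor, critic, gamma=0.99, done=False):
    """Bootstrapped target r + gamma * Q(s', mu(s')) used to train the critic."""
    if done:
        return reward
    return reward + gamma * critic(next_state, actor(next_state))


rng = random.Random(0)
actor, critic = Actor(rng), Critic(rng)
state = [0.1, -0.2, 0.05, 0.0]
action = actor(state)
q_value = critic(state, action)
print(action, q_value)
```

The critic is trained to match `td_target`, and the actor is then adjusted in the direction that increases the critic's Q-value for the action it outputs.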
Learn an optimal policy
[Figure: actor network and critic network]
Deep learning (DL) for push recovery
DL for rough terrain locomotion
Next steps:
● from balancing to walking, stepping, running, jumping, and leaping;
● using perception to coordinate control strategies.
Approach: add a recurrent neural network, combining DDPG with Recurrent Deterministic Policy Gradients (RDPG).
DL for rough terrain locomotion
Feedback: body orientation, horizontal and vertical speed, joint position and velocity, foot-ground contact.
Perception: partially observable height map from 10 lidar scans.
Doo Re Song, Chuanyu Yang, Christopher McGreavy, Zhibin Li, “Recurrent Network-based Deterministic Policy Gradient for Solving Bipedal Walking Challenge on Rugged Terrains,” submitted to IEEE International Conference on Robotics and Automation, 2018.
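One way to picture how the feedback and perception signals listed on this slide reach the policy is to flatten them into a single observation vector. The sketch below is an assumption about the layout, not the representation used in the cited paper: the field names, the number of joints and feet, and the ordering are hypothetical; only the count of 10 lidar readings comes from the slide.

```python
from dataclasses import dataclass
from typing import List

NUM_LIDAR_SCANS = 10  # from the slide: height map sampled by 10 lidar scans


@dataclass
class Feedback:
    """Proprioceptive feedback; field shapes are assumptions."""
    body_orientation: List[float]   # e.g. roll, pitch, yaw (assumed)
    horizontal_speed: float
    vertical_speed: float
    joint_positions: List[float]
    joint_velocities: List[float]
    foot_contacts: List[bool]       # one flag per foot


def build_observation(fb: Feedback, lidar_heights: List[float]) -> List[float]:
    """Concatenate feedback and perception into one flat policy input."""
    if len(lidar_heights) != NUM_LIDAR_SCANS:
        raise ValueError(f"expected {NUM_LIDAR_SCANS} lidar heights")
    if len(fb.joint_positions) != len(fb.joint_velocities):
        raise ValueError("joint position and velocity sizes differ")
    return (
        list(fb.body_orientation)
        + [fb.horizontal_speed, fb.vertical_speed]
        + list(fb.joint_positions)
        + list(fb.joint_velocities)
        + [1.0 if c else 0.0 for c in fb.foot_contacts]
        + list(lidar_heights)
    )


fb = Feedback(
    body_orientation=[0.0, 0.05, 0.0],
    horizontal_speed=0.4,
    vertical_speed=0.0,
    joint_positions=[0.1, -0.3, 0.2, 0.1, -0.3, 0.2],
    joint_velocities=[0.0] * 6,
    foot_contacts=[True, False],
)
obs = build_observation(fb, lidar_heights=[0.0] * NUM_LIDAR_SCANS)
print(len(obs))
```

Because the height map is only partially observable, a single observation like this does not capture the full terrain, which is one motivation for feeding a sequence of observations to a recurrent network.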
Learning optimal policies
[Figure: actor network and critic network]
DL locomotion: obstacles
DL locomotion: stairs
DL locomotion: pitfalls
Quiz
Previously, in the inverse kinematics (IK) problem, we have
Objective function for optimization
So what should the reward be, if we would like to use reinforcement learning to solve the IK problem?
Quiz
Previously, in the inverse kinematics (IK) problem, we have
Objective function for optimization
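The objective function itself did not survive in this transcript, so the following is only a sketch of one plausible answer. It assumes the IK objective is the squared distance between the end-effector and the target, and takes the reward to be the negative of that cost, so that maximising reward is the same as minimising the objective. The planar two-link arm and its link lengths are hypothetical.

```python
import math

# Hypothetical link lengths for a planar two-link arm (not from the lecture).
L1, L2 = 1.0, 0.8


def forward_kinematics(q1: float, q2: float):
    """End-effector position of a planar two-link arm."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def ik_cost(q, target):
    """Assumed IK objective: squared distance between hand and target."""
    x, y = forward_kinematics(*q)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2


def reward(q, target):
    """Candidate RL reward: the negative of the cost being minimised."""
    return -ik_cost(q, target)


target = forward_kinematics(0.3, 0.5)   # a reachable target by construction
print(reward((0.3, 0.5), target))       # joint angles that reach the target
print(reward((0.0, 0.0), target))       # a worse configuration
```

With this choice the reward is never positive and reaches its maximum of zero exactly when the end-effector is on the target.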
Is learning everything? A brief summary
A right attitude towards machine learning
What would a future superintelligence think about today's learning-based tools and math- or model-based tools?
● Mathematical tools, the laws of physics, and so on were created by the top intellectuals in human history. From a superintelligence's point of view, it would be unwise to ignore these physical laws and instead use a weak AI to learn from scratch in a very primitive way.
A right attitude towards machine learning
● Learning from scratch can be acceptable for non-physical learning processes, e.g., computer graphics or playing Go inside a computer: whether it takes 30 iterations, 30,000, or 30 million makes little difference if time is not a problem.
● However, for real physical systems, e.g., robots, it can be hard because robots break before learning is done. We, as resilient biological systems, can also be injured before reaching successful performance.
The right tool for the right job! Machine learning is complementary to the other control tools we have learned.