Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
Presented by:
Subarna Sadhukhan
Reinforcement learning
• Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that acquires this shooting behavior automatically.
• The robot and its environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process.
• Robot: senses the current state and selects an action. Environment: transitions to a new state and returns a reward to the robot.
• Through this loop the robot acquires purposive behavior that achieves the given goal.
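The sense–act–reward loop above can be sketched as a minimal agent interface; the class and method names here are illustrative, not from the paper:

```python
import random

class Agent:
    """Minimal RL agent: senses a state, selects an action, receives a reward."""

    def __init__(self, states, actions):
        self.actions = list(actions)
        # Q-table: estimated return for every state-action pair, initially zero.
        self.q = {(s, a): 0.0 for s in states for a in self.actions}

    def select_action(self, state, epsilon=0.1):
        # Epsilon-greedy selection: usually exploit the best-valued action,
        # occasionally explore a random one.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

The environment side (state transition and reward generation) is left abstract here, exactly as in the slide's automaton model.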
• Environment: ball and goal
• Robot: mobile, equipped with a camera
• Nothing about the system is known in advance
• Assume the robot can discriminate a set S of states and take a set A of actions on the world
Q-learning

Let Q*(s,a) be the expected return for taking action a in state s:

$$Q^*(s,a) = r(s,a) + \gamma \sum_{s'} T(s,a,s') \max_{a'} Q^*(s',a')$$

where T(s,a,s') is the probability of a transition from s to s', r(s,a) is the reward for the state-action pair (s,a), and γ is the discount factor.

Since T and r are not known, Q is estimated incrementally:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)$$

where r is the reward actually received for taking a, s' is the next state, and α is the learning rate.
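The incremental update can be coded directly; a minimal sketch, with the Q-table as a plain dict and illustrative parameter values:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.25, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]
```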
State set
• 9×27 + 27 + 9 = 279 states
• Ball: 3 positions × 3 sizes = 9 substates. Goal: 3 positions × 3 sizes × 3 orientations = 27 substates. 9 × 27 combined states, plus 27 states with the ball not visible and 9 states with the goal not visible.
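The state count on this slide can be checked arithmetically (the constant names are illustrative; the breakdown follows the slide):

```python
# Substate counts from the slide: ball = 3 positions x 3 sizes,
# goal = 3 positions x 3 sizes x 3 orientations.
BALL_SUBSTATES = 3 * 3        # 9
GOAL_SUBSTATES = 3 * 3 * 3    # 27

# Both visible, plus ball-not-visible (goal substates only),
# plus goal-not-visible (ball substates only).
TOTAL_STATES = BALL_SUBSTATES * GOAL_SUBSTATES + GOAL_SUBSTATES + BALL_SUBSTATES
```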
Action set
• Two motors
• Each motor: forward, stop, back
• 3 × 3 = 9 actions in all
• State-action deviation problem: a small movement near the observer produces a large change in the image, while a large movement far from the observer produces only a small change, so actions and observed state transitions do not correspond one-to-one.
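The nine-element action set is just the Cartesian product of the two motors' command sets; a quick sketch:

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")

# Each of the two motors takes one of three commands: 3 x 3 = 9 actions.
ACTIONS = list(product(MOTOR_COMMANDS, MOTOR_COMMANDS))
```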
Learning from Easy Missions
• Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state.
• Construct the learning schedule so that the robot learns easy situations at the early stages and more difficult situations later on: Learning from Easy Missions (LEM).
Complexity analysis
• k states, m possible actions
• With reward only at the goal, Q-learning needs roughly m trials for the first state, m² for the second, and so on, hence on the order of m^k steps in total.
• LEM: m·k, since a reward is obtainable at each step.
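A rough numerical illustration of the gap, assuming (as in the argument above) that the exploration cost grows by a factor of m per state when reward arrives only at the end:

```python
def undirected_cost(m, k):
    """Rough expected exploration cost with reward only at the end:
    about m trials for the first state, m**2 for the second, and so on."""
    return sum(m ** i for i in range(1, k + 1))

def lem_cost(m, k):
    """With Learning from Easy Missions a reward is obtainable at each
    step, so the cost is roughly m trials per state."""
    return m * k
```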
Implementing LEM
Rough ordering of easy situations by observed size:
small → medium → large
(a large observed size roughly means reaching the goal)
The state space is categorized into substates such as ball size, position, and so on.
n = size of the state space, m = number of ordered sets.
Applying LEM with m ordered sets takes on the order of m · a^{n/m} steps (with a actions per state),
as opposed to a^n without the ordering.
When to shift
• S1 is nearest to the goal, S2 is next, and so on.
• Shifting occurs when

$$\left| Q_{S_{k-1}}(t) - Q_{S_{k-1}}(t - \Delta t) \right| < \varepsilon$$

where

$$Q_{S_{k-1}}(t) = \sum_{s \in S_{k-1}} \max_a Q(s,a)$$

and Δt indicates a time interval, in steps, over which the change is measured. We suppose that the current state set S(k-1) can transit only to its neighbors.
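A sketch of the shift test, assuming the saturation measure is the summed max-Q over the current state set (function names are illustrative):

```python
def qsum(q, states, actions):
    """Sum of max-Q over a state set: a saturation measure for that set."""
    return sum(max(q[(s, a)] for a in actions) for s in states)

def should_shift(qsum_now, qsum_before, eps=1e-3):
    """Shift to the next (harder) state set once the Q values for the
    current set have stopped changing over the interval delta-t."""
    return abs(qsum_now - qsum_before) < eps
```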
• From the previous Q-learning equation, if Q converges then

$$Q(s,a) = r(s,a) + \gamma \max_{a'} Q(s',a')$$

• Thus the converged Q values decay by a factor of γ for every step away from the goal, so states closer to the goal (the easier missions) carry larger values:

$$\max_a Q(s,a) \propto \gamma^{\,i} \quad \text{(i = number of steps to the goal)}$$
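A quick numerical check of this geometric decay, assuming a single reward of 1 obtained at the goal (values illustrative):

```python
GAMMA = 0.9
REWARD = 1.0

def converged_value(steps_to_goal):
    """Max-Q of a state `steps_to_goal` actions from the reward: the value
    decays by one factor of gamma per step, so closer (easier) states
    carry strictly larger values."""
    return REWARD * GAMMA ** steps_to_goal
```

This monotone ordering of converged values is what justifies both the easy-first schedule and the summed-Q shift test.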
LEM
Experiments