Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
Presented by:
Subarna Sadhukhan
Reinforcement learning
• Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that acquires this shooting behavior automatically.
• The robot and its environment are modeled as two synchronized finite state automata interacting in a discrete-time cyclical process.
• Robot: senses the current state and selects an action. Environment: transitions to a new state and returns a reward to the robot.
• Through this loop the robot acquires purposive behavior that achieves the given goal.
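The sense–act–reward loop above can be sketched as a minimal agent interface; the class and method names here are illustrative, not from the paper:

```python
import random

class Agent:
    """Minimal RL agent: senses a state, selects an action, receives a reward."""

    def __init__(self, states, actions):
        self.actions = list(actions)
        # Q-table: estimated return for every state-action pair, initially zero.
        self.q = {(s, a): 0.0 for s in states for a in self.actions}

    def select_action(self, state, epsilon=0.1):
        # Epsilon-greedy selection: usually exploit the best-valued action,
        # occasionally explore a random one.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

The environment side (state transition and reward generation) is left abstract here, exactly as in the slide's automaton model.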
• Environment: ball and goal
• Robot: mobile, equipped with a camera
• Nothing about the system is known in advance
• Assume the robot can discriminate a set S of states and take a set A of actions on the world
Q-learning

Let Q*(s,a) be the expected return for taking action a in state s:

$$Q^*(s,a) = r(s,a) + \gamma \sum_{s'} T(s,a,s') \max_{a'} Q^*(s',a')$$

where T(s,a,s') is the probability of a transition from s to s', r(s,a) is the reward for the state-action pair (s,a), and γ is the discount factor.

Since T and r are not known, Q is estimated incrementally:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)$$

where r is the reward actually received for taking a, s' is the next state, and α is the learning rate.
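The incremental update can be coded directly; a minimal sketch, with the Q-table as a plain dict and illustrative parameter values:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.25, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]
```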
State set
• 9×27 + 27 + 9 = 279 states
• Ball: 3 positions × 3 sizes = 9 substates. Goal: 3 positions × 3 sizes × 3 orientations = 27 substates. 9 × 27 combined states, plus 27 states with the ball not visible and 9 states with the goal not visible.
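The state count on this slide can be checked arithmetically (the constant names are illustrative; the breakdown follows the slide):

```python
# Substate counts from the slide: ball = 3 positions x 3 sizes,
# goal = 3 positions x 3 sizes x 3 orientations.
BALL_SUBSTATES = 3 * 3        # 9
GOAL_SUBSTATES = 3 * 3 * 3    # 27

# Both visible, plus ball-not-visible (goal substates only),
# plus goal-not-visible (ball substates only).
TOTAL_STATES = BALL_SUBSTATES * GOAL_SUBSTATES + GOAL_SUBSTATES + BALL_SUBSTATES
```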
Action set
• Two motors
• Each motor: forward, stop, back
• 3 × 3 = 9 actions in all
• State-action deviation problem: a small movement near the observer produces a large change in the image, while a large movement far from the observer produces only a small change, so actions and observed state transitions do not correspond one-to-one.
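The nine-element action set is just the Cartesian product of the two motors' command sets; a quick sketch:

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")

# Each of the two motors takes one of three commands: 3 x 3 = 9 actions.
ACTIONS = list(product(MOTOR_COMMANDS, MOTOR_COMMANDS))
```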
Learning from Easy Missions
• Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state.
• Construct the learning schedule so that the robot learns easy situations at the early stages and more difficult situations later on: Learning from Easy Missions (LEM).
Complexity analysis
• k states, m possible actions
• With reward only at the goal, Q-learning needs roughly m trials for the first state, m² for the second, and so on, hence on the order of m^k steps in total.
• LEM: m·k, since a reward is obtainable at each step.
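A rough numerical illustration of the gap, assuming (as in the argument above) that the exploration cost grows by a factor of m per state when reward arrives only at the end:

```python
def undirected_cost(m, k):
    """Rough expected exploration cost with reward only at the end:
    about m trials for the first state, m**2 for the second, and so on."""
    return sum(m ** i for i in range(1, k + 1))

def lem_cost(m, k):
    """With Learning from Easy Missions a reward is obtainable at each
    step, so the cost is roughly m trials per state."""
    return m * k
```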
Implementing LEM
Rough ordering of easy situations by observed size:
small → medium → large
(a large observed size roughly means reaching the goal)
The state space is categorized into substates such as ball size, position, and so on.
n = size of the state space, m = number of ordered sets.
Applying LEM with m ordered sets takes on the order of m · a^{n/m} steps (with a actions per state),
as opposed to a^n without the ordering.
When to shift
• S1 is nearest to the goal, S2 is next, and so on.
• Shifting occurs when

$$\left| Q_{S_{k-1}}(t) - Q_{S_{k-1}}(t - \Delta t) \right| < \varepsilon$$

where

$$Q_{S_{k-1}}(t) = \sum_{s \in S_{k-1}} \max_a Q(s,a)$$

and Δt indicates a time interval, in steps, over which the change is measured. We suppose that the current state set S(k-1) can transit only to its neighbors.
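A sketch of the shift test, assuming the saturation measure is the summed max-Q over the current state set (function names are illustrative):

```python
def qsum(q, states, actions):
    """Sum of max-Q over a state set: a saturation measure for that set."""
    return sum(max(q[(s, a)] for a in actions) for s in states)

def should_shift(qsum_now, qsum_before, eps=1e-3):
    """Shift to the next (harder) state set once the Q values for the
    current set have stopped changing over the interval delta-t."""
    return abs(qsum_now - qsum_before) < eps
```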
• From the previous Q-learning equation, if Q converges then

$$Q(s,a) = r(s,a) + \gamma \max_{a'} Q(s',a')$$

• Thus the converged Q values decay by a factor of γ for every step away from the goal, so states closer to the goal (the easier missions) carry larger values:

$$\max_a Q(s,a) \propto \gamma^{\,i} \quad \text{(i = number of steps to the goal)}$$
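A quick numerical check of this geometric decay, assuming a single reward of 1 obtained at the goal (values illustrative):

```python
GAMMA = 0.9
REWARD = 1.0

def converged_value(steps_to_goal):
    """Max-Q of a state `steps_to_goal` actions from the reward: the value
    decays by one factor of gamma per step, so closer (easier) states
    carry strictly larger values."""
    return REWARD * GAMMA ** steps_to_goal
```

This monotone ordering of converged values is what justifies both the easy-first schedule and the summed-Q shift test.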
LEM
Experiments