
LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE

Learning Prospective Robot Behavior

Shichao Ou and Roderic Grupen

Laboratory for Perceptual Robotics, University of Massachusetts Amherst


A Developmental Approach

• Infant learning
  – Proceeds in stages
• Maturation processes
  – Parents provide constrained learning contexts
    • Protect
    • Easy → Complex
  – Motion mobile for newborns
  – Use brightly colored objects that are easy to pick up
  – Use building blocks
  – Association of words and objects


Application in Robotics

• Framework for Robot Developmental Learning
  – Role of teacher: set up learning contexts that make the target concept conspicuous
  – Role of robot: acquire concepts, generalize them to new contexts through autonomous exploration, and provide feedback

• Control Basis
  – Robot actions are created from combinations of <σ, φ, τ> (sensor resources, potential function, effector resources)
  – Establish stages of learning through time-varying constraints on resources

• Easy → Complex
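As a minimal sketch, assuming a controller can be modeled as a plain <σ, φ, τ> record and a learning stage as a whitelist of permitted sensor and effector resources, a time-varying resource constraint simply grows the set of controllers the robot is allowed to compose. The resource names, potential names, and three-stage schedule below are hypothetical (loosely modeled on staged reaching: head first, then arms); this is not the control basis implementation.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Controller:
    sigma: frozenset  # sensor resources (hypothetical names)
    phi: str          # potential function the controller descends
    tau: frozenset    # effector resources / degrees of freedom

SENSORS = ["camera"]
POTENTIALS = ["track", "reach"]
EFFECTORS = ["head", "left_arm", "right_arm"]

def subsets(items):
    """All non-empty subsets of items, as frozensets."""
    return [
        frozenset(x for i, x in enumerate(items) if (mask >> i) & 1)
        for mask in range(1, 2 ** len(items))
    ]

def available_controllers(stage_sensors, stage_effectors):
    """Enumerate controllers whose resources are permitted in the current stage."""
    return [
        Controller(s, p, t)
        for s, p, t in product(subsets(SENSORS), POTENTIALS, subsets(EFFECTORS))
        if s <= stage_sensors and t <= stage_effectors
    ]

# Hypothetical schedule: each stage unlocks more effector resources.
STAGES = [
    ({"camera"}, {"head"}),
    ({"camera"}, {"head", "right_arm"}),
    ({"camera"}, {"head", "left_arm", "right_arm"}),
]

for n, (sensors, effectors) in enumerate(STAGES, start=1):
    print(f"stage {n}: {len(available_controllers(sensors, effectors))} controllers")
# stage 1: 2 controllers
# stage 2: 6 controllers
# stage 3: 14 controllers
```

The design choice here is that the constraint lives outside the controllers: the same combinatorial space is filtered differently per stage, so relaxing a constraint never invalidates controllers learned earlier.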


Example
• Learning to reach for objects
  – Stage 1: SearchTrack
    • Focus attention using a single brightly colored object (σ)
    • Limit DOF (τ) to the head ONLY
  – Stage 2: ReachGrab
    • Limit DOF (τ) to one arm ONLY
  – Stage 3: Handedness, Scale-Sensitive

Hart et al., 2008


Prospective Learning

• An infant adapts to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy


Robot Prospective Learning with Human Guidance

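[Figure: a learned policy drawn as a chain of states S0, S1, …, Si, …, Sj, …, Sn connected by actions a0, a1, …, an-1. A predicate g: f → {0, 1} on a feature f selects between two branches, g(f)=1 and g(f)=0: one continues the original chain, the other inserts a sub-task (states Si1, …, Sin) between Si and Sj. Label: "Challenge".]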
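The policy-with-sub-task structure on this slide can be sketched as follows, assuming deterministic transitions, tabular policies keyed by state, and that g(f) = 0 is the value that diverts control to the sub-task (which branch value triggers the detour is an assumption here). The names run_with_prospection, branch_state, and rejoin_state are hypothetical.

```python
from typing import Callable, Dict, List

State = str

def run_with_prospection(
    policy: Dict[State, str],
    step: Callable[[State, str], State],
    start: State,
    goal: State,
    branch_state: State,
    g: Callable[[State], int],
    subtask: Dict[State, str],
    rejoin_state: State,
    max_steps: int = 100,
) -> List[State]:
    """Follow `policy`; at `branch_state`, if g(f) == 0, follow `subtask`
    until `rejoin_state`, then resume `policy` (hypothetical control flow)."""
    s, trace, in_subtask = start, [start], False
    for _ in range(max_steps):
        if s == goal:
            break
        if s == branch_state and not in_subtask and g(s) == 0:
            in_subtask = True   # prospective check predicts failure: divert
        elif in_subtask and s == rejoin_state:
            in_subtask = False  # sub-task finished: hand control back
        action = subtask[s] if in_subtask else policy[s]
        s = step(s, action)
        trace.append(s)
    return trace

# Toy model: each action simply names the next state.
def step(s, a):
    return a

policy = {"S0": "S1", "S1": "S2", "S2": "S3"}   # original chain S0 -> S3
subtask = {"S1": "P1", "P1": "P2", "P2": "S2"}  # detour from S1 back to S2
print(run_with_prospection(policy, step, "S0", "S3", "S1", lambda s: 0, subtask, "S2"))
# ['S0', 'S1', 'P1', 'P2', 'S2', 'S3']
```

With `lambda s: 1` instead, the predicate never fires and the trace is the original chain `['S0', 'S1', 'S2', 'S3']`; the rest of the policy is reused unchanged in both cases.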


A 2D Navigation Domain Problem

• 30x30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on the robot


Flat Learning Results

• Flat Q-learning
  – 5-dimensional state: (x, y, door-bit1, door-bit2, door-bit3)
  – 4 actions: up, down, left, right
  – Reward: +1 for reaching the goal; -0.01 for every step taken
  – Learning parameters: α = 0.1, γ = 1.0, ε = 0.1
• Learned solutions after 30,000 episodes
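A minimal tabular Q-learning sketch using the slide's rewards and learning parameters (α = 0.1, γ = 1.0, ε = 0.1). To stay short it assumes an open grid with no doors or buttons, so the state is just (x, y) rather than the full 5-dimensional state; the grid size, start, goal, step caps, seed, and function names are illustrative assumptions, and giving the goal step +1 without the -0.01 step cost is also an assumption.

```python
import random
from collections import defaultdict

# The four actions from the slide, as (dx, dy) moves on the grid.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def move(s, a, width, height):
    """Apply action a to cell s, clamped to the grid (no doors in this toy map)."""
    dx, dy = ACTIONS[a]
    return (min(max(s[0] + dx, 0), width - 1), min(max(s[1] + dy, 0), height - 1))

def greedy(Q, s):
    """Action with the highest Q-value in s (ties go to the first action listed)."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def q_learning(width, height, start, goal, episodes,
               alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=500, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # epsilon-greedy action selection
            a = rng.choice(list(ACTIONS)) if rng.random() < epsilon else greedy(Q, s)
            nxt = move(s, a, width, height)
            done = nxt == goal
            r = 1.0 if done else -0.01  # +1 at the goal, -0.01 for other steps
            target = r if done else r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = nxt
            if done:
                break
    return Q

def greedy_path(Q, start, goal, width, height, max_steps=100):
    """Follow the learned greedy policy; stop at the goal or after max_steps."""
    s, path = start, [start]
    while s != goal and len(path) <= max_steps:
        s = move(s, greedy(Q, s), width, height)
        path.append(s)
    return path

Q = q_learning(5, 5, start=(0, 0), goal=(4, 4), episodes=2000)
print(greedy_path(Q, (0, 0), (4, 4), 5, 5))
```

In the flat formulation, the three door bits multiply the (x, y) table by 8, which is one reason a single flat learner needs many more episodes than the staged approach.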


Prospective Learning

• Stage 1
  – All doors open
  – Constrain resources to use only the (x, y) sensors
  – Allow the agent to learn a policy from start to goal

[Figure: the Stage 1 policy as a state chain S0, S1, …, Si, …, Sj, …, Sn with actions Right, Down, Right, Right, Up, Right, Right.]


Prospective Learning
• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task
  – Resume the original policy
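The backtracking step can be sketched under strong simplifying assumptions: each time step records a dictionary of discrete features, the "cause" is taken to be the first feature whose value at the failure step differs from the successful Stage 1 run, and the "earlier indicator" is the earliest step from which that feature already showed its failure value. The feature names (x, door1) and the functions find_cause and backtrack_indicator are hypothetical; this naive comparison is a stand-in, not the authors' method.

```python
from typing import Dict, List, Optional

Obs = Dict[str, int]  # one time step of discrete features (hypothetical names)

def find_cause(failed: Obs, succeeded: Obs) -> Optional[str]:
    """Naive cause detection: first feature whose value at the failure step
    differs from the successful Stage 1 run at the same step."""
    for name, value in failed.items():
        if succeeded.get(name) != value:
            return name
    return None

def backtrack_indicator(trace: List[Obs], fail_idx: int, feature: str) -> int:
    """Walk back from the failure step; return the earliest index from which
    `feature` has continuously shown the value it had at the failure."""
    value = trace[fail_idx][feature]
    i = fail_idx
    while i > 0 and trace[i - 1][feature] == value:
        i -= 1
    return i

# Stage 1 run (all doors open) versus a Stage 2 run where door1 is closed.
success = [{"x": 0, "door1": 0}, {"x": 1, "door1": 0}, {"x": 2, "door1": 0}, {"x": 3, "door1": 0}]
failure = [{"x": 0, "door1": 0}, {"x": 1, "door1": 0}, {"x": 2, "door1": 1}, {"x": 3, "door1": 1}]

cause = find_cause(failure[3], success[3])      # feature that explains the failure
branch = backtrack_indicator(failure, 3, cause)  # step where it first became visible
print(cause, branch)
# door1 2
```

The returned index marks where a sub-task would branch off (Si in the diagram); everything before it, and the original policy after the sub-task completes, is reused as-is.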


Prospective Learning Results

Learned solutions in fewer than 2,000 episodes (flat Q-learning: 30,000)


Humanoid Robot Manipulation Domain

• Benefits of Prospective Learning

  – Adapts to new contexts while preserving most of the existing policy
  – Automatically generates sub-goals
  – Sub-tasks can be learned in a completely different state space
  – Supports interactive learning


Conclusion

• A developmental view of robot learning

• A framework that enables interactive, incremental learning in stages

• An extension of the control basis learning framework based on prospective learning