Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning


Maxim Likhachev, Michael Kaess, and Ronald C. Arkin

Mobile Robot Laboratory

Georgia Tech

This research was funded under the DARPA MARS program.


Motivation

• Constant parameterization of robotic behavior results in inefficient robot performance

• Manual selection of “right” parameters is difficult and tedious work



Motivation (cont’d)

• Use of Case-Based Reasoning (CBR) methodology

– an automatic selection of optimal parameters at run-time (ICRA’01)

– each case is a set of behavioral parameters indexed by environmental features

[Figure: examples of a “front-obstructed” case and a “clear-to-goal” case]



Motivation for the Current Research

• The CBR module

– improves robot performance (in simulations and on real robots)

– avoids the manual configuration of behavioral parameters

• The CBR module still required the creation of a case library which

– is dependent on a robot architecture

– needs extensive experimentation to optimize cases

– requires good understanding of how CBR works

• Solution: to extend the CBR module to learn

– new cases from scratch or optimize existing cases

– in a separate training process or during missions



Related Work

• Use of Case-Based Reasoning in the selection of behavioral parameters

– ACBARR [Georgia Tech ’92], SINS [Georgia Tech ’93]

– KINS [Chagas and Hallam]

• Automatic optimization of behavioral parameters

– genetic programming (e.g., GA-ROBOT [Ram et al.])

– reinforcement learning (e.g., Learning Momentum [Lee et al.])



Behavioral Control and CBR Module

CBR Module controls (case output parameters):

• Weights for each behavior

• BiasMove Vector

• Noise Persistence

• Obstacle Sphere



Case Indices: Environmental Features

Spatial features: traversability vector

• split environment into K = 4 angular regions

• compute obstacle density within each region

• transform the density into traversability

Temporal features:

• Short-term velocity towards the goal

• Long-term velocity towards the goal

[Figure: two example environments with their feature vectors]

Example 1:

Vspatial: f0=0.92 f1=0.58 f2=1.00 f3=0.68

Vtemporal: ShortTerm: Rs=1.0 LongTerm: Rl=0.7

Example 2:

Vspatial: f0=0.02 f1=0.22 f2=0.63 f3=0.02

Vtemporal: ShortTerm: Rs=0.01 LongTerm: Rl=1.0
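The three spatial-feature steps (split into K = 4 angular regions, compute obstacle density, transform to traversability) might be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the density measure or the density-to-traversability transform, so the point-count density, the `max_range` and `saturation` parameters, the linear transform, and the choice to center region 0 on the goal direction are all hypothetical.

```python
import math

def traversability_vector(obstacles, robot_xy, goal_xy, k=4,
                          max_range=5.0, saturation=5):
    """Return k traversability values in [0, 1], one per angular region.

    obstacles: list of (x, y) obstacle points (assumed sensor readings).
    Region 0 is centered on the direction to the goal (an assumption).
    """
    rx, ry = robot_xy
    goal_heading = math.atan2(goal_xy[1] - ry, goal_xy[0] - rx)
    width = 2 * math.pi / k
    counts = [0] * k
    for ox, oy in obstacles:
        dx, dy = ox - rx, oy - ry
        if math.hypot(dx, dy) > max_range:
            continue  # ignore obstacles outside the assumed sensing range
        # Angle relative to the goal direction, shifted so that region 0
        # spans [-width/2, +width/2) around the goal heading.
        rel = (math.atan2(dy, dx) - goal_heading + width / 2) % (2 * math.pi)
        counts[int(rel // width) % k] += 1
    # Assumed transform: density saturates at `saturation` points, and
    # traversability is 1 minus the normalized density.
    return [1.0 - min(c, saturation) / saturation for c in counts]

f = traversability_vector([(1, 0), (2, 0), (0, 1), (100, 0)], (0, 0), (10, 0))
print(f)
```

With two obstacle points ahead and one to the side, the front region gets the lowest traversability, and the distant point at (100, 0) is ignored.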



Overview of non-learning CBR Module

[Figure: block diagram of the non-learning CBR module, with the following stages and data flows]

• current environment → Feature Identification → spatial & temporal feature vectors

• Case Library (all the cases in the library) → Spatial Features Vector Matching (1st stage of Case Selection) → set of spatially matching cases

• Temporal Features Vector Matching (2nd stage of Case Selection) → set of spatially and temporally matching cases

• Random Selection Process (3rd stage of Case Selection) → best matching case

• Case switching Decision tree → best matching or currently used case

• Case Adaptation → case ready for application

• Case Application → case output parameters (behavioral assemblage parameters)



Making the CBR Module Learn

[Figure: block diagram of the learning CBR module, with the following stages and data flows]

• current environment → Feature Identification → spatial & temporal feature vectors

• Case Library (all the cases in the library) → Spatial Features Vector Matching (1st stage of Case Selection) → set of spatially matching cases

• Temporal Features Vector Matching (2nd stage of Case Selection) → set of spatially and temporally matching cases

• Random Selection Biased by Case Success and Spatial and Temporal Similarities → best matching case

• Case switching Decision tree → best matching or currently used case

• New Case Creation (if necessary) → new or existing best matching case

• Case Adaptation → case ready for application

• Case Application → case output parameters (behavioral assemblage parameters)

• last K cases → Old Case Performance Evaluation → last K cases with adjusted performance history → Case Library



Extensive Exploration of Cases: Modified Case Selection Process

• Random selection of cases with the probability of the selection proportional to:

– spatial similarity with the environment (1st step)

– temporal similarity with the environment (2nd step)

– weighted sum of the case past performance and spatial and temporal similarities (3rd step)

[Figure: three plots of P(selection), each on a 0.0 to 1.0 scale. 1st step: cases C1 to C5 plotted against spatial similarity, giving the set of spatially matching cases {C1, C2, C4}. 2nd step: C1, C2, C4 plotted against temporal similarity, giving the set of spatially & temporally matching cases {C1, C4}. 3rd step: C1 and C4 plotted against the weighted sum of spatial and temporal similarities and case success, giving the best matching case C1.]
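A minimal sketch of the three-step random selection described above, assuming that similarities and case success are values in [0, 1]. The slides do not specify how a set is drawn in the first two steps, so keeping each case with probability equal to its similarity is one possible reading. The weights, the dictionary keys, and the fallback used when a step rejects every case are hypothetical.

```python
import random

def select_case(cases, w_spatial=1.0, w_temporal=1.0, w_success=1.0,
                rng=random):
    """cases: list of dicts with 'name', 'spatial', 'temporal', 'success'."""
    # Step 1: keep each case with probability equal to its spatial similarity.
    spatial = [c for c in cases if rng.random() < c["spatial"]]
    if not spatial:  # assumed fallback: keep the spatially closest case
        spatial = [max(cases, key=lambda c: c["spatial"])]

    # Step 2: same idea, using temporal similarity.
    temporal = [c for c in spatial if rng.random() < c["temporal"]]
    if not temporal:  # assumed fallback
        temporal = [max(spatial, key=lambda c: c["temporal"])]

    # Step 3: draw one case with probability proportional to a weighted sum
    # of past performance and both similarities.
    scores = [w_spatial * c["spatial"] + w_temporal * c["temporal"]
              + w_success * c["success"] for c in temporal]
    if sum(scores) <= 0:
        return temporal[0]
    return rng.choices(temporal, weights=scores, k=1)[0]

library = [
    {"name": "C1", "spatial": 0.9, "temporal": 0.8, "success": 0.7},
    {"name": "C2", "spatial": 0.6, "temporal": 0.2, "success": 0.5},
    {"name": "C4", "spatial": 0.7, "temporal": 0.6, "success": 0.4},
]
print(select_case(library, rng=random.Random(0))["name"])
```

Because every step is randomized, poorly matching cases still get applied occasionally, which is what allows them to be explored and evaluated.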



Positive and Negative Reinforcement: Case Performance Evaluation

• Criterion for the evaluation of the case performance: the average velocity with which the robot approaches its goal during the application of the case

– opportunities for intermediate case performance evaluations

– may not always be the right criterion

• such cases exhibit no positive velocity towards the goal

• the evaluation of the performance is delayed by K (=2) cases

– case_success (represents case performance) is:

• increased if the average velocity is increased or sustained high

• decreased otherwise
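The case_success update rule above could look roughly like this. It is a sketch only: the step size, the threshold for "sustained high", and the clamping of case_success to [0, 1] are assumptions, and the delay of the evaluation by K cases is not modeled.

```python
def update_case_success(case_success, prev_velocity, velocity,
                        high_velocity=0.8, step=0.1):
    """Return the new case_success after one evaluation.

    prev_velocity, velocity: average velocity towards the goal before and
    during the application of the case (normalized to [0, 1], an assumption).
    """
    increased = velocity > prev_velocity
    sustained_high = velocity >= high_velocity
    # Positive reinforcement if velocity rose or stayed high, else negative.
    delta = step if (increased or sustained_high) else -step
    # Keep case_success within [0, 1] (assumed range).
    return min(1.0, max(0.0, case_success + delta))

print(update_case_success(0.5, prev_velocity=0.2, velocity=0.4))
```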



Maximization of Reinforcement: Case Adaptation

• Maximize case_success as a noisy function of case output parameters (behavioral assemblage parameters)

– maintain the adaptation vector A(C) for each case C

– if the last series of adaptations results in an increase of case_success, then continue the adaptation:

O(C) = O(C) + A(C)

– otherwise switch the direction of the adaptation, add a random component and scale proportionally to case_success:

A(C) = −λ·A(C) + ν·R

O(C) = O(C) + A(C)

(O(C) denotes the output parameters of case C and R the random component; λ and ν stand in for the two scaling coefficients, whose symbols are missing from this transcript.)
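The adaptation rule can be sketched as below. The two scaling coefficients in the formula are passed in as the hypothetical parameters `reverse_scale` and `noise_scale`. How they depend on case_success is not given on the slide, so the caller is expected to compute them. Drawing the random component uniformly from [-1, 1] is also an assumption.

```python
import random

def adapt_case(outputs, adaptation, success_increased,
               reverse_scale, noise_scale, rng=random):
    """Apply one adaptation step to a case.

    outputs:    O(C), list of case output parameters
    adaptation: A(C), list of per-parameter adaptation steps
    Returns the updated (outputs, adaptation).
    """
    if not success_increased:
        # Switch direction and add a random component R (assumed uniform).
        adaptation = [-reverse_scale * a + noise_scale * rng.uniform(-1.0, 1.0)
                      for a in adaptation]
    # In both branches: O(C) = O(C) + A(C)
    outputs = [o + a for o, a in zip(outputs, adaptation)]
    return outputs, adaptation

# One possible reading of "scale proportionally to case_success":
case_success = 0.4
new_o, new_a = adapt_case([1.0, 2.0], [0.1, -0.2], success_increased=False,
                          reverse_scale=case_success, noise_scale=case_success)
print(new_o, new_a)
```

Keeping A(C) per case lets a successful search direction persist across applications of the same case, while a failed direction is reversed and perturbed.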



Maximization of Reinforcement: Case Adaptation (cont’d)

• Incorporate prior knowledge into the search:

– fixed adaptation of the Noise_Gain and Noise_Persistence parameters based on the short- and long-term velocities of the robot

• Constrain the search:

– limit Obstacle_Gain to be higher than the sum of the other schema gains (to avoid collisions)
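The Obstacle_Gain constraint might be enforced after each adaptation step as in this sketch. Obstacle_Gain and Noise_Gain are named on the slides. The other gain names in the example and the `margin` parameter are hypothetical.

```python
def constrain_obstacle_gain(gains, margin=0.01):
    """Return a copy of `gains` with Obstacle_Gain raised, if needed, so that
    it exceeds the sum of all other schema gains by at least `margin`."""
    others = sum(v for name, v in gains.items() if name != "Obstacle_Gain")
    constrained = dict(gains)
    constrained["Obstacle_Gain"] = max(gains["Obstacle_Gain"], others + margin)
    return constrained

gains = {
    "Obstacle_Gain": 1.0,
    "Noise_Gain": 0.5,
    "MoveToGoal_Gain": 0.8,   # hypothetical name
    "BiasMove_Gain": 0.2,     # hypothetical name
}
print(constrain_obstacle_gain(gains)["Obstacle_Gain"])
```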



The Growth of the Case Library: Case Creation Decision

• To avoid divergence, a new case is created whenever:

– case_success of the selected case is high and spatial and temporal similarities with the environment are low to moderate

– case_success of the selected case is low to moderate and spatial and temporal similarities are low

• Limit the maximum size of the library (10 in this work)

• New case is initialized with:

– the spatial and temporal features of the environment

– the output parameter values of the selected case
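The case creation decision above can be sketched as a small predicate. The numeric thresholds for "high", "moderate", and "low" are hypothetical, since the slides give only the qualitative rule and the library limit of 10. Treating the two similarities as jointly low only when both are below the threshold is also an assumption.

```python
def should_create_case(case_success, spatial_sim, temporal_sim, library_size,
                       max_library_size=10, high_success=0.7,
                       low_sim=0.3, moderate_sim=0.6):
    """Decide whether to create a new case instead of reusing the selected one."""
    if library_size >= max_library_size:
        return False  # library is full
    # Both similarities must be below a threshold to count as "low"/"moderate".
    sim = max(spatial_sim, temporal_sim)
    if case_success >= high_success:
        # A successful case is kept unless the match is at most moderate.
        return sim <= moderate_sim
    # A less successful case is replaced only when the match is low.
    return sim <= low_sim

print(should_create_case(0.9, 0.5, 0.4, library_size=3))
```

A new case created this way would then copy the current environment's feature vectors as its indices and the selected case's output parameters as its starting values.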



Experimental Analysis: Example

Learning CBR: first run (starting with an empty library)



Experimental Analysis: Example

Learning CBR: a run after 54 training runs on various environments

• library of ten cases was learned

• 36 percent shorter travel distance

[Figure annotations: a case of a “clear-to-goal” strategy is learned for such environments; a case of a “squeezing” strategy is learned for such environments]



Experiments: Statistical Results

Simulation results (after 250 training runs for learning CBR system)

[Figure: bar charts comparing the non-adaptive, CBR, and learning CBR systems. Vertical axes: mission completion rate (0% to 100%) and average number of steps. Environment labels: heterogeneous environment; homogeneous environment; 15% obstacle density; 20% obstacle density.]



Real Robot Experiments: In Progress

• RWI ATRV-Jr

• Sensors:

– SICK laser scanners in front and back

– Compass

– Gyroscope

• Experiments in progress, no statistical results yet



Conclusions

• New and existing cases are learned and optimized during a training process or as part of mission executions

• Performance:

– substantially better than that of a non-adaptive system

– comparable to a non-learning CBR system

• Neither manual selection of behavioral parameters nor careful creation and optimization of case library is required from a user

• Future Work

– real robot experiments

– case “forgetting” component

– integration with other adaptation & learning methods (e.g., Learning Momentum, RL for Behavioral Assemblage Selection)
