Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl

Robot Swarms as Ensembles of Cooperating Components

Matthias Hölzl
With contributions from Martin Wirsing, Annabelle Klarl

AWASS
Lucca, June 24, 2013

www.ascens-ist.eu

The Task

Robots cleaning an exhibition area

marXbot

Miniature mobile robot developed by EPFL
Rough-terrain mobility
Robots can dock to other robots
Many sensors
  Proximity sensors
  Gyroscope
  3D accelerometer
  RFID reader
  Cameras
  . . .
ARM-based Linux system
Gripper for picking up items

Swarm Robotics

Problems

Noise, sensor resolution
Extracting information from sensor data
Unforeseen situations
Uncertainty about the environment
Performing complex actions when intermediate results are uncertain
. . .

Action Logics

Logics that can represent change over time
Probabilistic behavior can be modeled (but is cumbersome)

Markov Decision Processes

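[Figure: MDP for navigation on a grid. States: pos = (x,y), pos = (x+1,y), pos = (x,y+1), pos = (x+1,y+1). Edges are labeled action / probability / reward. Fully labeled edges: n / 0.9 / -0.1 with e, w / 0.025 / -0.1; s / 0.9 / -0.1 with e, w / 0.025 / -0.1; s, n, w, e / 0.05 / -0.1. The remaining edge labels are elided as "...".]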
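The edge labels can be read as action / probability / reward triples. The following is a minimal sketch of that reading in Python, not in the Lisp-based language used later in the talk. The rule it encodes (an action moves in the intended direction with probability 0.9, slips into each perpendicular direction with probability 0.025, and leaves the robot in place with probability 0.05) is an assumption pieced together from the visible labels, and the names `transitions`, `MOVES`, `PERPENDICULAR`, and `STEP_REWARD` are hypothetical.

```python
# Assumed reading of the diagram: every step has a reward of -0.1.
STEP_REWARD = -0.1

# Unit moves for the four actions (y grows to the north).
MOVES = {"n": (0, 1), "s": (0, -1), "e": (1, 0), "w": (-1, 0)}
# Directions perpendicular to each action (assumed slip directions).
PERPENDICULAR = {"n": ("e", "w"), "s": ("e", "w"),
                 "e": ("n", "s"), "w": ("n", "s")}


def transitions(pos, action):
    """Return a list of (probability, next_pos, reward) triples."""
    x, y = pos
    dx, dy = MOVES[action]
    result = [(0.9, (x + dx, y + dy), STEP_REWARD)]  # intended move
    for side in PERPENDICULAR[action]:               # slip sideways
        sx, sy = MOVES[side]
        result.append((0.025, (x + sx, y + sy), STEP_REWARD))
    result.append((0.05, (x, y), STEP_REWARD))       # stay in place
    return result


outcomes = transitions((0, 0), "n")
total_probability = sum(p for p, _, _ in outcomes)
```

Under this reading the four outcomes of every action sum to probability 1, which is a quick consistency check on the labels.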

Markov Decision Processes

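[Figure: MDP of an evening's activities. States: DecideActivity, Watching TV, In Club, Drinking, Dancing Alone, Dancing With Partner. Actions: watchTV, goToClub, drinkBeer, dance (two outcomes, p = 0.5 each), flirt (two outcomes, p = 0.05 and p = 0.95). The labels "Oh, oh" and "Oh, no" also appear in the diagram.]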

MDPs: Strategies

[Figure: the MDP of the previous slide, repeated. States: DecideActivity, Watching TV, In Club, Drinking, Dancing Alone, Dancing With Partner. Actions: watchTV, goToClub, drinkBeer, dance (p = 0.5 per outcome), flirt (p = 0.05 and p = 0.95).]

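Strategies TV, CDB, and CDF (rows: DA = DecideActivity, IC = In Club, DWP = Dancing With Partner):

State     TV          CDB          CDF
DA        watchTV     goToClub     goToClub
IC        drinkBeer   drinkBeer    dance
DWP       flirt       flirt        flirt
Utility   0.1         −0.05 (∗)    −1.975 (∗∗)

(∗)  0.05 + (−0.1)
(∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))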
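The footnote arithmetic can be checked with a few lines of Python. This is only a sketch: the assignment of each number to a particular action or outcome (for example 0.2 for dancing alone and 0.25 for dancing with a partner) is an assumption read off the shape of the formulas, and all constant names are hypothetical.

```python
# Rewards and probabilities as they appear in the footnotes
# (which action each number belongs to is an assumed reading).
WATCH_TV = 0.1
GO_TO_CLUB = 0.05
DRINK_BEER = -0.1
P_DANCE = 0.5                 # each dance outcome has probability 0.5
DANCE_ALONE = 0.2
DANCE_WITH_PARTNER = 0.25
P_FLIRT_GOOD, FLIRT_GOOD = 0.05, 5
P_FLIRT_BAD, FLIRT_BAD = 0.95, -5

# Expected reward of flirting: (0.05 x 5) + (0.95 x -5)
flirt = P_FLIRT_GOOD * FLIRT_GOOD + P_FLIRT_BAD * FLIRT_BAD

# Expected utility of each strategy from the table.
utility = {
    "TV": WATCH_TV,
    "CDB": GO_TO_CLUB + DRINK_BEER,
    "CDF": GO_TO_CLUB + P_DANCE * DANCE_ALONE
           + P_DANCE * (DANCE_WITH_PARTNER + flirt),
}

# The best strategy is the one with the highest expected utility.
best = max(utility, key=utility.get)
```

With these numbers the three utilities come out as 0.1, −0.05, and −1.975, so TV is the best of the three strategies.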

Reinforcement Learning

General idea:
  Figure out the expected value of each action in each state
  Pick the action with the highest expected value (most of the time)
  Update the expectations according to the actual rewards
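The three steps above can be sketched as tabular Q-learning with epsilon-greedy action selection, which is one common instance of this idea and not necessarily the variant used in the talk. The sketch is in Python; the values of `ALPHA`, `GAMMA`, and `EPSILON` and the state names `"s0"` and `"s1"` are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)

# q[(state, action)] holds the current estimate of the expected value.
q = defaultdict(float)


def pick_action(state, actions, rng=random):
    """Pick the best-looking action most of the time, a random one otherwise."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def update(state, action, reward, next_state, actions):
    """Move the estimate toward the reward plus the best follow-up value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


# One learning step: action "e" in state "s0" yielded reward 1.0.
update("s0", "e", 1.0, "s1", ["n", "e", "s", "w"])
```

Starting from all-zero estimates, this single update moves the value of ("s0", "e") a fraction ALPHA of the way toward the observed reward.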

How well does this work?

Rather well for small problems
But: state explosion
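A rough count shows how quickly the state space grows. The Python sketch below assumes a deliberately simple model in which every robot and every waste item can occupy any cell independently; the function name `joint_states` and the numbers are illustrative and do not come from the talk.

```python
def joint_states(cells, robots, waste_items):
    """Count joint states if each robot and waste item can be in any cell."""
    return cells ** (robots + waste_items)


# One robot alone on a 10 x 10 grid.
single = joint_states(cells=100, robots=1, waste_items=0)
# Five robots and three waste items on the same grid.
swarm = joint_states(cells=100, robots=5, waste_items=3)
```

Under this model the single robot has 100 states, while the small swarm already has 100^8 = 10^16 joint states, far too many for a flat table of values.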

Solutions

Decomposition
Hierarchy
Partial programs

POEM

Action language
First-order reasoning
Hierarchical reinforcement learning
  Learns completions for partial programs
  Concurrency
Reflection / meta-object protocols
. . .

Iliad: A POEM Implementation

Common Lisp-based programming language
Full first-order reasoning
  Operations on logical theories: U(C)NA, domain closure, . . .
  Resolution, hyperresolution, DPLL, etc.
  Conditional answer extraction
  Procedural attachment, constraint solving
Hierarchical reinforcement learning
  Based on Concurrent ALisp
  Partial programs
  Threadwise and temporal state abstraction
  Hierarchically Optimal Yet Local (HOLY) Q-learning
  Hierarchically Optimal Recursively Decomposed (HORD) Q-learning

Planned Contents

Introduction to CALisp/Poem
Simple TD-learning: bandits
Flat reinforcement learning: navigation
Hierarchical reinforcement learning: collecting items individually
Threadwise decomposition for hierarchical reinforcement learning: learning collaboratively

n-armed Bandits

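[Figure: a single state S with two self-loops, labeled action / probability / reward distribution: search / 1.0 / N(0.1, 1.0) and coll-known / 1.0 / N(0.3, 3.0).]

Choice between n actions
Reward depends probabilistically on the action choice
No long-term consequences
Simplest form of TD-learning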
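The two-armed bandit from the diagram can be simulated in a few lines. This Python sketch uses epsilon-greedy selection with an incremental sample-average update; treating the second parameter of N(·, ·) as a standard deviation, the value of `EPSILON`, and the function name `run_bandit` are all assumptions.

```python
import random

# Arms as labeled in the diagram; the second parameter of N(., .) is
# treated as a standard deviation here, which is an assumption.
ARMS = {"search": (0.1, 1.0), "coll-known": (0.3, 3.0)}
EPSILON = 0.1  # exploration rate (assumed value)


def run_bandit(steps, seed=0):
    """Play the bandit and return (estimated mean reward, pull count) per arm."""
    rng = random.Random(seed)
    estimate = {arm: 0.0 for arm in ARMS}  # running mean reward per arm
    pulls = {arm: 0 for arm in ARMS}
    for _ in range(steps):
        if rng.random() < EPSILON:
            arm = rng.choice(list(ARMS))           # explore
        else:
            arm = max(estimate, key=estimate.get)  # exploit
        mean, std = ARMS[arm]
        reward = rng.gauss(mean, std)
        pulls[arm] += 1
        # Incremental sample average: new = old + (reward - old) / n
        estimate[arm] += (reward - estimate[arm]) / pulls[arm]
    return estimate, pulls


estimate, pulls = run_bandit(steps=5000)
```

Because the rewards are noisy and the two means are close, the estimates can take many pulls to separate; the outcome of a particular run depends on the seed.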

Flat Learning

[Figure: ASCII map of a maze, with walls drawn as XX, the target as TT, and the robot as RR.]

Target: (0 0)
Choices: (N E S W)
Q-values:
#((N (Q -1.8))
  (E (Q -1.8))
  (S (Q -2.25))
  (W (Q 2.76)))
Recommended choice is W
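The recommended choice is simply the direction with the highest Q-value. A minimal Python sketch using the numbers shown on the slide (the variable names are hypothetical):

```python
# Q-values as shown on the slide for the robot's current position.
q_values = {"N": -1.8, "E": -1.8, "S": -2.25, "W": 2.76}

# The greedy recommendation is the choice with the highest Q-value.
recommended = max(q_values, key=q_values.get)
```

Here `recommended` is `"W"`, matching the recommendation on the slide.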

Flat Learning

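;; Top-level partial program: navigate to the target location.
(defun simple-robot ()
  (call (nav (target-loc (robot-env)))))

;; Move until LOC is reached. Which direction to take at the choice
;; point NAVIGATE-CHOICE is left open and is filled in by learning.
(defun nav (loc)
  (until (equal (robot-loc) loc)
    (with-choice navigate-choice (dir '(N E S W))
      (action navigate-move dir))))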

Hierarchical Learning

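;; Repeatedly choose between sleeping, picking up waste, and dropping
;; waste; the choice is learned. DROP-WASTE is not shown on the slide.
(defun waste-removal ()
  (loop
    (choose choose-waste-removal-action
      (action sleep 'SLEEP)
      (call (pickup-waste))
      (call (drop-waste)))))

;; Subtask: navigate to the waste source (reusing NAV), then pick up.
(defun pickup-waste ()
  (call (nav (waste-source)))
  (action pickup-waste 'PICKUP))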

Multi-Robot Learning

[Figure: ASCII map of an arena bounded by XX walls, with one cell marked TT, four cells marked RR, one cell marked RW, and one cell marked WW.]

Thank you!

Any Questions?

[email protected]
