An Introduction to Distributed Robotic Systems

Alcherio Martinoli

School of Architecture, Civil and Environmental Engineering
Distributed Intelligent Systems and Algorithms Laboratory

École Polytechnique Fédérale de Lausanne
CH-1015 Lausanne, Switzerland

http://disal.epfl.ch/


EPFL, Topics in Autonomous Robotics, April 23, 2012

Outline – Part I (9:15 am - noon)

• Distributed Robotics: Background
• Machine-Learning Methods for Distributed Control
– Particle Swarm Optimization (PSO)
– Noise-resistant algorithms
– Collective-specific issues for distributed control design and optimization
– Particle Swarm Optimization applied to multi-robot systems

• Conclusion Part I

Outline – Part II (2:15 - 5 p.m.)

• Model-Based Methods
– Bottom-up and top-down approaches
– Multi-level approach
• Model Calibration
• Examples
– Linear example
– Nonlinear example

• Conclusion Part II

Background: Multi-Robot Systems

Typical Tasks for Multirobot Teams

• Mapping and exploration
• Target tracking
• Carrying objects
• Playing team games
• Construction

Caloud et al; 1990

CMPack; 2002

• Inspection
• Sensing and monitoring

From N. Kalra, PhD CMU, 2006

Why Multiple Robots?

• Faster execution
• More robust
• Simpler robots
• More solutions
• Task requires it


Why Not Multiple Robots?

• Need comms
• More complexity
• Harder to test
• N x the trouble
• Expensive


Goal of a Multirobot Team

Coordinate to distribute tasks and resources in a way that leads to efficient and robust mission completion.

Challenges

• Dynamic events
• Changing task demands
• Resource failures
• Adversaries
• Limited sensing, planning, and actuation
• Limited communication
• Diverse team

Machine-Learning: Rationale and Terminology

Why Machine-Learning?
• Complementary to model-based/engineering approaches: when low-level details matter (optimization) and/or good models do not exist (design)!
• When the design/optimization space is too big (infinite) or too computationally expensive (e.g., NP-hard) to be searched systematically
• Automatic design and optimization techniques
• Role of the engineer refocused on performance specification, problem encoding, and customization of algorithmic parameters (and perhaps operators)

Why Machine-Learning?

• There are design and optimization techniques robust to noise, nonlinearities, discontinuities
• Possible search spaces: parameters, rules, SW or HW structures/architectures
• Individual real-time adaptation to new environmental conditions, i.e., increased individual flexibility when environmental conditions are not known or cannot be predicted a priori

ML Techniques: Classification Axis 1

– Supervised learning: off-line, a teacher is available

– Unsupervised learning: off-line, teacher not available

– Reinforcement-based (or evaluative) learning: on-line, no pre-established training and evaluation data sets

ML Techniques: Classification Axis 2

– In simulation: the real scenario is reproduced in simulation and machine-learning techniques are applied there; the learned solutions are then downloaded onto real hardware when certain criteria are met

– Hybrid: most of the time in simulation (e.g., 90%), last period (e.g., 10%) of the learning process on real hardware

– Hardware-in-the-loop: from the beginning on real hardware (no simulation). Depending on the algorithm, more or less rapid

ML Techniques: Classification Axis 3
ML algorithms often require fairly substantial computational resources (in particular population-based algorithms); a further classification is therefore:

– On-board: the machine-learning algorithm runs on the system to be learned (no external unit)

– Off-board: the machine-learning algorithm runs off-board and the system to be learned just serves as the embodied implementation of a candidate solution

ML Techniques: Classification Axis 4
− Population-based (“multi-agent”): a population of candidate solutions is maintained by the algorithm, for instance Genetic Algorithms (individuals), Particle Swarm Optimization (particles), Particle Filter (particles), Ant Colony Optimization (tour solution set, etc.); it can also refer to the fact that multiple agents are needed to build the solution set, for instance virtual ants as the core multi-agent principle in Ant Colony Optimization

− Hill-climbing (“single-agent”): the machine-learning algorithm works on a single candidate solution and tries to improve on it; for instance, (stochastic) gradient-based methods, reinforcement learning algorithms

Particle Swarm Optimization

Flocking and Swarming Inspiration

Principles:

• Imitate

• Evaluate

• Compare

PSO: Terminology
• Population: set of candidate solutions tested in one time step; consists of m particles (e.g., m = 20)

• Particle: represents a candidate solution; it is characterized by a velocity vector v and a position vector x in the hyperspace of dimension D

• Evaluation span: evaluation period of each candidate solution during a single time step (algorithm iteration); as in GA, the evaluation span might take more or less time depending on the experimental scenario

• Life span: number of iterations a candidate solution survives

• Fitness function: measurement of efficacy of a given candidate solution during the evaluation span

• Population manager: updates velocities and positions for each particle according to the main PSO loop

Evolutionary Loop: Several Generations

Flowchart: Start → Initialize particles → Perform main PSO loop → End criterion met? (N: perform main PSO loop again; Y: End)

Examples of end criteria:

• # of time steps

• best solution performance

• …

Initialization: Positions and Velocities

The Main PSO Loop – Parameters and Variables

• Functions
– rand() = uniformly distributed random number in [0,1]

• Parameters
– w: velocity inertia (positive scalar)
– cp: personal best coefficient/weight (positive scalar)
– cn: neighborhood best coefficient/weight (positive scalar)

• Variables
– $x_{ij}(t)$: position of particle i in the j-th dimension at time step t (j = 1, …, D)
– $v_{ij}(t)$: velocity of particle i in the j-th dimension at time step t
– $x^*_{ij}(t)$: position of particle i in the j-th dimension with maximal fitness up to iteration t
– $x^*_{i'j}(t)$: position of particle i′ in the j-th dimension having achieved the maximal fitness up to iteration t in the neighborhood of particle i

The Main PSO Loop (Eberhart, Kennedy, and Shi, 1995, 1998)

At each time step t
  for each particle i
    for each component j
      update the velocity:
        $v_{ij}(t+1) = w\,v_{ij}(t) + c_p\,\mathrm{rand}()\,\big(x^*_{ij} - x_{ij}\big) + c_n\,\mathrm{rand}()\,\big(x^*_{i'j} - x_{ij}\big)$
      then move:
        $x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)$
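The velocity and position updates of the main PSO loop can be sketched in Python for a single particle. This is a minimal illustration, not the lecture's implementation: the function name `pso_step` and the default values of `w`, `cp`, and `cn` are assumptions chosen for the example, and Python's `random.random()` draws from [0, 1) rather than [0, 1].

```python
import random


def pso_step(x, v, pbest, nbest, w=0.6, cp=2.0, cn=2.0, rng=random):
    """Apply one PSO update to a single particle.

    x, v   : current position and velocity (lists of length D)
    pbest  : the particle's own best position so far (x*_i)
    nbest  : best position found in the particle's neighborhood (x*_i')
    w, cp, cn : inertia and coefficients (default values are assumptions)
    Returns the new position and the new velocity.
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        # Velocity: inertia term + pull toward personal best + pull toward neighborhood best.
        vj = (w * v[j]
              + cp * rng.random() * (pbest[j] - x[j])
              + cn * rng.random() * (nbest[j] - x[j]))
        new_v.append(vj)
        # Position: move with the freshly updated velocity.
        new_x.append(x[j] + vj)
    return new_x, new_v
```

When a particle already sits on both its personal best and its neighborhood best, the two attraction terms vanish and only the inertia term moves it, which gives a convenient sanity check for an implementation.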

The Main PSO Loop – Vector Visualization

Figure: particle i at its current position $x_i$ ("Here I am!") with velocity $v_i$; $x^*_i$ is my position with optimal fitness up to date; $x^*_{i'}$ is the position with optimal fitness of my neighbors up to date.

Neighborhood Types
• Size:
– Neighborhood index considers also the particle itself in the counting
– Local: only k neighbors considered over m particles in the population (1 < k < m); k = 1 means no information from other particles is used in the velocity update
– Global: m neighbors
• Topology:
– Geographical
– Social
– Indexed
– Random
– …

Neighborhood Examples: Geographical vs. Social

geographical social

Neighborhood Example: Indexed and Circular (lbest)

Figure: eight particles, indexed 1 to 8, arranged on a virtual circle; particle 1's 3-neighbourhood consists of particles 8, 1, and 2.
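An indexed, circular (lbest) neighborhood can be computed from particle indices alone. The sketch below assumes 1-based indices, an odd neighborhood size k that counts the particle itself (as stated under Neighborhood Types), and a symmetric split of the remaining k - 1 neighbors on both sides; the function name `ring_neighborhood` is hypothetical.

```python
def ring_neighborhood(i, k, m):
    """Return the 1-based indices of particle i's k-neighborhood on a ring of m particles.

    The neighborhood includes particle i itself; k must be odd so that the
    remaining k - 1 neighbors split evenly between both sides of the ring.
    """
    if k < 1 or k > m or k % 2 == 0:
        raise ValueError("k must be odd and satisfy 1 <= k <= m")
    half = (k - 1) // 2
    # Shift to 0-based, wrap around the ring with modulo, shift back to 1-based.
    return [((i - 1 + d) % m) + 1 for d in range(-half, half + 1)]
```

For example, `ring_neighborhood(1, 3, 8)` returns `[8, 1, 2]`, and `k = 1` returns only the particle itself, i.e., no information from other particles.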

A Simple PSO Animation

© M. Clerc

GA vs. PSO in Absence of Noise

GA vs. PSO – Overview
• According to most recent research, PSO outperforms GA on most (but not all!) continuous optimization problems

• GA still much more widely used in the general research community and robust on both continuous and discrete optimization problems

• Because of random aspects, very difficult to analyze either metaheuristic or make guarantees about performance

Benchmark Functions
• Goal: minimization of a given f(x)

• Standard benchmark functions with thirty terms (n = 30) and a fixed number of iterations

• All $x_i$ constrained to [-5.12, 5.12]
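The four benchmark functions compared in this part (Sphere, Generalized Rosenbrock, Rastrigin, Griewank) have standard textbook definitions, sketched below in Python. The formulas are not shown in the extracted slides, so it is an assumption that the lecture used exactly these forms; all four are minimization problems with a global minimum of 0.

```python
import math


def sphere(x):
    """Sum of squares; minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def rosenbrock(x):
    """Generalized Rosenbrock; minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    """Rastrigin; highly multimodal, minimum 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def griewank(x):
    """Griewank; minimum 0 at the origin (indices in the product start at 1)."""
    quadratic = sum(xi * xi for xi in x) / 4000.0
    product = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return 1.0 + quadratic - product
```

Each function takes a list of n real values, so the n = 30 setting corresponds to calls such as `rastrigin([0.0] * 30)`.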

Performance with Small Populations

• GA: Roulette Wheel for selection, mutation applies numerical adjustment to gene

• PSO: lbest ring topology with neighborhood of size 3
• Algorithm parameters used (but not thoroughly optimized):

Performance with Small Populations
30 runs; no noise on the performance function; lower values are better (minimization)

Function (no noise)      GA (mean ± std dev)   PSO (mean ± std dev)
Sphere                   0.02 ± 0.01           0.00 ± 0.00
Generalized Rosenbrock   34.6 ± 18.9           7.38 ± 3.27
Rastrigin                157 ± 21.8            48.3 ± 14.4
Griewank                 0.01 ± 0.01           0.01 ± 0.03

Expensive Optimization and Noise Resistance

Expensive Optimization Problems
Two fundamental reasons make robot control design and optimization expensive in terms of time:

1. Time for evaluation of candidate solutions (e.g., tens of seconds) >> time for application of metaheuristic operators (e.g., milliseconds)

2. Noisy performance evaluation disrupts the adaptation process and requires multiple evaluations to estimate actual performance

– Multiple evaluations at the same point in the search space yield different results

– Noise causes decreased convergence speed and residual error

Reducing Evaluation Time
General recipe: exploit more abstracted, calibrated representations (models and simulation tools)

See also Part II

Dealing with Noisy Evaluations
• Better information about a candidate solution can be obtained by combining multiple noisy evaluations
• We could systematically evaluate each candidate solution a fixed number of times → not efficient from a computational perspective
• We want to dedicate more computational time to evaluating promising solutions and eliminate the “lucky” ones as quickly as possible → each candidate solution might have been evaluated a different number of times when compared
• In GA good and robust candidate solutions survive over generations; in PSO they survive in the individual memory
• Use dedicated functions for aggregating multiple evaluations: e.g., minimum and average or generalized aggregation functions (e.g., quasi-linear weighted means), perhaps combined with a statistical test for comparing the resulting aggregated performances
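The idea of re-evaluating a stored best solution and aggregating its evaluations can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm from the cited thesis: fitness is maximized, the stored best receives exactly one extra evaluation per comparison, and the names `NoisyCandidate` and `update_personal_best` are hypothetical.

```python
from statistics import mean


class NoisyCandidate:
    """A candidate solution together with all noisy evaluations collected so far."""

    def __init__(self, position):
        self.position = list(position)
        self.samples = []

    def add_evaluation(self, value):
        self.samples.append(value)

    def aggregated(self, how="average"):
        """Aggregate the collected evaluations into one performance value."""
        if not self.samples:
            raise ValueError("candidate has not been evaluated yet")
        if how == "average":
            return mean(self.samples)
        if how == "minimum":  # pessimistic choice when fitness is maximized
            return min(self.samples)
        raise ValueError(f"unknown aggregation: {how}")


def update_personal_best(best, challenger, evaluate, how="average"):
    """Re-evaluate the stored best once, evaluate the challenger once, keep the better one.

    A 'lucky' best loses its advantage as further evaluations pull its
    aggregated performance toward its real value.
    """
    best.add_evaluation(evaluate(best.position))
    challenger.add_evaluation(evaluate(challenger.position))
    if challenger.aggregated(how) > best.aggregated(how):
        return challenger
    return best
```

With this scheme a promising solution accumulates more evaluations the longer it survives, so candidates being compared may have been evaluated a different number of times, as noted in the list above.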

Figure panels: GA, PSO

Testing Noise-Resistant GA and PSO on Benchmarks

• Benchmark 1: generalized Rosenbrock function
– 30 real parameters
– Minimize objective function
– Expensive only because of noise

• Benchmark 2: obstacle avoidance on a robot
– 22 real parameters
– Maximize objective function
– Expensive because of noise and evaluation time

Benchmark 1: Gaussian Additive Noise on Generalized Rosenbrock

Fair test: same number of evaluations of candidate solutions for all algorithms (i.e., n generations/iterations of the standard versions compared with n/2 of the noise-resistant ones)

Benchmark 2: Obstacle Avoidance on a Mobile Robot

• Similar to [Floreano and Mondada 1996]
– Discrete-time, single-layer, artificial recurrent neural network controller
– Shaping of neural weights and biases (22 real parameters)
– Fitness function: rewards speed, straight movement, avoiding obstacles
• Different from [Floreano and Mondada 1996]
– Environment: bounded open space of 2x2 m instead of a maze

V = average wheel speed, Δv = difference between wheel speeds, i = value of most active proximity sensor

Actual results: [Pugh J., EPFL PhD Thesis No. 4256, 2008]

Similar results: [Pugh et al, IEEE SIS 2005]

Baseline Experiment: Extended-Time Adaptation

• Compare the basic algorithms with their corresponding noise-resistant version
• Population size 100, 100 iterations, evaluation span 300 seconds (150 s for noise-resistant algorithms) → 34.7 days
• Fair test: same total evaluation time for all the algorithms
• Realistic simulation (Webots)
• Best evolved solutions averaged over 30 evolutionary runs
• Best candidate solution in the final pool selected based on 5 runs of 30 s each; performance tested over 40 runs of 30 s each
• Similar performance for all algorithms
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
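The 34.7-day figure follows directly from the experimental settings. The short calculation below reproduces it; the helper name `total_evaluation_days` is hypothetical, and treating the noise-resistant variants as two 150 s evaluations per candidate per iteration is an assumption inferred from the halved evaluation span and the equal-time fairness condition.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_evaluation_days(population, iterations, span_s, evals_per_iteration=1):
    """Total evaluation time in days when candidates are evaluated one after another."""
    return population * iterations * span_s * evals_per_iteration / SECONDS_PER_DAY


# Standard algorithms: 100 candidates x 100 iterations x 300 s each.
standard = total_evaluation_days(100, 100, 300)

# Noise-resistant algorithms (assumed): two evaluations of 150 s per candidate per iteration.
noise_resistant = total_evaluation_days(100, 100, 150, evals_per_iteration=2)

print(f"standard: {standard:.1f} days, noise-resistant: {noise_resistant:.1f} days")
```

Both settings amount to 3,000,000 s of evaluation, which is what makes the comparison fair in terms of total evaluation time.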

Where Can Noise-Resistant Algorithms Make the Difference?

• Limited adaptation time
• Sudden change in noise distribution
• Large amount of noise

Notes:
• all examples from shaping obstacle avoidance behavior
• best evolved solution averaged over multiple runs
• fair tests: same total amount of evaluation time for all the different algorithms (standard and noise-resistant)

Limited-Time Adaptation
• Total adaptation time = 8.3 hours (1/100 of previous learning time)
• Trade-offs: population size, number of iterations, evaluation span, number of re-evaluations
• Realistic simulation (Webots)

Figure: varying population size vs. number of iterations; especially good with small populations
[Pugh J., EPFL PhD Thesis No. 4256, 2008]

Sudden Change of Noise Distributions
• Move from realistic simulation (Webots) to real robots after 90% learning (even faster evolution)
• Noise-resistance helps speed up learning after the transition

[Pugh J., EPFL PhD Thesis No. 4256, 2008]

Increased Noise Level – Set-Up

• Scenario 1: One robot learning obstacle avoidance

• Scenario 2: One robot learning obstacle avoidance, one robot running pre-evolved obstacle avoidance

• Scenario 3: Two robots co-learning obstacle avoidance

Figure: 1x1 m arena, PSO, 50 iterations, scenario 3

Idea: more robots, more noise (as perceived from an individual robot); no “standard” com between the robots, but in scenario 3 information sharing through the population manager!

[Pugh et al, IEEE SIS 2005]

Increased Noise Level – Sample Results

[Pugh et al, IEEE SIS 2005]

From Single to Multi-Unit Systems: Co-Adaptation in a Shared World

Adaptation in Multi-Robot Scenarios

• Collective: fitness becomes noisy due to partial perception, independent parallel actions

Credit Assignment Problem
With limited communication, no communication at all, or partial perception:

Co-Adaptation in a Collaborative Framework

Co-Shaping Collaborative Behavior

Three orthogonal axes to consider (extremities and balanced solutions are possible):

1. Performance evaluation: individual vs. group fitness or reinforcement

2. Solution sharing: private vs. public policies

3. Team diversity: homogeneous vs. heterogeneous learning

Multi-Agent Algorithms for Multi-Robot Systems

Policy    Performance   Sharing   Diversity
i-pr-he   individual    private   heterogeneous
i-pr-ho   individual    private   homogeneous
i-pu-he   individual    public    heterogeneous
i-pu-ho   individual    public    homogeneous
g-pr-he   group         private   heterogeneous
g-pr-ho   group         private   homogeneous
g-pu-he   group         public    heterogeneous
g-pu-ho   group         public    homogeneous
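The eight policy labels are simply the Cartesian product of the three binary axes (performance evaluation, solution sharing, team diversity). A short sketch that regenerates the table, with the dictionary and function names chosen here purely for illustration:

```python
from itertools import product

# Abbreviation -> full name for each of the three axes.
PERFORMANCE = {"i": "individual", "g": "group"}
SHARING = {"pr": "private", "pu": "public"}
DIVERSITY = {"he": "heterogeneous", "ho": "homogeneous"}


def policy_table():
    """Return (label, performance, sharing, diversity) for all 2 x 2 x 2 combinations."""
    rows = []
    for p, s, d in product(PERFORMANCE, SHARING, DIVERSITY):
        rows.append((f"{p}-{s}-{d}", PERFORMANCE[p], SHARING[s], DIVERSITY[d]))
    return rows


for row in policy_table():
    print("{:<9} {:<12} {:<9} {}".format(*row))
```

The iteration order of `product` matches the row order of the table, from `i-pr-he` to `g-pu-ho`.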

MA Algorithms for MR Systems

Legend for the policy table (color-coded on the original slide): do not make sense (inconsistent); interesting (consistent); possible but not scalable

MA Algorithms for MR Systems

Example of collaborative co-learning with binary encoding of 100 candidate solutions and 2 robots

Co-Shaping Competitive Behavior

fitness f1 ≠ fitness f2

Co-Shaping Obstacle Avoidance

• Effects of robotic group size
• Effects of communication constraints

MA Algorithms for MR Systems

Distributed Robotic Adaptation
• For an individual-public-heterogeneous policy:
– Standard solution: off-board adaptation using a centralized population manager
– New solution [Pugh, SIJ 2009]: on-board adaptation using a distributed population manager
• Fully scalable solution, no single point of failure
• Why PSO:
– appears interesting since this metaheuristic works well with small pools of candidate solutions: candidate pool size ≈ robot team size
– limited particle neighborhood sizes → scalable, on-board operation

Varying the Robotic Group Size

Varying the Robotic Group Size – Evolving vs. Testing Environment

• PSO only; number of particles = 10

• Gradually increase number of robots on team

• Up to 10x faster learning with little performance loss

• Arena 3x3 m

[Pugh and Martinoli, Swarm Intelligence J., 2009]

Communication-Based Neighborhoods

• Default neighborhood: ring topology, 2 neighbors for each robot

• Problem for real robots: neighbor could be very far away

• Possible solutions: use two closest robots in the arena (capacity limitation), use all robots within some radius r (range limitation); reality is often affected by both

• How will this affect the performance?

Communication-Based Neighborhoods

Figures: Ring Topology (Standard); 2-closest (Model A); Within range r (Model B)

Communication-Based Neighborhoods
• Realistic simulation results (Webots)
• 10 Khepera III robots (one particle per robot)
• Standard neighborhood is 2 fixed particles
• New particle neighborhoods defined by communication limitations:
– Model A: particles from closest two robots
– Model B: particles from all robots within range r = 120 cm
• No drawback to communication-based neighborhoods
[Pugh and Martinoli, Swarm Intelligence J., 2009]
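The two communication-based neighborhood models can be sketched from robot positions alone. The code below is an illustration under assumptions: positions are 2-D coordinates in meters, distances are Euclidean, and the function names `closest_two` and `within_range` are hypothetical. Both functions return only the other robots; the robot's own particle would be added by the caller.

```python
import math


def closest_two(positions, i):
    """Model A: indices of the two robots closest to robot i (capacity limitation)."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:2]


def within_range(positions, i, r=1.2):
    """Model B: indices of all robots within distance r of robot i (range limitation)."""
    return [j for j in range(len(positions))
            if j != i and math.dist(positions[i], positions[j]) <= r]
```

Unlike the fixed ring topology, both neighborhoods change as the robots move, and under Model B a robot may temporarily have no neighbors at all.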

Varying the Communication Range
Arena: 3x3 m, 10 robots total

• Realistic simulation results (Webots)
• Low communication ranges → learning too slowly
• High communication ranges → fitness plateaus too early
• Limited communication range gives best performance → scalable solution


Distributed Adaptation with Real Robots – Enabling Technology

• On-board relative positioning system
• Principle:
– belt of IR emitters (LED) and receivers (photodiode)
– IR LEDs used as antennas; modulated light (carrier 10.7 MHz)
– RF chip behind, measured RSSI
– Measure range & bearing of the next robot (relative positioning); can be coupled with RF channel (e.g., 802.11) for heading assessment
– Can also be used as a 20 kbit/s com channel

• [Pugh et al., IEEE Trans. on Mechatronics, 2009]

Distributed Adaptation with Real Robots – Sample Results

• All operations on-board using relative positioning system
• Effective learning of obstacle avoidance behavior
• Similar results for communication-based neighborhoods
• Arena 3 x 3 m


Distributed Adaptation with Real Robots

Before adaptation (5x speed-up)

Distributed Adaptation with Real Robots

After adaptation (5x speed-up)

Conclusion

Take Home Messages
• Evaluative machine-learning techniques are key for robotic adaptation
• Computationally efficient, noise-resistant algorithms can be obtained with a simple aggregation criterion in the main algorithmic loop
• Several successful strategies have been designed for dealing with collective-specific problems; they differentiate along three axes: public/private, homogeneous/heterogeneous, individual/group
• Multi-robot platforms can be exploited for testing multiple candidate solutions in parallel
• PSO appears to be well suited for fully distributed on-board operation, and strategies based on local, proximity-based neighborhoods have been successfully tested

Additional Literature – Part I

Books
• Kennedy J. and Eberhart R. C. with Y. Shi, “Swarm Intelligence”. Morgan Kaufmann Publisher, 2001.
• Clerc M., “Particle Swarm Optimization”. ISTE Ltd., London, UK, 2006.
• Engelbrecht A. P., “Fundamentals of Computational Swarm Intelligence”. John Wiley & Sons, 2006.

Papers
• Pugh J. and Martinoli A., “Distributed Scalable Multi-Robot Learning using Particle Swarm Optimization”. Swarm Intelligence Journal, 3(3): 203-222, 2009.
• Pugh J., Raemy X., Favre C., Falconi R., and Martinoli A., “A Fast On-Board Relative Positioning Module for Multi-Robot Systems”. Special issue on Mechatronics in Multi-Robot Systems, Chow M.-Y., Chiaverini S., Kitts C., editors, IEEE Trans. on Mechatronics, 14(2): 151-162, 2009.
• Murciano A. and Millán J. del R., “Specialization in Multi-Agent Systems Through Learning”. Biological Cybernetics, 76: 375-382, 1997.
• Mataric, M. J., “Learning in behavior-based multi-robot systems: Policies, models, and other agents”. Special Issue on Multi-disciplinary studies of multi-agent learning, Ron Sun, editor, Cognitive Systems Research, 2(1): 81-93, 2001.
