Optimal Design for Synchronization & Games on Communication … talks/2011 UTD Lewis- optimal... ·...

UTA Research Institute (UTARI)The University of Texas at Arlington

F.L. LewisMoncrief-O’Donnell Endowed Chair

Head, Controls & Sensors Group

Optimal Design for Synchronization &Games on Communication Graphs

Supported by AFOSR, NSF, AROThanks to Jie Huang


F.L. Lewis

http://ARRI.uta.edu/acs

Cooperative Control Synchronization: Optimal Design and Games on Communication Graphs

Supported by :NSF ‐ PAUL WERBOSARO, AFOSR

Invited by Xu Shengyuan

J.J. Finnigan, Complex science for a complex world

The Internet

ecosystem ProfessionalCollaboration network

Barcelona rail network

Structure of Natural andManmade Systems

Local nature of Physical LawsPeer-to-Peer Relationships

in networked systems

Clusters of galaxies

Motions of Biological Groups

Fishschool

Birdsflock

Locustsswarm

Firefliessynchronize

Local / Peer-to-Peer Relationships in socio-biological systems

Outline

Optimal Design for Synchronization of Cooperative Systems

Distributed Observer and Dynamic Regulator

Discrete-time Optimal Design for Synchronization

Graphical Games

Control Design Methodsfor Multi‐Agent Systems

Acks. to:Guanrong Chen – Pinning controlLihua Xie - Local nbhd. tracking errorZhihua Qu - Lyapunov eq. for di-graphs

Communication Graph

N nodes

G=(V,E)

State at node i is ( )ix t

Synchronization problem

( ) ( ) 0i jx t x t

Communication Graph

1

2

3

4

56

N nodes

[ ]ijA a

0 ( , )ij j i

i

a if v v E

if j N

oN1

Noi ji

jd a

Out-neighbors of node iCol sum= out-degree

42a

Adjacency matrix0 0 1 0 0 01 0 0 0 0 11 1 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 0 1 1 0

A

iN1

N

i ijj

d a

In-neighbors of node iRow sum= in-degreei

G=(V,E)

i

Social Standing

Standard Control Protocol with Linear Integrator SystemEach node has an associated state i ix uStandard local voting protocol ( )

i

i ij j ij N

u a x x

( )u Dx Ax D A x Lx L=D‐A = graph Laplacian matrix

x Lx If x is an n‐vector then ( )nx L I x

1

N

uu

u

1

N

dD

d

Closed‐loop dynamics

i

j

1

N

xx

x

L has e‐val at zero, simple if exists a spanning tree

Type I system

Modal decomposition 2 1 22 2 1 1 2 2

1( ) (0) (0) (0) 1 (0)

Nt t tT T T

j jj

x t v e w x v e w x v e w x x

2 determines the rate of convergence ‐ Fiedler e‐value

1 1 2Tw determines the consensus value in terms of the initial conditions

No freedom to determine the consensus value

We call this the Cooperative Regulator Problem

Standard Control Protocol with Linear Integrator SystemEach node has an associated state i ix u

Standard local voting protocol ( )i

i ij j ij N

u a x x

1

1i i

i i ij ij j i i i iNj N j N

N

xu x a a x d x a a

x

( )u Dx Ax D A x Lx L=D‐A = graph Laplacian matrix

x Lx

x

1

N

uu

u

1

N

dD

d

Closed‐loop dynamics

i

j

L has row sum zero implies 1st right eigenvector is 1 1 1 1 Tv

1 1

( ) (0) (0) (0)i i

N Nt tLt T T

i i i ij j

x t e x w e v x w x e v

Convergence Value and Rate

x Lx Closed‐loop system with local voting protocol

Modal decomposition

Let be simple. Then for large t1 0

2 1 22 2 1 1 2 2

1( ) (0) (0) (0) 1 (0)

Nt t tT T T

j jj

x t v e w x v e w x v e w x x

2 determines the rate of convergence ‐ Fiedler e‐value

1 1 2Tw determines the consensus value in terms of the initial conditions

Depends on Communication Graph TopologyNo freedom to determine the consensus value

L has e‐val at zero

We call this the Cooperative Regulator Problem

is simple if the graph is strongly connected1 0

We want Design Freedom that overcomes graph topology constraints

SVFB Design for Synchronization

Cooperative Observers

Duality on Graphs

Dynamic Regulators

Decouple Control Design from Graph Topology constraints

For Directed graphs

Hongwei Zhang, F.L. Lewis, and Abhijit Das“Optimal design for synchronization of cooperative systems: state feedback,observer and output feedback”IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

i i ix Ax Bu

A. State Feedback Design for Cooperative Systems on Graphs

Cooperative Regulator vs. Cooperative Tracker problemN nodes with dynamics

Synchronization Tracker design problem 0( ) ( ),ix t x t i

0 0x AxControl node or Command generator (Exosystem)

0( ) ( )i

i ij j i i ij N

e x x g x x

0n ne L G I x x L G I

1 2 ,TT T T nN

Ne R 0 0 ,nNx Ix R 1 nN nnI I R

0nNx x R

Local neighborhood tracking error

Overall error vector

Consensus or synchronization error

where

= Local quantity

= Global quantity

i

j

, ,n mi ix R u R x0(t)

Ron Chen- pinning control Lihua Xie- error

0n ne L G I x x L G I

Local quantity Global quantity

Local control objectives imply global performance

Local Neighborhood Tracking Error

0( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

0( ) ( )i

i i i i ij j i i ij N

x Ax Bu Ax cBK e x x g x x

0( ) ( ) ( )Nx I A c L G BK x c L G BK x

0Gr Grc cx A x B x

( ) ( ) GN cI A c L G BK A

Closed loop system

Overall c.l. dynamics

Global synch. error dynamics

Fax and Murray 2004

1 2 ,TT T T nN

Nx x x x R Overall state

Graph structure Control structure

Coop. nbhd SVFB

MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE

Lewis and Syrmos1995

OPTIMALDesign at Each node


DECOUPLES CONTROL DESIGN FROM COMMUNICATION GRAPH STRUCTURE


Emre Tuna 2008 paper online

OPTIMAL Design at each node gives global guaranteed performance on any strongly connected communication graph


Example: Unbounded Region of Consensus for Optimal Feedback Gains.

2 1 1,

2 1 0A B

0.5 0.5K

b. Unbounded Consensus Region forOptimal SVFB Gain

a. Bounded Consensus Region forArbitrarily Chosen Stabilizing SVFB Gain

Q=I, R=1

1.544 1.8901K

Example from [Li, Duan, Chen 2009]

Im{ }

Re{ }

Im{ }

Re{ }

A c BK E-vals of (L+G)

1

2 3

4 5 6

12

3

4 5

6

Graph Eigenvalues for Different Communication Topologies

Directed Tree-Chain of command

Directed Ring-Gossip networkOSCILLATIONS

1. Cooperative State Feedback, Observers, Duality, and Optimal Design for Synchronization

i i ix Ax Bu

0( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

0( ) ( ) ( )Nx I A c L G BK x c L G BK x

Thm 1. Design of SVFB Gain for Coop. Tracking Stability

1 10 ,T T TA P PA Q PBR B P K R B P

1min Re( ( ))ii

cL G

Unbounded Region of Consensus for Optimal Gains if coupling gain is

i iy Cx

0 ˆ( ) ( ) ,i

oi ij j i i i i i i i

j Ne y y g y y y y y

ˆ ˆ oi i i ix Ax Bu cF

ˆ ˆ( ) ( ) ( ) ( )N Nx I A c L G FC x c L G F y I B u

Thm 2. Design of Observer Gain for Coop. Estimation

1 10 ,T T TAP PA Q PC R CP F PC R

Cooperative system node dynamics

State feedback with local neighborhood tracking error

Overall Cooperative Team Dynamics

Output measurements at each node

Local nbhd. estimation error

Use local coop. observer dynamics at each node

Overall Team Observer/Sensor Fusion Dynamics

Use OPTIMAL feedback gain

Then synchronization is achieved for ANY strongly connected digraph

Use OPTIMAL observer gain

Then estimates converge for ANY strongly connected digraph

CONTROL DISTRIBUTED ESTIMATION & SENSOR FUSION

Results: Distributed dynamic regulator for synchronization of teams using only output measurementsDuality structure theory extended to networked cooperative feedback systems on graphsOptimal Design yields synchronization on ANY strongly connected Communication di‐graph

Pinning control node

Duality

B. Observer Design for Cooperative Systems on Graphs

,i i ix Ax Bu i iy Cx

ˆ ˆ,i i i i i ix x x y y y

ˆ ( ) ,nix t R ˆ ˆ( ) ( ) p

i iy t Cx t R

0 0 ,x Ax 0 0y Cx

Cooperative Observer design problem

N nodes with dynamics

State and output estimates

State and output estimation errors

Control node or Command generator dynamics

( ) 0,ix t i ˆ ( ) ( ),i ix t x t i or

0( ) ( )i

oi ij j i i i

j Ne y y g y y

Local neighborhood estimation error

0o

n ne L G I y y L G I y L G C x Ren; Beard, Kingston 2008

Overall estimation error

i

j

Local node observersˆ ˆ o

i i i ix Ax Bu cF

0ˆ ˆ ( ) ( )i

i i i ij j i i ij N

x Ax Bu cF e y y g y y

0ˆ ˆ ( ) ( )i

i i i ij j i i ij N

x Ax Bu cFC e x x g x x

ˆ ˆ( ) ( ) ( ) ( )N Nx I A c L G FC x c L G F y I B u

1 2ˆ ˆ ˆ ˆTT T T nN

Nx x x x R

ˆ ˆ ( )Gr Gro o Nx A x F y I B u

( ) ( ) GrN ox I A c L G FC x A x

Overall observer dynamics

where

Overall estimation error dynamics

Li, Duan, Ron Chen 2009

c.f. Finsler Lemma design in Li, Duan, Ron Chen 2009

Emre Tuna 2008 paper online(without observer design)


OPTIMAL Design at Each node gives guaranteed performanceOn any strongly connected communication di-graph topology

C. Control/Observer Duality on Graphs

Converse, reverse, or transpose graph = reverse edge arrows

0( ) ( ) ,i

i i ij j i i ij N

u cK cK e x x g x x

0ˆ ˆ ( ) ( )

i

i i i ij j i i ij N

x Ax Bu cFC e x x g x x

Must use local nbhd. Tracking error and local nbhd. Estimation erroror duality does not work

SVFB Observer

D. Dynamic Tracker for Synchronization of Cooperative Systems Using Output Feedback

,i i ix Ax Bu i iy Cx

0 0 ,x Ax 0 0y Cx

ˆ ˆ oi i i ix Ax Bu cF

0ˆ ˆ ( ) ( )i

i i i ij j i i ij N


0ˆ ˆ ˆ ˆ( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

0ˆ ˆ ˆ( ) ( )i

i i ij j i i ij N

x Ax cBK e x x g x x

0ˆ( ) ( ) ( )Nx I A x c L G BKx c L G BK x

0

ˆ ˆ( ) ( ) ( )

ˆ( ) ( )Nx I A c L G FC x c L G F y

c L G BKx c L G BK x

N nodes with dynamics

Command generator dynamics (exosystem)

Observers at each node

Estimated SVFB

Closed‐loop systems

Overall system/observer dynamics

Must use local nbhd. Tracking error and local nbhd. Estimation erroror it is not nice.

OPTIMAL Design at Each node gives guaranteed performanceOn any strongly connected communication di-graph topology

Three Regulator Designs

0ˆ ˆ ( ) ( )i

i i i ij j i i ij N


0ˆ ˆ ˆ ˆ( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

Nbhd Observers

Nbhd Controls

1. Neighborhood Controller and Neighborhood Observer

2. Neighborhood Controller and Local Observer

0ˆ ˆ ˆ ˆ( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

Local Observers

Nbhd Controls

ˆ ˆ .i i i ix Ax Bu cF y

3. Local Controller and Neighborhood Observer

ˆi iu Kx

0ˆ ˆ ( ) ( )i

i i i ij j i i ij N


Nbhd Observers

Local Controls

i

j

i

j

i

j

i

j

ij

ij

Distributed Systems

1i i ix k Ax k Bu k

0 01x k Ax k

0i

i ij j i i ij N

e x x g x x

11i i i iu c d g K

kBKgdckAxkx iiiii 111

11 c Nk A k I A c I D G L G BK k

GLGDI 1

, 1,k k N

Discrete‐Time Optimal Design for Synchronization

Distributed systems

Command generator

Local Nbhd Tracking Error

Local closed‐loop dynamics

Local cooperative SVFB – weighted

0 ( )k x k x k Global disagreement error dynamics

Weighted Graph Matrix

Weighted graph eigenvalues

Work with Kristian Movric, Lihua Xie, Keyou You ‐ Automatica

Decouple controls design from graph topology

1

r

c0

r0

Covering circle of graph eigenvalues

Synchronization region contains this circle

Ctrldesign

GraphProps.

Kristian Movric

Single‐Input case with Real Graph Eigenvalues

1/21/2 1 1/20max

0

( ( ) )T T Tr r Q A PB B PB B PAQc

0 max min

0 max min

.rc

If graph eigenvalues are real

u

u Ar

1For SI systems, for proper choice of Q

Mahler measure

2log uii

A intrinsic entropy rate = minimum data rate in a networked control system that enables stabilization of an unstable system – Baillieul and others.

min max/ Eigen‐ratio = ‘condition number’ of the communication graph

condition

Work on log quantization- Elia & Mitter, Lihua Xie

1

2 3

4 5 6

12

3

4 5

6


Directed Tree-Chain of command

Directed Ring-Gossip networkOSCILLATIONS


Directed graph-Better conditioned

Undirected graph-More ill-conditioned

65

34

2

1

4

5

6

2

3

1

E-vals of

B. Cooperative State Feedback, Observers, Duality, and Optimal Design for Synchronization

101 ( ) ( )

i

i i i ij j i i ij N

u c d g K e x x g x x

Thm 1. Design of SVFB Gain for Coop. Tracking Stability

( ) ( )i iy k Cx k

0 ˆ( ) ( ),i

oi ij j i i i i i i

j Ne y y g y y y y y

Thm 2. Design of Observer Gain for Coop. Estimation

Cooperative system node dynamics

State feedback with local neighborhood tracking error

Overall Cooperative Team Error Dynamics

Output measurements at each node

Local nbhd. estimation error

Use local coop. observer dynamics at each node

Overall Team Observer Error Dynamics

CONTROL DISTRIBUTED ESTIMATION & SENSOR FUSIONDuality

1i i ix k Ax k Bu k

11 c Nk A k I A c I D G L G BK k

0 ( )k x k x k

GLGDI 1, 1,k k N

1( ) 0T T T TA PA P Q A PB B PB B PA

1/21/2 1 1/2max: ( ( ) )T T Tr Q A PB B PB B PAQ

0

0

r rc

0 0,C c r

PABPBBK TT 1

0

1cc

11ˆ ˆ1 1 o

i i i i i ix k Ax k Bu k c d g F k

ˆ( ) ( ) ( )k x k x k

111 Nk I A c I D G L G FC k

10T T T TAPA P Q APC CPC CPA

1/211/2 1/2

max: T T Tobsr Q APC CPC CPA Q

obsrcr

0

0

1T TF APC CPC

10

1cc

Control Graph and Observer Graph need not be the same

C. Control/Observer Duality on Graphs

Converse, reverse, or transpose graph Gr’ = reverse edge arrows

0( ) ( ) ,i

i i ij j i i ij N

u cK cK e x x g x x

Non‐normalized SVFB and ObserverMust use local nbhd. Tracking error and local nbhd. Estimation error

or duality does not work

SVFB Observer

1ˆ ˆ1 oi i i ix k Ax k Bu k c F k

Let SVFB K stabilize coop. tracking error 1 c Nk A k I A c L G BK k

11 TT TNk I A c L G FB k

on graph Gr

Then observer gain F=KT stabilizes coop estimation error

on the reverse graph Gr’

D. Three Coop. Regulator Designs – Coop. Separation Principle

10ˆ ˆ ˆ1 ( ) ( )

i

i i i ij j i i ij N


Nbhd Observers

Nbhd Controls

1. Neighborhood Controller and Neighborhood Observer

2. Neighborhood Controller and Local Observer

Local Observers

Nbhd Controls

( 1) ( ) ( ) )ˆ (ˆi i i ik kx Ax B k y ku cF

3. Local Controller and Neighborhood Observer

ˆi iu Kx

Nbhd Observers

Local Controls

i

j

i

j

i

j

i

j

ij

ij

11 0ˆ ˆ1 1 ( ) ( )

i

i i i i i ij j i i ij N

x k Ax k Bu k c d g F e y y g y y

10ˆ ˆ ˆ1 ( ) ( )

i

i i i ij j i i ij N


11 0ˆ ˆ1 1 ( ) ( )

i

i i i i i ij j i i ij N

x k Ax k Bu k c d g F e y y g y y

Kristian Movric

max min

max min

( ) u

u

A A

Single‐Input case with Real Graph Eigenvalues

Is equivalent to max

min

( ) 1( ) 1AA

1i i ix k Ax k Bu k 0i

i ij j i i ij N

u cK e x x g x x

0( )i

i ij j i i ij N

u cF z K e x x g x x

Add stable filter

2max

min

( ) 1( ) 1AA

Filtered protocol gives synch. if

select ( )A 2

2 2 1min 2

min

( (1 ) ) ,T T TP A P I BB P A B PB

the stabilizing solution to0P

2 2 1min min( (1 ) )T TK I B PB B PA

1min min( ) ( )T z K zI A BK B

1 2

2

(1 )( )1 ( )

F zT z

Complementary sensitivity

Guoxiang Gu &Lewis – IEEE TAC

Improvement

Results:

Distributed dynamic regulator for synchronization of teams using only output measurements

Duality structure theory extended to networked cooperative feedback systems on graphs

Riccati Design yields synchronization Decouples Controls Design from Graph Properties.

max

min

( )G Graph Condition Number

Like to have ( ) 1G

min large means fast convergence

eigenratio= min

max

L.R. Varshney, “Distributed inference with costly wires”

max

min

( ) 1( ) 1AA


F.L. Lewis

http://ARRI.uta.edu/acs

Games on Communication Graphs

Supported by :NSF ‐ PAUL WERBOSARO, AFOSR

,i i i ix Ax B u

0 0x Ax

0( ) ( ),ix t x t i

0( ) ( ),i

i ij i j i ij N

e x x g x x

( ) ,nix t ( ) im

iu t

Graphical GamesSynchronization‐ Cooperative Tracker Problem

Node dynamics

Target generator dynamics


Local neighborhood tracking error (Lihua Xie)

Pinning gains (Ron Chen)0ig

x0(t)

K. Vamvoudakis and F.L. Lewis, “Graphical Games for Synchronization”Automatica, to appear.

,i i i ix Ax B u

0 0x Ax

0( ) ( ),ix t x t i

0( ) ( ),i

i ij i j i ij N

e x x g x x

( ) ,nix t ( ) im

iu t

1 2 ,TT T T nN

N 0 0nNx Ix

0 ,n nL G I x x L G I

/ ( )L G

0nNx x

1 2 ,TT T T nN

Nx x x x

Graphical GamesSynchronization‐ Cooperative Tracker Problem

Node dynamics

Target generator dynamics


Local neighborhood tracking error (Lihua Xie)

Global neighborhood tracking error

Lemma. Let graph be strongly connected and at least one pinning gain nonzero. Then

and agents synchronize iff ( ) 0t

Pinning gains (Ron Chen)0ig

x0(t)

Standard way =

( )i

i i i i i i ij j jj N

A d g B u e B u

0( ) ( )i

i ij i j i ij N

e x x g x x

12

0

( (0), , ) ( )i

T T Ti i i i i ii i i ii i j ij j

j N

J u u Q u R u u R u dt

12

0

( ( ), ( ), ( ))i i i iL t u t u t dt

12( ( )) ( )

i

T T Ti i i ii i i ii i j ij j

j Nt

V t Q u R u u R u dt

1

N

i ii

z Az B u

12

10

( (0), , ) ( )N

T Ti i i j ij j

j

J z u u z Qz u R u dt

( ) { : }i j iu t u j N

1

( , , )

( , ),

( ,{ : })

TN

i i j i

G U v

G V E v v v

v U U j N R

Graphical Game: Games on GraphsLocal nbhd. tracking error dynamics

Define Local nbhd. performance index

Local value functions for fixed policies iu

Static Graphical Game

Standard N‐player differential game

Values depend on all other agents

Value depends only on neighbors

Local agent dynamics driven by neighbors’ controls

Dynamics depend on all other agents

Values driven by neighbors’ controls

Kyriakos Vamvoudakis

1 1 11 1 2 3 1 2 1 3 13 3 3( ) ( ) ( ) coi

teamJ J J J J J J J J J

1 1 12 1 2 3 2 1 2 3 23 3 3( ) ( ) ( ) coi


1 1 13 1 2 3 3 1 3 2 33 3 3( ) ( ) ( ) coi


The objective functions of each player can be written as a team average term plus a conflict of interest term:

1 1

1 1

( ) , 1,N N

coii j i j team iN N

j j

J J J J J J i N

For N-player zero-sum games, the first term is zero, i.e. the players have no goals in common.

For N-players

Team Interest vs. Self InterestCooperation vs. Collaboration

( ), { : }i j iu t u j N

{ : , }G i ju u j N j i

* * *1 2, ,...,u u u

* * * *( , ) ( , ),i i i G i i i G iJ J u u J u u i N

* *( )i i iJ J u

( ) ( , ) ( , ' ),i i i i G i i i G iJ u J u u J u u i

* 12( ( )) min ( )

ii

T T Ti i i ii i i ii i j ij ju

j Nt


Problems with Nash Equilibrium Definition on Graphical GamesGame objective

Neighbors of node i

All other nodes in graphDef: Nash equilibrium

are in Nash equilibrium if

Counterexample. Disconnected graph

Let each node play his optimal control

Then, each agent’s cost does not depend on any other agent

Then all agents are in Nash equilibriumNote‐ this Nash is also coalition‐proof

Another example

Define

Def. Local Best response.is said to be agent i’s local best response to fixed policies of its neighbors if

* * *( , ) ( , ), ,i j G j i j G jJ u u J u u i j N

A restriction on what sorts of performance indices can be selected in multi‐player graph games.

A condition on the reaction curves (Basar and Olsder) of the agents

This rules out the disconnected counterexample.

*( , ) ( , ),i i i i i i iJ u u J u u u

*iu iu

New Definition of Nash Equilibrium for Graphical Games

* * *1 2, ,...,u u u

* * * *( , ) ( , ),i i i G i i i G iJ J u u J u u i N

Def: Interactive Nash equilibrium

are in Interactive Nash equilibrium if

2. There exists a policy such that ju

1.

That is, every player can find a policy that changes the value of every other player.

i.e. they are in Nash equilibrium

1 1 12 2 2( , , , ) ( ) 0

i i

TT T Ti i

i i i i i i i i i ij j j i ii i i ii i j ij ji i j N j N

V VH u u A d g B u e B u Q u R u u R u

10 ( ) Ti ii i i ii i

i i

H Vu d g R B

u

2 1 2 1 11 1 12 2 2( ) ( ) 0,

i

TT Tj jc T T Ti i i

i i ii i i i i ii i j j j jj ij jj ji i i j jj N

V VV V VA Q d g B R B d g B R R R B i N

2 1 1( ) ( ) ,i

jc T Tii i i i i ii i ij j j j jj j

i jj N

VVA A d g B R B e d g B R B i N

* *( , , , ) 0ii i i i

i

VH u u

* 2 11 1 12 2 20 ( , , , ) ( )

i

T Tc T T Ti i i i

i i i i i i ii i i i i ii i j ij ji i i i j N

V V V VH u u A Q d g B R B u R u

2 1( )

i

c T ii i i i i ii i ij j j

i j N

VA A d g B R B e B u

12( ( )) ( )

i

T T Ti i i ii i i ii i j ij j

j Nt


Value function

Differential equivalent (Leibniz formula) is Bellman’s Equation

Stationarity Condition

1. Coupled HJ equations

2. Best Response HJ Equations – other players have fixed policies

where

where

Graphical Game Solution Equations

ju

Theorem 3. Let (A,Bi) be reachable for all i. Let agent i be in local best response

Then are in global Interactive Nash iff the graph is strongly connected.

*( , ) ( , ),i i i i i iJ u u J u u i

* * *1 2, ,...,u u u

i

k

( ) (( ) ) ( ) (( ) )0( ) ( )

N n i i n kk kT

ii N

I A L G I diag B K L G I Bv A Bv

p pp diag Q I A

1( ) ( ) T ii i i i i ii i i i

i

Vu u V d g R B K p

k k k ku K p v

2B AB A B

Picks out the shortest path from node k to node i

A

BAB

Online Solution of Graphical Games

Use Reinforcement Learning Convergence Results

POLICY ITERATION


Motions of Biological Groups

Fishschool

Birdsflock

Locustsswarm

Firefliessynchronize

Local / Peer-to-Peer Relationships in socio-biological systems


Use Reinforcement Learning Convergence Results

POLICY ITERATION


System

Action network

Policy Evaluation(Critic network)cost

The Adaptive Critic Architecture

Adaptive Critics

Value update

Control policy update

Critic and Actor tuned simultaneouslyLeads to ONLINE FORWARD‐IN‐TIME implementation of optimal control

Policy Iteration is Reinforcement Learning

F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEECircuits & Systems Magazine, Invited Feature Article, pp. 32‐50, Third Quarter 2009.

1 1 12 2 2( , , , ) ( ) 0

i i

TT T Ti i

i i i i i i i i i ij j j i ii i i ii i j ij ji i j N j N

V VH u u A d g B u e B u Q u R u u R u

1( ) T ii i i ii i

i

Vu d g R B


Policy Iteration gives structure needed for online graph games

Solve simultaneously online:

ˆ ˆ Ti i iV W

112

ˆˆ ( )T

T ii i i ii i i N

iu d g R B W

Weierstrass Approximator structures‐ 2 at each node

actor

critic

2 11 14 4

ˆ ˆ ˆ ˆ ˆ( )i

Tj jT T T T T

i i i ii i i N i i N j j j N j jj ij jj j j Nj jj N

W Q W DW d g W B R R R B W

ˆ ˆ( ( ) )i

ii i i i i i ij j j

i j N

A d g B u e B u

Bellman equation

Bellman equation becomes an algebraic equation in the parameters

Approximate values by a Critic Network at each node i

ˆ ˆ Ti i iV W

Approximate control policies by an Actor Network at each node i

112

ˆˆ ( )T

T ii N i i ii i i N

iu d g R B W

1

2 11 14 42

ˆˆ

ˆ ˆ ˆ ˆ ˆ[ ( ) ](1 )

i

i ii

Tj jT T T T Ti

i i i i ii i i N i i N j j j N j jj ij jj j j Ni i j jj N

EW a

W

a W Q W DW d g W B R R R B W

Online Solution of Graphical Games Using Value Function Aproximation

Tuning law for Critic parameters

Tuning law for Actor parameters

2 11

1 1ˆ ˆ ˆ ˆ ˆ ˆ ˆ{( ) ( ) }4 4

ii

TT Tj jT T T Ti N i N

i N i N i i N i i N i i i N i i N j j j j jj ij jj jsi s j jj N

j i

W a F W F W DW W W d g W B R R R Bm m

Converges to solution of coupled HJ equations, and Nash equilibriumand keeps states stable while learningNeed PE of ˆ ˆ( ( ) )

i

ii i i i i i ij j j

i j N

A d g B u e B u

System

Action network

Policy Evaluation(Critic network)cost

The Adaptive Critic Architecture

Adaptive Critics

Value update

Control policy update

Critic and Actor tuned simultaneouslyLeads to ONLINE FORWARD‐IN‐TIME implementation of optimal control

A new adaptive control architecture

Optimal Adaptive Control

ˆ ˆ Ti i iV W

112

ˆˆ ( )T

T ii i i ii i i N

iu d g R B W

Can avoid knowledge of drift term f(x) by using Integral Reinforcement Learning (IRL)

Draguna Vrabie

Then HJ equations are solved online without knowing f(x)

Coupled AREs are solved online without knowing A

Lemma 1 – Draguna Vrabie

( ( )) ( , ) ( , )Td V x V f x u r x u

dt x

Proof:

( , ) ( ( )) ( ( )) ( ( ))t T t T

t t

r x u d d V x V x t V x t T

Solves Lyapunov equation without knowing f(x,u)

( ( )) ( , ) ( ( )), (0) 0t T

t

V x t r x u d V x t T V

0 ( , ) ( , ) ( , , ), (0) 0TV Vf x u r x u H x u V

x x

Allows definition of temporal difference error for CT systems

( ) ( ( )) ( , ) ( ( ))t T

t

e t V x t r x u d V x t T

Another form for the CT Bellman eq.

Is equivalent to

Integral Reinforcement Form of Bellman Equation

Draguna Vrabie

1. Natural biological groups do not measure synchronization errors !

Agents sense rough relative positions or motions

and adjust their motion accordingly

2. We want FINITE-TIME synchronization

0sgn( ) sgn( )i

i ij j i i ij N

u a x x b x x

Propose a Signum protocol

Binary, 1-bit quantization, reduced information needed

Discontinuous- makes a big mess.

Gang Chen, College of Automation Chongqing University

11

1

2 node consensus example in Wassim Haddad paper

sgn( )x uu x

u

t

1

-1

x

t

Linear state evolution

( ) ( ( ))x t f x t nx R f(x) Lebesgue measurable and locally essentially bounded

Filippov set-valued map0 ( ) 0

[ ]( ) { ( ( ) \ )}S

K f x co f B x S

Filippov solution on

An absolutely continuous function that satisfies the differential inclusion

0 1[ , ]t t

0 1: [ , ] nx t t R

( ) [ ]( )x t K f x for almost all 0 1[ , ]t t t

A Filippov solution is maximal if it cannot be extended forward in time

Set is weakly (resp. strongly) invariant if it contains a (resp. all) maximal solutions0x M nM R

Same as lim ( ) : :i i i fiK f x co f x x x y N N

,

Cortes 2008

Bacciotti & Ceragioli, 1999, Shevitz & Paden, 1994

Wassim Haddad 2008

Paden &Sastry 1987

Differential Equations with Discontinuous Right-Hand Sides

NOT necessarily Lipschitz

Caratheodory solutions work, in fact. But we need Filippov set-valued map on the next page.

: nV R R a locally Lipschitz function

the set of measure zero where gradient does not existVN

( ) {lim ( ) : , } [ ]( )i i i ViV x co V x x x x N N K V x

Clarke generalized gradient

Set-valued Lie Derivative [ ]( ) { | [ ]( ) s.t. , ( )}TfL V x a R K f x a V x

Cortes 2008


Wassim Haddad 2008Paden &Sastry 1987

Lyapunov Analysis for Discontinuous Dynamical Systems

Lyapunov function second derivative is strictly negative implies finite-time consensus

, 1, 2, ,i ix u i n

1 2 0( ) ( ) ( ) ,nx t x t x t x t T

0sgn( ) sgn( )i

i ij j i i ij N

u a x x b x x

1

-1

(

)

0sgn( ) sgn( )i

i ij j i i ij N

x a x x b x x

0i ie x x

sgn( ) sgn( )i

i ij j i i ij N

e a e e b e

Error dynamics

( ) ( )i

i ij j i i ij N

e a SGN e e b SGN e

Paden and Sastry 1987- calculus for Filippov map

sgn(.)

1

-1 SGN(.)

Node dynamics

Finite-time Synchronization in time T

Signum protocol

Closed-loop system

Synch. error

Binary, 1-bit quantization, reduced information needed

Cortes 2006, 2008


Wassim Haddad 2008

Filippov solutions (Caratheodory works)Fillipov set-valued map

11 1 1 1

1sgn( )2

n n n nT

ij i j i ij j ii j i j

a x x x a x x

Lemma 1. For undirected network one has

Theorem 1. Let the communication graph be undirected and connected. If there is at least one pinned node,

then the signum protocol solves the controlled consensus problem asymptotically.

Proof. Choose the candidate Lyapunov function

12

TV e e

1 2[ , , , ]T T T Tne e e e

[ ] [ ]TfL V K e e

1( sgn( ) sgn( ))

i

nTi ij j i i i

i j NK e a e e b e

1. Undirected Graphs Gang Chen, College of Automation Chongqing University

Interesting fact: the Lyapunov function and derivative are continuous for undirected graphs.

Proof: check out the second derivative of Lyapunov function

20[ ( )]L V t

12

TV e e

Convergence time bounded by

Depends on the initial synchronization errors

Must use Clarke generalized gradient and set-valued Lie derivative

Undirected Graphs – Finite Time Convergence

max{ }i inb d max din = max in-degree


2[ ( )]( ) 0f fL L V x Cortes 2006

2. Directed Graphs

, ,i ij j jip a p a i j

Detail-balanced

There exist such that0ip

Sum over j. Then 1 2 Np p p p is a left e-vector of L for zero e-val


11

12 22

0LF

F F

Let there exist a spanning tree.

Frobenius form of graph Laplacian L

11L

22F

1S

2S

3S

4S

2,max{ }ind

1,max{ }ind3

4,jll S

a some j S

Proof: first show that the leaders reach consensus in finite time, then show followers do so also.2

1 0 1, 1[ ( )] iL V t p Convergence times bounded like

leaders

followers

Must use Clarke generalized gradient and set-valued Lie derivative

Finite Time Tracking Control

0 0( , )x g t xCommand generator node (exosystem)

0 01 sgn( ) , sgn( )

i

i

i ij j j i i ij Nij i

j N

u a u x x b g t x x xa b

Idea from Lihua Xie and Liu Shuai -allows tracking a moving leader with FIRST-ORDER dynamics

i ix u

0 0 01 sgn ( ) ( ) ( ( , ) sgn(( ) ( )))

i

i

i ij j j j i i i i ij Nij i

j N


Finite Time Formation Control

Idea from Lihua Xie and Liu Shuai -formation position offsets


Node dynamics

Control protocol

Finite Time Tracking Control

0 0( , )x g t xCommand generator node (exosystem)

0 01 sgn( ) , sgn( )

i

i

i ij j j i i ij Nij i

j N


i ix u

0 0 01 sgn ( ) ( ) ( ( , ) sgn(( ) ( )))

i

i

i ij j j j i i i i ij Nij i

j N


Finite Time Formation Control


Node dynamics

Control protocol

What does Finite-Time Consensus Look Like? - Simulation

leaders

followers

Finite Time Consensus!

Communication Topology


Leaders reach consensus

Optimal Design for Synchronization & Games on Communication … talks/2011 UTD Lewis- optimal... ·...

Documents

Transcript of Optimal Design for Synchronization & Games on Communication … talks/2011 UTD Lewis- optimal... ·...