Optimal Design for Synchronization & Games on Communication … talks/2011 UTD Lewis- optimal... ·...
Transcript of Optimal Design for Synchronization & Games on Communication … talks/2011 UTD Lewis- optimal... ·...
UTA Research Institute (UTARI)The University of Texas at Arlington
F.L. LewisMoncrief-O’Donnell Endowed Chair
Head, Controls & Sensors Group
Optimal Design for Synchronization &Games on Communication Graphs
Supported by AFOSR, NSF, AROThanks to Jie Huang
UTA Research Institute (UTARI)The University of Texas at Arlington
F.L. Lewis
http://ARRI.uta.edu/acs
Cooperative Control Synchronization: Optimal Design and Games on Communication Graphs
Supported by :NSF ‐ PAUL WERBOSARO, AFOSR
Invited by Xu Shengyuan
J.J. Finnigan, Complex science for a complex world
The Internet
ecosystem ProfessionalCollaboration network
Barcelona rail network
Structure of Natural andManmade Systems
Local nature of Physical LawsPeer-to-Peer Relationships
in networked systems
Clusters of galaxies
Motions of Biological Groups
Fishschool
Birdsflock
Locustsswarm
Firefliessynchronize
Local / Peer-to-Peer Relationships in socio-biological systems
Outline
Optimal Design for Synchronization of Cooperative Systems
Distributed Observer and Dynamic Regulator
Discrete-time Optimal Design for Synchronization
Graphical Games
Control Design Methodsfor Multi‐Agent Systems
Acks. to:Guanrong Chen – Pinning controlLihua Xie - Local nbhd. tracking errorZhihua Qu - Lyapunov eq. for di-graphs
Communication Graph
N nodes
G=(V,E)
State at node i is ( )ix t
Synchronization problem
( ) ( ) 0i jx t x t
Communication Graph
1
2
3
4
56
N nodes
[ ]ijA a
0 ( , )ij j i
i
a if v v E
if j N
oN1
Noi ji
jd a
Out-neighbors of node iCol sum= out-degree
42a
Adjacency matrix0 0 1 0 0 01 0 0 0 0 11 1 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 0 1 1 0
A
iN1
N
i ijj
d a
In-neighbors of node iRow sum= in-degreei
G=(V,E)
i
Social Standing
Standard Control Protocol with Linear Integrator SystemEach node has an associated state i ix uStandard local voting protocol ( )
i
i ij j ij N
u a x x
( )u Dx Ax D A x Lx L=D‐A = graph Laplacian matrix
x Lx If x is an n‐vector then ( )nx L I x
1
N
uu
u
1
N
dD
d
Closed‐loop dynamics
i
j
1
N
xx
x
L has e‐val at zero, simple if exists a spanning tree
Type I system
Modal decomposition 2 1 22 2 1 1 2 2
1( ) (0) (0) (0) 1 (0)
Nt t tT T T
j jj
x t v e w x v e w x v e w x x
2 determines the rate of convergence ‐ Fiedler e‐value
1 1 2Tw determines the consensus value in terms of the initial conditions
No freedom to determine the consensus value
We call this the Cooperative Regulator Problem
Standard Control Protocol with Linear Integrator SystemEach node has an associated state i ix u
Standard local voting protocol ( )i
i ij j ij N
u a x x
1
1i i
i i ij ij j i i i iNj N j N
N
xu x a a x d x a a
x
( )u Dx Ax D A x Lx L=D‐A = graph Laplacian matrix
x Lx
x
1
N
uu
u
1
N
dD
d
Closed‐loop dynamics
i
j
L has row sum zero implies 1st right eigenvector is 1 1 1 1 Tv
1 1
( ) (0) (0) (0)i i
N Nt tLt T T
i i i ij j
x t e x w e v x w x e v
Convergence Value and Rate
x Lx Closed‐loop system with local voting protocol
Modal decomposition
Let be simple. Then for large t1 0
2 1 22 2 1 1 2 2
1( ) (0) (0) (0) 1 (0)
Nt t tT T T
j jj
x t v e w x v e w x v e w x x
2 determines the rate of convergence ‐ Fiedler e‐value
1 1 2Tw determines the consensus value in terms of the initial conditions
Depends on Communication Graph TopologyNo freedom to determine the consensus value
L has e‐val at zero
We call this the Cooperative Regulator Problem
is simple if the graph is strongly connected1 0
We want Design Freedom that overcomes graph topology constraints
SVFB Design for Synchronization
Cooperative Observers
Duality on Graphs
Dynamic Regulators
Decouple Control Design from Graph Topology constraints
For Directed graphs
Hongwei Zhang, F.L. Lewis, and Abhijit Das“Optimal design for synchronization of cooperative systems: state feedback,observer and output feedback”IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.
i i ix Ax Bu
A. State Feedback Design for Cooperative Systems on Graphs
Cooperative Regulator vs. Cooperative Tracker problemN nodes with dynamics
Synchronization Tracker design problem 0( ) ( ),ix t x t i
0 0x AxControl node or Command generator (Exosystem)
0( ) ( )i
i ij j i i ij N
e x x g x x
0n ne L G I x x L G I
1 2 ,TT T T nN
Ne R 0 0 ,nNx Ix R 1 nN nnI I R
0nNx x R
Local neighborhood tracking error
Overall error vector
Consensus or synchronization error
where
= Local quantity
= Global quantity
i
j
, ,n mi ix R u R x0(t)
Ron Chen- pinning control Lihua Xie- error
0n ne L G I x x L G I
Local quantity Global quantity
Local control objectives imply global performance
Local Neighborhood Tracking Error
0( ) ( )i
i i ij j i i ij N
u cK cK e x x g x x
0( ) ( )i
i i i i ij j i i ij N
x Ax Bu Ax cBK e x x g x x
0( ) ( ) ( )Nx I A c L G BK x c L G BK x
0Gr Grc cx A x B x
( ) ( ) GN cI A c L G BK A
Closed loop system
Overall c.l. dynamics
Global synch. error dynamics
Fax and Murray 2004
1 2 ,TT T T nN
Nx x x x R Overall state
Graph structure Control structure
Coop. nbhd SVFB
MIXES UP CONTROL DESIGN AND GRAPH STRUCTURE
Lewis and Syrmos1995
OPTIMALDesign at Each node
Lewis and Syrmos1995
DECOUPLES CONTROL DESIGN FROM COMMUNICATION GRAPH STRUCTURE
Lewis and Syrmos1995
Emre Tuna 2008 paper online
OPTIMAL Design at each node gives global guaranteed performance on any strongly connected communication graph
OPTIMALDesign at Each node
Example: Unbounded Region of Consensus for Optimal Feedback Gains.
2 1 1,
2 1 0A B
0.5 0.5K
b. Unbounded Consensus Region forOptimal SVFB Gain
a. Bounded Consensus Region forArbitrarily Chosen Stabilizing SVFB Gain
Q=I, R=1
1.544 1.8901K
Example from [Li, Duan, Chen 2009]
Im{ }
Re{ }
Im{ }
Re{ }
A c BK E-vals of (L+G)
1
2 3
4 5 6
12
3
4 5
6
Graph Eigenvalues for Different Communication Topologies
Directed Tree-Chain of command
Directed Ring-Gossip networkOSCILLATIONS
1. Cooperative State Feedback, Observers, Duality, and Optimal Design for Synchronization
i i ix Ax Bu
0( ) ( )i
i i ij j i i ij N
u cK cK e x x g x x
0( ) ( ) ( )Nx I A c L G BK x c L G BK x
Thm 1. Design of SVFB Gain for Coop. Tracking Stability
1 10 ,T T TA P PA Q PBR B P K R B P
1min Re( ( ))ii
cL G
Unbounded Region of Consensus for Optimal Gains if coupling gain is
i iy Cx
0 ˆ( ) ( ) ,i
oi ij j i i i i i i i
j Ne y y g y y y y y
ˆ ˆ oi i i ix Ax Bu cF
ˆ ˆ( ) ( ) ( ) ( )N Nx I A c L G FC x c L G F y I B u
Thm 2. Design of Observer Gain for Coop. Estimation
1 10 ,T T TAP PA Q PC R CP F PC R
Cooperative system node dynamics
State feedback with local neighborhood tracking error
Overall Cooperative Team Dynamics
Output measurements at each node
Local nbhd. estimation error
Use local coop. observer dynamics at each node
Overall Team Observer/Sensor Fusion Dynamics
Use OPTIMAL feedback gain
Then synchronization is achieved for ANY strongly connected digraph
Use OPTIMAL observer gain
Then estimates converge for ANY strongly connected digraph
CONTROL DISTRIBUTED ESTIMATION & SENSOR FUSION
Results: Distributed dynamic regulator for synchronization of teams using only output measurementsDuality structure theory extended to networked cooperative feedback systems on graphsOptimal Design yields synchronization on ANY strongly connected Communication di‐graph
Pinning control node
Duality
B. Observer Design for Cooperative Systems on Graphs
,i i ix Ax Bu i iy Cx
ˆ ˆ,i i i i i ix x x y y y
ˆ ( ) ,nix t R ˆ ˆ( ) ( ) p
i iy t Cx t R
0 0 ,x Ax 0 0y Cx
Cooperative Observer design problem
N nodes with dynamics
State and output estimates
State and output estimation errors
Control node or Command generator dynamics
( ) 0,ix t i ˆ ( ) ( ),i ix t x t i or
0( ) ( )i
oi ij j i i i
j Ne y y g y y
Local neighborhood estimation error
0o
n ne L G I y y L G I y L G C x Ren; Beard, Kingston 2008
Overall estimation error
i
j
Local node observersˆ ˆ o
i i i ix Ax Bu cF
0ˆ ˆ ( ) ( )i
i i i ij j i i ij N
x Ax Bu cF e y y g y y
0ˆ ˆ ( ) ( )i
i i i ij j i i ij N
x Ax Bu cFC e x x g x x
ˆ ˆ( ) ( ) ( ) ( )N Nx I A c L G FC x c L G F y I B u
1 2ˆ ˆ ˆ ˆTT T T nN
Nx x x x R
ˆ ˆ ( )Gr Gro o Nx A x F y I B u
( ) ( ) GrN ox I A c L G FC x A x
Overall observer dynamics
where
Overall estimation error dynamics
Li, Duan, Ron Chen 2009
c.f. Finsler Lemma design in Li, Duan, Ron Chen 2009
Emre Tuna 2008 paper online(without observer design)
OPTIMALDesign at Each node
OPTIMAL Design at Each node gives guaranteed performanceOn any strongly connected communication di-graph topology
C. Control/Observer Duality on Graphs
Converse, reverse, or transpose graph = reverse edge arrows
0( ) ( ) ,i
i i ij j i i ij N
u cK cK e x x g x x
0ˆ ˆ ( ) ( )
i
i i i ij j i i ij N
x Ax Bu cFC e x x g x x
Must use local nbhd. Tracking error and local nbhd. Estimation erroror duality does not work
SVFB Observer
D. Dynamic Tracker for Synchronization of Cooperative Systems Using Output Feedback
,i i ix Ax Bu i iy Cx
0 0 ,x Ax 0 0y Cx
ˆ ˆ oi i i ix Ax Bu cF
0ˆ ˆ ( ) ( )i
i i i ij j i i ij N
x Ax Bu cF e y y g y y
0ˆ ˆ ˆ ˆ( ) ( )i
i i ij j i i ij N
u cK cK e x x g x x
0ˆ ˆ ˆ( ) ( )i
i i ij j i i ij N
x Ax cBK e x x g x x
0ˆ( ) ( ) ( )Nx I A x c L G BKx c L G BK x
0
ˆ ˆ( ) ( ) ( )
ˆ( ) ( )Nx I A c L G FC x c L G F y
c L G BKx c L G BK x
N nodes with dynamics
Command generator dynamics (exosystem)
Observers at each node
Estimated SVFB
Closed‐loop systems
Overall system/observer dynamics
Must use local nbhd. Tracking error and local nbhd. Estimation erroror it is not nice.
OPTIMAL Design at Each node gives guaranteed performanceOn any strongly connected communication di-graph topology
Three Regulator Designs
0ˆ ˆ ( ) ( )i
i i i ij j i i ij N
x Ax Bu cF e y y g y y
0ˆ ˆ ˆ ˆ( ) ( )i
i i ij j i i ij N
u cK cK e x x g x x
Nbhd Observers
Nbhd Controls
1. Neighborhood Controller and Neighborhood Observer
2. Neighborhood Controller and Local Observer
0ˆ ˆ ˆ ˆ( ) ( )i
i i ij j i i ij N
u cK cK e x x g x x
Local Observers
Nbhd Controls
ˆ ˆ .i i i ix Ax Bu cF y
3. Local Controller and Neighborhood Observer
ˆi iu Kx
0ˆ ˆ ( ) ( )i
i i i ij j i i ij N
x Ax Bu cF e y y g y y
Nbhd Observers
Local Controls
i
j
i
j
i
j
i
j
ij
ij
Distributed Systems
1i i ix k Ax k Bu k
0 01x k Ax k
0i
i ij j i i ij N
e x x g x x
11i i i iu c d g K
kBKgdckAxkx iiiii 111
11 c Nk A k I A c I D G L G BK k
GLGDI 1
, 1,k k N
Discrete‐Time Optimal Design for Synchronization
Distributed systems
Command generator
Local Nbhd Tracking Error
Local closed‐loop dynamics
Local cooperative SVFB – weighted
0 ( )k x k x k Global disagreement error dynamics
Weighted Graph Matrix
Weighted graph eigenvalues
Work with Kristian Movric, Lihua Xie, Keyou You ‐ Automatica
Decouple controls design from graph topology
1
r
c0
r0
Covering circle of graph eigenvalues
Synchronization region contains this circle
Ctrldesign
GraphProps.
Kristian Movric
Single‐Input case with Real Graph Eigenvalues
1/21/2 1 1/20max
0
( ( ) )T T Tr r Q A PB B PB B PAQc
0 max min
0 max min
.rc
If graph eigenvalues are real
u
u Ar
1For SI systems, for proper choice of Q
Mahler measure
2log uii
A intrinsic entropy rate = minimum data rate in a networked control system that enables stabilization of an unstable system – Baillieul and others.
min max/ Eigen‐ratio = ‘condition number’ of the communication graph
condition
Work on log quantization- Elia & Mitter, Lihua Xie
1
2 3
4 5 6
12
3
4 5
6
Graph Eigenvalues for Different Communication Topologies
Directed Tree-Chain of command
Directed Ring-Gossip networkOSCILLATIONS
Graph Eigenvalues for Different Communication Topologies
Directed graph-Better conditioned
Undirected graph-More ill-conditioned
65
34
2
1
4
5
6
2
3
1
E-vals of
B. Cooperative State Feedback, Observers, Duality, and Optimal Design for Synchronization
101 ( ) ( )
i
i i i ij j i i ij N
u c d g K e x x g x x
Thm 1. Design of SVFB Gain for Coop. Tracking Stability
( ) ( )i iy k Cx k
0 ˆ( ) ( ),i
oi ij j i i i i i i
j Ne y y g y y y y y
Thm 2. Design of Observer Gain for Coop. Estimation
Cooperative system node dynamics
State feedback with local neighborhood tracking error
Overall Cooperative Team Error Dynamics
Output measurements at each node
Local nbhd. estimation error
Use local coop. observer dynamics at each node
Overall Team Observer Error Dynamics
CONTROL DISTRIBUTED ESTIMATION & SENSOR FUSIONDuality
1i i ix k Ax k Bu k
11 c Nk A k I A c I D G L G BK k
0 ( )k x k x k
GLGDI 1, 1,k k N
1( ) 0T T T TA PA P Q A PB B PB B PA
1/21/2 1 1/2max: ( ( ) )T T Tr Q A PB B PB B PAQ
0
0
r rc
0 0,C c r
PABPBBK TT 1
0
1cc
11ˆ ˆ1 1 o
i i i i i ix k Ax k Bu k c d g F k
ˆ( ) ( ) ( )k x k x k
111 Nk I A c I D G L G FC k
10T T T TAPA P Q APC CPC CPA
1/211/2 1/2
max: T T Tobsr Q APC CPC CPA Q
obsrcr
0
0
1T TF APC CPC
10
1cc
Control Graph and Observer Graph need not be the same
C. Control/Observer Duality on Graphs
Converse, reverse, or transpose graph Gr’ = reverse edge arrows
0( ) ( ) ,i
i i ij j i i ij N
u cK cK e x x g x x
Non‐normalized SVFB and ObserverMust use local nbhd. Tracking error and local nbhd. Estimation error
or duality does not work
SVFB Observer
1ˆ ˆ1 oi i i ix k Ax k Bu k c F k
Let SVFB K stabilize coop. tracking error 1 c Nk A k I A c L G BK k
11 TT TNk I A c L G FB k
on graph Gr
Then observer gain F=KT stabilizes coop estimation error
on the reverse graph Gr’
D. Three Coop. Regulator Designs – Coop. Separation Principle
10ˆ ˆ ˆ1 ( ) ( )
i
i i i ij j i i ij N
u c d g K e x x g x x
Nbhd Observers
Nbhd Controls
1. Neighborhood Controller and Neighborhood Observer
2. Neighborhood Controller and Local Observer
Local Observers
Nbhd Controls
( 1) ( ) ( ) )ˆ (ˆi i i ik kx Ax B k y ku cF
3. Local Controller and Neighborhood Observer
ˆi iu Kx
Nbhd Observers
Local Controls
i
j
i
j
i
j
i
j
ij
ij
11 0ˆ ˆ1 1 ( ) ( )
i
i i i i i ij j i i ij N
x k Ax k Bu k c d g F e y y g y y
10ˆ ˆ ˆ1 ( ) ( )
i
i i i ij j i i ij N
u c d g K e x x g x x
11 0ˆ ˆ1 1 ( ) ( )
i
i i i i i ij j i i ij N
x k Ax k Bu k c d g F e y y g y y
Kristian Movric
max min
max min
( ) u
u
A A
Single‐Input case with Real Graph Eigenvalues
Is equivalent to max
min
( ) 1( ) 1AA
1i i ix k Ax k Bu k 0i
i ij j i i ij N
u cK e x x g x x
0( )i
i ij j i i ij N
u cF z K e x x g x x
Add stable filter
2max
min
( ) 1( ) 1AA
Filtered protocol gives synch. if
select ( )A 2
2 2 1min 2
min
( (1 ) ) ,T T TP A P I BB P A B PB
the stabilizing solution to0P
2 2 1min min( (1 ) )T TK I B PB B PA
1min min( ) ( )T z K zI A BK B
1 2
2
(1 )( )1 ( )
F zT z
Complementary sensitivity
Guoxiang Gu &Lewis – IEEE TAC
Improvement
Results:
Distributed dynamic regulator for synchronization of teams using only output measurements
Duality structure theory extended to networked cooperative feedback systems on graphs
Riccati Design yields synchronization Decouples Controls Design from Graph Properties.
max
min
( )G Graph Condition Number
Like to have ( ) 1G
min large means fast convergence
eigenratio= min
max
L.R. Varshney, “Distributed inference with costly wires”
max
min
( ) 1( ) 1AA
UTA Research Institute (UTARI)The University of Texas at Arlington
F.L. Lewis
http://ARRI.uta.edu/acs
Games on Communication Graphs
Supported by :NSF ‐ PAUL WERBOSARO, AFOSR
,i i i ix Ax B u
0 0x Ax
0( ) ( ),ix t x t i
0( ) ( ),i
i ij i j i ij N
e x x g x x
( ) ,nix t ( ) im
iu t
Graphical GamesSynchronization‐ Cooperative Tracker Problem
Node dynamics
Target generator dynamics
Synchronization problem
Local neighborhood tracking error (Lihua Xie)
Pinning gains (Ron Chen)0ig
x0(t)
K. Vamvoudakis and F.L. Lewis, “Graphical Games for Synchronization”Automatica, to appear.
,i i i ix Ax B u
0 0x Ax
0( ) ( ),ix t x t i
0( ) ( ),i
i ij i j i ij N
e x x g x x
( ) ,nix t ( ) im
iu t
1 2 ,TT T T nN
N 0 0nNx Ix
0 ,n nL G I x x L G I
/ ( )L G
0nNx x
1 2 ,TT T T nN
Nx x x x
Graphical GamesSynchronization‐ Cooperative Tracker Problem
Node dynamics
Target generator dynamics
Synchronization problem
Local neighborhood tracking error (Lihua Xie)
Global neighborhood tracking error
Lemma. Let graph be strongly connected and at least one pinning gain nonzero. Then
and agents synchronize iff ( ) 0t
Pinning gains (Ron Chen)0ig
x0(t)
Standard way =
( )i
i i i i i i ij j jj N
A d g B u e B u
0( ) ( )i
i ij i j i ij N
e x x g x x
12
0
( (0), , ) ( )i
T T Ti i i i i ii i i ii i j ij j
j N
J u u Q u R u u R u dt
12
0
( ( ), ( ), ( ))i i i iL t u t u t dt
12( ( )) ( )
i
T T Ti i i ii i i ii i j ij j
j Nt
V t Q u R u u R u dt
1
N
i ii
z Az B u
12
10
( (0), , ) ( )N
T Ti i i j ij j
j
J z u u z Qz u R u dt
( ) { : }i j iu t u j N
1
( , , )
( , ),
( ,{ : })
TN
i i j i
G U v
G V E v v v
v U U j N R
Graphical Game: Games on GraphsLocal nbhd. tracking error dynamics
Define Local nbhd. performance index
Local value functions for fixed policies iu
Static Graphical Game
Standard N‐player differential game
Values depend on all other agents
Value depends only on neighbors
Local agent dynamics driven by neighbors’ controls
Dynamics depend on all other agents
Values driven by neighbors’ controls
Kyriakos Vamvoudakis
1 1 11 1 2 3 1 2 1 3 13 3 3( ) ( ) ( ) coi
teamJ J J J J J J J J J
1 1 12 1 2 3 2 1 2 3 23 3 3( ) ( ) ( ) coi
teamJ J J J J J J J J J
1 1 13 1 2 3 3 1 3 2 33 3 3( ) ( ) ( ) coi
teamJ J J J J J J J J J
The objective functions of each player can be written as a team average term plus a conflict of interest term:
1 1
1 1
( ) , 1,N N
coii j i j team iN N
j j
J J J J J J i N
For N-player zero-sum games, the first term is zero, i.e. the players have no goals in common.
For N-players
Team Interest vs. Self InterestCooperation vs. Collaboration
( ), { : }i j iu t u j N
{ : , }G i ju u j N j i
* * *1 2, ,...,u u u
* * * *( , ) ( , ),i i i G i i i G iJ J u u J u u i N
* *( )i i iJ J u
( ) ( , ) ( , ' ),i i i i G i i i G iJ u J u u J u u i
* 12( ( )) min ( )
ii
T T Ti i i ii i i ii i j ij ju
j Nt
V t Q u R u u R u dt
Problems with Nash Equilibrium Definition on Graphical GamesGame objective
Neighbors of node i
All other nodes in graphDef: Nash equilibrium
are in Nash equilibrium if
Counterexample. Disconnected graph
Let each node play his optimal control
Then, each agent’s cost does not depend on any other agent
Then all agents are in Nash equilibriumNote‐ this Nash is also coalition‐proof
Another example
Define
Def. Local Best response.is said to be agent i’s local best response to fixed policies of its neighbors if
* * *( , ) ( , ), ,i j G j i j G jJ u u J u u i j N
A restriction on what sorts of performance indices can be selected in multi‐player graph games.
A condition on the reaction curves (Basar and Olsder) of the agents
This rules out the disconnected counterexample.
*( , ) ( , ),i i i i i i iJ u u J u u u
*iu iu
New Definition of Nash Equilibrium for Graphical Games
* * *1 2, ,...,u u u
* * * *( , ) ( , ),i i i G i i i G iJ J u u J u u i N
Def: Interactive Nash equilibrium
are in Interactive Nash equilibrium if
2. There exists a policy such that ju
1.
That is, every player can find a policy that changes the value of every other player.
i.e. they are in Nash equilibrium
1 1 12 2 2( , , , ) ( ) 0
i i
TT T Ti i
i i i i i i i i i ij j j i ii i i ii i j ij ji i j N j N
V VH u u A d g B u e B u Q u R u u R u
10 ( ) Ti ii i i ii i
i i
H Vu d g R B
u
2 1 2 1 11 1 12 2 2( ) ( ) 0,
i
TT Tj jc T T Ti i i
i i ii i i i i ii i j j j jj ij jj ji i i j jj N
V VV V VA Q d g B R B d g B R R R B i N
2 1 1( ) ( ) ,i
jc T Tii i i i i ii i ij j j j jj j
i jj N
VVA A d g B R B e d g B R B i N
* *( , , , ) 0ii i i i
i
VH u u
* 2 11 1 12 2 20 ( , , , ) ( )
i
T Tc T T Ti i i i
i i i i i i ii i i i i ii i j ij ji i i i j N
V V V VH u u A Q d g B R B u R u
2 1( )
i
c T ii i i i i ii i ij j j
i j N
VA A d g B R B e B u
12( ( )) ( )
i
T T Ti i i ii i i ii i j ij j
j Nt
V t Q u R u u R u dt
Value function
Differential equivalent (Leibniz formula) is Bellman’s Equation
Stationarity Condition
1. Coupled HJ equations
2. Best Response HJ Equations – other players have fixed policies
where
where
Graphical Game Solution Equations
ju
Kyriakos Vamvoudakis
Theorem 3. Let (A,Bi) be reachable for all i. Let agent i be in local best response
Then are in global Interactive Nash iff the graph is strongly connected.
*( , ) ( , ),i i i i i iJ u u J u u i
* * *1 2, ,...,u u u
i
k
( ) (( ) ) ( ) (( ) )0( ) ( )
N n i i n kk kT
ii N
I A L G I diag B K L G I Bv A Bv
p pp diag Q I A
1( ) ( ) T ii i i i i ii i i i
i
Vu u V d g R B K p
k k k ku K p v
2B AB A B
Picks out the shortest path from node k to node i
A
BAB
Online Solution of Graphical Games
Use Reinforcement Learning Convergence Results
POLICY ITERATION
Kyriakos Vamvoudakis
Motions of Biological Groups
Fishschool
Birdsflock
Locustsswarm
Firefliessynchronize
Local / Peer-to-Peer Relationships in socio-biological systems
Online Solution of Graphical Games
Use Reinforcement Learning Convergence Results
POLICY ITERATION
Kyriakos Vamvoudakis
System
Action network
Policy Evaluation(Critic network)cost
The Adaptive Critic Architecture
Adaptive Critics
Value update
Control policy update
Critic and Actor tuned simultaneouslyLeads to ONLINE FORWARD‐IN‐TIME implementation of optimal control
Policy Iteration is Reinforcement Learning
F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEECircuits & Systems Magazine, Invited Feature Article, pp. 32‐50, Third Quarter 2009.
1 1 12 2 2( , , , ) ( ) 0
i i
TT T Ti i
i i i i i i i i i ij j j i ii i i ii i j ij ji i j N j N
V VH u u A d g B u e B u Q u R u u R u
1( ) T ii i i ii i
i
Vu d g R B
Online Solution of Graphical Games
Policy Iteration gives structure needed for online graph games
Solve simultaneously online:
ˆ ˆ Ti i iV W
112
ˆˆ ( )T
T ii i i ii i i N
iu d g R B W
Weierstrass Approximator structures‐ 2 at each node
actor
critic
2 11 14 4
ˆ ˆ ˆ ˆ ˆ( )i
Tj jT T T T T
i i i ii i i N i i N j j j N j jj ij jj j j Nj jj N
W Q W DW d g W B R R R B W
ˆ ˆ( ( ) )i
ii i i i i i ij j j
i j N
A d g B u e B u
Bellman equation
Bellman equation becomes an algebraic equation in the parameters
Approximate values by a Critic Network at each node i
ˆ ˆ Ti i iV W
Approximate control policies by an Actor Network at each node i
112
ˆˆ ( )T
T ii N i i ii i i N
iu d g R B W
1
2 11 14 42
ˆˆ
ˆ ˆ ˆ ˆ ˆ[ ( ) ](1 )
i
i ii
Tj jT T T T Ti
i i i i ii i i N i i N j j j N j jj ij jj j j Ni i j jj N
EW a
W
a W Q W DW d g W B R R R B W
Online Solution of Graphical Games Using Value Function Aproximation
Tuning law for Critic parameters
Tuning law for Actor parameters
2 11
1 1ˆ ˆ ˆ ˆ ˆ ˆ ˆ{( ) ( ) }4 4
ii
TT Tj jT T T Ti N i N
i N i N i i N i i N i i i N i i N j j j j jj ij jj jsi s j jj N
j i
W a F W F W DW W W d g W B R R R Bm m
Converges to solution of coupled HJ equations, and Nash equilibriumand keeps states stable while learningNeed PE of ˆ ˆ( ( ) )
i
ii i i i i i ij j j
i j N
A d g B u e B u
System
Action network
Policy Evaluation(Critic network)cost
The Adaptive Critic Architecture
Adaptive Critics
Value update
Control policy update
Critic and Actor tuned simultaneouslyLeads to ONLINE FORWARD‐IN‐TIME implementation of optimal control
A new adaptive control architecture
Optimal Adaptive Control
ˆ ˆ Ti i iV W
112
ˆˆ ( )T
T ii i i ii i i N
iu d g R B W
Can avoid knowledge of drift term f(x) by using Integral Reinforcement Learning (IRL)
Draguna Vrabie
Then HJ equations are solved online without knowing f(x)
Coupled AREs are solved online without knowing A
Lemma 1 – Draguna Vrabie
( ( )) ( , ) ( , )Td V x V f x u r x u
dt x
Proof:
( , ) ( ( )) ( ( )) ( ( ))t T t T
t t
r x u d d V x V x t V x t T
Solves Lyapunov equation without knowing f(x,u)
( ( )) ( , ) ( ( )), (0) 0t T
t
V x t r x u d V x t T V
0 ( , ) ( , ) ( , , ), (0) 0TV Vf x u r x u H x u V
x x
Allows definition of temporal difference error for CT systems
( ) ( ( )) ( , ) ( ( ))t T
t
e t V x t r x u d V x t T
Another form for the CT Bellman eq.
Is equivalent to
Integral Reinforcement Form of Bellman Equation
Draguna Vrabie
1. Natural biological groups do not measure synchronization errors !
Agents sense rough relative positions or motions
and adjust their motion accordingly
2. We want FINITE-TIME synchronization
0sgn( ) sgn( )i
i ij j i i ij N
u a x x b x x
Propose a Signum protocol
Binary, 1-bit quantization, reduced information needed
Discontinuous- makes a big mess.
Gang Chen, College of Automation Chongqing University
11
1
2 node consensus example in Wassim Haddad paper
sgn( )x uu x
u
t
1
-1
x
t
Linear state evolution
( ) ( ( ))x t f x t nx R f(x) Lebesgue measurable and locally essentially bounded
Filippov set-valued map0 ( ) 0
[ ]( ) { ( ( ) \ )}S
K f x co f B x S
Filippov solution on
An absolutely continuous function that satisfies the differential inclusion
0 1[ , ]t t
0 1: [ , ] nx t t R
( ) [ ]( )x t K f x for almost all 0 1[ , ]t t t
A Filippov solution is maximal if it cannot be extended forward in time
Set is weakly (resp. strongly) invariant if it contains a (resp. all) maximal solutions0x M nM R
Same as lim ( ) : :i i i fiK f x co f x x x y N N
,
Cortes 2008
Bacciotti & Ceragioli, 1999, Shevitz & Paden, 1994
Wassim Haddad 2008
Paden &Sastry 1987
Differential Equations with Discontinuous Right-Hand Sides
NOT necessarily Lipschitz
Caratheodory solutions work, in fact. But we need Filippov set-valued map on the next page.
: nV R R a locally Lipschitz function
the set of measure zero where gradient does not existVN
( ) {lim ( ) : , } [ ]( )i i i ViV x co V x x x x N N K V x
Clarke generalized gradient
Set-valued Lie Derivative [ ]( ) { | [ ]( ) s.t. , ( )}TfL V x a R K f x a V x
Cortes 2008
Bacciotti & Ceragioli, 1999, Shevitz & Paden, 1994
Wassim Haddad 2008Paden &Sastry 1987
Lyapunov Analysis for Discontinuous Dynamical Systems
Lyapunov function second derivative is strictly negative implies finite-time consensus
, 1, 2, ,i ix u i n
1 2 0( ) ( ) ( ) ,nx t x t x t x t T
0sgn( ) sgn( )i
i ij j i i ij N
u a x x b x x
1
-1
(
)
0sgn( ) sgn( )i
i ij j i i ij N
x a x x b x x
0i ie x x
sgn( ) sgn( )i
i ij j i i ij N
e a e e b e
Error dynamics
( ) ( )i
i ij j i i ij N
e a SGN e e b SGN e
Paden and Sastry 1987- calculus for Filippov map
sgn(.)
1
-1 SGN(.)
Node dynamics
Finite-time Synchronization in time T
Signum protocol
Closed-loop system
Synch. error
Binary, 1-bit quantization, reduced information needed
Cortes 2006, 2008
Bacciotti & Ceragioli, 1999, Shevitz & Paden, 1994
Wassim Haddad 2008
Filippov solutions (Caratheodory works)Fillipov set-valued map
11 1 1 1
1sgn( )2
n n n nT
ij i j i ij j ii j i j
a x x x a x x
Lemma 1. For undirected network one has
Theorem 1. Let the communication graph be undirected and connected. If there is at least one pinned node,
then the signum protocol solves the controlled consensus problem asymptotically.
Proof. Choose the candidate Lyapunov function
12
TV e e
1 2[ , , , ]T T T Tne e e e
[ ] [ ]TfL V K e e
1( sgn( ) sgn( ))
i
nTi ij j i i i
i j NK e a e e b e
1. Undirected Graphs Gang Chen, College of Automation Chongqing University
Interesting fact: the Lyapunov function and derivative are continuous for undirected graphs.
Proof: check out the second derivative of Lyapunov function
20[ ( )]L V t
12
TV e e
Convergence time bounded by
Depends on the initial synchronization errors
Must use Clarke generalized gradient and set-valued Lie derivative
Undirected Graphs – Finite Time Convergence
max{ }i inb d max din = max in-degree
Gang Chen, College of Automation Chongqing University
2[ ( )]( ) 0f fL L V x Cortes 2006
2. Directed Graphs
, ,i ij j jip a p a i j
Detail-balanced
There exist such that0ip
Sum over j. Then 1 2 Np p p p is a left e-vector of L for zero e-val
Gang Chen, College of Automation Chongqing University
11
12 22
0LF
F F
Let there exist a spanning tree.
Frobenius form of graph Laplacian L
11L
22F
1S
2S
3S
4S
2,max{ }ind
1,max{ }ind3
4,jll S
a some j S
Proof: first show that the leaders reach consensus in finite time, then show followers do so also.2
1 0 1, 1[ ( )] iL V t p Convergence times bounded like
leaders
followers
Must use Clarke generalized gradient and set-valued Lie derivative
Finite Time Tracking Control
0 0( , )x g t xCommand generator node (exosystem)
0 01 sgn( ) , sgn( )
i
i
i ij j j i i ij Nij i
j N
u a u x x b g t x x xa b
Idea from Lihua Xie and Liu Shuai -allows tracking a moving leader with FIRST-ORDER dynamics
i ix u
0 0 01 sgn ( ) ( ) ( ( , ) sgn(( ) ( )))
i
i
i ij j j j i i i i ij Nij i
j N
u a u x x b g t x x xa b
Finite Time Formation Control
Idea from Lihua Xie and Liu Shuai -formation position offsets
Gang Chen, College of Automation Chongqing University
Node dynamics
Control protocol
Finite Time Tracking Control
0 0( , )x g t xCommand generator node (exosystem)
0 01 sgn( ) , sgn( )
i
i
i ij j j i i ij Nij i
j N
u a u x x b g t x x xa b
i ix u
0 0 01 sgn ( ) ( ) ( ( , ) sgn(( ) ( )))
i
i
i ij j j j i i i i ij Nij i
j N
u a u x x b g t x x xa b
Finite Time Formation Control
Gang Chen, College of Automation Chongqing University
Node dynamics
Control protocol
What does Finite-Time Consensus Look Like? - Simulation
leaders
followers
Finite Time Consensus!
Communication Topology
Gang Chen, College of Automation Chongqing University
Leaders reach consensus