
Separation result for delayed-sharing information structures

Aditya Mahajan

McGill University
Joint work with: Ashutosh Nayyar and Demos Teneketzis, University of Michigan

Queen's University, July 12, 2011

Common theme: multi-stage, multi-agent decision making under uncertainty

Outline of this talk

What is the conceptual difficulty with multi-agent decision making? How to resolve it?

Modeling decision making under uncertainty

Overview of single-agent decision making

Delayed-sharing information structure: A “simple” model for multi-agent decision making

History of the problem
Our approach
Main results

Conclusion

Modeling multi-stage decision making under uncertainty

Model of uncertainty

Stochastic dynamics: $X_{t+1} = f_t(X_t, U_t, W_t)$
Noisy observations: $Y_t = h_t(X_t, N_t)$

The state disturbance $\{W_t\}$ and observation noise $\{N_t\}$ are i.i.d. stochastic processes with known distributions. The system dynamics $f_t$ and observation functions $h_t$ are known.

Model of information

Single DM with perfect recall: $U_t = g_t(Y_{1:t}, U_{1:t-1})$

Model of objective

Cost at time $t$ is $c_t(X_t, U_t)$. Objective: minimize the expected total cost $\mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t)\big]$.
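To make this model concrete, here is a minimal simulation sketch, assuming a finite toy system; the alphabet sizes and the tables $f$ and $h$ are made up for illustration and are not from the talk.

```python
import numpy as np

# Hypothetical finite instance of the model; all sizes and tables are made up.
rng = np.random.default_rng(0)
nx, nu, ny, nw, T = 3, 2, 2, 4, 10

f = rng.integers(nx, size=(nx, nu, nw))   # dynamics table: x_{t+1} = f(x_t, u_t, w_t)
h = rng.integers(ny, size=(nx, nw))       # observation table: y_t = h(x_t, n_t)

x = 0                                     # initial state
trajectory = []
for t in range(T):
    y = h[x, rng.integers(nw)]            # noisy observation of the current state
    u = rng.integers(nu)                  # stand-in for a control law g_t
    trajectory.append((x, y, u))
    x = f[x, u, rng.integers(nw)]         # i.i.d. disturbance drives the transition
print(trajectory)
```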

More general setups

Model of uncertainty

Non-i.i.d. dynamics (Markov, ergodic, etc.)
Unknown distribution
Unknown model, unknown cost, etc.

Model of information

Fixed memory/complexity at the decision maker
More than one decision maker

Model of objective

Worst-case performance (instead of expected performance)
Minimize regret (instead of minimizing total cost)
Remain in a desirable set (rather than minimize total cost)



Single-agent decision making

Design difficulties

$U_t = g_t(Y_{1:t}, U_{1:t-1})$: the domain of the control laws increases with time.

$\min_{g_1, \ldots, g_T} \mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t)\big]$: the search for an optimal control policy is a functional optimization problem.

Structural results

Define the information state $\pi_t = \mathbb{P}(X_t \mid Y_{1:t}, U_{1:t-1})$. Then there is no loss of optimality in restricting attention to control laws of the form $U_t = g_t(\pi_t)$.

Dynamic programming

The following recursive equations provide an optimal control policy:
$V_t(\pi_t) = \min_{u_t} \mathbb{E}\big[\, c_t(X_t, u_t) + V_{t+1}(\pi_{t+1}) \mid \pi_t, u_t \,\big]$

[Block diagram: estimation followed by control]

$\pi_t$ is policy independent. Each step of the DP is a parameter optimization.
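The structural result and the dynamic program can be spelled out on a finite toy problem. The sketch below is illustrative only (the transition, observation, and cost tables are made up): it implements the policy-independent belief update and an exact finite-horizon DP over beliefs via alpha-vectors, so each backward step is a parameter optimization rather than a search over history-dependent laws.

```python
import numpy as np
from itertools import product

# Illustrative finite toy problem; the tables below are made up, not from the talk.
nx, nu, ny, T = 3, 2, 2, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nx), size=(nx, nu))   # P[x, u] = distribution of X_{t+1}
Q = rng.dirichlet(np.ones(ny), size=nx)         # Q[x]    = distribution of Y_t
c = rng.uniform(size=(nx, nu))                  # c[x, u] = per-stage cost

def belief_update(pi, u, y):
    """Policy-independent filter: next information state from (pi_t, u_t, y_{t+1})."""
    post = (pi @ P[:, u, :]) * Q[:, y]          # predict through P, correct with Q
    return post / post.sum()

def dp_backup(alphas):
    """One DP step: V_t(pi) = min over alpha-vectors of pi @ alpha.

    Exact but exponentially growing; fine for this tiny horizon.
    """
    new = []
    for u in range(nu):
        for assign in product(alphas, repeat=ny):   # one successor alpha per observation
            a = c[:, u] + sum(P[:, u, :] @ (Q[:, y] * assign[y]) for y in range(ny))
            new.append(a)
    return new

alphas = [np.zeros(nx)]                         # terminal value V_{T+1} = 0
for _ in range(T):
    alphas = dp_backup(alphas)

pi0 = np.ones(nx) / nx                          # uniform prior over the state
print("optimal expected cost:", min(pi0 @ a for a in alphas))
print("one filter step:", belief_update(pi0, u=0, y=1))
```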

(One-way) separation between estimation and control

In single-agent decision making, estimation is separated from control. This separation is critical for decomposing the search for an optimal control policy into a sequence of parameter optimization problems.

Does this separation extend to multi-agent decision making?


Delayed-sharing information structure

Model of uncertainty

Stochastic dynamics: $X_{t+1} = f_t(X_t, U^{1:2}_t, W_t)$
Noisy observations: $Y^i_t = h^i_t(X_t, N^i_t)$

Model of information

Each DM has perfect recall.

Each DM observes the $n$-step delayed information (observations and actions) of the other DM:
$U^i_t = g^i_t(Y^i_{1:t}, U^i_{1:t-1}, Y^j_{1:t-n}, U^j_{1:t-n})$, $j \neq i$

Model of objective

Cost at time $t$ is $c_t(X_t, U^{1:2}_t)$. Objective: minimize $\mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U^{1:2}_t)\big]$.

Delayed-sharing information structure: some notation

$U^1_t = g^1_t(Y^1_{1:t}, U^1_{1:t-1}, Y^2_{1:t-n}, U^2_{1:t-n})$
$U^2_t = g^2_t(Y^1_{1:t-n}, U^1_{1:t-n}, Y^2_{1:t}, U^2_{1:t-1})$

Thus, $U^i_t = g^i_t(C_t, L^i_t)$, where

Common info: $C_t = \{Y^{1:2}_{1:t-n}, U^{1:2}_{1:t-n}\}$
Local info: $L^i_t = \{Y^i_{t-n+1:t}, U^i_{t-n+1:t-1}\}$

Same design difficulties as the single-agent case.
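To pin down this notation, the sketch below (with an assumed 0-indexed storage convention, not from the talk) splits the two controllers' histories into the common and local information for a given delay $n$.

```python
# Histories are stored as 0-indexed lists: y[i][k] is Y^i_{k+1} and u[i][k] is U^i_{k+1}.
def split_information(y, u, t, n):
    """Common info C_t and local info L^i_t for a 2-controller n-step delayed-sharing system."""
    cut = max(t - n, 0)
    common = {i: (y[i][:cut], u[i][:cut]) for i in (1, 2)}        # C_t: everything up to t-n
    local = {i: (y[i][cut:t], u[i][cut:t - 1]) for i in (1, 2)}   # L^i_t: the last n steps
    return common, local

# Example with t = 5, n = 2: common info runs through time 3;
# local info for controller i is (Y^i_{4:5}, U^i_4).
y = {1: list("abcde"), 2: list("vwxyz")}
u = {1: [0, 1, 0, 1], 2: [1, 0, 1, 0]}
common, local = split_information(y, u, t=5, n=2)
print(common[1], local[1])
```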

Literature overview

(Witsenhausen, 1971): Proposed the delayed-sharing information structure. Asserted a structure of the optimal control law (without proof).

(Varaiya and Walrand, 1978): Proved Witsenhausen's assertion for $n = 1$. Showed via a counterexample that the assertion is false for delay $n > 1$.

(Nayyar, Mahajan, and Teneketzis, 2011): Prove two alternative structures of the optimal control law.

NMT 2011 also obtain a recursive algorithm to find optimal control laws. At each step, we need to solve a functional optimization problem.

Structure of optimal control law

Original setup: $U^i_t = g^i_t(C_t, L^i_t)$

W71 assertion: $U^i_t = g^i_t\big(\mathbb{P}(X_{t-n} \mid C_t), L^i_t\big)$

NMT11 first result: $U^i_t = g^i_t\big(\mathbb{P}^g(X_t, L^{1:2}_t \mid C_t), L^i_t\big)$

NMT11 second result: $U^i_t = g^i_t\big(\mathbb{P}(X_{t-n} \mid C_t), L^i_t, g^{1:2}_{t-n+1:t-1}\big)$

Contrast the dependence on policy across the results: the belief in the first result is policy dependent (hence the superscript $g$), while the belief in the second result is policy independent but the control law depends on the past $n-1$ (partially applied) control laws.

Importance of the problem

Applications (of one-step delayed sharing)

Power systems: Altman et al., 2009
Queueing theory: Kuri and Kumar, 1995
Communication networks: Grizzle et al., 1982
Stochastic games: Papavassilopoulos, 1982; Chang and Cruz, 1983
Economics: Li and Wu, 1991

Conceptual significance

Understanding the design of networked control systems

Bridge between centralized and decentralized systems

Insights for the design of general decentralized systems

Orthogonal search does not work due to the presence of signaling: how does controller 1 figure out how controller 2 will interpret his (controller 1's) actions?

Proof outline

1. Construct a coordinated system

2. Show that any policy of the coordinated system is implementable in the original system and vice versa. Hence, the two systems are equivalent.

3. Optimal design of the coordinated system is a single-agent multi-stage decision problem. Find a solution for the coordinated system.

4. Translate this solution back to the original system.

Step 1: The coordinated system

[Block diagram: a coordinator observes the common information $C_t$ and sends prescriptions $\Gamma^1_t, \Gamma^2_t$ to the two controllers, which apply them to their local information $L^1_t, L^2_t$.]

Define the partially evaluated control law: $\Gamma^i_t = g^i_t(C_t, \cdot)$

The coordinator prescribes $\Gamma^1_t, \Gamma^2_t$ to the controllers as
$(\Gamma^1_t, \Gamma^2_t) = \psi_t(C_t, \Gamma^1_{1:t-1}, \Gamma^2_{1:t-1})$
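The phrase “partially evaluated control law” maps directly onto partial function application. A minimal sketch, with a made-up table-based control law (the table entries and labels are hypothetical):

```python
from functools import partial

# Hypothetical control law g^1_t as a lookup table over (common info, local info) pairs.
table = {("c0", "l0"): 0, ("c0", "l1"): 1, ("c1", "l0"): 1, ("c1", "l1"): 0}

def g1_t(common_info, local_info):
    return table[common_info, local_info]

gamma1_t = partial(g1_t, "c0")   # Gamma^1_t = g^1_t(C_t, .) once C_t = "c0" is realized
print(gamma1_t("l1"))            # controller 1 applies the prescription to L^1_t -> prints 1
```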

Step 2: Equivalence

For any policy $(g_1, \ldots, g_T)$ of the original system, we can construct a policy $(\psi_1, \ldots, \psi_T)$ of the coordinated system such that the system variables $\{X_t, Y^{1:2}_t, U^{1:2}_t : t = 1, \ldots, T\}$ have the same realization along all sample paths in both cases:
$\psi_t(C_t) = (\Gamma^1_t, \Gamma^2_t) = \big(g^1_t(C_t, \cdot),\, g^2_t(C_t, \cdot)\big)$

For any policy $(\psi_1, \ldots, \psi_T)$ of the coordinated system, we can construct a policy $(g_1, \ldots, g_T)$ of the original system such that the system variables $\{X_t, Y^{1:2}_t, U^{1:2}_t : t = 1, \ldots, T\}$ have the same realization along all sample paths in both cases:

At time 1, both controllers know $C_1$. Choose $g^i_1(C_1, \cdot) = \Gamma^i_1 = \psi^i_1(C_1)$.
At time 2, both controllers know $C_2$, $\Gamma^1_1$, and $\Gamma^2_1$. Choose $g^i_2(C_2, \cdot) = \Gamma^i_2 = \psi^i_2(C_2, \Gamma^1_1, \Gamma^2_1)$.

Step 3: Solve the coordinated system

By construction, the coordinated system has a single decision maker with perfect recall.

Use the result for single-agent decision making. Define
$\pi_t = \mathbb{P}(\text{“current state”} \mid \text{past history})$
Then there is no loss of optimality in restricting attention to control laws of the form
control action $= \text{Fn}(\pi_t)$

What is the state (for the input-output mapping) of this system?

State for the coordinated system

[Block diagram: the coordinator's prescriptions $\Gamma^1_t, \Gamma^2_t$ act on the plant state $X_t$ through the local information $L^1_t, L^2_t$.]

State: $(X_t, L^1_t, L^2_t)$

Structure of optimal control law

Define $\pi_t = \mathbb{P}(X_t, L^1_t, L^2_t \mid C_t, \Gamma^1_{1:t-1}, \Gamma^2_{1:t-1})$. Then there is no loss of optimality in restricting attention to coordination laws of the form
$(\Gamma^1_t, \Gamma^2_t) = \psi_t(\pi_t)$

The following recursive equations provide an optimal coordination policy:
$V_t(\pi_t) = \min_{\gamma^1_t, \gamma^2_t} \mathbb{E}\big[\, c_t(X_t, U^{1:2}_t) + V_{t+1}(\pi_{t+1}) \mid \pi_t, \gamma^1_t, \gamma^2_t \,\big]$, where $U^i_t = \gamma^i_t(L^i_t)$.
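Each step of this dynamic program minimizes over prescriptions, i.e., over functions from local information to actions, rather than over action values. Below is a one-step sketch on a made-up finite problem (all sizes, the belief, and the cost table are assumptions, and the continuation value $V_{t+1}$ is dropped for brevity):

```python
import numpy as np
from itertools import product

nx, nl, nu = 2, 2, 2                                           # made-up problem sizes
rng = np.random.default_rng(1)
pi = rng.dirichlet(np.ones(nx * nl * nl)).reshape(nx, nl, nl)  # pi_t(x, l1, l2): coordinator's belief
c = rng.uniform(size=(nx, nu, nu))                             # c(x, u1, u2): per-stage cost

def expected_cost(g1, g2):
    """E[c(X, gamma^1(L^1), gamma^2(L^2))] under the coordinator's belief pi_t."""
    return sum(pi[x, l1, l2] * c[x, g1[l1], g2[l2]]
               for x in range(nx) for l1 in range(nl) for l2 in range(nl))

# Every prescription gamma^i is a function {local infos} -> {actions}, stored as a tuple.
prescriptions = list(product(range(nu), repeat=nl))
g1, g2 = min(product(prescriptions, prescriptions), key=lambda p: expected_cost(*p))
print("optimal prescriptions:", g1, g2, "cost:", expected_cost(g1, g2))
```

Indexing the minimizing prescription pair by the belief $\pi_t$ is what yields, in Step 4 below, control laws $U^i_t = g^i_t(\pi_t, L^i_t)$ whose domain does not grow with time.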

Step 4: Translate the solution

For a system with a delayed-sharing information structure, there is no loss of optimality in restricting attention to control laws of the form
$U^i_t = g^i_t(\pi_t, L^i_t)$
Optimal control laws can be obtained from the solution of the following recursive equations:
$V_t(\pi_t) = \min_{g^1_t(\pi_t, \cdot),\, g^2_t(\pi_t, \cdot)} \mathbb{E}\big[\, c_t(X_t, U^{1:2}_t) + V_{t+1}(\pi_{t+1}) \mid \pi_t, g^1_t(\pi_t, \cdot), g^2_t(\pi_t, \cdot) \,\big]$

Features of the solution

The space of realizations of $\pi_t = \mathbb{P}(X_t, L^1_t, L^2_t \mid C_t, \Gamma^{1:2}_{1:t-1})$ is time-invariant. Thus, the domain of the control laws $g^i_t(\pi_t, L^i_t)$ is time-invariant.

$\pi_t$ is not policy independent! Estimation is not separated from control. This is always the case when signaling is present.

In each step of the dynamic program, we choose the partially evaluated control laws $g^1_t(\pi_t, \cdot)$ and $g^2_t(\pi_t, \cdot)$. Choosing partially evaluated functions (instead of values) allows us to write a dynamic program even in the presence of signaling.


Summary

Simple methodology to resolve a 40-year-old open question:

Find the common information at each time.

Look at the problem from the point of view of a coordinator that observes this common information and chooses partially evaluated functions.

Find an information state for the problem at the coordinator:
$\mathbb{P}(\text{state for input-output mapping} \mid \text{common information})$, which is equivalent to $\big(\mathbb{P}(\text{past state} \mid \text{common information}),\ \text{past partial control laws}\big)$.

This methodology is also applicable to systems with more general information structures (Mahajan, Nayyar, and Teneketzis, 2008).

Salient Features

The size of the information state is time-invariant.

The methodology is also applicable to infinite-horizon problems.

Each step of the DP is a functional optimization problem.

The form of the DP is similar to that of a POMDP.

We can borrow numerical approaches from the POMDP literature.