Transcript of adityam/talks/queens-2011.pdf

Page 1:

Separation result for delayed-sharing information structures

Aditya Mahajan

McGill University

Joint work with: Ashutosh Nayyar and Demos Teneketzis, UMichigan

Queen's University, July 12, 2011

Page 2:

Common theme: multi-stage multi-agent decision making under uncertainty

Page 3:

Outline of this talk

What is the conceptual difficulty with multi-agent decision making? How to resolve it?

Modeling decision making under uncertainty

Overview of single-agent decision making

Delayed sharing information structure: a “simple” model for multi-agent decision making

History of the problem. Our approach. Main results.

Conclusion

Page 4:

Model of uncertainty

Model of information

Model of objective

Modeling multi-stage decision making under uncertainty


Page 7:

Model of uncertainty

Stochastic dynamics: $X_{t+1} = f_t(X_t, U_t, W_t)$

Noisy observations: $Y_t = h_t(X_t, V_t)$

The state disturbance $\{W_t\}$ and observation noise $\{V_t\}$ are i.i.d. stochastic processes with known distributions.

The system dynamics $f_t$ and observation function $h_t$ are known.

Model of information

Single DM with perfect recall: $U_t = g_t(Y_{1:t}, U_{1:t-1})$

Model of objective

Cost at time $t$: $c_t(X_t, U_t)$. Objective: minimize the expected total cost $\mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t)\big]$

Modeling multi-stage decision making under uncertainty
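To make this model concrete, here is a minimal simulation sketch in Python. Everything specific in it is a hypothetical choice for illustration: scalar linear dynamics for $f$, additive noise in $h$, a quadratic cost $c$, and a naive feedback policy; none of these specifics come from the talk.

```python
import random

def f(x, u, w):
    return 0.9 * x + u + w           # stochastic dynamics X_{t+1} = f(X_t, U_t, W_t)

def h(x, v):
    return x + v                     # noisy observation Y_t = h(X_t, V_t)

def c(x, u):
    return x * x + u * u             # per-stage cost c(X_t, U_t)

def simulate(policy, T=10, seed=0):
    """Run the system for T steps under a control law U_t = g(Y_{1:t}, U_{1:t-1})."""
    rng = random.Random(seed)
    x, ys, us, total = 1.0, [], [], 0.0
    for _ in range(T):
        y = h(x, rng.gauss(0, 0.1))      # observe
        ys.append(y)
        u = policy(ys, us)               # policy sees the full history (perfect recall)
        total += c(x, u)
        us.append(u)
        x = f(x, u, rng.gauss(0, 0.1))   # state transition
    return total

# A naive policy that acts on the latest observation only.
print(simulate(lambda ys, us: -0.9 * ys[-1]))
```

The point of the sketch is only the interface: the control law is a function of the whole observation/action history, which is exactly the design difficulty discussed next.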

Page 8:

More general setups

Model of uncertainty

Non-i.i.d. dynamics (Markov, ergodic, etc.); unknown distribution; unknown model, unknown cost, etc.

Model of information

Fixed memory/complexity at the decision maker; more than one decision maker

Model of objective

Worst-case performance (instead of expected performance); minimize regret (instead of total cost); remain in a desirable set (rather than minimize total cost)




Page 13:

Design difficulties

$U_t = g_t(Y_{1:t}, U_{1:t-1})$: the domain of the control laws increases with time.

$\min_{g_1, \dots, g_T} \mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t)\big]$: the search for an optimal control policy is a functional optimization problem.

Structural results

Define the information state $\pi_t = \mathbb{P}(X_t \mid Y_{1:t}, U_{1:t-1})$. Then there is no loss of optimality in restricting attention to control laws of the form $U_t = g_t(\pi_t)$.

Dynamic programming

The following recursive equations provide an optimal control policy:

$V_t(\pi_t) = \min_{u_t} \mathbb{E}\big[c_t(X_t, u_t) + V_{t+1}(\pi_{t+1}) \mid \pi_t, u_t\big]$

Single-agent decision making
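The information state $\pi_t$ can be computed recursively by Bayes' rule: predict through the dynamics, then condition on the new observation. A minimal sketch for a finite state space follows; the two-state transition matrix, observation likelihoods, and all numbers are illustrative assumptions, not from the talk.

```python
def belief_update(pi, u, y, P, L):
    """One Bayes step: from pi_t = P(X_t | history), the action u, and the next
    observation y, compute pi_{t+1} = P(X_{t+1} | history, u, y)."""
    n = len(pi)
    # Predict: push the belief through the controlled transition matrix P[u].
    pred = [sum(pi[i] * P[u][i][j] for i in range(n)) for j in range(n)]
    # Correct: weight by the observation likelihood L[j][y] and normalize.
    post = [pred[j] * L[j][y] for j in range(n)]
    z = sum(post)
    return [p / z for p in post]

# Hypothetical two-state example: action 0 keeps the chain sticky, and
# observations reveal the state with 80% accuracy.
P = {0: [[0.9, 0.1], [0.2, 0.8]]}   # P[u][i][j] = transition probability
L = [[0.8, 0.2], [0.2, 0.8]]        # L[state][observation]

pi = belief_update([0.5, 0.5], 0, 0, P, L)
print(pi[0] > pi[1])   # True: seeing y = 0 shifts mass toward state 0
```

Because the update uses only the model (not the policy), $\pi_t$ is policy independent, which is the key point made on the next slide.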


Page 16:

Estimation | Control

Structural results

Define the information state $\pi_t = \mathbb{P}(X_t \mid Y_{1:t}, U_{1:t-1})$. Then there is no loss of optimality in restricting attention to control laws of the form $U_t = g_t(\pi_t)$.

Dynamic programming

The following recursive equations provide an optimal control policy:

$V_t(\pi_t) = \min_{u_t} \mathbb{E}\big[c_t(X_t, u_t) + V_{t+1}(\pi_{t+1}) \mid \pi_t, u_t\big]$

$\pi_t$ is policy independent. Each step of the DP is a parameter optimization.

Single-agent decision making
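The recursion above can be evaluated directly for a small finite model. The sketch below implements $V_t(\pi_t) = \min_{u_t} \mathbb{E}[c_t + V_{t+1} \mid \pi_t, u_t]$ for a hypothetical two-state, two-action, two-observation problem with horizon $T = 3$; all model parameters are made up, and each DP step is indeed a minimization over a single action, i.e., a parameter optimization.

```python
# Hypothetical finite POMDP (all numbers are illustrative assumptions).
P = {0: [[0.9, 0.1], [0.2, 0.8]],     # P[u][i][j] = transition probability
     1: [[0.5, 0.5], [0.5, 0.5]]}
L = [[0.8, 0.2], [0.2, 0.8]]          # L[j][y] = observation likelihood
C = {0: [0.0, 1.0], 1: [0.3, 0.3]}    # C[u][i] = per-stage cost
T = 3                                  # horizon

def step(pi, u, y):
    """Belief update plus the probability of observing y (Bayes filter)."""
    pred = [sum(pi[i] * P[u][i][j] for i in range(2)) for j in range(2)]
    joint = [pred[j] * L[j][y] for j in range(2)]
    z = sum(joint)
    return ([q / z for q in joint] if z > 0 else pred), z

def V(t, pi):
    """Value function: each DP step minimizes over the action alone."""
    if t == T:
        return 0.0
    best = float("inf")
    for u in P:
        cost = sum(pi[i] * C[u][i] for i in range(2))   # E[c(X_t, u) | pi_t]
        for y in (0, 1):                                # average over next observation
            nxt, pr_y = step(pi, u, y)
            cost += pr_y * V(t + 1, nxt)
        best = min(best, cost)
    return best

v = V(0, [0.5, 0.5])
print(0.0 <= v <= 3.0)   # True: bounded by horizon times the maximum stage cost
```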

Page 17:

(One-way) separation between estimation and control

In single-agent decision making, estimation is separated from control. This separation is critical for decomposing the search for an optimal control policy into a sequence of parameter optimization problems.

Does this separation extend to multi-agent decision making?



Page 23:

Model of uncertainty

Stochastic dynamics: $X_{t+1} = f_t(X_t, U_t^{1:2}, W_t)$

Noisy observations: $Y_t^k = h_t^k(X_t, V_t^k)$

Model of information

Each DM has perfect recall.

Each DM observes the $n$-step delayed information (observations and actions) of the other DMs:

$U_t^k = g_t^k(Y_{1:t}^k, U_{1:t-1}^k, Y_{1:t-n}^{1:2}, U_{1:t-n}^{1:2})$

Model of objective

Cost at time $t$: $c_t(X_t, U_t^{1:2})$. Objective: minimize $\mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t^{1:2})\big]$

Delayed-sharing information structure


Page 25:

Delayed-sharing information structure

Some notation

$U_t^k = g_t^k(Y_{1:t}^k, U_{1:t-1}^k, Y_{1:t-n}^{1:2}, U_{1:t-n}^{1:2}) = g_t^k(Y_{t-n+1:t}^k, U_{t-n+1:t-1}^k, Y_{1:t-n}^{1:2}, U_{1:t-n}^{1:2})$

Thus, $U_t^k = g_t^k(\Delta_t, \Lambda_t^k)$, where

Common info: $\Delta_t = \{Y_{1:t-n}^{1:2}, U_{1:t-n}^{1:2}\}$

Local info: $\Lambda_t^k = \{Y_{t-n+1:t}^k, U_{t-n+1:t-1}^k\}$

Same design difficulties as the single-agent case.
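The indexing in this split is easy to get wrong, so here is a small sketch that extracts $\Delta_t$ and $\Lambda_t^k$ from raw observation/action histories. The history entries are placeholder symbols; only the index arithmetic reflects the definitions above.

```python
def split_information(ys, us, t, n, k):
    """Split controller k's history at time t into common info Delta_t
    (everything at least n steps old, from both controllers) and local
    info Lambda_t^k (its own last n observations and n-1 actions).
    ys[j][s], us[j][s]: observation/action of controller j at time s (1-indexed)."""
    delta = {"Y": {j: ys[j][1:t - n + 1] for j in ys},   # Y^{1:2}_{1:t-n}
             "U": {j: us[j][1:t - n + 1] for j in us}}   # U^{1:2}_{1:t-n}
    lam = {"Y": ys[k][t - n + 1:t + 1],                  # Y^k_{t-n+1:t}
           "U": us[k][t - n + 1:t]}                      # U^k_{t-n+1:t-1}
    return delta, lam

# Placeholder histories for two controllers over t = 5 steps, delay n = 2.
# Index 0 is padding so that time indices start at 1.
ys = {1: [None, "y1", "y2", "y3", "y4", "y5"], 2: [None, "z1", "z2", "z3", "z4", "z5"]}
us = {1: [None, "a1", "a2", "a3", "a4", "a5"], 2: [None, "b1", "b2", "b3", "b4", "b5"]}

delta, lam = split_information(ys, us, t=5, n=2, k=1)
print(delta["Y"][1])   # ['y1', 'y2', 'y3']  = Y^1_{1:3}
print(lam["Y"])        # ['y4', 'y5']        = Y^1_{4:5}
print(lam["U"])        # ['a4']              = U^1_{4:4}
```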


Page 29:

Literature overview

(Witsenhausen, 1971): Proposed the delayed-sharing information structure. Asserted a structure of the optimal control law (without proof).

(Varaiya and Walrand, 1978): Proved Witsenhausen's assertion for $n = 1$. Showed via a counterexample that the assertion is false for delay $n > 1$.

(Nayyar, Mahajan, and Teneketzis, 2011): Prove two alternative structures of the optimal control law.

NMT 2011 also obtain a recursive algorithm to find optimal control laws. At each step, we need to solve a functional optimization problem.


Page 34:

Structure of optimal control law

Original setup: $U_t^k = g_t^k(\Delta_t, \Lambda_t^k)$

W71 assertion: $U_t^k = g_t^k\big(\mathbb{P}(X_{t-n} \mid \Delta_t), \Lambda_t^k\big)$

NMT11 first result: $U_t^k = g_t^k\big(\mathbb{P}(X_{t-1}, U_{t-n:t-1}^{1:2} \mid \Delta_t), \Lambda_t^k\big)$

NMT11 second result: $U_t^k = g_t^k\big(\mathbb{P}(X_{t-n} \mid \Delta_t), \Lambda_t^k, g_{t-n+1:t-1}^{1:2}\big)$

Contrast the dependence on policy for the different results.


Page 36:

Importance of the problem

Applications (of one-step delayed sharing):

Power systems: Altman et al., 2009. Queueing theory: Kuri and Kumar, 1995. Communication networks: Grizzle et al., 1982. Stochastic games: Papavassilopoulos, 1982; Chang and Cruz, 1983. Economics: Li and Wu, 1991.

Conceptual significance

Understanding the design of networked control systems

Bridge between centralized and decentralized systems

Insights for the design of general decentralized systems

Page 37:

Orthogonal search does not work due to the presence of signaling: how does controller 1 figure out how controller 2 will interpret its (controller 1's) actions?


Page 39:

Proof outline

1. Construct a coordinated system

2. Show that any policy of the coordinated system is implementable in the original system and vice versa. Hence, the two systems are equivalent.

3. Optimal design of the coordinated system is a single-agent multi-stage decision problem. Find a solution for the coordinated system.

4. Translate this solution back to the original system.


Page 42:

Step 1: The coordinated system

[Diagram: a coordinator observes the common information $\Delta_t$ and sends prescriptions $\gamma_t^1$, $\gamma_t^2$ to the two controllers, which also see their local information $\Lambda_t^1$, $\Lambda_t^2$.]

Define the partially evaluated control law: $\gamma_t^k = g_t^k(\Delta_t, \cdot)$

The coordinator prescribes $(\gamma_t^1, \gamma_t^2)$ to the controllers as $(\gamma_t^1, \gamma_t^2) = \psi_t(\Delta_t, \gamma_{1:t-1}^1, \gamma_{1:t-1}^2)$
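A partially evaluated control law is ordinary currying: fix the common-information argument of $g_t^k$ and leave the local-information argument open. A minimal sketch, where the two-argument rule `g` is a hypothetical control law invented for illustration:

```python
from functools import partial

# Hypothetical two-argument control law g(common, local). The coordinator
# fixes the common information, leaving a function of local info only.
def g(common, local):
    # Illustrative rule, not from the talk: act on the sum of both histories.
    return sum(common) + sum(local)

common_info = [1, 2, 3]          # Delta_t, known to everyone
gamma = partial(g, common_info)  # gamma_t = g(Delta_t, .), the prescription

# Each controller later plugs in its own local information Lambda_t^k.
print(gamma([10]))     # 16
print(gamma([4, 5]))   # 15
```

The coordinator can compute and announce `gamma` because it depends only on common information; the private argument is supplied by each controller at the moment of acting.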


Page 46:

Step 2: Equivalence

For any policy $(g_1, \dots, g_T)$ of the original system, we can construct a policy $(\psi_1, \dots, \psi_T)$ of the coordinated system such that the system variables $\{X_t, Y_t^{1:2}, U_t^{1:2} : t = 1, \dots, T\}$ have the same realization along all sample paths in both cases:

$\psi_t(\Delta_t) = (\gamma_t^1, \gamma_t^2) = \big(g_t^1(\Delta_t, \cdot),\ g_t^2(\Delta_t, \cdot)\big)$

For any policy $(\psi_1, \dots, \psi_T)$ of the coordinated system, we can construct a policy $(g_1, \dots, g_T)$ of the original system such that the system variables $\{X_t, Y_t^{1:2}, U_t^{1:2} : t = 1, \dots, T\}$ have the same realization along all sample paths in both cases:

At time 1, both controllers know $\Delta_1$. Choose $g_1^k(\Delta_1, \Lambda_1^k) = \psi_1^k(\Delta_1)(\Lambda_1^k)$. At time 2, both controllers know $\Delta_2$, $\gamma_1^1$, and $\gamma_1^2$. Choose $g_2^k(\Delta_2, \Lambda_2^k) = \psi_2^k(\Delta_2, \gamma_1^1, \gamma_1^2)(\Lambda_2^k)$. And so on.


Page 48:

Step 3: Solve the coordinated system

By construction, the coordinated system has a single decision maker with perfect recall.

Use the result for single-agent decision making. Define $\pi_t = \mathbb{P}(\text{“current state”} \mid \text{past history})$.

Then there is no loss of optimality in restricting attention to control laws of the form: control action $= \mathrm{Fn}(\pi_t)$

What is the state (for the input-output mapping) of the system?


Page 50:

State for the coordinated system

[Diagram: the coordinator observes $\Delta_t$ and issues prescriptions $\gamma_t^1, \gamma_t^2$; the controllers apply $U_t^k = \gamma_t^k(\Lambda_t^k)$ to the plant with dynamics $f_t(X_t, U_t^{1:2}, W_t)$.]

State: $(X_{t-n}, \Lambda_t^1, \Lambda_t^2)$


Page 52:

Structure of optimal control law

Define $\pi_t = \mathbb{P}(X_{t-n}, \Lambda_t^1, \Lambda_t^2 \mid \Delta_t, \gamma_{1:t-1}^1, \gamma_{1:t-1}^2)$. Then there is no loss of optimality in restricting attention to coordination laws of the form $(\gamma_t^1, \gamma_t^2) = \psi_t(\pi_t)$.

The following recursive equations provide an optimal coordination policy:

$V_t(\pi_t) = \min_{\gamma_t^1, \gamma_t^2} \mathbb{E}\big[c_t(X_t, U_t^{1:2}) + V_{t+1}(\pi_{t+1}) \mid \pi_t, \gamma_t^1, \gamma_t^2\big]$

Page 53:

Step 4: Translate the solution

For a system with delayed-sharing information structure, there is no loss of optimality in restricting attention to control laws of the form $U_t^k = g_t^k(\pi_t, \Lambda_t^k)$. Optimal control laws can be obtained from the solution of the following recursive equations:

$V_t(\pi_t) = \min_{g_t^1(\pi_t, \cdot),\ g_t^2(\pi_t, \cdot)} \mathbb{E}\big[c_t(X_t, U_t^{1:2}) + V_{t+1}(\pi_{t+1}) \mid \pi_t, g_t^1(\pi_t, \cdot), g_t^2(\pi_t, \cdot)\big]$

Page 54:

Features of the solution

The space of realizations of $\pi_t = \mathbb{P}(X_{t-n}, \Lambda_t^1, \Lambda_t^2 \mid \Delta_t, \gamma_{1:t-1}^1, \gamma_{1:t-1}^2)$ is time-invariant. Thus, the domain of the control laws $g_t^k(\pi_t, \Lambda_t^k)$ is time-invariant.

$\pi_t$ is not policy independent! Estimation is not separated from control. This is always the case when signaling is present.

In each step of the dynamic program, we choose the partially evaluated control laws $g_t^1(\pi_t, \cdot)$ and $g_t^2(\pi_t, \cdot)$. Choosing partially evaluated functions (instead of values) allows us to write a dynamic program even in the presence of signaling.
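In contrast with the single-agent case, each DP step here optimizes over functions. For finite local-information and action sets this can be done by brute-force enumeration of all prescriptions $\gamma : \Lambda \to U$. A hedged sketch follows; the sets, the belief, and the cost numbers are made up for illustration and stand in for the true expected cost-to-go.

```python
from itertools import product

# Finite placeholder sets: possible local-information values and actions.
LAMBDAS = ["lo", "hi"]
ACTIONS = [0, 1]

def expected_cost(gamma, belief):
    """Illustrative expected cost of a prescription gamma under a belief over
    the local information. The cost table is invented for this sketch."""
    cost = {("lo", 0): 1.0, ("lo", 1): 3.0, ("hi", 0): 4.0, ("hi", 1): 2.0}
    return sum(belief[lam] * cost[(lam, gamma[lam])] for lam in LAMBDAS)

def best_prescription(belief):
    """One DP step: search over ALL functions gamma: Lambda -> U.
    This is a functional optimization, not a search over single actions."""
    candidates = [dict(zip(LAMBDAS, choice))
                  for choice in product(ACTIONS, repeat=len(LAMBDAS))]
    return min(candidates, key=lambda g: expected_cost(g, belief))

gamma = best_prescription({"lo": 0.5, "hi": 0.5})
print(gamma)   # {'lo': 0, 'hi': 1}
```

The candidate set has $|U|^{|\Lambda|}$ elements, which is why each DP step for the delayed-sharing problem is genuinely harder than the parameter optimization of the single-agent DP.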


Page 56:

Summary

A simple methodology to resolve a 40-year-old open question:

Find the common information at each time.

Look at the problem from the point of view of a coordinator that observes this common information and chooses partially evaluated functions.

Find an information state for the problem at the coordinator: either $\mathbb{P}(\text{state for input-output mapping} \mid \text{common information})$, or $\big(\mathbb{P}(\text{past state} \mid \text{common information}),\ \text{past partial control laws}\big)$.

This methodology is also applicable to systems with more general information structures (Mahajan, Nayyar, Teneketzis, 2008).

Page 57:

Salient Features

The size of the information state is time-invariant.

The methodology is also applicable to infinite-horizon problems.

Each step of the DP is a functional optimization problem.

The form of the DP is similar to that of a POMDP.

We can borrow numerical approaches from the POMDP literature.