Pradeep Varakantham Singapore Management University Joint work with J.Y.Kwak, M.Taylor, J. Marecki,...
Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping
Pradeep Varakantham, Singapore Management University
Joint work with J.Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe
Motivating Domains
Disaster Rescue
Sensor Networks
Characteristics of domains: uncertainty, coordinating multiple agents, sequential decision making
Meeting the Challenges
Problem: Multiple agents coordinating to perform multiple tasks in the presence of uncertainty
Solution: Represent as Distributed POMDPs and solve. An optimal solution is NEXP-Complete, so use an approximate algorithm that dynamically exploits structure in interactions.
Result: Vast improvement in performance over existing algorithms
Outline
Illustrative Domain
Model
Approach: Exploit dynamic structure in interactions
Results
Illustrative Domain
Multiple types of robots
Uncertainty in movements
Reward: saving victims, collisions, clearing debris
Maximize expected joint reward
Model: DisPOMDPs with Coordination Locales (DPCL)
Joint model: <S, A, Ω, P, R, O, Ag>
Global state represents completion of tasks
Agents are independent except in coordination locales (CLs)
Two types of CLs:
Same-time CL (ex: agents colliding with each other)
Future-time CL (ex: a cleaner robot clearing debris assists a rescue robot in reaching the goal)
Individual observability
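The joint model above can be sketched as a simple container type. This is a minimal illustration of the tuple <S, A, Ω, P, R, O, Ag>, not the authors' implementation; all field and method names here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Illustrative container for the DPCL tuple <S, A, Omega, P, R, O, Ag>.
# States are pairs (global task-completion state, local agent state).
@dataclass
class DPCL:
    states: List[Tuple[str, str]]               # S: (s_g, s_i) pairs
    actions: Dict[int, List[str]]               # A: per-agent action sets
    observations: Dict[int, List[str]]          # Omega: per-agent observation sets
    P: Callable[[tuple, tuple, tuple], float]   # joint transition function P(s, a, s')
    R: Callable[[tuple, tuple, tuple], float]   # joint reward function R(s, a, s')
    O: Callable[[tuple, tuple, tuple], float]   # joint observation function O(s', a, omega)
    agents: List[int]                           # Ag

    def num_agents(self) -> int:
        return len(self.agents)
```

Under individual observability, each agent's observation function would further depend only on its own observation and local state.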
Solving DPCLs with TREMOR
Teams REshaping of MOdels for Rapid execution
Two steps:
1. Branch and Bound search, with MDP-based heuristics
2. Task assignment evaluation, by computing policies for every agent; joint policy computation is performed only at CLs
1. Branch and Bound search
2. Task Assignment Evaluation
Until convergence of policies or maximum iterations:
1) Solve individual POMDPs
2) Identify potential coordination locales
3) Based on the type and value of coordination, shape P and R of the relevant individual agents:
Capture interactions
Encourage/discourage interactions
4) Go to step 1
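The iteration above can be sketched as a short loop. The helpers `solve_pomdp`, `find_cls`, and `shape_model` are hypothetical stand-ins for the components the slides describe, not functions from the paper.

```python
def tremor_evaluate(models, solve_pomdp, find_cls, shape_model, max_iters=10):
    """Sketch of TREMOR's task-assignment evaluation loop:
    solve individual POMDPs, detect CLs, shape P and R, repeat
    until policies converge or max_iters is reached."""
    policies = {i: None for i in models}
    for _ in range(max_iters):
        # 1) Solve each agent's individual POMDP.
        new_policies = {i: solve_pomdp(m) for i, m in models.items()}
        if new_policies == policies:      # convergence of policies
            break
        policies = new_policies
        # 2) Identify potential coordination locales under current policies.
        for agent_ids, cl in find_cls(models, policies):
            # 3) Shape P and R of the agents involved in this CL,
            # encouraging or discouraging the interaction by its value.
            for i in agent_ids:
                models[i] = shape_model(models[i], cl)
        # 4) Go back to step 1.
    return policies
```

The key point the loop illustrates: joint reasoning never happens globally; it is folded back into the individual models via shaping.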
Identifying Potential CLs
A CL is a <state, action> pair. The probability of a CL occurring at a time step t is computed from the starting belief via standard belief updates given the policy:
Policy π maps belief states to actions
Probability of observing ω in belief state b: Pr(ω | b, a) = Σ_s′ O(s′, a, ω) Σ_s P(s, a, s′) b(s)
Updating b: b′(s′) = O(s′, a, ω) Σ_s P(s, a, s′) b(s) / Pr(ω | b, a)
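The belief update and the resulting CL-occurrence probability can be sketched directly; this is the standard POMDP belief update, with a recursive roll-out over observation sequences as one plausible way to compute the CL probability (the function names and interfaces are illustrative).

```python
def belief_update(b, a, omega, P, O, states):
    """Standard POMDP belief update: b'(s') ∝ O(s',a,ω) Σ_s P(s,a,s') b(s).
    Returns (new belief, Pr(ω | b, a))."""
    unnorm = {sp: O(sp, a, omega) * sum(P(s, a, sp) * b[s] for s in states)
              for sp in states}
    z = sum(unnorm.values())          # Pr(ω | b, a)
    if z == 0.0:
        return None, 0.0
    return {sp: v / z for sp, v in unnorm.items()}, z

def cl_probability(b0, policy, cl_state, cl_action, P, O, states, observations, T):
    """Probability that the CL <cl_state, cl_action> occurs at time step T,
    rolling the belief forward under the policy over all observation
    sequences. `policy` maps a belief to an action."""
    def recurse(b, t, prob):
        a = policy(b)
        if t == T:
            # CL occurs if the policy takes cl_action while in cl_state.
            return prob * b[cl_state] if a == cl_action else 0.0
        total = 0.0
        for omega in observations:
            b2, p_omega = belief_update(b, a, omega, P, O, states)
            if p_omega > 0.0:
                total += recurse(b2, t + 1, prob * p_omega)
        return total
    return recurse(b0, 0, 1.0)
```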
Type of CL
STCL, if there exist s and a for which the transition or reward function is not decomposable:
P(s,a,s′) ≠ Π_{1≤i≤N} P_i((s_g,s_i), a_i, (s_g′,s_i′)) OR R(s,a,s′) ≠ Σ_{1≤i≤N} R_i((s_g,s_i), a_i, (s_g′,s_i′))
FTCL, if completion of a task (global state) by an agent at t′ affects the transitions/rewards of other agents at t
Shaping the Model (STCL)
Shaping the transition function: from the joint transition probability when the CL occurs, derive a new transition probability for agent i
Shaping the reward function
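One plausible reading of the shaping step can be sketched as a probability-weighted blend: with the CL-occurrence probability, the (marginalized) joint dynamics apply; otherwise the individual model is unchanged. The exact mixing form below is an illustrative assumption, not the paper's equations.

```python
def shape_transition(p_individual, p_joint_marginal, p_cl):
    """Shaped transition for agent i: mix the individual transition
    distribution with the joint outcome marginalized onto agent i,
    weighted by the probability p_cl that the CL occurs.
    Both inputs are dicts mapping successor state -> probability."""
    states = set(p_individual) | set(p_joint_marginal)
    return {sp: (1.0 - p_cl) * p_individual.get(sp, 0.0)
                + p_cl * p_joint_marginal.get(sp, 0.0)
            for sp in states}

def shape_reward(r_individual, r_interaction, p_cl):
    """Shaped reward: add the expected interaction reward, weighted by p_cl.
    A negative r_interaction discourages the interaction (e.g. collisions);
    a positive one encourages it (e.g. clearing debris)."""
    return r_individual + p_cl * r_interaction
```

This is what "social model shaping" amounts to operationally: the effect of other agents is folded into each agent's own P and R, so each POMDP can still be solved individually.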
Results
Benchmark algorithms: Independent POMDPs; Memory-Bounded Dynamic Programming (MBDP)
Criteria: decision quality; run-time
Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon
Results figures: decision quality and run-time while varying state space, agents, coordination locales, and time horizon.
Related Work
Existing research:
DEC-MDPs: assuming individual or collective full observability; task allocation and dependencies as input
DEC-POMDPs: JESP, MBDP; exploiting independence in transition/reward/observation
Model shaping: Guestrin and Gordon, 2002
Conclusion
DPCL, a specialization of Distributed POMDPs
TREMOR exploits the presence of few CLs in domains
TREMOR depends on single-agent POMDP solvers
Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems
Questions?
Same-Time CL (STCL)
There is an STCL if there exist s and a for which:
Transition function not decomposable: P(s,a,s′) ≠ Π_{1≤i≤N} P_i((s_g,s_i), a_i, (s_g′,s_i′)), OR
Observation function not decomposable: O(s′,a,ω) ≠ Π_{1≤i≤N} O_i(ω_i, a_i, (s_g′,s_i′)), OR
Reward function not decomposable: R(s,a,s′) ≠ Σ_{1≤i≤N} R_i((s_g,s_i), a_i, (s_g′,s_i′))
Ex: Two robots colliding in a narrow corridor
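The transition part of this decomposability test can be sketched as a brute-force check over small state and action spaces; the interface (states as tuples (s_g, s_1, ..., s_N), per-agent transition functions over (s_g, s_i) pairs) is an illustrative assumption.

```python
from itertools import product

def is_stcl(states, joint_actions, P_joint, P_individual, tol=1e-9):
    """Return True if there exist s, a for which the joint transition
    P(s,a,s') differs from the product of the individual agents'
    transition probabilities, i.e. the STCL transition condition holds.
    states: joint states (s_g, s_1, ..., s_N); joint_actions: (a_1, ..., a_N);
    P_individual[i]: agent i's transition over ((s_g, s_i), a_i, (s_g', s_i'))."""
    for s, a, sp in product(states, joint_actions, states):
        prod = 1.0
        for i, a_i in enumerate(a):
            # Agent i sees only the global state and its own local state.
            prod *= P_individual[i]((s[0], s[i + 1]), a_i, (sp[0], sp[i + 1]))
        if abs(P_joint(s, a, sp) - prod) > tol:
            return True   # not decomposable at <s, a>: an STCL exists
    return False
```

In the corridor example, the joint dynamics at the colliding <state, action> pair differ from the product of the robots' individual dynamics, so the check fires there.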
Future-Time CL (FTCL)
Actions of one agent at time t′ can affect the transitions, observations, or rewards of other agents at a later time t:
P^t((s_g,s_i), a_i, (s_g′,s_i′) | a_j^{t′}) ≠ P^t((s_g,s_i), a_i, (s_g′,s_i′)), ∀ t′ < t
R^t((s_g,s_i), a_i, (s_g′,s_i′) | a_j^{t′}) ≠ R^t((s_g,s_i), a_i, (s_g′,s_i′)), ∀ t′ < t
O^t(ω_i, a_i, (s_g′,s_i′) | a_j^{t′}) ≠ O^t(ω_i, a_i, (s_g′,s_i′)), ∀ t′ < t
Ex: Clearing of debris assists rescue robots in getting to victims faster