
Between Collaboration and Competition: An Initial Formalization using Distributed POMDPs

Praveen Paruchuri, Milind Tambe

University of Southern California

Spiros Kapetanakis

University of York, UK

Sarit Kraus

Bar-Ilan University, Israel

University of Maryland, College Park

July 2003


Motivation

Many domains exist in which agents act as a team but also need to maintain some self-interest.

Electric Elves: agents make decisions on behalf of their users yet act as a team, e.g., when arranging a meeting.

SDR (Software for Distributed Robotics): 100+ robots must locate and protect objects, while each robot must also ensure its own survival, e.g., by recharging its batteries.


The Problem

A framework for teams of agents that maintain private goals in stochastic, complex, and dynamic environments.

Agents need to maximize joint objectives and yet honor private preferences; private and team interests may conflict.

Build a framework based on distributed POMDPs for policy generation.

Analyze the complexity of policy generation.


Previous work

Distributed POMDPs such as COM-MTDP have a single joint reward; the optimal policy maximizes joint value (Ex1).

– Solution is not stable.

Stochastic games have individual rewards; the policy finds an equilibrium solution, with stability as the key concept (Ex2).

– Solution is not favorable both individually and as a team.

Ex1 (individual rewards, joint reward in parentheses):
  5,5 (10)   -1,6 (5)
  6,-1 (5)    0,0 (0)

Ex2 (joint value, individual rewards in parentheses):
  4 (6,-2)   2 (1,1)
  0 (0,0)    3 (-2,5)
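To make the contrast concrete, here is a minimal sketch in Python. It assumes rows are agent 1's actions and columns agent 2's, and that each cell's joint value is the sum of the two individual rewards; it checks that Ex1's joint-optimal cell is not an equilibrium, while Ex2's only pure equilibrium falls short of the joint optimum.

```python
# Assumes rows are agent 1's actions and columns agent 2's; each cell stores the
# pair of individual rewards, and the joint value shown on the slide is their sum.
ex1 = [[(5, 5), (-1, 6)],    # COM-MTDP-style example: maximize the joint reward
       [(6, -1), (0, 0)]]
ex2 = [[(6, -2), (1, 1)],    # stochastic-game example: individual rewards, equilibrium
       [(0, 0), (-2, 5)]]

def is_equilibrium(game, r, c):
    """True if neither agent gains by unilaterally deviating from cell (r, c)."""
    r1, r2 = game[r][c]
    return (r1 >= max(game[i][c][0] for i in range(2)) and
            r2 >= max(game[r][j][1] for j in range(2)))

def joint_optimum(game):
    """The cell (r, c) that maximizes the joint value r1 + r2."""
    return max(((r, c) for r in range(2) for c in range(2)),
               key=lambda rc: sum(game[rc[0]][rc[1]]))

# Ex1: the joint optimum (5, 5) with joint reward 10 is not stable -- agent 1 deviates for 6.
r, c = joint_optimum(ex1)
print("Ex1 joint optimum", ex1[r][c], "is an equilibrium?", is_equilibrium(ex1, r, c))

# Ex2: the only pure equilibrium is the cell with payoffs (1, 1), whose joint value 2 < 4.
print("Ex2 equilibria:", [ex2[r][c] for r in range(2) for c in range(2)
                          if is_equilibrium(ex2, r, c)])
```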


Motivation: Simple examples

A one-shot game without stochastic elements.

Ex1: Two people need to meet; one prefers 4pm, the other 5pm.

When should they meet?

They need to compromise to some extent, but not totally.

No meeting at all is bad for both; they must agree on a mutually acceptable solution.

Ex2: A team of robots works on a task.

Each robot has a limited battery.

Each robot must save the last n% of its battery for refuelling itself; otherwise it dies.

The robots need to achieve the team goal while not dying.


MTDP: A Distributed POMDP Model

An MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> where:

S is a set of world states.

A(α) is the set of allowed team actions, A(α) = Π A(i), where A(i) is the set of domain-level actions for agent i.

P is a probability distribution that governs the effect of domain-level actions: P(s, a, s') = Pr(s' | s, a).

Ω(α) is the joint set of observations, and O(α) is the joint observation function.

B(α) is the combination of all the agents' sets of possible belief states.

R is the common reward for the team, R : S × A(α) → ℝ.
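As a concrete reading of the tuple, here is a minimal sketch of how these components might be carried around in code. The type aliases, field names, and the use of explicit dictionaries over enumerable sets are illustrative assumptions, not part of the MTDP model itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

State = str                          # an element of S
JointAction = Tuple[str, ...]        # an element of A(alpha): one action per agent
JointObservation = Tuple[str, ...]   # an element of Omega(alpha)

@dataclass
class MTDP:
    states: FrozenSet[State]                                      # S
    joint_actions: FrozenSet[JointAction]                         # A(alpha) = product of the A(i)
    transition: Dict[Tuple[State, JointAction, State], float]     # P(s, a, s') = Pr(s' | s, a)
    joint_observations: FrozenSet[JointObservation]               # Omega(alpha)
    observation_fn: Dict[Tuple[State, JointAction, JointObservation], float]  # O(alpha)
    belief_states: FrozenSet[tuple]                               # B(alpha), combined belief states
    reward: Dict[Tuple[State, JointAction], float]                # R : S x A(alpha) -> reals
```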


E-MTDP: Formally Defined

An E-MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> where S, A(α), P, Ω(α), O(α), B(α) are as defined in the MTDP.

R = <R1, R2, ..., Rn, Rα>, where R1, R2, ..., Rn are the individual rewards of agents 1, 2, ..., n and Rα is the joint reward for the n agents, Rα = γ·R1 + δ·R2 + ...

Both individual and joint rewards can be expressed.
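A similar minimal sketch of the E-MTDP reward structure; representing the weights γ, δ, ... as a single list is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]

@dataclass
class EMTDPRewards:
    """The reward vector R = <R1, ..., Rn, Ralpha> of an E-MTDP."""
    individual: List[Dict[Tuple[State, JointAction], float]]  # R1, ..., Rn
    weights: List[float]                                       # gamma, delta, ...

    def joint(self, s: State, a: JointAction) -> float:
        # Ralpha(s, a) = gamma * R1(s, a) + delta * R2(s, a) + ...
        return sum(w * Ri[(s, a)] for w, Ri in zip(self.weights, self.individual))
```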


E-MTDP Policy

A policy maps belief states to actions: πi : Bi → Ai.

A centralized policy generator is used.

The policy π is such that:

V1(π) > T1, V2(π) > T2

For every π' ≠ π with V1(π') > T1 and V2(π') > T2,

V(π) > V(π')

where T1 and T2 are the thresholds for agents 1 and 2,

V1 is the value of the policy for agent 1 and V2 for agent 2,

and V is the overall value of the policy without splitting.
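A minimal sketch of this selection rule for two agents, assuming the set of candidate joint policies is small enough to enumerate and that the value functions V, V1, V2 are supplied by some external policy-evaluation routine (both assumptions are mine, not stated on the slide).

```python
from typing import Callable, Iterable, Optional, TypeVar

Policy = TypeVar("Policy")

def select_policy(candidates: Iterable[Policy],
                  V: Callable[[Policy], float],    # overall value, without splitting
                  V1: Callable[[Policy], float],   # expected value for agent 1
                  V2: Callable[[Policy], float],   # expected value for agent 2
                  T1: float, T2: float) -> Optional[Policy]:
    """Return the policy maximizing V among those with V1 > T1 and V2 > T2."""
    feasible = [pi for pi in candidates if V1(pi) > T1 and V2(pi) > T2]
    if not feasible:
        return None  # no policy clears both agents' thresholds
    return max(feasible, key=V)
```

The same filter-then-maximize structure extends to n agents with thresholds T1, ..., Tn.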


Novelties of E-MTDP

Maintains individual rewards for each agent and a joint reward for the team.

The solution concept is novel because the optimal policy both maximizes the joint reward and ensures a certain minimum expected value for individual team members.


Experimental Validation

Goal: show the utility of E-MTDP.

Electric Elves (E-Elves) is a real system based on MDPs, built around maximizing a single joint reward.

Re-expressing it as an E-MTDP helped improve its performance.

E-Elves: a published real-world multi-agent system.

Used at USC/ISI for 6 months.

Agents called proxies reschedule meetings, decide whether to present talks on behalf of the user, order meals, track the user's location, etc.


Electric Elves

The focus is on the task of rescheduling meetings.

A single-agent MDP was used to model each agent.

Actions include delaying or canceling a meeting, asking the user, etc.

Asking the user for input is critical.

Time constraints might prevent the agent from asking the user for input.

The policy generator uses the notion of a team reward for deciding actions.

There is no notion of individual reward.


Perceived Problem and Improvement

The original formulation had R(α) and R(user) terms [1].

However, policy generation maximizes R(α) + R(user).

As R(α) increased with R(user) held constant, the agent stopped asking the user.

As R(α) increases, the cost of the uncertainty in getting a response from the user exceeds δ, the increase in decision quality due to the user's feedback.

Hence, the decision is taken without asking.

The user might have wanted a different decision.

The user can set the meeting's importance to them using R(user). If the user is important, the agent needs to make a correct decision regarding the user; the user's opinion then matters and affects the number of asks.
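To illustrate the trade-off with a toy calculation (all numbers and the specific cost model are made up for illustration; the actual E-Elves MDP computes these expectations over its full state space): the agent asks only while δ, the expected quality gain from feedback, exceeds the expected cost of waiting for an uncertain response, and that cost grows with R(α).

```python
def should_ask(r_alpha: float, delta: float,
               p_no_response: float, delay_penalty_rate: float) -> bool:
    """Toy version of the ask/don't-ask trade-off described above.

    delta is the expected improvement in decision quality from the user's
    feedback.  The expected cost of asking is modeled here as the chance of
    getting no timely response times a penalty that scales with the joint
    reward R(alpha) at stake -- an illustrative assumption, not the actual
    E-Elves model.
    """
    expected_cost = p_no_response * delay_penalty_rate * r_alpha
    return delta > expected_cost

# With delta fixed, increasing R(alpha) eventually flips the decision to "don't ask".
for r_alpha in (1, 5, 10, 50):
    print(r_alpha, should_ask(r_alpha, delta=2.0,
                              p_no_response=0.3, delay_penalty_rate=0.2))
```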


Original Elves Result

x-axis: value of the meeting without the user (the joint activity weight).

y-axis: number of times the agent asks the user.

The number of asks decreases as R(α) increases.

Agents sometimes cancel an important meeting without asking the user (a very high-cost mistake) [1].

[Bar chart: number of asks as a function of joint activity weight. x-axis: joint activity weight, -10 to 5; y-axis: number of asks, 0 to 60. The asks peak at around 54 for intermediate weights and fall to 0 at both extremes.]


E-MTDP based E-Elves

Solving using E-MTDP

– Let there be two agents: priv1 = R(user) is agent 1's private reward, and priv2 = R(α) is agent 2's private reward.

Set priv1 >= Threshold.

The number of asks now depends on the threshold.

With the user importance (priv1) set high, the agent asks the user for input before deciding, unlike earlier.

Setting the threshold appropriately is important to obtain the required behavior.
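Continuing the toy illustration, the sketch below shows this mechanism with two hypothetical candidate policies and made-up values: without a user threshold, the higher joint-value "decide without asking" policy wins, but once priv1 = R(user) must clear the threshold, only "ask user first" remains feasible.

```python
# Two hypothetical candidate policies with made-up expected values (V_user, V_alpha).
candidates = {
    "ask user first":        (4.0, 5.0),
    "decide without asking": (1.0, 9.0),
}

def best_policy(user_threshold: float) -> str:
    """E-MTDP-style selection: maximize total value subject to V_user >= user_threshold."""
    feasible = {name: vals for name, vals in candidates.items()
                if vals[0] >= user_threshold}
    return max(feasible, key=lambda name: sum(feasible[name]))

print(best_policy(user_threshold=0.0))  # -> "decide without asking" (total value 10)
print(best_policy(user_threshold=3.0))  # -> "ask user first" (total value 9)
```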


E-MTDP result

As the graph shows, giving the user the flexibility to set a threshold can result in the agent asking more often.

The user's opinion is taken into consideration.

“Flexibility” is the key word. Users like control over their agents.

[Bar chart: number of asks as a function of joint activity weight. x-axis: joint activity weight, -1 to 8; y-axis: number of asks, 0 to 40. The asks decrease from 36 to 6 but no longer drop to zero as the weight increases.]


Conclusions

A framework for teams of self-interested agents.

E-MTDP was presented as a solution concept.

E-MTDP was applied to E-Elves.

System performance improved, measured in terms of the number of asks.

Fine-tuning of agents, according to user needs, now possible.


Future Work

Fine-tune the existing E-MTDP framework.

Analyze the complexity of generating E-MTDP policies.

Analyze stability of the E-MTDP solutions.

References

1. Paul Scerri, David V. Pynadath and Milind Tambe. "Towards Adjustable Autonomy for the Real World." JAIR, 2002.

THANK YOU. Any questions?


Stability of solution

Designed a multistage game for the E-MTDP policy to be stable.