
Transcript of Software Multiagent Systems: Lecture 13

Page 1: Software Multiagent Systems:  Lecture 13

Software Multiagent Systems: Lecture 13

Milind Tambe, University of Southern California, [email protected]

Page 2: Software Multiagent Systems:  Lecture 13

Teamwork

When agents act together

Page 3: Software Multiagent Systems:  Lecture 13

Understanding Teamwork

Ordinary traffic

Driving in a convoy

Two friends A & B drive together in a convoy vs. B secretly following A

Pass play in Soccer

Contracting with a software company

Orchestra

Page 4: Software Multiagent Systems:  Lecture 13

Understanding Teamwork

• Not just a union of simultaneous coordinated actions
• Different from contracting

Together: a joint goal

Collaborate = "co-labor"

Page 5: Software Multiagent Systems:  Lecture 13

Why Teamwork? Why not Master-Slave? Why not Contracts?

Page 6: Software Multiagent Systems:  Lecture 13

Why Teams

Robust organizations

Responsibility to substitute

Mutual assistance

Information communicated to peers

Still capable of structure (not necessarily flat)

Subteams, subsubteams

Variations in capabilities and limitations

Page 7: Software Multiagent Systems:  Lecture 13

Approach

Theory

Practical teamwork architectures

Page 8: Software Multiagent Systems:  Lecture 13

Taking a step back…

Page 9: Software Multiagent Systems:  Lecture 13

Key Approaches in Multiagent Systems

Market mechanisms (Auctions)

Distributed Constraint Optimization (DCOP)

[Figure: DCOP constraint graph over variables x1, x2, x3, x4]

Belief-Desire-Intention (BDI) Logics and Psychology

(JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p))

Distributed POMDP

Hybrid: DCOP / POMDP / Auctions / BDI

• Essential in large-scale multiagent teams

• Synergistic interactions

Page 10: Software Multiagent Systems:  Lecture 13

Key Approaches for Multiagent Teams

[Figure: approaches (Markets, DCOP, BDI, Distributed POMDPs) mapped against concerns (local interactions, local utility, uncertainty, human usability and plan structure), motivating a BDI-POMDP hybrid]

Page 11: Software Multiagent Systems:  Lecture 13

Distributed POMDPs

Three papers on the web pages. What to read:

Ignore all the proofs
Ignore complexity results
JAIR article: the model and the results at the end
Understand fundamental principles

Page 12: Software Multiagent Systems:  Lecture 13

Domain: Teamwork for Disaster Response

Page 13: Software Multiagent Systems:  Lecture 13

Multiagent Team Decision Problem (MTDP)

MTDP: ⟨S, A, P, Ω, O, R⟩

S: s1, s2, s3…

Single global world state, one per epoch

A: domain-level actions; A = {A1, A2, A3,…An}

Ai is a set of actions for each agent i

Joint action

Page 14: Software Multiagent Systems:  Lecture 13

MTDP

P: Transition function:

P(s’ | s, a1, a2, …an)

R: Reward

R(s, a1, a2, …, an)

One common team reward, not separate individual rewards

Central to teamwork

Page 15: Software Multiagent Systems:  Lecture 13

MTDP (cont’d)

Ω: observations

Each agent i has its own finite set of possible observations Ω_i

O: observation function (probability of a joint observation)

O(destination-state, joint-action, joint-observation)

P(o1, o2, …, on | a1, a2, …, an, s')
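To make the tuple concrete, here is a minimal sketch (not from the lecture) of the MTDP pieces as a Python container; the field names and the dictionary-based encoding of P, O, and R are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]        # one action per agent
JointObservation = Tuple[str, ...]   # one observation per agent

@dataclass
class MTDP:
    """Minimal sketch of <S, A, P, Omega, O, R> (illustrative encoding, not the lecture's code)."""
    states: List[State]                                           # S
    actions: List[List[str]]                                      # A_i: actions of each agent i
    observations: List[List[str]]                                 # Omega_i: observations of each agent i
    P: Dict[Tuple[State, JointAction, State], float]              # P(s' | s, a1..an)
    O: Dict[Tuple[State, JointAction, JointObservation], float]   # P(o1..on | a1..an, s')
    R: Dict[Tuple[State, JointAction], float]                     # one common team reward R(s, a1..an)
```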

Page 16: Software Multiagent Systems:  Lecture 13

Simple Scenario

Cost of action: -0.2

Must fight fires together

Observe own location and fire status

[Figure: scenario map with two fires, worth +20 and +40]
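A hedged sketch of how this scenario's team reward might be encoded: the -0.2 action cost, the requirement that agents fight fires together, and the +20/+40 fire values come from the slide, while the state encoding (agent locations and fire status) is an assumption for illustration.

```python
# Illustrative encoding of the firefighting scenario's common reward.
FIRE_VALUE = {"fire1": 20.0, "fire2": 40.0}   # the +20 and +40 fires from the slide
ACTION_COST = -0.2                            # per-agent cost of acting (from the slide)

def team_reward(agent_locations, burning_fires, joint_action):
    """One common team reward: a fire only pays off when all agents fight it together."""
    reward = ACTION_COST * sum(1 for a in joint_action if a != "no-op")
    for fire in burning_fires:
        if all(loc == fire for loc in agent_locations):
            reward += FIRE_VALUE[fire]
    return reward

# e.g. both agents at fire2, both acting: -0.4 + 40 = 39.6
print(team_reward(("fire2", "fire2"), {"fire2"}, ("fight", "fight")))
```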

Page 17: Software Multiagent Systems:  Lecture 13
Page 18: Software Multiagent Systems:  Lecture 13

MTDP Policy

The problem: find optimal JOINT policies

One policy for each agent

π_i: Action policy

Maps belief state into domain actions

π_i: B_i → A_i for each agent

Belief state: sequence of observations
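Because a belief state here is just the agent's own observation sequence, a deterministic finite-horizon policy can be sketched as a lookup table from observation histories to actions; the histories and action names below are invented for illustration.

```python
from typing import Dict, Tuple

ObservationHistory = Tuple[str, ...]
Policy = Dict[ObservationHistory, str]   # pi_i: B_i -> A_i, with B_i an observation sequence

pi_i: Policy = {
    (): "move-to-fire1",                                  # before any observation
    ("at-fire1-burning",): "fight-fire",                  # after one observation
    ("at-fire1-burning", "fire1-out"): "move-to-fire2",
}

def act(policy: Policy, history: ObservationHistory) -> str:
    """Look up the domain action prescribed for the current belief state."""
    return policy[history]
```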

Page 19: Software Multiagent Systems:  Lecture 13

MTDP Domain Types

Collectively partially observable: general case, no assumptions

Collectively observable: Team (as a whole) observes state

For all joint observations, there is a state s, such that, for all other states s’ not equal to s, Pr (o1,o2…on | s’) = 0

Pr (o1, o2, …on | s ) = ?

Pr (s | o1,o2..on) = ?

Individually observable: each agent observes the state

For all individual observations, there is a state s, such that for all other states s’ not equal to s, Pr (oi | s’) = 0
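A small sketch of what the "collectively observable" condition means computationally, assuming the joint observation probabilities are available as a dictionary Pr_obs keyed by (joint observation, state); the helper is illustrative only.

```python
def collectively_observable(Pr_obs, joint_observations, states):
    """Check: for every joint observation there is at most one state s consistent with it,
    i.e. Pr(o1..on | s') = 0 for every other state s'."""
    for o in joint_observations:
        consistent = [s for s in states if Pr_obs.get((o, s), 0.0) > 0.0]
        if len(consistent) > 1:
            return False
    return True
```

The "individually observable" case is the same check applied to each agent's own observation probabilities.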

Page 20: Software Multiagent Systems:  Lecture 13

From MTDP to COM-MTDP

Two separate action types: communication actions vs. domain actions

Two separate reward types:

Communication rewards and domain rewards

Total reward: sum of the two rewards

Explicit treatment of communication

Analysis

Page 21: Software Multiagent Systems:  Lecture 13

Communicative MTDPs(COM-MTDPs)

Σ: communication capabilities, possible “speech acts”

e.g., “I am moving to fire1.”

R_Σ: communication cost (over messages)

e.g., saying “I am moving to fire1” has a cost

If R_Σ only imposes costs, why ever communicate?
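A minimal sketch of the COM-MTDP additions, assuming the MTDP encoding sketched earlier: Σ_i becomes a set of message strings per agent, and R_Σ is a separate (typically non-positive) reward over messages that is simply added to the domain reward, as the previous slide states.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]
JointMessage = Tuple[str, ...]    # one message (possibly silence) per agent

@dataclass
class CommunicationModel:
    """Illustrative COM-MTDP additions: Sigma_i and R_Sigma."""
    messages: List[List[str]]                              # Sigma_i: speech acts of each agent i
    R_sigma: Dict[Tuple[State, JointMessage], float]       # communication reward/cost R_Sigma(s, sigma)

def total_reward(R_domain: Dict[Tuple[State, JointAction], float],
                 comm: CommunicationModel,
                 s: State, a: JointAction, sigma: JointMessage) -> float:
    """Total reward = domain reward + communication reward (usually a cost)."""
    return R_domain[(s, a)] + comm.R_sigma[(s, sigma)]
```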

Page 22: Software Multiagent Systems:  Lecture 13

Two Stage Decision Process

[Figure: two-stage decision loop. The agent observes the world; state estimator SE1 produces pre-communication belief state b1, which the communication policy P1 maps to messages sent to and received from the other agents; state estimator SE2 then produces post-communication belief state b2, which the action policy P2 maps to domain actions.]

• P1: Communication policy
• P2: Action policy
• Two state estimators
• Two belief-state updates

Page 23: Software Multiagent Systems:  Lecture 13

COM-MTDP Continued

Belief state: each B_i is a history of observations and communications

Two-stage belief update

Stage 1: Pre-communication belief state for agent i (updated from observations only):

b_{i•}^t = ⟨ω_i^0, Σ^1, ω_i^1, …, Σ^{t-1}, ω_i^{t-1}, ω_i^t⟩

Stage 2: Post-communication belief state for agent i (updated from observations and communication):

b_i^t = ⟨ω_i^0, Σ^1, ω_i^1, …, Σ^{t-1}, ω_i^{t-1}, ω_i^t, Σ^t⟩

In general, an agent cannot form a probability distribution over world states from these histories alone
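Since these belief states are histories rather than probability distributions, the two-stage update can be sketched as simple appends; the function names are illustrative.

```python
from typing import Tuple

Belief = Tuple[str, ...]   # history of own observations and received messages

def pre_communication_update(belief: Belief, new_observation: str) -> Belief:
    """Stage 1: fold in the latest observation only."""
    return belief + (new_observation,)

def post_communication_update(belief: Belief, received_messages: Tuple[str, ...]) -> Belief:
    """Stage 2: additionally fold in the messages heard this epoch."""
    return belief + received_messages
```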

Page 24: Software Multiagent Systems:  Lecture 13

COM-MTDP Continued

The problem: find optimal JOINT policies

One policy for each agent

π_Σi: Communication policy

Maps pre-communication belief state into messages

π_Σi: B_i → Σ_i for each agent

π_Ai: Action policy

Maps post-communication belief state into domain actions

π_Ai: B_i → A_i for each agent

Page 25: Software Multiagent Systems:  Lecture 13

More Domain Types

General communication: no assumptions on R_Σ

Free communication: R_Σ(s, σ) = 0

No communication: R_Σ(s, σ) is negatively infinite

Page 26: Software Multiagent Systems:  Lecture 13

Teamwork Complexity Results

                          Individual        Collective        Collective
                          observability     observability     partial observability
No communication          P-complete        NEXP-complete     NEXP-complete
General communication     P-complete        NEXP-complete     NEXP-complete
Full communication        P-complete        P-complete        PSPACE-complete

Page 27: Software Multiagent Systems:  Lecture 13

Classifying Different Models

                          Individual        Collective        Collective
                          observability     observability     partial observability
No communication          MMDP              DEC-POMDP         POIPSG
General communication     -                 XUAN-LESSER       COM-MTDP
Full communication        -                 -                 -

Page 28: Software Multiagent Systems:  Lecture 13

True or False

If agents communicated all their observations at each step, then the distributed POMDP would essentially be a single-agent POMDP

In distributed POMDPs, each agent plans its own policy

Solving distributed POMDPs with two agents is of the same complexity as solving two separate individual POMDPs

Page 29: Software Multiagent Systems:  Lecture 13

Algorithms

Page 30: Software Multiagent Systems:  Lecture 13

NEXP-complete

No known efficient algorithms

Brute force search

1. Generate space of possible joint policies

2. For each policy in policy space

3. Evaluate over finite horizon T

Complexity:

Number of joint policies × cost of evaluating each policy
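A sketch of the brute-force procedure, assuming deterministic policies represented as mappings from observation histories to actions and a caller-supplied evaluate function that returns the expected team reward of a joint policy over horizon T; all names are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

History = Tuple[str, ...]
Policy = Dict[History, str]          # deterministic policy: observation history -> action

def all_policies(actions: List[str], histories: List[History]) -> List[Policy]:
    """Every deterministic mapping from the given observation histories to actions."""
    return [dict(zip(histories, choice))
            for choice in product(actions, repeat=len(histories))]

def brute_force(actions: List[List[str]],
                histories: List[List[History]],
                evaluate: Callable[[Tuple[Policy, ...]], float]) -> Tuple[Tuple[Policy, ...], float]:
    """Enumerate every joint policy and keep the best one.
    Cost = (number of joint policies) x (cost of one evaluation)."""
    best, best_value = None, float("-inf")
    for joint in product(*(all_policies(a, h) for a, h in zip(actions, histories))):
        value = evaluate(joint)      # expected team reward over horizon T (supplied by caller)
        if value > best_value:
            best, best_value = joint, value
    return best, best_value
```

The blow-up is visible in all_policies: the number of policies per agent is the number of actions raised to the number of observation histories, which itself grows exponentially with the horizon T.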

Page 31: Software Multiagent Systems:  Lecture 13

Locally optimal search

Joint Equilibrium-based Search for Policies (JESP)

Page 32: Software Multiagent Systems:  Lecture 13

Nash Equilibrium in Team Games

Nash equilibrium vs Global optimal reward for the team

Bimatrix game (payoffs to A, B); A's actions x, y, z are rows, B's actions u, v are columns:

          u      v
    x    3,6    7,1
    y    5,1    8,2
    z    6,0    6,2

Team game (one common reward):

          u      v
    x     9      8
    y     6     10
    z     6      8

Page 33: Software Multiagent Systems:  Lecture 13

JESP: Locally Optimal Joint Policy

Team game (one common reward); A's actions x, y, z are rows, B's actions u, v, w are columns:

          u      v      w
    x     9      5      8
    y     6      7     10
    z     6      3      8

• Iterate, keeping one agent's policy fixed
• More complex policies are handled the same way

Page 34: Software Multiagent Systems:  Lecture 13

Joint Equilibrium-based Search

Description of algorithm:

1. Repeat until convergence

2. For each agent i

3. Fix policy of all agents apart from i

4. Find policy for i that maximizes joint reward

Exhaustive-JESP:

Brute-force search in the policy space of agent i (see the sketch below)

Expensive
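As an illustration (not from the paper) of the alternating best-response idea on the single-shot team game from the earlier slide, where each "policy" is just one action: starting from (x, u) the iteration stops at the locally optimal reward 9, while starting from (z, v) it reaches the global optimum 10, which is why random restarts are used.

```python
# Team reward matrix from the slide: rows are A's actions, columns are B's actions.
REWARD = {("x", "u"): 9, ("x", "v"): 5, ("x", "w"): 8,
          ("y", "u"): 6, ("y", "v"): 7, ("y", "w"): 10,
          ("z", "u"): 6, ("z", "v"): 3, ("z", "w"): 8}
A_ACTIONS, B_ACTIONS = ["x", "y", "z"], ["u", "v", "w"]

def jesp_single_shot(a, b):
    """Alternate best responses, keeping the other agent's choice fixed, until no change."""
    while True:
        new_a = max(A_ACTIONS, key=lambda ai: REWARD[(ai, b)])       # best response for A
        new_b = max(B_ACTIONS, key=lambda bi: REWARD[(new_a, bi)])   # best response for B
        if (new_a, new_b) == (a, b):
            return a, b, REWARD[(a, b)]
        a, b = new_a, new_b

print(jesp_single_shot("x", "u"))   # ('x', 'u', 9): a local equilibrium, not the global optimum
print(jesp_single_shot("z", "v"))   # ('y', 'w', 10): reaches the global optimum
```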

Page 35: Software Multiagent Systems:  Lecture 13

JESP: Joint Equilibrium Search (Nair et al, IJCAI 03)

Repeat until convergence to local equilibrium, for each agent K:

Fix policy for all agents except agent K
Find optimal response policy for agent K

Optimal response policy for K, given fixed policies for others in MTDP:

Transformed to a single-agent POMDP problem:

“Extended” state e_t defined as ⟨world state s_t, observation histories of the other agents⟩, not as s_t alone

Define new transition function

Define new observation function

Define multiagent belief state

Dynamic programming over belief states

Fast computation of optimal response
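A hedged sketch of the "extended state" construction for two agents: with agent 2's policy fixed, agent 1's best-response problem becomes a single-agent POMDP whose state pairs the world state with agent 2's observation history. The dictionary encodings of P and O2 and the policy pi2 are assumptions carried over from the earlier MTDP sketch.

```python
from typing import Dict, Tuple

State = str
History = Tuple[str, ...]
ExtendedState = Tuple[State, History]   # e = (world state, other agent's observation history)

def extended_transition(e: ExtendedState, a1: str,
                        pi2: Dict[History, str],
                        P: Dict[Tuple[State, Tuple[str, str], State], float],
                        O2: Dict[Tuple[State, Tuple[str, str], str], float]
                        ) -> Dict[ExtendedState, float]:
    """P'(e' | e, a1): agent 2's action comes from its fixed policy, and its next
    observation is folded into the extended state (illustrative construction)."""
    s, h2 = e
    a2 = pi2[h2]                                         # other agent's action is determined
    out: Dict[ExtendedState, float] = {}
    for (s0, joint_a, s1), p in P.items():               # world-state transition
        if s0 != s or joint_a != (a1, a2) or p == 0.0:
            continue
        for (sd, ja, o2), q in O2.items():                # agent 2's observation
            if sd == s1 and ja == (a1, a2) and q > 0.0:
                e_next = (s1, h2 + (o2,))
                out[e_next] = out.get(e_next, 0.0) + p * q
    return out
```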

Page 36: Software Multiagent Systems:  Lecture 13

Extended State, Belief State
Sample progression of beliefs: HL and HR are observations

a2: Listen

Page 37: Software Multiagent Systems:  Lecture 13

Run-time Results

Method               T=2    T=3       T=4    T=5    T=6     T=7
Exhaustive-JESP      10     317800    -      -      -       -
DP-JESP              0      0         20     110    1360    30030

Page 38: Software Multiagent Systems:  Lecture 13

Is JESP guaranteed to find the global optimal?

Random restarts

          u      v      w
    x     9      5      8
    y     6      7     10
    z     6      3      8

Page 39: Software Multiagent Systems:  Lecture 13

Not All Agents are Equal

Scaling up Distributed POMDPs for Agent Networks

Page 40: Software Multiagent Systems:  Lecture 13

Runtime

Page 41: Software Multiagent Systems:  Lecture 13

POMDP vs. distributed POMDP

Distributed POMDPs more complex

Joint transition and observation functions

Better policy

Free communication = POMDP

Less dependency = lower complexity

Page 42: Software Multiagent Systems:  Lecture 13

BDI vs. distributed POMDP

BDI teamwork                       Distributed POMDP teamwork
Explicit joint goal                Explicit joint reward
Plan/organization hierarchies      Unstructured plans/teams
Explicit commitments               Implicit commitments
No costs / uncertainties           Costs & uncertainties included