Combining AI and Game Theory to Model Negotiation and Cooperation

Geoff Gordon (ggordon@cs.cmu.edu), Miroslav Dudík (mdudik@cmu.edu)

CMU Machine Learning Dep’t

MURI 14 Program Review — September 10, 2009 — Geoff Gordon

[Figure: program overview diagram]

• Boxes:

‣ Theory Formation

‣ Identify Cultural Factors (CUNY, Georgetown, CMU)

‣ Computational Models (CMU, USC)

‣ Virtual Humans (USC)

‣ Implementation (CMU)

‣ Surveys & Interviews (CUNY, CMU, U Mich, Georgetown)

‣ Cross-Cultural Interactions (U Pitt, CMU)

‣ Data Analysis (CUNY, Georgetown, U Pitt, CMU)

• Research products: Validated Theories; Models; Modeling Tools; Briefing Materials; Scenarios; Training Simulations

• Arrows labeled: validation

• Legend: common task; subgroup task

Modeling negotiation and cooperation

• Goal: build a model to

‣ predict behavior of others

‣ optimize our own behavior

‣ understand cultural differences

• …while negotiating and cooperating


What’s in a model?

• Observations (inputs to the model):

‣ Own past behavior

‣ Observations of other agents

‣ Observations of nature

‣ Initial store of private information

• Predictions (outputs of the model):

‣ Future behavior of other agents

‣ Future behavior of nature

‣ Plans for own behavior

For example


3 kinds of models

• Plain probabilistic

‣ P(predictions | observations)

• E.g.:

‣ roads example

‣ linear regression

‣ Bayes nets
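
A plain probabilistic model only estimates P(predictions | observations); it involves no choice and no goals. A minimal sketch, assuming a tiny made-up history of (observation, outcome) pairs (the data and the names `history` and `conditional` are illustrative, not from the talk):

```python
from collections import Counter

# Hypothetical (observation, outcome) records; not data from the talk.
history = [
    ("rain", "slow"), ("rain", "slow"), ("rain", "clear"),
    ("dry", "clear"), ("dry", "clear"), ("dry", "slow"),
    ("rain", "slow"), ("dry", "clear"),
]

def conditional(history, observation):
    """Estimate P(outcome | observation) by counting matching records."""
    counts = Counter(out for obs, out in history if obs == observation)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no records for observation {observation!r}")
    return {out: n / total for out, n in counts.items()}

p_rain = conditional(history, "rain")
print(p_rain)
```

Linear regression and Bayes nets play the same role with richer structure: they map observations to a distribution over predictions and stop there.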


3 kinds of models

• Decision theoretic

‣ P(predictions | observations, own plan): predictions no longer include own future plan

‣ E(reward | predictions): model of my own goals

‣ choose plan to optimize rewards: model of my available choices

• E.g., POMDPs, influence diagrams

• Adds optimization by self

Decision-theoretic model
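
The decision-theoretic recipe (predict outcomes given each candidate plan, score them by expected reward, pick the best plan) can be sketched in a few lines. This is only an illustration: the plans, probabilities, and rewards are hypothetical, and `expected_reward` and `best_plan` are names invented for the sketch.

```python
# Hypothetical P(outcome | observations, own plan), already conditioned on observations.
outcome_probs = {
    "main_road": {"on_time": 0.6, "delayed": 0.4},
    "back_road": {"on_time": 0.8, "delayed": 0.2},
}

# Hypothetical model of my own goals: reward for each outcome.
reward = {"on_time": 10.0, "delayed": -5.0}

def expected_reward(plan):
    """E(reward | predictions) for one candidate plan."""
    return sum(p * reward[outcome] for outcome, p in outcome_probs[plan].items())

def best_plan(plans):
    """Choose the plan with the highest expected reward."""
    return max(plans, key=expected_reward)

print(best_plan(outcome_probs), expected_reward(best_plan(outcome_probs)))
```

Other agents appear here only through the fixed probabilities; nothing in the sketch lets them react to the chosen plan.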


Why decision theory?

• What if we add a new blockage to the road network?


3 kinds of models

• Game theoretic

‣ P(predictions | observations, all plans): predictions no longer include any future plans

‣ E(reward_p | predictions): everyone’s goals

‣ choose plans to optimize rewards: everyone’s available choices

• E.g., MAIDs, MAML

• Adds optimization by others

Why game theory?

• Imagine a patrol planning its route

• It encounters a blockage

• Should it take the most efficient route around it?

• Maybe not: might channel into an ambush

• Need to model goals of other players
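
To make the patrol argument concrete, here is a sketch of a 2x2 zero-sum game between a patrol (choosing the short or long route) and an adversary (choosing where to set an ambush). The payoff numbers are hypothetical, and the closed-form solution used is the standard one for 2x2 zero-sum games without a pure saddle point.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Mixed equilibrium of a zero-sum game with row-player payoffs [[a, b], [c, d]].

    Returns (P(row 1), P(column 1), game value). Only valid when the game
    has no pure saddle point.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("pure saddle point exists; the mixed formula does not apply")
    denom = Fraction(a - b - c + d)
    p_row1 = (d - c) / denom
    q_col1 = (d - b) / denom
    value = (a * d - b * c) / denom
    return p_row1, q_col1, value

# Hypothetical patrol payoffs. Rows: short route, long route.
# Columns: ambush on short route, ambush on long route.
p_short, q_short, value = solve_2x2_zero_sum(-10, 5, 2, -4)
print(p_short, q_short, value)
```

With these made-up numbers the patrol takes the efficient short route only part of the time, which a model that ignores the adversary's goals would never recommend.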


Why not game theory?

• Biggest reason: computational cost!

‣ e.g., for characterizing equilibria, machine learning of strategies, even tracking beliefs

‣ Cf: POMDP models in upcoming talk

• In the past, this cost has been prohibitive

‣ it can limit game-theoretic models of negotiation and cooperation (N&C) to be very simple, or force heuristic approximations in the solution


Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior


Contrast: classical game model

• Labor v. mgmt: 2*2*2 repeated Bayesian game

‣ Management knows profit level (L/H)

‣ Mgmt. offers low or high wages (w/W)

‣ On w, union chooses whether to strike

‣ Repeat: mgmt offers, union responds, …

‣ e.g., [Fudenberg et al., 1983] or [Wilson, 1994]

H profit:

          w        W
Strike    D,0      W,H-W
Work      w,H-w    W,H-W

L profit:

          w        W
Strike    D,0      W,L-W
Work      w,L-w    W,L-W
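
The stage-game payoff tables for the labor v. management game can be written as a small function. This sketch reads each cell as (union payoff, management payoff), which is an assumption about the slide's convention; the numeric values and the function name `stage_payoffs` are hypothetical.

```python
def stage_payoffs(profit, wage_offer, union_action, low_wage, high_wage, strike_payoff):
    """One round's (union, management) payoffs, following the tables' pattern.

    profit        -- H or L, known to management only
    wage_offer    -- "w" (low) or "W" (high)
    union_action  -- "strike" or "work"; only matters after a low offer
    """
    if wage_offer == "W":
        # A high offer yields (W, profit - W) whatever the union does.
        return high_wage, profit - high_wage
    if union_action == "strike":
        return strike_payoff, 0
    return low_wage, profit - low_wage

# Hypothetical values: H = 10, L = 4, w = 2, W = 5, D = 1.
for profit in (10, 4):
    for offer in ("w", "W"):
        for action in ("strike", "work"):
            print(profit, offer, action, stage_payoffs(profit, offer, action, 2, 5, 1))
```

The function covers a single round only; the repeated game and the union's uncertainty about the profit level are what make the full model a Bayesian game.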

Building a model of N&C: motivating example

• Scenario: two-party two-issue negotiation

‣ merchants negotiating over a purchase

‣ issues: type of product (Carpets/Textiles); delivery date (Early/Late)

‣ don’t know each other’s preferences (direction or strength)

‣ or each other’s personality

Motivating example

B: I’d rather buy carpets. (cheap talk)

S: Carpets are expensive for me to get right now. (cheap talk)

S: Could you accept a late delivery date? (propose (CL))

B: No, I prefer an earlier one. (reject; cheap talk)

S: I could get you textiles for an earlier delivery? (propose (TE))

B: OK. (accept)

[Buyer & seller conduct transaction for TE] (transaction; payoffs)

[Throughout, buyer & seller update beliefs about each other’s motives, reputation, etc.] (belief update)

Computational game-theoretic model

• Models simplified subset of interactions

• First, what a solution gives us (or doesn’t)

• Then, brief review of MAML (Multi-Agent Markov Logic), our representation language

• Then, model and results

‣ both would be difficult w/o MAML + algos


Solutions

• Solution = (recipe for behavior of agents, in which each optimizes own payoff) = equilibrium

• Tells us: how agents act/speak, how they interpret actions/utterances of others, how they react to actions/utterances of others

• May be many different solutions!

• In real world: arise through repeated interaction and learning
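
A small sketch of what "each optimizes own payoff" means, and of why there may be many solutions: the function below enumerates pure-strategy Nash equilibria of a two-player game. The coordination game and its strategy labels are hypothetical, and the talk's own models use richer (mixed, multi-stage) equilibria than this.

```python
from itertools import product

def pure_nash(payoffs):
    """Pure-strategy Nash equilibria of a two-player game.

    payoffs[(r, c)] = (row player's payoff, column player's payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        # Neither player can gain by deviating unilaterally.
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Hypothetical coordination game: both sides gain only if conventions match.
game = {
    ("firm", "firm"): (2, 2),
    ("firm", "flexible"): (0, 0),
    ("flexible", "firm"): (0, 0),
    ("flexible", "flexible"): (1, 1),
}
equilibria = pure_nash(game)
print(equilibria)
```

Both matching conventions are equilibria here, so the model alone does not say which one a population settles on; that is the role of repeated interaction and learning.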


Limitations

• Expressiveness of agents’ language

‣ here: choice among 5 statements / turn

‣ cf: 30–40 for POMDP, nearly ∞ for English

• Representation of external environment

‣ here: very simple (4 transactions + disagree)

‣ more detail is necessary for future research into combined N&C


Limitations

• Current algorithm: centralized, no learning

‣ we are working on changing these

• The key to all the above: speed!

‣ limits length of game, amount of comm, size of environment, flexibility of structure


MAML

[Figure: graphical model for games, with nodes W, X, Y, Z, Z’, Z’’, rewards R1 and R2, and a T/F branch]

• Node legend: Nature move; P1 (Green) move; P2 (Blue) move; P1 (Green) reward; P2 (Blue) reward; Observation; Branching; Collection

• Info flow: valid path (directed path through same-color nodes; source may be different color)

• Time flow: any consistent complete ordering of DAG
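
"Any consistent complete ordering of DAG" is a topological order. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the edges below are hypothetical, since the slide's actual graph is not recoverable from the transcript.

```python
from graphlib import TopologicalSorter

# Hypothetical edges, given as node -> set of predecessors.
predecessors = {
    "X": {"W"},
    "Y": {"X"},
    "Z": {"X"},
    "R1": {"Y", "Z"},
    "R2": {"Z"},
}

# One valid time ordering of the moves.
order = list(TopologicalSorter(predecessors).static_order())

def is_consistent(order, predecessors):
    """Check that every node appears after all of its predecessors."""
    position = {node: i for i, node in enumerate(order)}
    return all(
        position[pred] < position[node]
        for node, preds in predecessors.items()
        for pred in preds
    )

print(order, is_consistent(order, predecessors))
```

Several orderings usually pass the check, which matches the slide's wording: time flow is any one of them.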

Negotiation in MAML

[Figure: MAML model of the negotiation, with nodes type1, type2, speak1, speak2, speak’1, speak’2, agree1, agree2, final, util1, util2]

• types (preferences and strengths, personality)

• negotiation: cheap talk, propose, or accept previous proposal (each turn)

• last turn: only accept or reject

• final outcome: SW, ST, DW, DT, X

• utility assignment

• adjust: #turns, #bits/turn

• For this game:

‣ MAML: 5 parameters

‣ Γ = 6900

‣ EFG size = 166,000

‣ clique size = 3.3M
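
The turn rules of the negotiation (cheap talk, propose, or accept on ordinary turns; only accept or reject on the last turn; outcome X if no agreement) can be sketched as a move generator. This is one reading of the slide, not the MAML model itself: the function names, the 0-indexed turns, and the move labels are assumptions.

```python
OUTCOMES = ("SW", "ST", "DW", "DT")  # agreements named on the slide; "X" = no agreement

def legal_moves(turn, n_turns, pending_proposal):
    """Moves available on a 0-indexed turn, given the proposal on the table (or None)."""
    if turn == n_turns - 1:
        # Last turn: only accept or reject.
        return ["accept", "reject"] if pending_proposal else ["reject"]
    moves = ["cheap_talk"] + [f"propose_{o}" for o in OUTCOMES]
    if pending_proposal:
        moves.append("accept")
    return moves

def final_outcome(last_move, pending_proposal):
    """Agreed contract if the last move accepted a proposal, else X."""
    if last_move == "accept" and pending_proposal:
        return pending_proposal
    return "X"

print(legal_moves(0, 4, None))
print(legal_moves(3, 4, "ST"), final_outcome("accept", "ST"))
```

Changing `n_turns` corresponds to the "#turns" knob on the slide; the "#bits/turn" knob would change how many distinct statements each turn allows, which this sketch does not model.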

Types

• 24 types:

‣ 12 prefs on contract

‣ selfish v. cooperative

• Initially, each agent uncertain about other’s type

• Infers it over time from behavior

[Figure: contract preferences, plotted over textiles/carpets and early/late]
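
Inferring the other agent's type from behavior is a Bayesian update over the 24 types. A minimal sketch, assuming a uniform prior and a made-up likelihood model: the preference ids 0 to 11 are placeholders, and `likelihood` is hypothetical. In an equilibrium model, the likelihood would come from the other agent's equilibrium strategy.

```python
from fractions import Fraction
from itertools import product

MOTIVES = ("selfish", "cooperative")
TYPES = list(product(range(12), MOTIVES))  # 12 contract preferences x 2 motives = 24 types

def likelihood(utterance, agent_type):
    """Hypothetical P(utterance | type): even preference ids tend to say 'carpets'."""
    pref_id, _motive = agent_type
    p_carpets = Fraction(4, 5) if pref_id % 2 == 0 else Fraction(1, 5)
    return p_carpets if utterance == "carpets" else 1 - p_carpets

def update(belief, utterance):
    """Bayes rule: posterior is proportional to prior times likelihood."""
    unnormalized = {t: p * likelihood(utterance, t) for t, p in belief.items()}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

prior = {t: Fraction(1, len(TYPES)) for t in TYPES}
posterior = update(prior, "carpets")
print(posterior[(0, "selfish")], posterior[(1, "selfish")])
```

Repeating `update` after every observed statement gives the kind of evolving belief over types that the simulation trace plots.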

Simulation trace

Type2 (Seller): T+, L, cooperative

[Figure: Seller’s belief about Type1 (Buyer), plotted over textiles/carpets and early/late, with legend entries s and c; and Seller’s belief about Buyer’s next action over CE, CL, TE, TL, X]

Moves shown as the trace advances:

‣ Speak1 = CL

‣ Speak2 = TL

‣ Speak1’ = CE

‣ Speak2’ = CE (the final build also marks X)

Runtime: negotiation

• We generated 30 equilibria of 3 variants of the game

‣ in all cases, ε = 2% of initial regret

• 1 round of talk (1 turn each):

‣ Γ = 300, runtime = 2 min

• 1.5 rounds of talk (2+1 turns):

‣ Γ = 1600, runtime = 9 min

• 2 rounds of talk (2 turns each):

‣ Γ = 6900, runtime = 44 min

Legend (communication): low • med • high •
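
A quick arithmetic check on the three reported (Γ, runtime) pairs: the sketch below estimates how runtime scales with Γ from the slope between consecutive points on a log-log scale. Only the three pairs come from the slide; the scaling reading is an inference from very few data points, not a claim from the talk.

```python
import math

# (Γ, precomputation runtime in minutes) as reported for the three game variants.
runs = [(300, 2), (1600, 9), (6900, 44)]

def scaling_exponent(run_a, run_b):
    """Slope of log(runtime) against log(Γ) between two runs."""
    (gamma_a, time_a), (gamma_b, time_b) = run_a, run_b
    return math.log(time_b / time_a) / math.log(gamma_b / gamma_a)

exponents = [scaling_exponent(a, b) for a, b in zip(runs, runs[1:])]
minutes_per_unit = [time / gamma for gamma, time in runs]

print([round(e, 2) for e in exponents])
print([round(m, 4) for m in minutes_per_unit])
```

Both exponents come out close to 1, so over this range runtime grows roughly in proportion to Γ.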

Runtime: detail

• Runtime splits into two pieces:

‣ precomputation (reported on previous slide; do this once, ahead of time)

‣ realtime computation (near instantaneous for our algorithm; do this during negotiation, to compute next move)

• Precomputation is analogous to a group of agents repeatedly interacting over time, to arrive at a convention for how to negotiate

‣ could take days to years for real agents
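
The split between precomputation and realtime computation can be pictured as building a strategy table once and then doing a lookup plus one random draw per turn. The sketch below is only an analogy under that assumption: the table contents, the information-state encoding, and both function names are hypothetical, and the talk's algorithm is not reproduced here.

```python
import random

def precompute_policy():
    """Stand-in for the expensive offline equilibrium computation (done once)."""
    # Hypothetical table: information state -> distribution over next statements.
    return {
        ("start",): {"cheap_talk": 0.7, "propose_CL": 0.3},
        ("start", "cheap_talk"): {"propose_TE": 0.6, "propose_CL": 0.4},
    }

def next_move(policy, info_state, rng):
    """Realtime step: look up the distribution and sample one move."""
    distribution = policy[info_state]
    moves, weights = zip(*distribution.items())
    return rng.choices(moves, weights=weights, k=1)[0]

policy = precompute_policy()          # minutes in the talk's experiments
move = next_move(policy, ("start",), random.Random(0))  # near instantaneous
print(move)
```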

Simulation stats

[Figure: plots of P1 gain v. P2 gain (P1: buyer, P2: seller), with each player’s BATNA and the region of feasible gains marked. Builds vary preferences, social motives (P1/P2: s/s, c/c, s/c), and communication (low • med • high •)]

Future work

• Coming year:

‣ Work to add detail to models

‣ particularly to learn about cooperation

‣ Begin to compare predictions to measured human behavior

• Ongoing:

‣ Decentralization and learning

‣ Algorithmic improvements

‣ Incorporate results into realistic agents

Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior
