
Combining AI and Game Theory to Model Negotiation and Cooperation

Geoff Gordon, Miroslav Dudík

CMU Machine Learning Dep’t

MURI 14 Program Review — September 10, 2009 — Geoff Gordon

[Diagram: project overview, RESEARCH and PRODUCTS]

Research tasks:

• Theory Formation

• Identify Cultural Factors (CUNY, Georgetown, CMU)

• Computational Models (CMU, USC)

• Virtual Humans (USC)

• Implementation (CMU)

• Surveys & Interviews (CUNY, CMU, U Mich, Georgetown)

• Cross-Cultural Interactions (U Pitt, CMU)

• Data Analysis (CUNY, Georgetown, U Pitt, CMU)

Products:

• Validated Theories

• Models

• Modeling Tools

• Briefing Materials

• Scenarios

• Training Simulations

Legend: common task; subgroup task. Links labeled: validation.


Modeling negotiation and cooperation

• Goal: build a model to

‣ predict behavior of others

‣ optimize our own behavior

‣ understand cultural differences

• …while negotiating and cooperating



What’s in a model?

Observations (inputs to the model):

• Own past behavior

• Observations of other agents

• Observations of nature

• Initial store of private information

Predictions (outputs of the model):

• Future behavior of other agents

• Future behavior of nature

• Plans for own behavior


For example

[Two figure slides; images not preserved]


3 kinds of models

• Plain probabilistic

‣ P(predictions | observations)

• E.g.:

‣ roads example

‣ linear regression

‣ Bayes nets
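
A plain probabilistic model only computes P(predictions | observations). The sketch below illustrates this with a single Bayes-rule update for a road that may be open or blocked. The states, observation names, and all probabilities are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def posterior(prior, likelihood, observation):
    """Return P(state | observation) using Bayes' rule."""
    # Unnormalized posterior: prior times likelihood of the observation.
    unnormalized = {s: prior[s] * likelihood[s][observation] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical numbers: prior over the road state and an observation model.
prior = {"open": 0.8, "blocked": 0.2}
likelihood = {
    "open": {"light_traffic": 0.9, "heavy_traffic": 0.1},
    "blocked": {"light_traffic": 0.3, "heavy_traffic": 0.7},
}

belief = posterior(prior, likelihood, "heavy_traffic")
```

Note that nothing here chooses an action; the model only predicts.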



3 kinds of models

• Decision theoretic

‣ P(predictions | observations, own plan)

‣ E(reward | predictions)

‣ choose plan to optimize rewards

• E.g., POMDPs, influence diagrams

• Adds optimization by self
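
The three steps above (predict given a plan, score predictions by expected reward, pick the best plan) can be sketched as follows. This is a minimal one-step illustration, not a POMDP solver; the plan names, outcome probabilities, and rewards are hypothetical.

```python
def expected_reward(plan, outcome_probs, reward):
    """E(reward | predictions), where predictions = P(outcome | observations, plan)."""
    return sum(p * reward[outcome] for outcome, p in outcome_probs[plan].items())


def best_plan(outcome_probs, reward):
    """Choose the plan with the highest expected reward."""
    return max(outcome_probs, key=lambda plan: expected_reward(plan, outcome_probs, reward))


# Hypothetical P(outcome | observations, own plan); observations are already folded in.
outcome_probs = {
    "highway": {"on_time": 0.6, "late": 0.4},
    "back_road": {"on_time": 0.8, "late": 0.2},
}
# Hypothetical model of my own goals.
reward = {"on_time": 10.0, "late": -5.0}

choice = best_plan(outcome_probs, reward)
```

The keys of `outcome_probs` play the role of the model of available choices, and `reward` plays the role of the model of goals.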

The decision-theoretic model adds:

‣ model of my own goals

‣ model of my available choices

Its predictions no longer include own future plan.


Decision-theoretic model

[Figure slide; image not preserved]


Why decision theory?

• What if we add a new blockage to the road network?



3 kinds of models

• Game theoretic

‣ P(predictions | observations, all plans)

‣ E(reward_p | predictions) for each player p

‣ choose plans to optimize rewards

• E.g., MAIDs, MAML

• Adds optimization by others
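
When every player optimizes, a solution is a set of plans from which no player gains by deviating alone. The sketch below enumerates pure-strategy equilibria of a tiny two-player game by checking this condition directly. It is a brute-force illustration, not the MAID or MAML machinery; the action names and payoffs are hypothetical.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(a1, a2)] = (u1, u2). Return action pairs where neither player
    can gain by changing only their own action."""
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        # Player 1 deviates while player 2's action is held fixed, and vice versa.
        p1_content = all(payoffs[(d, a2)][0] <= u1 for d in actions1)
        p2_content = all(payoffs[(a1, d)][1] <= u2 for d in actions2)
        if p1_content and p2_content:
            equilibria.append((a1, a2))
    return equilibria


# Hypothetical coordination game over a delivery date.
payoffs = {
    ("early", "early"): (2, 1),
    ("early", "late"): (0, 0),
    ("late", "early"): (0, 0),
    ("late", "late"): (1, 2),
}

equilibria = pure_nash_equilibria(payoffs)
```

This toy game has two equilibria, which is typical: a game-theoretic model may admit several solutions.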

The game-theoretic model adds:

‣ everyone’s goals

‣ everyone’s available choices

Its predictions no longer include any future plans.


Why game theory?

• Imagine a patrol planning its route

• It encounters a blockage

• Should it take the most efficient route around it?

• Maybe not: might channel into an ambush

• Need to model goals of other players
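
The patrol argument can be made concrete with a small zero-sum game: the patrol picks a route, an adversary picks where to set an ambush. The sketch below uses the standard closed-form solution for a 2x2 zero-sum game without a saddle point. The payoff numbers are hypothetical and chosen only to show that always taking the efficient route is not optimal.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Row player's equilibrium mix and game value for payoffs [[a, b], [c, d]].

    Payoffs are to the row player; the column player receives the negative.
    Assumes the game has no saddle point (no pure-strategy equilibrium).
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("saddle point exists; a pure strategy is optimal")
    denom = a - b - c + d
    p_row1 = (d - c) / denom          # probability of playing row 1
    value = (a * d - b * c) / denom   # expected payoff at equilibrium
    return p_row1, value


# Rows: patrol takes the efficient route / the detour.
# Columns: ambush on the efficient route / on the detour.
# Hypothetical payoffs to the patrol.
p_efficient, value = solve_2x2_zero_sum(-10.0, 5.0, 2.0, -8.0)
```

With these numbers the patrol takes the efficient route only part of the time, because a predictable choice can be exploited.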



Why not game theory?

• Biggest reason: computational cost!

‣ e.g., for characterizing equilibria, machine learning of strategies, even tracking beliefs

‣ Cf: POMDP models in upcoming talk

• In past, has been prohibitive

‣ can limit game-theoretic models of N&C to be very simple, or to use heuristic approximations in solution



Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior



Contrast: classical game model

• Labor v. mgmt: 2*2*2 repeated Bayesian game

‣ Management knows profit level (L/H)

‣ Mgmt. offers low or high wages (w/W)

‣ On w, union chooses whether to strike

‣ Repeat: mgmt offers, union responds, …

e.g., [Fudenberg et al., 1983] or [Wilson, 1994]

H profit:

           w         W
Strike     D, 0      W, H-W
Work       w, H-w    W, H-W

L profit:

           w         W
Strike     D, 0      W, L-W
Work       w, L-w    W, L-W
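
The stage game in the payoff tables can be written as a small function. This sketch assumes each table entry is (union payoff, management payoff), and that a strike is only possible after the low offer w, as the bullets state. The function name and the numeric values are hypothetical.

```python
def stage_payoffs(profit, wage, high_offer, union_action, strike_value):
    """One round of the labor v. management game.

    profit:       management's profit level (L or H)
    wage:         the wage offered (w or W)
    high_offer:   True if the offer is the high wage W
    union_action: "strike" or "work"
    strike_value: the union's payoff D from striking
    Returns (union payoff, management payoff); this ordering is an assumption.
    """
    if union_action == "strike" and not high_offer:
        return (strike_value, 0.0)      # table entry "D, 0"
    return (wage, profit - wage)        # table entries "w, H-w" or "W, H-W"


# Hypothetical numbers: H = 10, L = 4, w = 2, W = 3, D = 1.
low_offer_strike = stage_payoffs(10.0, 2.0, False, "strike", 1.0)
high_offer_strike = stage_payoffs(10.0, 3.0, True, "strike", 1.0)
low_profit_work = stage_payoffs(4.0, 2.0, False, "work", 1.0)
```

The Bayesian part of the game, in which the union does not observe whether profit is L or H, is not modeled in this sketch.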


Building a model of N&C: motivating example

• Scenario: two-party two-issue negotiation

‣ merchants negotiating over a purchase

‣ issues: type of product (Carpets/Textiles); delivery date (Early/Late)

‣ don’t know each other’s preferences (direction or strength)

‣ or each other’s personality


Motivating example

B: I’d rather buy carpets.

S: Carpets are expensive for me to get right now.

S: Could you accept a late delivery date?

B: No, I prefer an earlier one.

S: I could get you textiles for an earlier delivery?

B: OK.

[Buyer & seller conduct transaction for TE]

[Throughout, buyer & seller update beliefs about each other’s motives, reputation, etc.]

The same exchange, labeled by move type:

• B: cheap talk

• S: cheap talk

• S: propose (CL)

• B: reject; cheap talk

• S: propose (TE)

• B: accept

• transaction; payoffs

• belief update


Computational game-theoretic model

• Models simplified subset of interactions

• First, what a solution gives us (or doesn’t)

• Then, brief review of MAML (Multi-Agent Markov Logic), our representation language

• Then, model and results

‣ both would be difficult w/o MAML + algos



Solutions

• Solution = (recipe for behavior of agents, in which each optimizes own payoff) = equilibrium

• Tells us: how agents act/speak, how they interpret actions/utterances of others, how they react to actions/utterances of others

• May be many different solutions!

• In real world: arise through repeated interaction and learning
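
One way to check whether a candidate recipe is a solution is to compute each agent's regret: how much it could gain by switching to its best alternative while the others stay fixed. Regret of zero for everyone means an equilibrium; small regret means an approximate one. The sketch below does this for a two-player matrix game with mixed strategies. The game (matching pennies) and the function names are illustrative assumptions, not the algorithm used in the talk.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff when the row player mixes with x and the column player with y."""
    return sum(x[i] * y[j] * payoff[i][j] for i in range(len(x)) for j in range(len(y)))


def regrets(payoff1, payoff2, x, y):
    """Each player's regret: best pure-deviation payoff minus current expected payoff."""
    u1 = expected_payoff(payoff1, x, y)
    u2 = expected_payoff(payoff2, x, y)
    best1 = max(sum(y[j] * payoff1[i][j] for j in range(len(y))) for i in range(len(x)))
    best2 = max(sum(x[i] * payoff2[i][j] for i in range(len(x))) for j in range(len(y)))
    return best1 - u1, best2 - u2


# Matching pennies: player 2's payoffs are the negative of player 1's.
payoff1 = [[1.0, -1.0], [-1.0, 1.0]]
payoff2 = [[-1.0, 1.0], [1.0, -1.0]]

start = regrets(payoff1, payoff2, [1.0, 0.0], [1.0, 0.0])      # a non-equilibrium profile
solution = regrets(payoff1, payoff2, [0.5, 0.5], [0.5, 0.5])   # the mixed equilibrium
```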



Limitations

• Expressiveness of agents’ language

‣ here: choice among 5 statements / turn

‣ cf: 30–40 for POMDP, nearly ∞ for English

• Representation of external environment

‣ here: very simple (4 transactions + disagree)

‣ more detail is necessary for future research into combined N&C



Limitations

• Current algorithm: centralized, no learning

‣ we are working on changing these

• The key to all the above: speed!

‣ limits length of game, amount of comm, size of environment, flexibility of structure



MAML: graphical model for games

[Figure: example MAML graph with nodes X, Y, Z, Z’, Z’’, W, R1, R2 and branch labels T/F]

Node types in the figure:

• Nature move

• P1 (Green) move

• P2 (Blue) move

• P1 (Green) reward

• P2 (Blue) reward

• Observation

• Branching

• Collection

Info flow: valid path (a directed path through same-color nodes; the source may be a different color)

Time flow: any consistent complete ordering of the DAG

Negotiation in MAML

[Figure: MAML graph with nodes type1, type2, speak1, speak’1, speak2, speak’2, agree1, agree2, final, util1, util2; branches SW,ST,DW,DT and X]

• types (preferences and strengths, personality)

• negotiation: cheap talk, propose, or accept previous proposal (each turn)

• last turn: only accept or reject

• final outcome: SW, ST, DW, DT, X

• utility assignment

• adjust: #turns, #bits/turn

For this game:

• MAML: 5 parameters

• Γ = 6900

• EFG size = 166,000

• clique size = 3.3M


Types

• 24 types:

‣ 12 prefs on contract

‣ selfish v. cooperative

• Initially, each agent uncertain about other’s type

• Infers it over time from behavior

[Figure: contract preferences; axes: textiles/carpets, early/late]
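
Inferring the other agent's type from behavior is a Bayesian update over types. The sketch below uses a simplified type space of 8 types (product preference, date preference, social motive) rather than the 24 in the model, and a made-up behavior rule `toy_policy`. Both are assumptions for illustration only.

```python
from itertools import product

# Simplified hypothetical type space: 2 x 2 x 2 = 8 types (the talk uses 24).
TYPES = list(product(("C", "T"), ("E", "L"), ("selfish", "cooperative")))


def toy_policy(agent_type, action):
    """Hypothetical P(action | type): a type names its preferred contract with
    probability 0.7 and each of the other three contracts with probability 0.1."""
    product_pref, date_pref, _motive = agent_type
    return 0.7 if action == product_pref + date_pref else 0.1


def update(belief, action, policy):
    """One Bayes step: new belief(t) is proportional to belief(t) * P(action | t)."""
    unnormalized = {t: b * policy(t, action) for t, b in belief.items()}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}


# Start uncertain (uniform), then observe the other agent say "CL".
belief = {t: 1.0 / len(TYPES) for t in TYPES}
belief = update(belief, "CL", toy_policy)
```

In this toy rule the action does not depend on the social motive, so one observation shifts belief toward carpets and late delivery but leaves selfish and cooperative equally likely.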


Simulation trace

Type2 (Seller): T+, L, cooperative

[Figure sequence: seller’s belief about Type1 (Buyer), shown over contract preferences (textiles/carpets, early/late; legend: s, c), and seller’s belief about Buyer’s next action (CE, CL, TE, TL, X), redrawn after each step]

Steps shown, in order:

• Speak1 = CL

• Speak2 = TL

• Speak1’ = CE

• Speak2’ = CE

• X


Runtime: negotiation

• We generated 30 equilibria of 3 variants of the game

‣ in all cases, ε = 2% of initial regret

• 1 round of talk (1 turn each):

‣ Γ = 300, runtime = 2 min

• 1.5 rounds of talk (2+1 turns):

‣ Γ = 1600, runtime = 9 min

• 2 rounds of talk (2 turns each):

‣ Γ = 6900, runtime = 44 min

[Markers on the slide label the variants by communication level: low, med, high]


Runtime: detail

• Runtime splits into two pieces:

‣ precomputation (reported on previous slide; do this once, ahead of time)

‣ realtime computation (near instantaneous for our algorithm; do this during negotiation, to compute next move)

• Precomputation is analogous to a group of agents repeatedly interacting over time, to arrive at a convention for how to negotiate

‣ could take days to years for real agents


Simulation stats

[Figure sequence: P2 gain plotted against P1 gain (P1: buyer, P2: seller), with both BATNAs and the feasible gains marked. Each panel shows the players’ preferences and social motives (P1/P2: s/s, c/c, s/c). Legend for communication: low, med, high.]

Future work

• Coming year:

‣ Work to add detail to models

‣ particularly to learn about cooperation

‣ Begin to compare predictions to measured human behavior

• Ongoing:

‣ Decentralization and learning

‣ Algorithmic improvements

‣ Incorporate results into realistic agents


Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior
