Combining AI and Game Theory to Model Negotiation and Cooperation

Geoff Gordon (ggordon@cs.cmu.edu), Miroslav Dudík (mdudik@cmu.edu)

CMU Machine Learning Dep’t

MURI 14 Program Review — September 10, 2009 — Geoff Gordon

[Figure: program overview]

RESEARCH

‣ Theory Formation

‣ Identify Cultural Factors (CUNY, Georgetown, CMU)

‣ Computational Models (CMU, USC)

‣ Virtual Humans (USC)

‣ Implementation (CMU)

‣ Surveys & Interviews (CUNY, CMU, U Mich, Georgetown)

‣ Cross-Cultural Interactions (U Pitt, CMU)

‣ Data Analysis (CUNY, Georgetown, U Pitt, CMU)

PRODUCTS

‣ Validated Theories

‣ Models

‣ Modeling Tools

‣ Briefing Materials

‣ Scenarios

‣ Training Simulations

Labels: validation; common task; subgroup task

Modeling negotiation and cooperation

• Goal: build a model to

‣ predict behavior of others

‣ optimize our own behavior

‣ understand cultural differences

• …while negotiating and cooperating

What’s in a model?

• Observations:

‣ Own past behavior

‣ Observations of other agents

‣ Observations of nature

‣ Initial store of private information

• Predictions:

‣ Future behavior of other agents

‣ Future behavior of nature

‣ Plans for own behavior

For example

3 kinds of models

• Plain probabilistic

‣ P(predictions | observations)

• E.g.:

‣ roads example

‣ linear regression

‣ Bayes nets

3 kinds of models

• Decision theoretic

‣ P(predictions | observations, own plan)

‣ E(reward | predictions)

‣ choose plan to optimize rewards

• E.g., POMDPs, influence diagrams

• Adds optimization by self

model of my own goals

model of my available choices

no longer includes own future plan

Decision-theoretic model

Why decision theory?

• What if we add a new blockage to the road network?
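A minimal sketch of the point behind the blockage question, under toy assumptions: a decision-theoretic model keeps P(predictions | observations, own plan) separate from E(reward | predictions), so when the world changes the plan can be re-optimized. The route names, probabilities, and rewards below are hypothetical and do not come from the talk.

```python
# Hypothetical toy model: plans are routes, predictions are arrival outcomes.
# Reward for each predicted outcome (the "model of my own goals").
REWARD = {"on_time": 10.0, "late": -5.0}

# P(outcome | observations, plan) before and after a blockage is observed.
# All numbers are illustrative assumptions.
MODEL_BEFORE = {
    "highway": {"on_time": 0.9, "late": 0.1},
    "back_road": {"on_time": 0.7, "late": 0.3},
}
MODEL_AFTER_BLOCKAGE = {
    "highway": {"on_time": 0.2, "late": 0.8},   # blockage sits on the highway
    "back_road": {"on_time": 0.7, "late": 0.3},
}

def expected_reward(outcome_probs, reward=REWARD):
    """E(reward | predictions) for one plan."""
    return sum(p * reward[outcome] for outcome, p in outcome_probs.items())

def best_plan(model):
    """Choose the plan that maximizes expected reward."""
    return max(model, key=lambda plan: expected_reward(model[plan]))

plan_before = best_plan(MODEL_BEFORE)
plan_after = best_plan(MODEL_AFTER_BLOCKAGE)
```

With these numbers the preferred route switches once the blockage is included. A plain probabilistic model fit to past behavior would have no direct way to produce that switch, since it predicts the plan rather than choosing it.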

3 kinds of models

• Game theoretic

‣ P(predictions | observations, all plans)

‣ E(reward_p | predictions)

‣ choose plans to optimize rewards

• E.g., MAIDs, MAML

• Adds optimization by others

everyone’s goals

everyone’s available choices

no longer includes any future plans

Why game theory?

• Imagine a patrol planning its route

• It encounters a blockage

• Should it take the most efficient route around it?

• Maybe not: might channel into an ambush

• Need to model goals of other players
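The patrol example can be illustrated with a small zero-sum game, as a sketch only. The payoff numbers are hypothetical, and the closed-form 2x2 solution used here is a textbook technique, not the algorithm from the talk. Rows are the patrol's choices (efficient route, detour) and columns are the adversary's choices (ambush on the efficient route, ambush on the detour).

```python
from fractions import Fraction

def solve_2x2_zero_sum(m):
    """Mixed equilibrium of a 2x2 zero-sum game that has no saddle point.

    m[i][j] is the row player's payoff; the column player receives -m[i][j].
    Returns (p, q, value) where p = P(row 0) and q = P(column 0).
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("saddle point: a pure-strategy solution exists")
    denom = a - b - c + d
    p = Fraction(d - c, denom)  # makes the column player indifferent
    q = Fraction(d - b, denom)  # makes the row player indifferent
    value = p * a + (1 - p) * c
    return p, q, value

# Hypothetical patrol payoffs (higher is better for the patrol):
# being ambushed costs 10, the efficient route costs 1, the detour costs 2.
patrol_payoffs = [[-10, -1],
                  [-2, -10]]

p_efficient, q_ambush_efficient, game_value = solve_2x2_zero_sum(patrol_payoffs)
```

Under these assumed payoffs the patrol does not always take the efficient route. It randomizes, because any predictable choice can be exploited by an opponent who optimizes too.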

Why not game theory?

• Biggest reason: computational cost!

‣ e.g., for characterizing equilibria, machine learning of strategies, even tracking beliefs

‣ Cf: POMDP models in upcoming talk

• In past, has been prohibitive

‣ can limit game-theoretic models of N&C to be very simple, or to use heuristic approximations in solution

Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior

Contrast: classical game model

• Labor v. mgmt: 2*2*2 repeated Bayesian game

‣ Management knows profit level (L/H)

‣ Mgmt. offers low or high wages (w/W)

‣ On w, union chooses whether to strike

‣ Repeat: mgmt offers, union responds, …

H profit:
            Offer w      Offer W
  Strike    D, 0         W, H-W
  Work      w, H-w       W, H-W

L profit:
            Offer w      Offer W
  Strike    D, 0         W, L-W
  Work      w, L-w       W, L-W
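As a sketch, the stage game above can be written as a small payoff function. This reads each table entry as (union payoff, management payoff), which is an interpretation of the slide rather than something it states. The function name, argument names, and numeric values are hypothetical.

```python
def stage_payoffs(profit, wage, is_low_offer, union_strikes, strike_payoff):
    """One round of the labor v. management game.

    Returns (union payoff, management payoff), assuming that reading of the
    table entries. A strike only takes effect after a low offer (w); after a
    high offer (W) the union works regardless.
    """
    if is_low_offer and union_strikes:
        return (strike_payoff, 0)
    return (wage, profit - wage)

# Hypothetical values for H, L, w, W, D.
H, L, w, W, D = 10, 6, 2, 4, 1

high_profit_strike = stage_payoffs(H, w, True, True, D)    # (D, 0)
high_profit_work = stage_payoffs(H, w, True, False, D)     # (w, H-w)
low_profit_high_offer = stage_payoffs(L, W, False, True, D)  # (W, L-W)
```

The repeated, Bayesian part of the game (management's private knowledge of L/H and the union's inference about it) is not modeled in this sketch.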

Building a model of N&C: motivating example

• Scenario: two-party two-issue negotiation

‣ merchants negotiating over a purchase

‣ issues: type of product (Carpets/Textiles); delivery date (Early/Late)

‣ don’t know each other’s preferences (direction or strength)

‣ or each other’s personality

Motivating example

B: I’d rather buy carpets. [cheap talk]

S: Carpets are expensive for me to get right now. [cheap talk]

S: Could you accept a late delivery date? [propose (CL)]

B: No, I prefer an earlier one. [reject; cheap talk]

S: I could get you textiles for an earlier delivery? [propose (TE)]

B: OK. [accept]

[Buyer & seller conduct transaction for TE] [transaction]

[Throughout, buyer & seller update beliefs about each other’s motives, reputation, etc.] [belief update]

[Figure: Payoffs]

Computational game-theoretic model

• Models simplified subset of interactions

• First, what a solution gives us (or doesn’t)

• Then, brief review of MAML (Multi-Agent Markov Logic), our representation language

• Then, model and results

‣ both would be difficult w/o MAML + algos

Solutions

• Solution = (recipe for behavior of agents, in which each optimizes own payoff) = equilibrium

• Tells us: how agents act/speak, how they interpret actions/utterances of others, how they react to actions/utterances of others

• May be many different solutions!

• In real world: arise through repeated interaction and learning
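A minimal sketch of what "each optimizes own payoff" means computationally: a strategy profile is an equilibrium when no player can gain by deviating, i.e. every player's regret is zero (or at most ε for an approximate equilibrium). The two-player matrix game and its payoffs below are hypothetical, chosen to show that several different solutions can coexist.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff when row player mixes with x and column player with y."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))

def regrets(A, B, x, y):
    """Regret of each player: best-response payoff minus current payoff.

    A[i][j] is player 1's payoff and B[i][j] is player 2's payoff when
    player 1 plays row i and player 2 plays column j.
    """
    u1 = expected_payoff(A, x, y)
    u2 = expected_payoff(B, x, y)
    best1 = max(sum(y[j] * A[i][j] for j in range(len(y)))
                for i in range(len(x)))
    best2 = max(sum(x[i] * B[i][j] for i in range(len(x)))
                for j in range(len(y)))
    return best1 - u1, best2 - u2

def is_epsilon_equilibrium(A, B, x, y, eps=0.0):
    """True if neither player can gain more than eps by deviating."""
    return max(regrets(A, B, x, y)) <= eps

# Hypothetical coordination game: both players prefer to match.
A = [[2, 0], [0, 1]]
B = [[2, 0], [0, 1]]
```

In this toy game both matching profiles are equilibria, while a mismatched profile leaves each player with positive regret.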

Limitations

• Expressiveness of agents’ language

‣ here: choice among 5 statements / turn

‣ cf: 30–40 for POMDP, nearly ∞ for English

• Representation of external environment

‣ here: very simple (4 transactions + disagree)

‣ more detail is necessary for future research into combined N&C

Limitations

• Current algorithm: centralized, no learning

‣ we are working on changing these

• The key to all the above: speed!

‣ limits length of game, amount of comm, size of environment, flexibility of structure

Graphical model for games

[Figure: example game graph]

Legend (node types):

‣ Nature move

‣ P1 (Green) move

‣ P2 (Blue) move

‣ P1 (Green) reward

‣ P2 (Blue) reward

‣ Observation

‣ Branching

‣ Collection

• Info flow: valid path (directed path through same-color nodes; source may be different color)

• Time flow: any consistent complete ordering of DAG

Negotiation in MAML

• Types (preferences and strengths, personality)

• Negotiation: cheap talk, propose, or accept previous proposal (each turn)

• Last turn: only accept or reject

• Final outcome: SW, ST, DW, DT, X

• Utility assignment

• Adjust: #turns, #bits/turn

[Figure: MAML graph with nodes speak1, speak’1, speak2, speak’2, agree1, agree2, final, util1; outcome branches SW, ST, DW, DT and X]

For this game:

‣ MAML: 5 parameters

‣ Γ = 6900

‣ EFG size = 166,000

‣ clique size = 3.3M

Types

• 24 types:

‣ 12 prefs on contract

‣ selfish v. cooperative

• Initially, each agent uncertain about other’s type

• Infers it over time from behavior

[Figure: Contract preferences; axis labels: textiles, carpets]
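Inferring the other agent's type from behavior can be sketched as a Bayes update. This is a generic illustration rather than the talk's algorithm: the two-type setup, the action name, and the likelihood numbers are hypothetical. In the full model the likelihoods would come from the equilibrium strategy of each of the 24 types.

```python
def update_belief(prior, likelihood, observed_action):
    """Posterior over types after seeing one action.

    prior[t] is P(type = t); likelihood[t][a] is P(action = a | type = t).
    """
    unnormalized = {t: prior[t] * likelihood[t].get(observed_action, 0.0)
                    for t in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under all types")
    return {t: v / total for t, v in unnormalized.items()}

# Hypothetical example with only the social-motive part of the type.
prior = {"selfish": 0.5, "cooperative": 0.5}
likelihood = {
    "selfish": {"propose_TE": 0.2, "other": 0.8},
    "cooperative": {"propose_TE": 0.6, "other": 0.4},
}

posterior = update_belief(prior, likelihood, "propose_TE")
```

Applying the update after every utterance yields a sequence of beliefs of the kind shown in the simulation trace.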

Simulation trace

Type2 (Seller): T+, L, cooperative

[Figure sequence: Seller’s belief about Buyer’s type (axes: textiles, carpets; legend: cooperative) and Seller’s belief about Buyer’s next action (CE, CL, TE, TL, X), redrawn after each utterance]

Utterances shown, in order of appearance:

‣ Speak1 = CL

‣ Speak2 = TL

‣ Speak1’ = CE

‣ Speak2’ = CE

Runtime: negotiation

• We generated 30 equilibria of 3 variants of the game

‣ in all cases, ε = 2% of initial regret

• 1 round of talk (1 turn each):

‣ Γ = 300, runtime = 2 min

• 1.5 rounds of talk (2+1 turns):

‣ Γ = 1600, runtime = 9 min

• 2 rounds of talk (2 turns each):

‣ Γ = 6900, runtime = 44 min

[Plot legend: low • med • high]
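The three (Γ, runtime) pairs reported for the negotiation game allow a rough check of how precomputation time grows with Γ. The sketch below fits a straight line in log-log space. The power-law form is an assumption made for illustration, and three points are too few to support a firm conclusion.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    A slope near k suggests y grows roughly like x**k over the given range.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Values reported for the three game variants.
gammas = [300, 1600, 6900]
runtime_minutes = [2, 9, 44]

slope = loglog_slope(gammas, runtime_minutes)
```

The fitted slope comes out close to 1, which is consistent with runtime growing roughly in proportion to Γ over this range.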

Runtime: detail

• Runtime splits into two pieces:

‣ precomputation (reported on previous slide; do this once, ahead of time)

‣ realtime computation (near instantaneous for our algorithm; do this during negotiation, to compute next move)

• Precomputation is analogous to a group of agents repeatedly interacting over time, to arrive at a convention for how to negotiate

‣ could take days to years for real agents

Simulation stats

[Figure series: simulation statistics for P1 (buyer) and P2 (seller), grouped by preferences, social motives, and communication (low • med • high); quantities shown include P1 gain and feasible gains]

Future work

• Coming year:

‣ Work to add detail to models

‣ particularly to learn about cooperation

‣ Begin to compare predictions to measured human behavior

• Ongoing:

‣ Decentralization and learning

‣ Algorithmic improvements

‣ Incorporate results into realistic agents

Contributions

• New game-theoretic models of N&C

‣ in Multi-Agent Markov Logic (MAML)

• New simulation experiments

• New algorithms: enable bigger models

• All of above: in service of better prediction and optimization of N&C behavior
