Combining AI and Game Theory to Model Negotiation and Cooperation
Geoff Gordon, Miroslav Dudík
CMU Machine Learning Dep’t
MURI 14 Program Review — September 10, 2009 — Geoff Gordon
[Figure: MURI project overview]
RESEARCH
• Theory Formation
• Identify Cultural Factors (CUNY, Georgetown, CMU)
• Computational Models (CMU, USC)
• Virtual Humans (USC)
• Implementation (CMU)
• Surveys & Interviews (CUNY, CMU, U Mich, Georgetown)
• Cross-Cultural Interactions (U Pitt, CMU)
• Data Analysis (CUNY, Georgetown, U Pitt, CMU)
PRODUCTS
• Validated Theories
• Models
• Modeling Tools
• Briefing Materials
• Scenarios
• Training Simulations
Arrow labels: validation
Legend: common task; subgroup task
Modeling negotiation and cooperation
• Goal: build a model to
‣ predict behavior of others
‣ optimize our own behavior
‣ understand cultural differences
• …while negotiating and cooperating
What’s in a model?
• Observations (inputs to the model):
‣ Own past behavior
‣ Observations of other agents
‣ Observations of nature
‣ Initial store of private information
• Predictions (outputs of the model):
‣ Future behavior of other agents
‣ Future behavior of nature
‣ Plans for own behavior
For example
3 kinds of models
• Plain probabilistic
‣ P(predictions | observations)
• E.g.:
‣ roads example
‣ linear regression
‣ Bayes nets
3 kinds of models
• Decision theoretic
‣ P(predictions | observations, own plan)
‣ E(reward | predictions)
‣ choose plan to optimize rewards
• E.g., POMDPs, influence diagrams
• Adds optimization by self
• Model now includes:
‣ model of my own goals
‣ model of my available choices
• Predictions: no longer include own future plan
Decision-theoretic model
Why decision theory?
• What if we add a new blockage to the road network?
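A minimal sketch of the decision-theoretic recipe (predict, score by expected reward, choose the best plan), applied to a toy version of the road example. The route names, probabilities, and rewards are all made-up illustration values, and the helper names `expected_reward` and `best_plan` are hypothetical; nothing here comes from the talk's actual model.

```python
# Toy decision-theoretic model of a route choice. All routes, probabilities,
# and rewards below are made-up illustration values, not data from the talk.

def expected_reward(plan, p_clear, reward):
    """E(reward | predictions) for one plan, where the prediction is the
    probability that the chosen route turns out to be clear."""
    p = p_clear[plan]
    return p * reward[plan]["clear"] + (1.0 - p) * reward[plan]["blocked"]


def best_plan(p_clear, reward):
    """Choose the plan with the highest expected reward."""
    return max(p_clear, key=lambda plan: expected_reward(plan, p_clear, reward))


# P(route is clear | observations so far), one entry per candidate plan.
p_clear = {"highway": 0.9, "back_road": 0.5}
# Rewards are negative travel times in minutes.
reward = {
    "highway": {"clear": -30.0, "blocked": -90.0},
    "back_road": {"clear": -20.0, "blocked": -60.0},
}

before = best_plan(p_clear, reward)

# A new blockage is observed on the highway: update the prediction and re-plan.
p_after = dict(p_clear, highway=0.0)
after = best_plan(p_after, reward)
print(before, after)
```

The point of the sketch is that a new blockage only changes the predictions; the plan is recomputed by optimization rather than looked up from past behavior.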
3 kinds of models
• Game theoretic
‣ P(predictions | observations, all plans)
‣ E(reward_p | predictions), for each player p
‣ choose plans to optimize rewards
• E.g., MAIDs, MAML
• Adds optimization by others
• Model now includes:
‣ everyone’s goals
‣ everyone’s available choices
• Predictions: no longer include any future plans
Why game theory?
• Imagine a patrol planning its route
• It encounters a blockage
• Should it take the most efficient route around it?
• Maybe not: might channel into an ambush
• Need to model goals of other players
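To illustrate the patrol argument, here is a small sketch that treats the route choice as a 2x2 zero-sum game against an ambusher. The payoff numbers are invented for illustration, and the function names are hypothetical. It compares always taking the efficient route with a randomized route choice, assuming the ambusher best-responds.

```python
# Hypothetical 2x2 zero-sum game. M[i][j] is the patrol's payoff when the
# patrol takes route i and the ambusher waits on route j
# (index 0 = efficient route, index 1 = detour). Numbers are made up.
M = [[-10.0, -1.0],
     [-3.0, -12.0]]


def worst_case(p_efficient, M):
    """Patrol's expected payoff if the ambusher best-responds to a mix that
    takes the efficient route with probability p_efficient."""
    per_column = [
        p_efficient * M[0][j] + (1.0 - p_efficient) * M[1][j] for j in range(2)
    ]
    return min(per_column)


def indifference_mix(M):
    """Probability on row 0 that makes the column player indifferent.
    Valid only for 2x2 games without a pure-strategy saddle point."""
    (a, b), (c, d) = M
    return (d - c) / (a - b - c + d)


p_star = indifference_mix(M)
print(worst_case(1.0, M), worst_case(p_star, M))
```

With these numbers, a patrol that always takes the efficient route is fully exploitable, while mixing the two routes improves its worst-case payoff.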
Why not game theory?
• Biggest reason: computational cost!
‣ e.g., for characterizing equilibria, machine learning of strategies, even tracking beliefs
‣ Cf: POMDP models in upcoming talk
• In past, has been prohibitive
‣ can limit game-theoretic models of N&C to be very simple, or to use heuristic approximations in solution
Contributions
• New game-theoretic models of N&C
‣ in Multi-Agent Markov Logic (MAML)
• New simulation experiments
• New algorithms: enable bigger models
• All of above: in service of better prediction and optimization of N&C behavior
Contrast: classical game model
• Labor v. mgmt: 2*2*2 repeated Bayesian game
‣ Management knows profit level (L/H)
‣ Mgmt. offers low or high wages (w/W)
‣ On w, union chooses whether to strike
‣ Repeat: mgmt offers, union responds, …
e.g., [Fudenberg et al., 1983] or [Wilson, 1994]

H profit:
         w        W
Strike   D,0      W,H-W
Work     w,H-w    W,H-W

L profit:
         w        W
Strike   D,0      W,L-W
Work     w,L-w    W,L-W
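The stage-game tables can be written down as a short function. This is a sketch under assumptions: each cell is read as (union payoff, management payoff), the numeric values for w, W, L, H, and D are invented, and `stage_payoffs` is a hypothetical name.

```python
def stage_payoffs(profit, offer, union_action, params):
    """Return (union payoff, management payoff) for one stage of the game.

    profit: "L" or "H" (known to management only)
    offer: "w" (low wage) or "W" (high wage)
    union_action: "strike" or "work"; it only matters after the low offer w,
        since the Strike and Work rows coincide in the W column of the tables
    params: numeric values for "w", "W", "L", "H", and the strike payoff "D"
    """
    if offer == "w" and union_action == "strike":
        return params["D"], 0.0
    wage = params[offer]
    return wage, params[profit] - wage


# Made-up numbers, for illustration only.
params = {"w": 2.0, "W": 5.0, "L": 6.0, "H": 10.0, "D": 1.0}

for profit in ("H", "L"):
    for offer in ("w", "W"):
        for action in ("strike", "work"):
            print(profit, offer, action, stage_payoffs(profit, offer, action, params))
```

Even this small game is hard to analyze once it is repeated, because management's offers signal its private profit level.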
Building a model of N&C: motivating example
• Scenario: two-party two-issue negotiation
‣ merchants negotiating over a purchase
‣ issues: type of product (Carpets/Textiles); delivery date (Early/Late)
‣ don’t know each other’s preferences (direction or strength)
‣ or each other’s personality
Motivating example
B: I’d rather buy carpets. → cheap talk
S: Carpets are expensive for me to get right now. → cheap talk
S: Could you accept a late delivery date? → propose (CL)
B: No, I prefer an earlier one. → reject; cheap talk
S: I could get you textiles for an earlier delivery? → propose (TE)
B: OK. → accept
[Buyer & seller conduct transaction for TE] → transaction (payoffs)
[Throughout, buyer & seller update beliefs about each other’s motives, reputation, etc.] → belief update
Computational game-theoretic model
• Models simplified subset of interactions
• First, what a solution gives us (or doesn’t)
• Then, brief review of MAML (Multi-Agent Markov Logic), our representation language
• Then, model and results
‣ both would be difficult w/o MAML + algos
Solutions
• Solution = (recipe for behavior of agents, in which each optimizes own payoff) = equilibrium
• Tells us: how agents act/speak, how they interpret actions/utterances of others, how they react to actions/utterances of others
• May be many different solutions!
• In real world: arise through repeated interaction and learning
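As a small illustration of "solution = equilibrium" and of the fact that there may be several, the sketch below enumerates pure-strategy equilibria of a two-player game by checking that neither player can gain from a unilateral deviation. The coordination game and its payoffs are hypothetical, and this brute-force check is not the algorithm used in the talk.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(a1, a2)] = (reward to player 1, reward to player 2).
    Returns every action pair where each action is a best response."""
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        r1, r2 = payoffs[(a1, a2)]
        # Player 1 cannot do better by switching, holding a2 fixed.
        best1 = all(payoffs[(b1, a2)][0] <= r1 for b1 in actions1)
        # Player 2 cannot do better by switching, holding a1 fixed.
        best2 = all(payoffs[(a1, b2)][1] <= r2 for b2 in actions2)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


# Made-up coordination game over a delivery date.
coordination = {
    ("early", "early"): (2, 1),
    ("early", "late"): (0, 0),
    ("late", "early"): (0, 0),
    ("late", "late"): (1, 2),
}
solutions = pure_nash_equilibria(coordination)
print(solutions)
```

Both agreements are equilibria here; which one the agents settle on depends on convention, which is the role repeated interaction and learning play in the real world.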
Limitations
• Expressiveness of agents’ language
‣ here: choice among 5 statements / turn
‣ cf: 30–40 for POMDP, nearly ∞ for English
• Representation of external environment
‣ here: very simple (4 transactions + disagree)
‣ more detail is necessary for future research into combined N&C
Limitations
• Current algorithm: centralized, no learning
‣ we are working on changing these
• The key to all the above: speed!
‣ limits length of game, amount of comm, size of environment, flexibility of structure
MAML: graphical model for games
[Figure: example MAML graph with nodes W, X, Y, Z, Z’, Z’’, R1, R2 and branch labels T / F]
Legend: Nature move; P1 (Green) move; P2 (Blue) move; P1 (Green) reward; P2 (Blue) reward; Observation; Branching; Collection
• Info flow: valid path (directed path through same-color nodes; source may be different color)
• Time flow: any consistent complete ordering of DAG
Negotiation in MAML
[Figure: MAML graph of the negotiation game. Nodes: type1, type2, speak1, speak’1, speak2, speak’2, agree1, agree2, final, util1, util2. Branches: SW, ST, DW, DT or X.]
• types (preferences and strengths, personality)
• negotiation: cheap talk, propose, or accept previous proposal (each turn)
• last turn: only accept or reject
• final outcome: SW, ST, DW, DT, X
• utility assignment
• adjust: #turns, #bits/turn
• For this game:
‣ MAML: 5 parameters
‣ Γ = 6900
‣ EFG size = 166,000
‣ clique size = 3.3M
Types
• 24 types:
‣ 12 prefs on contract
‣ selfish v. cooperative
• Initially, each agent uncertain about other’s type
• Infers it over time from behavior
[Figure: contract preferences, plotted over product (textiles / carpets) and delivery date (early / late)]
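Inferring the other agent's type from behavior can be sketched as a Bayes-rule update. The two-type space, the statement likelihoods, and the name `update_belief` are hypothetical simplifications; in the actual model the likelihoods would come from the equilibrium strategies over all 24 types.

```python
def update_belief(prior, likelihood, observed):
    """Bayes rule: posterior(type) is proportional to
    prior(type) * P(observed statement | type)."""
    unnormalized = {
        t: prior[t] * likelihood[t].get(observed, 0.0) for t in prior
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every type")
    return {t: v / total for t, v in unnormalized.items()}


# Made-up example: two buyer types and four possible statements.
prior = {"prefers_carpets": 0.5, "prefers_textiles": 0.5}
likelihood = {
    "prefers_carpets": {"CE": 0.4, "CL": 0.4, "TE": 0.1, "TL": 0.1},
    "prefers_textiles": {"CE": 0.1, "CL": 0.1, "TE": 0.4, "TL": 0.4},
}

posterior = update_belief(prior, likelihood, "CL")
print(posterior)
```

Applying the update after every statement gives a running belief over the other agent's type.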
Simulation trace
Type2 (Seller): T+, L, cooperative
[Figure sequence: seller’s belief about Type1 (Buyer), plotted over contract preferences (textiles / carpets, early / late; legend: s, c), together with seller’s belief about Buyer’s next action (CE, CL, TE, TL, X)]
Steps shown, in order:
‣ Speak1 = CL
‣ Speak2 = TL
‣ Speak1’ = CE
‣ Speak2’ = CE
‣ X
Runtime: negotiation
• We generated 30 equilibria of 3 variants of the game
‣ in all cases, ε = 2% of initial regret
• 1 round of talk (1 turn each):
‣ Γ = 300, runtime = 2 min
• 1.5 rounds of talk (2+1 turns):
‣ Γ = 1600, runtime = 9 min
• 2 rounds of talk (2 turns each):
‣ Γ = 6900, runtime = 44 min
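The stopping rule "ε = 2% of initial regret" can be illustrated with a regret check on a small two-player game. This is a sketch under assumptions: the slides do not define "initial regret", so it is taken here to be the regret of the starting strategy profile, and the function names and the matching-pennies example are hypothetical.

```python
def regret(payoffs, x, y):
    """Largest gain either player could get by deviating unilaterally.
    payoffs[i][j] = (u1, u2) when player 1 plays i and player 2 plays j;
    x and y are mixed strategies given as probability lists."""
    n, m = len(x), len(y)
    u1 = sum(x[i] * y[j] * payoffs[i][j][0] for i in range(n) for j in range(m))
    u2 = sum(x[i] * y[j] * payoffs[i][j][1] for i in range(n) for j in range(m))
    best1 = max(sum(y[j] * payoffs[i][j][0] for j in range(m)) for i in range(n))
    best2 = max(sum(x[i] * payoffs[i][j][1] for i in range(n)) for j in range(m))
    return max(best1 - u1, best2 - u2)


def is_epsilon_equilibrium(payoffs, x, y, initial_regret, fraction=0.02):
    """Assumed stopping rule: regret at most `fraction` of the initial regret."""
    return regret(payoffs, x, y) <= fraction * initial_regret


# Matching pennies, used only as a tiny test case.
pennies = [[(1, -1), (-1, 1)],
           [(-1, 1), (1, -1)]]
start = ([1.0, 0.0], [1.0, 0.0])   # hypothetical starting profile
mixed = ([0.5, 0.5], [0.5, 0.5])   # the game's equilibrium

initial = regret(pennies, *start)
print(initial, regret(pennies, *mixed))
```

An approximate equilibrium in this sense is a profile where no player can gain more than ε by changing strategy alone.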
Runtime: detail
• Runtime splits into two pieces:
‣ precomputation (reported on previous slide; do this once, ahead of time)
‣ realtime computation (near instantaneous for our algorithm; do this during negotiation, to compute next move)
• Precomputation is analogous to a group of agents repeatedly interacting over time, to arrive at a convention for how to negotiate
‣ could take days to years for real agents
Simulation stats
[Figure sequence: plots of P1 gain v. P2 gain (P1: buyer, P2: seller), marking each player’s BATNA and the region of feasible gains. Panels vary preferences, social motives (P1 / P2: s / s, c / c, s / c), and communication (low, med, high).]
Future work
• Coming year:
‣ Work to add detail to models
‣ particularly to learn about cooperation
‣ Begin to compare predictions to measured human behavior
• Ongoing:
‣ Decentralization and learning
‣ Algorithmic improvements
‣ Incorporate results into realistic agents
Contributions
• New game-theoretic models of N&C
‣ in Multi-Agent Markov Logic (MAML)
• New simulation experiments
• New algorithms: enable bigger models
• All of above: in service of better prediction and optimization of N&C behavior