Opponent Modeling In Interesting Adversarial Environments Brett Borghetti 22 Jul 2008.
Opponent Modeling In Interesting Adversarial Environments
Brett Borghetti
22 Jul 2008
22 July 2008 Brett Borghetti 2
Opponent Modeling in Interesting Adversarial Environments
• Opponent: An entity known to be attempting to maximize his own well-being in an environment where that usually implies he is minimizing ours.
• Modeling: Building a capability to predict something about a target based on observations of the target in its environment.
• Interesting Environments: Where some agents are unlikely to have access to, or desire to use, perfect equilibrium solutions.
Motivation
• Systems with opponent modeling capability can learn to beat approximations of equilibrium strategies.
• Leaving the equilibrium opens an agent to exploitation.
• So we must decide whether to model our opponent
– How well can we do in a given environment?
– How well are we doing with a given model?
– What can we do to improve if we don’t know our opponent yet?
Overview
• Contributions and Domains
• Related Work
• Opponent Models and Predictions
• Measuring Model Quality
• The Environment-Value of a Model
• Improving Opponent Models
• Conclusions & Future Work
Contributions
• Developed a fast-learning model for poker and showcased a method for quantifying the contribution made by the learning component.
• Created a new domain-independent measure of performance prediction quality for opponent models.
• Defined the environment-value of an opponent model and showed how to calculate it.
• Presented new techniques for improving models when the opponents are unknown and observations of their behavior are unavailable.
Domains of Interest
Domain                          | Information | Action       | Length   | Use
Simultaneous High Card Draw     | Hidden      | Simultaneous | Single   | Performance Bounds
Full Scale Poker                | Hidden      | Sequential   | Extended | Measurement; Improving Models
Simultaneous Move Strategy Game | Perfect     | Simultaneous | Extended | Performance Bounds
Related Work
• Agent Interaction
– Game Theory
– Opponent Modeling
• Opponent Model Quality
– Desired Properties
– Measuring Quality
Related Work: Game Theory
• Rationality and Equilibrium
– Nash, Aumann, Brandenburger: Nash Eq. & Correlated Eq.; unbounded rationality
– Simon: Bounded Rationality
• Representing Environments as Games
– Nash: two-person games, non-cooperative games
– Koller & Megiddo: Mapping environments to extensive-form games
• Solving Games
– Koller & Megiddo: Direct solution for extensive-form games
– Allis: Levels of solution; solving Connect 4
– Schaeffer: solving checkers
• Intractable Solutions
– Billings; Zinkevich: State Abstraction
– Gilpin & Sandholm: Automated Abstraction
Related Work: Opponent Modeling
• Representations
– Carmel & Markovitch: DFA
– Gal & Pfeffer: MAID, NID
– Steffens: Case-Based Reasoning
– Riley & Veloso: Archetypes
• Learning
– Bowling & Veloso: WoLF (GD)
– Jensen: ELPH (pred. entropy)
– Hoehn: Exp3
• Using
– Carmel & Markovitch: M*
– Luckhardt & Irani: max^n
– Sturtevant & Bowling: Soft-max^n
Related Work: Model Quality
• Desired properties
– Bowling & Veloso: Rationality, Convergence
– Jensen: Fast reaction (tracking)
• Performance Measurements
– Carmel & Markovitch: Error
– Rogowski: Accuracy
– Blaylock & Allen; Cox & Kerkez: Precision and Recall
– Davidson; Sukthankar & Sycara: Confusion Matrix
Related Work: What’s Missing?
1. Model prediction quality is usually measured in terms of number of correct predictions made.
2. Nothing addresses performance bounds on opponent models as a function of the game.
3. Dedicated improvement techniques for opponent modeling are lacking.
Definition of Opponent Model
• Two types of models:
– Predict future action based on current state
– Explain past state based on current action
• Can be expressed in terms of probability distribution over possible actions or states
Prediction
• Opponent is going to make a decision
• Model’s job is to predict the conditional probability that the opponent will take each action
[Figure: decision node with branches labeled F, C, and R, each with an unknown probability (?)]
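A minimal sketch of a model that returns a conditional distribution over the opponent's actions, assuming a simple count-based estimator with add-one smoothing. This estimator is an illustration only and is not the model described in the talk; the class name, the `prior` parameter, and the "preflop" state label are hypothetical. The action labels F, C, and R are taken from the figure.

```python
from collections import Counter, defaultdict

ACTIONS = ("F", "C", "R")  # action labels as they appear in the slide's figure


class FrequencyModel:
    """Hypothetical count-based opponent model (an assumption, not the talk's model)."""

    def __init__(self, actions=ACTIONS, prior=1.0):
        self.actions = actions
        self.prior = prior  # pseudo-count added to every action (add-one smoothing)
        self.counts = defaultdict(Counter)

    def observe(self, state, action):
        """Record one observed opponent action in the given state."""
        self.counts[state][action] += 1

    def predict(self, state):
        """Return P(action | state) as a dict that sums to 1."""
        seen = self.counts[state]
        total = sum(seen[a] for a in self.actions) + self.prior * len(self.actions)
        return {a: (seen[a] + self.prior) / total for a in self.actions}


model = FrequencyModel()
for observed_action in ["R", "R", "C"]:
    model.observe("preflop", observed_action)
prediction = model.predict("preflop")
```

With no observations for a state, the sketch falls back to a uniform distribution, which is one reasonable default when nothing is known about the opponent.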
Calculating Expected Value (EV) of a Branch
• If we are deciding what to do next we would like to know the EV of each action
• EV = \sum_i P_i V_i
• EV depends on our opponent
[Figure: betting game tree with branches labeled r, c, and f and node values given in units of b (1.5b through 8b)]
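The EV formula can be sketched in a few lines, assuming the model supplies one probability per opponent action and we know the value of each resulting outcome. The probabilities and payoffs below are made-up illustrative numbers, not values from the talk.

```python
def expected_value(probabilities, values):
    """EV = sum_i P_i * V_i over the opponent's possible actions."""
    if len(probabilities) != len(values):
        raise ValueError("need one value per probability")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, values))


# Hypothetical prediction and payoffs for one of our candidate actions.
predicted = {"F": 0.2, "C": 0.5, "R": 0.3}   # model's P(opponent action)
payoff = {"F": 1.5, "C": 0.5, "R": -2.0}     # our value if the opponent takes that action
ev = expected_value(
    [predicted[a] for a in predicted],
    [payoff[a] for a in predicted],
)
```

Because the probabilities come from the opponent model, an error in the predicted distribution feeds directly into the EV, which is why the later slides measure the quality of the whole distribution rather than only the most likely action.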
Measuring Model Quality
• Total Performance: How well did the agent perform overall?
• Differential Performance: How much better is the agent when it can learn?
• Prediction Performance: How well did the model predict the opponent?
Measuring Model Quality: Total Performance Measurement
• PokeMinnLimit1 vs. Top 3 agents (2007)
– Equilibrium/Fixed Strategy opponents
• Smoothed payouts over 10 matches/3000 hands each
• Evidence of learning against very strong fixed strategies
Measuring Model Quality: Differential Performance
Measuring Model Quality: Prediction Performance
• Expected value calculations require correct distributions over actions
• If our predictions are distributions over opponent behaviors, current measures are not well suited.
– Precision/recall don’t make sense when the number of underlying classes is large
• We need a scalar measure of the distance between two distributions
Measuring Model Quality: Prediction Performance
• Desired traits of a distribution-distance measurement
– Result = 0 if distributions are the same
– Result >= 0 for all pairs of distributions
– Result is symmetric
– Result obeys triangle inequality
– Result obeys the transitive property
– Result is bounded
Measuring Model Quality: Prediction Performance
Given distributions p (prediction) and q (truth) over a set of choices, the Jensen-Shannon Divergence is:
JSD(p, q) = \frac{1}{2} D(p \| m) + \frac{1}{2} D(q \| m), \quad m = \frac{1}{2}(p + q)
Where D is Relative Entropy / KL-Divergence:
D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
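The Jensen-Shannon Divergence can be sketched directly from its standard definition. The base-2 logarithm is an assumption made here so that the result lies in [0, 1]; the transcript does not preserve which base the talk used. Distributions are passed as equal-length lists of probabilities.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * D(p || m) + 0.5 * D(q || m), with m the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m[i] > 0 whenever p[i] > 0 or q[i] > 0, so the KL terms never divide by zero.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


same = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```

Unlike raw KL-Divergence, this measure stays finite when the prediction assigns zero probability to something the opponent actually does, and it gives the same answer regardless of which argument is the prediction.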
Measuring Model Quality: Prediction Performance
• Weighted Prediction Divergence: For all nodes k in the set N, given the calculations for each node’s JSD and a weighting scheme W, calculate the weighted prediction divergence for the set of nodes
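The formula for the weighted prediction divergence did not survive in this transcript, so the sketch below shows only one plausible reading: a weighted mean of per-node JSD values, normalized by the total weight. The normalization, the function name, and the node labels are assumptions, and the JSD values are made up for illustration.

```python
def weighted_prediction_divergence(jsd_by_node, weight_by_node):
    """Weighted mean of per-node JSD values (normalization is an assumption)."""
    total_weight = sum(weight_by_node[k] for k in jsd_by_node)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    weighted_sum = sum(weight_by_node[k] * jsd_by_node[k] for k in jsd_by_node)
    return weighted_sum / total_weight


# Hypothetical per-node divergences for three decision nodes.
jsd_by_node = {"n1": 0.0, "n2": 0.5, "n3": 1.0}
uniform = weighted_prediction_divergence(jsd_by_node, {"n1": 1, "n2": 1, "n3": 1})
by_frequency = weighted_prediction_divergence(jsd_by_node, {"n1": 2, "n2": 1, "n3": 1})
```

Normalizing by the total weight keeps the result on the same scale as a single JSD value, which makes scores comparable across weighting schemes.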
Measuring Model Quality: Prediction Performance
• Weighting Schemes
– Uniform: When no particular biases are required, set all weights to 1.
– Frequency weighting: When some nodes occur more frequently, weight them according to frequency of occurring
– Utility weighting: Based on the relative value of outcomes for trajectories passing through this node.
– Risk (reward) weighting: Multiplying a node’s frequency by its utility yields the node’s risk.
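The weighting schemes can be sketched as small helper functions that each produce a node-to-weight mapping. The function names, the use of visit counts to estimate frequency, and the example numbers are assumptions; utility is taken as a given input because the transcript does not say how it is computed.

```python
def uniform_weights(nodes):
    """Uniform scheme: every node gets weight 1."""
    return {k: 1.0 for k in nodes}


def frequency_weights(visit_counts):
    """Frequency scheme: weight each node by its share of all observed visits."""
    total = sum(visit_counts.values())
    return {k: count / total for k, count in visit_counts.items()}


def risk_weights(frequency, utility):
    """Risk scheme: a node's frequency multiplied by its utility."""
    return {k: frequency[k] * utility[k] for k in frequency}


# Hypothetical visit counts and utilities for two nodes.
freq = frequency_weights({"a": 3, "b": 1})
risk = risk_weights(freq, {"a": 2.0, "b": 4.0})
```

In this made-up example the rarely visited node "b" carries twice the utility of "a", yet "a" still ends up with the larger risk weight because it is visited three times as often.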
Measuring Model Quality: Prediction Performance
• Empirical Results from Poker
– Collected data from PokeMinnLimit1 vs. HyperboreanLimitEq1 from the 2007 Computer Poker Competition
– Compared two opponent models' predictions to truth data
– Training Set Size: 20 observations/hand
– Test Set Size: 660 observations/hand
Measuring Model Quality: Prediction Performance
• While Learning performed well during the pre-flop and flop portions of the game, it grew worse over time during the turn and river.
Measuring Model Quality: Discussion
• While agent-level performance of the learning model was better than the fixed model, WPD analysis reveals quality issues with the performance of the learning model.
• WPD paves the way for reasoning over which types of models might do well during a specific interaction.
Bounding Performance
• Goal: Determine the relationship between the structure of an environment and the potential value of an opponent model within that environment.
– Helps us make decisions on whether, and what to model
• Method
– Determine equilibrium behavior (and value) in original game
– Transform the environment to provide perfect predictions
– Recalculate the behavior and value in the new game
• Domains
– Simultaneous-Bet High Card Draw
– Simultaneous-move Strategy Game
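The first step of the method, finding equilibrium behavior and a guaranteed value, can be sketched for a zero-sum game given as a payoff matrix from our perspective (rows are our strategies, columns are the opponent's). The sketch covers only pure strategies. The matrix is a made-up example, not the High Card Draw payoff table, which this transcript does not preserve.

```python
def security_level(payoff):
    """Return (row, value): the pure strategy with the best worst-case payoff."""
    worst_case = [min(row) for row in payoff]
    best_row = max(range(len(payoff)), key=lambda i: worst_case[i])
    return best_row, worst_case[best_row]


def pure_strategy_equilibria(payoff):
    """All (row, col) saddle points: the entry is the largest in its column
    (we cannot improve) and the smallest in its row (the opponent cannot improve)."""
    columns = list(zip(*payoff))
    return [
        (i, j)
        for i, row in enumerate(payoff)
        for j, value in enumerate(row)
        if value == max(columns[j]) and value == min(row)
    ]


# Hypothetical 3x3 zero-sum game with a single saddle point.
payoff = [
    [0, -1, 2],
    [2, 1, 3],
    [-2, 0, 4],
]
best_row, guaranteed = security_level(payoff)
equilibria = pure_strategy_equilibria(payoff)

# Matching pennies has no pure-strategy equilibrium.
no_pse = pure_strategy_equilibria([[1, -1], [-1, 1]])
```

When a saddle point exists, its value equals the security level, so the guaranteed bound and the pure-strategy equilibrium value coincide. When none exists, a mixed strategy is needed to do better than the pure security level.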
Bounding Performance: Simultaneous-Bet High Card Draw
• High Card Draw With Simultaneous Betting
– Ante = $1
– Bet = $1
• Strategy = Threshold Card
– Opponent = X axis
– Me = Y axis
• Expected Earnings from our perspective (Z axis)
• Pure Strategy Equilibrium (PSE) at EIGHT,EIGHT
[Figure: contour plot, "My payouts under normal game circumstances"; x-axis: my opponent's strategy; y-axis: my strategy; both axes run over threshold cards TWO through ACE; contour levels from -0.75 to 0.75]
Bounding Performance: Simultaneous-Bet High Card Draw
• When we know the opponent’s State (Card) what strategy should we choose?
– If we have higher card, bet, otherwise follow our strategy
• PSE = ACE,EIGHT
• Guaranteed at least $0.24 per game
[Figure: line plot, "My payouts if I can see my opponent's cards - BR strategies of each player, Ante = 1, Bet = 1"; x-axis: Strategy (Threshold), TWO through ACE; y-axis: Opponent payout; curves: "My Payout BR against opponent strategy" and "My Payout when opponent plays BR to my strategy"; marked points X: 8, Y: 0.2443 and X: 14, Y: 0.2443; annotation: Guaranteed Bound]
[Figure: contour plot, "My payouts if I can see my opponent's cards"; x-axis: my opponent's strategy; y-axis: my strategy if my card is lower than my opponent's; both axes run over threshold cards TWO through ACE; contour levels from -0.1 to 0.75]
Bounding Performance: Simultaneous-Bet High Card Draw
• When we know the opponent’s Policy, what strategy should we use?
• If opponent policy EIGHT, our payoff = $0.0
• But could be higher if opponent is not using that policy
[Figure: line plot, "My payouts if I can see my opponent's strategy - BR strategies of each player, Ante = 1, Bet = 1"; x-axis: Strategy (Threshold), TWO through ACE; y-axis: Opponent payout; curves: "My Payout BR against opponent strategy" and "My Payout when opponent plays BR to my strategy"; marked point X: 8, Y: 0; annotation: Guaranteed Bound]
[Figure: contour plot, "My payouts under normal game circumstances"; x-axis: my opponent's strategy; y-axis: my strategy; both axes run over threshold cards TWO through ACE; contour levels from -0.75 to 0.75]
Bounding Performance: Simultaneous-Bet High Card Draw
• If we know the opponent’s next action what strategy should we choose?
– Equivalent to turn-based betting
• No PSE
• Guarantee of at least $0.10 if I play fixed (SEVEN)
– Mixed Strategy (SIX and SEVEN, 50% each) guarantees $0.1124
• New expected value -0.78 < EV < +0.85
[Figure: line plot, "My Payouts if I can see my opponent's action: - BR strategies of each player, Ante = 1, Bet = 1"; x-axis: Strategy (Threshold), TWO through ACE; y-axis: My payout; curves: "My Payout BR against opponent strategy" and "My Payout when opponent plays BR to my strategy"; marked points X: 7, Y: 0.09804; X: 4, Y: 0.1161; X: 14, Y: 0.8507; annotation: Guaranteed Bound]
[Figure: contour plot, "My payouts if I can see my opponent's action"; x-axis: my opponent's strategy; y-axis: my strategy given my opponent's action is to bet; both axes run over threshold cards TWO through ACE; contour levels from -0.75 to 0.85]
Bounding Performance: Simultaneous-Bet High Card Draw
• Normal expectation is ±$0.78
• If we know perfectly the opponent’s:
– State: Guaranteed at least $0.24
– Policy: Guaranteed at least $0.00
– Action: Guaranteed at least $0.10 ($0.11 mixed)
• Helps us make modeling decisions
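The gap between the fixed and mixed guarantees for the Action case comes from the fact that mixing pure strategies can raise the worst-case expected payoff. A minimal sketch of that calculation follows. It uses a made-up 2x2 payoff matrix, since the actual High Card Draw payoffs are not preserved in this transcript.

```python
def mixed_guarantee(payoff, mix):
    """Worst-case expected payoff when we play row i with probability mix[i]."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    expected_by_column = [
        sum(mix[i] * payoff[i][j] for i in range(n_rows)) for j in range(n_cols)
    ]
    return min(expected_by_column)


# Hypothetical payoffs: each pure strategy can be held to 0 by the opponent.
payoff = [
    [4, 0],
    [0, 2],
]
pure_first = mixed_guarantee(payoff, [1.0, 0.0])
pure_second = mixed_guarantee(payoff, [0.0, 1.0])
half_and_half = mixed_guarantee(payoff, [0.5, 0.5])
```

In this example neither pure strategy guarantees anything above 0, while an even mix guarantees 1. That is the same qualitative effect as the move from $0.10 to $0.11 reported above.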
Bounding Performance: Simultaneous-Bet High Card Draw
• Examine class of games formed when
– Ante=1-Bet
– Bet ranges from [0,2]
• Where do the guarantees occur?
[Figure: line plot, "Perfect Model Guarantee Comparisons"; x-axis: Bet amount (=1-Ante Amount), 0 to 2; y-axis: My payout, 0 to 0.25; curves: State, Policy, Action]
Bounding Performance: Simultaneous-Move Strategy Game
• 2-Player Perfect Information Environment
– 4 locations
– 3 unit cap per player, 4 base limit per player
• Multiple phases per turn
– Order/Move/Battle/Generate/Destroy/Build
• Solvable with iterated equilibrium calculation (2136 state isomorphisms with up to 2304 joint action sets per state)
Bounding Performance: Simultaneous-Move Strategy Game
• Transform environment to new environment where the action of the opponent is always revealed
– Equivalent to a turn-based game
• Calculate equilibrium in the new game
• Subtract values to obtain performance improvement attributable to the model
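The transform-and-subtract idea can be sketched on a single-shot 2x2 zero-sum game, a far smaller setting than the multi-turn strategy game described here. The simultaneous value uses the standard closed form for 2x2 games. The revealed-action value assumes the opponent moves first and we best-respond. Function names and example matrices are hypothetical.

```python
def value_2x2(game):
    """Equilibrium value of a 2x2 zero-sum game (payoffs to the row player)."""
    (a, b), (c, d) = game
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # a pure-strategy saddle point exists
        return float(maximin)
    # No saddle point: standard closed form for the mixed-strategy value.
    return (a * d - b * c) / (a + d - b - c)


def value_with_action_oracle(game):
    """Value when the opponent's action is revealed before we choose:
    the opponent picks the column whose best response is worst for us."""
    return float(min(max(column) for column in zip(*game)))


def oracle_improvement(game):
    """Improvement attributable to perfectly predicting the opponent's action."""
    return value_with_action_oracle(game) - value_2x2(game)


matching_pennies = [[1, -1], [-1, 1]]
gain = oracle_improvement(matching_pennies)      # oracle turns a fair game into a sure win
no_gain = oracle_improvement([[2, 1], [3, 4]])   # saddle point: prediction adds nothing
```

The two examples mark the extremes: with a pure-strategy equilibrium a perfect action model is worth nothing, while in a game that forces mixing it can be worth the full swing between the equilibrium value and a guaranteed win.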
Bounding Performance: Simultaneous-Move Strategy Game
• When oracle is used on every turn our improvement is significant
– Fair Starting State mean improvement = 0.0808
– All State mean improvement = 0.0692
Bounding Performance: Simultaneous-Move Strategy Game
• When oracle is used only once our improvement is small
– Fair Starting State mean improvement = 0.0041
– All State mean improvement = 0.0129
Improving Opponent Models:
• Goal: Formalize methods for assisting developers wishing to improve their modeling algorithms when actual opponents are unknown or unavailable for testing.
• Method: Use co-evolution and near-self-play to generate strong agents
Improving Opponent Models: Tools
• Instrumentation
– Identify and track measurable performance points where quality can be assessed (i.e. WPD)
• Advanced Parametric Control
– Identify p key features of the code (functions or variable settings) and rewrite the software to allow the features to be chosen at run time
Improving Opponent Models: Techniques
• How do we reduce the size of the search?
– Preprocess the k-parameter space to determine parameters with the most effect on outcome
• Rank order performance
– Test all pairs of parameter settings after preprocessing
• Techniques are amenable to automation
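One way these two techniques could be automated is sketched below. It assumes one-at-a-time screening for the preprocessing step and a round-robin over all pairs of settings for the ranking step. Both are stand-ins chosen for illustration, since the talk's actual procedure is not given in this transcript. The `score` and `play` callables and the parameter names are hypothetical.

```python
from itertools import combinations, product


def screen_parameters(options, baseline, score, keep):
    """Rank parameters by how much varying each one alone changes the score,
    and return the names of the `keep` most influential ones."""
    effect = {}
    for name, choices in options.items():
        scores = [score({**baseline, name: choice}) for choice in choices]
        effect[name] = max(scores) - min(scores)
    return sorted(effect, key=effect.get, reverse=True)[:keep]


def rank_settings(options, kept, baseline, play):
    """Play every pair of settings of the kept parameters against each other
    and return the settings ordered from best to worst total payoff."""
    settings = [dict(zip(kept, combo)) for combo in product(*(options[n] for n in kept))]
    totals = [0.0] * len(settings)
    for i, j in combinations(range(len(settings)), 2):
        result = play({**baseline, **settings[i]}, {**baseline, **settings[j]})
        totals[i] += result   # zero-sum: what one setting wins, the other loses
        totals[j] -= result
    order = sorted(range(len(settings)), key=lambda i: totals[i], reverse=True)
    return [settings[i] for i in order]


# Hypothetical parameters and a toy scoring function standing in for real matches.
options = {"lr": [0.1, 0.5], "depth": [1, 2], "seed": [0, 1]}
baseline = {"lr": 0.1, "depth": 1, "seed": 0}
score = lambda cfg: cfg["lr"] * 10 + cfg["depth"]
play = lambda a, b: score(a) - score(b)

kept = screen_parameters(options, baseline, score, keep=2)
ranked = rank_settings(options, kept, baseline, play)
```

Screening first keeps the pairwise stage tractable: the number of matches grows quadratically in the number of settings, and the number of settings grows exponentially in the number of parameters retained.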
Future Work
• Equilibrium deviance detection (JSD)
– If opponent is not playing correct distribution
• Automatic Performance Improvement
– Genetic Algorithm Parameter Search (offline)
– Metareasoning (online)
Future Work (cont.)
• Bounds calculation for model value
– Private oracles
– Distribution of opponent types
– Non-Zero-Sum Environments
• Non-stationary opponents (anticipation)
Questions?
Precision and Recall
• Precision = correctly predicted / total predictions
– What percentage of my predictions were correct?
• Recall = correctly predicted / total in that prediction class
– What percentage of the total class was I able to correctly classify?
• F-measure = 2*P*R/(P+R)
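The three measures can be sketched for a single prediction class, assuming predictions and true actions arrive as parallel lists. The function name and the example action sequences are made up for illustration.

```python
def precision_recall_f(predicted, actual, label):
    """Precision, recall, and F-measure for one class label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    total_predicted = sum(1 for p in predicted if p == label)
    total_in_class = sum(1 for a in actual if a == label)
    precision = correct / total_predicted if total_predicted else 0.0
    recall = correct / total_in_class if total_in_class else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical predicted vs. true opponent actions.
predicted = ["R", "R", "C", "F", "R"]
actual = ["R", "C", "C", "F", "R"]
precision, recall, f_measure = precision_recall_f(predicted, actual, "R")
```

Here three "R" predictions were made, two of them correct, and both true "R" actions were found, so precision is 2/3 and recall is 1.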
Simultaneous Turn Strategy Game: Iterated Equilibrium Calculation
• Calculate the set of transition probabilities from every state to every other state, conditional on the orders given by each player
• Initialize values of states for terminal nodes to +1, 0 or -1 for win/tie/loss. Allow all other state values to float (initialize to zero)
• For each subgame of length L=1..N, compute:
– Mixed NE distribution over orders from each player in each state
– Value of playing the NE for each player, using game (L-1)’s values for the values of the states
• Repeat until convergence
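The iteration above can be sketched on a toy zero-sum game in which each non-terminal state offers two orders per player. That restriction is an assumption made so the per-state equilibrium value has a closed form. The game described in the talk has far larger joint action sets and would need a general matrix-game solver instead. State names, the transition table, and the tolerance are hypothetical.

```python
def value_2x2(game):
    """Equilibrium value of a 2x2 zero-sum matrix game (payoffs to us)."""
    (a, b), (c, d) = game
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # pure-strategy saddle point
        return float(maximin)
    return (a * d - b * c) / (a + d - b - c)  # mixed-strategy value


def iterate_values(transitions, terminal_values, tol=1e-9, max_iters=1000):
    """transitions[s][my_order][opp_order] maps next states to probabilities.
    Terminal values stay fixed; all other states start at zero."""
    values = {s: 0.0 for s in transitions}
    values.update(terminal_values)
    for _ in range(max_iters):
        new_values = dict(values)
        for state, table in transitions.items():
            # Expected next-state value for each joint order, using the
            # previous iteration's values.
            game = [
                [sum(p * values[nxt] for nxt, p in table[i][j].items()) for j in range(2)]
                for i in range(2)
            ]
            new_values[state] = value_2x2(game)
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values
    return values


terminal_values = {"win": 1.0, "tie": 0.0, "loss": -1.0}
transitions = {
    "start": [
        [{"win": 1.0}, {"loss": 1.0}],
        [{"start": 0.5, "loss": 0.5}, {"win": 1.0}],
    ],
}
values = iterate_values(transitions, terminal_values)
```

Because one joint order can lead back to "start", the state's value depends on itself. The loop resolves that by repeatedly re-solving the matrix game with the previous estimate until the values stop changing.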
Metareasoning Goal
• For each prediction required within an expected value calculation, estimate:
– What is the prediction quality and cost of each model at this node?
• Choose the model for each prediction:
– Lowest cost that meets the minimum quality
– Highest quality within a fixed cost budget
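The two selection rules can be sketched as follows, assuming each candidate model comes with an estimated quality (higher is better) and an estimated cost at the current node. The model names and numbers are hypothetical, and how the estimates are produced is outside this sketch.

```python
from typing import NamedTuple, Optional, Sequence


class ModelEstimate(NamedTuple):
    name: str
    quality: float  # estimated prediction quality at this node (higher is better)
    cost: float     # estimated cost of using the model at this node


def cheapest_meeting_quality(
    models: Sequence[ModelEstimate], min_quality: float
) -> Optional[ModelEstimate]:
    """Lowest-cost model whose quality meets the minimum, or None if none does."""
    eligible = [m for m in models if m.quality >= min_quality]
    return min(eligible, key=lambda m: m.cost) if eligible else None


def best_within_budget(
    models: Sequence[ModelEstimate], budget: float
) -> Optional[ModelEstimate]:
    """Highest-quality model whose cost fits the budget, or None if none does."""
    affordable = [m for m in models if m.cost <= budget]
    return max(affordable, key=lambda m: m.quality) if affordable else None


# Hypothetical candidates with made-up estimates.
candidates = [
    ModelEstimate("fixed", quality=0.6, cost=1.0),
    ModelEstimate("learning", quality=0.8, cost=5.0),
    ModelEstimate("exhaustive", quality=0.9, cost=20.0),
]
by_quality = cheapest_meeting_quality(candidates, min_quality=0.75)
by_budget = best_within_budget(candidates, budget=10.0)
```

Both rules return None when no candidate qualifies, leaving the caller to decide on a fallback such as playing the equilibrium strategy.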
Proposed System: Overview
[Diagram: three levels: Ground Level (Doing), Object Level (Reasoning), and Meta-Level (Metareasoning); Action Selection and Perception link the Ground and Object levels; Control and Monitoring link the Object and Meta levels]
Proposed System: Object Level Component
[Diagram: Object Level containing Strategy, Action Generator, Candidate Models, and Prediction Engine; external links labeled Control, Monitoring, Action Selection, and Perception]
Proposed System: Meta-Level Component
[Diagram: Meta-Level containing Context Abstraction, Activity Repository, and Utility Calculation; external links labeled Control and Monitoring; data flows labeled Predictions, Prediction Cost, Observations, World State, and Context]