Skill and Billiards: Game Theory in Complex Domains

Chris Archibald

Department of Computing Science
University of Alberta

January 31, 2013

Chris Archibald (Alberta CS) Skill and Billiards January, 31, 2013 1 / 44

AI and Game Theory

Artificial Intelligence: rational decision-making
Game Theory: strategic rational decision-making

Game Theory

Matching Pennies

        b1      b2
a1     1,-1    -1,1
a2    -1,1     1,-1

(mixed) strategy: a probability distribution over actions
  ◦ e.g. σA = (0.9, 0.1)

best response: a strategy which yields the highest expected payoff against a given opponent strategy
  ◦ e.g. br(σA) = (0.0, 1.0), since
  ◦ b1 ↦ (−1 ∗ 0.9) + (1 ∗ 0.1) = −0.8
  ◦ b2 ↦ (+1 ∗ 0.9) − (1 ∗ 0.1) = +0.8
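
The best-response arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the names PAYOFF_B, expected_payoffs_b and best_response_b are made up for illustration, and ties are broken in favour of the first action.

```python
# Payoffs to player B in Matching Pennies, indexed [A's action][B's action].
PAYOFF_B = [[-1, 1],
            [1, -1]]


def expected_payoffs_b(sigma_a):
    """Expected payoff to B for each pure action (b1, b2) against A's mixed strategy."""
    return [sum(sigma_a[i] * PAYOFF_B[i][j] for i in range(2)) for j in range(2)]


def best_response_b(sigma_a):
    """A pure best response for B, written as a distribution over (b1, b2)."""
    values = expected_payoffs_b(sigma_a)
    best = max(range(2), key=lambda j: values[j])  # ties go to the first action
    return [1.0 if j == best else 0.0 for j in range(2)]


# With sigma_A = (0.9, 0.1), b1 is worth about -0.8 and b2 about +0.8 to B.
print(expected_payoffs_b([0.9, 0.1]))
print(best_response_b([0.9, 0.1]))
```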

Game Theory: Equilibrium

Matching Pennies

        b1      b2
a1     1,-1    -1,1
a2    -1,1     1,-1

(Nash) equilibrium: pair of strategies (σA, σB), such that
  ◦ σA = br(σB)
  ◦ σB = br(σA)

In Matching Pennies, the (only) equilibrium is ((0.5, 0.5), (0.5, 0.5)):
  ◦ b1 ↦ (−1 ∗ 0.5) + (1 ∗ 0.5) = 0.0
  ◦ b2 ↦ (+1 ∗ 0.5) − (1 ∗ 0.5) = 0.0
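
A quick way to check an equilibrium claim like this one is to test whether either player can gain by switching to a pure action. Since expected payoffs are linear in a player's own mixed strategy, checking pure deviations is enough. The sketch below uses hypothetical helper names (value_to_a, is_equilibrium) and is not taken from the talk.

```python
# Payoffs to player A in Matching Pennies; B receives the negative (zero-sum).
PAYOFF_A = [[1, -1],
            [-1, 1]]

PURE_ACTIONS = ([1.0, 0.0], [0.0, 1.0])


def value_to_a(sigma_a, sigma_b):
    """Expected payoff to A when both players use the given mixed strategies."""
    return sum(sigma_a[i] * sigma_b[j] * PAYOFF_A[i][j]
               for i in range(2) for j in range(2))


def is_equilibrium(sigma_a, sigma_b, tol=1e-9):
    """True if neither player can improve by deviating to a pure action."""
    v = value_to_a(sigma_a, sigma_b)
    a_cannot_gain = all(value_to_a(p, sigma_b) <= v + tol for p in PURE_ACTIONS)
    b_cannot_gain = all(-value_to_a(sigma_a, p) <= -v + tol for p in PURE_ACTIONS)
    return a_cannot_gain and b_cannot_gain


print(is_equilibrium([0.5, 0.5], [0.5, 0.5]))  # uniform mixing by both players
print(is_equilibrium([0.9, 0.1], [0.0, 1.0]))  # A would rather switch to a2
```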

Game Theory and AI

Milind Tambe

Game Theory and AI

Kit Chen & Michael Bowling

Today’s domain: billiards

8-ball
• Each player has 7 balls
• First to sink all 7 balls and then the 8-ball wins
• To keep turn, called ball must be sunk in called pocket
• Not striking own ball first or pocketing cue ball gives ball in hand to opponent

Computational pool
• Software agents compete in virtual game
• A deterministic physics simulator is used
• Noise from a known distribution added to shots

Origins

Michael Greenspan

Why billiards?

• Continuous state space.
• Continuous action space.
• Actions taken at discrete times.
• Unique turn-taking structure.
• Results of actions are stochastic.

Outline

Game theory and billiards
  Modeling Billiards Games
  Chris Archibald and Yoav Shoham
  AAMAS 2009

AI and billiards
  Analysis of a Winning Computational Billiards Player
  Chris Archibald, Alon Altman, and Yoav Shoham
  IJCAI 2009

Skill and billiards
  Success, Strategy, and Skill: An Experimental Study
  Chris Archibald, Alon Altman, and Yoav Shoham
  AAMAS 2010

In search of a model

Does 8-ball have an equilibrium?

Results from previous game-theoretic models can’t be applied:
• Fundamental dependence on a finite number of states or actions
• Restrictions on the payoff function which don’t match billiards

We need a model that is more specific to billiards.

A Two-player Zero-sum Billiards Game

Tuple (S, A, λ, p, s0, C, r), where
• S ⊂ R^n is a compact n-dimensional state space.
• A ⊂ R^m is the compact m-dimensional action space. a_t ∈ A is the action chosen at time step t.
• λ : S ↦ {1, 2} is the turn function. λ(s_t) indicates the player whose turn it is to play in state s_t.
• p : S × A ↦ ∆(S) is the transition function, where ∆(S) is the set of all probability distributions over S.
• s0 is the starting state. (λ(s0) is the player who gets the first turn of the game.)
• C ⊆ S is the set of terminating states, which is a closed subset of the state space.
• r : C ↦ R is the reward function.

Game progression

Required assumption 1

Assumption
$\int f(\cdot)\, dp(\cdot \mid s, a)$ is continuous in $A$ for any $f \in B(S)$ and any $s \in S \setminus C$, where $B(S)$ is the set of all bounded real-valued functions on $S$.

Continuous functions on compact sets are guaranteed to have a maximum value and a minimum value.

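Main lemma

Lemma
Let $v^*(s)$ be the unique fixed point of the value iteration equation

$$
v'(s) =
\begin{cases}
\max_a \int_S v(\cdot)\, dp(\cdot \mid s, a) & \text{if } \lambda(s) = 1 \\
\min_a \int_S v(\cdot)\, dp(\cdot \mid s, a) & \text{if } \lambda(s) = 2
\end{cases}
$$

Then the strategies

$$
\sigma_1(s) = \arg\max_a \int_S v^*(\cdot)\, dp(\cdot \mid s, a)
\quad \text{and} \quad
\sigma_2(s) = \arg\min_a \int_S v^*(\cdot)\, dp(\cdot \mid s, a)
$$

form a stationary pure strategy Markov perfect Nash equilibrium in the game.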
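
To make the lemma concrete, here is a sketch of the same max/min value iteration on a tiny finite game. The game is entirely hypothetical (two non-terminal states, two actions each, made-up probabilities) and only stands in for the continuous billiards model: sums replace the integrals, and a fixed number of sweeps replaces a convergence proof.

```python
# Hypothetical toy game: a finite stand-in for the continuous model.
TERMINAL_REWARD = {"W1": 1.0, "W2": -1.0}  # r: reward to player 1 at terminal states
TURN = {"P1": 1, "P2": 2}                   # lambda: whose turn it is in each state
TRANSITIONS = {                             # p(. | s, a) as {next_state: probability}
    "P1": {"aggressive": {"W1": 0.6, "P2": 0.4},
           "safe":       {"W1": 0.2, "P1": 0.6, "P2": 0.2}},
    "P2": {"aggressive": {"W2": 0.6, "P1": 0.4},
           "safe":       {"W2": 0.2, "P2": 0.6, "P1": 0.2}},
}


def expected_value(v, dist):
    """Finite analogue of the integral of v with respect to p(. | s, a)."""
    return sum(prob * v[s_next] for s_next, prob in dist.items())


def value_iteration(sweeps=500):
    """Apply the max/min update repeatedly; terminal values stay fixed at r."""
    v = dict(TERMINAL_REWARD)
    v.update({s: 0.0 for s in TRANSITIONS})
    for _ in range(sweeps):
        new_v = dict(v)
        for s, actions in TRANSITIONS.items():
            choose = max if TURN[s] == 1 else min  # player 1 maximizes, player 2 minimizes
            new_v[s] = choose(expected_value(v, dist) for dist in actions.values())
        v = new_v
    return v


def greedy_strategy(v):
    """sigma_1 / sigma_2: the arg max (or arg min) action in each state."""
    strategy = {}
    for s, actions in TRANSITIONS.items():
        choose = max if TURN[s] == 1 else min
        strategy[s] = choose(actions, key=lambda a: expected_value(v, actions[a]))
    return strategy


v_star = value_iteration()
print(v_star["P1"], v_star["P2"])
print(greedy_strategy(v_star))
```

In this toy game every action ends the game with probability at least 0.2, so the update is a contraction and the iteration settles quickly. Solving the fixed-point equations by hand gives v*(P1) = 3/7 and v*(P2) = −3/7, with both players preferring the aggressive action.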

Required assumptions 2 & 3

Main result

Theorem
If the transition function obeys all three assumptions, then a stationary pure strategy Markov perfect Nash equilibrium exists in billiards games.

8-ball has an equilibrium!

Computational pool

A shot is specified by five real-valued parameters ϕ, θ, a, b, V

State evaluation

State value = (1.0 · 0.95) + (0.33 · 0.82) + (0.15 · 0.63)
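
The sum above can be read as a weighted sum over a state's best few shots. That reading is an assumption made here for illustration: the slide gives only the numbers, so treating 1.0, 0.33, 0.15 as weights and 0.95, 0.82, 0.63 as per-shot scores, and the name state_value, are hypothetical.

```python
def state_value(shot_scores, weights=(1.0, 0.33, 0.15)):
    """Weighted sum of the highest shot scores, best shot weighted most."""
    best_first = sorted(shot_scores, reverse=True)
    # zip stops at the shorter sequence, so extra shots beyond the weights are ignored.
    return sum(w * s for w, s in zip(weights, best_first))


# (1.0 * 0.95) + (0.33 * 0.82) + (0.15 * 0.63) = 0.95 + 0.2706 + 0.0945
print(round(state_value([0.63, 0.95, 0.82]), 4))
```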

Break shot

Computational pool results

• We entered CueCard in the 2008 Computational Pool Tournament in Beijing, China
  ◦ 20 CPUs used
  ◦ Each shot sampled 25-100 times
• We won the gold medal, winning 82% of our games.
• Of the games that we broke, our agent won almost 75% off-the-break, meaning the other agent never got a turn

Let’s watch a game.

Analysis of success

A sampling of the results

Component             Test                     Win %
                      20 CPUs vs 1 CPU         55 %
                      1 CPU vs Pickpocket      77 %
Break shot            CC|CC vs PP|CC           69 %
                      CC|PP vs PP|PP           65 %
Sampling/Clustering   1 CPU vs < 30 samples    61 %

Skill in games

Skill: the ability of a player to perform the mental and physical tasks necessary to succeed in a particular game or undertaking

Billiards exposes two facets of skill: strategic and execution

Strategic skill in billiards
The method an agent uses to select a shot, (ϕ, θ, a, b, V), for execution

Execution skill in billiards
The amount and type of noise added to the shot parameters by the server before it is executed in the game

Motivation

Main questions:
• How do different agents respond to changing execution skill?
• At which execution skill level is strategic skill most important?
• To identify the most strategically skilled agent, what execution skill level should be used?
  ◦ Does it matter?

Experimental setup

Idea:
Vary both strategic and execution skill, see how success in computational pool is impacted

To vary strategic skill:
  ◦ Four different agent strategies
  ◦ Vary computing time

To vary execution skill:
  ◦ Vary the noise added

Experimental setup: varying strategic skill

Four different agents
• CueCard (CC)
  ◦ 2008 Champion (Single CPU)
• SingleLevel (SL)
  ◦ One level look-ahead
• OptimisticPlanner (OP)
  ◦ Noiseless shot planner
  ◦ Success estimation with lookup table
• MachineGunner (MG)
  ◦ Random trial and error
  ◦ Noise robustness (50 samples)
  ◦ No state evaluation

Experimental setup: varying strategic skill

Varying computation time
• Each agent has a time limit
• More computing time ⇒ more shots to consider ⇒ improvement in chosen shot
• Time limits between 2 minutes and 6 minutes per game
• Consistent time management

Experimental setup: varying execution skill

Varying the noise distribution
• Independent zero-mean Gaussian distribution for each shot parameter
• Scale all standard deviations by the same factor between 0 and 5
  ◦ 0 = perfect execution skill
  ◦ 5 = very poor execution skill
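
The noise model described here can be sketched as follows. The base standard deviations in BASE_SIGMA are placeholder values chosen for illustration (the slides do not list them), and perturb_shot is a hypothetical name, not the tournament server's API.

```python
import random

# Placeholder base standard deviations, one per shot parameter (assumed values).
BASE_SIGMA = {"phi": 0.1, "theta": 0.1, "a": 0.5, "b": 0.5, "V": 0.1}


def perturb_shot(shot, noise_level, rng):
    """Add independent zero-mean Gaussian noise to every shot parameter.

    noise_level scales all standard deviations by the same factor:
    0 means perfect execution, 5 means very poor execution.
    """
    return {name: value + rng.gauss(0.0, noise_level * BASE_SIGMA[name])
            for name, value in shot.items()}


rng = random.Random(0)
intended = {"phi": 45.0, "theta": 10.0, "a": 0.0, "b": 0.0, "V": 2.5}
print(perturb_shot(intended, 0.0, rng))  # unchanged: the noise has zero spread
print(perturb_shot(intended, 5.0, rng))  # every parameter is perturbed
```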

Experimental setup: the experiment

Each agent participated in the same process to generate our data:
1. Randomly generate:
   ◦ Noise level
   ◦ Time limit
2. Agent breaks
   ◦ Single break shot
   ◦ Rebreak until successful
3. Agent continues game
4. Win-off-the-break?
5. Repeat around 20,000 times
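
The data-generation loop can be sketched as below. The simulator is passed in as play_game, a hypothetical callable standing in for "break, continue the game, report whether the agent won off the break". Drawing the noise level and time limit uniformly is also an assumption, since the slide says only that they were randomly generated.

```python
import random


def run_experiment(play_game, trials, rng):
    """Collect (noise_level, time_limit, won_off_the_break) records for one agent."""
    records = []
    for _ in range(trials):
        noise_level = rng.uniform(0.0, 5.0)  # step 1: random noise level
        time_limit = rng.uniform(2.0, 6.0)   # step 1: random time limit, minutes per game
        won = play_game(noise_level, time_limit, rng)  # steps 2-4: break, play, check
        records.append((noise_level, time_limit, bool(won)))
    return records


def coin_flip_game(noise_level, time_limit, rng):
    """Dummy simulator for demonstration only; it ignores both settings."""
    return rng.random() < 0.5


data = run_experiment(coin_flip_game, trials=1000, rng=random.Random(1))
print(len(data), sum(won for _, _, won in data))
```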

Experimental results: raw data

Example raw data shown for CueCard agent.

(g) CueCard’s wins (h) CueCard’s non-wins

Experimental results: processed data

Figure panels: (i) Contour for CC, (j) Contour for SL, (k) Contour for OP, (l) Contour for MG

The value of execution skill

Maximum win-off-the-break percentage for each agent

The value of strategic skill: time

Question
Does extra computation time benefit an agent the same amount at each noise level?

To answer this, we looked at the difference in win-off-the-break percentage that extra time made for each agent at each noise level.
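
One simple way to compute such a difference is to bin trials by noise level and compare win rates for short and long time limits within each bin. This is a sketch of that idea only, not the processing used in the study: the record layout (noise level, time limit, won), the bin width, and the 4-minute split are all assumptions.

```python
from collections import defaultdict


def time_benefit_by_noise(records, noise_bin_width=1.0, time_split=4.0):
    """Percentage-point gain in win rate from long vs short time limits, per noise bin."""
    counts = defaultdict(lambda: {"short": [0, 0], "long": [0, 0]})  # [wins, games]
    for noise_level, time_limit, won in records:
        bin_index = int(noise_level // noise_bin_width)
        group = "long" if time_limit >= time_split else "short"
        counts[bin_index][group][0] += int(won)
        counts[bin_index][group][1] += 1

    benefit = {}
    for bin_index, groups in sorted(counts.items()):
        (short_wins, short_games), (long_wins, long_games) = groups["short"], groups["long"]
        if short_games and long_games:  # skip bins missing either group
            benefit[bin_index] = 100.0 * (long_wins / long_games - short_wins / short_games)
    return benefit


toy_records = [(0.5, 2.0, True), (0.5, 2.5, False), (0.7, 5.0, True),
               (0.2, 5.5, True), (3.1, 3.0, False), (3.9, 5.0, True)]
print(time_benefit_by_noise(toy_records))
```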

The value of strategic skill: time

Difference in win-off-the-break percentage for each agent due to time

Conclusion

Superior strategic skill is most identifiable when agents have imperfect execution skill

Other Interests

Execution skill and game theory
  Hustling in Repeated Zero-Sum Games with Imperfect Information
  Chris Archibald and Yoav Shoham
  IJCAI 2011

Search in densely stochastic domains
  Sparse Sampling for Adversarial Games
  Marc Lanctot, Abdallah Saffidine, Joel Veness, and Chris Archibald
  Computer Games Workshop 2012, expanded version under submission.

Agent evaluation
  Baseline: Practical Control Variates for Agent Evaluation
  Josh Davidson, Chris Archibald, and Michael Bowling
  AAMAS 2013 (to appear)

  Rating Players in Games with Real-Valued Outcomes (extended abstract)
  Chris Archibald, Matthew Rutherford, Neil Burch, and Michael Bowling
  AAMAS 2013 (to appear)

  Automating Collusion Detection in Sequential Games
  Parisa Mazrooei, Chris Archibald, and Michael Bowling
  Under submission

Thank you!
Questions?

Chris Archibald
[email protected]
