Lecture 4 - Opponent Modelling

Making Better Decisions - Opponent Modelling

Description

This is the fourth of an eight-lecture series that I presented at the University of Strathclyde in 2011/2012 as part of the final-year AI course. This lecture shows how we can use mathematical analysis to classify players into stereotypes and leverage that classification to generate more successful decisions. (Some content appears to be missing from the end of this one - I'll fix it as soon as I can.)

Transcript of Lecture 4 - Opponent Modelling

Page 1: Lecture 4 - Opponent Modelling

Making Better Decisions - Opponent Modelling


Page 2: Lecture 4 - Opponent Modelling

Monte Carlo in Poker (Recap)

• Yesterday we saw that Monte Carlo could be used to

estimate the expected reward of an action by evaluating the

delayed reward

• We do this by simulating or "rolling out" games to their end

state.

• Assess the amount we won or lost (a minimal code sketch follows)
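As a reminder of what that looks like in code, here is a minimal sketch. The game-state interface (`is_terminal`, `legal_actions`, `apply`, `reward`) is hypothetical, just to show the shape of a rollout:

```python
import random

def rollout_value(state, my_player):
    """Play one game out to its end state by choosing actions uniformly at
    random, then return the reward (chips won or lost) for my_player.
    `state` is assumed to expose is_terminal(), legal_actions(),
    apply(action) -> new state, and reward(player)."""
    while not state.is_terminal():
        action = random.choice(state.legal_actions())
        state = state.apply(action)
    return state.reward(my_player)

def monte_carlo_estimate(start_state, action, my_player, n_rollouts=1000):
    """Estimate the expected (delayed) reward of taking `action` by
    averaging the outcomes of many random rollouts."""
    total = 0.0
    for _ in range(n_rollouts):
        total += rollout_value(start_state.apply(action), my_player)
    return total / n_rollouts
```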


Page 3: Lecture 4 - Opponent Modelling

Game Tree and Monte Carlo

[Figure: a poker game tree, showing player decision nodes, Opponent nodes, and Chance nodes]

Page 4: Lecture 4 - Opponent Modelling

Random Walks in the Game Tree

• When we walk the Game Tree at random, we pick nodes to

follow at random.

• We assume (for now) that this is an unbiased choice

• This means every choice has the same probability of being

chosen


Page 5: Lecture 4 - Opponent Modelling

Can We Do Better?

• Random walks are all well and good

• But a uniform distribution across action choices isn't

accurate

‣ Certain situations will make sensible players more likely to use

certain actions

• How can we bring this bias into play in the walk?


Page 6: Lecture 4 - Opponent Modelling

Classifying Opponents

• The way we do this is to work out what type of player

someone is.

• We observe them to get a better understanding of how they

operate.

• In Poker and other games, we can use all sorts of statistical

measures to quantify a player's type.


Page 7: Lecture 4 - Opponent Modelling

Action Prediction

• Once we know what kind of player someone is, we can flip

things on their head.

• We answered "what is the likelihood this player is type X

given we have seen this type of play"

• We can now answer "what is the likelihood this player will

make action Y given they are of type X"

• Remember from Bayes Theorem last week, these questions

are closely linked
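To see how the two questions connect, here is a toy Python sketch of Bayes' rule in action. The player types and all probabilities below are made-up numbers, purely for illustration:

```python
# Toy illustration of how the two questions are linked by Bayes' theorem.
prior = {"tight-passive": 0.5, "loose-aggressive": 0.5}               # P(type)
p_raise_given_type = {"tight-passive": 0.1, "loose-aggressive": 0.6}  # P(raise | type)

# "What is the likelihood this player is type X given we have seen this play?"
p_raise = sum(p_raise_given_type[t] * prior[t] for t in prior)        # P(raise)
posterior = {t: p_raise_given_type[t] * prior[t] / p_raise for t in prior}

# "What is the likelihood this player will raise given they are of type X?"
# ...is just p_raise_given_type[X]; weighting by our belief gives a prediction:
p_next_raise = sum(p_raise_given_type[t] * posterior[t] for t in posterior)

print(posterior)     # {'tight-passive': ~0.14, 'loose-aggressive': ~0.86}
print(p_next_raise)  # ~0.53
```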


Page 8: Lecture 4 - Opponent Modelling

Simple (Human) Classification

• Pro Poker players try to classify their opponents into one of several classes based on three measures:

‣ Voluntarily Put in Pot (VPiP)

‣ Won at Showdown (WSD)

‣ Pre-flop Raise (PFR)
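As an illustration, here is a rough sketch of how those three measures might be computed from a batch of observed hands. The dictionary fields are hypothetical, just to show the shape of the calculation:

```python
def player_stats(hands):
    """Compute the three classic measures from a list of hand records.
    Each record is a hypothetical dict, e.g.
    {"voluntary_chips": True, "preflop_raise": False,
     "saw_showdown": True, "won_showdown": True}."""
    n = len(hands)
    vpip = sum(h["voluntary_chips"] for h in hands) / n      # Voluntarily Put in Pot
    pfr = sum(h["preflop_raise"] for h in hands) / n         # Pre-flop Raise
    showdowns = [h for h in hands if h["saw_showdown"]]
    wsd = (sum(h["won_showdown"] for h in showdowns) / len(showdowns)
           if showdowns else 0.0)                             # Won at Showdown
    return {"VPiP": vpip, "PFR": pfr, "WSD": wsd}
```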


Page 9: Lecture 4 - Opponent Modelling

Player Stereotypes

• Players can be

‣ Tight / Loose (how likely they are to play hands)

‣ Passive / Aggressive pre-flop

‣ Passive / Aggressive post-flop


Page 10: Lecture 4 - Opponent Modelling

Utilising Stereotypes

• If we can classify players we can use this against them

• For instance, we might discover that passive players can be

chased off by aggressive play

• Or we understand that when a super-conservative player

decides to raise, we need to be careful

• We can build heuristic rule bases around this like we saw

before.

• Or we can be much smarter


Page 11: Lecture 4 - Opponent Modelling

Better Classifications

• Humans are getting by on 3 dimensions

• But Poker has waaaay more statistics available than this

• We can make a lot of use of this extra data.


Page 12: Lecture 4 - Opponent Modelling

Poker Tracker

• Poker Tracker is a stats package specifically for Poker

• Analyses play at online casinos

• Real-time access to stats about opponents

• Allows players to review hands later


Page 13: Lecture 4 - Opponent Modelling

Stats in Poker

• A few slides ago - Poker has many statistics

• Poker Tracker keeps tabs on around 150 metrics

• Some of these are somewhat similar, some relate more to

the games than the players


Page 14: Lecture 4 - Opponent Modelling

Problem of Dimensionality

• The problem now is that we have too much information!

• Trying to learn on cluttered data can be problematic,

assuming it works at all.


Page 15: Lecture 4 - Opponent Modelling

Dimensionality Reduction

• Somehow we have to reduce the number of dimensions that

our data points are using.

• In many ways, getting the right data into a learning algorithm

is the biggest challenge.

• As much art as it is engineering.

• Two options

‣ Feature selection

‣ Feature extraction


Page 16: Lecture 4 - Opponent Modelling

Selection vs Extraction

• In Selection, you pick the dimensions you believe to be most

relevant

‣ The human players did this to get their 3-dimensional representations

• In Extraction, you come up with new dimensions that can represent your datapoints


Page 17: Lecture 4 - Opponent Modelling

Principal Components Analysis

• PCA is a common strategy for this.

• Recasts the dimensions of the datapoint into another set of

"basis vectors".

• Smushes together dimensions that have a strong correlation

‣ Some stats measures are looking at fundamentally the same thing, in

different ways

‣ E.g. Various raise frequency metrics might be treated as a single

“aggression” dimension after PCA


Page 18: Lecture 4 - Opponent Modelling

Principal Components Analysis

• This was going to be a worked example.

• Honestly, that’s way too painful.

• For N observations in M dimensions, X is an M × N matrix where each column is an observation.

• Calculate the mean and std. dev. for each row in the

matrix (each dimension)


Page 19: Lecture 4 - Opponent Modelling

Principal Components Analysis

• Calculate the covariance matrix, the amount that

the dimensions vary with respect to each other.

• Calculate the eigenvectors and eigenvalues of the

covariance matrix

‣ The eigenvectors are the new basis vectors of the

reduced-dimension datapoints

‣ The eigenvalues represent how significant the

eigenvector is. Large value = significant


Page 20: Lecture 4 - Opponent Modelling

Principal Components Analysis

• Pick the most significant K of the eigenvectors.

• Project the original datapoints in X onto the new basis vectors.
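Pulling the last three slides together, here is a small numpy sketch of the whole procedure. It follows the slide convention of one observation per column; it is an illustrative implementation, not the exact code used in the research:

```python
import numpy as np

def pca_reduce(X, k):
    """X is an M x N matrix: one dimension per row, one observation per column.
    Returns the data projected onto the k most significant principal components."""
    # Standardise each row (dimension): subtract its mean, divide by its std. dev.
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    Z = (X - mean) / std

    # Covariance of the dimensions, then its eigen-decomposition.
    cov = np.cov(Z)                          # M x M
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric

    # Sort by eigenvalue (largest = most significant) and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1][:k]
    basis = eigvecs[:, order]                # M x k: the new basis vectors

    # Project the standardised data onto the new basis vectors.
    return basis.T @ Z                       # k x N
```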


Page 21: Lecture 4 - Opponent Modelling

Principal Components Analysis

• Honestly, if anyone ever asks you to do this

‣ Get a textbook

‣ Use Matlab

‣ Be really careful because it’s kind of complicated

• It is possible to do it by hand.

‣ I can’t anymore...


Page 22: Lecture 4 - Opponent Modelling

Principal Components Analysis

• Assuming that you finish the calculations without mucking up.

‣ Or, you find something to work it out for you (Matlab functions for this exist)

• What you have now is a new datapoint that carries approximately the same information.

• Recast into fewer dimensions.

‣ Note that the new dimensions will not make intuitive sense


Page 23: Lecture 4 - Opponent Modelling

PCA in Action


Page 24: Lecture 4 - Opponent Modelling

Clustering Algorithms

• Having performed PCA, we have a much more manageable

set of datapoints, and we’ve eliminated extraneous

dimensions

• Now we need to group them together.

• Clustering algorithms are one approach.

• They try to find a set of “clusters” of points that sit close together.


Page 25: Lecture 4 - Opponent Modelling

Clustering

[Figure: scatter plot of datapoints grouped into clusters. Blue Peter style example - real data is rarely so neat.]

Page 26: Lecture 4 - Opponent Modelling

Clustering

• k-means is one of the most popular algorithms

‣ Others exist: fuzzy c-means, FLAME clustering and more

• Pick a value for k

‣ You can play around a bit to find good values or use some tricks

‣ Accepted “rule of thumb”: k ≈ √(n/2) for n datapoints


Page 27: Lecture 4 - Opponent Modelling

K-Means Algorithm

• Typically, we run the k-means algorithm as an

“iterative refinement” process

‣ Guess at some initial values, keep running the process

round and round until it stabilises

• Randomly assign datapoints to one of the k clusters

• Step 1 - Calculate centroids of the clusters

• Step 2 - Update assignment based on new centroids

• Rinse and repeat 1 and 2 until convergence.

Page 28: Lecture 4 - Opponent Modelling

K-Means Algorithm

• Calculating centroids of clusters:

m_i^{(t+1)} = (1 / |S_i^{(t)}|) · Σ_{x_j ∈ S_i^{(t)}} x_j

‣ x_j denotes the datapoints being sampled

‣ m_i^{(t+1)} denotes the mean of cluster i at iteration t+1

‣ S_i^{(t)} denotes the set of datapoints assigned to cluster i at iteration t

• Effectively, the average of the datapoints


Page 29: Lecture 4 - Opponent Modelling

K-Means Algorithm

• Assigning datapoints to clusters:

S_i^{(t)} = { x_j : ‖x_j − m_i^{(t)}‖ ≤ ‖x_j − m_k^{(t)}‖ for all k }

• The set of points S_i is all datapoints for which the centroid of cluster i (m_i) is the nearest centroid.
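A bare-bones sketch of the whole loop (random initial assignment, then alternating the centroid and assignment steps until nothing changes), assuming the datapoints arrive as a numpy array of PCA-reduced player statistics:

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """points: N x D array of datapoints. Returns cluster assignments and centroids."""
    rng = np.random.default_rng(seed)
    # Randomly assign each datapoint to one of the k clusters.
    assignment = rng.integers(0, k, size=len(points))
    for _ in range(n_iters):
        # Step 1: the centroid of each cluster is the mean of its assigned points
        # (an empty cluster is re-seeded with a random datapoint).
        centroids = np.array([
            points[assignment == i].mean(axis=0)
            if np.any(assignment == i)
            else points[rng.integers(len(points))]
            for i in range(k)
        ])
        # Step 2: re-assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            break  # converged: nothing moved
        assignment = new_assignment
    return assignment, centroids
```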


Page 30: Lecture 4 - Opponent Modelling

K-Means Worked Example

• Board work


Page 31: Lecture 4 - Opponent Modelling

From Classification to Prediction

• Once we have our clusters defined, we know what

datapoints constitute the type of player we are analysing

• We can use this to predict what the player will do

‣ We have a collection of “similar” players, so we can use their history.

‣ We may be able to use the raw data from the observations directly.

• In either case, we can use the classification to predict actions


Page 32: Lecture 4 - Opponent Modelling

Back to Monte Carlo

• So, back to the game tree.

• We now have an idea of what type of player we are dealing

with.

• We have an idea of what actions the players are going to

take in given situations.

• Can we plug this back into the Monte Carlo simulation?


Page 33: Lecture 4 - Opponent Modelling

Informed Walks in the Game Tree

• We talked earlier about Opponent nodes in the game tree

• Specifically, when we hit an Opponent node, we would use a

uniform distribution to randomly pick between the options

available.

• Now, we can bias that distribution towards selecting the

action we expect the player to take.
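As a sketch of what this looks like in the rollout code: instead of a uniform `random.choice` over the legal actions, we sample from whatever distribution the opponent model gives us for the current situation. The `opponent_model.action_probabilities(...)` call is a hypothetical interface:

```python
import random

def pick_opponent_action(actions, opponent_model, situation):
    """Sample the opponent's next action from the distribution our model
    predicts for this situation, rather than picking uniformly at random.
    `action_probabilities` is assumed to return something like
    {"fold": 0.6, "call": 0.3, "raise": 0.1}."""
    probs = opponent_model.action_probabilities(situation, actions)
    weights = [probs.get(a, 0.0) for a in actions]
    if sum(weights) == 0:
        return random.choice(actions)  # fall back to the plain uniform walk
    return random.choices(actions, weights=weights, k=1)[0]
```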


Page 34: Lecture 4 - Opponent Modelling

Does This Work?

• Intuitively, it should

• The more accurate we make the simulation, the

more accurate the results should be.

• The concern is that the prediction process will slow things down too much

‣ Monte Carlo relies on large numbers of samples; if each one takes too long, the extra accuracy isn’t helping.


Page 35: Lecture 4 - Opponent Modelling

Does This Work?

• We don’t know.

• It’s been proven to aid Monte Carlo for Poker when

k=1

‣ All players are treated as a generic “player”

• This is ongoing research right now in SAIG.

• Look for papers next year. :)


Page 36: Lecture 4 - Opponent Modelling

What We Do Know

• We’ve previously attempted Machine Learning for

Opponent Modelling.

• Using 32 different statistical measures (reduced

down to 8 significant dimensions by PCA)

• Training data of 700,000 hands of Poker

• Successfully extracted around 28 different player

stereotypes.


Page 37: Lecture 4 - Opponent Modelling

The Aim of the Game

• We aren’t going to be able to make an AI that always wins at Poker

• There’s too much chance involved

‣ Bad hands come up

‣ Mis-interpreting players

• What we want to do is make an AI that performs better than the other players under the same circumstances

Page 38: Lecture 4 - Opponent Modelling

Evaluation

• Any time we do research we are testing some sort

of scientific hypothesis.

• We need to design experiments to test whether the hypothesis is true or not

• Science doesn’t care if we’re right - it’s unbiased. Even if we’re wrong, we have learnt something.


Page 39: Lecture 4 - Opponent Modelling

Evaluation

• Consider a pro Poker player

• They will win some games and lose others

‣ In fact, a fundamental rule of good poker play is not even taking part in about 80% of the games you sit through

• Measuring in terms of a single game doesn’t work

‣ Need to look at the forest, not the trees

• What counts is how much money the player wins at the end.

Page 40: Lecture 4 - Opponent Modelling

Measuring the Strength of an AI

• What we need is a measure of how successful a bot

is on average.

• Poker gives a metric for this - Big Blinds per 100 hands (BB/100)

‣ The metric is in terms of the table limit, so it is normalised

• Note that even for a large number of games, the variance on this measure can be really big.

‣ Recall Black Swan events - low likelihood, high impact. Large wins are Black Swans here.
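For concreteness, the metric is simply winnings expressed in big blinds, scaled to 100 hands; a quick sketch:

```python
def big_blinds_per_100(total_won, big_blind, hands_played):
    """Normalised win rate: big blinds won per 100 hands.
    total_won is in chips, big_blind is the table's big blind size."""
    return (total_won / big_blind) / hands_played * 100

# e.g. a bot that wins 150 chips over 1,000 hands at a 2-chip big blind
# is earning 7.5 BB/100.
print(big_blinds_per_100(150, 2, 1000))
```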

Page 41: Lecture 4 - Opponent Modelling

Stable Experimentation

• We really need a way to remove the variance from

the problem.

• Ordinarily we might repeat the experimentation, take a large number of samples, and use the law of averages to our advantage.

• We talked yesterday about the state space of just the card dealing component of Poker

‣ We know it’s too large for this to be an option

Page 42: Lecture 4 - Opponent Modelling

Experimentation

• What if we generate experimental scenarios?

• A large number of games, with the deck already configured.

• We can play the scenario with player A

• Then replay the exact same scenario with player B

• The results that players A and B generate are now comparable.
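A sketch of the idea: generate the scenarios once from a fixed seed, then play each one with both players and compare the paired results. The `play_game` function stands in for whatever game simulator is actually used:

```python
import random

def make_scenarios(n, seed=42):
    """Generate n reproducible deck orderings from a fixed seed."""
    rng = random.Random(seed)
    deck = [rank + suit for rank in "23456789TJQKA" for suit in "cdhs"]
    return [rng.sample(deck, len(deck)) for _ in range(n)]

def run_duplicate_experiment(scenarios, player_a, player_b, play_game):
    """Play every pre-generated scenario once with each player.
    `play_game(player, scenario)` is a hypothetical function returning the
    player's winnings for that deal."""
    results_a = [play_game(player_a, s) for s in scenarios]
    results_b = [play_game(player_b, s) for s in scenarios]
    # Paired differences strip out the luck of the deal itself.
    diffs = [a - b for a, b in zip(results_a, results_b)]
    return sum(diffs) / len(diffs)
```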


Page 43: Lecture 4 - Opponent Modelling

Experimental Design

• Designing good experiments is really important

• Not just for AI but for all kinds of things

• Understanding sources of uncertainty means we can

find ways to factor them out

• Design fair unbiased experiments

• For Science!


Page 44: Lecture 4 - Opponent Modelling

Summary

• More detail on Monte Carlo in Poker

• Explanation of Opponent Modelling in Poker

‣ Dimensionality Reduction

‣ Clustering algorithms

• Exploiting Opponent Models

• Experimental Design


Page 45: Lecture 4 - Opponent Modelling

Next Week

• Other uses for Opponent Models

• Procedural Content Generation

• AI in Video Games
