Lecture 3 - Decision Making
![Page 1: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/1.jpg)
Making Decisions in Games
1
![Page 2: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/2.jpg)
Theory of Real Games
• We’ve been talking about “games” as single
instances of choice - heads/tails, odds/evens, etc.
• We’ve talked about how we can repeat the game
(iterating) and interesting things happen.
• Are most games the same choice repeatedly?
2
![Page 3: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/3.jpg)
Real Games
• At a much less abstract level, a game is not one
choice repeated.
• It is a sequence of different choices.
• The reward is delayed.
3
![Page 4: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/4.jpg)
Delayed Reward
• Last week we could see the payoffs for each choice
pair in the games.
• Does a single move in chess have a “reward”?
• The reward is whether the game is won or lost -
the combined result of the choice sequence
4
![Page 5: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/5.jpg)
Evaluating Delayed Rewards
• We need to evaluate what the expected payoff of a
given choice is.
• Typically we can only do this at the end of the game.
• How can we decide what to do now if we won’t
know if it was a good decision until later?
5
![Page 6: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/6.jpg)
Chess
• Opening move is one choice.
• Opponent makes their move.
• You reply.
• Note that your 2nd move is a totally different
theoretical “game” to the first move.
6
![Page 7: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/7.jpg)
Chess
• Initially there are 20 opening moves
• Your opponent has 20 responding moves
• 2 moves in, the size of the potential statespace is
400 states.
• The game gets more complicated later
‣ Average number of moves available per turn : 35
‣ Average game length : 80 moves
• State space size (compare Shannon's number, about 10^120) : 35^80 ≈ 3 x 10^123 - HUGE
7
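The arithmetic on this slide can be checked directly. The sketch below assumes the slide's own averages (35 moves per turn, 80 turns per game) and simply raises one to the power of the other; the variable names are illustrative.

```python
import math

# Figures taken from the slide.
opening_moves = 20
replies = 20
positions_after_two_moves = opening_moves * replies

branching_factor = 35   # average moves available per turn
game_length = 80        # average number of turns in a game

# Python integers are arbitrary precision, so this is exact.
game_tree_size = branching_factor ** game_length
exponent = math.log10(game_tree_size)

print(positions_after_two_moves)
print(f"35^80 is roughly 10^{exponent:.1f}")
```

The point of the exercise is the order of magnitude: no brute-force search can visit that many states.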
![Page 8: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/8.jpg)
Search
• This state space is way too big for an exhaustive
search approach like mini-max
• Any brute force approach is not going to work
• We need some mechanism to guide the search
towards areas of the game tree that are useful
8
![Page 9: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/9.jpg)
Heuristics
• A heuristic is formally a “strategy using readily
accessible, though loosely applicable, information to
control problem solving in human beings and
machines”
• Less formally, it’s a guess-timate of the value of a
state, typically based on the distance to the goal
(planning) or likelihood of winning (games)
9
![Page 10: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/10.jpg)
Using Heuristics
• Heuristics guide search across spaces that are too
complex to fully enumerate.
• Estimate potential of the next set of states using the
heuristic and go with the best looking one.
• Can be combined with a search strategy like Best
First Search or Enforced Hill Climbing
10
![Page 11: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/11.jpg)
Heuristic Example - A*
• A* search for path planning is a great example of
heuristics in use.
• In a world of tiles, find an optimal path from A to B.
• A* uses two metrics :
‣ Concrete metric of the work to get to a location (g)
‣ Estimate of work to get from location to goal (h)
• Search strategy always chooses location that
minimises (h+g)
11
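A minimal sketch of A* on a tile world, assuming four-way movement, a cost of 1 per step, and Manhattan distance as the estimate h. The grid encoding ('#' for a wall) and the function names are illustrative choices, not something taken from the lecture.

```python
import heapq

def manhattan(a, b):
    """Estimate (h): tile distance ignoring walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Return a shortest path of (row, col) tiles, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    # Heap entries are (g + h, g, tile); the smallest g + h is expanded first.
    open_heap = [(manhattan(start, goal), 0, start)]
    g_cost = {start: 0}      # best known work to reach each tile (g)
    came_from = {}           # back-pointers for rebuilding the path
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if g > g_cost[current]:
            continue         # stale entry: a cheaper route was found later
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = new_g
                    came_from[(nr, nc)] = current
                    f = new_g + manhattan((nr, nc), goal)
                    heapq.heappush(open_heap, (f, new_g, (nr, nc)))
    return None

grid = ["....",
        ".##.",
        "...."]
print(a_star(grid, (1, 0), (1, 3)))
```

Because Manhattan distance never overestimates the remaining work on this kind of grid, the first time the goal is popped from the heap the path found is optimal.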
![Page 12: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/12.jpg)
Heuristic Example - A*
[Slides 12-37: frame-by-frame animation of A* searching a tile grid from A to B. Each evaluated tile is labelled g + h, starting with 1 + 7 beside A and ending with 7 + 1 beside B.]
![Page 38: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/38.jpg)
Heuristics
• Heuristics can guide our search
• Help us understand what states are bringing us
closer to our goals
• Allow us to backtrack when a promising route
becomes problematic
• Do they work well for games?
38
![Page 39: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/39.jpg)
The Maths of Choice
• Common (basic) Combinatorics problem:
‣ How many X element sub-sets can I make from this set of Y
elements?
• Less formally :
‣ How many different ways can I pick X things from Y things?
39
![Page 40: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/40.jpg)
Choice
• We can refer to this as “Choosing”
• “I have 5 things, I choose 2”
• We can write it as : 5 C 2
40
![Page 41: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/41.jpg)
Binomials
• Mathematically, n C k is the binomial coefficient
• This can be written as
‣ n! / (k! (n - k)!)
‣ Equivalently, n(n-1)...(n-k+1) / k!
41
![Page 42: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/42.jpg)
Permutations
• The choose operator tells you how many unordered sets of
unique elements there are.
• What if the order that the elements are in matters?
• For this we use Permutation
‣ n P k
• Equivalent to :
‣ n! / (n - k)!
42
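A quick check of the choose and permutation formulas from the last few slides, using the "I have 5 things, I choose 2" example. Python's standard library (3.8 or later) provides both operators; the helper names are illustrative.

```python
import math

def choose(n, k):
    """n C k written out from the factorial formula."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

def permute(n, k):
    """n P k written out from the factorial formula."""
    return math.factorial(n) // math.factorial(n - k)

n, k = 5, 2
print(choose(n, k), math.comb(n, k))    # unordered selections
print(permute(n, k), math.perm(n, k))   # ordered selections

# Every unordered selection of k items can be ordered in k! ways.
assert permute(n, k) == choose(n, k) * math.factorial(k)
```

With 5 things and 2 picks there are 10 unordered selections and 20 ordered ones, since each pair can be arranged two ways.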
![Page 43: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/43.jpg)
Poker
• Card game.
• Typically involves gambling.
• “Poker” is technically an entire set of different games that
share similar structure.
• For the purposes of this lecture, Poker refers specifically to
“Limit Texas Hold ‘Em”
43
![Page 44: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/44.jpg)
Texas Hold ‘Em
• Variant of poker created in the early 1900s
• Typically 2-10 player games
• Popular recently - Poker on TV and online is typically Texas
Hold ‘Em
• Aim is to make the best 5-card hand possible using any of the
two private “hole” cards and 5 public “community” cards
44
![Page 45: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/45.jpg)
Phases of the Game
• The game is broken into four phases.
• Initial or “Pre-flop” - Hole cards are dealt and a round of
betting occurs.
• Flop - The first three community cards are dealt, another
round of betting.
• Turn - A fourth community card is dealt, and a round of
betting
• River - Final community card dealt, final betting
45
![Page 46: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/46.jpg)
Some Terminology
• Raise - Increase the bet amount
• Fold - Give up on this game, losing any money already bet
• Call - Put in an amount of money to equal what others are
wagering
• Blinds - An initial mandatory wager by two players, Small and
Big. The players responsible for the blinds rotate each game.
46
![Page 47: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/47.jpg)
Poker in Research
• Poker has been a major research area for AI for many years.
• Characteristics in common with many real world problems
‣ Hidden information
‣ Bluffing
‣ Loss minimisation
47
![Page 48: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/48.jpg)
Poker at SAIG
• Major research area for us for many years
• Under my supervision for the last 2 years as
honours projects and Summer internships.
• Much of what you’re going to hear about this week
is based on current research happening right now at
SAIG
48
![Page 49: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/49.jpg)
Strathclyde Poker Research Environment
• SPREE was developed to overcome two challenges we face.
‣ Training data sets obtained from online casinos contain only
imperfect information, which leads to poor machine learning
‣ Every research project wasted significant time re-implementing
a framework for Poker
• SPREE is an open source client/server implementation
in Java, with an AI-based client and a GUI client.
• http://sourceforge.net/projects/spree-poker
49
![Page 50: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/50.jpg)
Limit or No Limit?
• Two types of game - Limit and No Limit
• No Limit - Classical movie Poker.
‣ Raises can be any amount
‣ Any number of raises
• Limit - Common rule set
‣ Raises are a single fixed amount
‣ Limited number of raises allowed per round
50
![Page 51: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/51.jpg)
Limit or No Limit?
• Focus on Limit
• Significantly reduces complexity of the problem.
• Also means we can focus on the game, rather than the
psychological aspects.
51
![Page 52: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/52.jpg)
Poker State Space
• At each point, each player has typically 3 options
‣ Raise, Call, Fold
• We can approximate the size of the search space at point k
as 3^k
• We can also determine lower and upper bounds for k since
in Limit there are a fixed number of raises.
52
![Page 53: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/53.jpg)
Dealing Cards
• For a game of N players, 2N + 5 cards are required.
• There are 52 C (2N + 5) different sets of cards that could be
dealt.
• But who gets which card is important, so we need to use
Permutation not Choose
• 52 P (2N+5)
‣ For a standard 10 player game, 52 P 25 ≈ 7.4 x 10^39
53
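The deal counts can be computed exactly. This sketch assumes only the slide's formula of 2N + 5 cards for N players; the function names are illustrative. For reference, 52 P 25 (ten players) is about 7.4 x 10^39, while 52 P 15 (five players) is about 5.86 x 10^24.

```python
import math

DECK_SIZE = 52

def cards_needed(players):
    """Two hole cards per player plus five community cards."""
    return 2 * players + 5

def unordered_deals(players):
    """52 C (2N + 5): which cards leave the deck, ignoring who gets them."""
    return math.comb(DECK_SIZE, cards_needed(players))

def ordered_deals(players):
    """52 P (2N + 5): which cards leave the deck and in what order."""
    return math.perm(DECK_SIZE, cards_needed(players))

for players in (5, 10):
    print(players, cards_needed(players), f"{ordered_deals(players):.3e}")
```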
![Page 54: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/54.jpg)
Length of a Poker Game(Lower)
• In the shortest game possible, all players fold.
• The last player (who put in the Big Blind) wins by default
• N-1 choices to reach this point
• 2N cards are required
• 3^(N-1) * 52 P 2N
• For a standard 10 player game :
‣ 19683 * 3 x 10^32 ≈ 6 x 10^36
54
![Page 55: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/55.jpg)
Length of a Poker Game(Upper)
• In the longest game possible
• All players initially call, final player to call instead raises.
• 4N-4 turns per round, 4 rounds = 16N-16 turns total
• 2N + 5 cards required
• 3^(16N-16) * 52 P (2N + 5)
• Again for a 10 player game
‣ 5 x 10^68 * 7.4 x 10^39 ≈ 3.7 x 10^108
55
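Both bounds can be reproduced from the turn counts on the slides: N - 1 choices and 2N cards for the shortest game, 16N - 16 choices and 2N + 5 cards for the longest, with 3 options per choice. The sketch below assumes exactly those counts; the function names are illustrative.

```python
import math

OPTIONS_PER_TURN = 3   # raise, call, fold

def lower_bound(players):
    """Shortest game: everyone folds to the Big Blind before the flop."""
    turns = players - 1
    cards = 2 * players
    return OPTIONS_PER_TURN ** turns * math.perm(52, cards)

def upper_bound(players):
    """Longest game: 4N - 4 turns in each of the 4 betting rounds."""
    turns = 16 * players - 16
    cards = 2 * players + 5
    return OPTIONS_PER_TURN ** turns * math.perm(52, cards)

n = 10
print(f"lower: {lower_bound(n):.2e}")
print(f"upper: {upper_bound(n):.2e}")
print("smaller than 35^80:", upper_bound(n) < 35 ** 80)
```

The 10 player upper bound uses 3^144, which is about 5 x 10^68, matching the figure quoted on the slide.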
![Page 56: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/56.jpg)
Total State Space Size
• The total state space is smaller than Shannon’s number
• Still completely unwieldy for any kind of exhaustive search
• Note that we’ve considered the lower and upper bounds of
the state space.
• Actual values will typically fall somewhere between.
• Also note that the upper bound hinges on the restrictions
imposed by Limit, so we don’t need to consider the extra state
complexity that a variable raise size would introduce.
56
![Page 57: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/57.jpg)
Abstraction
• There are some things we can do to trim this down (a bit)
• Firstly, we can simplify our view of the starting position
• We don’t need to consider all possible cards that could be
dealt
‣ Cards that will help us change the situation
‣ Cards that don’t help us can be grouped together
57
![Page 58: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/58.jpg)
Starting Hands
• There are 52 C 2 = 1,326 potential opening hands
• But we can reduce this
‣ Suit doesn’t matter except for matching
‣ We can reduce each hand to its two ranks, “suited” or “unsuited”
‣ 2c, 7d is equivalent to 2d, 7c or 2s, 7h
• This gives a total number of abstract hands as
‣ 13 (pairs) + 13 C 2 (suited) + 13 C 2 (unsuited) = 169
• We’ll see tomorrow there are more abstractions.
58
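The reduction from 1,326 concrete starting hands to 169 abstract ones can be verified by enumeration. In this sketch the card encoding (rank character followed by suit character) and the labels "s" and "o" for suited and unsuited are illustrative conventions.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

def abstract_hand(card_a, card_b):
    """Collapse two hole cards to ranks plus suited/unsuited."""
    high, low = sorted((card_a[0], card_b[0]), key=RANKS.index, reverse=True)
    if high == low:
        return high + low                      # pocket pair
    suited = card_a[1] == card_b[1]
    return high + low + ("s" if suited else "o")

concrete_hands = list(combinations(DECK, 2))
abstract_hands = {abstract_hand(a, b) for a, b in concrete_hands}

print(len(concrete_hands), len(abstract_hands))
print(abstract_hand("2c", "7d"), abstract_hand("2d", "7c"), abstract_hand("2s", "7h"))
```

The three example hands from the slide all collapse to the same abstract hand, which is the whole point of the abstraction.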
![Page 59: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/59.jpg)
Heuristics for Poker
• “Every hand’s a winner and every hand’s a loser”
• Heuristics for Poker are tricky because of this.
• Analysis is largely based on your own hand - if my hand at a
point is such-and-such a type or better, it is worth playing
• Kind of naive
59
![Page 60: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/60.jpg)
“Expert” Poker Systems
• You can make a somewhat capable agent by combining a
bunch of these naive heuristics.
• It’s known which of the starting hands are strong and which
are weak.
• You can make a guess as to what you should do based on
your hand strength.
‣ This is not massively informed
• Basic, functional approach, attempts to lift out general rules
that will lead to good results.
60
![Page 61: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/61.jpg)
Evaluating Delayed Reward in Poker
• I’ve mentioned delayed reward a few times
• How does this fit into Poker?
• We know that the strength of our hand alone won’t
decide the game.
• We know that opponents can bluff about their hand
strength.
• Need to find out “what happens if” for possible
actions
61
![Page 62: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/62.jpg)
Monte Carlo Tree Search
• The underlying Monte Carlo method was initially used, without being
formally defined, by Buffon and Fermi (among others)
• It was developed at Los Alamos by Stanislaw Ulam and our Game Theory
friend John von Neumann
• For a large enough sample size, random sampling can often
take the place of exhaustive enumeration
62
![Page 63: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/63.jpg)
Samples and Probes
• When we say a “random sample” we want to sample
the potential outcomes
‣ And find the potential rewards
• The leaf nodes of the game tree have the final value
of the game.
• By randomly walking from the current node to leaf
nodes, we can build up a picture of where our
actions might lead us.
63
![Page 64: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/64.jpg)
Exploration vs Exploitation
• We can sample at random, and we'll get coverage in all areas
• Some areas are more promising than others
• We want to "exploit" these areas and inspect them closely
‣ Ensure that they are as good as they look
• At the same time, we want to keep "exploring" in case there
are better areas in the game tree.
• Balancing these two contradictory goals falls to the UCT
heuristic.
64
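The slides name the UCT heuristic without giving its formula. A commonly used form scores each child as its average reward plus an exploration bonus, c * sqrt(ln(parent visits) / child visits). The sketch below assumes that form with the conventional constant c = sqrt(2); the tuple layout for children is purely illustrative.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Average reward (exploitation) plus a bonus for rarely tried moves."""
    if visits == 0:
        return math.inf      # always try an untested action first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, c=math.sqrt(2)):
    """children: list of (name, total_reward, visits) tuples."""
    parent_visits = sum(visits for _, _, visits in children)
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))[0]

children = [("raise", 60.0, 100), ("call", 1.0, 2)]
print(select_child(children))         # exploration bonus included
print(select_child(children, c=0.0))  # pure exploitation
```

With the bonus switched on, the barely sampled action wins the comparison despite its lower average; with c = 0 the search would keep exploiting the action that already looks best.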
![Page 65: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/65.jpg)
Reward Evaluation
• We can use the Monte Carlo samples to simulate down to
the end of the game.
• Establish whether we win or lose (and how much).
• Bubble this value back up the tree.
• Build a picture of the amount we can expect to win based on
the actions we are considering this turn.
65
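To make the idea concrete, here is a sketch of sampling-based reward evaluation on a deliberately tiny, hypothetical betting game (not real poker): folding always loses an ante of 1, while calling wins or loses 2 depending on an opponent hand strength drawn uniformly from 0 to 9. All names and payoffs are assumptions made for illustration.

```python
import random

def playout(my_strength, action, rng):
    """Simulate one game to the end and return the amount won or lost."""
    if action == "fold":
        return -1.0                      # give up the ante
    opponent = rng.randrange(10)         # hidden information, sampled at random
    if my_strength > opponent:
        return 2.0
    if my_strength < opponent:
        return -2.0
    return 0.0                           # split pot

def estimate_values(my_strength, actions=("fold", "call"), samples=5000, seed=0):
    """Average the sampled end-of-game rewards for each action available now."""
    rng = random.Random(seed)
    values = {}
    for action in actions:
        total = sum(playout(my_strength, action, rng) for _ in range(samples))
        values[action] = total / samples  # value bubbled back up to this turn
    return values

for strength in (8, 1):
    values = estimate_values(strength)
    print(strength, max(values, key=values.get))
```

With a strong hand the sampled value of calling comes out well above the fixed cost of folding, and with a weak hand the ordering flips, which is the "what happens if" picture the slides describe.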
![Page 66: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/66.jpg)
Caveat Emptor
• What we’ve seen today is just ONE approach to
tackling Poker.
• It’s an open challenge in AI to find a good solution
• The techniques used are important
• More important is the reasoning for using these
approaches.
• AI as a toolkit, not a definitive solution.
66
![Page 67: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/67.jpg)
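Sampling the St Petersburg Paradox
67
[Chart: number of sampled games ending at each payout.]
Payout (x-axis): 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024
Game counts (bar labels, in the order extracted): 2,147,483,647; 834,532,607; 435,781,603; 222,566,052; 108,347,756; 54,225,257; 27,184,330; 13,605,016; 6,792,164; 3,393,086; 1,698,228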
![Page 68: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/68.jpg)
Sampling the St Petersburg Paradox
68
• If we repeatedly play out the St Petersburg game we
see that it behaves much as we expect
• Half the games end immediately, a quarter after 1
turn and so on.
• Over 4,000,000,000 samples, the average payout is only £14.50
• Where the Expected Value metric didn't inform our
decision making, we can use sampling to see what
actually happens!
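A small sampling experiment along these lines can be sketched as follows. It assumes the pot starts at £1 and doubles on every successful coin flip, so that half the games pay £1, a quarter pay £2, and so on; the sample size and seed are arbitrary.

```python
import random
from collections import Counter

def play_once(rng):
    """Play one St Petersburg game and return the payout in pounds."""
    payout = 1
    while rng.random() < 0.5:   # the game continues with probability 1/2
        payout *= 2
    return payout

def sample_games(n, seed=0):
    rng = random.Random(seed)
    return [play_once(rng) for _ in range(n)]

payouts = sample_games(100_000)
counts = Counter(payouts)
average = sum(payouts) / len(payouts)

print(f"average payout: {average:.2f}")
for payout in sorted(counts)[:5]:
    print(payout, counts[payout])
```

Although the theoretical expected value is unbounded, the sampled average stays modest because the huge payouts are too rare to show up often, and the counts roughly halve at each payout level.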
![Page 69: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/69.jpg)
Summary
• Understanding real games
• Delayed reward systems
• Poker
• Monte Carlo with UCT (in brief)
69
![Page 70: Lecture 3 - Decision Making](https://reader033.fdocuments.in/reader033/viewer/2022052823/555233c6b4c905b00e8b49b4/html5/thumbnails/70.jpg)
Next Lecture
• More on Monte Carlo
• Describing a player mathematically
• Categorising players into types
• Using this classification for better decisions
70