Post on 01-Jan-2021
Learning to Play HeartsNathan Sturtevant
University of Alberta - AI SeminarFebruary 3, 2006
1
Learning In Hearts Nathan SturtevantAdam White
Hearts• Trick-based card game
2
2
Learning In Hearts Nathan SturtevantAdam White
Sample Trick
3
Learning In Hearts Nathan SturtevantAdam White
Sample Trick
4
Learning In Hearts Nathan SturtevantAdam White
Hearts• Trick-based card game
• Want to minimize your points
• One point for every heart (♥)
• 13 points for Q♠
• If one player takes all 26 points
(shoots the moon) others get 26 each
5
5
Learning In Hearts Nathan SturtevantAdam White
Challenge
• Learn to play the game of Hearts well:
• Multi-Player Game
• Imperfect Information
• Learning
6
6
Learning In Hearts Nathan SturtevantAdam White
Multi-Player Games
• A lot of work in two-player games:
• Checkers, chess, backgammon, scrabble, othello, go…
• Much less in multi-player games
7
7
Learning In Hearts Nathan SturtevantAdam White
Multi-Player Games
• Differences:
• Maxn algorithm; generalization of minimax
• Less pruning possible
• Weaker theoretical properties
8
8
Learning In Hearts Nathan SturtevantAdam White
Hearts Example
(0, 0, 13) (13, 0, 0)
1
3
2
3 3 3
(0, 0, 13) (0, 0, 13)
A♠ 3♣
K♠ Q♠ K♠ Q♠2
9
Learning In Hearts Nathan SturtevantAdam White
Imperfect Information• In practice we can’t see opponents cards
• Monte-Carlo Sampling
• Generate perfect-information sample hands for opponents
• Analyze samples
• Combine results
10
10
Learning In Hearts Nathan SturtevantAdam White
Previous Work
• Hearts program based on previous ideas
• Hand-tuned evaluation function
• Non-linear evaluation function
• Somewhat slow
• Plays as well (better than) best computers?
11
11
Learning In Hearts Nathan SturtevantAdam White
12
Learning In Hearts Nathan SturtevantAdam White
Average Scores
Per Game Per Hand
Expert Program 56.1 5.16
Opponent Avg. 76.3 6.97
Played 90 games, each to 100 points.
13
Learning In Hearts Nathan SturtevantAdam White
Learning in Hearts• University of Mass. Course Project
(Perkins, 1998)
• Operational Advice
(Fürnkranz, et. al., 2000)
• State sampling with imperfect-information
(Fujita and Ishii, 2005)
14
14
Learning In Hearts Nathan SturtevantAdam White
Our Approach
• Use search-based approach to train
• Similar to what was used in Backgammon
• TD-Gammon plays at the level of the best humans
15
15
Learning In Hearts Nathan SturtevantAdam White
Backgammon
• Why did learning in backgammon work well?
• Already developed good neural networks
• Stochastic element helps exploration
• Good features
16
16
Learning In Hearts Nathan SturtevantAdam White
Hearts
• Promising domain for learning:
• Game fixed length (13 moves)
• Cards dealt randomly
• Occasionally get good cards
17
17
Learning In Hearts Nathan SturtevantAdam White
Hearts Difficulty
• Cards have relative value
• 5♣ is good when 2-4♣ already played
• 5♣ is bad when 6-A♣ already played
18
18
Learning In Hearts Nathan SturtevantAdam White
Approach
• Define features for game
• Use perceptron (regression) to predict game score given features
• Use maxn to play given predicted score
• Use TD(λ) to train
19
19
Learning In Hearts Nathan SturtevantAdam White
Features
• What features to use for each player?
• 52 cards they could have in their hand
• 52 cards they could have taken
• 104 features per player
• 416 total features
20
20
Learning In Hearts Nathan SturtevantAdam White
Valuable Feature• Interesting feature: P1 has the lowest ♥
• [P1 has 2♥] or
• [P1 has 3♥] and
[[P1 has taken 2♥] or [P2 has taken 2♥] [[P3 has taken 2♥] or [P4 has taken 2♥]]
• …
21
21
Learning In Hearts Nathan SturtevantAdam White
Features• We defined basic ‘atomic’ features
• Only evaluate features on trick boundaries
• Sample Features
• Which suits do we hold low/high cards
• Which suits are we ‘short’
• Which suits does the ‘leader’ have
22
22
Learning In Hearts Nathan SturtevantAdam White
Features
• These features still inadequate
• Combinations of features more interesting than ‘atomic’ features
• Combine features using AND operator
23
23
Learning In Hearts Nathan SturtevantAdam White
Learning Part I• Learn to avoid the Q♠
• 60 ‘atomic features’
• λ = 0.75
• α = 1/[13 x # active features]
• Predict expected points in game
• Train against previous program
• Randomly switch position each game
24
24
Learning In Hearts Nathan SturtevantAdam White
QoS Features
2.75
3.00
3.25
3.50
3.75
4.00
4.25
Break Even 1x Features 2x Features 3x Features 4x Features
Games Played
Ave
rage
Sco
re
200k
25
Learning In Hearts Nathan SturtevantAdam White
Analysis
• What is the network learning
• Easily understand by examining weights assigned to feature sets
26
26
Learning In Hearts Nathan SturtevantAdam White
Features - Avoid Q♠Rank Weight We have We have We have Opponent
1 -0.103 1 low ♠ Lead Q♠ no ♠
2 -0.097 1 low ♠ No ♥ Lead Q♠ no ♠
3 -0.096 2 low ♠ K♠ Q♠ two ♠
4 -0.093 1 low ♠ No ♣ Lead Q♠ no ♠
5 -0.090 1 low ♠ No ♦ Lead Q♠ no ♠
148 -0.040 1 low ♠ Q♠ Lead no ♠
27
Learning In Hearts Nathan SturtevantAdam White
Features - Take Q♠Rank Weight We Have We have We have We have
1 0.125 Q♠ 1 low ♠ Lead
2 0.123 Q♠ 1 low ♠
3 0.117 Q♠ No ♣ No ♥ Lead
4 0.116 A/K/Q♠ Lead
5 0.112 Q♠ No ♣ No ♥ No ♦
28
Learning In Hearts Nathan SturtevantAdam White
Learning Part II
• Learn to avoid taking ♥
• Removed 14 Q♠-specific features
• 42 new point (♥) related features (0-13)
• Same learning parameters
29
29
Learning In Hearts Nathan SturtevantAdam White
Hearts Features
2.75
3.00
3.25
3.50
3.75
Break Even 1x Features 2x Features 3x Features
Games Played
Ave
rage
Sco
re
200k
30
Learning In Hearts Nathan SturtevantAdam White
Learning Part III
• Learn to play the full game of Hearts
• No ‘shooting the moon’
• Take best 10,000 features from the Q♠
• Take best 1,000 features from ♥ points
• Train against expert and by self-play
31
31
Learning In Hearts Nathan SturtevantAdam White
Steady-State Evaluation
• Test the learned networks
• 4 players, 2 player types
• 24 ways of assigning player types
• Repeat each hand 24 - 2 times
• Play 100 hands
32
32
Learning In Hearts Nathan SturtevantAdam White
ArrangementPlayer 1 Player 2 Player 3 Player 4
Expert Trained Trained Trained
Trained Expert Trained Trained
Expert Expert Trained Trained
Trained Trained Expert Trained
Expert Trained Expert Trained
Trained Expert Expert Trained
Expert Expert Expert Trained
33
Learning In Hearts Nathan SturtevantAdam White
Games Against Expert
5.05.56.06.57.07.58.08.5
Break-Even Expert Trained Self Trained
Games Played
Ave
rage
Sco
re
10k
34
Learning In Hearts Nathan SturtevantAdam White
Games Against Expert
5.05.56.06.57.07.58.08.5
Break Even Expert Trained Self Trained
Games Played
Ave
rage
Sco
re
300k
35
Learning In Hearts Nathan SturtevantAdam White
Trained Play
5.5
6.0
6.5
7.0
7.5
Self Trained Expert Trained Break Even
Games Played
Ave
rage
Sco
re
250k
36
Learning In Hearts Nathan SturtevantAdam White
Summary
• Learned to beat ‘expert’ by a large margin
• Program plays well, but lacks deep analysis of game
37
37
Learning In Hearts Nathan SturtevantAdam White
Ongoing Work• Learn with ‘shooting the moon’ turned on
• Duplicate all features three times
• Points are split
• We have all the points
• Someone else has all the points
• Compare steady-state play
38
38
Learning In Hearts Nathan SturtevantAdam White
Steady-State Results(1) (2) Score (1) Score (2)
Self-Trained90k games Expert 6.27 7.10
Expert-Trained220k games Expert 6.17 7.35
Self-Trained90k games
Expert-Trained220k games 6.35 7.03
39
Learning In Hearts Nathan SturtevantAdam White
Future Work• Optimize code
• Different algorithm than maxn
• Better ways of combining features
• Play against humans
• Passing cards
• Imperfect information
40
40
Learning In Hearts Nathan SturtevantAdam White
• Joint work with Adam White
• Thanks to Rich Sutton and Mark Ring
Thank You
41
41