Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan...

Learning to Play HeartsNathan Sturtevant

University of Alberta - AI SeminarFebruary 3, 2006

Learning In Hearts Nathan SturtevantAdam White

Hearts• Trick-based card game

Sample Trick

Hearts• Trick-based card game

• Want to minimize your points

• One point for every heart (♥)

• 13 points for Q♠

• If one player takes all 26 points

(shoots the moon) others get 26 each

Challenge

• Learn to play the game of Hearts well:

• Multi-Player Game

• Imperfect Information

• Learning

Multi-Player Games

• A lot of work in two-player games:

• Checkers, chess, backgammon, scrabble, othello, go…

• Much less in multi-player games

Multi-Player Games

• Differences:

• Maxn algorithm; generalization of minimax

• Less pruning possible

• Weaker theoretical properties

Hearts Example

(0, 0, 13) (13, 0, 0)

(0, 0, 13) (0, 0, 13)

A♠ 3♣

K♠ Q♠ K♠ Q♠2

Imperfect Information• In practice we can’t see opponents cards

• Monte-Carlo Sampling

• Generate perfect-information sample hands for opponents

• Analyze samples

• Combine results

Previous Work

• Hearts program based on previous ideas

• Hand-tuned evaluation function

• Non-linear evaluation function

• Somewhat slow

• Plays as well (better than) best computers?

Average Scores

Per Game Per Hand

Expert Program 56.1 5.16

Opponent Avg. 76.3 6.97

Played 90 games, each to 100 points.

Learning in Hearts• University of Mass. Course Project

(Perkins, 1998)

• Operational Advice

(Fürnkranz, et. al., 2000)

• State sampling with imperfect-information

(Fujita and Ishii, 2005)

Our Approach

• Use search-based approach to train

• Similar to what was used in Backgammon

• TD-Gammon plays at the level of the best humans

Backgammon

• Why did learning in backgammon work well?

• Already developed good neural networks

• Stochastic element helps exploration

• Good features

Hearts

• Promising domain for learning:

• Game fixed length (13 moves)

• Cards dealt randomly

• Occasionally get good cards

Hearts Difficulty

• Cards have relative value

• 5♣ is good when 2-4♣ already played

• 5♣ is bad when 6-A♣ already played

Approach

• Define features for game

• Use perceptron (regression) to predict game score given features

• Use maxn to play given predicted score

• Use TD(λ) to train

Features

• What features to use for each player?

• 52 cards they could have in their hand

• 52 cards they could have taken

• 104 features per player

• 416 total features

Valuable Feature• Interesting feature: P1 has the lowest ♥

• [P1 has 2♥] or

• [P1 has 3♥] and

[[P1 has taken 2♥] or [P2 has taken 2♥] [[P3 has taken 2♥] or [P4 has taken 2♥]]

• …

Features• We defined basic ‘atomic’ features

• Only evaluate features on trick boundaries

• Sample Features

• Which suits do we hold low/high cards

• Which suits are we ‘short’

• Which suits does the ‘leader’ have

Features

• These features still inadequate

• Combinations of features more interesting than ‘atomic’ features

• Combine features using AND operator

Learning Part I• Learn to avoid the Q♠

• 60 ‘atomic features’

• λ = 0.75

• α = 1/[13 x # active features]

• Predict expected points in game

• Train against previous program

• Randomly switch position each game

QoS Features

Break Even 1x Features 2x Features 3x Features 4x Features

Games Played

Analysis

• What is the network learning

• Easily understand by examining weights assigned to feature sets

Features - Avoid Q♠Rank Weight We have We have We have Opponent

1 -0.103 1 low ♠ Lead Q♠ no ♠

2 -0.097 1 low ♠ No ♥ Lead Q♠ no ♠

3 -0.096 2 low ♠ K♠ Q♠ two ♠

4 -0.093 1 low ♠ No ♣ Lead Q♠ no ♠

5 -0.090 1 low ♠ No ♦ Lead Q♠ no ♠

148 -0.040 1 low ♠ Q♠ Lead no ♠

Features - Take Q♠Rank Weight We Have We have We have We have

1 0.125 Q♠ 1 low ♠ Lead

2 0.123 Q♠ 1 low ♠

3 0.117 Q♠ No ♣ No ♥ Lead

4 0.116 A/K/Q♠ Lead

5 0.112 Q♠ No ♣ No ♥ No ♦

Learning Part II

• Learn to avoid taking ♥

• Removed 14 Q♠-specific features

• 42 new point (♥) related features (0-13)

• Same learning parameters

Hearts Features

Break Even 1x Features 2x Features 3x Features

Games Played

Learning Part III

• Learn to play the full game of Hearts

• No ‘shooting the moon’

• Take best 10,000 features from the Q♠

• Take best 1,000 features from ♥ points

• Train against expert and by self-play

Steady-State Evaluation

• Test the learned networks

• 4 players, 2 player types

• 24 ways of assigning player types

• Repeat each hand 24 - 2 times

• Play 100 hands

ArrangementPlayer 1 Player 2 Player 3 Player 4

Expert Trained Trained Trained

Trained Expert Trained Trained

Expert Expert Trained Trained

Trained Trained Expert Trained

Expert Trained Expert Trained

Trained Expert Expert Trained

Expert Expert Expert Trained

Games Against Expert

5.05.56.06.57.07.58.08.5

Break-Even Expert Trained Self Trained

Games Played

Games Against Expert

5.05.56.06.57.07.58.08.5

Break Even Expert Trained Self Trained

Games Played

Trained Play

Self Trained Expert Trained Break Even

Games Played

Summary

• Learned to beat ‘expert’ by a large margin

• Program plays well, but lacks deep analysis of game

Ongoing Work• Learn with ‘shooting the moon’ turned on

• Duplicate all features three times

• Points are split

• We have all the points

• Someone else has all the points

• Compare steady-state play

Steady-State Results(1) (2) Score (1) Score (2)

Self-Trained90k games Expert 6.27 7.10

Expert-Trained220k games Expert 6.17 7.35

Self-Trained90k games

Expert-Trained220k games 6.35 7.03

Future Work• Optimize code

• Different algorithm than maxn

• Better ways of combining features

• Play against humans

• Passing cards

• Imperfect information

• Joint work with Adam White

• Thanks to Rich Sutton and Mark Ring

Thank You

Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan...

Documents

Transcript of Learning to Play Hearts - University of Albertanathanst/talks/Learning...Learning In Hearts Nathan...

National Low Income Housing Coalition - A Shortage of ......Gershenson, 2016), interrupt student learning, and decrease academic achievement (Brennan, Reed, & Sturtevant, 2014). NLIHC

Sturtevant 8 x 5 Table Roll Crusher

Sturtevant - The Monophthongization of Latin Ae

FINANCIAL ANALYSIS AND SWOT FINDINGS RENTAL HOUSING STUDY · 2016-09-09 · August 24, 2016 Presented by: Kyle Talente, RKG Associates, Inc. Dr. Lisa Sturtevant, Lisa Sturtevant &

Today Introduction to Artiﬁcial Intelligence COMP 3501 ...sturtevant/ai-f11/Lecture04.pdfIntroduction to Artiﬁcial Intelligence COMP 3501 / COMP 4704-4 Lecture 4 Prof. Nathan Sturtevant

Sturtevant - The Gutturals in Hittite and IE

Remote Learning Packet First Grade - Great Hearts Irving ... › wp-content › ... · My Great Hearts Irving Student, _____, to the best of my knowledge, attended to his/her remote

Sturtevant Richmont

Electronics Sketching Op Amp Circuit Inputs and Outputs -Terry Sturtevant

Sturtevant Richmont - Amazon S3

Big Hearts for Little Hearts Newsletter - LLUCH · Big Hearts for Little Hearts Newsletter BIG HEARTS FOR LITTLE HEARTS NEWSLETTER, LOMA LINDA GUILD FEBRUARY 2017 — SPECIAL EDITION

BENJAMIN FRANKLIN STURTEVANT AMERICAN FAN ENGINEER · Benjamin Franklin Sturtevant, 1833-1890 Benjamin Franklin Sturtevant was an American fan engineer and possibly the most important

Broken Hearts Bloody Hearts

Sturtevant - Initial h Before l, n, r in Old Icelandic

Sturtevant - Stems of the Hittite Hi-conjugation

DAVID STURTEVANT - University of the Philippines … See D. R. Sturtevant, "Sakdalism and Philippine Radicalism ... 5 Mountainous sociological and anthropological literature on the

See the newench Page 33 - Sturtevant Richmont

Nathan Sturtevant Introduction to Artiﬁcial Intelligencesturtevant/ai-f11/Lecture01.pdfNathan Sturtevant Introduction to Artiﬁcial Intelligence Commercial Success (1980 - present)

Artiﬁcial Intelligence: A History of Game-Playing …sturtevant/w13-games/Lecture1.pdfArtiﬁcial Intelligence: A History of Game-Playing Programs Nathan Sturtevant AI For Traditional

PC/CP 320- Project Overviewdenethor.wlu.ca/pc320/project/projshell.pdf · PC/CP 320 Project Overview Terry Sturtevant Wilfrid Laurier University October 23, 2019 Terry Sturtevant