Game playing. Outline Optimal decisions α-β pruning Imperfect, real-time decisions.
Game playing
Outline
• Optimal decisions
• α-β pruning
• Imperfect, real-time decisions
Games vs. search problems
• "Unpredictable" opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
• Time limits ⇒ unlikely to find goal, must approximate
Game tree (2-player, deterministic, turns)
Minimax
• Perfect play for deterministic games
• Idea: choose move to position with highest minimax value = best achievable payoff against best play
• E.g., 2-ply game: [figure not preserved in transcript]
Minimax algorithm
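The algorithm figure for this slide did not survive in the transcript. As a minimal sketch of the idea stated above (pick the move whose position has the highest minimax value), the following assumes a hypothetical tree encoding in which a leaf is an integer utility and an internal node is a list of children. The leaf values in `two_ply` are illustrative, not taken from the slide.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of `node`; MAX is to move when `maximizing` is True."""
    if isinstance(node, int):          # terminal state: its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(root):
    """Index of the root child with the highest minimax value (MAX moves at the root)."""
    return max(range(len(root)), key=lambda i: minimax(root[i], maximizing=False))


# Illustrative 2-ply game: MAX picks one of three moves, then MIN replies.
two_ply = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(two_ply), best_move(two_ply))
```

MIN would answer the three moves with 3, 2 and 2 respectively, so MAX prefers the first move.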
Properties of minimax
• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games ⇒ exact solution completely infeasible
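To make "completely infeasible" concrete, here is a rough back-of-the-envelope calculation for a tree on the order of b^m nodes with the chess figures quoted above. It is only an order-of-magnitude estimate.

```python
import math

b, m = 35, 100                       # branching factor and depth quoted for chess
exponent = m * math.log10(b)         # log10 of b**m
digits = len(str(b ** m))            # exact number of decimal digits of b**m
print(f"b^m is roughly 10^{exponent:.0f} ({digits} digits)")
```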
α-β pruning example
Properties of α-β
• Pruning does not affect final result
• Good move ordering improves effectiveness of pruning
• With "perfect ordering," time complexity = O(b^(m/2)) ⇒ doubles depth of search
• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Why is it called α-β?
• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max
• If v is worse than α, max will avoid it ⇒ prune that branch
• Define β similarly for min
The α-β algorithm
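The pseudocode figure for this slide is missing from the transcript. The following is a sketch of a standard α-β search, not necessarily the exact formulation on the original slide. It assumes a hypothetical tree encoding where a leaf is a number and an internal node is a list of children.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `node`, skipping branches that cannot affect the result."""
    if isinstance(node, (int, float)):     # terminal state: its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            if value >= beta:              # MIN above already has something better
                return value               # prune the remaining children
            alpha = max(alpha, value)      # best choice found so far for MAX
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        if value <= alpha:                 # MAX above already has something better
            return value                   # prune the remaining children
        beta = min(beta, value)            # best choice found so far for MIN
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # illustrative leaf values
print(alphabeta(tree))
```

On this tree the second MIN node is abandoned after its first leaf (2), because MAX already has a move worth 3.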
How much do we gain?
Assume a game tree of uniform branching factor b and height h
Minimax examines O(b^h) nodes; so does alpha-beta in the worst case
The gain for alpha-beta is maximum when:
• The MIN children of a MAX node are ordered in decreasing backed up values
• The MAX children of a MIN node are ordered in increasing backed up values
Then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^(3h/4))
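The effect of ordering can be observed on a small uniform tree. This sketch assumes a hypothetical list-based tree with distinct random leaf values, counts the leaves alpha-beta evaluates as given, and then again after sorting every node's children by their true minimax value. That sorting plays the role of the oracle, which a real program does not have. With perfect ordering the leaf count matches the Knuth and Moore minimum, b^⌈h/2⌉ + b^⌊h/2⌋ − 1.

```python
import math
import random


def minimax(node, maximizing=True):
    if isinstance(node, int):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


def count_leaves(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Alpha-beta search returning (value, number of leaves evaluated)."""
    if isinstance(node, int):
        return node, 1
    value = -math.inf if maximizing else math.inf
    visited = 0
    for child in node:
        child_value, n = count_leaves(child, alpha, beta, not maximizing)
        visited += n
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:                  # cutoff: remaining children are irrelevant
            break
    return value, visited


def build_tree(b, h, leaves):
    """Uniform tree of branching factor b and height h, leaves taken from an iterator."""
    if h == 0:
        return next(leaves)
    return [build_tree(b, h - 1, leaves) for _ in range(b)]


def perfectly_ordered(node, maximizing=True):
    """Reorder children best-first: decreasing value under MAX, increasing under MIN."""
    if isinstance(node, int):
        return node
    children = [perfectly_ordered(c, not maximizing) for c in node]
    return sorted(children, key=lambda c: minimax(c, not maximizing), reverse=maximizing)


b, h = 3, 4
rng = random.Random(0)
tree = build_tree(b, h, iter(rng.sample(range(10_000), b ** h)))
value, n_as_given = count_leaves(tree)
best_value, n_best = count_leaves(perfectly_ordered(tree))
print(f"all leaves: {b ** h}, as given: {n_as_given}, perfectly ordered: {n_best}")
```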
Heuristic Ordering of Nodes
Order the nodes below the root according to the values backed-up at the previous iteration
Order MAX (resp. MIN) nodes in decreasing (increasing) values of the evaluation function computed at these nodes
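Since the perfect ordering is unknown, the second rule above sorts successors by a cheap static estimate before searching them. A minimal sketch, assuming a hypothetical `evaluate` callable that scores a successor state from MAX's point of view:

```python
def order_children(children, evaluate, maximizing):
    """Sort successors so the most promising one for the player to move comes first."""
    # MAX wants high estimates first; MIN wants low estimates first.
    return sorted(children, key=evaluate, reverse=maximizing)


# Toy usage: states are plain numbers and the estimate is the number itself.
print(order_children([1, 5, 3], evaluate=lambda s: s, maximizing=True))
print(order_children([1, 5, 3], evaluate=lambda s: s, maximizing=False))
```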
Imperfect, real-time decisions
• Minimax and alpha-beta pruning require too many leaf-node evaluations.
• May be impractical within a reasonable amount of time.
• SHANNON (1950):
– Cut off search earlier (replace TERMINAL-TEST by CUTOFF-TEST)
– Apply heuristic evaluation function EVAL (replacing utility function of alpha-beta)
Cutting off search
• Change:
– if TERMINAL-TEST(state) then return UTILITY(state)
into
– if CUTOFF-TEST(state,depth) then return EVAL(state)
• Introduces a fixed-depth limit depth
– Is selected so that the amount of time will not exceed what the rules of the game allow.
• When cutoff occurs, the evaluation is performed.
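The change described above can be sketched as a depth-limited minimax. Everything here is an illustrative assumption: the list-based tree, the `limit` parameter, and especially `evaluate`, a crude stand-in for EVAL that averages the utilities reachable below a node.

```python
def leaves(node):
    """Yield every terminal utility below `node`."""
    if isinstance(node, (int, float)):
        yield node
    else:
        for child in node:
            yield from leaves(child)


def evaluate(node):
    """Hypothetical EVAL: exact at terminals, mean of reachable utilities otherwise."""
    values = list(leaves(node))
    return sum(values) / len(values)


def cutoff_test(node, depth, limit):
    """Stop at terminal states or once the fixed depth limit is reached."""
    return isinstance(node, (int, float)) or depth >= limit


def h_minimax(node, depth=0, limit=1, maximizing=True):
    if cutoff_test(node, depth, limit):
        return evaluate(node)              # replaces: if TERMINAL-TEST then UTILITY
    values = [h_minimax(c, depth + 1, limit, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, limit=1), h_minimax(tree, limit=2))
```

With `limit=2` the search reaches the terminals and returns the exact minimax value. With `limit=1` it returns only the estimate, which shows how much the result depends on the quality of EVAL.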
Heuristic EVAL
• Idea: produce an estimate of the expected utility of the game from a given position.
• Performance depends on quality of EVAL.
• Requirements:
– EVAL should order terminal-nodes in the same way as UTILITY.
– Computation must not take too long.
– For non-terminal states the EVAL should be strongly correlated with the actual chance of winning.
• Only useful for quiescent (no wild swings in value in near future) states
Heuristic EVAL example
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
Addition assumes independence
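A weighted linear evaluation of this form can be sketched with chess material as the features. The piece weights and the dictionary layout of `state` are assumptions chosen for illustration. Each feature is the difference in piece counts between the two sides.

```python
# Hypothetical weights w_i: conventional material values for chess pieces.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def eval_material(state):
    """Eval(s) = sum of w_i * f_i(s), where f_i is (white count - black count) of piece i."""
    return sum(
        weight * (state["white"].get(piece, 0) - state["black"].get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )


# Illustrative position: White has a queen where Black has two knights.
state = {
    "white": {"pawn": 8, "rook": 2, "queen": 1},
    "black": {"pawn": 8, "rook": 2, "knight": 2},
}
print(eval_material(state))
```

Because the terms are simply added, the sketch treats each piece's contribution as independent of the others, which is the assumption the slide points out.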
Heuristic difficulties
Heuristic counts pieces won
Horizon effect: fixed-depth search thinks it can avoid the queening move
Games that include chance
• Possible moves (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)
• Each double, [1,1] through [6,6], has chance 1/36; every other roll has chance 1/18
• Cannot calculate a definite minimax value, only an expected value
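The dice probabilities can be checked by enumerating the 36 equally likely ordered outcomes of two dice and merging rolls that differ only in order (a 6-5 is the same roll as a 5-6). This is a small verification sketch, not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count ordered outcomes per unordered roll, e.g. (5, 6) is reached by 5-6 and 6-5.
counts = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))
probs = {roll: Fraction(n, 36) for roll, n in counts.items()}

print(len(probs), probs[(1, 1)], probs[(5, 6)])
```

There are 21 distinct rolls: 6 doubles at 1/36 each and 15 non-doubles at 1/18 each.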
Expected minimax value
EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                     if n is a terminal node
  max over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)        if n is a max node
  min over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)        if n is a min node
  Σ over s ∈ successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node
These equations can be backed-up recursively all the way to the root of the game tree.
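The recursive backup can be sketched directly from the four cases. The node encoding is a hypothetical one: a number is a terminal utility, `("max", children)` and `("min", children)` are player nodes, and `("chance", [(probability, child), ...])` is a chance node.

```python
def expectiminimax(node):
    """Expected minimax value of `node`, backed up recursively from the leaves."""
    if isinstance(node, (int, float)):               # terminal: utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                             # probability-weighted average
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Illustrative game: MAX moves, a fair coin is flipped, then MIN replies.
game = ("max", [
    ("chance", [(0.5, ("min", [4, 8])), (0.5, ("min", [2, 10]))]),
    ("chance", [(0.5, ("min", [6, 7])), (0.5, ("min", [1, 20]))]),
])
print(expectiminimax(game))
```

The first move is worth 0.5·4 + 0.5·2 = 3 and the second 0.5·6 + 0.5·1 = 3.5, so the root value is 3.5.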
Position evaluation with chance nodes
• Left: A1 wins
• Right: A2 wins
• The chosen move can change when leaf values are rescaled, even if their order is kept.
• Behavior is preserved only by a positive linear transformation of EVAL.
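The figure for this slide is not in the transcript, so the following sketch uses assumed leaf values and probabilities chosen for illustration. It compares two moves, A1 and A2, that each lead to a chance node, under three versions of the evaluation: the original values, an order-preserving but nonlinear rescaling, and a positive linear transformation.

```python
from fractions import Fraction


def expected(outcomes, scale=lambda v: v):
    """Expected value of a chance node given (probability, leaf value) pairs."""
    return sum(p * scale(v) for p, v in outcomes)


def best(moves, scale=lambda v: v):
    """Name of the move whose chance node has the highest expected value."""
    return max(moves, key=lambda name: expected(moves[name], scale))


p9, p1 = Fraction(9, 10), Fraction(1, 10)
moves = {"A1": [(p9, 2), (p1, 3)], "A2": [(p9, 1), (p1, 4)]}   # assumed values

monotone = {1: 1, 2: 20, 3: 30, 4: 400}       # keeps the order 1 < 2 < 3 < 4


def linear(v):
    return 10 * v + 5                         # positive linear transformation


print(best(moves), best(moves, monotone.get), best(moves, linear))
```

With the original values A1 scores 2.1 against 1.3 for A2. After the nonlinear rescaling A2 scores 40.9 against 21, so the decision flips. The linear transformation keeps A1 on top.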
State-of-the-Art
Checkers: Tinsley vs. Chinook
Name: Marion Tinsley
Profession: Teaches mathematics
Hobby: Checkers
Record: Lost only 3 games of checkers in over 42 years
World champion for over 40 years
Mr. Tinsley suffered his 4th and 5th losses against Chinook
Chinook
First computer to become official world champion of Checkers!
Has complete endgame tables for 10 pieces or fewer: over 39 trillion entries.
Chess: Kasparov vs. Deep Blue
                Kasparov               Deep Blue
Height          5’10”                  6’ 5”
Weight          176 lbs                2,400 lbs
Age             34 years               4 years
Computers       50 billion neurons     32 RISC processors + 256 VLSI chess engines
Speed           2 pos/sec              200,000,000 pos/sec
Knowledge       Extensive              Primitive
Power Source    Electrical/chemical    Electrical
Ego             Enormous               None
1997: Deep Blue wins the match with 2 wins, 1 loss, and 3 draws
Chess: Kasparov vs. Deep Junior
August 2, 2003: Match ends in a 3/3 tie!
Deep Junior
8 CPU, 8 GB RAM, Win 2000
2,000,000 pos/sec
Available at $100
Othello: Murakami vs. Logistello
Takeshi Murakami
World Othello Champion
1997: The Logistello software crushed Murakami by 6 games to 0
Go: Goemate vs. ??
Name: Chen Zhixing
Profession: Retired
Computer skills: self-taught programmer
Author of Goemate (arguably the best Go program available today)
Gave Goemate a 9-stone handicap and still easily beat the program, thereby winning $15,000
Jonathan Schaeffer
Go has too high a branching factor for existing search techniques
Current and future software must rely on huge databases and pattern-recognition techniques
Backgammon
• 1995: TD-Gammon by Gerald Tesauro won world championship
• BGBlitz won 2008 computer backgammon olympiad
Secrets
Many game programs are based on alpha-beta + iterative deepening + extended/singular search + transposition tables + huge databases + ...
For instance, Chinook searched all checkers configurations with 8 pieces or less and created an endgame database of 444 billion board configurations
The methods are general, but their implementation is dramatically improved by many specifically tuned-up enhancements (e.g., the evaluation functions) like an F1 racing car
Perspective on Games: Con and Pro
Chess is the Drosophila of artificial intelligence. However, computer chess has developed much as genetics might have if the geneticists had concentrated their efforts starting in 1910 on breeding racing Drosophila. We would have some science, but mainly we would have very fast fruit flies.
John McCarthy
Saying Deep Blue doesn’t really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings.
Drew McDermott
Other Types of Games
Multi-player games, with alliances or not
Games with randomness in successor function (e.g., rolling dice) ⇒ Expectiminimax algorithm
Games with partially observable states (e.g., card games) ⇒ Search of belief state spaces
See R&N p. 175-180