Chapter 5 Real Time Heuristic Search


Page 1: Chapter 5 Real Time Heuristic Search
Page 2: Chapter 5 Real Time Heuristic Search

Introduction

One of the major problems of the algorithm schemas we have seen so far is that they take exponential time to find the optimal solution.

Therefore, these algorithms are usually used for solving relatively small problems.

However, most often we do not need the optimal solution, and a near optimal solution can satisfy most “real” problems.

Page 3: Chapter 5 Real Time Heuristic Search

A serious drawback of the above algorithms is that they must search all the way to a complete solution before making a commitment to even the first move of the solution.

The reason is that an optimal first move can not be guaranteed until the entire solution is found and shown to be at least as good as any other solution.

Introduction

Page 4: Chapter 5 Real Time Heuristic Search

Two-Player Games

Page 5: Chapter 5 Real Time Heuristic Search

Heuristic search in two-player games adopts an entirely different set of assumptions.

Example - a chess game: actions must be taken before their consequences are known; there is a limited amount of time per move; and a move that has been made cannot be revoked.

Two-Player Games

Page 6: Chapter 5 Real Time Heuristic Search

Real-Time Single-Agent Search

Our goal is to apply the assumptions of two-player games to single-agent heuristic search.

So far we had to examine all of the available moves, and whenever we backtracked, every move that had been tried was wasted effort from which we gained no information.

Page 7: Chapter 5 Real Time Heuristic Search

Minimin Lookahead Search

Analogous to the minimax search used in two-player games, we use a minimin algorithm for the single problem-solving agent.

This algorithm always looks for the minimal-cost route to the goal, choosing the minimal node at each step, because a single agent makes all of the decisions.

Page 8: Chapter 5 Real Time Heuristic Search

The search proceeds as follows: we perform a minimin lookahead in planning mode, and at the end of the lookahead we execute the best move that was found. From the new state we repeat the lookahead procedure.

Several evaluation schemes can be used in this algorithm: the A* heuristic function f(n) = g(n) + h(n) applied at the frontier; a fixed-depth horizon (a fixed g(n) cost); or a fixed f(n) cost, searching to that bound and choosing the minimal frontier node.

Minimin Lookahead Search

Page 9: Chapter 5 Real Time Heuristic Search

If a goal state is encountered before the search horizon, then the path is terminated and a heuristic value of zero is assigned to the goal.

If a path ends in a non-goal dead end before the horizon is reached, then a heuristic value of infinity is assigned to the dead-end node, guaranteeing that the path will not be chosen.

Minimin Lookahead Search

Page 10: Chapter 5 Real Time Heuristic Search

Branch-and-Bound Pruning

An obvious question is whether every frontier node must be examined to find one of minimum cost.

If we allow heuristic evaluations of interior nodes, then pruning is possible: by using an admissible f function, we can apply the branch-and-bound method to reduce the number of nodes examined.
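
To make the lookahead and the pruning concrete, here is a minimal Python sketch of minimin lookahead with branch-and-bound pruning. The helper functions successors(state), cost(state, succ), h(state) and is_goal(state) are assumed to be supplied by the problem; they are placeholders, not part of the original slides.

import math

def minimin_value(state, depth, successors, cost, h, is_goal, g=0, bound=math.inf):
    # Backed-up f value of `state` for a lookahead of `depth` moves.
    # `bound` is the best frontier f value found so far; with a consistent
    # (and hence admissible) h, f = g + h never decreases along a path, so
    # any node whose f value reaches the bound can be pruned.
    if is_goal(state):
        return g                   # goal before the horizon: h = 0
    if g + h(state) >= bound:
        return math.inf            # pruned: cannot beat the current bound
    if depth == 0:
        return g + h(state)        # frontier node: static evaluation
    succs = successors(state)
    if not succs:
        return math.inf            # non-goal dead end: infinite heuristic value
    best = math.inf
    for s in succs:
        value = minimin_value(s, depth - 1, successors, cost, h, is_goal,
                              g + cost(state, s), min(bound, best))
        best = min(best, value)
    return best

def best_move(state, depth, successors, cost, h, is_goal):
    # Choose the first move on the path to the cheapest frontier node.
    return min(successors(state),
               key=lambda s: minimin_value(s, depth - 1, successors, cost, h,
                                           is_goal, cost(state, s)))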

Page 11: Chapter 5 Real Time Heuristic Search

Efficiency of Branch and Bound

[Graph: nodes generated per move (log scale, 10 to 1,000,000) versus search depth (10 to 50), comparing brute-force minimin search with branch-and-bound pruning.]

Page 12: Chapter 5 Real Time Heuristic Search

From the graph above we can see the advantage of branch-and-bound pruning over brute-force minimin search.

For example, at a budget of a million nodes per move on the Eight Puzzle, brute-force search reaches a depth of about 25 moves, whereas branch-and-bound reaches about 35 moves, roughly 40% deeper.

We can also see that we get better results for the Fifteen Puzzle than for the Eight Puzzle.

Efficiency of Branch and Bound

Page 13: Chapter 5 Real Time Heuristic Search

An Analytic Model

Is this surprising result specific to the sliding-tile puzzles, or is it more general?

A model has been defined with a uniform branching factor and depth, in which each edge is independently assigned a cost of 0 or 1 with some probability p.

This model represents the tile puzzle: each movement of a tile either increases or decreases the h value by one.

Page 14: Chapter 5 Real Time Heuristic Search

Since for each move the g function increases by one, the f function either increases by 2 or does not increase at all.

It has been proved that if the expected number of zero-cost edges below a node is less than one, finding the lowest-cost route takes exponential time, while if it is greater than one, the time is polynomial.

An Analytic Model

Page 15: Chapter 5 Real Time Heuristic Search

For example, if the probability of a zero-cost edge is 0.5, then for a binary tree the expected number of zero-cost edges below a node is 2 * 0.5 = 1, whereas for a ternary tree it is 3 * 0.5 = 1.5.

We see that a ternary tree can be searched more efficiently than a binary tree!

An Analytic Model

Page 16: Chapter 5 Real Time Heuristic Search

But this model is not entirely accurate, for a number of reasons: we can predict the results only up to a certain point, and the model applies only to a limited depth. Beyond that depth the model assumes the probability of a zero-cost edge is the same for all edges, whereas in the sliding-tile puzzle the probability is not the same for each node from some depth onward (the probability of a positive-cost edge increases).

An Analytic Model

Page 17: Chapter 5 Real Time Heuristic Search

Real-Time-A* (RTA*)

So far we have only decided on a single move at a time, not on a complete sequence of moves to the goal.

The initial idea would be simply to repeat the single-move procedure several times, but that leads to several problems.

Page 18: Chapter 5 Real Time Heuristic Search

Problems:

We might move to a node that has already been visited and fall into an infinite loop.

If we forbid revisiting previously visited nodes, we may reach a state in which all neighboring nodes have already been visited.

Because of the limited information known in each state, we want to allow backtracking in cases where we will not simply repeat the same moves from that state.

Real-Time-A* (RTA*)

Page 19: Chapter 5 Real Time Heuristic Search

Solution:

We should allow backtracking only if the cost of returning to that point plus the estimated cost from there is less than the estimated cost from the current point.

Real-Time-A* (RTA*) is an efficient algorithm for implementing this solution.

Real-Time-A* (RTA*)

Page 20: Chapter 5 Real Time Heuristic Search

RTA* Algorithm

In RTA*, the value f(n) = g(n) + h(n) of a node n is defined as in A*.

The difference is that g(n) in RTA* is computed differently than in A*: g(n) is the distance of node n from the current state, not from the initial state.

The implementation maintains an open list, and after each move the g value of every node on the open list is updated relative to the new current state.
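
The following is a minimal Python sketch of this decision rule, one possible reading of the algorithm described above. The functions successors(state), cost(s1, s2) and h0(state) (the initial heuristic) are assumed helpers, not from the slides; h is the table of updated heuristic values.

import math

def rta_star(start, successors, cost, h0, is_goal, max_steps=100000):
    # Real-Time-A*: repeatedly move to the neighbor n minimizing
    # f(n) = cost(current, n) + h(n), storing the SECOND-best f value
    # as the new heuristic value of the state being left.
    h = {}                          # learned heuristic values
    current, path = start, [start]
    for _ in range(max_steps):
        if is_goal(current):
            return path
        scored = sorted(((cost(current, n) + h.get(n, h0(n)), n)
                         for n in successors(current)), key=lambda t: t[0])
        best_n = scored[0][1]
        # second-best estimate becomes h of the state we are leaving
        h[current] = scored[1][0] if len(scored) > 1 else math.inf
        current = best_n
        path.append(current)
    return None                     # step limit reached without a goal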

Page 21: Chapter 5 Real Time Heuristic Search

The time to make a move is linear in the size of the open list.

It is not clear exactly how to update the g values.

It is not clear how to find the path to the next destination node chosen from the open list.

But these problems can be solved in constant time per move !

RTA* - The Drawbacks

Page 22: Chapter 5 Real Time Heuristic Search

RTA* - Example

[Figure: example graph for RTA*. The start node a has neighbors b, c, and d with heuristic values h(b)=1, h(c)=2, h(d)=3; node b has further neighbors e and i with h(e)=4 and h(i)=5; deeper nodes (j, k, m) have larger heuristic values. All edges shown have cost 1.]

Page 23: Chapter 5 Real Time Heuristic Search

In the example, we start at node a and compute for its neighbors

f(b) = 1+1 = 2, f(c) = 1+2 = 3, f(d) = 1+3 = 4.

The problem solver moves to b because it is the minimum, and stores h(a) = f(c) = 3, the second-best value. From b we generate nodes e and i and compute

f(e) = 1+4 = 5, f(i) = 1+5 = 6, f(a) = 1+3 = 4.

The problem solver therefore moves back to a, storing h(b) = 5, so that from a we now have f(b) = 1+5 = 6, and so on.

RTA* - Example

Page 24: Chapter 5 Real Time Heuristic Search

As we can see, we will not get into an infinite loop even though we allow backtracking, since each time we gather more information and use it to decide on the next move.

Note: RTA* does not require a good admissible heuristic function and will find a solution in any case (though a good heuristic function will give better results).

RTA*'s running time is linear in the number of moves made, and so is the size of the hash table stored.

RTA* - Example

Page 25: Chapter 5 Real Time Heuristic Search

Completeness of RTA*

RTA* is complete under the following restrictions:

The problem space must be finite.
A goal must be reachable from every state.
There cannot be cycles in the graph with zero or negative cost.
The heuristic values returned must be finite.

Page 26: Chapter 5 Real Time Heuristic Search

Correctness of RTA*

RTA* makes decisions based on limited information, and therefore the quality of each decision is the best possible relative to the part of the search space it has seen so far.

The nodes that need to be expanded by RTA* are similar to the open list in A*.

The main difference is in the definitions of g and h.

The correctness of RTA* can be proved by induction on the number of moves made.

Page 27: Chapter 5 Real Time Heuristic Search

Solution Quality vs. Computation

We should also consider the quality of the solution that is returned by RTA*.

This depends on the accuracy of the heuristic function and the search depth.

A choice must be made among the families of heuristic functions: some are more accurate but more expensive to compute, while others are less accurate but cheaper to compute.

Page 28: Chapter 5 Real Time Heuristic Search

Learning-RTA* (LRTA*)

Until now, RTA* solved the problem for a single problem-solving trial.

We would now like to modify the algorithm so that it is also suitable for repeated problem-solving trials.

Page 29: Chapter 5 Real Time Heuristic Search

The algorithm is the same, except for one change that makes it suitable for the new setting:

the algorithm stores the best value of the heuristic function each time, instead of the second-best value.

Learning-RTA* (LRTA*)
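
Relative to the RTA* sketch given earlier, the change is a single line in the update of the stored heuristic value (again assuming the same hypothetical helper names):

# RTA*  : store the second-best estimate for the state being left
#     h[current] = scored[1][0] if len(scored) > 1 else math.inf
# LRTA* : store the best estimate instead
#     h[current] = scored[0][0]
# Over repeated trials, with admissible initial values, the stored values
# never overestimate and converge toward the exact costs.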

Page 30: Chapter 5 Real Time Heuristic Search

Convergence of LRTA*

An important advantage of LRTA* is that, through repeated problem-solving trials, the heuristic values converge to the exact values!

This holds under the following conditions:

The initial and goal states are chosen randomly.
The initial heuristic values are admissible, i.e. they do not overestimate the distance to the nearest goal.
Ties are broken randomly; otherwise, once we find one optimal solution, we might keep finding the same one each time and never find the other paths to the goal.

Page 31: Chapter 5 Real Time Heuristic Search

Theorem 5.2: In a finite space with finite positive edge costs and non-overestimating initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.

Convergence of LRTA*

Page 32: Chapter 5 Real Time Heuristic Search

Conclusion

In real-time, large-scale applications we cannot use the standard single-agent heuristic search algorithms, because of their high cost and the fact that they do not return a solution before the entire search is complete.

Minimin lookahead solves the problem for such cases.

Branch-and-bound pruning greatly improves the results given by minimin.

RTA* efficiently solves the problem of abandoning one path for a better-looking one.

Page 33: Chapter 5 Real Time Heuristic Search

RTA* guarantees finding a solution.

RTA* makes optimal local decisions.

The more lookahead is used, the higher the cost but the better the quality of the solution.

Heuristic functions form a family that trades off accuracy against computational cost.

The optimal level of lookahead depends on the relative costs of simulating versus executing moves.

LRTA* handles repeated problem-solving trials while preserving the completeness of the solution.

Conclusion

Page 34: Chapter 5 Real Time Heuristic Search
Page 35: Chapter 5 Real Time Heuristic Search

Heuristic from Relaxed Models

A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem.

This means that we remove some of the constraints of the problem we are dealing with.

Page 36: Chapter 5 Real Time Heuristic Search

Heuristic from Relaxed Models - Example

Consider the problem of navigating in a network of roads from an initial location to a goal location.

A good heuristic is to estimate the cost between two points by the straight-line distance.

We remove the constraint of the original problem that we have to move along the roads and assume that we are allowed to move in a straight line between two points. Thus we get a relaxation of the original problem.

Page 37: Chapter 5 Real Time Heuristic Search

Relaxation example - the TSP problem

We can describe the problem as a graph with 3 constraints:
1 The tour covers all the cities.
2 Every node has degree two: an edge entering the node and an edge leaving the node.
3 The graph is connected.

If we remove constraint 2: we get a connected spanning subgraph, and the optimal solution to this relaxed problem is a minimum spanning tree (MST).

If we remove constraint 3: the graph need not be connected, and the optimal solution to this relaxed problem is the solution to the assignment problem.

Page 38: Chapter 5 Real Time Heuristic Search

Relaxation example - the Tile Puzzle problem

One of the constraints in this problem is that a tile can only slide into the position occupied by the blank.

If we remove this constraint, we allow any tile to move to an adjacent position horizontally or vertically, and the cost of solving each tile in the relaxed problem is exactly its Manhattan distance to its goal location.
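
As an illustration, here is a small Python sketch of the Manhattan distance heuristic obtained from this relaxation. It assumes a state is a tuple listing the tile in each position of a width-by-width board, with 0 for the blank and tile t having goal position t; these conventions are assumptions made for the example, not taken from the slides.

def manhattan_distance(state, width=3):
    # Sum, over all tiles, of the distance each tile would travel if it
    # could slide freely to its goal position (the relaxed problem).
    total = 0
    for pos, tile in enumerate(state):
        if tile == 0:
            continue                              # the blank is not counted
        goal_row, goal_col = divmod(tile, width)  # goal location of this tile
        row, col = divmod(pos, width)
        total += abs(row - goal_row) + abs(col - goal_col)
    return total

# Example: manhattan_distance((0, 1, 2, 3, 4, 5, 6, 7, 8)) == 0 (the goal state).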

Page 39: Chapter 5 Real Time Heuristic Search

The STRIPS Problem formulation

We would like to derive such heuristics automatically.

In order to do that, we need a formal description language that is richer than the problem-space graph.

One such language is called STRIPS. In this language we have predicates and operators. Let's look at a STRIPS representation of the Eight Puzzle problem.

Page 40: Chapter 5 Real Time Heuristic Search

1 On(x,y) - tile x is in location y.
2 Clear(z) - location z is clear.
3 Adj(y,z) - location y is adjacent to location z.
4 Move(x,y,z) - move tile x from location y to location z.

For each operator the language specifies:

A precondition list - for example, to execute Move(x,y,z) we must have: On(x,y), Clear(z), Adj(y,z).

An add list - predicates that were not true before the operator and are true after it is executed; for Move(x,y,z) these are On(x,z) and Clear(y).

A delete list - a subset of the preconditions that are no longer true after the operator is executed; for Move(x,y,z) these are On(x,y) and Clear(z).

STRIPS - Eight Puzzle Example
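
As a concrete illustration of the precondition, add, and delete lists, here is a small Python sketch representing the Move operator as data. The representation (tuples of predicate name and arguments) is only one possible encoding, chosen for the example.

from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset   # predicates that must hold before execution
    add_list: frozenset        # predicates made true by the operator
    delete_list: frozenset     # preconditions no longer true afterwards

def move(x, y, z):
    # Move tile x from location y to adjacent clear location z.
    return Operator(
        name=f"Move({x},{y},{z})",
        preconditions=frozenset({("On", x, y), ("Clear", z), ("Adj", y, z)}),
        add_list=frozenset({("On", x, z), ("Clear", y)}),
        delete_list=frozenset({("On", x, y), ("Clear", z)}),
    )

# Dropping ("Clear", z) from the preconditions yields the relaxed problem
# whose exact solution cost for each tile is its Manhattan distance.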

Page 41: Chapter 5 Real Time Heuristic Search

STRIPS - Eight Puzzle Example

Now, in order to construct a simplified or relaxed problem, we only have to remove some of the preconditions.

For example, by removing Clear(z) we allow tiles to move to any adjacent location.

In general, the hard part is to identify which relaxed problems have the property that their exact solution can be computed efficiently.

Page 42: Chapter 5 Real Time Heuristic Search

Admissibility and Consistency

The heuristics derived by this method are both admissible and consistent.

Admissibility means that the lowest-cost path in the simplified graph has a cost equal to or lower than the lowest-cost path in the original graph.

Note: the cost in the simplified graph should nevertheless be as close as possible to that in the original graph.

Consistency means that for every node n and every neighbor n' of n,

h(n) <= c(n,n') + h(n'),

which holds when h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.

Page 43: Chapter 5 Real Time Heuristic Search

We begin by presenting an alternative derivation of the Manhattan distance heuristic for the sliding tile puzzles.

Any description of this problem is likely to describe the goal state as a set of subgoals, where each subgoal is to correctly position an individual tile, ignoring the interactions with the other tiles.

Heuristic from Multiple Subgoals

Page 44: Chapter 5 Real Time Heuristic Search

Enhancing the Manhattan distance

In the Manhattan distance, for each tile we look for the optimal solution ignoring the other tiles, counting only moves of the tile in question.

Therefore the heuristic function we get is not exact.

[Figure: a 5x5 sliding-tile puzzle (24-puzzle) position used to illustrate the Manhattan distance computation.]

Page 45: Chapter 5 Real Time Heuristic Search

We can perform a single search for each tile, starting from its goal position, and record how many moves of the tile are required to move it to every other position.

Doing this for all tiles results in a table which gives, for every possible position of each tile, its Manhattan distance from its goal position.

Then, since each move moves only one tile, for a given state we add the Manhattan distances of the tiles to get an admissible heuristic for the state.

Enhancing the Manhattan distance

Page 46: Chapter 5 Real Time Heuristic Search

However, this heuristic function is not exact, since it ignores the interactions between the tiles.

The obvious next step is to repeat the process on all possible pairs of tiles.

In other words, for each pair of tiles, and each combination of positions they could occupy, perform a search to their goal positions and count only moves of the two tiles of interest. We call this value the pairwise distance of the two tiles from their goal locations.

Enhancing the Manhattan distance

Page 47: Chapter 5 Real Time Heuristic Search

Of course the goal is to find the shortest path from the goal state to all possible positions of the two tiles, where only moves of the two tiles of interest are counted.

For almost all pairs of tiles and positions, their pairwise distances will equal the sum of their Manhattan distances from their goal positions.

However, there are three types of cases where the pairwise distance exceeds the combined Manhattan distance.

Enhancing the Manhattan distance

Page 48: Chapter 5 Real Time Heuristic Search

1 Two tiles are in the same row or column but are reversed relative to their goal positions.

In order to reach their goal positions, one tile must move up or down to enable the other to pass to its goal location, and then return to the row and move back to its own place.

Enhancing the Manhattan distance - the first case

[Figure: two tiles (1 and 2) reversed in the same row; one tile detours around the other and returns. Cost relative to the Manhattan distance: +2.]

Page 49: Chapter 5 Real Time Heuristic Search

2 The corners of the puzzle.

If the 3 tile is in its goal position, but some tile other than the 4 is in the 4 position, the 3 tile will have to move temporarily to allow the 4 tile to be positioned correctly. This requires two moves of the 3 tile, one to move it out of position and another to move it back. Thus the pairwise distance of the two tiles exceeds the sum of their Manhattan distances by two moves.

Enhancing the Manhattan distance - the second case

[Figure: a corner of the puzzle with the 3 tile in place and a wrong tile in the 4 position; the 3 tile must move out and back. Cost relative to the Manhattan distance: +2.]

Page 50: Chapter 5 Real Time Heuristic Search

3 In the last moves of the solution.

A detailed explanation is on the next slide.

Enhancing the Manhattan distance - the third case

[Figure: the upper-left corner of the puzzle with the 1 and 5 tiles away from the left column and top row. Cost relative to the Manhattan distance: +2.]

Page 51: Chapter 5 Real Time Heuristic Search

Before the last move, either the 1 or the 5 tile must be in the upper-left corner, which is blank in the goal state. Thus, the last move either moves the 1 tile right or the 5 tile down.

Since the Manhattan distance of these tiles is computed to their goal positions, unless the 1 tile is in the left-most column its Manhattan distance will not accommodate a path through the upper-left corner. Similarly, unless the 5 tile is in the top row, its Manhattan distance will not accommodate a path through the upper-left corner.

Thus, if the 1 tile is not in the left-most column and the 5 tile is not in the top row, we can add two moves to the sum of their Manhattan distances: one of them must first move into the upper-left corner and then out again, so the pairwise distance of the 1 and 5 tiles is two moves greater than the sum of their Manhattan distances, unless the 1 tile starts in the left column or the 5 tile starts in the top row.

Enhancing the Manhattan distance - the third case

Page 52: Chapter 5 Real Time Heuristic Search

The states of these searches are distinguishable only by the positions of the two tiles and the blank, and hence there are O(n^3) different states, where n is the number of tiles.

Since there are n(n-1)/2 = O(n^2) pairs of tiles, there are O(n^2) such searches to perform, for an overall time complexity of O(n^5).

The size of the resulting table is O(n^4), one entry for each pair of tiles in each combination of positions.

Enhancing the Manhattan distance

Page 53: Chapter 5 Real Time Heuristic Search

The next question is how to automatically handle the interactions between these individual heuristics to compute an overall admissible heuristic estimate for a particular state.

If we represent a state as a graph with a node for each tile and an edge between each pair, weighted by the pairwise value, we need to select a set of edges such that no tile appears in more than one selected edge and the sum of the selected edge weights is maximized. This problem is called the maximum weighted matching problem, and it can be solved in O(n^3) time, where n is the number of nodes.

Applying the Heuristics
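
A minimal sketch of this combination step, using networkx's max_weight_matching. The dictionaries manhattan (per-tile Manhattan distances) and pairwise_excess (how much each pair's pairwise distance exceeds the sum of its Manhattan distances) are assumed to have been precomputed; the names are placeholders.

import networkx as nx

def matching_heuristic(tiles, manhattan, pairwise_excess):
    # Each tile may be counted in at most one selected pair, so the pairwise
    # excesses chosen by a maximum weighted matching can be added to the
    # Manhattan sum without over-counting, keeping the heuristic admissible.
    g = nx.Graph()
    g.add_nodes_from(tiles)
    for (a, b), excess in pairwise_excess.items():
        if excess > 0:
            g.add_edge(a, b, weight=excess)
    matching = nx.max_weight_matching(g)          # set of selected pairs
    bonus = sum(g[a][b]["weight"] for a, b in matching)
    return sum(manhattan[t] for t in tiles) + bonus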

Page 54: Chapter 5 Real Time Heuristic Search

Of course, the heuristic obtained from pairs is still not exact; there are positions (such as the three-tile example shown on the slide) in which it understates the true solution cost.

Therefore, in order to get the full power of these heuristics, we need to extend the idea from pairs of tiles to triples, and so on.

Higher-Order Heuristics

Page 55: Chapter 5 Real Time Heuristic Search

Pattern Databases

In the tile puzzles seen earlier, each legal move moves only one tile and therefore affects only one subgoal.

This enables us to add the heuristic estimates for the individual tiles.

This is not the case for all problems. For example, in Rubik's Cube each legal twist moves a large fraction of the individual cubies.

Page 56: Chapter 5 Real Time Heuristic Search

The simple heuristic is a three-dimensional Manhattan distance: for every cubie we compute the minimum number of moves required to correctly position and orient it, and sum these values over all cubies.

Here we have to divide the sum by 8, since every twist moves 8 cubies.

A better heuristic is the same idea, but computing the sum of moves separately for the edge cubies and for the corner cubies.

For the edge cubies we divide the sum by 4, and for the corner cubies we also divide by 4, since each twist moves 4 edge cubies and 4 corner cubies.

Pattern Databases

Page 57: Chapter 5 Real Time Heuristic Search

We can compute the heuristic function by table lookup, which is often more efficient since it saves time during execution of the program.

Such tables are called pattern databases. A pattern database stores the number of moves needed to solve different patterns, that is, subsets of the puzzle elements.

Pattern Databases

Page 58: Chapter 5 Real Time Heuristic Search

For example, the Manhattan distance function is usually computed with the aid of a small table that contains the Manhattan distance of each cubie from every possible position and orientation.

The idea can be developed much further. For the 8 corner cubies, each cubie can be in one of 3 different orientations, but the orientation of the last cubie is determined by the other 7.

This results in 8! * 3^7 = 88,179,840 different states.

Pattern Databases

Page 59: Chapter 5 Real Time Heuristic Search

We can use a breadth-first search and record in a table the number of moves required to solve each combination of corner cubies (this table requires 42 megabytes).

During an IDA* search, as each state is generated, a unique index into the heuristic table is computed, followed by a reference to the table. The stored value is the number of moves needed to solve just the corner cubies, and thus a lower bound on the number of moves needed to solve the entire puzzle.

Pattern Databases
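
The table itself can be built by a breadth-first search backwards from the goal over the abstracted states. A generic Python sketch follows; abstract_goal and neighbors(state) are assumed to be supplied for the abstraction in question (for example, corner-cubie configurations under the face twists), and moves are assumed to be invertible so that distances from the goal equal distances to the goal.

from collections import deque

def build_pattern_database(abstract_goal, neighbors):
    # Breadth-first search from the abstracted goal state.  Returns a dict
    # mapping every reachable abstract state (e.g. the positions and
    # orientations of the corner cubies only) to the minimum number of moves
    # needed to solve it.  Looking up the abstraction of a full state during
    # IDA* gives an admissible lower bound on the full solution length.
    depth = {abstract_goal: 0}
    frontier = deque([abstract_goal])
    while frontier:
        state = frontier.popleft()
        for succ in neighbors(state):
            if succ not in depth:          # first visit is at minimal depth
                depth[succ] = depth[state] + 1
                frontier.append(succ)
    return depth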

Page 60: Chapter 5 Real Time Heuristic Search

We can improve the heuristic by considering the 12 edge cubies as well. The edge cubies can be in one of 12! permutations, and each can be in one of two orientations, but the orientation of the last cubie is determined by the other 11. However, this requires too much memory.

Therefore we compute and store pattern databases for subsets of the edge cubies.

We can compute the possible combinations for 6 cubies (for 7 cubies it would take too much memory). The number of possible combinations of 6 of the 12 edge cubies is (12!/6!) * 2^6 = 42,577,920.

Similarly, we can compute the corresponding heuristic table for the remaining 6 edge cubies.

Pattern Databases

Page 61: Chapter 5 Real Time Heuristic Search

The heuristic used for the experiments is the maximum of all 3 of these values: the value for all 8 corner cubies, and the values for 2 groups of 6 edge cubies each.

The total amount of memory for all 3 tables is 82 megabytes.

The total time to generate all 3 heuristic tables was about an hour.

Even though the result was only a small increase over the value for the corner cubies alone, it yields a significant performance improvement.

Given more memory we could compute and store even larger pattern databases.

Pattern Databases

Page 62: Chapter 5 Real Time Heuristic Search
Page 63: Chapter 5 Real Time Heuristic Search

Computer Chess - A natural domain for studying AI

The game is well structured. It is a perfect information game. Early programmers and AI researchers were often amateur chess players as well.

Page 64: Chapter 5 Real Time Heuristic Search

Brief History of Computer Chess

Maelzel's Chess Machine

1769 - A chess automaton built by Baron Wolfgang von Kempelen of Austria.

It appeared to move the pieces automatically on a board on top of the machine and played excellent chess.

The puzzle of how the machine played was solved in 1836 by Edgar Allan Poe.

Page 65: Chapter 5 Real Time Heuristic Search

Brief History of Computer Chess - Maelzel's Chess Machine

[Figure: illustration of Maelzel's chess machine.]

Page 66: Chapter 5 Real Time Heuristic Search

Early 1950's - The first serious paper on computer chess was written by Claude Shannon. It described minimax search with a heuristic static evaluation function and anticipated the need for more selective search algorithms.

1956 - Invention of alpha-beta pruning by John McCarthy. It was used in early programs such as Samuel's checkers player and Newell, Shaw and Simon's chess program.

Brief History of Computer Chess

Page 67: Chapter 5 Real Time Heuristic Search

1982 - Development of Belle by Condon and Thompson. Belle was the first machine whose hardware was specifically designed to play chess, in order to achieve speed and search depth.

1997 - The Deep Blue machine was the first to defeat the human world champion, Garry Kasparov, in a six-game match.

Brief History of Computer Chess

Page 68: Chapter 5 Real Time Heuristic Search

Checkers

1952 - Samuel developed a checkers program that learned its own evaluation function through self-play.

1992 - Chinook (J. Schaeffer) wins the U.S. Open. At the world championship, Marion Tinsley beat Chinook.

Page 69: Chapter 5 Real Time Heuristic Search

Othello

Othello programs are better than the best humans.

A large number of pieces change hands in each move.

The best Othello program today is Logistello (Michael Buro).

Page 70: Chapter 5 Real Time Heuristic Search

Backgammon

Unlike the games above, backgammon includes a roll of the dice, introducing a random element.

The best backgammon program is TD-Gammon (Gerry Tesauro), comparable to the best human players today.

It learns an evaluation function using temporal-difference learning.

Page 71: Chapter 5 Real Time Heuristic Search

Card games

In addition to a random element, there is hidden information.

The best bridge program is GIB (M. Ginsberg); bridge programs are not competitive with the best human players.

Poker programs fare even worse relative to their human counterparts.

Poker involves a strong psychological element when played by people.

Page 72: Chapter 5 Real Time Heuristic Search

Other games - Summary

The greater the branching factor, the worse the performance.

Go - branching factor 361 - very poor performance. Checkers - branching factor 4 - very good performance.

Backgammon is an exception: despite a large branching factor it still gets good results.

Page 73: Chapter 5 Real Time Heuristic Search

Brute-Force Search

We begin by considering a purely brute-force approach to game playing.

Clearly, this will only be feasible for small games, but it provides a basis for further discussion.

Example - 5-stone Nim

Played by 2 players with a pile of stones. Each player in turn removes one or two stones from the pile. The player who removes the last stone wins the game.

Page 74: Chapter 5 Real Time Heuristic Search

Example - Game Tree for 5-Stone Nim

[Figure: the complete game tree for 5-stone Nim. Each node is labeled with the number of stones remaining; levels alternate between OR nodes (the player to move) and AND nodes (the opponent), down to terminal positions with 0 stones.]

Page 75: Chapter 5 Real Time Heuristic Search

Minimax

Minimax theorem - Every two-person zero-sum game is a forced win for one player or a forced draw for either player, and in principle these optimal minimax strategies can be computed.

Performing this algorithm on tic-tac-toe results in the root being labeled a draw.

Page 76: Chapter 5 Real Time Heuristic Search

Heuristic Evaluation Functions

Problem: How do we evaluate positions where brute force is out of the question?

Solution: Use a heuristic static evaluation function to estimate the merit of a position when the final outcome has not yet been determined.

Page 77: Chapter 5 Real Time Heuristic Search

Example of a heuristic function

Chess: the number of pieces of each type on the board, multiplied by their relative values and summed for each color. By subtracting the weighted material of the black player from the weighted material of the white player, we obtain the relative strength of the position for each player.
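
A toy sketch of such a material evaluation. The piece values and the board representation (a list of piece letter and color pairs) are assumptions made for the example, not part of the slides.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}   # conventional values

def material_evaluation(board):
    # board: iterable of (piece, color) pairs, color in {"white", "black"}.
    # Returns weighted white material minus weighted black material.
    score = 0
    for piece, color in board:
        value = PIECE_VALUES.get(piece, 0)   # kings are not counted
        score += value if color == "white" else -value
    return score

# Example: a position where White has an extra rook scores +5.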

Page 78: Chapter 5 Real Time Heuristic Search

A heuristic static evaluation function for a two player game is a function from a state to a number.

The goal of a two player game is to reach a winning state, but the number of moves required to get there is unimportant.

Other features must be taken into account to get to an overall evaluation function.

Heuristic Evaluation Functions

Page 79: Chapter 5 Real Time Heuristic Search

Given a heuristic static evaluation function, it is straightforward to write a program to play a game.

From any given position, we simply generate all the legal moves, apply our static evaluator to the position resulting from each move, and then move to the position with the largest or smallest evaluation, depending on whether we are MAX or MIN.

Heuristic Evaluation Functions

Page 80: Chapter 5 Real Time Heuristic Search

Example - tic-tac-toe: Behavior of the Evaluation Function

Detect if the game is over.

If X is the maximizer, the function should return +infinity if there are three X's in a row and -infinity if there are three O's in a row.

Otherwise, return the number of different rows, columns, and diagonals occupied by X, minus the number occupied by O.

Page 81: Chapter 5 Real Time Heuristic Search

Example: First moves of tic-tac-toe

[Figure: the three distinct opening moves for X, with evaluations counting the rows, columns, and diagonals open to X minus those open to O: corner 3-0 = 3, center 4-0 = 4, edge 2-0 = 2.]

Page 82: Chapter 5 Real Time Heuristic Search

This algorithm is extremely efficient, requiring time that is only linear in the number of legal moves.

Its drawback is that it only considers the immediate consequences of each move (it doesn't look beyond the horizon).

Example - tic-tac-toe: Behavior of the Evaluation Function

Page 83: Chapter 5 Real Time Heuristic Search

Minimax Search

Where does X go?

[Figure: a tic-tac-toe position with X to move; the candidate moves are evaluated one ply deeper (values such as 4-3 = 1 and 4-2 = 2) and the results are backed up to choose X's move.]

Page 84: Chapter 5 Real Time Heuristic Search

Minimax search

Search as deeply as possible given the computational resources of the machine and the time constraints on the game.

Evaluate the nodes at the search frontier by the heuristic function.

Where MIN is to move, save the minimum of its children's values. Where MAX is to move, save the maximum of its children's values.

A move is made to a child of the root with the largest or smallest value, depending on whether MAX or MIN is moving.
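
A minimal recursive sketch of this fixed-depth minimax, where successors(state) and evaluate(state) are assumed to be supplied by the game (placeholder names):

def minimax(state, depth, maximizing, successors, evaluate):
    # Backed-up minimax value of `state`, searching `depth` plies ahead.
    children = successors(state)
    if depth == 0 or not children:        # frontier or terminal position
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing, successors, evaluate)
              for c in children]
    return max(values) if maximizing else min(values)

def choose_move(state, depth, successors, evaluate):
    # MAX to move at the root: pick the child with the largest backed-up value.
    return max(successors(state),
               key=lambda c: minimax(c, depth - 1, False, successors, evaluate))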

Page 85: Chapter 5 Real Time Heuristic Search

Minimax search - example minimax tree

[Figure: a minimax tree with alternating MAX and MIN levels; frontier values are backed up (the minimum at MIN nodes, the maximum at MAX nodes), giving the root a value of 4.]

Page 86: Chapter 5 Real Time Heuristic Search

Alpha-Beta Pruning

By using alpha-beta pruning the minimax value of the root of a game tree can be determined without having to examine all the nodes.

Page 87: Chapter 5 Real Time Heuristic Search

Alpha-Beta Pruning Example

[Figure: the same minimax tree searched with alpha-beta pruning. Bounds such as <=2, <=3 and >=6 are established at interior nodes (labeled a through r), allowing whole subtrees to be pruned while the root value of 4 is unchanged.]

Page 88: Chapter 5 Real Time Heuristic Search

Alpha-Beta

Deep pruning - the right half of the tree in the example.

The next slide gives pseudocode for alpha-beta pruning:

MAXIMIN - assumes that its argument node is a maximizing node.

MINIMAX - assumes that its argument node is a minimizing node.

V(N) - the heuristic static evaluation of node N.

Page 89: Chapter 5 Real Time Heuristic Search

MAXIMIN (node: N, lower bound: alpha, upper bound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MINIMAX(Ni, alpha, beta)
    IF value > alpha, alpha := value
    IF alpha >= beta, RETURN alpha
  RETURN alpha

MINIMAX (node: N, lower bound: alpha, upper bound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MAXIMIN(Ni, alpha, beta)
    IF value < beta, beta := value
    IF beta <= alpha, RETURN beta
  RETURN beta

Page 90: Chapter 5 Real Time Heuristic Search

Performance of Alpha-Beta

Efficiency depends on the order in which the nodes are encountered at the search frontier.

Optimal - effective branching factor b^(1/2) - if the largest child of a MAX node is generated first, and the smallest child of a MIN node is generated first.

Worst - b.

Average - b^(3/4) - with random ordering.

Page 91: Chapter 5 Real Time Heuristic Search

Games with chance

Chance nodes: nodes where chance events happen (rolling dice, flipping a coin, etc.).

A chance node is evaluated by its expected value, averaging over the outcome probabilities, where C is a chance node, P(di) is the probability of rolling di (1, 2, ..., 12), and S(C,di) is the set of positions generated by applying all legal moves for roll di to C.
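
A small sketch of this expected-value backup for a single chance node, under the assumption that the probabilities and the positions reachable for each roll are available through the hypothetical helpers probability(d) and positions_for_roll(c, d), and that the player to move picks the best position for each roll:

def chance_node_value(c, rolls, probability, positions_for_roll, value):
    # Expected value of chance node c:
    #   sum over rolls d of  P(d) * best value among S(c, d),
    # where value(p) is the (recursive) minimax / expectimax value of p.
    expected = 0.0
    for d in rolls:
        best = max(value(p) for p in positions_for_roll(c, d))
        expected += probability(d) * best
    return expected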

Page 92: Chapter 5 Real Time Heuristic Search
Page 93: Chapter 5 Real Time Heuristic Search

Games with chance

Backgammon board

Page 94: Chapter 5 Real Time Heuristic Search

Search tree with probabilities

[Figure: leaf values 2, 4, 7, 4, 6, 0, 5, -2 are backed up through MIN nodes (giving 2, 4, 0, -2), then averaged at chance nodes with probability 0.5 on each branch (0.5*2 + 0.5*4 = 3 and 0.5*0 + 0.5*(-2) = -1); MAX chooses the branch with expected value 3.]

Page 95: Chapter 5 Real Time Heuristic Search

Search tree with probabilities

Page 96: Chapter 5 Real Time Heuristic Search

Additional Enhancements

A number of additional improvements have been developed to improve performance with limited computation.

We briefly discuss the most important of these below.

Page 97: Chapter 5 Real Time Heuristic Search

Node Ordering

By using node ordering we can get close to b^(1/2).

With node ordering, instead of generating the tree left-to-right, we reorder the children based on the static evaluations of the interior nodes.

To save space, only the immediate children are reordered after the parent is fully expanded.
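
A one-function sketch of the idea: before recursing, sort the children by their static evaluations so that the most promising child is searched first (successors and evaluate are assumed helpers):

def ordered_children(state, successors, evaluate, maximizing):
    # Searching the likely-best child first lets alpha-beta establish tight
    # bounds early, moving its performance toward the optimal b**(1/2) case.
    return sorted(successors(state), key=evaluate, reverse=maximizing)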

Page 98: Chapter 5 Real Time Heuristic Search

Iterative Deepening

Another idea is to use iterative deepening. In two-player games played under time constraints, when time runs out, the move recommended by the last completed iteration is made.

Iterative deepening can be combined with node ordering to improve pruning efficiency: instead of the heuristic values, we can order nodes by their backed-up values from the previous iteration.

Page 99: Chapter 5 Real Time Heuristic Search

Quiescence

Quiescence search performs a secondary search from a position whose value is unstable (for example, in the middle of a capture exchange).

In this way a stable evaluation is obtained.

Page 100: Chapter 5 Real Time Heuristic Search

Transposition Tables

For efficiency, it is important to detect when a state has already been searched.

In order to detect a previously searched state, previously generated game states, with their minimax values, are saved in a transposition table.
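
A minimal sketch of a transposition table as a dictionary keyed by state and depth. Real chess programs index the table with Zobrist hashes and also store bound information; that is omitted here, and the helper names are assumptions.

transposition_table = {}

def minimax_with_table(state, depth, maximizing, successors, evaluate):
    # Fixed-depth minimax that caches the values of previously searched states.
    key = (state, depth, maximizing)          # state must be hashable
    if key in transposition_table:
        return transposition_table[key]
    children = successors(state)
    if depth == 0 or not children:
        value = evaluate(state)
    else:
        values = [minimax_with_table(c, depth - 1, not maximizing,
                                     successors, evaluate) for c in children]
        value = max(values) if maximizing else min(values)
    transposition_table[key] = value
    return value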

Page 101: Chapter 5 Real Time Heuristic Search

Opening Book

Most board games start with the same initial state.

A table of good initial moves, based on human expertise, is used; it is known as an opening book.

Page 102: Chapter 5 Real Time Heuristic Search

Endgame Databases

A database of endgame positions, with their minimax values, is used.

In checkers, endgame databases exist for positions with eight or fewer pieces on the board.

The technique for calculating endgame databases is retrograde analysis.

Page 103: Chapter 5 Real Time Heuristic Search

Special-Purpose Hardware

The faster the machine, the deeper the search in the available time and the better it plays.

The best machines today are based on special-purpose hardware designed and built only to play chess.

Page 104: Chapter 5 Real Time Heuristic Search

Selective Search

The fundamental reason that humans are competitive with computers is that they are very selective in their choice of positions to examine, unlike programs, which do full-width, fixed-depth searches.

Selective search: search only the "interesting" parts of the domain.

Example - best-first minimax.

Page 105: Chapter 5 Real Time Heuristic Search

Best-First Minimax

Given a partially expanded minimax tree, the backed-up minimax value of the root is determined by one of the leaf nodes, as is the value of every node on the path from the root to that leaf.

This path is known as the principal variation, and the leaf is known as the principal leaf.

In general, best-first minimax will generate an unbalanced tree, and make different move decisions than full-width, fixed-depth alpha-beta.

Page 106: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 1 - the root's value of 6 comes from its principal leaf, which is expanded next.]

Page 107: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 2 - after the expansion the root value changes to 4, and the new principal leaf is expanded next.]

Page 108: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 3 - the principal variation shifts again and the new principal leaf is expanded.]

Page 109: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 4 - the tree after further expansions along the principal variation.]

Page 110: Chapter 5 Real Time Heuristic Search

Best-First Search

Full-width search is good insurance against missing a move (and making a mistake).

Most game programs that use selective searches use a combined algorithm that starts with a full-width search to a nominal depth, and then searches more selectively below that depth.