Chapter 5 Real Time Heuristic Search


Page 1: Chapter 5 Real Time Heuristic Search
Page 2: Chapter 5 Real Time Heuristic Search

Introduction

One of the major problems of the algorithm schemas we have seen so far is that they take exponential time to find the optimal solution.

Therefore, these algorithms are usually used for solving relatively small problems.

However, most often we do not need the optimal solution, and a near optimal solution can satisfy most “real” problems.

Page 3: Chapter 5 Real Time Heuristic Search

A serious drawback of the above algorithms is that they must search all the way to a complete solution before making a commitment to even the first move of the solution.

The reason is that an optimal first move can not be guaranteed until the entire solution is found and shown to be at least as good as any other solution.

Introduction

Page 4: Chapter 5 Real Time Heuristic Search

Two-Player Games

Page 5: Chapter 5 Real Time Heuristic Search

Heuristic search in two-player games adopts an entirely different set of assumptions.

Example - a chess game: actions must be taken before their consequences are known; there is a limited amount of time per move; and a move that has been made cannot be revoked.

Two-Player Games

Page 6: Chapter 5 Real Time Heuristic Search

Real-Time Single-Agent Search

Our goal is to apply the assumptions of two-player games to single-agent heuristic search.

So far we had to examine all of the available moves, and whenever we backtracked, every move that had been tried was wasted effort from which we gained no information.

Page 7: Chapter 5 Real Time Heuristic Search

Minimin Lookahead Search

Analogous to the minimax search used in two-player games, we use a minimin algorithm for the single problem-solving agent.

This algorithm always looks for the minimal-cost route to the goal, choosing the minimal node at each step, because a single agent makes all of the decisions.

Page 8: Chapter 5 Real Time Heuristic Search

The search proceeds as follows: we perform a minimin lookahead in planning mode, and at the end of the lookahead we execute the best move that was found. From the new state we repeat the lookahead procedure.

Several evaluation schemes can be used in this algorithm: the A* heuristic function f(n) = g(n) + h(n) applied at the frontier; a fixed-depth horizon (a fixed g(n) cost); or a fixed f(n) cost, searching to that bound and choosing the minimal frontier node.

Minimin Lookahead Search

Page 9: Chapter 5 Real Time Heuristic Search

If a goal state is encountered before the search horizon, then the path is terminated and a heuristic value of zero is assigned to the goal.

If a path ends in a non-goal dead end before the horizon is reached, then a heuristic value of infinity is assigned to the dead-end node, guaranteeing that the path will not be chosen.

Minimin Lookahead Search

Page 10: Chapter 5 Real Time Heuristic Search

Branch-and-Bound Pruning

An obvious question is whether every frontier node must be examined to find one of minimum cost.

If we allow heuristic evaluations of interior nodes, then pruning is possible: by using an admissible f function, we can apply the branch-and-bound method to reduce the number of nodes examined.
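
To make the lookahead and the pruning concrete, here is a minimal Python sketch of minimin lookahead with branch-and-bound pruning. The helper functions successors(state), cost(state, succ), h(state) and is_goal(state) are assumed to be supplied by the problem; they are placeholders, not part of the original slides.

import math

def minimin_value(state, depth, successors, cost, h, is_goal, g=0, bound=math.inf):
    # Backed-up f value of `state` for a lookahead of `depth` moves.
    # `bound` is the best frontier f value found so far; with a consistent
    # (and hence admissible) h, f = g + h never decreases along a path, so
    # any node whose f value reaches the bound can be pruned.
    if is_goal(state):
        return g                   # goal before the horizon: h = 0
    if g + h(state) >= bound:
        return math.inf            # pruned: cannot beat the current bound
    if depth == 0:
        return g + h(state)        # frontier node: static evaluation
    succs = successors(state)
    if not succs:
        return math.inf            # non-goal dead end: infinite heuristic value
    best = math.inf
    for s in succs:
        value = minimin_value(s, depth - 1, successors, cost, h, is_goal,
                              g + cost(state, s), min(bound, best))
        best = min(best, value)
    return best

def best_move(state, depth, successors, cost, h, is_goal):
    # Choose the first move on the path to the cheapest frontier node.
    return min(successors(state),
               key=lambda s: minimin_value(s, depth - 1, successors, cost, h,
                                           is_goal, cost(state, s)))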

Page 11: Chapter 5 Real Time Heuristic Search

Efficiency of Branch and Bound

[Graph: nodes generated per move (log scale, 10 to 1,000,000) versus search depth (10 to 50), comparing brute-force minimin search with branch-and-bound pruning.]

Page 12: Chapter 5 Real Time Heuristic Search

From the graph above we can see the advantage of branch-and-bound pruning over brute-force minimin search.

For example, at a budget of a million nodes per move on the Eight Puzzle, brute-force search reaches a depth of about 25 moves, whereas branch-and-bound reaches about 35 moves, roughly 40% deeper.

We can also see that we get better results for the Fifteen Puzzle than for the Eight Puzzle.

Efficiency of Branch and Bound

Page 13: Chapter 5 Real Time Heuristic Search

An Analytic Model

Is this surprising result specific to the sliding-tile puzzles, or is it more general?

A model has been defined with a uniform branching factor and depth, in which each edge is independently assigned a cost of 0 or 1 with some probability p.

This model represents the tile puzzle: each movement of a tile either increases or decreases the h value by one.

Page 14: Chapter 5 Real Time Heuristic Search

Since for each move the g function increases by one, the f function either increases by 2 or does not increase at all.

It has been proved that if the expected number of zero-cost edges below a node is less than one, finding the lowest-cost route takes exponential time, while if it is greater than one, the time is polynomial.

An Analytic Model

Page 15: Chapter 5 Real Time Heuristic Search

For example, if the probability of a zero-cost edge is 0.5, then for a binary tree the expected number of zero-cost edges below a node is 2 * 0.5 = 1, whereas for a ternary tree it is 3 * 0.5 = 1.5.

We see that a ternary tree can be searched more efficiently than a binary tree!

An Analytic Model

Page 16: Chapter 5 Real Time Heuristic Search

But this model is not entirely accurate, for a number of reasons: we can predict the results only up to a certain point, and the model applies only to a limited depth. Beyond that depth the model assumes the probability of a zero-cost edge is the same for all edges, whereas in the sliding-tile puzzle the probability is not the same for each node from some depth onward (the probability of a positive-cost edge increases).

An Analytic Model

Page 17: Chapter 5 Real Time Heuristic Search

Real-Time-A* (RTA*)

So far we have only decided on a single move at a time, not on a complete sequence of moves to the goal.

The initial idea would be simply to repeat the single-move procedure several times, but that leads to several problems.

Page 18: Chapter 5 Real Time Heuristic Search

Problems:

We might move to a node that has already been visited and fall into an infinite loop.

If we forbid revisiting previously visited nodes, we may reach a state in which all neighboring nodes have already been visited.

Because of the limited information known in each state, we want to allow backtracking in cases where we will not simply repeat the same moves from that state.

Real-Time-A* (RTA*)

Page 19: Chapter 5 Real Time Heuristic Search

Solution:

We should allow backtracking only if the cost of returning to that point plus the estimated cost from there is less than the estimated cost from the current point.

Real-Time-A* (RTA*) is an efficient algorithm for implementing this solution.

Real-Time-A* (RTA*)

Page 20: Chapter 5 Real Time Heuristic Search

RTA* Algorithm

In RTA*, the value f(n) = g(n) + h(n) of a node n is defined as in A*.

The difference is that g(n) in RTA* is computed differently than in A*: g(n) is the distance of node n from the current state, not from the initial state.

The implementation maintains an open list, and after each move the g value of every node on the open list is updated relative to the new current state.
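
The following is a minimal Python sketch of this decision rule, one possible reading of the algorithm described above. The functions successors(state), cost(s1, s2) and h0(state) (the initial heuristic) are assumed helpers, not from the slides; h is the table of updated heuristic values.

import math

def rta_star(start, successors, cost, h0, is_goal, max_steps=100000):
    # Real-Time-A*: repeatedly move to the neighbor n minimizing
    # f(n) = cost(current, n) + h(n), storing the SECOND-best f value
    # as the new heuristic value of the state being left.
    h = {}                          # learned heuristic values
    current, path = start, [start]
    for _ in range(max_steps):
        if is_goal(current):
            return path
        scored = sorted(((cost(current, n) + h.get(n, h0(n)), n)
                         for n in successors(current)), key=lambda t: t[0])
        best_n = scored[0][1]
        # second-best estimate becomes h of the state we are leaving
        h[current] = scored[1][0] if len(scored) > 1 else math.inf
        current = best_n
        path.append(current)
    return None                     # step limit reached without a goal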

Page 21: Chapter 5 Real Time Heuristic Search

The time to make a move is linear in the size of the open list.

It is not clear exactly how to update the g values.

It is not clear how to find the path to the next destination node chosen from the open list.

But these problems can be solved in constant time per move !

RTA* - The Drawbacks

Page 22: Chapter 5 Real Time Heuristic Search

RTA* - Example

[Figure: example graph for RTA*. The start node a has neighbors b, c, and d with heuristic values h(b)=1, h(c)=2, h(d)=3; node b has further neighbors e and i with h(e)=4 and h(i)=5; deeper nodes (j, k, m) have larger heuristic values. All edges shown have cost 1.]

Page 23: Chapter 5 Real Time Heuristic Search

In the example, we start at node a and compute for its neighbors

f(b) = 1+1 = 2, f(c) = 1+2 = 3, f(d) = 1+3 = 4.

The problem solver moves to b because it is the minimum, and stores h(a) = f(c) = 3, the second-best value. From b we generate nodes e and i and compute

f(e) = 1+4 = 5, f(i) = 1+5 = 6, f(a) = 1+3 = 4.

The problem solver therefore moves back to a, storing h(b) = 5, so that from a we now have f(b) = 1+5 = 6, and so on.

RTA* - Example

Page 24: Chapter 5 Real Time Heuristic Search

As we can see, we will not get into an infinite loop even though we allow backtracking, since each time we gather more information and use it to decide on the next move.

Note: RTA* does not require a good admissible heuristic function and will find a solution in any case (though a good heuristic function will give better results).

RTA*'s running time is linear in the number of moves made, and so is the size of the hash table stored.

RTA* - Example

Page 25: Chapter 5 Real Time Heuristic Search

Completeness of RTA*

RTA* is complete under the following restrictions:

The problem space must be finite.
A goal must be reachable from every state.
There cannot be cycles in the graph with zero or negative cost.
The heuristic values returned must be finite.

Page 26: Chapter 5 Real Time Heuristic Search

Correctness of RTA*

RTA* makes decisions based on limited information, and therefore the quality of each decision is the best possible relative to the part of the search space it has seen so far.

The nodes that need to be expanded by RTA* are similar to the open list in A*.

The main difference is in the definitions of g and h.

The correctness of RTA* can be proved by induction on the number of moves made.

Page 27: Chapter 5 Real Time Heuristic Search

Solution Quality vs. Computation

We should also consider the quality of the solution that is returned by RTA*.

This depends on the accuracy of the heuristic function and the search depth.

A choice must be made among the families of heuristic functions: some are more accurate but more expensive to compute, while others are less accurate but cheaper to compute.

Page 28: Chapter 5 Real Time Heuristic Search

Learning-RTA* (LRTA*)

Until now, RTA* solved the problem for a single problem-solving trial.

We would now like to modify the algorithm so that it is also suitable for repeated problem-solving trials.

Page 29: Chapter 5 Real Time Heuristic Search

The algorithm is the same, except for one change that makes it suitable for the new setting:

the algorithm stores the best value of the heuristic function each time, instead of the second-best value.

Learning-RTA* (LRTA*)
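
Relative to the RTA* sketch given earlier, the change is a single line in the update of the stored heuristic value (again assuming the same hypothetical helper names):

# RTA*  : store the second-best estimate for the state being left
#     h[current] = scored[1][0] if len(scored) > 1 else math.inf
# LRTA* : store the best estimate instead
#     h[current] = scored[0][0]
# Over repeated trials, with admissible initial values, the stored values
# never overestimate and converge toward the exact costs.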

Page 30: Chapter 5 Real Time Heuristic Search

Convergence of LRTA*

An important advantage of LRTA* is that, through repeated problem-solving trials, the heuristic values converge to the exact values!

This holds under the following conditions:

The initial and goal states are chosen randomly.
The initial heuristic values are admissible, i.e. they do not overestimate the distance to the nearest goal.
Ties are broken randomly; otherwise, once we find one optimal solution, we might keep finding the same one each time and never find the other paths to the goal.

Page 31: Chapter 5 Real Time Heuristic Search

Theorem 5.2: In a finite space with finite positive edge costs and non-overestimating initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.

Convergence of LRTA*

Page 32: Chapter 5 Real Time Heuristic Search

Conclusion

In real-time, large-scale applications we cannot use the standard single-agent heuristic search algorithms, because of their high cost and the fact that they do not return a solution before the entire search is complete.

Minimin lookahead solves the problem for such cases.

Branch-and-bound pruning greatly improves the results given by minimin.

RTA* efficiently solves the problem of abandoning one path for a better-looking one.

Page 33: Chapter 5 Real Time Heuristic Search

RTA* guarantees finding a solution.

RTA* makes optimal local decisions.

The more lookahead is used, the higher the cost but the better the quality of the solution.

Heuristic functions form a family that trades off accuracy against computational cost.

The optimal level of lookahead depends on the relative costs of simulating versus executing moves.

LRTA* handles repeated problem-solving trials while preserving the completeness of the solution.

Conclusion

Page 34: Chapter 5 Real Time Heuristic Search
Page 35: Chapter 5 Real Time Heuristic Search

Heuristic from Relaxed Models

A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem.

This means that we remove some of the constraints of the problem we are dealing with.

Page 36: Chapter 5 Real Time Heuristic Search

Heuristic from Relaxed Models - Example

Consider the problem of navigating in a network of roads from an initial location to a goal location.

A good heuristic is to estimate the cost between two points by the straight-line distance.

We remove the constraint of the original problem that we have to move along the roads and assume that we are allowed to move in a straight line between two points. Thus we get a relaxation of the original problem.

Page 37: Chapter 5 Real Time Heuristic Search

Relaxation example - the TSP problem

We can describe the problem as a graph with 3 constraints:
1 The tour covers all the cities.
2 Every node has degree two: an edge entering the node and an edge leaving the node.
3 The graph is connected.

If we remove constraint 2: we get a connected spanning subgraph, and the optimal solution to this relaxed problem is a minimum spanning tree (MST).

If we remove constraint 3: the graph need not be connected, and the optimal solution to this relaxed problem is the solution to the assignment problem.

Page 38: Chapter 5 Real Time Heuristic Search

Relaxation example - the Tile Puzzle problem

One of the constraints in this problem is that a tile can only slide into the position occupied by the blank.

If we remove this constraint, we allow any tile to move to an adjacent position horizontally or vertically, and the cost of solving each tile in the relaxed problem is exactly its Manhattan distance to its goal location.
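
As an illustration, here is a small Python sketch of the Manhattan distance heuristic obtained from this relaxation. It assumes a state is a tuple listing the tile in each position of a width-by-width board, with 0 for the blank and tile t having goal position t; these conventions are assumptions made for the example, not taken from the slides.

def manhattan_distance(state, width=3):
    # Sum, over all tiles, of the distance each tile would travel if it
    # could slide freely to its goal position (the relaxed problem).
    total = 0
    for pos, tile in enumerate(state):
        if tile == 0:
            continue                              # the blank is not counted
        goal_row, goal_col = divmod(tile, width)  # goal location of this tile
        row, col = divmod(pos, width)
        total += abs(row - goal_row) + abs(col - goal_col)
    return total

# Example: manhattan_distance((0, 1, 2, 3, 4, 5, 6, 7, 8)) == 0 (the goal state).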

Page 39: Chapter 5 Real Time Heuristic Search

The STRIPS Problem formulation

We would like to derive such heuristics automatically.

In order to do that, we need a formal description language that is richer than the problem-space graph.

One such language is called STRIPS. In this language we have predicates and operators. Let's look at a STRIPS representation of the Eight Puzzle problem.

Page 40: Chapter 5 Real Time Heuristic Search

1 On(x,y) - tile x is in location y.
2 Clear(z) - location z is clear.
3 Adj(y,z) - location y is adjacent to location z.
4 Move(x,y,z) - move tile x from location y to location z.

For each operator the language specifies:

A precondition list - for example, to execute Move(x,y,z) we must have: On(x,y), Clear(z), Adj(y,z).

An add list - predicates that were not true before the operator and are true after it is executed; for Move(x,y,z) these are On(x,z) and Clear(y).

A delete list - a subset of the preconditions that are no longer true after the operator is executed; for Move(x,y,z) these are On(x,y) and Clear(z).

STRIPS - Eight Puzzle Example
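
As a concrete illustration of the precondition, add, and delete lists, here is a small Python sketch representing the Move operator as data. The representation (tuples of predicate name and arguments) is only one possible encoding, chosen for the example.

from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset   # predicates that must hold before execution
    add_list: frozenset        # predicates made true by the operator
    delete_list: frozenset     # preconditions no longer true afterwards

def move(x, y, z):
    # Move tile x from location y to adjacent clear location z.
    return Operator(
        name=f"Move({x},{y},{z})",
        preconditions=frozenset({("On", x, y), ("Clear", z), ("Adj", y, z)}),
        add_list=frozenset({("On", x, z), ("Clear", y)}),
        delete_list=frozenset({("On", x, y), ("Clear", z)}),
    )

# Dropping ("Clear", z) from the preconditions yields the relaxed problem
# whose exact solution cost for each tile is its Manhattan distance.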

Page 41: Chapter 5 Real Time Heuristic Search

STRIPS - Eight Puzzle Example

Now, in order to construct a simplified or relaxed problem, we only have to remove some of the preconditions.

For example, by removing Clear(z) we allow tiles to move to any adjacent location.

In general, the hard part is to identify which relaxed problems have the property that their exact solution can be computed efficiently.

Page 42: Chapter 5 Real Time Heuristic Search

Admissibility and Consistency

The heuristics derived by this method are both admissible and consistent.

Admissibility means that the lowest-cost path in the simplified graph has a cost equal to or lower than the lowest-cost path in the original graph.

Note: the cost in the simplified graph should nevertheless be as close as possible to that in the original graph.

Consistency means that for every node n and every neighbor n' of n,

h(n) <= c(n,n') + h(n'),

which holds when h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.

Page 43: Chapter 5 Real Time Heuristic Search

We begin by presenting an alternative derivation of the Manhattan distance heuristic for the sliding tile puzzles.

Any description of this problem is likely to describe the goal state as a set of subgoals, where each subgoal is to correctly position an individual tile, ignoring the interactions with the other tiles.

Heuristic from Multiple Subgoals

Page 44: Chapter 5 Real Time Heuristic Search

Enhancing the Manhattan distance

In the Manhattan distance, for each tile we look for the optimal solution ignoring the other tiles, counting only moves of the tile in question.

Therefore the heuristic function we get is not exact.

[Figure: a 5x5 sliding-tile puzzle (24-puzzle) position used to illustrate the Manhattan distance computation.]

Page 45: Chapter 5 Real Time Heuristic Search

We can perform a single search for each tile, starting from its goal position, and record how many moves of the tile are required to move it to every other position.

Doing this for all tiles results in a table which gives, for every possible position of each tile, its Manhattan distance from its goal position.

Then, since each move moves only one tile, for a given state we add the Manhattan distances of the tiles to get an admissible heuristic for the state.

Enhancing the Manhattan distance

Page 46: Chapter 5 Real Time Heuristic Search

However, this heuristic function is not exact, since it ignores the interactions between the tiles.

The obvious next step is to repeat the process on all possible pairs of tiles.

In other words, for each pair of tiles, and each combination of positions they could occupy, perform a search to their goal positions and count only moves of the two tiles of interest. We call this value the pairwise distance of the two tiles from their goal locations.

Enhancing the Manhattan distance

Page 47: Chapter 5 Real Time Heuristic Search

Of course the goal is to find the shortest path from the goal state to all possible positions of the two tiles, where only moves of the two tiles of interest are counted.

For almost all pairs of tiles and positions, their pairwise distances will equal the sum of their Manhattan distances from their goal positions.

However, there are three types of cases where the pairwise distance exceeds the combined Manhattan distance.

Enhancing the Manhattan distance

Page 48: Chapter 5 Real Time Heuristic Search

1 Two tiles are in the same row or column but are reversed relative to their goal positions.

In order to reach their goal positions, one tile must move up or down to enable the other to pass to its goal location, and then return to the row and move back to its own place.

Enhancing the Manhattan distance - the first case

[Figure: two tiles (1 and 2) reversed in the same row; one tile detours around the other and returns. Cost relative to the Manhattan distance: +2.]

Page 49: Chapter 5 Real Time Heuristic Search

2 The corners of the puzzle.

If the 3 tile is in its goal position, but some tile other than the 4 is in the 4 position, the 3 tile will have to move temporarily to allow the 4 tile to be positioned correctly. This requires two moves of the 3 tile, one to move it out of position and another to move it back. Thus the pairwise distance of the two tiles exceeds the sum of their Manhattan distances by two moves.

Enhancing the Manhattan distance - the second case

[Figure: a corner of the puzzle with the 3 tile in place and a wrong tile in the 4 position; the 3 tile must move out and back. Cost relative to the Manhattan distance: +2.]

Page 50: Chapter 5 Real Time Heuristic Search

3 In the last moves of the solution.

A detailed explanation is on the next slide.

Enhancing the Manhattan distance - the third case

[Figure: the upper-left corner of the puzzle with the 1 and 5 tiles away from the left column and top row. Cost relative to the Manhattan distance: +2.]

Page 51: Chapter 5 Real Time Heuristic Search

Before the last move, either the 1 or the 5 tile must be in the upper-left corner, which is blank in the goal state. Thus, the last move either moves the 1 tile right or the 5 tile down.

Since the Manhattan distance of these tiles is computed to their goal positions, unless the 1 tile is in the left-most column its Manhattan distance will not accommodate a path through the upper-left corner. Similarly, unless the 5 tile is in the top row, its Manhattan distance will not accommodate a path through the upper-left corner.

Thus, if the 1 tile is not in the left-most column and the 5 tile is not in the top row, we can add two moves to the sum of their Manhattan distances: one of them must first move into the upper-left corner and then out again, so the pairwise distance of the 1 and 5 tiles is two moves greater than the sum of their Manhattan distances, unless the 1 tile starts in the left column or the 5 tile starts in the top row.

Enhancing the Manhattan distance - the third case

Page 52: Chapter 5 Real Time Heuristic Search

The states of these searches are distinguishable only by the positions of the two tiles and the blank, and hence there are O(n^3) different states, where n is the number of tiles.

Since there are n(n-1)/2 = O(n^2) pairs of tiles, there are O(n^2) such searches to perform, for an overall time complexity of O(n^5).

The size of the resulting table is O(n^4), one entry for each pair of tiles in each combination of positions.

Enhancing the Manhattan distance

Page 53: Chapter 5 Real Time Heuristic Search

The next question is how to automatically handle the interactions between these individual heuristics to compute an overall admissible heuristic estimate for a particular state.

If we represent a state as a graph with a node for each tile and an edge between each pair, weighted by the pairwise value, we need to select a set of edges such that no tile appears in more than one selected edge and the sum of the selected edge weights is maximized. This problem is called the maximum weighted matching problem, and it can be solved in O(n^3) time, where n is the number of nodes.

Applying the Heuristics
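
A minimal sketch of this combination step, using networkx's max_weight_matching. The dictionaries manhattan (per-tile Manhattan distances) and pairwise_excess (how much each pair's pairwise distance exceeds the sum of its Manhattan distances) are assumed to have been precomputed; the names are placeholders.

import networkx as nx

def matching_heuristic(tiles, manhattan, pairwise_excess):
    # Each tile may be counted in at most one selected pair, so the pairwise
    # excesses chosen by a maximum weighted matching can be added to the
    # Manhattan sum without over-counting, keeping the heuristic admissible.
    g = nx.Graph()
    g.add_nodes_from(tiles)
    for (a, b), excess in pairwise_excess.items():
        if excess > 0:
            g.add_edge(a, b, weight=excess)
    matching = nx.max_weight_matching(g)          # set of selected pairs
    bonus = sum(g[a][b]["weight"] for a, b in matching)
    return sum(manhattan[t] for t in tiles) + bonus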

Page 54: Chapter 5 Real Time Heuristic Search

Of course, the heuristic obtained from pairs is still not exact; there are positions (such as the three-tile example shown on the slide) in which it understates the true solution cost.

Therefore, in order to get the full power of these heuristics, we need to extend the idea from pairs of tiles to triples, and so on.

Higher-Order Heuristics

Page 55: Chapter 5 Real Time Heuristic Search

Pattern Databases

In the tile puzzles seen earlier, each legal move moves only one tile and therefore affects only one subgoal.

This enables us to add the heuristic estimates for the individual tiles.

This is not the case for all problems. For example, in Rubik's Cube each legal twist moves a large fraction of the individual cubies.

Page 56: Chapter 5 Real Time Heuristic Search

The simple heuristic is a three-dimensional Manhattan distance: for every cubie we compute the minimum number of moves required to correctly position and orient it, and sum these values over all cubies.

Here we have to divide the sum by 8, since every twist moves 8 cubies.

A better heuristic is the same idea, but computing the sum of moves separately for the edge cubies and for the corner cubies.

For the edge cubies we divide the sum by 4, and for the corner cubies we also divide by 4, since each twist moves 4 edge cubies and 4 corner cubies.

Pattern Databases

Page 57: Chapter 5 Real Time Heuristic Search

We can compute the heuristic function by table lookup, which is often more efficient since it saves time during execution of the program.

Such tables are called pattern databases. A pattern database stores the number of moves needed to solve different patterns, that is, subsets of the puzzle elements.

Pattern Databases

Page 58: Chapter 5 Real Time Heuristic Search

For example, the Manhattan distance function is usually computed with the aid of a small table that contains the Manhattan distance of each cubie from every possible position and orientation.

The idea can be developed much further. For the 8 corner cubies, each cubie can be in one of 3 different orientations, but the orientation of the last cubie is determined by the other 7.

This results in 8! * 3^7 = 88,179,840 different states.

Pattern Databases

Page 59: Chapter 5 Real Time Heuristic Search

We can use a breadth-first search and record in a table the number of moves required to solve each combination of corner cubies (this table requires 42 megabytes).

During an IDA* search, as each state is generated, a unique index into the heuristic table is computed, followed by a reference to the table. The stored value is the number of moves needed to solve just the corner cubies, and thus a lower bound on the number of moves needed to solve the entire puzzle.

Pattern Databases
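
The table itself can be built by a breadth-first search backwards from the goal over the abstracted states. A generic Python sketch follows; abstract_goal and neighbors(state) are assumed to be supplied for the abstraction in question (for example, corner-cubie configurations under the face twists), and moves are assumed to be invertible so that distances from the goal equal distances to the goal.

from collections import deque

def build_pattern_database(abstract_goal, neighbors):
    # Breadth-first search from the abstracted goal state.  Returns a dict
    # mapping every reachable abstract state (e.g. the positions and
    # orientations of the corner cubies only) to the minimum number of moves
    # needed to solve it.  Looking up the abstraction of a full state during
    # IDA* gives an admissible lower bound on the full solution length.
    depth = {abstract_goal: 0}
    frontier = deque([abstract_goal])
    while frontier:
        state = frontier.popleft()
        for succ in neighbors(state):
            if succ not in depth:          # first visit is at minimal depth
                depth[succ] = depth[state] + 1
                frontier.append(succ)
    return depth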

Page 60: Chapter 5 Real Time Heuristic Search

We can improve the heuristic by considering the 12 edge cubies as well. The edge cubies can be in one of 12! permutations, and each can be in one of two orientations, but the orientation of the last cubie is determined by the other 11. However, this requires too much memory.

Therefore we compute and store pattern databases for subsets of the edge cubies.

We can compute the possible combinations for 6 cubies (for 7 cubies it would take too much memory). The number of possible combinations of 6 of the 12 edge cubies is (12!/6!) * 2^6 = 42,577,920.

Similarly, we can compute the corresponding heuristic table for the remaining 6 edge cubies.

Pattern Databases

Page 61: Chapter 5 Real Time Heuristic Search

The heuristic used for the experiments is the maximum of all 3 of these values: the value for all 8 corner cubies, and the values for 2 groups of 6 edge cubies each.

The total amount of memory for all 3 tables is 82 megabytes.

The total time to generate all 3 heuristic tables was about an hour.

Even though the result was only a small increase over the value for the corner cubies alone, it yields a significant performance improvement.

Given more memory we could compute and store even larger pattern databases.

Pattern Databases

Page 62: Chapter 5 Real Time Heuristic Search
Page 63: Chapter 5 Real Time Heuristic Search

Computer Chess - A natural domain for studying AI

The game is well structured. It is a perfect information game. Early programmers and AI researchers were often amateur chess players as well.

Page 64: Chapter 5 Real Time Heuristic Search

Brief History of Computer Chess

Maelzel's Chess Machine

1769 - A chess automaton built by Baron Wolfgang von Kempelen of Austria.

It appeared to move the pieces automatically on a board on top of the machine and played excellent chess.

The puzzle of how the machine played was solved in 1836 by Edgar Allan Poe.

Page 65: Chapter 5 Real Time Heuristic Search

Brief History of Computer Chess - Maelzel's Chess Machine

[Figure: illustration of Maelzel's chess machine.]

Page 66: Chapter 5 Real Time Heuristic Search

Early 1950's - The first serious paper on computer chess was written by Claude Shannon. It described minimax search with a heuristic static evaluation function and anticipated the need for more selective search algorithms.

1956 - Invention of alpha-beta pruning by John McCarthy. It was used in early programs such as Samuel's checkers player and Newell, Shaw and Simon's chess program.

Brief History of Computer Chess

Page 67: Chapter 5 Real Time Heuristic Search

1982 - Development of Belle by Condon and Thompson. Belle was the first machine whose hardware was specifically designed to play chess, in order to achieve speed and search depth.

1997 - The Deep Blue machine was the first to defeat the human world champion, Garry Kasparov, in a six-game match.

Brief History of Computer Chess

Page 68: Chapter 5 Real Time Heuristic Search

Checkers

1952 - Samuel developed a checkers program that learned its own evaluation function through self-play.

1992 - Chinook (J. Schaeffer) wins the U.S. Open. At the world championship, Marion Tinsley beat Chinook.

Page 69: Chapter 5 Real Time Heuristic Search

Othello

Othello programs are better than the best humans.

A large number of pieces change hands in each move.

The best Othello program today is Logistello (Michael Buro).

Page 70: Chapter 5 Real Time Heuristic Search

Backgammon

Unlike the games above, backgammon includes a roll of the dice, introducing a random element.

The best backgammon program is TD-Gammon (Gerry Tesauro), comparable to the best human players today.

It learns an evaluation function using temporal-difference learning.

Page 71: Chapter 5 Real Time Heuristic Search

Card games

In addition to a random element, there is hidden information.

The best bridge program is GIB (M. Ginsberg); bridge programs are not competitive with the best human players.

Poker programs fare even worse relative to their human counterparts.

Poker involves a strong psychological element when played by people.

Page 72: Chapter 5 Real Time Heuristic Search

Other games - Summary

The greater the branching factor, the worse the performance.

Go - branching factor 361 - very poor performance. Checkers - branching factor 4 - very good performance.

Backgammon is an exception: despite a large branching factor it still gets good results.

Page 73: Chapter 5 Real Time Heuristic Search

Brute-Force Search

We begin by considering a purely brute-force approach to game playing.

Clearly, this will only be feasible for small games, but it provides a basis for further discussion.

Example - 5-stone Nim

Played by 2 players with a pile of stones. Each player in turn removes one or two stones from the pile. The player who removes the last stone wins the game.

Page 74: Chapter 5 Real Time Heuristic Search

Example - Game Tree for 5-Stone Nim

[Figure: the complete game tree for 5-stone Nim. Each node is labeled with the number of stones remaining; levels alternate between OR nodes (the player to move) and AND nodes (the opponent), down to terminal positions with 0 stones.]

Page 75: Chapter 5 Real Time Heuristic Search

Minimax

Minimax theorem - Every two-person zero-sum game is a forced win for one player or a forced draw for either player, and in principle these optimal minimax strategies can be computed.

Performing this algorithm on tic-tac-toe results in the root being labeled a draw.

Page 76: Chapter 5 Real Time Heuristic Search

Heuristic Evaluation Functions

Problem: How do we evaluate positions where brute force is out of the question?

Solution: Use a heuristic static evaluation function to estimate the merit of a position when the final outcome has not yet been determined.

Page 77: Chapter 5 Real Time Heuristic Search

Example of a heuristic function

Chess: the number of pieces of each type on the board, multiplied by their relative values and summed for each color. By subtracting the weighted material of the black player from the weighted material of the white player, we obtain the relative strength of the position for each player.
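
A toy sketch of such a material evaluation. The piece values and the board representation (a list of piece letter and color pairs) are assumptions made for the example, not part of the slides.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}   # conventional values

def material_evaluation(board):
    # board: iterable of (piece, color) pairs, color in {"white", "black"}.
    # Returns weighted white material minus weighted black material.
    score = 0
    for piece, color in board:
        value = PIECE_VALUES.get(piece, 0)   # kings are not counted
        score += value if color == "white" else -value
    return score

# Example: a position where White has an extra rook scores +5.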

Page 78: Chapter 5 Real Time Heuristic Search

A heuristic static evaluation function for a two player game is a function from a state to a number.

The goal of a two player game is to reach a winning state, but the number of moves required to get there is unimportant.

Other features must be taken into account to get to an overall evaluation function.

Heuristic Evaluation Functions

Page 79: Chapter 5 Real Time Heuristic Search

Given a heuristic static evaluation function, it is straightforward to write a program to play a game.

From any given position, we simply generate all the legal moves, apply our static evaluator to the position resulting from each move, and then move to the position with the largest or smallest evaluation, depending on whether we are MAX or MIN.

Heuristic Evaluation Functions

Page 80: Chapter 5 Real Time Heuristic Search

Example - tic-tac-toe: Behavior of the Evaluation Function

Detect if the game is over.

If X is the maximizer, the function should return +infinity if there are three X's in a row and -infinity if there are three O's in a row.

Otherwise, return the number of different rows, columns, and diagonals occupied by X, minus the number occupied by O.

Page 81: Chapter 5 Real Time Heuristic Search

Example: First moves of tic-tac-toe

[Figure: the three distinct opening moves for X, with evaluations counting the rows, columns, and diagonals open to X minus those open to O: corner 3-0 = 3, center 4-0 = 4, edge 2-0 = 2.]

Page 82: Chapter 5 Real Time Heuristic Search

This algorithm is extremely efficient, requiring time that is only linear in the number of legal moves.

Its drawback is that it only considers the immediate consequences of each move (it doesn't look beyond the horizon).

Example - tic-tac-toe: Behavior of the Evaluation Function

Page 83: Chapter 5 Real Time Heuristic Search

Minimax Search

Where does X go?

[Figure: a tic-tac-toe position with X to move; the candidate moves are evaluated one ply deeper (values such as 4-3 = 1 and 4-2 = 2) and the results are backed up to choose X's move.]

Page 84: Chapter 5 Real Time Heuristic Search

Minimax search

Search as deeply as possible given the computational resources of the machine and the time constraints on the game.

Evaluate the nodes at the search frontier by the heuristic function.

Where MIN is to move, save the minimum of its children's values. Where MAX is to move, save the maximum of its children's values.

A move is made to a child of the root with the largest or smallest value, depending on whether MAX or MIN is moving.
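
A minimal recursive sketch of this fixed-depth minimax, where successors(state) and evaluate(state) are assumed to be supplied by the game (placeholder names):

def minimax(state, depth, maximizing, successors, evaluate):
    # Backed-up minimax value of `state`, searching `depth` plies ahead.
    children = successors(state)
    if depth == 0 or not children:        # frontier or terminal position
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing, successors, evaluate)
              for c in children]
    return max(values) if maximizing else min(values)

def choose_move(state, depth, successors, evaluate):
    # MAX to move at the root: pick the child with the largest backed-up value.
    return max(successors(state),
               key=lambda c: minimax(c, depth - 1, False, successors, evaluate))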

Page 85: Chapter 5 Real Time Heuristic Search

Minimax search - example minimax tree

[Figure: a minimax tree with alternating MAX and MIN levels; frontier values are backed up (the minimum at MIN nodes, the maximum at MAX nodes), giving the root a value of 4.]

Page 86: Chapter 5 Real Time Heuristic Search

Alpha-Beta Pruning

By using alpha-beta pruning the minimax value of the root of a game tree can be determined without having to examine all the nodes.

Page 87: Chapter 5 Real Time Heuristic Search

Alpha-Beta Pruning Example

[Figure: the same minimax tree searched with alpha-beta pruning. Bounds such as <=2, <=3 and >=6 are established at interior nodes (labeled a through r), allowing whole subtrees to be pruned while the root value of 4 is unchanged.]

Page 88: Chapter 5 Real Time Heuristic Search

Alpha-Beta

Deep pruning - the right half of the tree in the example.

The next slide gives pseudocode for alpha-beta pruning:

MAXIMIN - assumes that its argument node is a maximizing node.

MINIMAX - assumes that its argument node is a minimizing node.

V(N) - the heuristic static evaluation of node N.

Page 89: Chapter 5 Real Time Heuristic Search

MAXIMIN (node: N, lower bound: alpha, upper bound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MINIMAX(Ni, alpha, beta)
    IF value > alpha, alpha := value
    IF alpha >= beta, RETURN alpha
  RETURN alpha

MINIMAX (node: N, lower bound: alpha, upper bound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MAXIMIN(Ni, alpha, beta)
    IF value < beta, beta := value
    IF beta <= alpha, RETURN beta
  RETURN beta

Page 90: Chapter 5 Real Time Heuristic Search

Performance of Alpha-Beta

Efficiency depends on the order in which the nodes are encountered at the search frontier.

Optimal - effective branching factor b^(1/2) - if the largest child of a MAX node is generated first, and the smallest child of a MIN node is generated first.

Worst - b.

Average - b^(3/4) - with random ordering.

Page 91: Chapter 5 Real Time Heuristic Search

Games with chance

Chance nodes: nodes where chance events happen (rolling dice, flipping a coin, etc.).

A chance node is evaluated by its expected value, averaging over the outcome probabilities, where C is a chance node, P(di) is the probability of rolling di (1, 2, ..., 12), and S(C,di) is the set of positions generated by applying all legal moves for roll di to C.
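
A small sketch of this expected-value backup for a single chance node, under the assumption that the probabilities and the positions reachable for each roll are available through the hypothetical helpers probability(d) and positions_for_roll(c, d), and that the player to move picks the best position for each roll:

def chance_node_value(c, rolls, probability, positions_for_roll, value):
    # Expected value of chance node c:
    #   sum over rolls d of  P(d) * best value among S(c, d),
    # where value(p) is the (recursive) minimax / expectimax value of p.
    expected = 0.0
    for d in rolls:
        best = max(value(p) for p in positions_for_roll(c, d))
        expected += probability(d) * best
    return expected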

Page 92: Chapter 5 Real Time Heuristic Search
Page 93: Chapter 5 Real Time Heuristic Search

Games with chance

Backgammon board

Page 94: Chapter 5 Real Time Heuristic Search

Search tree with probabilities

[Figure: leaf values 2, 4, 7, 4, 6, 0, 5, -2 are backed up through MIN nodes (giving 2, 4, 0, -2), then averaged at chance nodes with probability 0.5 on each branch (0.5*2 + 0.5*4 = 3 and 0.5*0 + 0.5*(-2) = -1); MAX chooses the branch with expected value 3.]

Page 95: Chapter 5 Real Time Heuristic Search

Search tree with probabilities

Page 96: Chapter 5 Real Time Heuristic Search

Additional Enhancements

A number of additional improvements have been developed to improve performance with limited computation.

We briefly discuss the most important of these below.

Page 97: Chapter 5 Real Time Heuristic Search

Node Ordering

By using node ordering we can get close to b^(1/2).

With node ordering, instead of generating the tree left-to-right, we reorder the children based on the static evaluations of the interior nodes.

To save space, only the immediate children are reordered after the parent is fully expanded.
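
A one-function sketch of the idea: before recursing, sort the children by their static evaluations so that the most promising child is searched first (successors and evaluate are assumed helpers):

def ordered_children(state, successors, evaluate, maximizing):
    # Searching the likely-best child first lets alpha-beta establish tight
    # bounds early, moving its performance toward the optimal b**(1/2) case.
    return sorted(successors(state), key=evaluate, reverse=maximizing)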

Page 98: Chapter 5 Real Time Heuristic Search

Iterative Deepening

Another idea is to use iterative deepening. In two-player games played under time constraints, when time runs out, the move recommended by the last completed iteration is made.

Iterative deepening can be combined with node ordering to improve pruning efficiency: instead of the heuristic values, we can order nodes by their backed-up values from the previous iteration.

Page 99: Chapter 5 Real Time Heuristic Search

Quiescence

Quiescence search performs a secondary search from a position whose value is unstable (for example, in the middle of a capture exchange).

In this way a stable evaluation is obtained.

Page 100: Chapter 5 Real Time Heuristic Search

Transposition Tables

For efficiency, it is important to detect when a state has already been searched.

In order to detect a previously searched state, previously generated game states, with their minimax values, are saved in a transposition table.
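
A minimal sketch of a transposition table as a dictionary keyed by state and depth. Real chess programs index the table with Zobrist hashes and also store bound information; that is omitted here, and the helper names are assumptions.

transposition_table = {}

def minimax_with_table(state, depth, maximizing, successors, evaluate):
    # Fixed-depth minimax that caches the values of previously searched states.
    key = (state, depth, maximizing)          # state must be hashable
    if key in transposition_table:
        return transposition_table[key]
    children = successors(state)
    if depth == 0 or not children:
        value = evaluate(state)
    else:
        values = [minimax_with_table(c, depth - 1, not maximizing,
                                     successors, evaluate) for c in children]
        value = max(values) if maximizing else min(values)
    transposition_table[key] = value
    return value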

Page 101: Chapter 5 Real Time Heuristic Search

Opening Book

Most board games start with the same initial state.

A table of good initial moves, based on human expertise, is used; it is known as an opening book.

Page 102: Chapter 5 Real Time Heuristic Search

Endgame Databases

A database of endgame positions, with their minimax values, is used.

In checkers, endgame databases exist for positions with eight or fewer pieces on the board.

The technique for calculating endgame databases is retrograde analysis.

Page 103: Chapter 5 Real Time Heuristic Search

Special-Purpose Hardware

The faster the machine, the deeper the search in the available time and the better it plays.

The best machines today are based on special-purpose hardware designed and built only to play chess.

Page 104: Chapter 5 Real Time Heuristic Search

Selective Search

The fundamental reason that humans are competitive with computers is that they are very selective in their choice of positions to examine, unlike programs, which do full-width, fixed-depth searches.

Selective search: search only the "interesting" parts of the domain.

Example - best-first minimax.

Page 105: Chapter 5 Real Time Heuristic Search

Best-First Minimax

Given a partially expanded minimax tree, the backed-up minimax value of the root is determined by one of the leaf nodes, as is the value of every node on the path from the root to that leaf.

This path is known as the principal variation, and the leaf is known as the principal leaf.

In general, best-first minimax will generate an unbalanced tree, and make different move decisions than full-width, fixed-depth alpha-beta.

Page 106: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 1 - the root's value of 6 comes from its principal leaf, which is expanded next.]

Page 107: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 2 - after the expansion the root value changes to 4, and the new principal leaf is expanded next.]

Page 108: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 3 - the principal variation shifts again and the new principal leaf is expanded.]

Page 109: Chapter 5 Real Time Heuristic Search

Best-first minimax search - Example

[Figure: step 4 - the tree after further expansions along the principal variation.]

Page 110: Chapter 5 Real Time Heuristic Search

Best-First Search

Full-width search is good insurance against missing a move (and making a mistake).

Most game programs that use selective searches use a combined algorithm that starts with a full-width search to a nominal depth, and then searches more selectively below that depth.