Problem Solving by Search
by Jin Hyung Kim
Computer Science Department, KAIST
KAIST CS570 lecture note
Example of Representation
Euler Path
Graph Theory

A graph consists of:
• A set of nodes : may be infinite
• A set of arcs (links)

Directed graph, underlying graph, tree

Notation: node, start node (root), leaf (tip node), path, ancestor, descendant, child (children, son), parent (father), cycle, DAG, connected, locally finite graph, node expansion
State Space Representation

Basic components:
• a set of states { s }
• a set of operators { o : s -> s }
• a control strategy { c : s^n -> o }

State space graph: state -> node, operator -> arc

Four-tuple representation [N, A, S, GD]; solution path
Examples of SSR
• TIC-TAC-TOE
• n²-1 puzzle
• Traveling Salesperson Problem (TSP)
Search Strategies

A strategy is defined by picking the order of node expansion.

Search directions:
• Forward search (from start to goal)
• Backward search (from goal to start)
• Bidirectional search

Irrevocable vs. revocable
• Irrevocable strategy : Hill-Climbing
  Most popular in human problem solving
  No shift of attention to suspended alternatives
  May end up at a local maximum
  Commutative assumption: applying an inappropriate operator may delay, but never prevents, the eventual discovery of a solution.
• Revocable strategy : tentative control
  One alternative is chosen; the others are kept in reserve.
Evaluation of Search Strategies

Completeness: does it always find a solution if one exists?
Time complexity: number of nodes generated/expanded
Space complexity: maximum number of nodes in memory
Optimality: does it always find a least-cost solution?
An algorithm is admissible if it terminates with an optimal solution.

Time and space complexity are measured by:
b – maximum branching factor of the search tree
d – depth of the least-cost solution
m – maximum depth of the state space
Implementing Search Strategies

Uninformed search: the search does not depend on the nature of the solution (systematic search method)
• Breadth-First Search
• Depth-First Search (backtracking)
• Depth-Limited Search
• Uniform Cost Search
• Iterative Deepening Search

Informed or heuristic search: Best-First Search
• Greedy search (h only)
• A* search (g + h)
• Iterative A* search
X-First Search Algorithm

1. Put s in OPEN.
2. If OPEN is empty, fail.
3. Select and remove a node from OPEN (call it n).
4. Expand n; put its successors into OPEN.
5. If any successor is a goal, succeed; otherwise go to step 2.
Comparison of BFS and DFS

Selection strategy from OPEN:
• BFS uses FIFO – a queue
• DFS uses LIFO – a stack

BFS always terminates if a goal exists (cf. DFS on a locally finite infinite tree).
BFS guarantees the shortest path to the goal.

Space requirement:
• BFS – exponential
• DFS – linear; keeps only the children of the nodes on the current path

Which is better, BFS or DFS?
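The only difference between the two strategies is the order in which nodes leave OPEN; a minimal sketch in Python (the example graph and node names are hypothetical):

```python
from collections import deque

def x_first_search(start, goal, successors, use_queue=True):
    """BFS when use_queue=True (FIFO), DFS when use_queue=False (LIFO)."""
    open_list = deque([start])
    visited = {start}
    while open_list:
        # BFS removes from the front (queue); DFS removes from the back (stack)
        n = open_list.popleft() if use_queue else open_list.pop()
        if n == goal:
            return True
        for s in successors(n):
            if s not in visited:
                visited.add(s)
                open_list.append(s)
    return False

# Hypothetical example graph
graph = {'s': ['a', 'b'], 'a': ['c'], 'b': ['d'], 'c': [], 'd': []}
print(x_first_search('s', 'd', lambda n: graph[n], use_queue=True))  # True
```

Swapping `popleft` for `pop` is the entire BFS/DFS switch; everything else in the loop is shared.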
Depth-Limited Search

Depth-limited search = depth-first search with a depth limit.
Nodes at the depth limit are treated as having no successors.
Uniform Cost Search

A generalized version of Breadth-First Search.
c(ni, nj) = cost of going from ni to nj
g(n) = (tentative minimal) cost of a path from s to n
Guaranteed to find the minimum-cost path: Dijkstra's algorithm.
Uniform Cost Search Algorithm

1. Put s in OPEN; set g(s) = 0.
2. If OPEN is empty, fail.
3. Remove the node of OPEN whose g value is smallest and put it in CLOSED (call it n).
4. If n is a goal, succeed.
5. Expand n; compute g of each successor, put the successors into OPEN with pointers back to n, and go to step 2.
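The loop above is essentially Dijkstra's algorithm; a minimal sketch using a priority queue as OPEN (the example graph and edge costs are hypothetical):

```python
import heapq

def uniform_cost_search(start, goal, cost):
    """cost: dict mapping node -> list of (successor, edge_cost) pairs."""
    open_heap = [(0, start)]          # (g value, node)
    g = {start: 0}
    closed = set()
    while open_heap:
        g_n, n = heapq.heappop(open_heap)   # node with smallest g
        if n in closed:
            continue                        # stale duplicate entry
        if n == goal:
            return g_n                      # minimum-cost path found
        closed.add(n)
        for s, c in cost.get(n, []):
            if s not in g or g_n + c < g[s]:
                g[s] = g_n + c
                heapq.heappush(open_heap, (g[s], s))
    return None

# Hypothetical edge costs
edges = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('g', 5)], 'b': [('g', 1)]}
print(uniform_cost_search('s', 'g', edges))  # 4  (s -> a -> b -> g)
```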
Iterative Deepening Search

A compromise between BFS and DFS: iterative deepening search = depth-first search with an increasing depth limit.
Saves on storage while guaranteeing the shortest path; the additional node expansion is negligible.
Can you apply this idea to uniform cost search?
proc Iterative_Deepening_Search(Root)
begin
  Success := 0;
  for (depth_bound := 1; Success != 1; depth_bound++)
  {
    depth_first_search(Root, depth_bound);
    if goal found, Success := 1;
  }
end
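The procedure above can be written concretely; a sketch in Python (the tree is a hypothetical successor dict):

```python
def depth_limited_search(node, goal, successors, limit):
    """DFS that treats nodes at the depth limit as having no successors."""
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(depth_limited_search(s, goal, successors, limit - 1)
               for s in successors(node))

def iterative_deepening_search(root, goal, successors, max_depth=50):
    """Run DFS with depth bound 0, 1, 2, ... until the goal is found."""
    for depth_bound in range(max_depth + 1):
        if depth_limited_search(root, goal, successors, depth_bound):
            return depth_bound        # depth of the shallowest goal
    return None

tree = {'s': ['a', 'b'], 'a': ['c'], 'b': ['d'], 'c': [], 'd': ['g'], 'g': []}
print(iterative_deepening_search('s', 'g', lambda n: tree[n]))  # 3
```

Only the current path is in memory at any time, which is where the linear space bound comes from.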
Iterative Deepening (l=0)
Iterative Deepening (l=1)
Iterative Deepening (l=2)
Iterative Deepening (l=3)
Properties of IDS

Complete? Yes.
Time complexity: d·b + (d-1)·b² + … + 1·b^d = O(b^d)
(The bottom level is generated once, the top level d times.)
Space complexity: O(bd)
Optimal? Yes, if step cost = 1.
Can it be modified to explore a uniform-cost tree?

Numerical comparison of speed (number of nodes generated), b = 10 and d = 5, solution at far right:
N(IDS) = 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450
N(BFS) = 10 + 100 + 1,000 + 10,000 + 100,000 + 999,990 = 1,111,100
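The totals above can be checked directly; a quick sketch reproducing the arithmetic for b = 10, d = 5:

```python
b, d = 10, 5

# IDS: the nodes at level i are generated (d - i + 1) times over all iterations
n_ids = sum((d - i + 1) * b**i for i in range(1, d + 1))

# BFS: every level up to d, plus the successors generated before reaching the
# goal when the solution sits at the far right of the bottom level: b**(d+1) - b
n_bfs = sum(b**i for i in range(1, d + 1)) + (b**(d + 1) - b)

print(n_ids)  # 123450
print(n_bfs)  # 1111100
```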
Repeated States

Failure to detect repeated states can turn a linear problem into an exponential one!

Search on a graph
Summary of Algorithms

Informed Search
8-Puzzle Heuristics

Which is the best move among a, b, c?

(figure: start node, candidate successor states a, b, c, and the goal node)

• # of misplaced tiles
• Sum of Manhattan distances
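Both heuristics are straightforward to compute; a sketch representing a board as a tuple of 9 entries with 0 for the blank (the sample state is hypothetical):

```python
def misplaced_tiles(state, goal):
    """Number of non-blank tiles not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan_distance(state, goal):
    """Sum over tiles of |row difference| + |column difference|."""
    total = 0
    for tile in range(1, 9):
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

goal  = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # a common 8-puzzle goal layout
state = (2, 8, 3, 1, 6, 4, 7, 0, 5)
print(misplaced_tiles(state, goal))    # 4
print(manhattan_distance(state, goal)) # 5
```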
Road Map Problem

To go to Bucharest, which city do you choose to visit next from Arad? Zerind, Sibiu, Timisoara? Your rationale?
Best-First Search

Idea: use an evaluation function for each node, an estimate of "desirability". We use the notation f( ).

Special cases, depending on f( ):
• Greedy search
• Uniform cost search
• A* search
Best-First Search Algorithm (for tree search)

1. Put s in OPEN; compute f(s).
2. If OPEN is empty, fail.
3. Remove the node of OPEN whose f value is smallest and put it in CLOSED (call it n).
4. If n is a goal, succeed.
5. Expand n; compute f of each successor, put the successors into OPEN with pointers back to n, and go to step 2.
Generic Form of Best-First Search

Best-first algorithm with f(n) = g(n) + h(n), where
g(n) : cost of the path from the start node to node n
h(n) : heuristic estimate of the cost from n to a goal
h(G) = 0, h(n) >= 0
Also called the A algorithm.

• Uniform cost algorithm: when h(n) = 0 always
• Greedy search algorithm: when g(n) = 0 always
  Expands the node that appears to be closest to the goal. Complete?
  May settle on a locally optimal state; greedy search may not lead to optimality.
• Algorithm A* : when h(n) <= h*(n)
Examples of Admissible Heuristics

h(n) <= h*(n) for all n
• Air distance never overestimates the actual road distance
• 8-tile problem: number of misplaced tiles; sum of Manhattan distances
• TSP: length of a minimum spanning tree

(figure: a current node and the goal node of the 8-puzzle)
Number of misplaced tiles : 4
Sum of Manhattan distances : 1 + 2 + 0 + 0 + 0 + 2 + 0 + 1 = 6
Algorithm A*

f(n) = g(n) + h(n), where h(n) <= h*(n) for all n.
Expand next the node with minimum f(n).
Role of h(n): directs the search toward the goal.
Role of g(n): guards against roaming due to imperfect heuristics.

(figure: start node —g(n)→ current node n —h(n)→ goal)
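A compact A* sketch combining g and h exactly as above (the graph, edge costs, and heuristic table are hypothetical, with h chosen admissible):

```python
import heapq

def a_star(start, goal, successors, h):
    """successors(n) -> iterable of (neighbor, edge_cost); h must not overestimate."""
    open_heap = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, n, path = heapq.heappop(open_heap)  # node with minimum f
        if n == goal:
            return g, path
        for s, c in successors(n):
            g2 = g + c
            if s not in best_g or g2 < best_g[s]:
                best_g[s] = g2
                heapq.heappush(open_heap, (g2 + h(s), g2, s, path + [s]))
    return None

# Hypothetical graph and admissible heuristic values
edges = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('g', 12)],
         'b': [('g', 3)], 'g': []}
h_table = {'s': 4, 'a': 3, 'b': 2, 'g': 0}
print(a_star('s', 'g', lambda n: edges[n], h_table.get))
# (6, ['s', 'a', 'b', 'g'])
```

With h = 0 everywhere this degenerates to uniform cost search, matching the generic form above.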
A* Search Example

Romania with cost in km

Nodes not expanded (node pruning)
Algorithm A* is Admissible

Suppose some suboptimal goal G2 has been generated and is in OPEN. Let n be an unexpanded node on a shortest path to an optimal goal G1.

f(G2) = g(G2)   since h(G2) = 0
      > g(G1)   since G2 is suboptimal
      >= f(n)   since h is admissible

Since f(G2) > f(n), A* will never select G2 for expansion.

(figure: start node, with n on the path to G1, and G2 a generated suboptimal goal)
A* Expands Nodes in Order of Increasing f Value

Gradually adds "f-contours" of nodes (cf. breadth-first search adds layers). Contour i contains all nodes with f = f_i, where f_i < f_{i+1}.

Every node with f(n) = g(n) + h(n) <= C* will eventually be expanded.
A* terminates even on a locally finite graph : completeness.
Monotonicity (Consistency)

A heuristic function is monotone if, for all states ni and nj = suc(ni),
h(ni) - h(nj) <= cost(ni, nj), and h(goal) = 0.

A monotone heuristic is admissible.
Uniform Cost Search's f-contours
(figure: contours between Start and Goal)

A*'s f-contours
(figure: contours between Start and Goal)

Greedy's f-contours
(figure: contours between Start and Goal)
More Informedness (Dominance)

For two admissible heuristics h1 and h2, h2 is more informed than h1 if h1(n) <= h2(n) for all n.
For the 8-tile problem:
h1 : # of misplaced tiles
h2 : sum of Manhattan distances

Combining several admissible heuristics: h(n) = max{ h1(n), …, hk(n) }

0 <= h1(n) <= h2(n) <= h*(n)
Semi-Admissible Heuristics and Risky Heuristics

If h(n) <= (1 + ε) h*(n), then C(n) <= (1 + ε) C*(n) : ε-admissible.
A small cost sacrifice saves a lot of search computation.
Semi-admissible heuristics save a lot in difficult problems, in cases where the costs of solution paths are quite similar.

Use of non-admissible heuristics, with risk:
• Utilize heuristic functions which are admissible in most cases
• Statistically obtained heuristics

PEARL, J., AND KIM, J. H. Studies in semi-admissible heuristics. IEEE Trans. PAMI-4, 4 (1982), 392-399.
Dynamic Use of Heuristics

f(n) = g(n) + h(n) + ε[1 - d(n)/N] h(n)
d(n) : depth of node n
N : (expected) depth of the goal
At shallow levels : depth-first excursion
At deep levels : assumes admissibility

Modify heuristics during search: utilize information obtained early in the search process to modify the heuristics used in the later part of the search.
Inventing Admissible Heuristics : Relaxed Problems

An admissible heuristic is the exact solution cost of a relaxed problem.

8-puzzle:
• A tile can jump – number of misplaced tiles
• A tile can move to an adjacent cell even if it is occupied – Manhattan distance heuristic
Automatic heuristic generator: ABSOLVER (Prieditis, 1993)

Traveling Salesperson Problem:
• Cost of a minimum spanning tree < cost of a TSP tour
• A minimum spanning tree can be computed in O(n²)
Inventing Admissible Heuristics : Subproblems

Solution costs of subproblems:
• Take the max of the heuristics of sub-problems stored in a pattern database
  1/1,000 of the nodes are generated in the 15-puzzle, compared with the Manhattan heuristic
• Disjoint sub-problems
  1/10,000 in the 24-puzzle, compared with Manhattan
Iterative Deepening A*

The iterative-deepening version of A*: use a threshold on f as the depth bound. Find a solution whose f(.) is under the threshold; increase the threshold to the minimum f(.) that exceeded it in the previous cycle.
Still admissible, with the same order of node expansion. Storage efficient – practical.
But it suffers with real-valued f(.) : a large number of iterations.
Iterative Deepening A* Search Algorithm (for tree search)

1. Set threshold = h(s); put s in OPEN; compute f(s).
2. If OPEN is empty, set the threshold to the minimum f value among the successors cut off in this cycle and go to step 1.
3. Remove the node of OPEN whose f value is smallest and put it in CLOSED (call it n).
4. If n is a goal, succeed.
5. Expand n; compute f of each successor. Put a successor into OPEN, with a pointer back to n, only if f(successor) < threshold. Go to step 2.
Memory-Bounded Heuristic Search

Recursive best-first search: a variation of depth-first search
• Keep track of the f-value of the best alternative path
• Unwind if the f-values of all children exceed the best alternative
• When unwinding, store the f-value of the best child as the node's f-value
• When needed, the parent regenerates its children again

Memory-bounded A*
• When OPEN is full, delete the worst node from OPEN, storing its f-value in its parent
• The deleted node is regenerated when all other candidates look worse than it
Performance Measure

Penetrance: how much the search algorithm focuses on the goal, rather than wandering off in irrelevant directions
P = L / T
L : depth of the goal
T : total number of nodes expanded

Effective branching factor (B):
B + B² + B³ + … + B^L = T
Less dependent on L
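Since the left-hand side grows monotonically in B, the equation can be solved numerically; a bisection sketch (the sample T and L values are hypothetical):

```python
def effective_branching_factor(T, L, tol=1e-9):
    """Solve B + B^2 + ... + B^L = T for B by bisection (assumes T > L)."""
    def total(B):
        return sum(B**i for i in range(1, L + 1))
    lo, hi = 1.0, float(T)        # total(1) = L < T, total(T) > T
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < T:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical run: 52 nodes expanded to reach a goal at depth 5
print(round(effective_branching_factor(52, 5), 3))
```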
Local Search and Optimization
Local Search is Irrevocable

Local search = irrevocable search; less memory required; reasonable solutions in large (continuous) space problems.
Can be formulated as searching for an extreme value of an objective function:
find i = ARGMAX { Obj(pi) }, where pi is a parameter
Search for Optimal Parameter

Deterministic methods: step-by-step procedures
• Hill-climbing search, gradient search
• e.g., the error back-propagation algorithm for finding the optimal weight matrix in neural network training

Stochastic methods: iteratively improve parameters
• Pseudo-random change; retain it if it improves
• Metropolis algorithm, Simulated Annealing algorithm, Genetic Algorithm
Hill Climbing Search

1. Set n to be the initial node.
2. If obj(n) > max { obj(child_i(n)) }, then exit.
3. Set n to be the highest-valued child of n.
4. Return to step 2.

No previous-state information, no backtracking, no jumping.

Gradient search: hill climbing with continuous, differentiable functions. What step width? Slow near the optimum.
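The four steps above, as a sketch over a hypothetical 1-D discrete objective (neighbors are ±1 steps):

```python
def hill_climbing(start, obj, children):
    """Climb until no child improves on the current node (steps 1-4 above)."""
    n = start
    while True:
        best = max(children(n), key=obj, default=None)
        if best is None or obj(best) <= obj(n):
            return n                  # local maximum reached: exit
        n = best                      # move to the highest-valued child

# Hypothetical objective with a single peak at x = 3
obj = lambda x: -(x - 3) ** 2
print(hill_climbing(0, obj, lambda x: [x - 1, x + 1]))  # 3
```

On this single-peak objective the climb always reaches the global maximum; with several peaks it stops at whichever local maximum it reaches first.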
State Space Landscape

(figure: a real-world objective landscape)
Hill-Climbing : Drawbacks

• Local maxima
• Ridges
• Straying on a plateau; slow on a plateau
• Determination of a proper step size

Cure: random restart — good only when there are few local maxima.

(figure: landscape with local maxima and the global maximum)
Local Beam Search

Keep track of the best k states, instead of 1 as in hill-climbing. Full utilization of the given memory.

Variation: stochastic beam search — select the k successors randomly.
Iterative Improvement Algorithm

Basic idea: start with an initial setting; generate a random solution; iteratively improve its quality.

Good for hard, practical problems, because it keeps only the current state and does no look-ahead beyond neighbors.

Advanced implementations:
• Metropolis algorithm
• Simulated Annealing algorithm
• Genetic algorithm
Metropolis Algorithm

A Monte Carlo method (statistical simulation with random numbers). The objective is to reach the state minimizing an energy function. Instead of always going downhill, try to go downhill 'most of the time'; escape local minima by allowing some "bad" moves.

1. Randomly generate a new state Y from state X.
2. If ΔE (the energy difference between Y and X) < 0, then move to Y (set X := Y) and go to 1.
3. Else:
   3.1 select a random number ξ
   3.2 if ξ < exp(-ΔE / T), then move to Y (set X := Y) and go to 1
   3.3 else go to 1
From Statistical Mechanics

In thermal equilibrium, the probability of state i is

π_i = (1/Z) exp( -E_i / (k_B T) )

E_i : energy of state i
T : absolute temperature
k_B : Boltzmann constant

In a neural network, define

π_i = (1/Z) exp( -E_i / T )

so that  π_j / π_i = exp( -ΔE / T ),  where  ΔE = E_j - E_i
Probability of State Transition by ΔE

(figure: acceptance probability p as a function of ΔE — p = 1 for ΔE <= 0, p = e^(-ΔE/T) for ΔE > 0)
Simulated Annealing

Simulates the slow cooling of an annealing process. Applied to combinatorial optimization problems by S. Kirkpatrick ('83), to overcome the local-minima problem: escape local minima by allowing some "bad" moves, but gradually decrease their size and frequency. Widely used in VLSI layout, airline scheduling, etc.

What is annealing?
The process of slowly cooling down a compound or a substance. Slow cooling lets the substance stay around thermodynamic equilibrium, so the molecules reach their optimum conformation.
Simulated Annealing Algorithm

function Simulated-Annealing(problem, schedule) returns a solution state
  inputs: problem, a problem
  local variables: current, a node
                   next, a node
                   T, a "temperature" controlling the probability of downward steps

  current <- Make-Node(Initial-State[problem])
  for t <- 1 to infinity do
    T <- schedule[t]
    if T = 0 then return current
    next <- a randomly selected successor of current
    ΔE <- Value[next] - Value[current]
    if ΔE > 0 then current <- next
    else current <- next only with probability e^(ΔE/T)
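A runnable version of the function above; the objective function, the random-step successor, and the geometric cooling schedule T <- 0.95·T are hypothetical choices, not part of the original algorithm:

```python
import math
import random

def simulated_annealing(initial, value, successor, t0=10.0, alpha=0.95):
    """Maximize value(); accept downhill moves with probability e^(dE/T)."""
    current = initial
    T = t0
    while T > 1e-6:                    # T -> 0 ends the search
        nxt = successor(current)       # a randomly selected successor
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt              # uphill always; downhill sometimes
        T *= alpha                     # cooling schedule (geometric, assumed)
    return current

# Hypothetical objective with several local maxima; successors are small steps
random.seed(0)
value = lambda x: -(x * x) + 4 * math.cos(3 * x)
step = lambda x: x + random.uniform(-1, 1)
print(round(simulated_annealing(5.0, value, step), 3))
```

Early on (large T) even sizable downhill moves are accepted; as T shrinks the loop degenerates into hill climbing, matching the slides that follow.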
Simulated Annealing Parameters

Temperature T: used to determine the acceptance probability
• High T : large changes
• Low T : small changes

Cooling schedule: determines the rate at which the temperature T is lowered. If T is lowered slowly enough, the algorithm will find a global optimum. In the beginning, search alternatives aggressively; become conservative as time goes by.
Simulated Annealing Cooling Schedule

If T_t is reduced too fast, the solution quality is poor.
If T_t >= T(0) / log(1+t) (Geman), the system will converge to the minimum-energy configuration.
T_t = k / (1+t) (Szu)
T_t = a·T(t-1), where a is between 0.8 and 0.99

(figure: T(t) decreasing from T0 to Tf over time t)
Tips for Simulated Annealing

To avoid entrapment in local minima, the annealing schedule is set by trial and error:
• Choice of initial temperature
• How many iterations are performed at each temperature
• How much the temperature is decremented at each step as cooling proceeds

Difficulties: determination of parameters
• If cooling is too slow : too much time to get a solution
• If cooling is too rapid : the solution may not be the global optimum
Iterative Algorithm Comparison

Simple iterative algorithm:
1. Find a solution s.
2. Make s', a variation of s.
3. If s' is better than s, keep s' as s.
4. Go to 2.

Metropolis algorithm: step 3' — if (s' is better than s) or (within an acceptance probability), then keep s' as s; with fixed T.

Simulated annealing: T is reduced to 0 by a schedule as time passes.
Simulated Annealing : Local Maxima

Higher probability of escaping local maxima (at high temperature).
Little chance of escaping local maxima (at low temperature), but a local maximum may be good enough in practical problems.