Evolving Hyper-Heuristics using Genetic Programming
Supervisor: Moshe Sipper
Achiya Elyasaf
Overview
Introduction
• Searching Games State-Graphs
  • Uninformed Search
  • Heuristics
  • Informed Search
Evolving Heuristics
Previous Work
• Rush Hour
• FreeCell
Representing Games as State-Graphs
Every puzzle/game can be represented as a state graph:
• In puzzles, board games, etc., every piece move leads to a different state
• In computer war games, etc., the positions of the player and the enemy, together with all the parameters (health, shield…), define a state
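As a minimal sketch of the idea, a state can be any hashable value and the graph can be given implicitly by a successor function. The toy one-row sliding puzzle and the names `State` and `successors` below are assumptions chosen for illustration; they do not come from the talk.

```python
from typing import Iterator, Tuple

# Hypothetical encoding: a row of tiles, where 0 marks the blank cell.
State = Tuple[int, ...]

def successors(state: State) -> Iterator[State]:
    """Yield every state reachable in one move (slide a neighbour into the blank)."""
    blank = state.index(0)
    for neighbour in (blank - 1, blank + 1):
        if 0 <= neighbour < len(state):
            tiles = list(state)
            tiles[blank], tiles[neighbour] = tiles[neighbour], tiles[blank]
            yield tuple(tiles)

# Each move produces a new node of the state graph.
print(list(successors((1, 0, 2))))
```

The graph is never stored explicitly; edges are generated on demand, which is what makes searching very large game graphs possible at all.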
Rush-Hour as a state-graph
Searching Games State-Graphs: Uninformed Search
BFS: exponential (in time and memory) in the search depth
DFS: memory is linear in the length of the current search path. BUT:
• We might “never” track down the right path
• Games usually contain cycles
Iterative Deepening: a combination of BFS & DFS
• In each iteration, a DFS with a depth limit is performed
• The limit grows from one iteration to the next
• Worst case: traverse the entire graph
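The iterative-deepening scheme above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function names, the `max_depth` cap, and the on-path cycle check are assumptions.

```python
def depth_limited(state, is_goal, successors, limit, path):
    """DFS that gives up once the depth limit is exhausted."""
    if is_goal(state):
        return path
    if limit == 0:
        return None
    for nxt in successors(state):
        if nxt in path:  # skip states already on the current path (cycle check)
            continue
        found = depth_limited(nxt, is_goal, successors, limit - 1, path + [nxt])
        if found is not None:
            return found
    return None

def iterative_deepening(start, is_goal, successors, max_depth=50):
    """Run depth-limited DFS with a limit that grows by one each iteration."""
    for limit in range(max_depth + 1):
        found = depth_limited(start, is_goal, successors, limit, [start])
        if found is not None:
            return found
    return None

# Tiny graph with a cycle (A <-> B); the goal G is two moves from A.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["G"], "D": ["A"], "G": []}
print(iterative_deepening("A", lambda s: s == "G", lambda s: graph[s]))
```

Because the limit grows one step at a time, the first solution found is a shallowest one, while memory stays linear in the current path length.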
Searching Games State-Graphs: Uninformed Search
Most game domains are PSPACE-complete!
Worst case: traverse the entire graph
We need an informed search!
Searching Games State-Graphs: Heuristics
h: states -> Real
• For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution
• If h is perfect, an informed search that tries the states with the best h-score (lowest estimated distance) first will simply stroll to the solution
• For hard problems, finding h is hard
• A bad heuristic means the search might never track down the solution
We need a good heuristic function to guide the informed search
Searching Games State-Graphs: Informed Search
Best-First search: like DFS, but selects the nodes with the best heuristic value first
• Not necessarily optimal
• Might enter cycles (local extremum)
A*:
• Holds a closed list and an open list sorted by f-value (path cost so far, g, plus heuristic value, h); the best of all open nodes is selected
• Maintaining the open and closed lists is costly, and their size can become prohibitive
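A compact sketch of A* with an open list (a heap ordered by f = g + h) and a closed table may make the bookkeeping concrete. The function signature, the `(next_state, step_cost)` successor convention, and the example graph are assumptions made for this illustration.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """successors(state) yields (next_state, step_cost) pairs."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_heap = [(h(start), next(tie), 0, start, [start])]
    closed = {}  # best g-value with which each state has been expanded
    while open_heap:
        _, _, g, state, path = heapq.heappop(open_heap)
        if is_goal(state):
            return path
        if state in closed and closed[state] <= g:
            continue  # already expanded through a path at least as cheap
        closed[state] = g
        for nxt, cost in successors(state):
            g2 = g + cost
            if nxt not in closed or g2 < closed[nxt]:
                heapq.heappush(open_heap, (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None

# Hypothetical weighted graph and heuristic estimates.
edges = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
estimate = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star("A", lambda s: s == "D", lambda s: edges[s], lambda s: estimate[s]))
```

Both `open_heap` and `closed` grow with the number of generated states, which is the memory cost the slide warns about.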
Searching Games State-Graphs: Informed Search (Cont.)
IDA*: Iterative Deepening with A*
• The expanded nodes are pushed onto the DFS stack in descending order of heuristic value
• Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) < depth-limit are visited
Near-optimal solution (depends on the path limit)
The heuristic needs to be admissible
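The following is a hedged sketch of IDA* in its common textbook form. It uses the test f(s) <= bound and raises the bound to the smallest f-value that exceeded it, which may differ in detail from the variant described on the slide; all names and the example graph are assumptions.

```python
import math

def ida_star(start, is_goal, successors, h):
    """successors(state) yields (next_state, step_cost) pairs."""
    def search(path, g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f, None  # report the f-value that broke the bound
        if is_goal(state):
            return f, list(path)
        smallest = math.inf
        # Try the most promising children (lowest heuristic value) first.
        for nxt, cost in sorted(successors(state), key=lambda sc: h(sc[0])):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            t, found = search(path, g + cost, bound)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h(start)
    while bound < math.inf:
        bound, found = search([start], 0, bound)
        if found is not None:
            return found
    return None

edges = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
estimate = {"A": 3, "B": 2, "C": 1, "D": 0}
print(ida_star("A", lambda s: s == "D", lambda s: edges[s], lambda s: estimate[s]))
```

Only the current path is kept in memory, so the open and closed lists of A* are traded for repeated re-expansion of shallow nodes.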
Evolving Heuristics
Given building blocks H1, … , Hn (not necessarily admissible or in the same range), how should we choose the fittest heuristic?
• Minimum? Maximum? Linear combination?
GA/GP may be used for:
• Building new heuristics from existing building blocks
• Finding weights for each heuristic (for applying a linear combination)
• Finding conditions for applying each heuristic
  • H should probably fit the stage of the search
  • E.g., “goal” heuristics when assuming we’re close
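The three simple ways of combining building blocks mentioned on this slide (minimum, maximum, linear combination) can be sketched as higher-order functions. The helper names and the toy heuristics are hypothetical; in the GA setting, the weight vector would be what evolution searches over.

```python
def combine_min(heuristics):
    """Return a heuristic that takes the minimum of the building blocks."""
    return lambda state: min(h(state) for h in heuristics)

def combine_max(heuristics):
    """Return a heuristic that takes the maximum of the building blocks."""
    return lambda state: max(h(state) for h in heuristics)

def combine_linear(heuristics, weights):
    """Return the weighted sum W1*H1 + ... + Wn*Hn."""
    return lambda state: sum(w * h(state) for w, h in zip(weights, heuristics))

# Toy building blocks over integer "states" (assumptions for illustration).
h1 = lambda s: 2.0 * s
h2 = lambda s: s + 1.0

print(combine_min([h1, h2])(3))
print(combine_max([h1, h2])(3))
print(combine_linear([h1, h2], [0.5, 0.25])(3))
```

When the building blocks are in different ranges, the weights also act as a rescaling, which is one reason a fixed minimum or maximum may be a poor choice.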
Evolving Heuristics: GA
Genotype: W1=0.3 W2=0.01 W3=0.2 … Wn=0.1
Phenotype: [figure not preserved in the transcript]
Evolving Heuristics: GP
Example tree:
If
  Condition: And(H1 ≤ 0.4, H2 ≥ 0.7)
  True: H2 + (H1 * 0.1)
  False: H5 * (H1 / 0.1)
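One way to read the tree on this slide is as a nested expression: if (H1 ≤ 0.4 and H2 ≥ 0.7), return H2 + H1 * 0.1, otherwise H5 * (H1 / 0.1). The sketch below evaluates such trees encoded as nested tuples. The encoding, the operator spellings, and the protected division (returning 1.0 on a zero divisor) are assumptions, not details given in the talk.

```python
import operator

def protected_div(a, b):
    # Assumption: a common GP convention to keep division total.
    return a / b if b != 0 else 1.0

FUNCTIONS = {
    "+": operator.add,
    "*": operator.mul,
    "/": protected_div,
    "<=": operator.le,
    ">=": operator.ge,
}

def evaluate(node, values):
    """Evaluate a tree of nested tuples; strings name heuristics, numbers are constants."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return values[node]
    op, *args = node
    if op == "if":
        condition, if_true, if_false = args
        return evaluate(if_true if evaluate(condition, values) else if_false, values)
    if op == "and":
        return all(evaluate(arg, values) for arg in args)
    return FUNCTIONS[op](*(evaluate(arg, values) for arg in args))

TREE = ("if",
        ("and", ("<=", "H1", 0.4), (">=", "H2", 0.7)),
        ("+", "H2", ("*", "H1", 0.1)),
        ("*", "H5", ("/", "H1", 0.1)))

# Heuristic values for one hypothetical state.
print(evaluate(TREE, {"H1": 0.2, "H2": 0.8, "H5": 0.5}))
```

Genetic operators would then act on the tuples themselves, for example by swapping subtrees between two individuals.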
Evolving Heuristics: Policies
Condition      Result
Condition 1    Heuristics Weights 1
Condition 2    Heuristics Weights 2
...            ...
Condition n    Heuristics Weights n
Default        Heuristics Weights
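A policy of this shape can be sketched as an ordered rule list. The sketch assumes that the first rule whose condition holds decides which weight vector is used, and that the default weights apply when no condition holds; the slide does not state the matching order, so treat that as an assumption, along with all names below.

```python
def policy_heuristic(rules, default_weights, heuristics):
    """rules: list of (condition, weights) pairs, checked in order."""
    def h(state):
        weights = default_weights
        for condition, rule_weights in rules:
            if condition(state):
                weights = rule_weights
                break  # assumption: the first matching rule wins
        return sum(w * hi(state) for w, hi in zip(weights, heuristics))
    return h

# Toy building blocks and rules over integer "states" (hypothetical).
heuristics = [lambda s: float(s), lambda s: 10.0]
rules = [
    (lambda s: s < 5, [1.0, 0.0]),   # "Condition 1" -> "Heuristics Weights 1"
    (lambda s: s < 20, [0.5, 0.5]),  # "Condition 2" -> "Heuristics Weights 2"
]
h = policy_heuristic(rules, [0.0, 1.0], heuristics)
print(h(3), h(10), h(100))
```

This lets the weighting change with the stage of the search, for example by conditioning on the depth g or on a phase estimate.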
Evolving Heuristics: Fitness Function
\[
\text{fitness} =
\begin{cases}
\text{search-node reduction ratio}, & \text{if solution found with node reduction} \\
\max\{(1 - \text{FractionExcessNodes})/1000,\ 0\}, & \text{if solution found without node reduction} \\
0, & \text{if no solution found}
\end{cases}
\]
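The three cases of the fitness function can be written directly as a small function. The slide does not define how the search-node reduction ratio or FractionExcessNodes are computed, so the sketch takes both as inputs; the parameter names are assumptions.

```python
def fitness(solved, node_reduction, reduction_ratio=0.0, fraction_excess_nodes=0.0):
    """Piecewise fitness of one solver on one problem.

    solved: whether a solution was found.
    node_reduction: whether fewer nodes were used than by the baseline search.
    reduction_ratio, fraction_excess_nodes: supplied by the caller, since their
    exact definitions are not given in the transcript.
    """
    if not solved:
        return 0.0
    if node_reduction:
        return reduction_ratio
    return max((1.0 - fraction_excess_nodes) / 1000.0, 0.0)

print(fitness(False, False))
print(fitness(True, True, reduction_ratio=0.4))
print(fitness(True, False, fraction_excess_nodes=0.5))
```

The division by 1000 keeps any solution found without node reduction ranked above a failure but far below a solution that does reduce the node count.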
Rush Hour
GP-Rush [Hauptman et al, 2009]
Bronze Humie award
Domain-Specific Heuristics
Hand-Crafted Heuristics / Guides:
• Blocker estimation: lower bound (admissible)
• Goal distance: Manhattan distance
• Hybrid blockers distance: combines the above two
• Is Move To Secluded: did the car enter a secluded area?
• Is Releasing Move
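To make the first two heuristics concrete, here is a simplified sketch on a hypothetical board encoding: a list of strings, "." for an empty cell, "R" for the red car lying on the exit row, and the exit at the right edge. Neither the encoding nor these exact formulas come from the talk; the blocker count shown is only the simplest form of such an estimate.

```python
def goal_distance(board, exit_row):
    """Number of cells between the front of the red car and the exit."""
    row = board[exit_row]
    return len(row) - 1 - row.rindex("R")

def blockers(board, exit_row):
    """Number of distinct vehicles standing between the red car and the exit."""
    row = board[exit_row]
    front = row.rindex("R")
    return len({cell for cell in row[front + 1:] if cell != "."})

BOARD = [
    "..A...",
    "..A..B",
    "RRA..B",
    "......",
]
print(goal_distance(BOARD, 2), blockers(BOARD, 2))
```

Each blocking vehicle has to move at least once before the red car can leave, which is why a count of this kind never overestimates the number of moves still needed.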
Policy “Ingredients”
Functions & Terminals:
Terminals
• Conditions: IsMoveToSecluded, isReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, … , 0.9, 1
• Results: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, … , 0.9, 1
Sets
• Conditions: If, AND, OR, ≤, ≥
• Results: +, *
Coevolving (Hard) 8x8 Boards
[Figure: 8x8 Rush Hour board diagrams]
Results
Average reduction of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:
Problem   ID     H1    H2    H3    Hc    Policy
6x6       100%   28%   6%    -2%   30%   60%
8x8       100%   31%   25%   30%   50%   90%
Results (cont’d)
Time (in seconds) required to solve problems JAM01 . . . JAM40:
FreeCell
FreeCell remained relatively obscure until Windows 95
The Microsoft 32K set contains 32,000 problems, all solvable except for game #11982, which has been proven to be unsolvable
Evolving hyper heuristic-based solvers for Rush-Hour and FreeCell [Hauptman et al, SOCS 2010]
GA-FreeCell: Evolving Solvers for the Game of FreeCell [Elyasaf et al, GECCO 2011]
FreeCell (cont’d)
As opposed to Rush Hour, blind search failed miserably
The best published solver to date solves 96% of Microsoft 32K
Reasons:
• High branching factor
• Hard to generate a good heuristic
Learning Methods: Random Deals
Which deals should we use for training?
First method tested - random deals
• This is what we did in Rush Hour
• Here it yielded poor results
• Very hard domain
Learning Methods: Gradual Difficulty
Second method tested - gradual difficulty
• Sort the problems by difficulty
• In each generation, test the solvers against 5 deals from the current difficulty level + 1 random deal
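The per-generation choice of training deals can be sketched as below. The slide does not say where the extra random deal is drawn from or when the difficulty level advances; the sketch assumes the random deal comes from the whole pool and leaves level advancement to the caller. All names are hypothetical.

```python
import random

def training_deals(levels, current_level, rng, per_level=5, extra_random=1):
    """levels: list of deal lists, ordered from easiest to hardest."""
    chosen = rng.sample(levels[current_level], per_level)
    # Assumption: the extra deal is drawn from all levels together.
    all_deals = [deal for level in levels for deal in level]
    chosen += rng.sample(all_deals, extra_random)
    return chosen

# Two hypothetical difficulty levels of ten deal numbers each.
levels = [list(range(1, 11)), list(range(11, 21))]
print(training_deals(levels, current_level=1, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps a run reproducible, which helps when comparing learning methods.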
Learning Methods: Hillis-Style Coevolution
Third method tested - Hillis-style coevolution using a “Hall-of-Fame”:
• The deal population is composed of 40 deals (= 40 individuals) + 10 deals that represent a hall-of-fame
• Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals
Evolved hyper-heuristics failed to solve almost all of Microsoft 32K! Why?
Learning Methods: Rosin-style Coevolution
Fourth method tested - Rosin-style coevolution:
• Each deal individual consists of 6 deals
• Mutation and crossover:
[Figure: crossover and mutation applied to two deal individuals, p1 = (11897, 3042, 23845, 7364, 17987, 5984) and p2 = (28371, 18923, 9834, 12, 30011, 13498)]
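As a rough illustration of genetic operators on deal individuals, the sketch below uses one-point crossover and a mutation that replaces one deal with a random deal number. The exact operators used in the work are not recoverable from the transcript, so both are assumptions, as are the function names and the 1 to 32000 deal range.

```python
import random

def crossover(p1, p2, cut):
    """One-point crossover: swap the tails of two deal individuals."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(individual, rng, max_deal=32000):
    """Replace one randomly chosen deal with a random deal number."""
    child = list(individual)
    child[rng.randrange(len(child))] = rng.randint(1, max_deal)
    return child

p1 = [11897, 3042, 23845, 7364, 17987, 5984]
p2 = [28371, 18923, 9834, 12, 30011, 13498]
c1, c2 = crossover(p1, p2, cut=4)
print(c1)
print(c2)
print(mutate(p1, random.Random(0)))
```

Because every individual is a set of 6 deals, a solver is always evaluated on several deals at once, whichever individual it is paired with.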
Results
Learning Method          Run     Node Reduction  Time Reduction  Length Reduction  Solved
-                        HSD     100%            100%            100%              96%
Gradual Difficulty       GA-1    23%             31%             1%                71%
Gradual Difficulty       GA-2    27%             30%             -3%               70%
Gradual Difficulty       GP      -               -               -                 -
Gradual Difficulty       Policy  28%             36%             6%                36%
Rosin-style coevolution  GA      87%             93%             41%               98%
Rosin-style coevolution  Policy  89%             90%             40%               99%
Thank you for listening
Any questions?