Evolving Hyper-Heuristics using Genetic Programming

Supervisor: Moshe Sipper

Achiya Elyasaf

Overview

Introduction
• Searching Games State-Graphs
  • Uninformed Search
  • Heuristics
  • Informed Search

Evolving Heuristics

Previous Work
• Rush Hour
• FreeCell

Representing Games as State-Graphs

Every puzzle or game can be represented as a state graph:
• In puzzles, board games, etc., every piece move can be counted as a transition to a different state
• In computer war games, etc., the positions of the player and the enemy, together with all the parameters (health, shield, …), define a state


Rush-Hour as a state-graph

Searching Games State-Graphs: Uninformed Search

BFS – exponential in the search depth

DFS – memory linear in the length of the current search path. BUT:
• We might “never” track down the right path
• Games usually contain cycles

Iterative Deepening: a combination of BFS & DFS
• Each iteration performs a DFS with a depth limit
• The limit grows from one iteration to the next
• Worst case: traverse the entire graph
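The iterative-deepening idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the toy graph `GRAPH`, the function names, and the `max_depth` cap are all assumptions made for the example. Cycles are avoided only along the current path, which keeps memory linear in the path length.

```python
def depth_limited_dfs(state, is_goal, successors, limit, path):
    """DFS that gives up once `limit` moves have been made from the start."""
    if is_goal(state):
        return list(path)
    if limit == 0:
        return None
    for nxt in successors(state):
        if nxt in path:  # skip states already on the current path (cycles)
            continue
        path.append(nxt)
        found = depth_limited_dfs(nxt, is_goal, successors, limit - 1, path)
        path.pop()
        if found is not None:
            return found
    return None


def iterative_deepening(start, is_goal, successors, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(start, is_goal, successors, limit, [start])
        if result is not None:
            return result
    return None


# Hypothetical toy state graph containing a cycle (A <-> B).
GRAPH = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D"], "D": ["E"], "E": []}

solution = iterative_deepening("A", lambda s: s == "E", lambda s: GRAPH[s])
```

Because the limit grows by one per iteration, the first solution found uses the fewest moves, as BFS would, while each iteration only stores the current path, as DFS does.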

Searching Games State-Graphs: Uninformed Search (Cont.)

Most of the game domains are PSPACE-Complete!

Worst case: traverse the entire graph

We need an informed search!

Searching Games State-Graphs: Heuristics

h: states -> Real
• For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution
• If h is perfect, an informed search that tries the states with the best h-score first will simply stroll to the solution
• For hard problems, finding h is hard
• A bad heuristic means the search might never track down the solution

We need a good heuristic function to guide the informed search

Searching Games State-Graphs: Informed Search

Best-First search: like DFS, but selects the nodes with the best heuristic value first
• Not necessarily optimal
• Might enter cycles (local extremum)

A*:
• Holds a closed list and an open list sorted by f-value (f = g + h, where g is the cost so far); the best of all open nodes is selected
• The maintenance cost and size of the open and closed lists can be prohibitive
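A compact A* sketch makes the open and closed lists concrete. It is an illustration under assumptions, not the code used in the talk: the 3x3 grid with two walls, the Manhattan heuristic, and all names are invented for the example. The open list is a binary heap ordered by f = g + h, and the closed list is a set.

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable."""
    open_heap = [(h(start), 0, start)]  # open list, ordered by f = g + h
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, state = heapq.heappop(open_heap)
        if state in closed:
            continue  # stale duplicate of an already expanded state
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], g
        closed.add(state)
        for nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = state
                closed.discard(nxt)  # reopen if a cheaper route was found
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None, float("inf")


# Hypothetical 3x3 grid with two wall cells; every move costs 1.
SIZE, WALLS, GOAL = 3, {(1, 0), (1, 1)}, (2, 0)


def grid_neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


path, cost = a_star((0, 0), GOAL, grid_neighbors, manhattan)
```

Both `best_g` and `closed` grow with the number of states reached, which is the memory cost the slide warns about.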

Searching Games State-Graphs: Informed Search (Cont.)

IDA*: Iterative Deepening with A*
• The expanded nodes are pushed to the DFS stack by descending heuristic values
• Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) < depth-limit are visited

Near-optimal solution (depends on the path limit)

The heuristic needs to be admissible
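A minimal IDA* sketch follows, written under assumptions: the grid, the Manhattan heuristic, and all names are hypothetical, and the code uses the common formulation in which a node is visited when f(s) ≤ threshold and the next threshold is the smallest f-value that exceeded the current one. Children are tried in ascending order of h, which corresponds to pushing them on a stack in descending order.

```python
import math


def ida_star(start, is_goal, neighbors, h):
    """Return (path, cost), or (None, inf) if no solution exists."""
    path = [start]

    def search(g, threshold):
        state = path[-1]
        f = g + h(state)
        if f > threshold:
            return False, f  # report the f-value that broke the limit
        if is_goal(state):
            return True, g
        next_threshold = math.inf
        for nxt, cost in sorted(neighbors(state), key=lambda item: h(item[0])):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            found, value = search(g + cost, threshold)
            if found:
                return True, value
            path.pop()
            next_threshold = min(next_threshold, value)
        return False, next_threshold

    threshold = h(start)
    while True:
        found, value = search(0, threshold)
        if found:
            return list(path), value
        if value == math.inf:
            return None, math.inf
        threshold = value


# Hypothetical 3x3 grid with two wall cells; every move costs 1.
SIZE, WALLS, GOAL = 3, {(1, 0), (1, 1)}, (2, 0)


def grid_neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


path, cost = ida_star((0, 0), lambda c: c == GOAL, grid_neighbors, manhattan)
```

Only the current path is stored, so memory stays linear in the solution depth, at the price of re-expanding shallow nodes in every iteration.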


Evolving Heuristics

For H1, …, Hn – building blocks (not necessarily admissible or in the same range), how should we choose the fittest heuristic?
• Minimum? Maximum? Linear combination?

GA/GP may be used for:
• Building new heuristics from existing building blocks
• Finding weights for each heuristic (for applying a linear combination)
• Finding conditions for applying each heuristic
  • H should probably fit the stage of the search
  • E.g., “goal” heuristics when assuming we’re close

Evolving Heuristics: GA

Genotype – W1=0.3 W2=0.01 W3=0.2 … Wn=0.1

Phenotype – [formula not preserved]
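One plausible reading of the GA encoding, given the earlier mention of linear combinations, is that a weight vector is decoded into a weighted sum of the building-block heuristics. The sketch below assumes exactly that; the two toy heuristics and the dictionary-based state are hypothetical and stand in for real domain heuristics.

```python
def make_weighted_heuristic(weights, heuristics):
    """Decode a weight vector (genotype) into a heuristic function."""
    if len(weights) != len(heuristics):
        raise ValueError("need exactly one weight per heuristic")

    def combined(state):
        # Assumed phenotype: W1*H1(s) + W2*H2(s) + ... + Wn*Hn(s)
        return sum(w * h(state) for w, h in zip(weights, heuristics))

    return combined


# Hypothetical building blocks reading precomputed features of a state.
def blockers(state):
    return state["blockers"]


def distance(state):
    return state["distance"]


h = make_weighted_heuristic([0.3, 0.2], [blockers, distance])
value = h({"blockers": 2, "distance": 5})
```

A GA would then evolve only the list of weights, while the building blocks stay fixed.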

Evolving Heuristics: GP

If
  Condition: And(H1 ≤ 0.4, H2 ≥ 0.7)
  True: H2 + (H1 * 0.1)
  False: H5 * (H1 / 0.1)
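The GP individual on this slide is an expression tree. A small recursive evaluator shows how such a tree could be turned into a number for a given state. The nested-tuple encoding, the operator names, and the sample heuristic values are assumptions made for illustration; the tree shape follows one reading of the slide (an If node with an And condition, a True branch, and a False branch).

```python
import operator

BINARY = {
    "<=": operator.le,
    ">=": operator.ge,
    "+": operator.add,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node, h_values):
    """Evaluate a tree given the heuristic values of the current state."""
    if isinstance(node, (int, float)):
        return node  # numeric constant
    if isinstance(node, str):
        return h_values[node]  # heuristic terminal such as "H1"
    op, *args = node
    if op == "if":
        condition, if_true, if_false = args
        branch = if_true if evaluate(condition, h_values) else if_false
        return evaluate(branch, h_values)
    if op == "and":
        return all(evaluate(arg, h_values) for arg in args)
    left, right = (evaluate(arg, h_values) for arg in args)
    return BINARY[op](left, right)


# The tree from the slide, as read from the extracted text.
TREE = (
    "if",
    ("and", ("<=", "H1", 0.4), (">=", "H2", 0.7)),
    ("+", "H2", ("*", "H1", 0.1)),
    ("*", "H5", ("/", "H1", 0.1)),
)

true_branch = evaluate(TREE, {"H1": 0.2, "H2": 0.8, "H5": 0.5})
false_branch = evaluate(TREE, {"H1": 0.6, "H2": 0.8, "H5": 0.5})
```

With H1 = 0.2 and H2 = 0.8 the condition holds, so the True branch H2 + H1 * 0.1 is used; with H1 = 0.6 the False branch is used instead.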

Evolving Heuristics: Policies

Condition      Result
Condition 1    Heuristics Weights 1
Condition 2    Heuristics Weights 2
…              …
Condition n    Heuristics Weights n
Default        Heuristics Weights
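A policy of this shape can be read as an ordered rule list. The sketch below assumes that the first condition that holds selects the weight vector and that the default weights apply when none holds; this first-match semantics, the toy heuristics, and the example rules are assumptions, not details given on the slide.

```python
def select_weights(rules, default_weights, state):
    """Return the weights of the first rule whose condition holds."""
    for condition, weights in rules:
        if condition(state):
            return weights
    return default_weights


def policy_heuristic(rules, default_weights, heuristics, state):
    """Weighted sum of the heuristics, using the weights the policy selects."""
    weights = select_weights(rules, default_weights, state)
    return sum(w * h(state) for w, h in zip(weights, heuristics))


# Hypothetical heuristics and rules over a feature-dictionary state.
HEURISTICS = [lambda s: s["blockers"], lambda s: s["distance"]]
RULES = [
    (lambda s: s["distance"] <= 2, [0.1, 0.9]),  # close to the goal
    (lambda s: s["blockers"] >= 3, [0.8, 0.2]),  # heavily blocked
]
DEFAULT = [0.5, 0.5]

near_goal = policy_heuristic(RULES, DEFAULT, HEURISTICS, {"blockers": 1, "distance": 2})
blocked = policy_heuristic(RULES, DEFAULT, HEURISTICS, {"blockers": 4, "distance": 5})
fallback = policy_heuristic(RULES, DEFAULT, HEURISTICS, {"blockers": 1, "distance": 5})
```

This lets different heuristic mixes apply at different stages of the search, for example a goal-oriented mix once the search is assumed to be close to a solution.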

Evolving Heuristics: Fitness Function

\[
\text{fitness} =
\begin{cases}
\text{search-node reduction ratio}, & \text{if solution found with node reduction} \\
\max\{(1 - \text{FractionExcessNodes})/1000,\ 0\}, & \text{if solution found without node reduction} \\
0, & \text{if no solution found}
\end{cases}
\]
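The three fitness cases on this slide can be written as a small function. This is a hedged sketch: the slide does not define how the search-node reduction ratio or FractionExcessNodes are computed, so both are taken as inputs here, and the parameter names are assumptions.

```python
def fitness(solved, node_reduction, reduction_ratio=0.0, fraction_excess_nodes=0.0):
    """Piecewise fitness as read from the slide.

    solved                -- whether the solver found a solution
    node_reduction        -- whether it searched fewer nodes than the baseline
    reduction_ratio       -- search-node reduction ratio (definition assumed given)
    fraction_excess_nodes -- FractionExcessNodes (definition assumed given)
    """
    if not solved:
        return 0.0
    if node_reduction:
        return reduction_ratio
    # Solved, but with more nodes than the baseline: a small score, never negative.
    return max((1.0 - fraction_excess_nodes) / 1000.0, 0.0)


scores = [
    fitness(False, False),
    fitness(True, True, reduction_ratio=0.4),
    fitness(True, False, fraction_excess_nodes=0.5),
    fitness(True, False, fraction_excess_nodes=2.0),
]
```

Dividing by 1000 keeps any solution found without node reduction ranked below solutions that do reduce the node count, while still ranking it above a failure whenever FractionExcessNodes is below 1.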


Rush Hour

GP-Rush [Hauptman et al, 2009]

Bronze Humie award


Domain-Specific Heuristics

Hand-Crafted Heuristics / Guides:

• Blocker estimation – lower bound (admissible)

• Goal distance – Manhattan distance

• Hybrid blockers distance – combine above two

• Is Move To Secluded – did the car enter a secluded area?

• Is Releasing Move

Policy “Ingredients”

Functions & Terminals:

Terminals
• Conditions: IsMoveToSecluded, IsReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSiblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Results: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1

Sets
• Conditions: If, AND, OR, ≤, ≥
• Results: +, *

Coevolving (Hard) 8x8 Boards

[Figure: 8x8 Rush Hour board diagrams; piece labels visible in the figure: RED, H, F, G, MP, I, S, K]

Results

Average reduction of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:

Problem \ Heuristic   ID     H1    H2    H3    Hc    Policy
6x6                   100%   28%   6%    -2%   30%   60%
8x8                   100%   31%   25%   30%   50%   90%


Results (cont’d)

Time (in seconds) required to solve problems JAM01 . . . JAM40:

FreeCell

FreeCell remained relatively obscure until Windows 95

There are 32,000 problems (known as Microsoft 32K), all solvable except for game #11982, which has been proven to be unsolvable

Evolving hyper heuristic-based solvers for Rush-Hour and FreeCell [Hauptman et al, SOCS 2010]

GA-FreeCell: Evolving Solvers for the Game of FreeCell [Elyasaf et al, GECCO 2011]

FreeCell (cont’d)

As opposed to Rush Hour, blind search failed miserably

The best published solver to date solves 96% of Microsoft 32K

Reasons:
• High branching factor
• Hard to generate a good heuristic

Learning Methods: Random Deals

Which deals should we use for training?

First method tested - random deals
• This is what we did in Rush Hour
• Here it yielded poor results
• Very hard domain

Learning Methods: Gradual Difficulty

Second method tested - gradual difficulty
• Sort the problems by difficulty
• Each generation, test solvers against 5 deals from the current difficulty level + 1 random deal
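The per-generation deal selection described here could look like the sketch below. It is an assumption-laden illustration: deals are taken to be pre-sorted into a list of difficulty levels, the random deal is drawn from all levels, and the function and parameter names are hypothetical.

```python
import random


def training_deals(levels, current_level, rng, per_level=5, extra_random=1):
    """Pick deals for one generation.

    levels        -- list of lists of deal numbers, easiest level first (assumed)
    current_level -- index of the difficulty level currently being trained on
    rng           -- a random.Random instance, passed in for reproducibility
    """
    from_level = rng.sample(levels[current_level], per_level)
    all_deals = [deal for level in levels for deal in level]
    # Assumption: the extra deal may come from any level, including the current one.
    anywhere = rng.sample(all_deals, extra_random)
    return from_level + anywhere


# Hypothetical difficulty levels with ten deal numbers each.
LEVELS = [list(range(1, 11)), list(range(11, 21)), list(range(21, 31))]
deals = training_deals(LEVELS, 1, random.Random(0))
```

The single random deal gives each generation some exposure to problems outside the current difficulty level.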

Learning Methods: Hillis-Style Coevolution

Third method tested - Hillis-style coevolution using a “Hall-of-Fame”:
• The deal population is composed of 40 deals (= 40 individuals) + 10 deals that represent a hall-of-fame
• Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals

Evolved hyper-heuristics failed to solve almost all of Microsoft 32K! Why?

Learning Methods: Rosin-style Coevolution

Fourth method tested - Rosin-style coevolution:
• Each deal individual consists of 6 deals
• Mutation and crossover:

p1: 11897 3042 23845 7364 | 17987 5984
p2: 28371 18923 9834 12 | 30011 13498

[Figure: crossover between p1 and p2, followed by a mutation of p1 in which the last deal, 5984, appears to be replaced by 2015]
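Genetic operators on deal individuals can be sketched as below, using the parents p1 and p2 shown on the slide. Several details are assumptions: one-point crossover at position 4 is inferred from the slide layout, mutation is taken to replace a single deal with a random one, and deal numbers are assumed to range over 1..32000 (Microsoft 32K).

```python
import random


def crossover(parent_a, parent_b, point):
    """One-point crossover: swap the tails of two deal individuals."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(individual, rng, max_deal=32000):
    """Replace one randomly chosen deal with a random deal number."""
    child = list(individual)
    child[rng.randrange(len(child))] = rng.randint(1, max_deal)
    return child


P1 = [11897, 3042, 23845, 7364, 17987, 5984]
P2 = [28371, 18923, 9834, 12, 30011, 13498]

c1, c2 = crossover(P1, P2, point=4)
mutant = mutate(P1, random.Random(1))
```

Each individual keeps its length of 6 deals under both operators, so every hyper-heuristic is always evaluated on the same number of deals.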

Results

Learning Method           Run      Node Reduction   Time Reduction   Length Reduction   Solved
-                         HSD      100%             100%             100%               96%
Gradual Difficulty        GA-1     23%              31%              1%                 71%
                          GA-2     27%              30%              -3%                70%
                          GP       -                -                -                  -
                          Policy   28%              36%              6%                 36%
Rosin-style coevolution   GA       87%              93%              41%                98%
                          Policy   89%              90%              40%                99%

Thank you for listening

Any questions?
