Posted on 17-Jan-2016

Local search and Optimisation

Introduction: global versus local

Study of the key local search techniques

Concluding Remarks

Introduction: Global versus Local search

Global search:

interest: find a path to a goal

properties: search through partial paths, in a systematic way (consider all paths – completeness), with the opportunity to detect loops

Global versus Local search (2)

Local search:

interest: find a goal, or a state that maximizes/minimizes some objective function.

4-queens example:

interest in the solutions, not in the way we find them

Global versus Local search (3)

Local search:

interest: find a goal, or a state that maximizes/minimizes some objective function.

rostering:

objective function: estimates the quality of a roster; optimize the objective function

Is the path relevant or not?

The 8-puzzle: path relevant
Chess: path relevant
Water jugs: path relevant
Traveling sales person: path relevant *
Symbolic integrals: could be both
Blocks planning: path relevant
n-queens puzzle: not relevant
rostering: not relevant

Traveling sales person:

Representation is a potential solution: (New York, Boston, Miami, SanFran, Dallas, New York)
Local search! The path is encoded in every state; just find a good/optimal state.

Representation is a partial sequence: (New York, Boston)
Global search!

General observations on Local Search:

Applicable if the path to the solution is not important (but see the comment on TSP)

Keeps only 1 (or a fixed k) state(s); k for local beam search and genetic algorithms

Most often, does not systematically investigate all possibilities, and as a result may be incomplete or suboptimal

Does not avoid loops, unless explicitly designed to (e.g. Tabu search) or loop avoidance is included in the state representation

Is often used for optimization of an objective function

Local Search Algorithms:

Hill Climbing (3) (local version)
Simulated Annealing
Local k-Beam Search
Genetic Algorithms
Tabu Search
Heuristics and Metaheuristics

Hill-Climbing (3), or Greedy local search

The really "local-search" variant of Hill Climbing

Hill Climbing (3) algorithm:

Let h be the objective function (or use minimization – see 8-queens)

State := S;
STOP := False;
WHILE not STOP DO
    Neighbors := successors(State);
    IF max(h(Neighbors)) > h(State)
        THEN State := maximal_h_neighbor;
        ELSE STOP := True;
Return State

Hill Climbing 2, but without paths
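The pseudocode above translates directly to Python; `successors` and `h` are problem-specific stand-ins supplied by the caller (a minimal sketch, not reference code from the course):

```python
def hill_climb(start, successors, h):
    """Greedy local search: move to the best neighbor while it strictly
    improves h; stop (local maximum) as soon as no neighbor is better."""
    state = start
    while True:
        neighbors = successors(state)
        if not neighbors:
            return state
        best = max(neighbors, key=h)
        if h(best) <= h(state):     # ELSE branch: STOP
            return state
        state = best                # THEN branch: take the maximal-h neighbor

# Toy use: maximize h(x) = -(x - 3)^2 over the integers, stepping +/- 1.
top = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)  # -> 3
```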

The problems:

Foothills: local maxima
Plateaus
Ridges

More properties:

Termination? Yes, if h is bounded (in the relevant direction) and there is a minimal step in the h-function.

Completeness? No!

Case study: 8-queens

h = the number of pairs of queens attacking each other in the state

State: (n1, n2, n3, n4, n5, n6, n7, n8)

h = 17 for the example state

Minimization!

8-queens (cont.)

Neighbors of (n1, n2, …, n8): obtained by changing only one ni

56 neighbors; the example ends in a local minimum with h = 1
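For 8-queens, the objective and the neighborhood are a few lines of Python (states as tuples (n1, …, n8), one queen per column; a sketch matching the slide's definitions):

```python
from itertools import combinations

def h(state):
    """Pairs of queens attacking each other: same row, or same diagonal
    (columns always differ by construction). To be minimized."""
    return sum(1 for (i, qi), (j, qj) in combinations(enumerate(state), 2)
               if qi == qj or abs(qi - qj) == abs(i - j))

def neighbors(state):
    """All states obtained by moving exactly one queen within its column."""
    return [state[:c] + (row,) + state[c + 1:]
            for c in range(8) for row in range(1, 9) if row != state[c]]

# 8 queens x 7 alternative rows each = 56 neighbors, as on the slide.
assert len(neighbors((1, 2, 3, 4, 5, 6, 7, 8))) == 56
```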

How well does it work?

For 8-queens: it succeeds in only about 14% of runs (p = 0.14, used in the RRHC analysis below), getting stuck in a local minimum otherwise.

But how to improve the success rate?

Plateaus: sideways moves

At a plateau: allow moves to equal-h neighbors.

Danger: non-termination! Allow only a maximum number of consecutive sideways moves (say: 100).

Result: success rate: 94% success, 6% local minimum

Variants on HC (3)

Stochastic Hill Climbing: move to a random neighbor with a better h.
Gets more solutions, but is slower.

First-choice Hill Climbing: move to the first-found neighbor with a better h.
Useful if there are VERY many neighbors.

Guaranteed completeness: Random-Restart Hill Climbing

IF HC terminates without producing a solution, THEN restart HC with a random new initial state.

If there are only finitely many states, and if each HC run terminates, then RRHC is complete (with probability 1).

Analysis of RRHC:

If HC has a probability p of reaching success, then we need on average 1/p iterations in RRHC, of which 1/p - 1 = (1-p)/p fail.

Pure RRHC, for 8-queens:
p = 0.14, so 1/p ≈ 7 iterations.
Cost (average)? (6 failures × 3 steps) + (1 success × 4 steps) = 22 steps.

With sideways moves added, for 8-queens:
p = 0.94, so 1/p ≈ 1.06 iterations, of which (1-p)/p = 0.06/0.94 fail.
Cost (average)? (0.06/0.94 failures × 64 steps) + (1 success × 21 steps) ≈ 25 steps.
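The two cost estimates are easy to check numerically (the per-run step counts 3, 4, 64 and 21 are taken from the slide):

```python
# Pure RRHC on 8-queens: p = 0.14 means ~1/p runs, of which 1/p - 1 fail.
p = 0.14
cost_pure = (1 / p - 1) * 3 + 1 * 4        # ~6 failures * 3 steps + 1 * 4

# With sideways moves: p = 0.94; failed runs are much longer (~64 steps).
p_side = 0.94
cost_side = ((1 - p_side) / p_side) * 64 + 1 * 21

print(round(cost_pure), round(cost_side))  # 22 25
```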

Conclusion?

Local Search is continuously replacing other solvers in _many_ domains, including optimization problems in ML.

Simulated Annealing (Kirkpatrick et al. 1983)

Simulate the process of annealing, from metallurgy.

Motivations:

1) HC (3): best moves – fast, but stuck in local optima.
   Stochastic HC: random moves – slow, but complete.
   Combine!

2) RRHC: restart after failure – but why wait until failure? Include 'jumps' during the process!

3) At high 'temperature': frequent big jumps. At low 'temperature': few, smaller ones. Get a ping-pong ball to the deepest hole by rolling the ball and shaking the surface.

The algorithm:

State := S;
FOR Time = 1 to ∞ DO
    Temp := DecreaseFunction(Time);
    IF Temp = 0 THEN Return State;
    ELSE
        Next := random_neighbor(State);
        Δh := h(Next) - h(State);
        IF Δh > 0 THEN State := Next;
        ELSE State := Next with probability e^(Δh/Temp);
End_FOR

For slowly decreasing temperature, this will reach the global optimum (with probability 1).
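A direct Python transcription of the loop (maximizing h; the toy problem, the +/-1 neighbor move and the linear cooling schedule are illustrative choices, not prescribed by the algorithm):

```python
import math
import random

def simulated_annealing(start, random_neighbor, h, schedule):
    """Always accept improving moves; accept a worsening move (dh < 0)
    with probability e^(dh/Temp), which vanishes as Temp drops to 0."""
    state = start
    for time in range(1, 10 ** 6):
        temp = schedule(time)
        if temp <= 0:
            return state
        nxt = random_neighbor(state)
        dh = h(nxt) - h(state)
        if dh > 0 or random.random() < math.exp(dh / temp):
            state = nxt
    return state

# Toy run: maximize -(x - 3)^2 with +/-1 moves; Temp hits 0 after 1000 steps.
found = simulated_annealing(0,
                            lambda x: x + random.choice([-1, 1]),
                            lambda x: -(x - 3) ** 2,
                            lambda t: 2.0 - 0.002 * t)
```

At near-zero temperature the acceptance test reduces to plain hill climbing, which is what makes the slow-cooling guarantee plausible.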

Local k-Beam Search

Beam search, without keeping partial paths

≠ k parallel HC(3) searches: the k new states are the k best of ALL the neighbors

Stochastic Beam Search

Genetic Algorithms (Holland 1975)

Search inspired by evolution theory

General context: similar to stochastic k-beam search – keeps track of k states.

Different: generation of new states is "sexual" (crossover); in addition it has selection and mutation.

States must be represented as strings over some alphabet, e.g. 0/1 bits or decimal numbers.

The objective function is called the fitness function.

8-queens example:

State representation: 8-string of numbers in [1,8]

Population: set of k states – here: k = 4

8-queens (cont.)

Step 1: Selection:

Fitness function applied to the population. Probability of being selected: proportional to fitness. Select k/2 pairs of states.

8-queens (cont.)

Step 2: Crossover:

Select a random crossover point – here: 3 for pair one, 5 for pair two. Crossover applied to the strings.

8-queens (cont.)

Step 3: Mutation:

With a small probability: change a string member to a random value.

The algorithm:

Given: Fit (a fitness function)

Pop := the set of k initial states;
REPEAT
    New_Pop := {};
    FOR i = 1 to k DO
        x := RandomSelect(Pop, Fit);
        y := RandomSelect(Pop, Fit);
        child := crossover(x, y);
        IF (small_random_probability) THEN child := mutate(child);
        New_Pop := New_Pop U {child};
    End_For
    Pop := New_Pop;
UNTIL a member of Pop is fit enough or time is up

(Different from the example, where both crossovers were used.)
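A compact Python rendering for 8-queens (fitness = number of non-attacking pairs, so 28 means solved; the population size, mutation rate and one-point crossover are illustrative choices in the spirit of the slides):

```python
import random

def fit(state):
    """Non-attacking pairs of queens; 28 is a perfect board."""
    clashes = sum(1 for i in range(8) for j in range(i + 1, 8)
                  if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return 28 - clashes

def random_select(pop):
    """Selection: probability proportional to fitness."""
    return random.choices(pop, weights=[fit(s) for s in pop], k=1)[0]

def crossover(x, y):
    """One-point crossover of the two 8-number strings."""
    c = random.randint(1, 7)
    return x[:c] + y[c:]

def mutate(child):
    """Reset one random position to a random row."""
    i = random.randrange(8)
    return child[:i] + (random.randint(1, 8),) + child[i + 1:]

def genetic_algorithm(k=20, generations=500, p_mut=0.1):
    pop = [tuple(random.randint(1, 8) for _ in range(8)) for _ in range(k)]
    for _ in range(generations):
        if max(fit(s) for s in pop) == 28:    # a member is fit enough
            break
        new_pop = []
        for _ in range(k):
            child = crossover(random_select(pop), random_select(pop))
            if random.random() < p_mut:
                child = mutate(child)
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fit)
```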

Comments on GA

Very many variants – this is only one instance! (keep part of Pop, different types of crossover, …)

What is the added value? If the encoding is well-constructed, substrings may represent useful building blocks; then crossover may produce more useful states!

Ex: (246*****) is a useful pattern for 8-queens.
Ex: in circuit design, some substring may represent a useful subcircuit.

In general, the advantages of GA are not well understood.

Interpretation of crossover: if we change our representation from 8 decimal numbers to 24 binary digits, how does the interpretation change?

Tabu Search (Glover 1986)

Another way to get HC out of local optima.

Tabu = forbidden

In order to get HC out of a local maximum:

Naïve idea: allow one/some moves downhill.
Problem: when switching back to HC, it will just move back!

The Tabu search idea: keep a list TabuList with information on which new states are not allowed – where you are forbidden to go next.

Example: queen n3 moves from row 2 to row 6. Add to TabuList:
(n3, 6, 2): don't make the opposite move, or
(n3, 2): don't place n3 back on 2, or
(n3): don't move n3

Hoped-for effect, visualized:

TabuList is kept short: only recent history determines it.
TabuList determines an area where NOT to move back.

The algorithm:

Given: Fit (a fitness function)

State := S;
Best := S;
TabuList := {};
WHILE not(StopCondition) DO
    Candidates := {};
    FOR every Child in Neighbors(State) DO
        IF not_forbidden(Child, TabuList) THEN
            Candidates := Candidates U {Child};
    Succ := Maximal_Fit(Candidates);
    State := Succ;
    IF Fit(Succ) > Fit(Best) THEN
        TabuList := TabuList U {ExcludeCondition(Succ, Best)};
        Best := Succ;
    Eliminate_old(TabuList);
Return Best;
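A Python outline, using the simplest possible tabu attribute – the recently visited states themselves – and a fixed-length memory as Eliminate_old (a sketch, assuming maximization):

```python
from collections import deque

def tabu_search(start, neighbors, fit, steps=20, tenure=3):
    """Always move to the best non-tabu neighbor, even downhill;
    track the best state ever seen; forbid recently visited states."""
    state = best = start
    tabu = deque([start], maxlen=tenure)   # Eliminate_old: bounded memory
    for _ in range(steps):
        candidates = [c for c in neighbors(state) if c not in tabu]
        if not candidates:
            break
        state = max(candidates, key=fit)
        tabu.append(state)                 # don't come back here for a while
        if fit(state) > fit(best):
            best = state
    return best

# Ring of 8 states: a local maximum at state 0 (fit 3), the global one at 5.
f = [3, 1, 1, 2, 4, 6, 1, 2]
best = tabu_search(0, lambda x: [(x - 1) % 8, (x + 1) % 8], lambda x: f[x])  # -> 5
```

On this ring, plain HC(3) from state 0 stops immediately at the local maximum; the tabu walker is forced through the valley and reaches the global maximum at state 5.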

Example: Personnel Rostering

Initialise to satisfy the required amounts.

Allow only vertical swaps (neighbors).

If a swap has influenced a certain region of the timetable, do not allow any other swap to influence this region for a specified number of moves (Tabu list, Tabu attributes).

Shift 1 2 3 4 5 7 1 2 3 C


Pjotr A A A     T T F 1

Ludwig C C     R T T T 0

Clara T T R   R F T T 1

Hildegard A   A A   T T T 0

Johann   A C C     T T F 1

Wolfgang R T T T C   T T T 0

Guiseppe R     F T T 1

Antonio R R   R C F T T 1


Arranger 2 2 2 1 0 0

Tonesetter 1 2 1 1 0 0

Composer 1 1 1 1 2 0

Reader 3 1 1 1 2 0

Heuristics and Meta-heuristics

More differences with Global search
Examples of problems and heuristics
Meta-heuristics

Heuristics

In Global search: h: States → N

In Local search:
- How to represent a State?
- How to define the Neighbors?
- How to define the objective or Fit function?

These are all heuristic choices that influence the search VERY much.

Finding routes:

• Given a weighted graph (V,E) and two vertices, 'source' and 'destination', find the path from source to destination with the smallest accumulated weight.

• Dijkstra: O(|E| + |V|²) (for sparse graphs O(|E| log |V|))

The objective function:

In general: many different functions are possible.

Stock cutting:

• Cut shapes (× 10000) out of rectangular sheets with fixed dimensions.

• NP-complete, even in one dimension (pipe-cutting).

Objective function? Minimize waste!

Personnel rostering

Constraints:
• Shifts have start times and end times
• Employees have a qualification
• A required capacity per shift is given per qualification
• Employees can work subject to specific regulations
• …

Shift 1 2 3 4 5 7 1 2 3 C


Pjotr A A A     C T T F 1

Ludwig C C     R R T T T 0

Clara T T C   R R F T T 1

Hildegard     A A A   T T T 0

Johann     C C     T T F 1

Wolfgang   C T T C   T T T 0

Guiseppe R R     A A F T T 1

Antonio R R     C C F T T 1


Arranger 2 2 2 1 0 0

Tonesetter 1 2 1 1 0 0

Composer 1 1 1 1 2 0

Reader 3 1 1 1 2 0

A roster consists of assignments of employees to working shifts while satisfying all constraints.

Constraints:
• Shifts have start times and end times
• Employees have a qualification
• A required capacity per shift is given per qualification
• Employees can work subject to specific regulations
• …

Objective function?

• Just solve the problem (CP)
• Number of constraints violated
• Weighted number of constraints violated
• Amount under assignment
• Amount over assignment

This may lead to the definition of a goal function (representing a lot of domain information).

(same roster table as above)

Neighbors for Rostering

One can easily think of: swaps, removals, insertions, 'large swaps'.

These 'easy' options do depend on the domain. They define 'steps' in a 'solution space' with an associated change in the goal function.

One obvious heuristic is a hill-climber based on a selection of these possible steps.

(same roster table as above)

Traveling Sales Person: Neighbors

n cities → n! routes; 2-change connects them all!
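A 2-change deletes two edges of the tour and reconnects it in the only other possible way, which amounts to reversing a contiguous segment. A sketch (tours as tuples; the closing edge back to the start is implicit):

```python
def two_change_neighbors(tour):
    """All tours obtained by reversing one contiguous segment of length >= 2.
    As cyclic routes some coincide, but this already connects all orderings."""
    n = len(tour)
    return [tour[:i] + tour[i:j][::-1] + tour[j:]
            for i in range(n - 1) for j in range(i + 2, n + 1)]

# A 4-city tour has C(4,2) = 6 segment reversals.
assert len(two_change_neighbors((0, 1, 2, 3))) == 6
```

Since reversing a length-2 segment is an adjacent swap, repeated 2-changes can turn any ordering into any other, which is the "connects them all" claim.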

Meta-heuristics

All the methods – HC, RRHC, Sim. Ann., GA, Tabu Search, … – are meta-heuristics: they provide frameworks in which the user can plug in heuristics.

At a higher level, meta-heuristics can be combined: use algorithm 1 until condition 1 holds, then use algorithm 2 until condition 2 holds, ….

Specific combinations are known to work well for certain types of problems. ML is used to 'learn' which algorithms work better on which problems.

Concluding remarks

Local Search in Continuous Spaces
Variable Neighborhood Search
Relation to BDA: some pointers

Continuous Search Spaces

Some basic ideas

In 1 dimension: the derivative

All problems studied so far were for discrete search spaces! How is HC different for continuous spaces?

At x1: dh/dx (x1) > 0 (e.g. 3) – the derivative points in the ascending direction.

At x2: dh/dx (x2) < 0 (e.g. -5) – the derivative still points in the ascending direction!

Let HC move in the direction of dh/dx. For instance: x := x + a · dh/dx

Eventually we get to a (local) maximum: dh/dx (x3) = 0.

In n dimensions: the gradient

The direction of the strongest ascent is the gradient: ∇h = (dh/dx1, dh/dx2, …, dh/dxn)

Gives: gradient ascent / gradient descent approaches.

Example: airport placement

Place an airport nearest to n given cities C1, C2, …, Cn.

h(x,y) = Σi=1,n (x - xCi)² + Σi=1,n (y - yCi)²

dh/dx = 2 Σ(x - xCi), dh/dy = 2 Σ(y - yCi)

Solve: Σ(x - xCi) = 0, Σ(y - yCi) = 0

Iterative method: Newton-Raphson converges to the roots.

Solution: x = Σ(xCi)/n, y = Σ(yCi)/n – obviously correct: the center of the x- and y-coordinates.
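The gradient-descent iteration can be checked against this closed-form answer; the step size a = 0.05 and the three toy cities are illustrative choices:

```python
def airport_position(cities, a=0.05, iterations=500):
    """Gradient descent on h(x,y) = sum (x - xCi)^2 + sum (y - yCi)^2,
    using dh/dx = 2 * sum (x - xCi) and dh/dy = 2 * sum (y - yCi)."""
    x = y = 0.0
    for _ in range(iterations):
        x -= a * 2 * sum(x - cx for cx, _ in cities)
        y -= a * 2 * sum(y - cy for _, cy in cities)
    return x, y

# Converges to the centroid, matching the analytic solution above.
x, y = airport_position([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)])  # ≈ (1.0, 1.0)
```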

Variable Neighborhood Search (Mladenović and Hansen 1997)

Exploit various different ways of defining neighborhoods to move out of local optima.

Facts:
- A local minimum with respect to one neighborhood structure is not necessarily one for another.
- A global minimum is a local minimum with respect to all neighborhood structures.
- For many problems, local minima with respect to one or several neighborhoods are relatively close to each other.

Idea: use different neighborhoods. Define a number of different neighborhoods (different ways to compute successors). By moving to a different neighborhood, you may get out of the local optimum! If you cannot get out of a local optimum in one, try the next neighborhood.

Algorithm

Select a set of neighbourhood structures Nl (l = 1 to lmax)
State := S;
l := 1;
Repeat until termination condition met:
    Exploration: find the best neighbour Succ of State in Nl(State);
    Acceptance:
        IF h(Succ) > h(State) THEN State := Succ; l := 1;
        ELSE l := l + 1; IF l > lmax THEN l := 1;
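In Python outline (maximizing h; the neighborhood structures are passed as a list of successor functions, and the toy landscape is invented to show the switch):

```python
def variable_neighborhood_search(start, neighborhoods, h, steps=30):
    """Best-improvement step in neighborhood N_l; on improvement reset
    l to the first structure, otherwise move on to the next one."""
    state, l = start, 0
    for _ in range(steps):
        candidates = neighborhoods[l](state)
        succ = max(candidates, key=h) if candidates else state
        if h(succ) > h(state):
            state, l = succ, 0                # improvement: back to N_1
        else:
            l = (l + 1) % len(neighborhoods)  # stuck: try the next structure
    return state

# Toy landscape: +/-1 steps get trapped at x = 2; +/-3 steps escape to x = 5.
f = {0: 0, 1: 1, 2: 4, 3: 0, 4: 1, 5: 9}
n1 = lambda x: [y for y in (x - 1, x + 1) if y in f]
n3 = lambda x: [y for y in (x - 3, x + 3) if y in f]
reached = variable_neighborhood_search(0, [n1, n3], lambda x: f[x])  # -> 5
```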

Broad subdomain: many variants exist! A possible topic for your presentation: find a paper on a variable neighborhood search technique or application.

Relation to BDA: some pointers

Examples
Sub-modularity

Discrete example:

Given A, a set of 30 possible features to diagnose the flu: Temp > 38, Diarrhea, Wife had the flu, Coughs, …

Given h, a function from 2^A → N, giving how well a subset allows us to discriminate flu versus not-flu (e.g. a subset might score 42% precision).

Find the best discriminating subset with 5 elements.

Discrete example (cont.):

Start with the empty subset. Add the one element that increases h the most, then add the next element that increases h the most, etc.

This is Hill Climbing!!
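The greedy procedure in Python, with a toy coverage-style h (each feature "explains" some cases; the feature names and h are invented for illustration, not the course's flu data, but coverage functions like this one are submodular):

```python
def greedy_select(elements, h, k):
    """Hill climbing on subsets: start empty, repeatedly add the single
    element whose inclusion increases h the most, until k are chosen."""
    chosen = set()
    for _ in range(k):
        best = max((e for e in elements if e not in chosen),
                   key=lambda e: h(chosen | {e}))
        chosen.add(best)
    return chosen

# Toy h: number of cases covered by the union of the chosen features.
covers = {"fever": {1, 2, 3}, "cough": {3, 4}, "contact": {5}, "tired": {1}}
h = lambda A: len(set().union(*(covers[f] for f in A))) if A else 0
picked = greedy_select(covers, h, 2)  # -> {'fever', 'cough'}
```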

Continuous example:

Find the vector/line that discriminates the + points from the - points best. E.g.: maximize the minimal distance to the points – a continuous optimization problem, solved e.g. by continuous local search.

In the discrete case: Submodularity

Definition of submodularity – the property of Diminishing Returns:

If A ⊆ B and s ∈ S, then
h(A U {s}) - h(A) ≥ h(B U {s}) - h(B)

Submodularity holds for MANY objectives in ML!!

Relevance of HC for ML:

In ML, if h is a submodular function:

Theorem: If Greedy Local search returns A_greedy, then
h(A_greedy) ≥ (1 - 1/e) max over A ⊆ S of h(A), where 1 - 1/e ≈ 63%.

IF P ≠ NP: this is the very best one can hope for (in polynomial time).

VERY many problems in ML are submodular!! Local search is very relevant for ML.

Reading assignment and presentations:

Applications of Local Search
Other Local Search methods or variants of the studied methods
Applications of Local Search in ML
Further aspects of Submodularity

For the coming SAT-solving:

MAX-SAT solving
MiniSat
Further relations between SAT and Local Search

Start with Google and Wiki.
Study at least one "real"/scientific source.
Provide the references to your sources.