Elements of Evolutionary Algorithms
Luis Martí Orosa — LIRA/DEE/PUC-Rio

  • From the previous class

    • Roots of evolutionary computation:
      • Biological inspiration.
      • Optimization and search problems.
    • Briefly overviewed the main approaches.
    • Outlined the contents of the course.

  • In this class:

    • What are the key elements of an evolutionary algorithm?
      • Representation;
      • Evolutionary operators;
      • Constraint handling.

    We will focus mostly on Genetic Algorithms, but many of these topics apply to other EAs.

  • Evolutionary Operators

  • A Simple Evolutionary Algorithm

    Simple Evolutionary Algorithm

    Generate the initial population P(0) at random, and set t ← 0.
    repeat
        Evaluate the fitness of each individual in P(t).
        Select parents from P(t) based on their fitness.
        Obtain population P(t + 1) by applying crossover and mutation to parents.
        Set t ← t + 1.
    until termination criterion satisfied.

    • Basic idea from natural evolution and population genetics.

    • Survival of the fittest.
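The loop above can be sketched in Python. The OneMax fitness (count the 1-bits), population size, rates, and generation count below are illustrative assumptions, not taken from the slides:

```python
import random

def simple_ea(n=20, pop_size=30, generations=50, pm=None, seed=0):
    """Minimal generational EA for OneMax (maximise the number of 1-bits)."""
    rng = random.Random(seed)
    pm = pm if pm is not None else 1.0 / n          # per-bit mutation rate 1/n
    fitness = lambda x: sum(x)                      # OneMax: count the 1s
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional selection of parents
        weights = [fitness(x) + 1 for x in pop]     # +1 avoids an all-zero wheel
        parents = rng.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[i + 1][:]
            p = rng.randrange(1, n)                 # one-point crossover
            a[p:], b[p:] = b[p:], a[p:]
            for child in (a, b):                    # global (per-bit) mutation
                nxt.append([bit ^ (rng.random() < pm) for bit in child])
        pop = nxt
    return max(pop, key=fitness)

best = simple_ea()
```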

  • Exploration and Exploitation

    Exploration of new parts of the search space
    • Mutation operators
    • Recombination operators

    Exploitation of promising genetic material
    • Selection mechanism

  • Mutation operators for bitstrings

    The mutation operator introduces small, random changes to an individual's chromosome.

    Local Mutation

    • One randomly chosen bit is flipped.

    Global Mutation

    • Each bit is flipped independently with a given probability pm, called the per-bit mutation rate, which is often 1/n, where n is the chromosome length.

      Pr[k bits flipped] = C(n, k) · pm^k · (1 − pm)^(n−k)

    Mutation rate

    • Note the difference between per-bit (gene) and per-chromosome (individual) mutation rates.
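Both variants are easy to state in code; a minimal sketch (the helper names are mine):

```python
import random

def local_mutation(x, rng=random):
    """Flip exactly one randomly chosen bit."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] ^= 1
    return y

def global_mutation(x, pm=None, rng=random):
    """Flip each bit independently with per-bit rate pm (default 1/n)."""
    pm = pm if pm is not None else 1.0 / len(x)
    return [bit ^ (rng.random() < pm) for bit in x]

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = local_mutation(x)
# exactly one position differs after a local mutation
assert sum(a != b for a, b in zip(x, y)) == 1
```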

  • Recombination operators - One point crossover

    The recombination operator generates an offspring individual whose chromosome is composed from the parents' chromosomes.

    Crossover rate

    • Probability of applying crossover to parents.

    One point crossover between parents x and y

    Randomly select a crossover point p in {1, 2, ..., n}.
    Offspring 1 is x1 ··· xp · y(p+1) ··· yn.
    Offspring 2 is y1 ··· yp · x(p+1) ··· xn.

    Example

    Parent x: 101011 | 1010     Offspring 1: 101011 | 1110
    Parent y: 010100 | 1110     Offspring 2: 010100 | 1010
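A sketch of the operator, reproducing the example above with p = 6 (function name and defaults are mine):

```python
import random

def one_point_crossover(x, y, p=None, rng=random):
    """Return two offspring cut at crossover point p in {1, ..., n-1}."""
    n = len(x)
    p = p if p is not None else rng.randrange(1, n)
    return x[:p] + y[p:], y[:p] + x[p:]

x = list("1010111010")
y = list("0101001110")
o1, o2 = one_point_crossover(x, y, p=6)
assert "".join(o1) == "1010111110"   # 101011 | 1110
assert "".join(o2) == "0101001010"   # 010100 | 1010
```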

  • Recombination operators - Multi-point crossover

    k-point crossover between parents x and y

    Randomly select k crossover points p1 < ··· < pk in {1, 2, ..., n}.
    Offspring 1 is x1 ··· xp1 · y(p1+1) ··· yp2 · x(p2+1) ··· xp3 ··· etc.
    Offspring 2 is y1 ··· yp1 · x(p1+1) ··· xp2 · y(p2+1) ··· yp3 ··· etc.

    Example (2-point crossover)

    Parent x: 101 | 011 | 1010     Offspring 1: 101 | 100 | 1010
    Parent y: 010 | 100 | 1110     Offspring 2: 010 | 011 | 1110

  • Recombination operators - Uniform crossover

    Uniform crossover between parents x and y

    Select a bitstring z of length n uniformly at random.
    for all i in 1 to n:
        if zi = 1 then bit i in offspring 1 is xi, else yi.
        if zi = 1 then bit i in offspring 2 is yi, else xi.

    Example

    z:        1010001110

    Parent x: 1010111010     Offspring 1: 1111001010
    Parent y: 0101001110     Offspring 2: 0000111110
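A sketch reproducing the example above (helper names are mine):

```python
def uniform_crossover(x, y, z):
    """z is a random bitstring: take the bit from x (resp. y) where z_i = 1."""
    o1 = [xi if zi == 1 else yi for xi, yi, zi in zip(x, y, z)]
    o2 = [yi if zi == 1 else xi for xi, yi, zi in zip(x, y, z)]
    return o1, o2

bits = lambda s: [int(c) for c in s]
o1, o2 = uniform_crossover(bits("1010111010"), bits("0101001110"),
                           bits("1010001110"))
assert "".join(map(str, o1)) == "1111001010"
assert "".join(map(str, o2)) == "0000111110"
```

In practice z would be drawn fresh for every crossover event, e.g. with `random.getrandbits`.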

  • Selection and Reproduction

    Selection emphasizes the better solutions in a population

    • One or more copies of good solutions.

    • Inferior solutions are much less likely to be selected.

    • Not normally considered a search operator, but influences search significantly.

    Selection can be used either before or after search operators.

    • When selection is used before search operators, the process of choosing the next generation from the union of all parents and offspring is sometimes called reproduction.

    Generational gap of EA

    • Refers to the overlap (i.e., individuals that did not go through any search operators) between the old and new generations.

    • The two extremes are generational EAs and steady-state EAs.

    • 1-elitism can be regarded as having a generational gap of 1.

  • Fitness Proportional Selection

    Probability of selecting individual x from population P is

      Pr[x] = f(x) / Σ_{y∈P} f(y)

    • Uses raw fitness in computing selection probabilities; does not allow negative fitness values.

    • Also known as roulette wheel selection.

    Weaknesses

    • Domination of "super individuals" in early generations.

    • Slow convergence in later generations.

    Fitness scaling was often used in the early days to combat this problem
    • Fitness function f is replaced with a scaled fitness function f̃.
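A minimal roulette-wheel sketch; the toy fitness and the sample count are illustrative:

```python
import random

def roulette_wheel(pop, f, rng=random):
    """Select one individual with probability f(x) / sum of f; requires f >= 0."""
    total = sum(f(x) for x in pop)
    r = rng.uniform(0, total)
    acc = 0.0
    for x in pop:
        acc += f(x)
        if r <= acc:
            return x
    return pop[-1]   # guard against floating-point rounding at the wheel's end

pop = [1, 2, 3, 4]
f = lambda v: v                  # toy fitness: the value itself
rng = random.Random(1)
counts = {x: 0 for x in pop}
for _ in range(10000):
    counts[roulette_wheel(pop, f, rng)] += 1
# individual 4 (fitness 4) is selected far more often than individual 1
assert counts[4] > counts[1]
```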

  • Fitness Scaling 1/2

    Simple scaling

    f̃(x) := f(x) − fmin,ω, where
    • ω is the scaling window
    • fmin,ω is the lowest observed fitness in the last ω generations

    Sigma scaling

    f̃(x) := max{0, f(x) − (f̄ − c · σf)}, where
    • c is a constant, e.g. 2
    • f̄ is the average fitness in the current population
    • σf is the standard deviation of the fitness in the current population

  • Fitness Scaling 2/2

    Power scaling

    f̃(x) := f(x)^k, where k > 0.

    Exponential scaling

    f̃(x) := exp(f(x)/T), where
    • T > 0 is the temperature, approaching zero.

  • Ranking Selection

    1. Sort the population from best to worst according to fitness:

       x(λ−1), x(λ−2), x(λ−3), ..., x(0)

    2. Select the γ-ranked individual x(γ) with probability Pr[γ], where Pr[γ] is a ranking function, e.g.
       • linear ranking
       • exponential ranking
       • power ranking
       • geometric ranking

  • Linear ranking

    Population size λ, and rank γ, 0 ≤ γ ≤ λ − 1 (0 = worst).

    Linear ranking

      Pr_linear[γ] := (1/λ) · (α + (β − α) · γ/(λ − 1))

    where Σ_{γ=0}^{λ−1} Pr_linear[γ] = 1 implies α + β = 2 and 1 ≤ β ≤ 2.

    In expectation
    • the best individual is reproduced β times
    • the worst individual is reproduced α times.

    (Plot: expected copies rise linearly from α at rank 0 to β at rank λ − 1.)

  • Other ranking functions

    Power ranking

      Pr_power[γ] := (α + (β − α) · (γ/(λ − 1))^k) / C

    Geometric ranking

      Pr_geom[γ] := α · (1 − α)^(λ − 1 − γ) / C

    Exponential ranking

      Pr_exp[γ] := (1 − e^(−γ)) / C

    where C is a normalising factor and 0 < α < β.

  • Tournament Selection

    Tournament selection with tournament size k

    Randomly sample a subset P′ of k individuals from population P.
    Select the individual in P′ with the highest fitness.

    • Often, tournament size k = 2 is used.
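A minimal sketch (helper name is mine):

```python
import random

def tournament_select(pop, f, k=2, rng=random):
    """Sample k individuals uniformly at random; return the fittest."""
    contestants = rng.sample(pop, k)
    return max(contestants, key=f)

pop = list(range(10))
# with k equal to the population size, the best individual always wins
winner = tournament_select(pop, f=lambda v: v, k=10)
assert winner == 9
```

Larger k raises the selection pressure, which connects to the take-over-time discussion later in this lecture.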

  • (µ + λ) and (µ, λ) selection

    Origins in Evolution Strategies.

    (µ+ �)-selection

    Parent population of size µ.
    Generate λ offspring from randomly chosen parents.
    Next population is the µ best among parents and offspring.

    (µ, λ)-selection (where λ > µ)

    Parent population of size µ.
    Generate λ offspring from randomly chosen parents.
    Next population is the µ best among offspring.
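The two schemes differ only in the candidate pool, which a sketch makes plain; the toy fitness and Gaussian mutation here are illustrative assumptions:

```python
import random

def evolve(pop, f, mutate, mu, lam, plus=True, rng=random):
    """One generation of (mu + lambda) or, with plus=False, (mu, lambda)."""
    offspring = [mutate(rng.choice(pop)) for _ in range(lam)]
    # (mu+lambda): parents compete with offspring; (mu,lambda): offspring only
    candidates = pop + offspring if plus else offspring
    return sorted(candidates, key=f, reverse=True)[:mu]

f = lambda x: -abs(x)                       # maximise: optimum at x = 0
mutate = lambda x: x + random.gauss(0, 1)   # Gaussian perturbation
pop = [10.0] * 5
for _ in range(100):
    pop = evolve(pop, f, mutate, mu=5, lam=20)
```

With `plus=True` the best individual can never get worse (elitist); `plus=False` requires λ > µ and forgets the parents each generation.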

  • Selection pressure

    Degree to which selection emphasizes the better individuals. How can selection pressure be measured and adjusted?

    Take-over time τ* [Goldberg and Deb, 1991, Bäck, 1994]
    1. Initial population with a unique fittest individual x*.
    2. Apply the selection operator repeatedly, with no other operators.
    3. τ* is the number of generations until the population consists of x* only.

    Higher take-over time → lower selection pressure.

    Fitness prop.     τ* ≈ (λ ln λ)/c              assuming fitness f(x) = exp(cx)
    Linear ranking    τ* ≈ 2 ln(λ − 1)/(β − 1)     1 < β ≤ 2
    Tournament        τ* ≈ (ln λ + ln ln λ)/ln k   tournament size k
    (µ, λ)            τ* = ln λ / ln(λ/µ)

  • Summary

    • Exploration and exploitation

    • Mutation operators

    • Recombination operators

    • Selection mechanisms

    • Selection pressure

  • Main References

    Bäck, T. (1994). Selective pressure in evolutionary algorithms: A characterization of selection mechanisms. In Proceedings of the 1st IEEE Conf. on Evolutionary Computation, pages 57–62. IEEE Press.

    Goldberg, D. E. and Deb, K. (1991). A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, pages 69–93. Morgan Kaufmann.

  • Representation

  • Outline: Discrete vs Real-valued · Crossover Operators · Hybrids with Local Search · Mutation Operators · Reading & Assessment

    Introduction

    Representation of a problem
    • Each individual corresponds to a solution x = x1 x2 ... xn
    • We can modify the solution by means of crossover and mutation

    Last lecture: Discrete representations
    • Each gene has a value taken from a finite set
    • E.g., xi ∈ {0, 1} or xi ∈ {4, 5, 6, 7, 8}

    This lecture: Real-valued (continuous) representations
    • Each gene has a value taken from a continuous interval
    • E.g., xi ∈ [−5, 5] or xi ∈ [0, 1]

  • Problems with Discrete Representations

    Q: Why do we need another type of representation?

    A binary encoding can represent any integer
    • n bits encode 2^n integers from 0 to 2^n − 1
    • E.g., 0000 = 0, 0001 = 1, 0010 = 2, 0011 = 3, ...

    Hamming Cliffs
    • Locality may not be preserved
    • 0111 = 7 BUT 1000 = 8 (think mutation operator)

    Gray Coding partly overcomes this issue
    • Converts the mapping from binary to integer
    • Can reach any adjacent integer by a single bit-flip

  • Gray Coding

    Integer  Binary  Gray
    0        0000    0000
    1        0001    0001
    2        0010    0011
    3        0011    0010
    4        0100    0110
    5        0101    0111
    6        0110    0101
    7        0111    0100
    8        1000    1100
    9        1001    1101
    10       1010    1111
    11       1011    1110
    12       1100    1010
    13       1101    1011
    14       1110    1001
    15       1111    1000
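The table above follows the standard reflected binary Gray code, computable as b XOR (b >> 1); a sketch that reproduces it:

```python
def binary_to_gray(b):
    """Gray code of integer b: b XOR (b >> 1)."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Invert the Gray code by XOR-folding the shifted value."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Reproduce the table rows for the Hamming-cliff pair: 7 -> 0100, 8 -> 1100
assert format(binary_to_gray(7), "04b") == "0100"
assert format(binary_to_gray(8), "04b") == "1100"
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))
```

Note that Gray codes of 7 and 8 differ in a single bit, removing that Hamming cliff.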


  • Binary-based Real-valued Representations

    Can also represent most floating point numbers as binary
    • Discretising the continuous domain

    How? Given domain [−2, 2] and a precision of 6 decimal places
    • Divide domain [−2, 2] into 4 · 1000000 intervals
    • We need 22 bits (4 × 1000000 < 2^22)
    • Convert the binary string to an integer x′, then convert to a real:

      x = −2 + x′ · 4/(2^22 − 1)

    • But same problem as before

    Using a real-valued representation overcomes these issues
    • Real-valued representation may be a more natural choice
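A sketch of the decoding step above (function name and the [−2, 2] defaults are mine):

```python
def decode(bits, lo=-2.0, hi=2.0):
    """Map a bitstring onto [lo, hi]: x = lo + int(bits) * (hi - lo) / (2^n - 1)."""
    n = len(bits)
    x_int = int(bits, 2)                    # binary string -> integer x'
    return lo + x_int * (hi - lo) / (2 ** n - 1)

assert decode("0" * 22) == -2.0             # all-zero string maps to lo
assert decode("1" * 22) == 2.0              # all-one string maps to hi
# resolution: adjacent bitstrings differ by 4 / (2^22 - 1) < 1e-6
assert 4 / (2 ** 22 - 1) < 1e-6
```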

  • Discrete Recombination

    Does not change actual (gene) values
    • Very similar to the crossover operators on binary strings

    Multi-point Recombination (1-point, 2-point, k-point crossover)
    • Similar to that for the binary representation

    Global Discrete Recombination
    • Similar to uniform crossover for the binary representation

    x1: 0.21 1.87 3.66 | 1.11 2.25     x′1: 0.21 1.87 3.66 | 2.56 0.11
    x2: 2.32 0.77 2.99 | 2.56 0.11     x′2: 2.32 0.77 2.99 | 1.11 2.25


  • Intermediate Recombination

    We can actually modify the genetic values
    • In the discrete case, the resulting value might not be valid
    • E.g., average in binary (x1i = 0, x2i = 1): x′i = (x1i + x2i)/2 = 0.5 ∉ {0, 1}

    For 2 parents x1 and x2 and i = 1, 2, ..., n:

      x′i = α·x1i + (1 − α)·x2i, where α ∈ [0, 1]

    Given α = 0.5:

    x1: 0.21 1.87 3.66 1.11 2.25     x2: 2.32 0.77 2.99 2.56 0.11

    x′: 1.27 1.32 3.33 1.84 1.18
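A sketch reproducing the α = 0.5 example (comparison is up to the slide's 2-decimal rounding):

```python
def intermediate(x1, x2, alpha=0.5):
    """Component-wise blend: x'_i = alpha*x1_i + (1-alpha)*x2_i."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]

x1 = [0.21, 1.87, 3.66, 1.11, 2.25]
x2 = [2.32, 0.77, 2.99, 2.56, 0.11]
child = intermediate(x1, x2)
expected = [1.27, 1.32, 3.33, 1.84, 1.18]    # slide values, rounded
assert all(abs(c - e) <= 0.0051 for c, e in zip(child, expected))
```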


  • Arithmetic & Heuristic Recombination

    Arithmetic crossover (for p parents):

      x′i = α1·x1i + α2·x2i + α3·x3i + ..., where Σ_{j=1}^{p} αj = 1

    • Generalised (simple) intermediate crossover

    Heuristic crossover (where x1 is no worse than x2):

      x′ = u·(x1 − x2) + x1, where u = rand([0, 1])

    • Partially reflected point


  • Heuristic Recombination

    Heuristic crossover example: Assume f(x1) ≥ f(x2) and xi ∈ [0, 4]

    x1: 0.21 1.87 3.66 1.11 2.25     x2: 2.32 0.77 2.99 2.56 0.11

    0.13 · (0.21 − 2.32) + 0.21,  0.47 · (1.87 − 0.77) + 1.87, ...

    x′: −0.06 2.39 ...

    What is happening here?
    What if the resulting values are out of bounds (e.g., −0.06)?
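A sketch mirroring the example, with the per-gene u values used above (a single u for the whole vector is also common); out-of-bound genes such as −0.06 would still need clipping or a retry, as the slide's question hints:

```python
def heuristic_crossover(x1, x2, us):
    """x'_i = u_i*(x1_i - x2_i) + x1_i, pushing beyond the better parent x1.
    Per-gene u_i values are used here to mirror the slide's example."""
    return [u * (a - b) + a for a, b, u in zip(x1, x2, us)]

x1 = [0.21, 1.87]                    # the better parent
x2 = [2.32, 0.77]
child = heuristic_crossover(x1, x2, us=[0.13, 0.47])
# 0.13*(0.21-2.32)+0.21 = -0.0643 ~ -0.06;  0.47*(1.87-0.77)+1.87 = 2.387 ~ 2.39
assert abs(child[0] - (-0.06)) < 0.005
assert abs(child[1] - 2.39) < 0.005
```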


  • Simplex Recombination I

    Randomly select a group (> 2) of parents from the population.

    Find the best xmax and the worst xmin in the group
    • The offspring x′ will eventually replace xmin

    Compute the centroid xc of the group, ignoring xmin:

      xc = (1/n) Σ xi   (sum over the n parents left after removing xmin)

    Generate a reflected point using xmin and xc:

      xr = xc + (xc − xmin)

  • Simplex Recombination II

    If f(xr) ≥ f(xmax), then create the expanded point:

      xe = xr + (xr − xc)

      If f(xe) ≥ f(xr), then x′ ← xe, else x′ ← xr

    If f(xr) ≥ f(xmin), then x′ ← xr
    If f(xr) ≤ f(xmin), then compute the contracted point:

      xq = (xmin + xc)/2

      If f(xq) ≥ f(xmin), then x′ ← xq
      Else, the offspring is the point contracted towards xmax:

        x′ = (xmin + xmax)/2

  • Geometric & Quadratic Recombination

    Geometric recombination
    • For 2 parents:

      x′ = [(x11·x21)^(1/2), ..., (x1n·x2n)^(1/2)]

    • Can be generalised to k parents:

      x′ = [(x11^α1 · x21^α2 ··· xk1^αk), ..., (x1n^α1 · x2n^α2 ··· xkn^αk)], where Σ_{i=1}^{k} αi = 1

    Quadratic recombination
    • From 3 parents, generate the offspring using quadratic interpolation:

      x′j = (1/2) · [(x2j² − x3j²)·f(x1) + (x3j² − x1j²)·f(x2) + (x1j² − x2j²)·f(x3)]
                  / [(x2j − x3j)·f(x1) + (x3j − x1j)·f(x2) + (x1j − x2j)·f(x3)]
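The quadratic-interpolation formula returns, per coordinate, the turning point of the parabola through the three parents; a sketch (on a 1-D quadratic it recovers the minimum exactly):

```python
def quadratic_offspring(xs, fs):
    """Per-coordinate vertex of the parabola interpolating (x_ij, f_i),
    from 3 parents xs = [x1, x2, x3] with fitness values fs = [f1, f2, f3]."""
    (x1, x2, x3), (f1, f2, f3) = xs, fs
    child = []
    for a, b, c in zip(x1, x2, x3):
        num = (b*b - c*c) * f1 + (c*c - a*a) * f2 + (a*a - b*b) * f3
        den = (b - c) * f1 + (c - a) * f2 + (a - b) * f3
        child.append(0.5 * num / den)   # den == 0 if the 3 points are collinear
    return child

# On f(x) = (x - 1)^2 the interpolation recovers the minimum x = 1 exactly
f = lambda x: (x[0] - 1) ** 2
xs = [[0.0], [2.0], [3.0]]
child = quadratic_offspring(xs, [f(x) for x in xs])
assert abs(child[0] - 1.0) < 1e-9
```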


  • What Does Quadratic Recombination Mean (figure)

  • A Hybrid EA With Local Search

    1. Initialize µ individuals at random

    2. Perform local search on each individual

    3. REPEAT
       3.1 Generate P1, P2, P3 by global discrete recombination
       3.2 Perform quadratic approximation using P1, P2, P3 to produce P4
       3.3 Perform local search from P4
       3.4 Place P1, P2, P3, P4 into the population
       3.5 Perform a (µ + 4) truncation selection

    4. UNTIL termination criteria are met

  • Local Search With Random Memorising I

    Local search in the continuous domain
    • Usually differs from local search in the discrete domain
    • Uses techniques like the Simplex method, quasi-Newton procedures, etc.
    • Even with restarts, not good for finding the global optimum of rugged functions

    Combine local search with a global search method (e.g., an EA)
    • Local search is expensive, especially if the same local optimum is found multiple times

    Use random memorising to improve efficiency:
    • Store best solutions in a sequential memory (up to a certain depth)
    • Retrieve a random one when a new best solution is found
    • Search along the direction old → new

  • Local Search With Random Memorising II

    Search along the direction old → new.

    First, compute the direction:

      s = (xnew − xold) / ‖xnew − xold‖

    Second, perform local search with increasing step size:

      xb = x(b−1) + s · ba · dg

    • where s is the direction
    • ba is the search step multiplier (which increases over time)
    • dg is the original, global step size

    The new point is added to the memory and used by the EA to continue the search.

  • Experimental Studies

    18 multimodal benchmark functions
    • f8–f25 (a set of functions)
    • n ∈ {2, 4, 6, 30}, domains differ for each problem

    Population size is N = 30

    Maximum number of function evaluations F = 500000

    50 independent runs for each function
    • independent: different random seeds, starting from different initial populations

  • Results on f8–f25

    f    F       µ          σ         %
    8    199244  -11834.65  298.78    1/50
    9    306280  2.67       1.58      5/50
    10   370686  2.08e-12   2.11e-12  50/50
    11   173592  1.87e-10   1.31e-9   50/50
    12   476245  5.84e-9    2.16e-8   50/50
    13   504212  1.19e-2    3.01e-2   33/50
    14   3052    1.04       0.20      48/50
    15   111748  3.0749e-4  2.45e-10  50/50
    16   2817    -1.031628  2.70e-8   50/50
    17   5496    0.3979     8.26e-9   50/50
    18   4676    3          0         50/50
    19   6852    -3.86      0         50/50
    20   17504   -3.32      2.35e-7   50/50
    21   13790   -10.1532   0         50/50
    22   13354   -10.4029   2.16e-7   50/50
    23   14312   -10.5364   4.58e-7   50/50
    24   10754   3.89e-4    1.90e-3   48/50
    25   15614   1.07e-4    7.51e-4   50/50

  • Results on f8 and f9 with N ∈ {50, 60}

    f  F       µ          σ       %
    8  199244  -11834.65  298.78  1/50
    9  306280  2.67       1.58    5/50

    f  N   F       µ          σ        %
    8  50  327434  -12296.68  119.27   1/50
    8  60  391634  -12358.64  166.33   12/50
    9  50  508021  2.98e-1    4.97e-1  36/50
    9  60  610163  2.19e-1    4.12e-1  39/50

  • Some Observations

    Different problems require different operators and selection schemes
    • There is no universally best one
    • Important to understand what to use when!

    Real-valued representations may be more appropriate than binary
    • A more natural choice for function optimisation
    • Some guidelines exist for choosing good representations (neighbourhood, etc.)

    Many search operators are heuristics-based
    • Domain knowledge can often be incorporated into operators and representation

  • Global Optimisation by Mutation-Based EAs I

    1. Generate an initial population of µ individuals, and set k = 1
       • Each individual is a real-valued vector (xi ∈ [l, u])

    2. Evaluate the fitness of each individual

    3. Each individual creates a single offspring: for j = 1, ..., n

         x′ij = xij + Nj(0, 1)

    4. Calculate the fitness of each offspring

  • Global Optimisation by Mutation-Based EAs II

    5. For each individual, q opponents are chosen from parents and offspring
       • If the individual's fitness is no less than the opponent's, it receives a win

    6. Select the µ best individuals (from the 2µ) that have the most wins
       • They constitute the next generation

    7. Stop if the stopping criterion is satisfied;
       • Otherwise, k = k + 1 and go to Step 3

  • Analysis of mutation operator

    N(0, 1) denotes a normally distributed random number
    • The mean is µ = 0 and the standard deviation is σ = 1 (i.e., N(µ, σ))

    Nj(0, 1) means a newly sampled value for each index j

    The standard deviation σ determines the search step size of the mutation.
    • It is a crucial parameter

    Unfortunately, the optimal search step size is problem-dependent
    • Even for a single function, different search step sizes may be optimal
    • Self-adaptation can be used to partially get around this problem

  • Function Optimisation by Classical EP (CEP)

    Each individual (xi, ηi) creates a single offspring (x′i, η′i). For j = 1, ..., n:

      η′ij = ηij · exp(τ′·N(0, 1) + τ·Nj(0, 1))     (1)

      x′ij = xij + ηij·Nj(0, 1)                     (2)

    The factors τ′ and τ are commonly set to:
    • τ′ = (√(2n))^(−1)   (global step-size)
    • τ = (√(2√n))^(−1)   (local step-size)

  • Fast EP

    The idea comes from fast simulated annealing.

    Replace the Gaussian distribution with a Cauchy one in Eq. (2):

      x′ij = xij + ηij·Cj(1)     (3)

    Cj(1) is a Cauchy random variable with scale parameter t = 1
    • It is generated anew for each value of j

    Everything else, including Eq. (1), is kept unchanged

  • Cauchy Distribution

    Its density function is

      ft(x) = (1/π) · t/(t² + x²),  −∞ < x < ∞,

    where t > 0 is a scale parameter (step size).

    The corresponding cumulative distribution function is

      Ft(x) = 1/2 + (1/π)·arctan(x/t)
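The closed-form CDF makes inverse-transform sampling straightforward, which is one common way to generate the Cauchy mutation numbers; a sketch:

```python
import math
import random

def cauchy(t=1.0, rng=random):
    """Inverse-CDF sampling: x = t * tan(pi * (u - 1/2)), u ~ U(0, 1)."""
    u = rng.random()
    return t * math.tan(math.pi * (u - 0.5))

def cauchy_cdf(x, t=1.0):
    return 0.5 + math.atan(x / t) / math.pi

rng = random.Random(0)
samples = [cauchy(rng=rng) for _ in range(20000)]
# the median of a Cauchy(t) is 0, and F(0) = 1/2
assert abs(cauchy_cdf(0.0) - 0.5) < 1e-12
frac_below_zero = sum(s < 0 for s in samples) / len(samples)
assert 0.45 < frac_below_zero < 0.55
```

The heavy tails (no finite mean or variance) are exactly what gives Fast EP its occasional long jumps.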

  • Gaussian and Cauchy Density Functions (figure)

  • Summary on Mutation

    Cauchy mutation performs well when the distance to the global optimum is large
    • Its behaviour can be explained theoretically and empirically

    The optimal search step size can be derived if the global optimum is known
    • Unfortunately, such information is unavailable for real-world problems

    The performance of FEP can be improved by a set of more suitable parameters
    • Instead of copying CEP's parameter settings (as done here for comparison)

    Many further possibilities
    • Use both Gaussian and Cauchy mutation (generate 2 offspring, keep the better)
    • There are other distributions that may be used (see later slide)

  • Improved Fast Evolutionary Programming f1 and f10 (figure)

  • Continuous Gray Coding Distribution

    J. E. Rowe & D. Hidovic “An evolution strategy using a continuous version of the Gray-code neighbourhood distribution”

  • References (Essential Reading!)

    1. H.-M. Voigt and J. Lange, "Local evolutionary search enhancement by random memorizing", Proc. of the 1998 IEEE Int. Conf. on Evolutionary Computation, IEEE Press, Piscataway, NJ, USA, pp. 547-552, 1998.

    2. K.-H. Liang, X. Yao and C. S. Newton, "Combining landscape approximation and local search in global optimization", Proc. of the 1999 Congress on Evolutionary Computation, Vol. 2, IEEE Press, Piscataway, NJ, USA, pp. 1514-1520, July 1999.

    3. X. Yao, Y. Liu and G. Lin, "Evolutionary programming made faster", IEEE Transactions on Evolutionary Computation, 3(2):82-102, July 1999.

    4. M-level: Rowe, J.E. and Hidovic, D. (2004), "An Evolution Strategy using a continuous version of the Gray-code neighbourhood distribution", Proceedings of GECCO 2004, Part 1 (Lecture Notes in Computer Science, vol. 3102), K. Deb et al. (eds), Springer-Verlag, pages 725-736.

  • Handling constraints

  • Outline

    •  Introduction

    •  Penalty methods

    •  Approach based on tournament selection

    •  Decoders

    •  Repair algorithms

    •  Constraint-preserving operators

  • Introduction

    •  So far we have only looked at unconstrained optimization problems.

    •  But most real-world problems have constraints that must be satisfied.

    •  In this lecture we will look at techniques that can be used to address constraints within an EA.

  • Introduction

    •  Naïve approach: If a solution is infeasible, discard it.

    •  Problem:

      –  In many cases, the feasible region of the search space is very small.

      –  We need to use the information from infeasible solutions to guide the search towards feasible regions.

    •  NOTE: The feasible region often consists of several disconnected, non-convex regions.

  • (Figure: a search space with feasible and infeasible regions. Image adapted from Carlos Coello's tutorial presented at GECCO 2010.)

  • General definition of a constrained optimization problem

    •  Objective function defined over n variables, x1, x2, ..., xn

    •  Each xi has a value within a given range.

    •  J inequality constraints.

    •  K equality constraints.

  • Penalty methods

    •  Idea:

      –  Penalize infeasible solutions by adding a penalty term to the objective function value (for minimization problems):

         Fitness value = Objective function value + Penalty term
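A sketch of a static penalty fitness for minimisation; the squared-violation form, the penalty value R, and the toy constraint are illustrative assumptions, not prescribed by the slides:

```python
def penalised_fitness(x, f, gs, R=1000.0):
    """Minimisation fitness = objective + R * sum of squared violations.
    gs are inequality constraints of the form g_j(x) >= 0; the bracket
    operator <g> is -g when g < 0 and 0 otherwise (no penalty if feasible)."""
    bracket = lambda g: -g if g < 0 else 0.0
    penalty = sum(bracket(g(x)) ** 2 for g in gs)
    return f(x) + R * penalty

f = lambda x: x ** 2                 # toy objective to minimise
gs = [lambda x: x - 1.0]             # toy constraint: x >= 1
assert penalised_fitness(1.5, f, gs) == 1.5 ** 2       # feasible: no penalty
assert penalised_fitness(0.0, f, gs) == 0.0 + 1000.0   # <g> = 1, penalty R*1
```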

  • Penalty methods (cont.)

    •  Rj is the penalty parameter of the jth inequality constraint.

    •  Its purpose is to make the constraint violation of the same order of magnitude as the objective function value.

    •  The bracket operator ⟨·⟩ is the absolute value of the operand if the operand is negative, and zero otherwise.

  • Equality constraints

    •  Equality constraints hk(x) = 0 are usually transformed into inequality constraints as follows:

         |hk(x)| − ε ≤ 0

    •  where ε is a very small number.

    •  The number of inequality constraints becomes m = J + K.

  •  With m constraints we need m penalty parameters Rj (j = 1 ... m).

    •  By normalizing the constraints, only one parameter R is needed.

    •  Normalization makes all constraint violations of the same order of magnitude.

  • Problems with the penalty approach

    •  Parameter R (or the Rj's) is difficult to select.

    •  The user has to try different values and see what works best.

  • Different penalty method strategies

    •  Static

      –  Penalty term is constant through time.

    •  Dynamic

      –  Penalty term depends on the phase of evolution (generation number).

      –  The idea is to increase the penalty term through time, because at the end of the search we don't want to have infeasible solutions in the population.

  • Different penalty method strategies (cont.)

    •  Adaptive

      –  Penalty term changes according to the feasibility or infeasibility of the best solution during the most recent k generations.

        •  Penalty increases if in the previous k generations the best solution was always infeasible.

        •  Penalty decreases if in the previous k generations the best solution was always feasible.

        •  Penalty stays the same otherwise.

  • Kalyanmoy Deb's proposal based on tournament selection

    •  An elegant solution that does not require penalty parameters.

    •  Idea based on binary tournament selection. 3 cases:

      1.  If the two solutions are feasible, choose the one with the best objective function value.

      2.  If one is feasible and the other infeasible, choose the feasible one.

      3.  If both are infeasible, choose the one that violates the constraints the least.
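The three rules translate directly into a comparison function; a sketch with a toy constraint (names and the violation measure are mine):

```python
def deb_compare(a, b, f, violation):
    """Binary tournament under Deb's three rules (minimisation):
    feasible beats infeasible; two feasibles compare on f;
    two infeasibles compare on total constraint violation."""
    va, vb = violation(a), violation(b)
    if va == 0 and vb == 0:
        return a if f(a) <= f(b) else b      # case 1: both feasible
    if va == 0 or vb == 0:
        return a if va == 0 else b           # case 2: exactly one feasible
    return a if va <= vb else b              # case 3: smaller violation wins

f = lambda x: x ** 2                         # toy objective to minimise
violation = lambda x: max(0.0, 1.0 - x)      # toy constraint: x >= 1
assert deb_compare(1.0, 2.0, f, violation) == 1.0   # both feasible, lower f
assert deb_compare(0.9, 5.0, f, violation) == 5.0   # feasible beats infeasible
assert deb_compare(0.2, 0.5, f, violation) == 0.5   # smaller violation wins
```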

  • Decoders

    •  The use of decoders is another approach to handle constraints.

    •  Decoders interpret the chromosome of an individual in such a way that a feasible solution is always constructed.

  • Decoders (cont.)

    •  Example for the 0-1 knapsack problem:

      –  Given a set of items Xi (i = 1..n), each with weight Wi and profit Pi, find a subset of items such that the total profit is maximum, and such that the total weight does not exceed a maximum capacity C.

    •  Traditional encoding for this problem uses bitstrings:

      –  Xi = 1 means item i belongs to the knapsack.

      –  Xi = 0 means item i does not belong to the knapsack.

  • Decoders (cont.)

    •  Decoder approach:

      –  A sequence of items for the knapsack is interpreted as "take an item if possible" (i.e., take the item if its inclusion does not violate the capacity constraint).

      –  The sequence is usually sorted in decreasing order of the profit-per-weight ratio, Pi / Wi.

      –  Example: 110011

        •  Take the 1st item if it fits in the knapsack.

        •  Take the 2nd item if it fits in the knapsack.

        •  Take the 5th item ...

        •  Take the 6th item ...
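A sketch of such a decoder (the item data and helper names are mine); by construction it can never return an infeasible solution:

```python
def decode_knapsack(bits, weights, profits, capacity):
    """Interpret bits in decreasing profit-per-weight order; take an item
    only if it still fits, so the decoded solution is always feasible."""
    order = sorted(range(len(bits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    chosen, load = [], 0
    for i in order:
        if bits[i] == 1 and load + weights[i] <= capacity:
            chosen.append(i)
            load += weights[i]
    return chosen, load

weights = [4, 3, 2, 5, 3, 2]          # toy instance
profits = [8, 3, 4, 5, 6, 1]
chosen, load = decode_knapsack([1, 1, 0, 0, 1, 1], weights, profits, capacity=8)
assert load <= 8                      # capacity constraint holds by construction
```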

  • Repair algorithms

    •  This is another approach for handling constraints.

    •  A repair algorithm maps an infeasible solution into a feasible one.

    •  Two approaches:

      –  The repair is made for evaluation purposes only.

      –  The repaired individual replaces the original one in the population. (The choice can be made probabilistically.)

  • Repair algorithms (cont.)

    •  Example with the 0-1 knapsack problem:

      –  If a solution is infeasible, keep removing items from the knapsack until the capacity constraint is no longer violated.
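A sketch of that repair step; the drop order (worst profit-per-weight first) is one reasonable heuristic, not prescribed by the slide:

```python
def repair(bits, weights, profits, capacity):
    """Drop items in increasing profit-per-weight order until the load fits."""
    bits = list(bits)
    load = sum(w for b, w in zip(bits, weights) if b)
    order = sorted(range(len(bits)), key=lambda i: profits[i] / weights[i])
    for i in order:
        if load <= capacity:
            break
        if bits[i]:                   # remove the least valuable packed item
            bits[i] = 0
            load -= weights[i]
    return bits, load

weights = [4, 3, 2, 5]                # toy instance: total weight 14
profits = [8, 3, 4, 5]
repaired, load = repair([1, 1, 1, 1], weights, profits, capacity=9)
assert load <= 9                      # repaired solution is feasible
```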

  • Constraint-preserving operators

    •  The idea is to design genetic operators so that there is a guarantee that only feasible solutions are produced.

    •  We have seen examples of this earlier in the course with the permutation operators for ordering problems.