Elements of Evolutionary Algorithms
Luis Martí Orosa — LIRA/DEE/PUC-Rio

  • From the previous class

    • Roots of evolutionary computation:
      • Biological inspiration.
      • Optimization and search problems.
    • Briefly overviewed the main approaches.
    • Outlined the contents of the course.

  • In this class:

    • What are the key elements of an evolutionary algorithm?
      • Representation;
      • Evolutionary operators;
      • Constraint handling.

    We will focus mostly on Genetic Algorithms, but many of these topics apply to other EAs.

  • Evolutionary Operators

  • A Simple Evolutionary Algorithm

    Simple Evolutionary Algorithm

    Generate the initial population P(0) at random, and set t ← 0.
    repeat
        Evaluate the fitness of each individual in P(t).
        Select parents from P(t) based on their fitness.
        Obtain population P(t + 1) by applying crossover and mutation to parents.
        Set t ← t + 1.
    until termination criterion satisfied.

    • Basic idea from natural evolution and population genetics.

    • Survival of the fittest.
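The loop above can be sketched in Python. The OneMax fitness (count the 1-bits), population size, rates, and generation count below are illustrative assumptions, not taken from the slides:

```python
import random

def simple_ea(n=20, pop_size=30, generations=50, pm=None, seed=0):
    """Minimal generational EA for OneMax (maximise the number of 1-bits)."""
    rng = random.Random(seed)
    pm = pm if pm is not None else 1.0 / n          # per-bit mutation rate 1/n
    fitness = lambda x: sum(x)                      # OneMax: count the 1s
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional selection of parents
        weights = [fitness(x) + 1 for x in pop]     # +1 avoids an all-zero wheel
        parents = rng.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[i + 1][:]
            p = rng.randrange(1, n)                 # one-point crossover
            a[p:], b[p:] = b[p:], a[p:]
            for child in (a, b):                    # global (per-bit) mutation
                nxt.append([bit ^ (rng.random() < pm) for bit in child])
        pop = nxt
    return max(pop, key=fitness)

best = simple_ea()
```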

  • Exploration and Exploitation

    Exploration of new parts of the search space
    • Mutation operators
    • Recombination operators

    Exploitation of promising genetic material
    • Selection mechanism

  • Mutation operators for bitstrings

    The mutation operator introduces small, random changes to an individual's chromosome.

    Local Mutation

    • One randomly chosen bit is flipped.

    Global Mutation

    • Each bit is flipped independently with a given probability pm, called the per-bit mutation rate, which is often 1/n, where n is the chromosome length.

      Pr[k bits flipped] = C(n, k) · pm^k · (1 − pm)^(n−k)

    Mutation rate

    • Note the difference between per-bit (gene) and per-chromosome (individual) mutation rates.
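Both variants are easy to state in code; a minimal sketch (the helper names are mine):

```python
import random

def local_mutation(x, rng=random):
    """Flip exactly one randomly chosen bit."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] ^= 1
    return y

def global_mutation(x, pm=None, rng=random):
    """Flip each bit independently with per-bit rate pm (default 1/n)."""
    pm = pm if pm is not None else 1.0 / len(x)
    return [bit ^ (rng.random() < pm) for bit in x]

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = local_mutation(x)
# exactly one position differs after a local mutation
assert sum(a != b for a, b in zip(x, y)) == 1
```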

  • Recombination operators - One point crossover

    The recombination operator generates an offspring individual whose chromosome is composed from the parents' chromosomes.

    Crossover rate

    • Probability of applying crossover to parents.

    One point crossover between parents x and y

    Randomly select a crossover point p in {1, 2, ..., n}.
    Offspring 1 is x1 ··· xp · y(p+1) ··· yn.
    Offspring 2 is y1 ··· yp · x(p+1) ··· xn.

    Example

    Parent x: 101011 | 1010     Offspring 1: 101011 | 1110
    Parent y: 010100 | 1110     Offspring 2: 010100 | 1010
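A sketch of the operator, reproducing the example above with p = 6 (function name and defaults are mine):

```python
import random

def one_point_crossover(x, y, p=None, rng=random):
    """Return two offspring cut at crossover point p in {1, ..., n-1}."""
    n = len(x)
    p = p if p is not None else rng.randrange(1, n)
    return x[:p] + y[p:], y[:p] + x[p:]

x = list("1010111010")
y = list("0101001110")
o1, o2 = one_point_crossover(x, y, p=6)
assert "".join(o1) == "1010111110"   # 101011 | 1110
assert "".join(o2) == "0101001010"   # 010100 | 1010
```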

  • Recombination operators - Multi-point crossover

    k-point crossover between parents x and y

    Randomly select k crossover points p1 < ··· < pk in {1, 2, ..., n}.
    Offspring 1 is x1 ··· xp1 · y(p1+1) ··· yp2 · x(p2+1) ··· xp3 ··· etc.
    Offspring 2 is y1 ··· yp1 · x(p1+1) ··· xp2 · y(p2+1) ··· yp3 ··· etc.

    Example (2-point crossover)

    Parent x: 101 | 011 | 1010     Offspring 1: 101 | 100 | 1010
    Parent y: 010 | 100 | 1110     Offspring 2: 010 | 011 | 1110

  • Recombination operators - Uniform crossover

    Uniform crossover between parents x and y

    Select a bitstring z of length n uniformly at random.
    for all i in 1 to n:
        if zi = 1 then bit i in offspring 1 is xi, else yi.
        if zi = 1 then bit i in offspring 2 is yi, else xi.

    Example

    z:        1010001110

    Parent x: 1010111010     Offspring 1: 1111001010
    Parent y: 0101001110     Offspring 2: 0000111110
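A sketch reproducing the example above (helper names are mine):

```python
def uniform_crossover(x, y, z):
    """z is a random bitstring: take the bit from x (resp. y) where z_i = 1."""
    o1 = [xi if zi == 1 else yi for xi, yi, zi in zip(x, y, z)]
    o2 = [yi if zi == 1 else xi for xi, yi, zi in zip(x, y, z)]
    return o1, o2

bits = lambda s: [int(c) for c in s]
o1, o2 = uniform_crossover(bits("1010111010"), bits("0101001110"),
                           bits("1010001110"))
assert "".join(map(str, o1)) == "1111001010"
assert "".join(map(str, o2)) == "0000111110"
```

In practice z would be drawn fresh for every crossover event, e.g. with `random.getrandbits`.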

  • Selection and Reproduction

    Selection emphasizes the better solutions in a population

    • One or more copies of good solutions.

    • Inferior solutions are much less likely to be selected.

    • Not normally considered a search operator, but influences search significantly.

    Selection can be used either before or after search operators.

    • When selection is used before search operators, the process of choosing the next generation from the union of all parents and offspring is sometimes called reproduction.

    Generational gap of EA

    • Refers to the overlap (i.e., individuals that did not go through any search operators) between the old and new generations.

    • The two extremes are generational EAs and steady-state EAs.

    • 1-elitism can be regarded as having a generational gap of 1.

  • Fitness Proportional Selection

    Probability of selecting individual x from population P is

      Pr[x] = f(x) / Σ_{y∈P} f(y)

    • Uses raw fitness in computing selection probabilities; does not allow negative fitness values.

    • Also known as roulette wheel selection.

    Weaknesses

    • Domination of "super individuals" in early generations.

    • Slow convergence in later generations.

    Fitness scaling was often used in the early days to combat this problem
    • Fitness function f is replaced with a scaled fitness function f̃.
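A minimal roulette-wheel sketch; the toy fitness and the sample count are illustrative:

```python
import random

def roulette_wheel(pop, f, rng=random):
    """Select one individual with probability f(x) / sum of f; requires f >= 0."""
    total = sum(f(x) for x in pop)
    r = rng.uniform(0, total)
    acc = 0.0
    for x in pop:
        acc += f(x)
        if r <= acc:
            return x
    return pop[-1]   # guard against floating-point rounding at the wheel's end

pop = [1, 2, 3, 4]
f = lambda v: v                  # toy fitness: the value itself
rng = random.Random(1)
counts = {x: 0 for x in pop}
for _ in range(10000):
    counts[roulette_wheel(pop, f, rng)] += 1
# individual 4 (fitness 4) is selected far more often than individual 1
assert counts[4] > counts[1]
```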

  • Fitness Scaling 1/2

    Simple scaling

    f̃(x) := f(x) − fmin,ω, where
    • ω is the scaling window
    • fmin,ω is the lowest observed fitness in the last ω generations

    Sigma scaling

    f̃(x) := max{0, f(x) − (f̄ − c · σf)}, where
    • c is a constant, e.g. 2
    • f̄ is the average fitness in the current population
    • σf is the standard deviation of the fitness in the current population

  • Fitness Scaling 2/2

    Power scaling

    f̃(x) := f(x)^k, where k > 0.

    Exponential scaling

    f̃(x) := exp(f(x)/T), where
    • T > 0 is the temperature, approaching zero.

  • Ranking Selection

    1. Sort the population from best to worst according to fitness:

       x(λ−1), x(λ−2), x(λ−3), ..., x(0)

    2. Select the γ-ranked individual x(γ) with probability Pr[γ], where Pr[γ] is a ranking function, e.g.
       • linear ranking
       • exponential ranking
       • power ranking
       • geometric ranking

  • Linear ranking

    Population size λ, and rank γ, 0 ≤ γ ≤ λ − 1 (0 = worst).

    Linear ranking

      Pr_linear[γ] := (1/λ) · (α + (β − α) · γ/(λ − 1))

    where Σ_{γ=0}^{λ−1} Pr_linear[γ] = 1 implies α + β = 2 and 1 ≤ β ≤ 2.

    In expectation
    • the best individual is reproduced β times
    • the worst individual is reproduced α times.

    (Plot: expected copies rise linearly from α at rank 0 to β at rank λ − 1.)

  • Other ranking functions

    Power ranking

      Pr_power[γ] := (α + (β − α) · (γ/(λ − 1))^k) / C

    Geometric ranking

      Pr_geom[γ] := α · (1 − α)^(λ − 1 − γ) / C

    Exponential ranking

      Pr_exp[γ] := (1 − e^(−γ)) / C

    where C is a normalising factor and 0 < α < β.

  • Tournament Selection

    Tournament selection with tournament size k

    Randomly sample a subset P′ of k individuals from population P.
    Select the individual in P′ with the highest fitness.

    • Often, tournament size k = 2 is used.
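A minimal sketch (helper name is mine):

```python
import random

def tournament_select(pop, f, k=2, rng=random):
    """Sample k individuals uniformly at random; return the fittest."""
    contestants = rng.sample(pop, k)
    return max(contestants, key=f)

pop = list(range(10))
# with k equal to the population size, the best individual always wins
winner = tournament_select(pop, f=lambda v: v, k=10)
assert winner == 9
```

Larger k raises the selection pressure, which connects to the take-over-time discussion later in this lecture.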

  • (µ + λ) and (µ, λ) selection

    Origins in Evolution Strategies.

    (µ+ �)-selection

    Parent population of size µ.
    Generate λ offspring from randomly chosen parents.
    Next population is the µ best among parents and offspring.

    (µ, λ)-selection (where λ > µ)

    Parent population of size µ.
    Generate λ offspring from randomly chosen parents.
    Next population is the µ best among offspring.
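The two schemes differ only in the candidate pool, which a sketch makes plain; the toy fitness and Gaussian mutation here are illustrative assumptions:

```python
import random

def evolve(pop, f, mutate, mu, lam, plus=True, rng=random):
    """One generation of (mu + lambda) or, with plus=False, (mu, lambda)."""
    offspring = [mutate(rng.choice(pop)) for _ in range(lam)]
    # (mu+lambda): parents compete with offspring; (mu,lambda): offspring only
    candidates = pop + offspring if plus else offspring
    return sorted(candidates, key=f, reverse=True)[:mu]

f = lambda x: -abs(x)                       # maximise: optimum at x = 0
mutate = lambda x: x + random.gauss(0, 1)   # Gaussian perturbation
pop = [10.0] * 5
for _ in range(100):
    pop = evolve(pop, f, mutate, mu=5, lam=20)
```

With `plus=True` the best individual can never get worse (elitist); `plus=False` requires λ > µ and forgets the parents each generation.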

  • Selection pressure

    Degree to which selection emphasizes the better individuals. How can selection pressure be measured and adjusted?

    Take-over time τ* [Goldberg and Deb, 1991, Bäck, 1994]
    1. Initial population with a unique fittest individual x*.
    2. Apply the selection operator repeatedly, with no other operators.
    3. τ* is the number of generations until the population consists of x* only.

    Higher take-over time → lower selection pressure.

    Fitness prop.     τ* ≈ (λ ln λ)/c              assuming fitness f(x) = exp(cx)
    Linear ranking    τ* ≈ 2 ln(λ − 1)/(β − 1)     1 < β ≤ 2
    Tournament        τ* ≈ (ln λ + ln ln λ)/ln k   tournament size k
    (µ, λ)            τ* = ln λ / ln(λ/µ)

  • Summary

    • Exploration and exploitation

    • Mutation operators

    • Recombination operators

    • Selection mechanisms

    • Selection pressure

  • Main References

    Bäck, T. (1994). Selective pressure in evolutionary algorithms: A characterization of selection mechanisms. In Proceedings of the 1st IEEE Conf. on Evolutionary Computation, pages 57–62. IEEE Press.

    Goldberg, D. E. and Deb, K. (1991). A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, pages 69–93. Morgan Kaufmann.

  • Representation

  • Outline: Discrete vs Real-valued · Crossover Operators · Hybrids with Local Search · Mutation Operators · Reading & Assessment

    Introduction

    Representation of a problem
    • Each individual corresponds to a solution x = x1 x2 ... xn
    • We can modify the solution by means of crossover and mutation

    Last lecture: Discrete representations
    • Each gene has a value taken from a finite set
    • E.g., xi ∈ {0, 1} or xi ∈ {4, 5, 6, 7, 8}

    This lecture: Real-valued (continuous) representations
    • Each gene has a value taken from a continuous interval
    • E.g., xi ∈ [−5, 5] or xi ∈ [0, 1]

  • Problems with Discrete Representations

    Q: Why do we need another type of representation?

    A binary encoding can represent any integer
    • n bits encode 2^n integers from 0 to 2^n − 1
    • E.g., 0000 = 0, 0001 = 1, 0010 = 2, 0011 = 3, ...

    Hamming Cliffs
    • Locality may not be preserved
    • 0111 = 7 BUT 1000 = 8 (think mutation operator)

    Gray Coding partly overcomes this issue
    • Converts the mapping from binary to integer
    • Can reach any adjacent integer by a single bit-flip

  • Gray Coding

    Integer  Binary  Gray
    0        0000    0000
    1        0001    0001
    2        0010    0011
    3        0011    0010
    4        0100    0110
    5        0101    0111
    6        0110    0101
    7        0111    0100
    8        1000    1100
    9        1001    1101
    10       1010    1111
    11       1011    1110
    12       1100    1010
    13       1101    1011
    14       1110    1001
    15       1111    1000
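The table above follows the standard reflected binary Gray code, computable as b XOR (b >> 1); a sketch that reproduces it:

```python
def binary_to_gray(b):
    """Gray code of integer b: b XOR (b >> 1)."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Invert the Gray code by XOR-folding the shifted value."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Reproduce the table rows for the Hamming-cliff pair: 7 -> 0100, 8 -> 1100
assert format(binary_to_gray(7), "04b") == "0100"
assert format(binary_to_gray(8), "04b") == "1100"
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))
```

Note that Gray codes of 7 and 8 differ in a single bit, removing that Hamming cliff.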


  • Binary-based Real-valued Representations

    Can also represent most floating point numbers as binary
    • Discretising the continuous domain

    How? Given domain [−2, 2] and a precision of 6 decimal places
    • Divide domain [−2, 2] into 4 · 1000000 intervals
    • We need 22 bits (4 × 1000000 < 2^22)
    • Convert the binary string to an integer x′, then convert to a real:

      x = −2 + x′ · 4/(2^22 − 1)

    • But same problem as before

    Using a real-valued representation overcomes these issues
    • Real-valued representation may be a more natural choice
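A sketch of the decoding step above (function name and the [−2, 2] defaults are mine):

```python
def decode(bits, lo=-2.0, hi=2.0):
    """Map a bitstring onto [lo, hi]: x = lo + int(bits) * (hi - lo) / (2^n - 1)."""
    n = len(bits)
    x_int = int(bits, 2)                    # binary string -> integer x'
    return lo + x_int * (hi - lo) / (2 ** n - 1)

assert decode("0" * 22) == -2.0             # all-zero string maps to lo
assert decode("1" * 22) == 2.0              # all-one string maps to hi
# resolution: adjacent bitstrings differ by 4 / (2^22 - 1) < 1e-6
assert 4 / (2 ** 22 - 1) < 1e-6
```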

  • Discrete Recombination

    Does not change actual (gene) values
    • Very similar to the crossover operators on binary strings

    Multi-point Recombination (1-point, 2-point, k-point crossover)
    • Similar to that for the binary representation

    Global Discrete Recombination
    • Similar to uniform crossover for the binary representation

    x1: 0.21 1.87 3.66 | 1.11 2.25     x′1: 0.21 1.87 3.66 | 2.56 0.11
    x2: 2.32 0.77 2.99 | 2.56 0.11     x′2: 2.32 0.77 2.99 | 1.11 2.25


  • Intermediate Recombination

    We can actually modify the genetic values
    • In the discrete case, the resulting value might not be valid
    • E.g., average in binary (x1i = 0, x2i = 1): x′i = (x1i + x2i)/2 = 0.5 ∉ {0, 1}

    For 2 parents x1 and x2 and i = 1, 2, ..., n:

      x′i = α·x1i + (1 − α)·x2i, where α ∈ [0, 1]

    Given α = 0.5:

    x1: 0.21 1.87 3.66 1.11 2.25     x2: 2.32 0.77 2.99 2.56 0.11

    x′: 1.27 1.32 3.33 1.84 1.18
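A sketch reproducing the α = 0.5 example (comparison is up to the slide's 2-decimal rounding):

```python
def intermediate(x1, x2, alpha=0.5):
    """Component-wise blend: x'_i = alpha*x1_i + (1-alpha)*x2_i."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]

x1 = [0.21, 1.87, 3.66, 1.11, 2.25]
x2 = [2.32, 0.77, 2.99, 2.56, 0.11]
child = intermediate(x1, x2)
expected = [1.27, 1.32, 3.33, 1.84, 1.18]    # slide values, rounded
assert all(abs(c - e) <= 0.0051 for c, e in zip(child, expected))
```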


  • Arithmetic & Heuristic Recombination

    Arithmetic crossover (for p parents):

      x′i = α1·x1i + α2·x2i + α3·x3i + ..., where Σ_{j=1}^{p} αj = 1

    • Generalised (simple) intermediate crossover

    Heuristic crossover (where x1 is no worse than x2):

      x′ = u·(x1 − x2) + x1, where u = rand([0, 1])

    • Partially reflected point


  • Heuristic Recombination

    Heuristic crossover example: Assume f(x1) ≥ f(x2) and xi ∈ [0, 4]

    x1: 0.21 1.87 3.66 1.11 2.25     x2: 2.32 0.77 2.99 2.56 0.11

    0.13 · (0.21 − 2.32) + 0.21,  0.47 · (1.87 − 0.77) + 1.87, ...

    x′: −0.06 2.39 ...

    What is happening here?
    What if the resulting values are out of bounds (e.g., −0.06)?
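A sketch mirroring the example, with the per-gene u values used above (a single u for the whole vector is also common); out-of-bound genes such as −0.06 would still need clipping or a retry, as the slide's question hints:

```python
def heuristic_crossover(x1, x2, us):
    """x'_i = u_i*(x1_i - x2_i) + x1_i, pushing beyond the better parent x1.
    Per-gene u_i values are used here to mirror the slide's example."""
    return [u * (a - b) + a for a, b, u in zip(x1, x2, us)]

x1 = [0.21, 1.87]                    # the better parent
x2 = [2.32, 0.77]
child = heuristic_crossover(x1, x2, us=[0.13, 0.47])
# 0.13*(0.21-2.32)+0.21 = -0.0643 ~ -0.06;  0.47*(1.87-0.77)+1.87 = 2.387 ~ 2.39
assert abs(child[0] - (-0.06)) < 0.005
assert abs(child[1] - 2.39) < 0.005
```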


  • Simplex Recombination I

    Randomly select a group (> 2) of parents from the population.

    Find the best xmax and the worst xmin in the group
    • The offspring x′ will eventually replace xmin

    Compute the centroid xc of the group, ignoring xmin:

      xc = (1/n) Σ xi   (sum over the n parents left after removing xmin)

    Generate a reflected point using xmin and xc:

      xr = xc + (xc − xmin)

  • Simplex Recombination II

    If f(xr) ≥ f(xmax), then create the expanded point:

      xe = xr + (xr − xc)

      If f(xe) ≥ f(xr), then x′ ← xe, else x′ ← xr

    If f(xr) ≥ f(xmin), then x′ ← xr
    If f(xr) ≤ f(xmin), then compute the contracted point:

      xq = (xmin + xc)/2

      If f(xq) ≥ f(xmin), then x′ ← xq
      Else, the offspring is the point contracted towards xmax:

        x′ = (xmin + xmax)/2

  • Geometric & Quadratic Recombination

    Geometric recombination
    • For 2 parents:

      x′ = [(x11·x21)^(1/2), ..., (x1n·x2n)^(1/2)]

    • Can be generalised to k parents:

      x′ = [(x11^α1 · x21^α2 ··· xk1^αk), ..., (x1n^α1 · x2n^α2 ··· xkn^αk)], where Σ_{i=1}^{k} αi = 1

    Quadratic recombination
    • From 3 parents, generate the offspring using quadratic interpolation:

      x′j = (1/2) · [(x2j² − x3j²)·f(x1) + (x3j² − x1j²)·f(x2) + (x1j² − x2j²)·f(x3)]
                  / [(x2j − x3j)·f(x1) + (x3j − x1j)·f(x2) + (x1j − x2j)·f(x3)]
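The quadratic-interpolation formula returns, per coordinate, the turning point of the parabola through the three parents; a sketch (on a 1-D quadratic it recovers the minimum exactly):

```python
def quadratic_offspring(xs, fs):
    """Per-coordinate vertex of the parabola interpolating (x_ij, f_i),
    from 3 parents xs = [x1, x2, x3] with fitness values fs = [f1, f2, f3]."""
    (x1, x2, x3), (f1, f2, f3) = xs, fs
    child = []
    for a, b, c in zip(x1, x2, x3):
        num = (b*b - c*c) * f1 + (c*c - a*a) * f2 + (a*a - b*b) * f3
        den = (b - c) * f1 + (c - a) * f2 + (a - b) * f3
        child.append(0.5 * num / den)   # den == 0 if the 3 points are collinear
    return child

# On f(x) = (x - 1)^2 the interpolation recovers the minimum x = 1 exactly
f = lambda x: (x[0] - 1) ** 2
xs = [[0.0], [2.0], [3.0]]
child = quadratic_offspring(xs, [f(x) for x in xs])
assert abs(child[0] - 1.0) < 1e-9
```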


  • What Does Quadratic Recombination Mean (figure)

  • A Hybrid EA With Local Search

    1. Initialize µ individuals at random

    2. Perform local search on each individual

    3. REPEAT
       3.1 Generate P1, P2, P3 by global discrete recombination
       3.2 Perform quadratic approximation using P1, P2, P3 to produce P4
       3.3 Perform local search from P4
       3.4 Place P1, P2, P3, P4 into the population
       3.5 Perform a (µ + 4) truncation selection

    4. UNTIL termination criteria are met

  • Local Search With Random Memorising I

    Local search in the continuous domain
    • Usually differs from local search in the discrete domain
    • Uses techniques like the Simplex method, quasi-Newton procedures, etc.
    • Even with restarts, not good for finding the global optimum of rugged functions

    Combine local search with a global search method (e.g., an EA)
    • Local search is expensive, especially if the same local optimum is found multiple times

    Use random memorising to improve efficiency:
    • Store best solutions in a sequential memory (up to a certain depth)
    • Retrieve a random one when a new best solution is found
    • Search along the direction old → new

  • Local Search With Random Memorising II

    Search along the direction old → new.

    First, compute the direction:

      s = (xnew − xold) / ‖xnew − xold‖

    Second, perform local search with increasing step size:

      xb = x(b−1) + s · ba · dg

    • where s is the direction
    • ba is the search step multiplier (which increases over time)
    • dg is the original, global step size

    The new point is added to the memory and used by the EA to continue the search.

  • Experimental Studies

    18 multimodal benchmark functions
    • f8–f25 (a set of functions)
    • n ∈ {2, 4, 6, 30}, domains differ for each problem

    Population size is N = 30

    Maximum number of function evaluations F = 500000

    50 independent runs for each function
    • independent: different random seeds, starting from different initial populations

  • Results on f8–f25

    f    F       µ          σ         %
    8    199244  -11834.65  298.78    1/50
    9    306280  2.67       1.58      5/50
    10   370686  2.08e-12   2.11e-12  50/50
    11   173592  1.87e-10   1.31e-9   50/50
    12   476245  5.84e-9    2.16e-8   50/50
    13   504212  1.19e-2    3.01e-2   33/50
    14   3052    1.04       0.20      48/50
    15   111748  3.0749e-4  2.45e-10  50/50
    16   2817    -1.031628  2.70e-8   50/50
    17   5496    0.3979     8.26e-9   50/50
    18   4676    3          0         50/50
    19   6852    -3.86      0         50/50
    20   17504   -3.32      2.35e-7   50/50
    21   13790   -10.1532   0         50/50
    22   13354   -10.4029   2.16e-7   50/50
    23   14312   -10.5364   4.58e-7   50/50
    24   10754   3.89e-4    1.90e-3   48/50
    25   15614   1.07e-4    7.51e-4   50/50

  • Results on f8 and f9 with N ∈ {50, 60}

    f  F       µ          σ       %
    8  199244  -11834.65  298.78  1/50
    9  306280  2.67       1.58    5/50

    f  N   F       µ          σ        %
    8  50  327434  -12296.68  119.27   1/50
    8  60  391634  -12358.64  166.33   12/50
    9  50  508021  2.98e-1    4.97e-1  36/50
    9  60  610163  2.19e-1    4.12e-1  39/50

  • Some Observations

    Different problems require different operators and selection schemes
    • There is no universally best one
    • Important to understand what to use when!

    Real-valued representations may be more appropriate than binary
    • A more natural choice for function optimisation
    • Some guidelines exist for choosing good representations (neighbourhood, etc.)

    Many search operators are heuristics-based
    • Domain knowledge can often be incorporated into operators and representation

  • Global Optimisation by Mutation-Based EAs I

    1. Generate an initial population of µ individuals, and set k = 1
       • Each individual is a real-valued vector (xi ∈ [l, u])

    2. Evaluate the fitness of each individual

    3. Each individual creates a single offspring: for j = 1, ..., n

         x′ij = xij + Nj(0, 1)

    4. Calculate the fitness of each offspring

  • Global Optimisation by Mutation-Based EAs II

    5. For each individual, q opponents are chosen from parents and offspring
       • If the individual's fitness is no less than the opponent's, it receives a win

    6. Select the µ best individuals (from the 2µ) that have the most wins
       • They constitute the next generation

    7. Stop if the stopping criterion is satisfied;
       • Otherwise, k = k + 1 and go to Step 3

  • Analysis of mutation operator

    N(0, 1) denotes a normally distributed random number
    • The mean is µ = 0 and the standard deviation is σ = 1 (i.e., N(µ, σ))

    Nj(0, 1) means a newly sampled value for each index j

    The standard deviation σ determines the search step size of the mutation.
    • It is a crucial parameter

    Unfortunately, the optimal search step size is problem-dependent
    • Even for a single function, different search step sizes may be optimal
    • Self-adaptation can be used to partially get around this problem

  • Function Optimisation by Classical EP (CEP)

    Each individual (xi, ηi) creates a single offspring (x′i, η′i). For j = 1, ..., n:

      η′ij = ηij · exp(τ′·N(0, 1) + τ·Nj(0, 1))     (1)

      x′ij = xij + ηij·Nj(0, 1)                     (2)

    The factors τ′ and τ are commonly set to:
    • τ′ = (√(2n))^(−1)   (global step-size)
    • τ = (√(2√n))^(−1)   (local step-size)

  • Fast EP

    The idea comes from fast simulated annealing.

    Replace the Gaussian distribution with a Cauchy one in Eq. (2):

      x′ij = xij + ηij·Cj(1)     (3)

    Cj(1) is a Cauchy random variable with scale parameter t = 1
    • It is generated anew for each value of j

    Everything else, including Eq. (1), is kept unchanged

  • Cauchy Distribution

    Its density function is

      ft(x) = (1/π) · t/(t² + x²),  −∞ < x < ∞,

    where t > 0 is a scale parameter (step size).

    The corresponding cumulative distribution function is

      Ft(x) = 1/2 + (1/π)·arctan(x/t)
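The closed-form CDF makes inverse-transform sampling straightforward, which is one common way to generate the Cauchy mutation numbers; a sketch:

```python
import math
import random

def cauchy(t=1.0, rng=random):
    """Inverse-CDF sampling: x = t * tan(pi * (u - 1/2)), u ~ U(0, 1)."""
    u = rng.random()
    return t * math.tan(math.pi * (u - 0.5))

def cauchy_cdf(x, t=1.0):
    return 0.5 + math.atan(x / t) / math.pi

rng = random.Random(0)
samples = [cauchy(rng=rng) for _ in range(20000)]
# the median of a Cauchy(t) is 0, and F(0) = 1/2
assert abs(cauchy_cdf(0.0) - 0.5) < 1e-12
frac_below_zero = sum(s < 0 for s in samples) / len(samples)
assert 0.45 < frac_below_zero < 0.55
```

The heavy tails (no finite mean or variance) are exactly what gives Fast EP its occasional long jumps.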

  • Gaussian and Cauchy Density Functions (figure)

  • Summary on Mutation

    Cauchy mutation performs well when the distance to the global optimum is large
    • Its behaviour can be explained theoretically and empirically

    The optimal search step size can be derived if the global optimum is known
    • Unfortunately, such information is unavailable for real-world problems

    The performance of FEP can be improved by a set of more suitable parameters
    • Instead of copying CEP's parameter settings (as done here for comparison)

    Many further possibilities
    • Use both Gaussian and Cauchy mutation (generate 2 offspring, keep the better)
    • There are other distributions that may be used (see later slide)

  • Improved Fast Evolutionary Programming f1 and f10 (figure)

  • Continuous Gray Coding Distribution

    J. E. Rowe & D. Hidovic “An evolution strategy using a continuous version of the Gray-code neighbourhood distribution”

  • References (Essential Reading!)

    1. H.-M. Voigt and J. Lange, "Local evolutionary search enhancement by random memorizing", Proc. of the 1998 IEEE Int. Conf. on Evolutionary Computation, IEEE Press, Piscataway, NJ, USA, pp. 547-552, 1998.

    2. K.-H. Liang, X. Yao and C. S. Newton, "Combining landscape approximation and local search in global optimization", Proc. of the 1999 Congress on Evolutionary Computation, Vol. 2, IEEE Press, Piscataway, NJ, USA, pp. 1514-1520, July 1999.

    3. X. Yao, Y. Liu and G. Lin, "Evolutionary programming made faster", IEEE Transactions on Evolutionary Computation, 3(2):82-102, July 1999.

    4. M-level: Rowe, J.E. and Hidovic, D. (2004), "An Evolution Strategy using a continuous version of the Gray-code neighbourhood distribution", Proceedings of GECCO 2004, Part 1 (Lecture Notes in Computer Science, vol. 3102), K. Deb et al. (eds), Springer-Verlag, pages 725-736.

  • Handling constraints

  • Outline

    •  Introduction

    •  Penalty methods

    •  Approach based on tournament selection

    •  Decoders

    •  Repair algorithms

    •  Constraint-preserving operators

  • Introduction

    •  So far we have only looked at unconstrained optimization problems.

    •  But most real-world problems have constraints that must be satisfied.

    •  In this lecture we will look at techniques that can be used to address constraints within an EA.

  • Introduction

    •  Naïve approach: If a solution is infeasible, discard it.

    •  Problem:

      –  In many cases, the feasible region of the search space is very small.

      –  We need to use the information from infeasible solutions to guide the search towards feasible regions.

    •  NOTE: The feasible region often consists of several disconnected, non-convex regions.

  • (Figure: a search space with feasible and infeasible regions. Image adapted from Carlos Coello's tutorial presented at GECCO 2010.)

  • General definition of a constrained optimization problem

    •  Objective function defined over n variables, x1, x2, ..., xn

    •  Each xi has a value within a given range.

    •  J inequality constraints.

    •  K equality constraints.

  • Penalty methods

    •  Idea:

      –  Penalize infeasible solutions by adding a penalty term to the objective function value (for minimization problems):

         Fitness value = Objective function value + Penalty term
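A sketch of a static penalty fitness for minimisation; the squared-violation form, the penalty value R, and the toy constraint are illustrative assumptions, not prescribed by the slides:

```python
def penalised_fitness(x, f, gs, R=1000.0):
    """Minimisation fitness = objective + R * sum of squared violations.
    gs are inequality constraints of the form g_j(x) >= 0; the bracket
    operator <g> is -g when g < 0 and 0 otherwise (no penalty if feasible)."""
    bracket = lambda g: -g if g < 0 else 0.0
    penalty = sum(bracket(g(x)) ** 2 for g in gs)
    return f(x) + R * penalty

f = lambda x: x ** 2                 # toy objective to minimise
gs = [lambda x: x - 1.0]             # toy constraint: x >= 1
assert penalised_fitness(1.5, f, gs) == 1.5 ** 2       # feasible: no penalty
assert penalised_fitness(0.0, f, gs) == 0.0 + 1000.0   # <g> = 1, penalty R*1
```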

  • Penalty methods (cont.)

    •  Rj is the penalty parameter of the jth inequality constraint.

    •  Its purpose is to make the constraint violation of the same order of magnitude as the objective function value.

    •  The bracket operator ⟨·⟩ is the absolute value of the operand if the operand is negative, and zero otherwise.

  • Equality constraints

    •  Equality constraints hk(x) = 0 are usually transformed into inequality constraints as follows:

         |hk(x)| − ε ≤ 0

    •  where ε is a very small number.

    •  The number of inequality constraints becomes m = J + K.

  •  With m constraints we need m penalty parameters Rj (j = 1 ... m).

    •  By normalizing the constraints, only one parameter R is needed.

    •  Normalization makes all constraint violations of the same order of magnitude.

  • Problems with the penalty approach

    •  Parameter R (or the Rj's) is difficult to select.

    •  The user has to try different values and see what works best.

  • Different penalty method strategies

    •  Static

      –  Penalty term is constant through time.

    •  Dynamic

      –  Penalty term depends on the phase of evolution (generation number).

      –  The idea is to increase the penalty term through time, because at the end of the search we don't want to have infeasible solutions in the population.

  • Different penalty method strategies (cont.)

    •  Adaptive

      –  Penalty term changes according to the feasibility or infeasibility of the best solution during the most recent k generations.

        •  Penalty increases if in the previous k generations the best solution was always infeasible.

        •  Penalty decreases if in the previous k generations the best solution was always feasible.

        •  Penalty stays the same otherwise.

  • Kalyanmoy Deb's proposal based on tournament selection

    •  An elegant solution that does not require penalty parameters.

    •  Idea based on binary tournament selection. 3 cases:

      1.  If the two solutions are feasible, choose the one with the best objective function value.

      2.  If one is feasible and the other infeasible, choose the feasible one.

      3.  If both are infeasible, choose the one that violates the constraints the least.
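The three rules translate directly into a comparison function; a sketch with a toy constraint (names and the violation measure are mine):

```python
def deb_compare(a, b, f, violation):
    """Binary tournament under Deb's three rules (minimisation):
    feasible beats infeasible; two feasibles compare on f;
    two infeasibles compare on total constraint violation."""
    va, vb = violation(a), violation(b)
    if va == 0 and vb == 0:
        return a if f(a) <= f(b) else b      # case 1: both feasible
    if va == 0 or vb == 0:
        return a if va == 0 else b           # case 2: exactly one feasible
    return a if va <= vb else b              # case 3: smaller violation wins

f = lambda x: x ** 2                         # toy objective to minimise
violation = lambda x: max(0.0, 1.0 - x)      # toy constraint: x >= 1
assert deb_compare(1.0, 2.0, f, violation) == 1.0   # both feasible, lower f
assert deb_compare(0.9, 5.0, f, violation) == 5.0   # feasible beats infeasible
assert deb_compare(0.2, 0.5, f, violation) == 0.5   # smaller violation wins
```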

  • Decoders

    •  The use of decoders is another approach to handle constraints.

    •  Decoders interpret the chromosome of an individual in such a way that a feasible solution is always constructed.

  • Decoders (cont.)

    •  Example for the 0-1 knapsack problem:

      –  Given a set of items Xi (i = 1..n), each with weight Wi and profit Pi, find a subset of items such that the total profit is maximum, and such that the total weight does not exceed a maximum capacity C.

    •  Traditional encoding for this problem uses bitstrings:

      –  Xi = 1 means item i belongs to the knapsack.

      –  Xi = 0 means item i does not belong to the knapsack.

  • Decoders (cont.)

    •  Decoder approach:

      –  A sequence of items for the knapsack is interpreted as "take an item if possible" (i.e., take the item if its inclusion does not violate the capacity constraint).

      –  The sequence is usually sorted in decreasing order of the profit-per-weight ratio, Pi / Wi.

      –  Example: 110011

        •  Take the 1st item if it fits in the knapsack.

        •  Take the 2nd item if it fits in the knapsack.

        •  Take the 5th item ...

        •  Take the 6th item ...
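A sketch of such a decoder (the item data and helper names are mine); by construction it can never return an infeasible solution:

```python
def decode_knapsack(bits, weights, profits, capacity):
    """Interpret bits in decreasing profit-per-weight order; take an item
    only if it still fits, so the decoded solution is always feasible."""
    order = sorted(range(len(bits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    chosen, load = [], 0
    for i in order:
        if bits[i] == 1 and load + weights[i] <= capacity:
            chosen.append(i)
            load += weights[i]
    return chosen, load

weights = [4, 3, 2, 5, 3, 2]          # toy instance
profits = [8, 3, 4, 5, 6, 1]
chosen, load = decode_knapsack([1, 1, 0, 0, 1, 1], weights, profits, capacity=8)
assert load <= 8                      # capacity constraint holds by construction
```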

  • Repair algorithms

    •  This is another approach for handling constraints.

    •  A repair algorithm maps an infeasible solution into a feasible one.

    •  Two approaches:

      –  The repair is made for evaluation purposes only.

      –  The repaired individual replaces the original one in the population. (The choice can be made probabilistically.)

  • Repair algorithms (cont.)

    •  Example with the 0-1 knapsack problem:

      –  If a solution is infeasible, keep removing items from the knapsack until the capacity constraint is no longer violated.
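A sketch of that repair step; the drop order (worst profit-per-weight first) is one reasonable heuristic, not prescribed by the slide:

```python
def repair(bits, weights, profits, capacity):
    """Drop items in increasing profit-per-weight order until the load fits."""
    bits = list(bits)
    load = sum(w for b, w in zip(bits, weights) if b)
    order = sorted(range(len(bits)), key=lambda i: profits[i] / weights[i])
    for i in order:
        if load <= capacity:
            break
        if bits[i]:                   # remove the least valuable packed item
            bits[i] = 0
            load -= weights[i]
    return bits, load

weights = [4, 3, 2, 5]                # toy instance: total weight 14
profits = [8, 3, 4, 5]
repaired, load = repair([1, 1, 1, 1], weights, profits, capacity=9)
assert load <= 9                      # repaired solution is feasible
```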

  • Constraint-preserving operators

    •  The idea is to design genetic operators so that there is a guarantee that only feasible solutions are produced.

    •  We have seen examples of this earlier in the course with the permutation operators for ordering problems.