TPAM: A Simulation-Based Model for Quantitatively ...Introduction Adaptive DEs TPAM Experimental...

Introduction Adaptive DEs TPAM Experimental settings Experiment using TPAM Conclusion

TPAM:A Simulation-Based Model for QuantitativelyAnalyzing Parameter Adaptation Methods

Ryoji Tanabe (SUSTech, CHN) and Alex Fukunaga (U. Tokyo, JP)

The first quantitative analysis of PAMs in isolationParameter Adaptation Methods (PAMs) are poorly understoodWe propose a Target function-based PAM simulation (TPAM)

framework for analyzing PAMs in adaptive DETPAM measures the ability of PAMs to track predefined target

parameters and enables quantitative analysis of PAMs

E.g., PAM-JADE tracks this particular target behavior

1.4 times better than PAM-SHADE

1 / 21


Differential Evolution (DE) [Storn 97]

The two main control parameters of DE are:

1. Scale factor F ∈ (0, 1]:

F controls the magnitude of the differential mutation

2. Crossover rate C ∈ [0, 1]:

C controls the number of inherited variables from x

1 t← 1, initialize the population P t = {x1,t, ...,xN,t};2 while The termination criteria are not met do3 for i ∈ {1, ..., N} do4 vi,t ← differentialMutation(P t, Fi,t);5 ui,t ← crossover(xi,t,vi,t, Ci,t);

6 for i ∈ {1, ..., N} do7 if f(ui,t) ≤ f(xi,t) then xi,t+1 ← ui,t ;8 else xi,t+1 ← xi,t ;

9 t← t+ 1;

2 / 21


The performance of DE depends on the setting of F and C

Number of function evaluations (FEvals)

Rastrigin (separable) Rosenbrock (nonseparable)

C = 0.1 is suitable

C = 0.9 is unsuitable

C = 0.9 is suitable

C = 0.1 is unsuitable

E.g., An appropriate setting of C depends on separability

The classical DE is not so efficient for black-box optimization

DE needs automated parameter control methods for F and C

A large number of adaptive DE algorithms have been proposed

jDE [Brest 06], JADE [Zhang 09], SaDE [Qin 09],EPSDE [Mallipeddi 11], MDE [Islam 12], SHADE [Tanabe 13], ...

3 / 21


Parameter Adaptation Methods (PAMs) are poorly understood

We are interested in PAMs in adaptive DE, not adaptive DE

PAM of jDE rand/1 binomial

Mutation strategy

PAM of JADE current-to-pbest/1 binomial

jDE =

JADE =

PAMs (our interest) Cross.

Several previous works have tried to analyze PAMs in adaptive DE

Their results/analysis limited to plots of changes in F and C

Rosenbrock

Rastrigin

Number of iterations

The meta-parameter of C quickly drops down

The meta-parameter of C gradually increases

Analysis limited toqualitative descriptions

Such qualitative results do notprovide sufficient insight

E.g., Adaptation process of the meta-parameter of JADE for C

4 / 21


Proposal: Target function-based PAM simulation (TPAM)

Traditional analyses based on plotting parameter values:

Limited to qualitative descriptionsMay be useful for analyzing overall behavior of adaptive DEs,but cannot analyze the behavior of PAMs in isolation

Our proposed TPAM:

Quantitatively evaluates the tracking performance of PAMsEnables analysis for PAMs in isolation

Independent of other mechanisms in adaptive DE

Measures the ability of PAMs to track predefined targetparameters and enables quantitative analysis of PAMs

E.g., PAM-JADE tracks this particular target behavior 1.4

times better than PAM-SHADE

5 / 21


A generalized Parameter Adaptation Method (PAM) in adaptive DE

1. At the beginning of each iteration t, generate Fi,t and Ci,t foreach individual xi,t using meta-parameters

2. Decide whether {Fi,t, Ci,t} is a success or a failure

Success: if the child ui,t is better than the parent xi,t

I.e., if f(ui,t) ≤ f(xi,t)

Failure: Otherwise

3. At the end of each iteration t, update the meta-parametersbased on the success/failure decisions

Strictly speaking:

Some PAMs do not use meta-parameters

E.g., PAM-jDE, PAM-EPSDE,...

But, such PAMs can also be generalized into the aboveframework

6 / 21


Example: PAM-JADE [Zhang 09]

PAM-JADE uses two meta-parameters µF and µC forparameter adaptation of F and C, respectively

1 t← 1, initialize the population P t, µF , µC ← 0.5;2 while The termination criteria are not met do

3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 Fi,t ← CauchyRand(µF , 0.1), Ci,t ← NomalRand(µC , 0.1);

6 vi,t ← differentialMutation(P t, Fi,t);7 ui,t ← crossover(xi,t,vi,t, Ci,t);

8 for i ∈ {1, ..., N} do9 if f(ui,t) ≤ f(xi,t) then

10 xi,t+1 ← ui,t, SF ← Fi,t, SC ← Ci,t

11 else xi,t+1 ← xi,t ;

12 µF ← (1− c)µF + c meanA(SF ), µC ← (1− c)µC + c meanL(SC);

13 t← t+ 1;

7 / 21


Example: PAM-SHADE [Tanabe 13]

PAM-SHADE uses MF = (MF1 , ...,MF

H)T and MC = (MC1 , ...,MC

H )T

1 t← 1, initialize the population P t, MF ,MC ← 0.5, k ← 1;2 while The termination criteria are not met do

3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 ri,t ← Randi{1, ..., N}, Fi,t ← CauchyRand(MF

ri,t, 0.1), Ci,t ←

NomalRand(MCri,t

, 0.1);

6 vi,t ← differentialMutation(P t, Fi,t);7 ui,t ← crossover(xi,t,vi,t, Ci,t);

8 for i ∈ {1, ..., N} do9 if f(ui,t) ≤ f(xi,t) then

10 xi,t+1 ← ui,t, SF ← Fi,t, SC ← Ci,t

11 else xi,t+1 ← xi,t ;

12 MFk ← meanA(SF ), MC

k ← meanL(SC), k ← mod(k + 1,M);

13 t← t+ 1;8 / 21


Basic idea of our proposed TPAM: Space reduction

A traditional approach must consider three complex spaces

Control parameter space Solution space Objective function space

ALL previous work used the objective function value of thesolution f(x) to evaluate the generated control parametersSince x is also affected by variation operators, suchapproaches cannot evaluate the control parameters in isolation

PROPOSAL:

Let’s remove the solution space from optimization problems!

The TPAM approach considers only the two simplified spaces

Control parameter space TPAM success/failure space

The solution space is eliminated, and the objective functionspace is replaced by the TPAM success/failure spaceControl parameters can be directly evaluated in isolation 9 / 21


Some notes on this presentation

Recall: We are NOT interested in “the whole adaptive DE”

We want to focus only on Parameter Adaptation Methods

We are interested in PAM-jDE, not jDE

TPAM is NOT a class of benchmark function for optimization

The TPAM is a simulation framework to analyze PAMs

TPAM does NOT seek to optimize a static objective function f

TPAM measures the target-tracking behavior of a PAMIndividuals in TPAM only have F , C values but no base levelgenome, so variation operators (e.g., mutation) are irrelevant

TPAM can deal with both F and C simultaneously

We investigated (i) C, (ii) F , and (iii) {F,C}But, the tendency of their results is not different

For simplicity, we focus on C in this talk

10 / 21


The process of PAMs in adaptive DE only depend onwhether each child generation is a success or a failure

Analyzing the PAM behavior does not require modeling the

absolute objective function values of the individualsA simulation model of the success/failure decisions is sufficientParameter adaptation of PAMs can be simulated by using a

surrogate model deciding whether {Fi,t, Ci,t} is a success or not

1 t← 1;2 while The termination criteria are not met do3 SF ← ∅, SC ← ∅;4 for i ∈ {1, ..., N} do5 Generate Fi,t and Ci,t according to meta-parameters;

6 for i ∈ {1, ..., N} do

7 if The pair of Fi,t and Ci,t is successful then8 SF ← Fi,t, S

C ← Ci,t

9 Update the meta-parameters based on SF and SC ;10 t← t+ 1;

11 / 21


A model of the success/failure decisions in the proposed TPAM

Target parameters θtarget1 , θtarget2 , ... are introduced in TPAM

The dicision is made based on the distance from θtarget

The closer θ is from θtarget, the higher its success probability

Successful parametersFailed parameters

Target parameters

The number of iterations

Success probability depends on two parameters: pmax and d ∈ [0, 1]

A smaller pmax value makes a simulation model difficult

0d: Distance between a generated parameter and the target parameter

p max

Max. probability

Success probability = -d + p max

1

0.5

0.1

12 / 21


Target parameters θtarget1 , θtarget2 , ... are given by a target function g

1. Two types of the linear functions glin/inc and glin/dec

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

Target ParametersPAM-JADE’s µ

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

2. The sinusoidal function gsin (left: ω = 10, right: ω = 40)

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

3. The random-walk function gran (left: s = 0.04, right: s = 0.1)

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

0 10000 20000 30000 40000 500000.0

0.2

0.4

0.6

0.8

1.0

13 / 21


The overall TPAM framework

1 t← 1, initialize a meta-parameter;2 while The termination criteria are not met do3 θtargett ← g(t);

4 Sθ ← ∅;5 for i ∈ {1, ..., N} do6 Generate θi,t according to the meta-parameter;

7 for i ∈ {1, ..., N} do8 if isParametersSuccessful(θi,t, θ

targett ) = successful then

9 Sθ ← θi,t;

10 Update the meta-parameter based on Sθ;11 t← t+ 1;

A parameter θ is (i) C, (ii) F , or (iii) {F,C}

14 / 21


Experimental settings

Settings for Parameter Adaptation Methods (PAMs)

PAM-jDE, PAM-EPSDE, PAM-JADE, PAM-MDE, PAM-SHADE

For each PAM, the hyperparameters recommended bythe original authors were used

Population size = 50

Settings for the proposed TPAM

Number of parameter sampling steps = 50, 000

To evaluate PAMs over a large window of activityThis does not correspond to any specific number ofsearch steps executed by a DE

101 independent runs. pmax ∈ {0.1, 0.2, ..., 1.0}

Evaluation metric for TPAM:The percentage of successful parameters rsucc ∈ [0, 1]

A higher rsucc represents that the PAM is able to track giventarget parameters θtarget1 , θtarget2 , ... 15 / 21


The tracking performance of the five PAMs on the linear functions

PAM-JADE has best tracking for all the pmax valuesFor the smaller pmax values, PAM-EPSDE cannot track θtarget

PAM-MDE and PAM-SHADE have a hidden bias

Max. success probability pmax (Smaller is more difficult)

Linearly increasing function

The tracking performance of PAM-JADE is 1.4 times better

than that of PAM-SHADE

Linearly decreasing function

16 / 21


The tracking performance of the five PAMs on the sinusoidal function

PAM-EPSDE has best tracking for the larger pmax valuesPAM-SHADE has best tracking for the smaller pmax valuesThe tracking performance of PAM-SHADE on hard TPAMinstances is better than that of the other PAMs

2

Max. success probability pmax (Smaller is more difficult)

Angular frequency = 10 (easy) Angular frequency = 40 (hard)

PAM-EPSDE has 1.2 times better tracking than PAM-SHADE

PAM-SHADE has 1.04 times the tracking

performance of PAM-JADE

PAM-SHADE has 1.3 times the tracking performance

of PAM-JADE

17 / 21


Why does PAM-SHADE track the target better than PAM-JADE?

The diversity of values in M enables PAM-SHADE to be muchmore robust than PAM-JADE on hard TPAM instances

μ of PAM-JADE

M of PAM-SHADE

PAM-JADE clearly fails to track the target parameters

PAM-SHADE succeeds in tracking the target

ω=10

ω=10

ω=40

ω=40

μ of PAM-JADE mostly overlaps the target

ω=10

ω=40

PAM-SHADE tracks the target with broadly distributed values

ω=10

ω=40

18 / 21


The tracking performance of the PAMs on the random-walk function

Easy instances: PAM-JADE and PAM-EPSDE track wellHard instances: PAM-SHADE has the best tracking performance

Step size of random walk s [001, 0.1]∈

pmax = 1 (easy) pmax = 0.1 (hard)

The tracking performance of PAM-SHADE is 1.18 times

worse than that of PAM-JADE

Larger s harder to track

PAM-SHADE has 1.13 times better tracking

than PAM-JADE

19 / 21


How relevant are the target tracking accuracy of PAMs to the search

performance of adaptive DEs? Results on the BBOB functions [Hansen 09]

The rand/1/bin operators are used for all PAMsAn adaptive DE using PAM-SHADE performs wellThe tracking performance of a PAM in the TPAM model iscorrelated with search performance of DE using that PAM

0 1 2 3 4 5log10 of (# f-evals / dimension)

0.0

0.2

0.4

0.6

0.8

1.0

Pro

port

ion

offu

nctio

n+ta

rget

pairs

PAM-JADE

PAM-EPSDE

PAM-MDE

PAM-jDE

PAM-SHADE

best 2009f1-24,10-D

20 / 21


Conclusion:

The first quantitative investigation of the adaptation ability of PAMs

We proposed a Target function-based PAM simulation (TPAM)

TPAM evaluates the tracking performance of PAMs andenables quantitative comparison of PAMs

E.g., PAM-SHADE tracks this particular target behavior 1.3

times better than PAM-JADE

TPAM can provide important insights on PAMs,

E.g., Why an adaptive DE using PAM-SHADE performs well

TPAM is a promising simulation framework for analyzing PAMs

Future work

Apply TPAM to analysis of PAMs in other EAs (e.g., ESs)Design an unified, systematic simulation framework

TPAM evaluates only the tracking performance of PAMsOther simulation models that evaluate other importantaspects of PAMs (e.g., diversity) are necessary

21 / 21

Appendix

Is our definition of success probability appropriate?

All of the parameter values generated by an adaptive DE using thecurrent-to-pbest/1/bin and PAM-JADE on the Rosenbrock

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

Number of function evaluations ×104

0.0

0.2

0.4

0.6

0.8

1.0F

Failed FSuccessful FSpline curve

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

Number of function evaluations ×104

0.0

0.2

0.4

0.6

0.8

1.0C

Failed CSuccessful CSpline curve

Success probability as a function of the distance of the F (left)and C (right) from the spline curve

0.0 0.1 0.2 0.3 0.4 0.5 0.60.000

0.005

0.010

0.015

0.020

0.025

0.030

0.035

0.040

0.045F

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.450.00

0.01

0.02

0.03

0.04

0.05

0.06C

Distance

Suc

cess

prob

abili

ty

The assumption that the probability ofgenerating successful trial vectors ishighly correlated with the ability togenerate control parameters valueswhich accurately track a targetparameter is justifiable

22 / 21

TPAM: A Simulation-Based Model for Quantitatively ...Introduction Adaptive DEs TPAM Experimental...

Documents

Transcript of TPAM: A Simulation-Based Model for Quantitatively ...Introduction Adaptive DEs TPAM Experimental...