Noisy Optimization Convergence Rates (GECCO2013)


Transcript of Noisy Optimization Convergence Rates (GECCO2013)

Log-log Convergence for Noisy Optimization
S. Astete-Morales, J. Liu, O. Teytaud

[email protected]
TAO, INRIA-CNRS-LRI, Univ. Paris-Sud, 91190 Gif-sur-Yvette, France

Abstract

We consider: noisy optimization problems, without the assumption of variance vanishing in the neighborhood of the optimum.
We show mathematically: an exponential number of resamplings, as well as a number of resamplings polynomial in the inverse step-size, leads to a log-log convergence rate.
We show empirically: the log-log convergence rate is also obtained with a polynomial resampling schedule.

Compared to the state of the art, our results provide:

i) Proofs of log-log convergence for evolution strategies (which were not covered by existing results) in the case of objective functions with quadratic expectations and noise with constant variance.

ii) Log-log rates also for objective functions with expectation E f(x) = ||x − x*||^p.

iii) Experiments with different parametrizations.

Notation
d: dimension
n: iteration index
x_n: parent at iteration n
r_n: # evaluations at iteration n for each individual
e_n: # evaluations at iteration n
p: power in the fitness function
σ_n: step-size
N: Gaussian random variable

Algorithm: (µ, σ)-ES
Initialize parameters
Input: initial individual and initial step-size
n ← 1
while (true) do
    Generate λ individuals independently, each: i_j = x_n + σ_n N
    Evaluate each of them and average their fitness values
    Select the µ best individuals
    Update; n ← n + 1
end while
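
A minimal Python sketch of this loop, assuming the poster's experimental objective fitness(x) = ||x||^p + N; the function names, the default constant resampling rule, and the multiplicative step-size update are illustrative placeholders, not the authors' exact algorithm:

    import numpy as np

    def noisy_fitness(x, p=2, rng=None):
        # Illustrative objective: E f(x) = ||x||^p plus Gaussian noise of constant variance.
        rng = rng or np.random.default_rng()
        return np.linalg.norm(x) ** p + rng.normal()

    def resampled_es(x0, sigma0, mu=2, lam=8, iterations=50,
                     resamplings=lambda n: 10, seed=0):
        # Generate lam offspring around the parents, evaluate each r_n times,
        # average the noisy fitness values, and keep the mu best.
        rng = np.random.default_rng(seed)
        d = len(x0)
        parents = [np.asarray(x0, dtype=float)] * mu
        sigma = sigma0
        for n in range(1, iterations + 1):
            r_n = resamplings(n)
            offspring = []
            for _ in range(lam):
                parent = parents[rng.integers(mu)]
                child = parent + sigma * rng.standard_normal(d)
                avg_fit = np.mean([noisy_fitness(child, rng=rng) for _ in range(r_n)])
                offspring.append((avg_fit, child))
            offspring.sort(key=lambda pair: pair[0])   # minimization
            parents = [child for _, child in offspring[:mu]]
            sigma *= 0.9                               # placeholder "Update" step
        return parents[0], sigma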

Theoretical Analysis

Focus: simple re-evaluation rules, i.e. how to choose the number of resamplings.

• Preliminary, noise-free case [Auger, A.]: some ES verify
    log(||x_n||) / n < C < 0

• Non-adaptive, scale invariance. We prove: if r_n = ⌈K ζ^n⌉, then
    log(||x_n||) / n < C' < 0

• Adaptive, no scale invariance. We prove: if r_n = ⌈Y σ_n^(−η)⌉, then
    log(||x_n||) / n < C'' < 0

Different settings and ad hoc resampling ⇒ same property!
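
A minimal sketch of the two resampling rules analysed above; the constants K, ζ, Y, η are the theorem parameters, with placeholder values chosen only for illustration. Note that the adaptive rule needs the current step-size σ_n rather than the iteration index:

    import math

    def r_non_adaptive(n, K=2.0, zeta=1.1):
        # Non-adaptive, scale-invariant setting: r_n = ceil(K * zeta**n)
        return math.ceil(K * zeta ** n)

    def r_adaptive(sigma_n, Y=1.0, eta=2.0):
        # Adaptive setting, no scale invariance: r_n = ceil(Y * sigma_n**(-eta))
        return math.ceil(Y * sigma_n ** (-eta))

Either rule can serve as the resampling hook of the ES sketch above, fed with the iteration index or the current step-size respectively.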

Experimental Results

• Polynomial number of resamplings:

Setting: no scale invariance, fitness(x) = ||x||^p + N with p = 2. We experiment with r_n = ⌈L n^ζ⌉.

Results:
ζ = 0 ⇒ poor results
ζ = 1, 2 or 3 ⇒ slopes close to −1/(2p)
Better for ζ = 2 or ζ = 3 than for ζ = 1
Conjecture: the asymptotic regime is −1/(2p), reached later for large ζ.

[Figure: log(||y||) vs. log(Evaluations), two panels. Left: K=2, ζ=2, p=2, d=2, slope = −0.3267. Right: K=2, ζ=2, p=2, d=4, slope = −0.2829.]
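
The reported slopes can be estimated by a least-squares fit in log-log scale; a minimal sketch, assuming arrays of cumulative evaluation counts e_n and distances ||x_n|| to the optimum have been recorded during a run (the array names are illustrative):

    import numpy as np

    def loglog_slope(evaluations, distances):
        # Least-squares slope of log(||x_n||) versus log(e_n).
        # For fitness ||x||^p + N, the conjectured asymptotic slope is -1/(2p).
        log_e = np.log(np.asarray(evaluations, dtype=float))
        log_d = np.log(np.asarray(distances, dtype=float))
        slope, _ = np.polyfit(log_e, log_d, 1)
        return slope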

Conclusion

• Log-log convergence: proof of convergence, without scale invariance, for real algorithms in the noisy case:
    log(||x_n||) / log(e_n) → C

• Room for improvement: constants; less noise ⇒ better rates; algorithms with surrogate models.

References
[1] Mohamed Jebalia, Anne Auger, and Nikolaus Hansen: Log-linear convergence and divergence of the scale-invariant (1+1)-ES in noisy environments. Springer (2010).
[2] Rémi Coulom, Philippe Rolet, Nataliya Sokolovska, and Olivier Teytaud: Handling Expensive Optimization with Large Noise. 19th International Symposium on Foundations of Genetic Algorithms (2011).
[3] Olivier Teytaud and Jérémie Decock: Noisy Optimization Complexity. FOGA, Foundations of Genetic Algorithms XII (2013).