Course in Bayesian Optimization - Web Javier...
Transcript of Course in Bayesian Optimization - Web Javier...
![Page 1: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/1.jpg)
Course in Bayesian Optimization
Javier Gonzalez
University of Sheffield, Sheffield, UK
29th October 2015
![Page 2: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/2.jpg)
Recap
Yesterday we discussed:
I Bayesian Optimization is an efficient strategy to make MLcompletely automatic.
I We can use Bayesian optimization principles to designexperiments sequentially.
I Probability theory and uncertainty are the keys.
Bayesian Optimization is AI for AI.
![Page 3: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/3.jpg)
Today’s agenda
I Which are the current challenges in BayesianOptimisation?
I Can we extend BO ideas to other domains?I Some fresh research results.
BO is very active field with still many open questions.
![Page 4: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/4.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 5: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/5.jpg)
Multi task Bayesian Optimization[Wersky et al., 2013]
I We want to optimise an objective that it is very expensiveto evaluate but we have access to another function,correlated with objective, that is cheaper to evaluate.
I The idea is to use the correlation among the function toimprove the optimization.
Multi-output Gaussian process
k(x, x′) = B ⊗ k(x, x′)
![Page 6: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/6.jpg)
Multi task Bayesian Optimization[Wersky et all., 2013]
I Correlation among tasks reduces global uncertainty.I The choice (acquisition) changes.
![Page 7: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/7.jpg)
Multi task Bayesian Optimization[Wersky et al,, 2013]
I In other cases we want to optimize several tasks at thesame time.
I We need to use a combination of them (the mean, forinstance) or have a look to the Pareto frontiers of theproblem.
Averaged expected improvement.
![Page 8: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/8.jpg)
Multi task Bayesian Optimization[Wersky et al., 2013]
I In other cases we want to optimize several tasks at thesame time.
I We need to use a combination of them (the mean, forinstance) or have a look to the Pareto frontiers of theproblem.
Averaged expected improvement.
![Page 9: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/9.jpg)
Multi task Bayesian Optimization[Wersky et al., 2013]
![Page 10: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/10.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 11: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/11.jpg)
Non-stationary Bayesian Optimization[Snoek et al., 2014]
The beta distributions allows for a rich family oftransformations.
![Page 12: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/12.jpg)
Non-stationary Bayesian Optimization[Snoek et al., 2014]
Idea: transform the function to make it stationary.
![Page 13: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/13.jpg)
Non-stationary Bayesian Optimization[Snoek et al., 2014]
Results improve in many experiments by warping the inputs.
Extensions to multi-task warping.
![Page 14: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/14.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 15: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/15.jpg)
Inequality Constraints[Gardner et al., 2014]
In many optimization problems the domain of the function isnot an hypercube.
![Page 16: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/16.jpg)
Inequality Constraints[Gardner et al., 2014]
An option is to penalize the EI with an indicator function thatvanishes the acquisition out the domain of interest.
![Page 17: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/17.jpg)
Inequality Constraints[Gardner et al., 2014]
Much more efficient than standard approaches.
![Page 18: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/18.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 19: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/19.jpg)
Scalable BO: REMBO[Wang et al., 2013]
![Page 20: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/20.jpg)
Scalable BO: REMBO[Wang et al., 2013]
A function f : X → < is called to have effective dimensionalityd with d ≤ D if there exist a linear subspace T of dimension dsuch that for all x⊥ ⊂ T and x> ⊂T> ⊂ T we havef (x⊥) = f (x⊥ + x>) where T> is the orthogonal complement ofT .
![Page 21: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/21.jpg)
Scalable BO: REMBO[Wang et al., 2013]
![Page 22: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/22.jpg)
Scalable BO: REMBO[Wang et al., 2013]
I Better in cases in the which the intrinsic dimensionality ofthe function is low.
I Hard to implement (need to define the bounds of theoptimization after the embedding).
![Page 23: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/23.jpg)
Scalable BO: Additive models
Use the Sobol-Hoeffding decompostion
f (x) = f0 +
D∑i=1
fi(xi) +∑i< j
fi j(xi, x j) + · · · + f1,...,D(x)
where
I f0 =∫X
f (x)dx
I fi(xi) =∫X−i
f (x)dx−i - f0I etc...
and assume that the effects of high order than q are null.
![Page 24: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/24.jpg)
Scalable BO: Additive models
![Page 25: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/25.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 26: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/26.jpg)
Scalable BO: Parallel/batch BOAvoiding the bottleneck of evaluating f
I Cost of f (xn) = cost of { f (xn,1), . . . , f (xn,nb)}.I Many cores available, simultaneous lab experiments, etc.
![Page 27: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/27.jpg)
Considerations when designing a batch
I Available pairs {(x j, yi)}ni=1 are augmented with theevaluations of f on Bnb
t = {xt,1, . . . , xt,nb}.
I Goal: design Bnb1 , . . . ,B
nbm .
Notation:
I In: represents the available data setDn and the GPstructure when n data points are available.
I α(x;In): generic acquisition function given In.
![Page 28: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/28.jpg)
Selecting xt,k, the k-th element of the t-th batch
Sequential policy: Maximize:
α(x;It,k−1)
Greedy batch policy: it is not tractable: Maximize:∫α(x;It,k−1)
k−1∏j=1
p(yt, j|xt, j,It, j−1)p(xt, j|It, j−1)dxt, jdyt, j
I p(yt, j|x j,It, j−1): predictive distribution of the GP.I p(x j|It, j−1) = δ(xt, j − arg maxx∈X α(x;It, j−1)).
![Page 29: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/29.jpg)
Selecting xt,k, the k-th element of the t-th batch
Sequential policy: Maximize:
α(x;It,k−1)
Greedy batch policy: it is not tractable: Maximize:∫α(x;It,k−1)
k−1∏j=1
p(yt, j|xt, j,It, j−1)p(xt, j|It, j−1)dxt, jdyt, j
I p(yt, j|x j,It, j−1): predictive distribution of the GP.I p(x j|It, j−1) = δ(xt, j − arg maxx∈X α(x;It, j−1)).
![Page 30: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/30.jpg)
Available approaches[Azimi et al., 2010; Azimi et al., 2011; Azimi et al., 2012; Desautels et al., 2012; Chevalieret al., 2013; Contal et al. 2013]
I Exploratory approaches, reduction in system uncertainty.I Generate ‘fake’ observations of f using p(yt, j|x j,It, j−1).I Simultaneously optimize elements on the batch using the
joint distribution of yt1 , . . . yt,nb.
Bottleneck
All these methods require to iteratively update p(yt, j|x j,It, j−1)to model the iteration between the elements in the batch: O(n3)
How to design batches reducing this cost? BBO-LP
![Page 31: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/31.jpg)
Goal: eliminate the marginalization step
“To develop an heuristic approximating the ’optimal batch designstrategy’ at lower computational cost, while incorporating
information about global properties of f from the GP model into thebatch design”
Lipschitz continuity:
| f (x1) − f (x2)| ≤ L‖x1 − x2‖p.
![Page 32: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/32.jpg)
Interpretation of the Lipschitz continuity of f
M = maxx∈X f (x) and Brxj(x j) = {x ∈ X : ‖x − x j‖ ≤ rx j}where
rx j =M − f (x j)
L
0.4 0.6 0.8 1.0 1.2x
30
20
10
0
10
20f(x
)
True functionSamplesExclusion conesActive regions
xM < Brxj(x j) otherwise, the Lipschitz condition is violated.
![Page 33: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/33.jpg)
Probabilistic version of Brx(x)We can do this because f (x) ∼ GP(µ(x), k(x, x′))
I rx j is Gaussian with µ(rx j) =M−µ(x j)
L and σ2(rx j) =σ2(x j)
L2 .
Local penalizers: ϕ(x; x j) = p(x < Brx j(x j))
ϕ(x; x j) = p(rx j < ‖x − x j‖)= 0.5erfc(−z)
where z = 1√2σ2
n(x j)(L‖x j − x‖ −M + µn(x j)).
I Reflects the size of the ’Lipschitz’ exclusion areas.I Approaches to 1 when x is far form x j and decreases
otherwise.
![Page 34: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/34.jpg)
Probabilistic version of Brx(x)We can do this because f (x) ∼ GP(µ(x), k(x, x′))
I rx j is Gaussian with µ(rx j) =M−µ(x j)
L and σ2(rx j) =σ2(x j)
L2 .
Local penalizers: ϕ(x; x j) = p(x < Brx j(x j))
ϕ(x; x j) = p(rx j < ‖x − x j‖)= 0.5erfc(−z)
where z = 1√2σ2
n(x j)(L‖x j − x‖ −M + µn(x j)).
I Reflects the size of the ’Lipschitz’ exclusion areas.I Approaches to 1 when x is far form x j and decreases
otherwise.
![Page 35: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/35.jpg)
Idea to collect the batchesWithout using explicitly the model.
Optimal batch: maximization-marginalization∫α(x;It,k−1)
k−1∏j=1
p(yt, j|xt, j,It, j−1)p(xt, j|It, j−1)dxt, jdyt, j
Proposal: maximization-penalization.
Use the ϕ(x; x j) to penalize the acquisition and predict the expectedchange in α(x;It,k−1).
![Page 36: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/36.jpg)
Local penalization strategy[Gonzalez, Dai, Hennig, Lawrence, 2015]
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9va
lue
1st batch elementα(x)
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9
valu
e
2nd batch elementα(x)
α(x)ϕ1 (x)
ϕ1 (x)
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9
valu
e
3th batch elementα(x)ϕ1 (x)
α(x)ϕ1 (x)ϕ2 (x)
ϕ2 (x)
The maximization-penalization strategy selects xt,k as
xt,k = arg maxx∈X
g(α(x;It,0))k−1∏j=1
ϕ(x; xt, j)
,g is a transformation of α(x;It,0) to make it always positive.
![Page 37: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/37.jpg)
Local penalization strategy[Gonzalez, Dai, Hennig, Lawrence, 2015]
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9va
lue
1st batch elementα(x)
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9
valu
e
2nd batch elementα(x)
α(x)ϕ1 (x)
ϕ1 (x)
10 5 0 5 10x
0
1
2
3
4
5
6
7
8
9
valu
e
3th batch elementα(x)ϕ1 (x)
α(x)ϕ1 (x)ϕ2 (x)
ϕ2 (x)
The maximization-penalization strategy selects xt,k as
xt,k = arg maxx∈X
g(α(x;It,0))k−1∏j=1
ϕ(x; xt, j)
,g is a transformation of α(x;It,0) to make it always positive.
![Page 38: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/38.jpg)
Example for L = 50
L controls the exploration-exploitation balance within the batch.
![Page 39: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/39.jpg)
Example for L = 100
L controls the exploration-exploitation balance within the batch.
![Page 40: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/40.jpg)
Example for L = 150
L controls the exploration-exploitation balance within the batch.
![Page 41: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/41.jpg)
Example for L = 250
L controls the exploration-exploitation balance within the batch.
![Page 42: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/42.jpg)
Finding an unique Lipschitz constant
Let f : X → IR be a L-Lipschitz continuous function defined ona compact subset X ⊆ IRD. Then
Lp = maxx∈X‖∇ f (x)‖p,
is a valid Lipschitz constant.
The gradient of f at x∗ is distributed as a multivariate Gaussian
∇ f (x∗)|X,y, x∗ ∼ N(µ∇(x∗),Σ2∇
(x∗))
We choose:LGP−LCA = max
X
‖µ∇(x∗)‖
![Page 43: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/43.jpg)
Sobol function
Best (average) result for some given time budget.
![Page 44: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/44.jpg)
2D experiment with ‘large domain’
Comparison in terms of the wall clock time
0 50 100 150 200 250 300
Time(seconds)
1.7
1.6
1.5
1.4
1.3
1.2
1.1
1.0
Best
found v
alu
e
EI
UCB
Rand-EI
Rand-UCB
SM-UCB
B-UCB
PE-UCB
Pred-EI
Pred-UCB
qEI
LP-EI
LP-UCB
![Page 45: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/45.jpg)
Maximizing gene translation
I Maximization of a 70 dimensional surface representing theefficiency of hamster cells producing proteins.
0 200 400 600 800 1000 1200
Time(seconds)
5.066
5.064
5.062
5.060
5.058
5.056
5.054
5.052
5.050
Best
found v
alu
e
EI
UCB
Rand-EI
Rand-UCB
SM-UCB
B-UCB
PE-UCB
Pred-EI
Pred-UCB
LP-EI
LP-UCB
![Page 46: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/46.jpg)
Support Vector Regression
I Minimization of the RMSE on a test set over 3 parameters.I ’Physiochemical’ properties of protein tertiary structure?.I 45730 instances and 9 continuous attributes.
0 500 1000 1500 2000 2500 3000 3500
Time(seconds)
5.860
5.865
5.870
5.875
5.880
5.885
5.890
Best
found v
alu
e EI
UCB
Rand-EI
Rand-UCB
Pred-EI
Pred-UCB
LP-EI
LP-UCB
![Page 47: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/47.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 48: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/48.jpg)
Non myopic methods
I Most global optimisation techniques are myopic, inconsidering no more than a single step into the future.
I Relieving this myopia requires solving the multi-steplookahead problem: the global optimisation of an functionby considering the significance of the next functionevaluation on function evaluations (steps) further into thefuture.
![Page 49: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/49.jpg)
Myopic loss
Denote by η = min{y0}, the current best found value. We candefine the loss of evaluating f this last time at x∗ assuming it isreturning y∗ as
λ(y∗) ,{
y∗; if y∗ ≤ ηη; if y∗ > η.
Its expectation is
Λ1(x∗|I0) , E[min(y∗, η)] =
∫λ(y∗)p(y∗|x∗,I0)dy∗
![Page 50: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/50.jpg)
The myopic loss has closed form under Gaussianlikelihoods
Λ1(x∗|I0) , η
∫∞
ηN(y∗;µ, σ2)dy∗
+
∫ η
−∞
y∗N(y∗;µ, σ2)dy∗
= η + (µ − η)Φ(η;µ, σ2) − σ2N(η, µ, σ2),
where we have abbreviated σ2(y∗|I0) as σ2 and µ(y∗|I0) as µ.
![Page 51: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/51.jpg)
Looking many steps ahead
Λn(x∗|I0) =
∫λ(yn)
n∏j=1
p(y j|x j,I j−1)p(x j|I j−1)
dy∗ . . . dyndx2 . . .dxn
where
p(y j|x j,I j−1) = N(y j;µ(x j;I j−1), σ2(x j|I j−1)
)is the predictive distribution of the GP at x j and
p(x j|I j−1) = δ(x j − arg min
x∗∈XΛn− j+1(x∗|I j−1)
)reflects the optimization step.
![Page 52: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/52.jpg)
Looking many steps ahead
Graphical model representing the decision process of a myopicloss.
![Page 53: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/53.jpg)
Problems
I The myopic loss is very expensive to compute.I As in the batch Bayesian optimization cases, it requires to
iterative solve an expectation-optimization problem.
![Page 54: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/54.jpg)
Relieving the myopia of Bayesian optimization
We present... GLASSES!
Global optimisation with Look-Ahead through Stochastic Simulationand Expected-loss Search
[Gonzalez, Osborne, Lawrence, 2015]
![Page 55: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/55.jpg)
Relieving the myopia of Bayesian optimization
We present...
GLASSES!
Global optimisation with Look-Ahead through Stochastic Simulationand Expected-loss Search
[Gonzalez, Osborne, Lawrence, 2015]
![Page 56: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/56.jpg)
Relieving the myopia of Bayesian optimization
We present... GLASSES!
Global optimisation with Look-Ahead through Stochastic Simulationand Expected-loss Search
[Gonzalez, Osborne, Lawrence, 2015]
![Page 57: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/57.jpg)
GLASSES
Idea: jointly model the epistemic uncertainty in all steps ahead.
![Page 58: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/58.jpg)
GLASSES
p(x2, . . . , xn|I0, x∗): joint probability distribution over the stepsahead:
Γn(x∗|I0) =
∫λ(yn)p(y|X,I0, x∗)p(X|I0, x∗)dydX
I y = {y∗, . . . , . . . , yn} the vector of future evaluations of f .I X the (n − 1) × q dimensional matrix whose rows are the
future evaluations x2, . . . , xn.I p(y|X,I0, x∗) is multivariate Gaussian.
![Page 59: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/59.jpg)
GLASSES
I Select a good p(X|I0, x∗) is complicated.I We fix some x: the result of some oracle Fn(x∗).I Denote by y = (y∗, . . . , . . . , yn)T the vector of future
locations evaluations of f at Fn(x∗).
I It is possible to rewrite the expected loss Λn(x∗ | I0,Fn(x∗)
)as
Λn(x∗ | I0,Fn(x∗)
)= E
[min(y, η)
]
![Page 60: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/60.jpg)
GLASSES: Computing the value of the expected loss
Use Expectation Propagation observing that
E[min(y, η)] = η
∫Rn
n∏i=1
hi(y)N(y;µ,Σ)dy (0)
+
n∑j=1
∫Rn
y j
n∏i=1
t j,i(y)N(y;µ,Σ)dy
where hi(y) = I{yi > η} and
t j,i(y) =
I{y j ≤ η} if i=j
I{0 ≤ yi − y j} otherwise.
![Page 61: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/61.jpg)
GLASSES: predicting the steps ahead
To predict the steps ahead we use the batch method in[Gonzalez, Dai, Hennig and Lawrence, 2015]
![Page 62: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/62.jpg)
GLASSES: loss function
The more steps remain, the more explorative is the non-myopicloss.
![Page 63: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/63.jpg)
GLASSES: results
GLASSES is overal the best method.
Make sense to use GLASSES!
![Page 64: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/64.jpg)
Challenges and extensions in Bayesian Optimization
I Multi-task Bayesian optimization.
I Non-stationary Bayesian optimization.
I Inequality constrains
I Scalable BO: high dimensional problems.
I Scalable BO: parallel approaches.
I Non-myopic methods.
I Applications: molecule design.
![Page 65: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/65.jpg)
Application: Synthetic gene design
I Use mammalian cells to make protein products.I Control the ability of the cell-factory to use synthetic DNA.
Optimize genes (ATTGGTUGA...) to best enable thecell-factory to operate most efficiently [Gonzalez et al. 2014].
![Page 66: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/66.jpg)
Central dogma of molecular biology
![Page 67: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/67.jpg)
Big question
Remark: ‘Natural’ gene sequences are not necessarilyoptimized to maximize protein production.
ATGCTGCAGATGTGGGGGTTTGTTCTCTATCTCTTCCTGACTTTGTTCTCTATCTCTTCCTGACTTTGTTCTCTATCTCTTC...
Considerations
I Different gene sequences→ same protein.I The sequence affects the synthesis efficiency.
Which is the most efficient sequence to produce a protein?
![Page 68: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/68.jpg)
Redundancy of the genetic code
I Codon: Three consecutive bases: AAT, ACG, etc.I Protein: sequence of amino acids.I Different codons may encode the same aminoacid.I ACA=ACU encodes for Threonine.
ATUUUGACA = ATUUUGACU
synonyms sequences→ same protein but different efficiency
![Page 69: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/69.jpg)
Redundancy of the genetic code
![Page 70: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/70.jpg)
How to design a synthetic gene?
A good model is crucial—: Gene sequence features→ proteinproduction efficiency.
Bayesian Optimization principles for gene design
do:
1. Build a GP model as an emulator of the cell behavior.2. Obtain a set of gene design rules (features optimization).3. Design one/many new gene/s coherent with the design
rules.4. Test genes in the lab (get new data).
until the gene is optimized (or the budget is over...).
![Page 71: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/71.jpg)
Model as an emulator of the cell behavior
Model inputsFeatures (xi) extracted gene sequences (si): codon frequency,cai, gene length, folding energy, etc.
Model outputsTranscription and translation rates f := ( fα, fβ).
Model typeMulti-output Gaussian process f ≈ GP(m,K) where K is acorregionalization covariance for the two-output model (+ SEwith ARD).
The correlation in the outputs help!
![Page 72: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/72.jpg)
Model as an emulator of the cell behavior
![Page 73: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/73.jpg)
Obtaining optimal gene design rules
Maximize the averaged EI [Swersky et al. 2013]
α(x) = σ(x)(−uΦ(−u) + φ(u))
where u = (ymax − m(x))/σ(x) and
m(x) =12
∑l=α,β
f∗(x), σ2(x) =122
∑l,l′=α,β
(K∗(x, x))l,l′ .
A batch method is used when several experiments can be runin parallel
![Page 74: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/74.jpg)
Designing new genes coherent with the optimaldesign rules
Simulating-matching approach:
1. Simulate genes ‘coherent’ with the target (sameamino-acids).
2. Extract features.3. Rank synthetic genes according to their similarity with the
‘optimal’ design rules.
Ranking criterion: eval(s|x?) =∑p
j=1 w j|x j − x?j |
I x?: optimal gene design rules.I s, x j generated ‘synonyms sequence’ and its features.I w j: weights of the p features (inverse length-scales of the
model covariance).
![Page 75: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/75.jpg)
Results for 10 low-expressed genes
![Page 76: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/76.jpg)
Wrapping up
I BO is fantastic tool for parameter optimization in ML andexperimental design.
I The model and acquisition function are the two mostimportant bits.
I Many useful extensions for BO.
I To scale BO is a current challenge.
I Software available!
![Page 77: Course in Bayesian Optimization - Web Javier Gonzalezjaviergonzalezh.github.io/courseBO/lectures/CourseBO_Lecture3.pdf · Recap Yesterday we discussed: I Bayesian Optimization is](https://reader036.fdocuments.in/reader036/viewer/2022062604/5fbc68507366052fdc345026/html5/thumbnails/77.jpg)
Picture source: http://peakdistrictcycleways.co.uk
Use Bayesian optimization!