Electrochem BO Slides - People @ EECS at UC Berkeley

59
An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science Kirthevasan Kandasamy Machine Learning Dept, CMU Electrochemical Energy Symposium Pittsburgh, PA, November 2017

Transcript of Electrochem BO Slides - People @ EECS at UC Berkeley

Page 1: Electrochem BO Slides - People @ EECS at UC Berkeley

An Introduction to Bayesian Optimisation and(Potential) Applications in Materials Science

Kirthevasan Kandasamy

Machine Learning Dept, CMU

Electrochemical Energy Symposium

Pittsburgh, PA, November 2017

Page 2: Electrochem BO Slides - People @ EECS at UC Berkeley

Designing Electrolytes in Batteries

1/19

Page 3: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation in Computational Astrophysics

Cosmological Simulator

Observation

E.g:Hubble ConstantBaryonic Density

Likelihood Score

Likelihood computation

1/19

Page 4: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

Expensive Blackbox Function

Other Examples:- Pre-clinical Drug Discovery- Optimal policy in Autonomous Driving- Synthetic gene design

1/19

Page 5: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.

Let x? = argmaxx f (x).

x

f(x)

2/19

Page 6: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.

Let x? = argmaxx f (x).

x

f(x)

2/19

Page 7: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.Let x? = argmaxx f (x).

x

f(x)

x∗

f(x∗)

2/19

Page 8: Electrochem BO Slides - People @ EECS at UC Berkeley

Outline

I Part I: Bayesian Optimisation

I Bayesian Models for f

I Two algorithms: upper confidence bounds & Thompsonsampling

I Part II: Some Modern Challenges

I Multi-fidelity Optimisation

I Parallelisation

3/19

Page 9: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Functions with no observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 10: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Functions with no observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 11: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Prior GP

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 12: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 13: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Posterior GP given observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 14: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Posterior GP given observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 15: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x)

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 16: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x)

1) Construct posterior GP.

2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 17: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 18: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

xt

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x).

4) Evaluate f at xt .

5/19

Page 19: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

xt

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .5/19

Page 20: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

x

f(x)

6/19

Page 21: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 1x

f(x)

6/19

Page 22: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 2x

f(x)

6/19

Page 23: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 3x

f(x)

6/19

Page 24: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 4x

f(x)

6/19

Page 25: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 5x

f(x)

6/19

Page 26: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 6x

f(x)

6/19

Page 27: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 7x

f(x)

6/19

Page 28: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 11x

f(x)

6/19

Page 29: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 25x

f(x)

6/19

Page 30: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 31: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP.

2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 32: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 33: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

xt

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x).

4) Evaluate f at xt .

7/19

Page 34: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

xt

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 35: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 36: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 37: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 38: Electrochem BO Slides - People @ EECS at UC Berkeley

Some Modern Challenges/Opportunities

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b,

Kandasamy et al. ICML 2017)

2. Parallelisation (Kandasamy et al. Arxiv 2017)

9/19

Page 39: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 40: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆

f1

f2

f3

f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 41: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆

f1

f2

f3

f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 42: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 43: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 44: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 45: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Cosmological Maximum Likelihood Inference

I Type Ia Supernovae Data

I Maximum likelihood inference for 3 cosmological parameters:

I Hubble Constant H0

I Dark Energy Fraction ΩΛ

I Dark Matter Fraction ΩM

I Likelihood: Robertson Walker metric (Robertson 1936)

Requires numerical integration for each point in the dataset.

12/19

Page 46: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Cosmological Maximum Likelihood Inference

3 cosmological parameters. (d = 3)Fidelities: integration on grids of size (102, 104, 106). (M = 3)

500 1000 1500 2000 2500 3000 3500-10

-5

0

5

10

13/19

Page 47: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Hartmann-3D

2 Approximations (3 fidelities).We want to optimise the m = 3rd fidelity, which is the mostexpensive. m = 1st fidelity is cheapest.

0 0.5 1 1.5 2 2.5 3 3.50

5

10

15

20

25

30

35

40

Num.of

Queries

Query frequencies for Hartmann-3D

f (3)(x)

m=1m=2m=3

14/19

Page 48: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 49: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 50: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 51: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 52: Electrochem BO Slides - People @ EECS at UC Berkeley

Parallel Thompson Sampling (Kandasamy et al. Arxiv 2017)

Asynchronous: asyTS

At any given time,1. (x ′, y ′)← Wait for

a worker to finish.2. Compute posterior GP.3. Draw a sample g ∼ GP.

4. Re-deploy worker atargmax g .

Synchronous: synTS

At any given time,1. (x ′m, y ′m)Mm=1 ← Wait for

all workers to finish.2. Compute posterior GP.3. Draw M samples

gm ∼ GP, ∀m.4. Re-deploy worker m at

argmax gm, ∀m.

16/19

Page 53: Electrochem BO Slides - People @ EECS at UC Berkeley

Parallel Thompson Sampling (Kandasamy et al. Arxiv 2017)

Asynchronous: asyTS

At any given time,1. (x ′, y ′)← Wait for

a worker to finish.2. Compute posterior GP.3. Draw a sample g ∼ GP.

4. Re-deploy worker atargmax g .

Synchronous: synTS

At any given time,1. (x ′m, y ′m)Mm=1 ← Wait for

all workers to finish.2. Compute posterior GP.3. Draw M samples

gm ∼ GP, ∀m.4. Re-deploy worker m at

argmax gm, ∀m.

16/19

Page 54: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

0 10 20 30 40

10 -2

10 -1

17/19

Page 55: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

0 10 20 30 40

10 -2

10 -1

17/19

Page 56: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

synRANDsynHUCBsynUCBPEsynTSasyRANDasyUCBasyHUCBasyEIasyHTSasyTS

0 10 20 30 40

10 -2

10 -1

17/19

Page 57: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Hartmann-18D M = 25Evaluation time sampled from an exponential distribution

synRANDsynHUCBsynUCBPEsynTSasyRANDasyUCBasyHUCBasyEIasyHTSasyTS

0 5 10 15 20 25 30

2.5

3

3.5

4

4.5

55.5

66.5

18/19

Page 58: Electrochem BO Slides - People @ EECS at UC Berkeley

Summary

I Black-box Optimisation methods are used in several scientificand engineering applications.

I Bayesian Optimisation: A method for black-box optimisationwhich uses Bayesian uncertainty estimates for f .

I Some modern challenges

I Multi-fidelity optimisation

I Parallel evaluations

I and several more . . .

Thank you.Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19

Page 59: Electrochem BO Slides - People @ EECS at UC Berkeley

Summary

I Black-box Optimisation methods are used in several scientificand engineering applications.

I Bayesian Optimisation: A method for black-box optimisationwhich uses Bayesian uncertainty estimates for f .

I Some modern challenges

I Multi-fidelity optimisation

I Parallel evaluations

I and several more . . .

Thank you.Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19