Hyperparameter Search in Machine Learning

Marc Claesen and Bart De Moor
marc.claesen@esat.kuleuven.be
ESAT-STADIUS, KU Leuven
iMinds Medical IT Department
STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics


Page 1: Hyperparameter Search in Machine Learning

Marc Claesen and Bart De Moor
marc.claesen@esat.kuleuven.be

ESAT-STADIUS, KU Leuven
iMinds Medical IT Department
STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics

Page 2: Outline

1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art

Page 3: Machine learning

Methods capable of learning patterns of interest from data,
by formulating the learning task as an optimization problem.

Machine learning is situated at the intersection of various fields:
statistics, computer science, optimization, (biology), . . .

The field encompasses learning methods with various origins, e.g.:
biology, e.g. neural networks [1]
convex optimization, e.g. support vector machines [2]
statistics, e.g. hidden Markov models [3]
tensor decompositions, e.g. recommender systems [4]


Page 6: Hyperparameter search

Most machine learning methods are (hyper)parameterized.
e.g. Occam's razor: model complexity and overfitting

Hyperparameters can significantly impact performance:
suitable hyperparameters must be determined for each task
this occurs in both supervised and unsupervised learning
→ need for disciplined, automated optimization methods

Some examples:
SVM: regularization and kernel hyperparameters
ANN: regularization, network architecture, transfer functions


Page 9: Formalizing hyperparameter tuning

In a general sense, tuning involves these components:
a learning algorithm A, parameterized by hyperparameters λ
training and test data X(tr), X(te)
a model M = A(X(tr) | λ)
a loss function L to assess the quality of M, typically using X(te): L(M | X(te))

In optimization terms, we aim to find λ* (assuming minimization):

λ* = arg min_λ L(A(X(tr) | λ) | X(te)) = arg min_λ F(λ | A, X(tr), X(te), L),

where F(λ | A, X(tr), X(te), L) is the objective function.
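The composition above (train, predict, score) can be sketched directly in code. Everything below is a toy illustration, not from the slides: the "learner" is a shrinkage estimator of the mean with shrinkage hyperparameter λ, the data sets and candidate λ values are made up, and F(λ) is the squared error on the held-out data.

```python
import statistics

def train(X_tr, lam):
    """Toy learning algorithm A(X_tr | lambda): a shrunk-mean estimator."""
    return statistics.mean(X_tr) / (1.0 + lam)  # the "model" M is just a number

def loss(model, X_te):
    """Loss L(M | X_te): mean squared error of the constant prediction."""
    return statistics.mean((x - model) ** 2 for x in X_te)

def F(lam, X_tr, X_te):
    """Objective F(lambda | A, X_tr, X_te, L): what hyperparameter search minimizes."""
    return loss(train(X_tr, lam), X_te)

X_tr, X_te = [1.0, 2.0, 3.0], [1.5, 2.5]
best_lam = min([0.0, 0.1, 1.0], key=lambda lam: F(lam, X_tr, X_te))
```

Note that the search never looks inside A or L: it only queries F, which is why the slides treat tuning as black-box optimization.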


Page 15: Tuning in practice

Most often done using a combination of grid and manual search:
grid search suffers from the curse of dimensionality
manual tuning leads to poor reproducibility

Better solutions exist but lack adoption because:
potential performance improvements are underestimated
lack of availability and/or ease of use
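The curse of dimensionality is easy to make concrete. Hypothetical sizes below: a grid with 10 values per hyperparameter needs 10^d evaluations, while random search spends whatever fixed budget you give it, independent of d.

```python
import itertools
import random

# Grid search: 10 values per axis -> 10**d points.
values = [i / 9 for i in range(10)]
grid_3d = list(itertools.product(values, repeat=3))   # 1 000 evaluations
grid_5d_size = 10 ** 5                                 # 100 000 evaluations

# Random search: a fixed (hypothetical) budget, regardless of dimensionality.
rng = random.Random(0)
budget = 100
samples_3d = [tuple(rng.random() for _ in range(3)) for _ in range(budget)]
samples_5d = [tuple(rng.random() for _ in range(5)) for _ in range(budget)]
```

This is one reason random search is often preferred over grids when only a few hyperparameters actually matter.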


Page 17: Outline

1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art

Page 18: Support vector machine (SVM) classifiers

min_{α,ξ,b}  (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j κ(x_i, x_j)  +  C Σ_{i=1}^{n} ξ_i,

subject to  y_i ( Σ_{j∈SV} α_j y_j κ(x_i, x_j) + b ) ≥ 1 − ξ_i,   ξ_i ≥ 0,  ∀i.
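The kernel κ enters the formulation only through pairwise evaluations κ(x_i, x_j), i.e. through the Gram matrix. A minimal sketch, using the RBF kernel that the example on a later slide tunes (the data points here are made up):

```python
import math

def rbf(u, v, gamma):
    """RBF kernel: kappa(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def gram(X, kernel):
    """Pairwise kernel (Gram) matrix K[i][j] = kappa(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram(X, lambda u, v: rbf(u, v, gamma=0.5))
```

Changing γ rescales every entry of K, which is why the kernel hyperparameter reshapes the whole optimization problem rather than just one term.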


Page 21: Task: optimize hyperparameters for an SVM

Tune an SVM classifier with RBF kernel κ(u, v) = exp(−γ‖u − v‖²):

min_{α,b,ξ}  (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j exp(−γ‖x_i − x_j‖²)  +  C Σ_{i∈SV} ξ_i

(the double sum equals ‖w‖²)

optimize the regularization parameter C and the kernel parameter γ
evaluate each (C, γ) pair using 2× iterated 10-fold cross-validation
via Optunity's particle swarm optimizer [5]
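To give a feel for the optimizer used here, below is a minimal particle swarm optimization (PSO) loop. This is a generic textbook sketch, not Optunity's implementation: `toy_cv_loss` is a smooth stand-in for the cross-validated (C, γ) objective, and the box bounds and swarm settings are hypothetical.

```python
import random

def toy_cv_loss(C, gamma):
    # Smooth stand-in for (1 - cross-validated accuracy); minimum near C=1, gamma=0.1.
    return (C - 1.0) ** 2 + 10.0 * (gamma - 0.1) ** 2

def pso(f, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box via standard global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # per-particle best position
    pbest_val = [f(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(*pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

(C_opt, gamma_opt), best = pso(toy_cv_loss, bounds=[(0.0, 10.0), (0.0, 1.0)])
```

In the real task each call to f would be a full 2× 10-fold cross-validation, so the swarm's function evaluations dominate the runtime.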


Page 24: Response surface I

[figure]

Page 25: Response surface II

[figure]

Page 26: Outline

1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art

Page 27: Expensive function evaluations

A single objective function evaluation consists of:
1 training a model via the learning method
  can be very time consuming (days up to weeks! [6, 7, 8])
2 predicting on a test set (for supervised methods)
3 computing some evaluation metric for the model / its predictions

All of the above is often done in cross-validation [9, 10]:
used to reliably estimate generalization performance
involves many repetitions → exacerbates computation time

Training/evaluation time is itself a function of the hyperparameter choice!
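The cross-validation cost multiplier is mechanical: k folds mean k train/evaluate cycles per hyperparameter vector, so the 2× iterated 10-fold scheme from the example costs 20 full trainings per (C, γ) pair. A sketch of the index bookkeeping (fold assignment by strided slicing is one common choice, assumed here):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]            # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(n=25, k=5))
```

Every element ends up in exactly one test fold, so the k held-out scores together cover the whole data set once.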


Page 30: Randomness

The objective function measures empirical performance based on a finite sample (data set) → induces discrete, non-smooth jumps.

This gives rise to a stochastic component, inherent to:
the learning method (e.g. resampling methods [11, 12, 13])
random sampling (e.g. cross-validation, bootstrap [10, 9])

The objective function F is not a strict mathematical function:
evaluating F(x) multiple times can yield different results.

The empirical optimum might not really be the best!
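The stochastic component is easy to see in code: scoring the same hyperparameter value under different resampling seeds gives different numbers. A toy setup (the noise model standing in for cross-validation variance is made up):

```python
import random
import statistics

def noisy_F(lam, seed):
    """Same hyperparameter, different resampling seed -> different score."""
    rng = random.Random(seed)
    true_loss = (lam - 0.3) ** 2
    cv_noise = rng.gauss(0.0, 0.05)   # stand-in for fold-assignment variability
    return true_loss + cv_noise

# Ten "evaluations" of F at the identical point lambda = 0.3:
scores = [noisy_F(0.3, seed) for seed in range(10)]
spread = statistics.pstdev(scores)
```

An optimizer comparing two candidates whose true losses differ by less than this spread can easily pick the wrong one, which is exactly why the empirical optimum need not be the true best.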


Page 34: Exotic search spaces

Hyperparameter search spaces can be extremely complex:
mixed integer-continuous (e.g. regularization & kernel)
often domain constrained (e.g. positive regularization)
combinatorial (e.g. feature selection)
conditional dimensions (*)

(*) Consider the architecture of an artificial neural network:
number of hidden layers
size per hidden layer
(transfer functions per layer)


Page 36: Desiderata for hyperparameter optimizers

Optimization routines for hyperparameter search are ideally:
efficient in terms of function evaluations,
appropriate for wildly varying objective functions,
able to account for randomness,
flexible in terms of search space,
parallelizable.

The practical performance bottleneck is evaluating F
→ deciding on the next point to evaluate need not be fast.
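Parallelizability matters because candidate evaluations are mutually independent. A sketch of scoring a batch of candidates concurrently with a thread pool (toy objective and candidate list are hypothetical; threads only pay off when the evaluation releases the GIL or is I/O-bound, so real training workloads would typically use a process pool instead):

```python
from concurrent.futures import ThreadPoolExecutor

def F(lam):
    # stand-in for an expensive cross-validated objective
    return (lam - 2.0) ** 2

candidates = [0.0, 1.0, 2.0, 3.0, 4.0]

# Evaluate the whole batch concurrently; results come back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(F, candidates))

best = min(zip(scores, candidates))[1]
```

Population-based methods (PSO, genetic algorithms) produce such batches naturally at every generation, which is one of their advantages noted later in the slides.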

Page 37: Outline

1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art

Page 38: Sequential model-based optimization (SMBO)

Commonly used for time-consuming objective functions F [14, 15].

SMBO is an iterative approach, in which each iteration involves:
1 modeling the response surface M, based on previous evaluations
  → evaluating M is cheap; use M as a surrogate for F
2 finding the optimal test point x* based on M
  → optimize some criterion, e.g. expected improvement [16]

Approaches differ in terms of model and criterion [14, 15, 17].

But: inherently sequential!
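The two-step loop can be sketched in one dimension. Everything here is deliberately simplified and hypothetical: the surrogate predicts from the nearest observed point with a distance-based exploration bonus as the selection criterion, whereas real SMBO systems use Gaussian processes or tree-structured Parzen estimators with criteria such as expected improvement [16].

```python
import random

def expensive_F(x):
    # stand-in for the expensive objective; true minimum at x = 0.6
    return (x - 0.6) ** 2

def smbo(f, lo, hi, n_init=4, n_iter=20, bonus=0.1, seed=0):
    rng = random.Random(seed)
    X = [rng.uniform(lo, hi) for _ in range(n_init)]   # initial design
    y = [f(x) for x in X]
    for _ in range(n_iter):
        # Step 1: cheap surrogate M from previous evaluations
        # (nearest-neighbor prediction, optimistic far from observed data).
        def criterion(x):
            i = min(range(len(X)), key=lambda j: abs(X[j] - x))
            return y[i] - bonus * abs(X[i] - x)
        # Step 2: pick the next test point by optimizing the cheap criterion.
        candidates = [lo + (hi - lo) * k / 999 for k in range(1000)]
        x_next = min(candidates, key=criterion)
        X.append(x_next)
        y.append(f(x_next))                            # one expensive evaluation
    i = min(range(len(X)), key=lambda j: y[j])
    return X[i], y[i]

x_best, y_best = smbo(expensive_F, 0.0, 1.0)
```

Note the structural point the slide makes: each expensive evaluation must finish before the surrogate can be refit, so the loop is inherently sequential.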


Page 40: Metaheuristic optimization techniques

A large variety of metaheuristic methods have been used, such as:
particle swarm optimization [18, 19, 20]
genetic algorithms [21, 22]
artificial bee colony [23]
harmony search [24]
simulated annealing [25]
Nelder-Mead simplex [26]

Advantages:
ease of implementation and parallelization
general-purpose solvers → few implicit assumptions

Page 41: Hyperparameter Search in Machine Learningclaesenm/optunity/varia/...Hyperparameter Search in Machine Learning Marc Claesen and Bart De Moor marc.claesen@esat.kuleuven.be ESAT-STADIUS,

IntroductionExample: optimizing hyperparameters for an SVM classifier

Challenges in hyperparameter searchState-of-the-art

References

Software

Several packages offer Bayesian SMBO approaches:

Hyperopt [27], Spearmint [17]

ParamILS [28], AutoWEKA [29]

BayesOpt [30], DiceKriging [31]

Optunity offers fundamentally distinct methods [5]:

focus on metaheuristic techniques not offered elsewhere

PSO, CMA-ES, random search, Sobol sequences, . . .

multiplatform: Python, R, MATLAB, Octave

General-purpose optimization libraries are also applicable → but they are often difficult to integrate into a machine learning pipeline



Metaheuristic methods are competitive with SMBO

Optunity's standard PSO [5] versus Hyperopt's tree-structured Parzen estimator [15, 27] on the two-dimensional Rastrigin function.

[Figure: best error so far (log scale, 10^0 to 10^1) versus function evaluation number (1 to 500) for random search, tree of Parzen estimators, and particle swarm optimization.]
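The random-search baseline in such a comparison is trivial to reproduce. A sketch, with illustrative bounds and budget: sample uniformly in the box and track the best value seen so far.

```python
import numpy as np

def rastrigin(x):
    """2-D Rastrigin; global minimum rastrigin([0, 0]) = 0."""
    return 10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

# Uniform random search over the conventional Rastrigin box [-5.12, 5.12]^2.
rng = np.random.default_rng(42)
budget = 500
samples = rng.uniform(-5.12, 5.12, (budget, 2))
# errors[i] = best objective value found within the first i+1 evaluations,
# i.e. the monotonically non-increasing trace plotted in such comparisons.
errors = np.minimum.accumulate([rastrigin(s) for s in samples])
```

Plotting `errors` against the evaluation number on a log scale reproduces the shape of the random-search curve above.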


Conclusion

Hyperparameter search in machine learning

requires disciplined optimization methods

is receiving a lot of research attention, e.g. the ChaLearn AutoML challenge

The main challenges are:

expensive function evaluations with a stochastic component

exotic search spaces

Hyperparameter search is an interesting optimization problem → metaheuristic optimization methods are good candidates


Acknowledgements

Research Council KU Leuven: GOA/10/09 MaNet

Flemish Government:

FWO: projects: G.0871.12N (Neural circuits)

IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256); PhD grant (111065)

Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin

iMinds Medical Information Technologies SBO 2014

VLK Stichting E. van der Schueren: rectal cancer

Federal Government: FOD: Cancer Plan 2012-2015, KPC-29-023 (prostate)

COST: Action: BM1104: Mass Spectrometry Imaging


References I

[1] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2004.

[2] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[3] Lawrence Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[4] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 79–86. ACM, 2010.


References II

[5] Marc Claesen, Jaak Simm, Dusan Popovic, Yves Moreau, and Bart De Moor. Easy hyperparameter search using Optunity. arXiv preprint arXiv:1412.1114, 2014.

[6] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[7] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.


References III

[8] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.

[9] Bradley Efron and Gail Gong. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37(1):36–48, 1983.

[10] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1145, 1995.

[11] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.


References IV

[12] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. EnsembleSVM: A library for ensemble learning using support vector machines. Journal of Machine Learning Research, 15:141–145, 2014.

[13] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. A robust ensemble approach to learn from positive and unlabeled data using SVM base models. Neurocomputing, 160:73–84, 2015.

[14] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization, pages 507–523. Springer, 2011.


References V

[15] James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.

[16] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.

[17] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.


References VI

[18] Michael Meissner, Michael Schmuker, and Gisbert Schneider. Optimized particle swarm optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics, 7(1):125, 2006.

[19] X.C. Guo, J.H. Yang, C.G. Wu, C.Y. Wang, and Y.C. Liang. A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing, 71(16):3211–3215, 2008.

[20] Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, and Zne-Jung Lee. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 35(4):1817–1824, 2008.


References VII

[21] Jinn-Tsong Tsai, Jyh-Horng Chou, and Tung-Kuan Liu. Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. IEEE Transactions on Neural Networks, 17(1):69–80, 2006.

[22] Carlos Ansótegui, Meinolf Sellmann, and Kevin Tierney. A gender-based genetic algorithm for the automatic configuration of algorithms. In Principles and Practice of Constraint Programming – CP 2009, pages 142–157. Springer, 2009.

[23] Dervis Karaboga, Bahriye Akay, and Celal Ozturk. Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks. In Modeling Decisions for Artificial Intelligence, pages 318–329. Springer, 2007.


References VIII

[24] João P. Papa, Gustavo H. Rosa, Aparecido N. Marana, Walter Scheirer, and David D. Cox. Model selection for Discriminative Restricted Boltzmann Machines through meta-heuristic techniques. Journal of Computational Science, 9:14–18, 2015.

[25] Samuel Xavier-de-Souza, Johan A.K. Suykens, Joos Vandewalle, and Désiré Bollé. Coupled simulated annealing. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(2):320–335, 2010.

[26] Gavin C. Cawley and Nicola L.C. Talbot. Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10):1467–1475, 2004.


References IX

[27] James Bergstra, Dan Yamins, and David D. Cox. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pages 13–20. SciPy, 2013.

[28] Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009.

[29] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-WEKA: Automated selection and hyper-parameter optimization of classification algorithms. CoRR, abs/1208.3719, 2012.


References X

[30] Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430, 2014.

[31] Olivier Roustant, David Ginsbourger, Yves Deville, et al. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.
