Hyperparameter Optimization - Sven Hafeneger

Spark & Machine Learning Meetup Hyperparameter Optimization - when scikit-learn meets PySpark Sven Hafeneger 27.10.2016

Transcript of Hyperparameter Optimization - Sven Hafeneger

Page 1: Hyperparameter Optimization - Sven Hafeneger

Spark & Machine Learning Meetup

Hyperparameter Optimization - when scikit-learn meets PySpark

Sven Hafeneger

27.10.2016

Page 2: Hyperparameter Optimization - Sven Hafeneger

©2015 IBM Corporation May 2, 2023

Data Science Workflow

Wikipedia: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

Page 3: Hyperparameter Optimization - Sven Hafeneger

Data Science Workflow

knobs to tune !

Wikipedia: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

https://www.okwenclosures.com/en/Potentiometer-Tuning-knobs/Top-Knobs.htm

Page 4: Hyperparameter Optimization - Sven Hafeneger

Data Science Workflow - Modeling

Model

Improves robustness

Influences complexity

Helps with class imbalances

https://www.kvraudio.com/forum/viewtopic.php?t=328938

Page 5: Hyperparameter Optimization - Sven Hafeneger

What is Hyperparameter Optimization?

“… is the problem of choosing a set of hyperparameters for a learning algorithm, …” [1]

Grid search, random search, …
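The difference between the two strategies named on this slide can be sketched in a few lines of plain Python. This is only an illustration: the search space below is hypothetical (the slides do not list the values that were searched), and the helper names `grid_search_candidates` and `random_search_candidates` are made up for this sketch.

```python
import itertools
import random

# Hypothetical search space; the slides do not list the actual values.
search_space = {
    "max_depth": [5, 10, 15],
    "n_estimators": [50, 100, 200],
}

def grid_search_candidates(space):
    """Return every combination of the listed values (exhaustive grid)."""
    names = sorted(space)
    return [dict(zip(names, values))
            for values in itertools.product(*(space[n] for n in names))]

def random_search_candidates(space, n_iter, seed=0):
    """Return n_iter combinations, each value drawn independently at random."""
    rng = random.Random(seed)
    names = sorted(space)
    return [{n: rng.choice(space[n]) for n in names} for _ in range(n_iter)]

grid = grid_search_candidates(search_space)
sampled = random_search_candidates(search_space, n_iter=4)
print(len(grid), "grid candidates;", len(sampled), "random candidates")
```

Grid search evaluates all candidates, so its cost grows with the product of the list lengths; random search fixes the budget (`n_iter`) up front, which is the trade-off discussed in [1].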

https://openclipart.org/detail/194603/grid-search-pattern

Page 6: Hyperparameter Optimization - Sven Hafeneger

What is Hyperparameter Optimization?

“… is the problem of choosing a set of hyperparameters for a learning algorithm, …” [1]

Grid search, random search, …

http://25.media.tumblr.com/tumblr_lcelmoEfoX1qbl1tko1_400.jpg

Page 7: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

Build a classification model

We have some data and a classification problem
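The code shown on this slide did not survive in the transcript. A minimal sketch of the starting point might look like the following. Everything in it is an assumption: the data is synthetic (`make_classification` stands in for "some data"), and a random forest is chosen only because the tuned parameters mentioned later (`max_depth`, `n_estimators`) fit that kind of model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "some data"; the talk's dataset is not shown.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Hold out a test set so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Untuned baseline classifier (hyperparameters picked arbitrarily).
baseline = RandomForestClassifier(n_estimators=10, random_state=0)
baseline.fit(X_train, y_train)

test_accuracy = baseline.score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 2))
```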

Page 8: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

Page 9: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

… well ... yes ... overfitted !
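One common way to spot overfitting is to compare the score on the training data with the score on held-out data. The sketch below assumes synthetic, deliberately noisy data (`flip_y` randomizes part of the labels) and a random forest with unrestricted tree depth; neither is taken from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Noisy synthetic data (assumption): 30% of the labels are randomized.
X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           flip_y=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fully grown trees can memorize the training set, including its noise.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
gap = train_accuracy - test_accuracy
print("train:", round(train_accuracy, 2),
      "test:", round(test_accuracy, 2),
      "gap:", round(gap, 2))
```

A large gap between the two scores is the symptom the slide refers to; tuning hyperparameters such as `max_depth` is one way to narrow it.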

Page 10: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

Improve test scores !

Page 11: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

~ 500 jobs, ~ 13 mins
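The number of jobs in a cross-validated grid search is the number of parameter combinations times the number of folds. The grid below is hypothetical: it is just one way to arrive at 500 fits, since the transcript does not show the grid or the fold count that were actually used.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 10 x 10 = 100 candidate settings.
param_grid = {
    "max_depth": list(range(3, 22, 2)),        # 3, 5, ..., 21
    "n_estimators": list(range(20, 201, 20)),  # 20, 40, ..., 200
}
cv_folds = 5  # assumed number of cross-validation folds

n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * cv_folds

# Average time per fit if the ~13 minutes were spent running fits one by one.
seconds_per_fit = 13 * 60 / n_fits

print(n_candidates, "candidates x", cv_folds, "folds =", n_fits, "fits")
print("average seconds per fit:", seconds_per_fit)
```

Each fit is independent of the others, which is what makes the workload easy to run in parallel.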

Page 12: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with scikit-learn

Return (best) model

Accuracy: 0.44 => 0.76

max_depth=15, n_estimators=200
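With scikit-learn, the best parameter setting and a model refitted with it are available on the search object after `fit`. The following is a small sketch on synthetic data with a deliberately tiny, hypothetical grid so that it runs quickly; it will not reproduce the accuracy figures or the winning parameters from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption); the talk's dataset is not shown.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Tiny hypothetical grid: 2 x 2 = 4 candidates, 3 folds each.
param_grid = {"max_depth": [5, 15], "n_estimators": [20, 50]}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on all of
# X_train using the best parameter combination.
best_model = search.best_estimator_
tuned_test_accuracy = best_model.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("test accuracy of best model:", round(tuned_test_accuracy, 2))
```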

Page 13: Hyperparameter Optimization - Sven Hafeneger

Gridsearch with spark-sklearn

What if you have access to a Spark cluster ?

Distribute the workload on the cluster !
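As described in the Databricks post linked on the next slide, spark-sklearn offers a `GridSearchCV` that takes a `SparkContext` as an extra first argument and is otherwise used like the scikit-learn class. The sketch below is written under several assumptions: the data and grid are made up, the helper `make_search` is hypothetical, and because spark-sklearn may not be installed (or may not import against a recent scikit-learn), the code falls back to the local scikit-learn search so that it still runs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

try:
    from pyspark import SparkContext
    from spark_sklearn import GridSearchCV as SparkGridSearchCV

    def make_search(estimator, param_grid, cv):
        # Distributed variant: the SparkContext is the first argument,
        # and the individual fits are run as Spark tasks.
        sc = SparkContext.getOrCreate()
        return SparkGridSearchCV(sc, estimator, param_grid, cv=cv)
except ImportError:
    from sklearn.model_selection import GridSearchCV

    def make_search(estimator, param_grid, cv):
        # Local fallback for this sketch only: plain scikit-learn.
        return GridSearchCV(estimator, param_grid, cv=cv)

# Synthetic data and a tiny hypothetical grid (assumptions).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
param_grid = {"max_depth": [5, 15], "n_estimators": [10, 20]}

search = make_search(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print("best parameters:", search.best_params_)
```

The calling code stays the same in both branches; only the construction of the search object changes.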

Page 14: Hyperparameter Optimization - Sven Hafeneger

Save time ! Concentrate on more important problems …

https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-apache-spark.html

Page 15: Hyperparameter Optimization - Sven Hafeneger

Data Science Workflow

Faster cycles !

Page 16: Hyperparameter Optimization - Sven Hafeneger

Try it out

Source: [6]

https://pypi.python.org

Page 17: Hyperparameter Optimization - Sven Hafeneger

Try it out

Page 18: Hyperparameter Optimization - Sven Hafeneger

References

[1]: Bergstra, James; Bengio, Yoshua (2012). "Random Search for Hyper-Parameter Optimization". Journal of Machine Learning Research. 13: 281–305.

Thanks !