Advanced Analytics: Plant a (decision) TREE and save the world*!

Vivek Nair, North Carolina State University
[email protected]
vivekaxl.com
Slides: vivekaxl.github.io/assets/AdvancedAnalytics.pdf

* Configure software using less resources

Most Valuable Point

“Information is a source of learning. But unless it is organized, processed, and available to the right people in a format for decision making, it is a burden, not a benefit” -- Dr. William Pollard

Decision Trees - Use Cases

TAR(ZAN)2 [1]

2002

[1] Menzies, Tim, and Ying Hu. "Just enough learning (of association rules): the TAR2 “Treatment” learner." Artificial Intelligence Review 25.3 (2006): 211-229.

TAR(ZAN)2 Learns Small Theories

Problem: Find picture on a page from 11 features

(34 ≤ height ≤ 86) ⋀ (3.9 ≤ mean_tr < 9.5)
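
A small theory like the rule above is easy to apply directly as a predicate. The following is a minimal sketch; the function name and the sample feature values are hypothetical and chosen only to exercise the two ranges on the slide.

```python
def looks_like_picture(height: float, mean_tr: float) -> bool:
    """Apply the two-condition rule from the slide to one page region."""
    return 34 <= height <= 86 and 3.9 <= mean_tr < 9.5


# Hypothetical feature values, made up for illustration.
regions = [
    {"height": 50, "mean_tr": 4.2},  # inside both ranges
    {"height": 90, "mean_tr": 4.2},  # height above the upper bound
    {"height": 50, "mean_tr": 9.5},  # the mean_tr upper bound is exclusive
]
selected = [r for r in regions if looks_like_picture(r["height"], r["mean_tr"])]
```

Only 2 of the 11 features appear in the rule, which is the sense in which the learned theory is small.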

Decision Trees - Use Cases

Categories: Optimization, Planning, Software Variability

• 2002: TAR(ZAN)2 [1]
• 2016: SWAY [2]
• 2017: XTREE [4]
• 2017: Performance Optimization [3]

[1] Menzies, Tim, and Ying Hu. "Just enough learning (of association rules): the TAR2 “Treatment” learner." Artificial Intelligence Review 25.3 (2006): 211-229.
[2] Nair et al. "An (accidental) exploration of alternatives to evolutionary algorithms for SBSE." SSBSE 2016.
[3] Guo et al. "Variability-aware performance prediction: A statistical learning approach." ASE 2013.
[4] Krishna et al. "Less is more: Minimizing code reorganization using XTREE." IST 2017.

Configurable Systems and Variability

input.y4m → System → output.x264

Configuration options: c1, c2, c3, c4, ..., cn

Non-functional behavior: response time, throughput, etc.

Performance Optimization is Necessary!

Performance Optimization is getting more Complex!

[1] Xu et al. "Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software." FSE 2015.
[2] Van Aken, Dana, et al. "Automatic Database Management System Tuning Through Large-scale Machine Learning." International Conference on Management of Data. ACM, 2017.

● Necessary

Performance Optimization is required since Default Configuration is Bad!

[1] Van Aken, Dana, et al. "Automatic Database Management System Tuning Through Large-scale Machine Learning." International Conference on Management of Data. ACM, 2017.
[2] Herodotou, Herodotos, et al. "Starfish: A Self-tuning System for Big Data Analytics." CIDR

● Necessary

● Complex

Performance Optimization can be Expensive!

• Evaluating a single instance of a software/hardware co-design problem can take weeks [1]
• The Rolling Sort use case required 21 days, within a total experimental time of about 2.5 months [2]
• Test suite generation using an evolutionary algorithm can take weeks [3]
• For image recognition and speech recognition workloads, jobs ran for many hours or days [4]

[1] Zuluaga, Marcela, et al. "Active learning for multi-objective optimization." International Conference on Machine Learning. 2013.
[2] Jamshidi, Pooyan, and Giuliano Casale. "An uncertainty-aware approach to optimal configuration of stream processing systems." MASCOTS 2016.
[3] Wang, Tiantian, et al. "Searching for better configurations: a rigorous approach to clone evaluation." FSE 2013.
[4] Venkataraman, Shivaram, et al. "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics." NSDI 2016.

● Necessary

● Complex

● Default is bad

Performance Optimization!

● Necessary

● Complex

● Default is bad

● Expensive

● Pervasive


Road Map

• 2013 [Guo’13] and 2015 [Sarkar’15]: Residual-based Methods
• 2017 [Nair’17]: Rank-based Method
• Future: Bayesian-based Method

Surrogate is a cheap(er) version of the actual system

• Actual system: Request → APACHE http server → Response Time
• Surrogate: Request → Surrogate APACHE http server → Response Time

Residual-based Methods: Progressive Sampling

Guo, Jianmei, et al. "Variability-aware performance prediction: A statistical learning approach." ASE 2013.

How to find the ‘best performing’ configuration for any given system?

Configuration Space

Training Pool, Testing Pool, Validation Pool (40%, 40%, 20%)

• Train
• Check Performance
• IF MRE < τ: Exit; ELSE: continue

Mean Relative Error

$$\mathrm{MRE} = \frac{|\mathit{actual} - \mathit{predicted}|}{\mathit{actual}}$$
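
As a worked example of the error measure, here is a minimal sketch that averages the relative error over a set of test configurations. Averaging over the test set is the usual reading of "mean" and is an assumption here; the numbers are made up, and the measurements are assumed to be positive.

```python
def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / actual over paired measurements."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured vs. predicted response times.
actual = [10.0, 20.0]
predicted = [9.0, 25.0]
mre = mean_relative_error(actual, predicted)  # (0.10 + 0.25) / 2

tau = 0.2               # hypothetical accuracy threshold
should_stop = mre < tau  # the "IF MRE < τ: Exit" check
```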

Use the model to find the best configuration
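
The loop on the preceding slides can be sketched as follows. This is an illustration under stated assumptions, not the implementation from Guo et al.: `measure` is a made-up stand-in for running the real system, the surrogate is a 1-nearest-neighbour stand-in (the talk uses decision trees, and any regressor could be swapped in), lower values are assumed to be better, and the validation pool is assumed to be held out for a final accuracy check.

```python
import itertools
import random


def measure(config):
    """Hypothetical stand-in for running the real system (made-up numbers)."""
    c1, c2, c3, c4 = config
    return 10 + 5 * c1 + 3 * c2 + 2 * c1 * c3 + c4


def predict(train, config):
    """Stand-in surrogate: 1-nearest neighbour by Hamming distance."""
    nearest = min(train, key=lambda seen: sum(a != b for a, b in zip(seen, config)))
    return train[nearest]


def mre(train, measured):
    """Mean relative error of the surrogate on already-measured configurations."""
    errors = [abs(actual - predict(train, cfg)) / actual
              for cfg, actual in measured.items()]
    return sum(errors) / len(errors)


def progressive_sampling(space, tau=0.05, seed=1):
    configs = list(space)
    random.Random(seed).shuffle(configs)
    a, b = int(0.4 * len(configs)), int(0.8 * len(configs))
    train_pool, test_pool, validation_pool = configs[:a], configs[a:b], configs[b:]
    test = {cfg: measure(cfg) for cfg in test_pool}
    train = {}
    for cfg in train_pool:
        train[cfg] = measure(cfg)      # Train: add one measured sample
        if mre(train, test) < tau:     # Check performance: IF MRE < tau, exit
            break
    # Use the model to find the best configuration (lowest predicted value).
    best = min(configs, key=lambda cfg: predict(train, cfg))
    validation = {cfg: measure(cfg) for cfg in validation_pool}
    return train, best, mre(train, validation)


space = itertools.product([0, 1], repeat=4)
train, best, validation_mre = progressive_sampling(space)
```

The threshold `tau` has to be chosen up front, and the number of measurements spent before the loop exits is not known in advance.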

Residual-based Methods: Progressive Sampling - Limitation

• The stopping condition is arbitrary
• Cannot estimate the cost required to build a surrogate

Residual-based Methods: Projective Sampling

Sarkar, Atri, et al. "Cost-efficient sampling for performance prediction of configurable systems." ASE 2015.

Given an acceptable accuracy estimate (τ), how many samples are required to build a ‘quality’ surrogate?

Learning Curve

• Steep Incline
• Gradual Incline
• Plateau
• Optimal Sample Size

Residual-based Methods: Projective Sampling

Estimates the Learning Curve

• Generate initial data points for the learning curve
• Find best fit projective function
• Calculate optimal training set
• Build surrogate

Generating the initial data points:

• Requirement: Initial samples should reflect the relationship between all configuration options
• Intuition: Performance depends on whether a configuration option is selected or deselected
• Heuristic (Feature Frequency): in the initial samples, each option is selected at least δ times and deselected at least δ times

Configuration Space: Training Pool, Testing Pool

Feature frequency table (δ=2)

            c1  c2  c3  c4
Selected     2   2   2   2
Deselected   2   2   2   2

Train

c1  c2  c3  c4
 1   0   1   1
 0   1   1   0
 1   1   0   0
 0   0   0   1

#Samples  Accuracy
1         5%
2         17%
3         29%
4         35%
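
The feature-frequency heuristic can be sketched as below. This is an illustration only: drawing candidates in random order is an assumption, and the function name is hypothetical. Sampling stops once every option has been selected at least δ times and deselected at least δ times.

```python
import random


def feature_frequency_sample(pool, delta=2, seed=0):
    """Draw configurations until each option is selected >= delta times
    and deselected >= delta times (or the pool runs out)."""
    candidates = list(pool)
    random.Random(seed).shuffle(candidates)
    n_options = len(candidates[0])
    selected = [0] * n_options
    deselected = [0] * n_options
    sample = []
    for config in candidates:
        if min(selected) >= delta and min(deselected) >= delta:
            break
        sample.append(config)
        for i, bit in enumerate(config):
            if bit:
                selected[i] += 1
            else:
                deselected[i] += 1
    return sample, selected, deselected


# The four training configurations shown on the slide.
slide_pool = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)]
sample, selected, deselected = feature_frequency_sample(slide_pool, delta=2)
```

With the slide's four configurations, all four are needed before every count reaches δ=2, which matches the final feature frequency table.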

Find best fit projective function

Candidate functions: Exponential, Power, Log
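
One way to pick among candidate functions is to fit each to the initial (#Samples, Accuracy) points and keep the one with the smallest squared error. The sketch below assumes the common forms y = a + b·ln(n), y = a·n^b, and y = a·b^n, fitted by least squares after a log transform. These forms and the fitting procedure are assumptions for illustration and may differ from those used by Sarkar et al.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def fit_log(ns, ys):
    """y = a + b * ln(n)."""
    a, b = linear_fit([math.log(n) for n in ns], ys)
    return lambda n: a + b * math.log(n)


def fit_power(ns, ys):
    """y = a * n**b, fitted in log-log space (requires y > 0)."""
    log_a, b = linear_fit([math.log(n) for n in ns], [math.log(y) for y in ys])
    return lambda n: math.exp(log_a) * n ** b


def fit_exponential(ns, ys):
    """y = a * b**n, fitted in semi-log space (requires y > 0)."""
    log_a, log_b = linear_fit(ns, [math.log(y) for y in ys])
    return lambda n: math.exp(log_a + log_b * n)


def best_fit(ns, ys):
    """Return (name, curve) of the candidate with the smallest squared error."""
    candidates = {"log": fit_log, "power": fit_power, "exponential": fit_exponential}
    scored = {}
    for name, fit in candidates.items():
        curve = fit(ns, ys)
        scored[name] = (sum((curve(n) - y) ** 2 for n, y in zip(ns, ys)), curve)
    name = min(scored, key=lambda k: scored[k][0])
    return name, scored[name][1]


# The four learning-curve points from the slide (accuracy in percent).
ns, accuracy = [1, 2, 3, 4], [5, 17, 29, 35]
name, curve = best_fit(ns, accuracy)
```

The returned `curve` can then be evaluated at larger sample sizes to project how accuracy is expected to change.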

Calculate optimal training set

[Equation not recovered; taken from Sarkar et al. Its annotated terms: coefficients of the projective function; number of configurations whose performance value will be predicted; penalty factor]
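
Since the equation itself did not survive, the following is only a sketch of how the three annotated terms could combine. It assumes a total cost of the form n + penalty × (number of predicted configurations) × err(n), where err is the fitted projective function. That cost form, the function name, and all numbers are assumptions for illustration, not the formula from Sarkar et al.

```python
def optimal_sample_size(err, n_predicted, penalty, n_max):
    """Return the n in 1..n_max that minimises an ASSUMED total cost:
    n measurements plus penalty * n_predicted * err(n)."""
    def cost(n):
        return n + penalty * n_predicted * err(n)
    return min(range(1, n_max + 1), key=cost)


# Hypothetical fitted curve: error falls as a power law (made-up coefficients).
a, b = 0.5, -0.5
n_star = optimal_sample_size(lambda n: a * n ** b,
                             n_predicted=1000, penalty=1.0, n_max=500)
```

Under this assumed cost, a larger penalty or more configurations to predict pushes the optimum toward more measurements.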

Build surrogate

Configuration Space: Training Pool, Testing Pool, Validation Pool

Optimal Sample Size

Use the model to find the best configuration

Residual-based Methods: Projective Sampling - Limitation

• Assumes an accurate model can be built

Rank-based Method

Nair, Vivek, et al. "Using Bad Learners to find Good Configurations." FSE 2017.

What happens when an accurate model cannot be built?

Rank-based Method: Core Insights

Rank Preserving Model

Configuration Space

Training Pool Testing Pool Validation Pool

Train

IF accuracy < τ: Exit

ELSE: continue

Check Performance
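The train / check / exit loop on this slide can be sketched as below. Treat it as a rough illustration under stated assumptions, not the code from [Nair’17]: `measure` is a made-up benchmark, a 1-nearest-neighbour predictor stands in for the learner to keep the sketch dependency-free, "accuracy" is taken to be the mean rank difference on the testing pool (lower is better), and the 40/20/40 split and all names are illustrative.

```python
import itertools
import random

def measure(config):
    """Hypothetical benchmark: performance of a configuration (lower is better)."""
    weights = (5.0, 3.0, 2.0, 1.0, 0.5, 0.25)
    return 10.0 + sum(w * bit for w, bit in zip(weights, config))

def predict(train, config):
    """1-nearest-neighbour stand-in for the trained learner (an assumption)."""
    nearest = min(train, key=lambda item: sum(a != b for a, b in zip(item[0], config)))
    return nearest[1]

def ranks(values):
    """Rank position of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    position = [0] * len(values)
    for rank, index in enumerate(order):
        position[index] = rank
    return position

def mean_rank_difference(train, test):
    """Average gap between actual and predicted rank over the testing pool."""
    actual = ranks([perf for _, perf in test])
    predicted = ranks([predict(train, cfg) for cfg, _ in test])
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(test)

def rank_based_sampling(training_pool, testing_pool, tau, initial=4):
    test = [(cfg, measure(cfg)) for cfg in testing_pool]    # testing pool is measured up front
    train = [(cfg, measure(cfg)) for cfg in training_pool[:initial]]
    for cfg in training_pool[initial:]:
        if mean_rank_difference(train, test) < tau:         # IF accuracy < tau: exit
            break
        train.append((cfg, measure(cfg)))                   # ELSE: continue sampling
    return train

# Configuration space of 6 binary options, split into the three pools.
configs = list(itertools.product((0, 1), repeat=6))
random.Random(1).shuffle(configs)
cut_a, cut_b = int(0.4 * len(configs)), int(0.6 * len(configs))
training_pool, testing_pool, validation_pool = configs[:cut_a], configs[cut_a:cut_b], configs[cut_b:]

train = rank_based_sampling(training_pool, testing_pool, tau=2.0)
# Use the (possibly inaccurate) model only to rank the validation pool.
best = min(validation_pool, key=lambda cfg: predict(train, cfg))
```

The model is never asked for exact performance values, only for an ordering, which is why an inaccurate learner can still be good enough to pick a configuration.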

Rank Preserving Model - Limitation

Requires Testing Pool - 20% of configuration space

Configuration Space

Training Pool (40%) Testing Pool (20%) Validation Pool (40%)

Future - Bayesian-based Method

Nair, Vivek et al. "FLASH: A Faster Optimizer for SBSE Tasks." preprint

Bayesian-based Method

Can we avoid measurement of configurations in the testing pool?

Bayesian-based Method: Bayesian Optimization

True Performance Distribution

Predicted Performance Distribution

Acquisition Function

Which configuration should I evaluate next?

Taken from Dr. Nando de Freitas (tiny.cc/4tgeny)
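The loop behind "which configuration should I evaluate next?" can be sketched as a sequential model-based search. This is only an illustration under assumptions: `measure` is a made-up benchmark, the surrogate is a crude nearest-neighbour stand-in (not a Gaussian Process) that uses Hamming distance as a rough uncertainty, and all names are hypothetical.

```python
import itertools
import random

def measure(config):
    """Hypothetical benchmark (lower is better); stands in for running the system."""
    weights = (4.0, -3.0, 2.0, -1.0, 0.5)
    return 20.0 + sum(w * bit for w, bit in zip(weights, config))

def surrogate(measured, config):
    """Crude stand-in surrogate: value of the nearest measured configuration,
    with the Hamming distance to it used as a rough uncertainty."""
    nearest = min(measured, key=lambda c: sum(a != b for a, b in zip(c, config)))
    distance = sum(a != b for a, b in zip(nearest, config))
    return measured[nearest], float(distance)

def acquisition(mean, uncertainty, kappa=1.0):
    """Lower score = more attractive: low predicted value or high uncertainty."""
    return mean - kappa * uncertainty

def optimize(space, budget, seed=0):
    rng = random.Random(seed)
    # Start from a few randomly measured configurations.
    measured = {cfg: measure(cfg) for cfg in rng.sample(space, 3)}
    while len(measured) < budget:
        candidates = [cfg for cfg in space if cfg not in measured]
        # The acquisition function decides what to evaluate next.
        nxt = min(candidates, key=lambda cfg: acquisition(*surrogate(measured, cfg)))
        measured[nxt] = measure(nxt)   # only the chosen configuration is measured
    return min(measured, key=measured.get), measured

space = list(itertools.product((0, 1), repeat=5))
best, measured = optimize(space, budget=12)
```

Only the configurations picked by the acquisition function are ever measured, so no separate pool has to be measured in advance.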

Tradeoff between Exploration vs Exploitation
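A small worked example may make the tradeoff concrete. It assumes a lower-confidence-bound acquisition function for a minimization problem (one common choice, not necessarily the one in the original figure); the predicted means, uncertainties, and configuration names are invented.

```python
def lower_confidence_bound(mean, std, kappa):
    """Score a candidate: predicted value minus kappa times its uncertainty."""
    return mean - kappa * std

# Hypothetical surrogate output: name -> (predicted runtime, uncertainty).
predictions = {
    "A": (10.0, 0.5),   # best predicted value, well understood
    "B": (12.0, 3.0),   # worse prediction, but very uncertain
    "C": (11.0, 1.0),
}

def next_configuration(predictions, kappa):
    """Pick the candidate with the lowest acquisition score."""
    return min(predictions, key=lambda name: lower_confidence_bound(*predictions[name], kappa))

exploit = next_configuration(predictions, kappa=0.0)  # trusts the predicted values only
explore = next_configuration(predictions, kappa=2.0)  # rewards uncertain candidates
```

With `kappa=0.0` the choice is "A" (pure exploitation); with `kappa=2.0` the scores become 9.0, 6.0, and 9.0, so the uncertain "B" is chosen (exploration).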

Surrogate of choice: Gaussian Processes (GP)

GPs lose efficiency in high-dimensional spaces, i.e., when the number of features exceeds a dozen

Bayesian-based Method - FLASH

● Fast

● Effective

● Comprehensible

Conclusion

• ML Algorithms are not a black box
  – How to use Decision Tree in Planning?
  – Can I explain the results to the Decision Makers?

• Lazy is good
  – Only do what is required
  – Optimization does not require an accurate model

• Easy over Hard
  – Try simplest first
  – Tuning SVM outperforms DL

Rank-based Method: http://tiny.cc/wnheny

FLASH: http://tiny.cc/hoheny

ePAL: http://www.spiral.net/software/pal.html

Bayesian Optimization: https://youtu.be/vz3D36VXefI

"Look for me...beneath the tree...North!"

―Three-eyed raven