
Instance Spaces for Objective Assessment of Algorithms and Benchmark Test Suites

Kate Smith-Miles

School of Mathematics and Statistics, University of Melbourne


Acknowledgements

This research is funded by ARC Discovery Project grant DP120103678 and ARC Australian Laureate Fellowship FL140100012.

The instance space and evolving instances methodology is joint work with Dr. Jano van Hemert (University of Edinburgh), Dr. Davaa Baatar, Dr. Mario Andrés Muñoz Acosta, and students Simon Bowly and Thomas Tan

The generalisation to machine learning is joint work with Dr. Laura Villanova, Dr. Mario Andrés Muñoz Acosta, and Dr. Davaa Baatar


The Importance of Test Instances

Standard practice: use benchmark instances to report algorithm strengths (but rarely weaknesses!)

NFL Theorem (Wolpert & Macready, 1997) warns against expecting an algorithm to perform well on all instances, regardless of their structure and characteristics.

The properties (or measurable features) of instances may provide explanations about an algorithm's behaviour across a range of instances → predictions, insights.

Requires the right kinds of test instances (diverse, challenging, real-world-like, etc.) and suitable features

Reference

Smith-Miles, K. & Lopes, L., "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39(5), pp. 875-889, 2012.


Travelling Salesman Problem (TSP) Example

[Figure: two example TSP instances, one labelled Easy and the other labelled Hard]


What makes the TSP easy or hard?

A TSP Formulation (not the only one)

Let X_{i,j} = 1 if city i is followed by city j in the tour; 0 otherwise

    minimise    ∑_{i=1}^{N} ∑_{j=1}^{N} D_{i,j} X_{i,j}

    subject to  ∑_i X_{i,j} = 1                       ∀ j
                ∑_j X_{i,j} = 1                       ∀ i
                ∑_{i∈S} ∑_{j∈S} X_{i,j} ≤ |S| − 1     ∀ S ≠ ∅, S ⊂ {1, 2, …, N}

TSP is NP-hard, but some instances are easy depending on properties of the inter-city distance matrix D
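As a hedged illustration of the kind of distance-matrix properties one might measure (these are examples, not the feature set used in the talk), the Python sketch below computes a few candidate features of D that have been linked to TSP difficulty, such as the spread of normalised distances and the fraction of distinct distances.

    import numpy as np

    def tsp_distance_features(D):
        """Compute a few illustrative features of a symmetric TSP distance matrix D.

        These are examples of measurable instance properties, not the specific
        feature set used in the talk.
        """
        D = np.asarray(D, dtype=float)
        n = D.shape[0]
        iu = np.triu_indices(n, k=1)          # off-diagonal distances only
        d = D[iu]
        d_norm = d / d.max()                  # normalise to [0, 1]
        return {
            "n_cities": n,
            "mean_distance": d_norm.mean(),
            "sd_distance": d_norm.std(),      # low spread often means less structure to exploit
            "fraction_distinct": len(np.unique(np.round(d_norm, 6))) / len(d),
            "min_to_mean_ratio": d_norm.min() / d_norm.mean(),
        }

    # Example: 50 random cities in the unit square
    rng = np.random.default_rng(0)
    cities = rng.random((50, 2))
    D = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)
    print(tsp_distance_features(D))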


Questions

How do instance features help us understand the strengths and weaknesses of algorithms?

How can we infer and visualise algorithm performance across a huge "instance space"?

How easy or hard are the benchmark instances in the literature? How diverse are existing instances?

How can we measure objectively the relative performance of algorithms?

How can we generate new test instances to gain insights into algorithmic power?


Aims

Develop a new methodology to
  - visualise the "instance space" based on instance features
  - visualise algorithm performance across the instance space
  - define where algorithm performance is expected to be "good" (called the "algorithm footprint")
  - measure the relative size of an algorithm's footprint
  - evolve new instances at target locations in instance space

Enable objective assessment of algorithmic power.

Enable useful test instances to be generated with controllable characteristics to drive insights.

Understand and report the boundary of good performance of an algorithm: essential for good research practice, and to avoid deployment disasters.


Algorithm Selection Problem, Rice (1976)


Applications of Rice's Framework

Rice and colleagues used this approach to predict the performance of the many methods (A) for numerical solution of elliptic partial differential equations (PDEs).

Reference

Weerawarana, Rice, et al., "PYTHIA: a knowledge-based system to select scientific algorithms", ACM Trans. on Math. Software, vol. 22(4), pp. 447-468, 1996.

It has also been used for pre-conditioners for linear system solvers, and extensively for machine learning (meta-learning).

Reference

Smith-Miles, K. A., "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41(1), 2008.


Applications to Optimisation

Represents a relatively new direction for the optimisation community (combinatorial, continuous, black-box, etc.)

Much needed, given
  - huge range of algorithms
  - frequent statements like "currently there is still a strong lack of ... understanding of how exactly the relative performance of different meta-heuristics depends on instance characteristics."

Can also resolve longstanding debate about how instance choice affects evaluation of algorithm performance

Reference

Hooker, J.N., "Testing heuristics: We have it all wrong", Journal of Heuristics, vol. 1, pp. 33-42, 1995.


Extending Rice's Framework


{I,F,Y,A} is the meta-data from which we learn


STEP 1: Collect meta-data {I,F,Y,A}

What makes the problem hard?

What features capture the difficulty of instances?

Which instances show sufficient diversity in features as well as algorithm performance?

Which algorithms will show sufficient diversity of performance that we can learn something about the effectiveness of their underlying mechanism?

What performance metric(s) is most relevant?
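As a minimal sketch of how such meta-data could be laid out (the instance names, features, algorithms, and numbers below are purely illustrative, not from the talk), each instance in I contributes one row of features F and one row of performance measurements Y for the algorithm portfolio A:

    import pandas as pd

    # Illustrative meta-data layout: one row per problem instance.
    features = pd.DataFrame(
        {"n_nodes": [100, 250, 500], "density": [0.10, 0.35, 0.80]},   # F: instance features
        index=["inst_a", "inst_b", "inst_c"],                          # I: instance identifiers
    )
    performance = pd.DataFrame(
        {"tabu_search": [0.02, 0.15, 0.40],                            # Y: one column per
         "simulated_annealing": [0.05, 0.10, 0.55]},                   #    algorithm in A
        index=features.index,
    )
    metadata = features.join(performance)
    print(metadata)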


STEP 2: Create instance space

Which dimension reduction method should be used to lose minimal information and create a visualisation that separates easy and hard instances in interpretable ways?

Which features should be selected?

Can the selected features accurately predict algorithm performance?
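One common choice for the dimension-reduction step is principal component analysis; the instance-space methodology has also used other, tailored projections, so treat the following only as a hedged sketch of projecting a feature matrix down to a 2-D instance space:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def project_to_instance_space(F):
        """Project an (instances x features) matrix to 2-D coordinates.

        PCA is used for illustration; any dimension-reduction method that
        preserves the easy/hard structure of the instances could be substituted.
        """
        F_scaled = StandardScaler().fit_transform(F)   # put features on comparable scales
        pca = PCA(n_components=2)
        Z = pca.fit_transform(F_scaled)                # 2-D coordinates of each instance
        return Z, pca.explained_variance_ratio_

    # Example with a random stand-in feature matrix (200 instances, 18 features)
    rng = np.random.default_rng(1)
    F = rng.random((200, 18))
    Z, explained = project_to_instance_space(F)
    print(Z.shape, explained)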


STEP 3: Measure algorithm footprints and gain insights into strengths and weaknesses

In which parts of the space is an algorithm expected to perform well or poorly?

How large is its footprint, relative to other algorithms?

Does its footprint overlap real-world instances?

Is it unique anywhere?
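A rough way to quantify a footprint, once each instance has 2-D coordinates and a good/bad performance label, is the area of a hull around the "good" instances. The sketch below uses a convex hull for simplicity; the published methodology constructs more careful regions (with density and purity requirements), so this is only an approximation:

    import numpy as np
    from scipy.spatial import ConvexHull

    def footprint_area(Z, good):
        """Approximate an algorithm's footprint as the convex-hull area of the
        instances (rows of Z, 2-D coordinates) on which it performed well."""
        pts = Z[np.asarray(good)]
        if len(pts) < 3:
            return 0.0                 # a 2-D hull needs at least 3 points
        return ConvexHull(pts).volume  # for 2-D input, .volume is the enclosed area

    # Example: compare two algorithms on the same projected instances
    rng = np.random.default_rng(2)
    Z = rng.random((200, 2))
    good_a = rng.random(200) < 0.6     # stand-in good/bad labels per instance
    good_b = rng.random(200) < 0.3
    print(footprint_area(Z, good_a), footprint_area(Z, good_b))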


STEP 4: Generate new test instances to fill gaps in the instance space

Is there a theoretical boundary beyond which instances can't exist?

Where are the benchmark instances located?

How diverse and challenging are they?

How can we set target points in the instance space and evolve new instances?

Which target points could provide important new information to influence our assessment?

Return to STEP 1 to revisit whether the features distinguish the new instances
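To make the evolve-to-target idea concrete, here is a hedged sketch using a simple (1+1) hill climber standing in for the genetic algorithm used in practice: candidate instances are mutated and kept whenever their projected feature vector moves closer to a chosen target point. The helpers compute_features, project, and mutate are placeholders to be supplied by the user, not part of the talk's toolkit.

    import numpy as np

    def evolve_instance(seed_instance, target, compute_features, project, mutate,
                        generations=500, rng=None):
        """Hill-climb an instance towards a target location in the 2-D instance space."""
        if rng is None:
            rng = np.random.default_rng()
        best = seed_instance
        best_dist = np.linalg.norm(project(compute_features(best)) - target)
        for _ in range(generations):
            cand = mutate(best, rng)
            dist = np.linalg.norm(project(compute_features(cand)) - target)
            if dist < best_dist:          # keep the candidate if it lands closer to the target
                best, best_dist = cand, dist
        return best, best_dist

    # Toy usage: the "instance" is just an 18-dim feature vector, mutation adds noise,
    # and the projection keeps the first two features.
    rng = np.random.default_rng(3)
    seed = rng.random(18)
    target = np.array([0.9, 0.1])
    inst, dist = evolve_instance(
        seed, target,
        compute_features=lambda x: x,
        project=lambda f: f[:2],
        mutate=lambda x, r: np.clip(x + r.normal(0, 0.05, size=x.shape), 0, 1),
        rng=rng,
    )
    print(dist)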


Graph Colouring


Given an undirected graph G(V, E) with |V| = n, colour the vertices such that no two vertices connected by an edge share the same colour

Try to find the minimum number of colours needed to colour the graph (chromatic number)

NP-hard problem → numerous heuristics for large n

Many applications, such as timetabling, where edges represent conflicts between events
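For concreteness, a minimal sketch (using networkx, not one of the heuristics compared in the talk) that colours a random graph with a greedy heuristic and reports the number of colours used, which is an upper bound on the chromatic number:

    import networkx as nx

    # Greedy colouring: a fast heuristic giving an upper bound on the chromatic number.
    G = nx.erdos_renyi_graph(n=50, p=0.2, seed=42)
    colouring = nx.greedy_color(G, strategy="largest_first")  # {node: colour index}
    n_colours = max(colouring.values()) + 1

    # Check that the colouring is proper: no edge joins two same-coloured nodes
    assert all(colouring[u] != colouring[v] for u, v in G.edges())
    print(f"{n_colours} colours used for a graph with {G.number_of_nodes()} nodes")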


What makes graph colouring hard?

In total we have 18 features that describe a graph instance G(V, E)

5 features relating to the nodes and edges
  - The number of nodes or vertices in a graph: n = |V|
  - The number of edges in a graph: m = |E|
  - The density of a graph: the ratio of the number of edges to the number of possible edges.
  - Mean node degree: the degree of a node is the number of connections a node has to other nodes.
  - SD of node degree: the average node degree and its standard deviation can give us an idea of how connected a graph is.


Graph features (continued)

8 features related to cycles and paths on the graph
  - The diameter of a graph: max shortest path distance between any two nodes.
  - Average path length: average length of shortest paths for all node pairs.
  - The girth of a graph: the length of the shortest cycle.
  - The clustering coefficient: a measure of node clustering.
  - Mean betweenness centrality: average fraction of all shortest paths connecting all pairs of nodes that pass through a given node.
  - SD of betweenness centrality: with the mean, the SD gives a measure of how central the nodes are in a graph.
  - Szeged index / revised Szeged index: generalisation of the Wiener number to cyclic graphs (correlates with bipartivity).
  - Beta: proportion of even closed walks to all closed walks (correlates with bipartivity).


Graph features (continued)

5 features related to the adjacency and Laplacian matrices
  - Mean eigenvector centrality: the Perron-Frobenius eigenvector of the adjacency matrix, averaged across all components.
  - SD of eigenvector centrality: together with the mean, the standard deviation of eigenvector centrality gives us a measure of the importance of a node inside a graph.
  - Mean spectrum: the mean of the absolute values of the eigenvalues of the adjacency matrix (a.k.a. the "energy" of the graph).
  - SD of the set of absolute values of eigenvalues of the adjacency matrix.
  - Algebraic connectivity: the 2nd smallest eigenvalue of the Laplacian matrix, reflecting how well connected a graph is. Cheeger's constant, another important graph property, is bounded below by half the algebraic connectivity.
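A hedged sketch of computing a subset of these 18 features with networkx is shown below; the exact definitions and normalisations used in the talk may differ, and features such as girth, the Szeged indices, and Beta are omitted for brevity.

    import networkx as nx
    import numpy as np

    def graph_features(G):
        """Compute a subset of the graph-colouring instance features described above.

        Assumes G is connected; diameter and average path length are undefined otherwise.
        """
        degrees = np.array([d for _, d in G.degree()])
        betweenness = np.array(list(nx.betweenness_centrality(G).values()))
        eig_centrality = np.array(list(nx.eigenvector_centrality_numpy(G).values()))
        spectrum = np.abs(nx.adjacency_spectrum(G))       # |eigenvalues| of the adjacency matrix
        return {
            "n_nodes": G.number_of_nodes(),
            "n_edges": G.number_of_edges(),
            "density": nx.density(G),
            "mean_degree": degrees.mean(),
            "sd_degree": degrees.std(),
            "diameter": nx.diameter(G),
            "avg_path_length": nx.average_shortest_path_length(G),
            "clustering_coefficient": nx.average_clustering(G),
            "mean_betweenness": betweenness.mean(),
            "sd_betweenness": betweenness.std(),
            "mean_eigenvector_centrality": eig_centrality.mean(),
            "sd_eigenvector_centrality": eig_centrality.std(),
            "mean_spectrum": spectrum.mean(),             # mean |eigenvalue|, the "energy" above
            "sd_spectrum": spectrum.std(),
            "algebraic_connectivity": nx.algebraic_connectivity(G),
        }

    G = nx.erdos_renyi_graph(n=60, p=0.15, seed=7)
    print(graph_features(G))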


Graph Colouring Instances

We use a set of 6788 instances from a variety of well-studied sources, and others we have generated to explore bipartivity

DataSet   # instances   Description
B         1000          Bipartivity Controlled
C1        1000          Culberson: cycle-driven
C2         932          Culberson: geometric
C3        1000          Culberson: girth and degree inhibited
C4        1000          Culberson: IID edge probabilities
C5        1000          Culberson: weight-biased
D          743          DIMACS instances
E           20          Social Network graphs
F           80          Sports Scheduling
G           13          Exam Timetabling

Instance Spaces for Performance Evaluation 20 / 89


Graph Colouring Algorithms

We use the same 8 algorithms considered by Lewis et al.

- DSATUR: Brelaz's greedy algorithm (exact for bipartite graphs)
- RandGr: simple greedy first-fit colouring of random permutations of the nodes
- Bktr: a backtracking version of DSATUR (Culberson)
- HillClimb: a hill-climbing improvement on an initial DSATUR solution
- HEA: hybrid evolutionary algorithm
- TabuCol: tabu search algorithm
- PartCol: like TabuCol, but does not restrict the search to the feasible space
- AntCol: ant colony meta-heuristic

Reference

Lewis, R. et al., "A wide-ranging computational comparison of high-performance graph colouring algorithms", Computers & Operations Research, vol. 39(9), pp. 1933-1950, 2012.

Instance Spaces for Performance Evaluation 21 / 89


HEA reported as best overall
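For readers unfamiliar with DSATUR, a minimal sketch of the saturation-degree greedy rule (a simplification, not the implementation benchmarked by Lewis et al.):

```python
import networkx as nx

def dsatur_colouring(G):
    """Greedy DSATUR: repeatedly colour the uncoloured node whose neighbours
    already use the most distinct colours (ties broken by degree)."""
    colour = {}
    while len(colour) < G.number_of_nodes():
        uncoloured = [v for v in G if v not in colour]
        # saturation = number of distinct colours among already-coloured neighbours
        sat = {v: len({colour[u] for u in G[v] if u in colour}) for v in uncoloured}
        v = max(uncoloured, key=lambda w: (sat[w], G.degree(w)))
        used = {colour[u] for u in G[v] if u in colour}
        colour[v] = next(c for c in range(G.number_of_nodes()) if c not in used)
    return colour

G = nx.complete_bipartite_graph(4, 5)
cols = dsatur_colouring(G)
print(len(set(cols.values())), "colours")   # 2 colours, as expected for a bipartite graph
```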


Creating the Instance Space: Process

Examine correlations to eliminate useless features

Label instances as easy or hard based on algorithm portfolio

Project instances from the R^m feature space to 2-d

Use a GA to select the optimal subset of m features (for 2 ≤ m ≤ 18) that best separates easy/hard instances

Instance Spaces for Performance Evaluation 22 / 89

98% of variation explained by top 2 axes
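A minimal sketch of the projection step (PCA on standardised features, reporting the variation retained by the top two axes); the feature matrix here is a random placeholder:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))          # placeholder: 500 instances x 6 selected features

Z = StandardScaler().fit_transform(X)  # put features on comparable scales
pca = PCA(n_components=2).fit(Z)
coords = pca.transform(Z)              # 2-d instance-space coordinates

print("variation explained by top 2 axes:", pca.explained_variance_ratio_.sum())
```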


Visualising the instance space

Instance Spaces for Performance Evaluation 23 / 89


Defining goodness of algorithm performance

Acknowledging the arbitrariness of this definition, here we define an algorithm's performance to be "good" if the relative gap between the number of colours it needs to colour the graph and the number needed by the portfolio's winner is less than ε, within a fixed computational budget of 5×10^10 constraint checks.

We consider cases where ε = 0 (the algorithm is best) and ε = 0.05 (within 5% of the best).

Instance Spaces for Performance Evaluation 24 / 89
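As a sketch, this labelling can be expressed directly on an instances × algorithms performance matrix (the matrix below is hypothetical):

```python
import numpy as np

def good_labels(colours, eps=0.05):
    """colours[i, a] = number of colours algorithm a needs on instance i within
    the fixed budget. 'Good' = within a relative gap eps of the portfolio winner."""
    best = colours.min(axis=1, keepdims=True)   # portfolio winner per instance
    gap = (colours - best) / best               # relative gap to the winner
    return gap <= eps                           # boolean, instances x algorithms

colours = np.array([[18, 18, 20],
                    [30, 31, 36]])
print(good_labels(colours, eps=0.0))    # who ties the best
print(good_labels(colours, eps=0.05))   # who is within 5% of the best
```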


Footprints with ε = 0 (blue is good)

Instance Spaces for Performance Evaluation 25 / 89


Defining difficulty of instances

If fewer than a given fraction β of the 8 algorithms find an instance easy, then we label the instance as hard for the portfolio of algorithms
- e.g. if β = 0.5 then an instance is labelled hard if fewer than half (only 1, 2 or 3 of the eight algorithms) find it easy

It is important that we understand where good algorithm performance is uninteresting (if all algorithms find the instances easy) or interesting (if other algorithms struggle)

Instance Spaces for Performance Evaluation 26 / 89
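Continuing the sketch above, the portfolio-level hard/easy label for a threshold β is then:

```python
import numpy as np

def hard_for_portfolio(good, beta=0.5):
    """good[i, a] = True if algorithm a performs 'good' (finds instance i easy).
    An instance is hard if fewer than a fraction beta of the algorithms find it easy."""
    frac_easy = good.mean(axis=1)
    return frac_easy < beta

good = np.array([[True, True, True, True, False, False, False, False],   # 4 of 8 easy
                 [True, False, False, False, False, False, False, False]])  # 1 of 8 easy
print(hard_for_portfolio(good, beta=0.5))   # [False  True]
```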


How many algorithms find an instance hard? (α = 0)

Instance Spaces for Performance Evaluation 27 / 89


Defining Boundary of Algorithm Footprints

For a given algorithm, we consider the points labelled as good, and
- remove outliers through clustering,
- calculate the convex hull to define a generalised area of expected good performance,
- remove the convex hull of contradicting points,
- validate the accuracy of the remaining "footprint" through out-of-sample testing.

Instance Spaces for Performance Evaluation 28 / 89
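A rough sketch of this construction, using DBSCAN for the outlier-removal step (the clustering method here is an assumption) and simply reporting the fraction of contradicting points that fall inside the hull rather than subtracting their hull:

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay
from sklearn.cluster import DBSCAN

def footprint(points_good, points_bad, eps=0.5, min_samples=5):
    """points_*: (n, 2) coordinates in the instance space.
    Returns the hull of the 'good' points after outlier removal, plus the
    fraction of contradicting ('bad') points that lie inside it."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_good)
    core = points_good[labels != -1]                       # drop noise points
    hull = ConvexHull(core)
    inside = Delaunay(core[hull.vertices]).find_simplex(points_bad) >= 0
    return hull, inside.mean()

rng = np.random.default_rng(0)
good = rng.normal(loc=0.0, scale=1.0, size=(200, 2))       # synthetic 'good' points
bad = rng.normal(loc=3.0, scale=1.0, size=(200, 2))        # synthetic 'bad' points
hull, contradiction = footprint(good, bad)
print("footprint area:", hull.volume, "| contradicting fraction inside:", contradiction)
```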


Measuring the Area of Algorithm Footprints

Now we need only calculate the area defining the footprint
- our metric of the power of an algorithm is the ratio of this area to the total area of the instance space

Area of Algorithm Footprint

Let H(S) be the convex hull of the region defined by a set of points S = {(x_i, y_i), i = 1, ..., η}, with the subset {(x_j, y_j), j = 1, ..., k}, k ≤ η, being the extreme points of H(S), listed in order around the hull. Then

$$\mathrm{Area}(H(S)) = \frac{1}{2}\left|\sum_{j=1}^{k-1}\bigl(x_j y_{j+1} - y_j x_{j+1}\bigr) + \bigl(x_k y_1 - y_k x_1\bigr)\right|$$

Instance Spaces for Performance Evaluation 29 / 89
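The shoelace computation itself, as a minimal sketch (vertices must be supplied in order around the hull); the footprint metric is then this area divided by the area of the whole instance space:

```python
import numpy as np

def shoelace_area(vertices):
    """Area of a polygon whose vertices (x_j, y_j) are listed in order around
    the boundary (e.g. the extreme points of a convex hull)."""
    x, y = np.asarray(vertices, dtype=float).T
    xn, yn = np.roll(x, -1), np.roll(y, -1)     # wraps the last vertex back to the first
    return 0.5 * abs(np.sum(x * yn - y * xn))

square = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(shoelace_area(square))   # 2.0
```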


Algorithm Footprint Areas (% of instance space)

Instance Spaces for Performance Evaluation 30 / 89


Learning to predict easy or hard instances for a given ε,β

Instance Spaces for Performance Evaluation 31 / 89

Naive Bayes classifier in R^2 is 85% accurate
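A sketch of that prediction step with scikit-learn's Gaussian Naive Bayes; the 2-d coordinates and easy/hard labels below are synthetic placeholders:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
coords = rng.normal(size=(1000, 2))               # placeholder 2-d instance coordinates
hard = coords[:, 0] + 0.5 * coords[:, 1] > 0.3    # placeholder easy/hard labels

clf = GaussianNB()
acc = cross_val_score(clf, coords, hard, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```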


Recommending algorithms

Instance Spaces for Performance Evaluation 32 / 89

Each SVM is 75-90% accurate but fails to identify winner in some regions
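One plausible implementation of the recommendation step: a separate SVM per algorithm predicts whether that algorithm will be "good" at a point of the instance space, and the algorithms predicted good are recommended. The coordinates and good/bad labels below are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
coords = rng.normal(size=(1000, 2))
# synthetic good/bad labels for three of the algorithms
good = {"HEA": coords[:, 0] > -0.5,
        "TabuCol": coords[:, 1] > 0.0,
        "DSATUR": coords[:, 0] + coords[:, 1] < -0.5}

models = {a: SVC(kernel="rbf", gamma="scale").fit(coords, y) for a, y in good.items()}

def recommend(point):
    """Return the algorithms whose SVM predicts 'good' at this point."""
    point = np.atleast_2d(point)
    return [a for a, m in models.items() if m.predict(point)[0]]

print(recommend([1.0, 1.0]))
```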


On which instance classes is each algorithm best suited?

Instance Spaces for Performance Evaluation 33 / 89


Characterising algorithm suitability based on features

Enables us to see what properties (not instance class labels) explain algorithm performance.

Representation of the instance space (location of instances) depends on the feature set.

We have used a GA to select the optimal feature subset to maximise separability (reduce contradictions) in footprints, enabling cleaner calculation of footprint areas.

Considering all 18 features again, some interesting feature distributions clearly show the properties of instances that create easy or hard instances for each algorithm.

Instance Spaces for Performance Evaluation 34 / 89
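A drastically simplified stand-in for that feature-selection step: exhaustive search over small subsets instead of a GA, scoring each subset by how well a classifier separates easy from hard instances in the 2-d PCA projection (the real separability criterion is tied to footprint contradictions). Data is a placeholder.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def subset_score(X, y, idx):
    """Separability proxy: CV accuracy of an SVM in the 2-d projection."""
    Z = StandardScaler().fit_transform(X[:, idx])
    coords = PCA(n_components=2).fit_transform(Z)
    return cross_val_score(SVC(), coords, y, cv=3).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))          # placeholder: 300 instances x 8 features
y = X[:, 0] - X[:, 3] > 0              # placeholder easy/hard labels

best = max(((subset_score(X, y, list(c)), c) for c in combinations(range(8), 3)),
           key=lambda t: t[0])
print("best 3-feature subset:", best[1], "score:", round(best[0], 3))
```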


Feature Distributions in Instance Space

Instance Spaces for Performance Evaluation 35 / 89

(Slides 36-40: figures only — feature distributions across the instance space; no text content recoverable.)

Reference

Pisanski, T. & Randić, M., "Use of the Szeged index and the revised Szeged index for measuring network bipartivity", Disc. Appl. Math., vol. 158, pp. 1936-1944, 2010.

Instance Spaces for Performance Evaluation 41 / 89


Reference

Estrada, E. & Rodríguez-Velázquez, J. A., "Spectral measures of bipartivity in complex networks", Physical Review E, vol. 72(4), 046105, 2005.

Instance Spaces for Performance Evaluation 42 / 89


References

Balakrishnan, R., "The energy of a graph", Linear Algebra and its Applications, vol. 387, pp. 287-295, 2004.

Instance Spaces for Performance Evaluation 43 / 89


HEA is not best everywhere (NFL) ... why not?

References

Smith-Miles, K. A., Baatar, D., Wreford, B. and Lewis, R., "Towards Objective Measures of Algorithm Performance across Instance Space", Computers & Operations Research, vol. 45, pp. 12-24, 2014.

Instance Spaces for Performance Evaluation 44 / 89


Where instances are, and are not, and why?

The instances are projected into the 2-d instance space by the linear transformation

$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0.559 & 0.614 & 0.557 \\ -0.702 & -0.007 & 0.712 \end{bmatrix} \begin{bmatrix} \text{density} \\ \text{algebraic connectivity} \\ \text{energy} \end{bmatrix}$$

The upper and lower bounds on the features give us a bounding region in the instance space in which a valid instance could lie

We can select target points within this valid instance space, and use a GA to evolve random graphs so that we minimise their distance to the target point when projected

This is a new method for instance generation, enabling non-trivial features to be controlled

Instance Spaces for Performance Evaluation 45 / 89
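Applying the stated transformation is a single matrix product; note the real pipeline scales/normalises the three features first, and the bounds below are illustrative placeholders rather than the actual feature bounds:

```python
import numpy as np
from itertools import product

W = np.array([[ 0.559,  0.614, 0.557],
              [-0.702, -0.007, 0.712]])   # maps (density, algebraic connectivity, energy) to (v1, v2)

def project(density, alg_conn, energy):
    return W @ np.array([density, alg_conn, energy])

print(project(0.5, 0.2, 1.3))             # 2-d coordinates of one hypothetical (scaled) instance

# Corners of an illustrative feature box; their projections outline the bounding region.
lo, hi = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0])
corners = np.array(list(product(*zip(lo, hi))))
print((corners @ W.T).round(2))
```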


Evolving new instances at target points (n=100)

References

Smith-Miles, K. A. and Bowly, S., "Generating new test instances by evolving in instance space", Computers & Operations Research, vol. 63, pp. 102-113, 2015.

Instance Spaces for Performance Evaluation 46 / 89
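A toy version of the evolving-instances idea: a simple (1+1) evolutionary loop that flips one edge at a time of a random graph to pull its projected feature vector towards a target point. The published method uses a full GA, the real feature scaling, and n = 100 nodes; everything below is a hedged sketch with raw, unscaled features.

```python
import numpy as np
import networkx as nx

W = np.array([[ 0.559,  0.614, 0.557],
              [-0.702, -0.007, 0.712]])

def features(G):
    """density, algebraic connectivity, mean |adjacency eigenvalue| (raw, unscaled)."""
    lam = np.abs(np.linalg.eigvalsh(nx.to_numpy_array(G)))
    return np.array([nx.density(G), nx.algebraic_connectivity(G), lam.mean()])

def evolve(target, n=30, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    G = nx.connected_watts_strogatz_graph(n, 4, 0.3, seed=seed)   # connected start
    best = np.linalg.norm(W @ features(G) - target)
    for _ in range(iters):
        H = G.copy()
        u, v = rng.choice(n, size=2, replace=False)
        if H.has_edge(u, v):
            H.remove_edge(u, v)                 # mutate: flip one edge
        else:
            H.add_edge(u, v)
        if nx.is_connected(H):                  # keep algebraic connectivity meaningful
            d = np.linalg.norm(W @ features(H) - target)
            if d < best:
                G, best = H, d                  # accept improving mutations only
    return G, best

G, dist = evolve(target=np.array([1.0, -0.3]))
print("distance to target point:", round(dist, 3))
```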


Summary

How do instance features help us understand the strengths and weaknesses of optimisation algorithms?
- Provided we have the right feature set, we can create a topology-preserving instance space
- The boundary between good and bad performance can be seen
- Feature selection methods may improve topology preservation

How can we infer and visualise algorithm performance across a huge "instance space"?
- PCA has been used to visualise instances in 2-d (or 3-d)
- More than 90% of the variation in the data was preserved, but some important information (as well as noise) is naturally lost
- If the 4th largest eigenvalue is still large, then we lose too much detail, and other dimension-reduction methods are needed

Instance Spaces for Performance Evaluation 47 / 89


Summary, continued

How can we objectively measure algorithm performance?
- relative size of the area of algorithm footprints
- Convex or concave hulls can be used depending on generalisation comfort (out-of-sample testing can help)
- The area of the footprint depends on the definition of "good"

How easy or hard are the benchmark instances?
- Randomly generated instances tend to be in the middle (average features), and are usually not discriminating
- Discriminating instances can be generated intentionally using a GA (fitness is algorithm performance, but this blows up for harder instances)
- Diversity of instances is critical for a meaningful instance space

Alternatively, can we generate new test instances at target points in the instance space? (more scalable)

Instance Spaces for Performance Evaluation 48 / 89

IntroductionMethodology

Case Study: Graph ColouringCase Study: Black-Box Optimisation

Case Study: Machine LearningConclusions

Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances

Summary, continued

How can we objectively measure algorithm performance?I relative size of the area of algorithm footprintsI Convex or concave hulls can be used depending on

generalisation comfort (out-of-sample testing can help)I The area of the footprint depends on the de�nition of �good�

How easy or hard are the benchmark instances?I Randomly generated instances tend to be in the middle

(average features), and are usually not discriminatingI Discriminating instances can be generated intentionally using

GA (�tness is algorithm performance, but this blows up forharder instances)

I Diversity of instances is critical for a meaningful instance space

Alternatively can we generate new test instances at targetpoints in the instance space (more scalable)

Instance Spaces for Performance Evaluation 48 / 89

IntroductionMethodology

Case Study: Graph ColouringCase Study: Black-Box Optimisation

Case Study: Machine LearningConclusions

Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances

Summary, continued

How can we objectively measure algorithm performance?I relative size of the area of algorithm footprintsI Convex or concave hulls can be used depending on

generalisation comfort (out-of-sample testing can help)I The area of the footprint depends on the de�nition of �good�

How easy or hard are the benchmark instances?I Randomly generated instances tend to be in the middle

(average features), and are usually not discriminatingI Discriminating instances can be generated intentionally using

GA (�tness is algorithm performance, but this blows up forharder instances)

I Diversity of instances is critical for a meaningful instance space

Alternatively can we generate new test instances at targetpoints in the instance space (more scalable)

Instance Spaces for Performance Evaluation 48 / 89

IntroductionMethodology

Case Study: Graph ColouringCase Study: Black-Box Optimisation

Case Study: Machine LearningConclusions

Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances

Summary, continued

How can we objectively measure algorithm performance?I relative size of the area of algorithm footprintsI Convex or concave hulls can be used depending on

generalisation comfort (out-of-sample testing can help)I The area of the footprint depends on the de�nition of �good�

How easy or hard are the benchmark instances?I Randomly generated instances tend to be in the middle

(average features), and are usually not discriminatingI Discriminating instances can be generated intentionally using

GA (�tness is algorithm performance, but this blows up forharder instances)

I Diversity of instances is critical for a meaningful instance space

Alternatively can we generate new test instances at targetpoints in the instance space (more scalable)

Instance Spaces for Performance Evaluation 48 / 89


Black Box Optimisation

We are given only a sample of points from the continuous decision (input) space, and known objective function values (output space)

We have no analytical expression of the objective function

We need to find the best point in the decision space to minimise the objective function with minimal function evaluations
- Input space, X ⊂ R^D
- Output space, Y ⊂ R
- Problem dimensionality, D ∈ N
- Candidate solutions, x ∈ X
- Candidate cost, y ∈ Y
- Target solution, x_t ∈ X
- Target cost, y_t ∈ Y

(a minimal sketch of this setting follows)
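To make the setting concrete, here is a minimal sketch of the black-box protocol using a random-search baseline: the solver only queries the objective at chosen points and tracks the best cost found within a fixed evaluation budget. The names (`black_box`, `budget`) and the baseline itself are illustrative, not from the slides.

```python
import numpy as np

def random_search(black_box, D, budget, target_cost, lower=-5.0, upper=5.0):
    """Minimal black-box baseline: uniform random sampling within [lower, upper]^D.

    The solver only sees (x, y) pairs returned by black_box(x); it never
    inspects the objective's analytical form.
    """
    rng = np.random.default_rng(0)
    best_x, best_y = None, np.inf
    for _ in range(budget):
        x = rng.uniform(lower, upper, size=D)   # candidate solution x in X
        y = black_box(x)                        # one function evaluation
        if y < best_y:
            best_x, best_y = x, y
        if best_y <= target_cost + 1e-8:        # "solved": within 1e-8 of y_t
            break
    return best_x, best_y

# Example: a 2-d sphere function plays the role of the unknown objective.
sphere = lambda x: float(np.sum(x ** 2))
x_best, y_best = random_search(sphere, D=2, budget=10_000, target_cost=0.0)
```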

Instance Spaces for Performance Evaluation 49 / 89


What makes BBO hard?

We depend on a sample to provide knowledge of the landscape

Algorithms perform differently and can struggle with certain landscape characteristics
- multimodality, poor conditioning, deceptiveness, etc.

We use sample-based Exploratory Landscape Analysis (ELA) metrics to learn what makes BBO hard

These features will also form our instance space, enabling algorithm footprints to be seen and new test instances to be generated

Instance Spaces for Performance Evaluation 50 / 89


BBO meta-data: instances

The noiseless COCO benchmark set is used: 24 basis functions defined within X = [−5, 5]^D

The functions are divided into five categories:
- Separable (f1 – f5)
- Low or moderately conditioned (f6 – f9)
- Unimodal with high conditioning (f10 – f14)
- Multimodal with adequate global structure (f15 – f19)
- Multimodal with weak global structure (f20 – f24)

New instances are generated by scaling and transforming the basis functions (translations, rotations, oscillations)
- We generated instances [1, . . . , 15] at D = 2, 5, 10, 20, resulting in 1440 problem instances (24 functions × 15 instances × 4 dimensions)

Instance Spaces for Performance Evaluation 51 / 89


BBO meta-data: features

Sample X ⊂ X of size D × 10^3, generated using a Latin hypercube design (LHD); a sketch of this sampling step follows

Feature selection applied to 18 candidate features (9 chosen) to maximise performance prediction accuracy using an SVM
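As an illustration of the sampling step, the sketch below (assuming SciPy ≥ 1.7 for `scipy.stats.qmc`; variable names are ours, not from the slides) draws a Latin hypercube sample of size D × 10^3 inside [−5, 5]^D and evaluates the objective on it.

```python
import numpy as np
from scipy.stats import qmc

def lhd_sample(func, D, lower=-5.0, upper=5.0, seed=0):
    """Latin hypercube sample of size D * 1e3 in [lower, upper]^D, plus costs."""
    n = D * 1000
    sampler = qmc.LatinHypercube(d=D, seed=seed)
    unit = sampler.random(n)                              # points in [0, 1]^D
    X = qmc.scale(unit, [lower] * D, [upper] * D)         # rescale to the box
    y = np.apply_along_axis(func, 1, X)                   # one cost per point
    return X, y

# Example with a 2-d sphere standing in for a COCO instance.
X, y = lhd_sample(lambda x: np.sum(x ** 2), D=2)
```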

Instance Spaces for Performance Evaluation 52 / 89


Method              Feature   Description                                                  Transformations
Surrogate models    R̄²_LI     Fit of linear regression model                               Unit scaling
                    R̄²_Q      Fit of quadratic regression model                            Unit scaling
                    CN        Ratio of min to max quadratic coefficient                    Unit scaling
Significance        ξ(D)      Significance of D-th order                                   z-score, tanh
                    ξ(1)      Significance of first order                                  z-score, tanh
Cost distribution   γ(Y)      Skewness of the cost distribution                            z-score, tanh
                    κ(Y)      Kurtosis of the cost distribution                            log10, z-score
                    H(Y)      Entropy of the cost distribution                             log10, z-score
Fitness sequences   H_max     Maximum information content with nearest neighbor sorting    z-score
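Several of the features in the table above can be computed directly from the sampled (X, y) meta-data. The sketch below is our own illustration using NumPy/SciPy/scikit-learn (not the original feature code): it estimates the linear and quadratic surrogate fits and the moments and entropy of the cost distribution; the listed transformations and remaining features are not reproduced.

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def surrogate_fits(X, y):
    """Adjusted R^2 of linear and quadratic regression surrogates of y over X."""
    def adj_r2(model_X):
        n, p = model_X.shape
        r2 = LinearRegression().fit(model_X, y).score(model_X, y)
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)
    lin = adj_r2(X)
    quad = adj_r2(PolynomialFeatures(degree=2, include_bias=False).fit_transform(X))
    return lin, quad

def cost_distribution_features(y, bins=20):
    """Skewness, kurtosis and (histogram-based) entropy of the cost sample."""
    hist, _ = np.histogram(y, bins=bins)
    return skew(y), kurtosis(y), entropy(hist + 1e-12)

# Example on a fresh uniform sample standing in for the LHD sample above.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(2000, 2))
y = np.sum(X ** 2, axis=1)
print(surrogate_fits(X, y), cost_distribution_features(y))
```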


BBO Algorithms

We consider a variety of algorithms selected using ICARUS to avoid overlapping performance:

Reference

Muñoz, M. (2013). Decision support systems for the automatic selection of algorithms for continuous optimization problems. PhD thesis, The University of Melbourne.

Instance Spaces for Performance Evaluation 53 / 89


Visualising the instance space

Instance Spaces for Performance Evaluation 54 / 89


Algorithm Footprints

Instance Spaces for Performance Evaluation 55 / 89

Solved if at least 1 of 15 runs comes within 10^−8 of y_t within a budget of 10^4 × D function evaluations
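An algorithm's footprint is the region of the 2-d instance space where it performs well, and its area can be estimated with a hull over the "good" instances (convex or concave, depending on generalisation comfort). Below is a minimal sketch of the convex-hull variant using SciPy; the instance coordinates and the `good` mask are assumed inputs, and the concave alternative is not shown.

```python
import numpy as np
from scipy.spatial import ConvexHull

def footprint_area(points_2d, good):
    """Area of the convex hull spanned by instances where the algorithm is 'good'.

    points_2d : (n, 2) array of instance coordinates in the projected space
    good      : boolean mask, True where performance meets the 'good' threshold
    """
    pts = points_2d[good]
    if len(pts) < 3:                 # a hull needs at least 3 non-collinear points
        return 0.0
    return ConvexHull(pts).volume    # in 2-d, .volume is the enclosed area

# Relative footprint size = footprint area / area spanned by all instances.
pts = np.random.default_rng(1).normal(size=(200, 2))
good = pts[:, 0] > 0
rel_size = footprint_area(pts, good) / ConvexHull(pts).volume
```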


Recommended algorithms

Instance Spaces for Performance Evaluation 56 / 89


Feature Distributions in Instance Space

Instance Spaces for Performance Evaluation 57 / 89


Methodology - Evolving New Instances

We focus on 2-d functions for ease of visualisation

We generate 720 instances ([1, . . . , 30] at D = 2, of the 24 basis functions)

Sample X ⊂ X of size 2 × 10^4 using LHD

Each function is summarised as a 9-d feature vector, then projected to 2-d using PCA (sketched below)
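A minimal sketch of the projection step, assuming a matrix `F` of 9-d feature vectors (one row per function) has already been computed; the standardise-then-PCA pipeline below uses scikit-learn and is our illustration rather than the original code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def project_instance_space(F):
    """Project standardised 9-d feature vectors onto their first two principal components."""
    Z = StandardScaler().fit_transform(F)      # z-score each feature
    pca = PCA(n_components=2)
    coords = pca.fit_transform(Z)              # (n_instances, 2) coordinates
    return coords, pca.explained_variance_ratio_

# Example with a random stand-in for the 720 x 9 feature matrix.
F = np.random.default_rng(0).normal(size=(720, 9))
coords, explained = project_instance_space(F)
```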

Instance Spaces for Performance Evaluation 58 / 89


Methodology - Evolving New Instances

We use Genetic Programming to evolve a program (function), represented as a binary tree
- leaves are variables or constants
- nodes are operations {×, +, −, (·)^2, sin, cos, tanh, exp}

Used GPTIPS v1.0 in MATLAB (GP for symbolic regression)
- Population size: 400
- Number of generations: 100
- Tournament size: 7
- Elite fraction: 0.1
- Target cost: √ε, where ε is the machine precision
- Number of inputs: D = 2
- Max tree depth: 10
- Constant range: [−1000, 1000]
- Tournament selection: lexicographic

(the fitness used to steer evolved functions towards a target point is sketched below)
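The GP's fitness drives an evolved function towards a chosen location in the instance space: the candidate is sampled, its feature vector computed, projected with the fixed transform, and compared to the target coordinates. The sketch below outlines that idea in Python; `compute_features` and `pca_project` are hypothetical stand-ins for the feature and projection code above, not GPTIPS internals, and the sampling details are assumptions.

```python
import numpy as np

def instance_space_fitness(candidate, target_xy, compute_features, pca_project,
                           D=2, n_sample=20_000, lower=-5.0, upper=5.0):
    """Fitness of an evolved function = distance of its projected features to the target.

    candidate : callable mapping an (n, D) sample to n cost values
    target_xy : length-2 array, the desired location in the 2-d instance space
    """
    rng = np.random.default_rng(0)
    X = rng.uniform(lower, upper, size=(n_sample, D))  # stands in for the LHD sample
    y = candidate(X)
    f = compute_features(X, y)                         # 9-d feature vector
    xy = pca_project(f)                                # 2-d instance-space coordinates
    return float(np.linalg.norm(xy - target_xy))       # minimise: 0 means "at the target"
```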

Instance Spaces for Performance Evaluation 59 / 89


Recreating Existing Functions (S1)

We attempt to generate a known function from COCO by selecting a target point coinciding with a known function

We perform 5 iterations for each of 50 randomly selected target instances

A few examples ...

Instance Spaces for Performance Evaluation 60 / 89


Recreating Existing Functions - Sphere

Instance Spaces for Performance Evaluation 61 / 89

Sphere - unimodal


Recreating Existing Functions - Discus

Instance Spaces for Performance Evaluation 62 / 89

Discus - poor conditioning


Recreating Existing Functions - Katsuura

Instance Spaces for Performance Evaluation 63 / 89

Katsuura - highly multimodal with periodic structure


Generating Functions across the Instance Space (S2)

Instance Spaces for Performance Evaluation 64 / 89

conditioning worsens from left to right
rugged instances in top left corner
large plateaus at bottom of space


New Test Functions - Examples

Instance Spaces for Performance Evaluation 65 / 89


How hard are these new test functions?

Comparing BIPOP-CMA-ES on COCO, evolved COCO-like (S1) and evolved diverse (S2) functions

Probability of solving within the budget of function evaluations is
- 0.94 for COCO
- 0.67 for S1
- 0.61 for S2

Instance Spaces for Performance Evaluation 66 / 89

solid line - FEs to reach experimental optimum
dashed line - FEs to reach within 10^−8 of experimental optimum


Returning to Machine Learning

The UCI repository needs to be re-evaluated
- does it support insights into algorithm performance?
- where are the really challenging (not just large) instances that stress the best algorithms?
- data quality has also been questioned

References

Macià, N. & Bernadó-Mansilla, E. (2014). "Towards UCI+: A mindful repository design", Information Sciences, vol. 261, pp. 237-262.

Salzberg, S. L. (1997). "On comparing classifiers: Pitfalls to avoid and a recommended approach", Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 317-328.

Instance Spaces for Performance Evaluation 67 / 89


Problem Instances I

We use a total of 236 classification instances (binary and multiclass) comprising
- 211 UCI instances (University of California Irvine)
- 19 KEEL instances (Knowledge Extraction Evolutionary Learning)
- 6 DCol instances (Data Complexity Library)

Instances contain up to 11,055 observations and 1,558 attributes
- larger ones have been excluded from this study due to computational budget

Instances with missing values are retained, and also duplicated with the missing values estimated with means for the class (a minimal imputation sketch follows)
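A minimal sketch of the per-class mean imputation used to create the duplicated datasets; it assumes a pandas DataFrame with a "class" column (our naming) and only handles numeric attributes.

```python
import pandas as pd

def impute_class_means(df, class_col="class"):
    """Return a copy where missing numeric values are replaced by the mean of their class."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = (
        out.groupby(class_col)[num_cols]
        .transform(lambda col: col.fillna(col.mean()))
    )
    return out

# Example: one missing value per class, filled with that class's mean.
toy = pd.DataFrame({"class": ["a", "a", "b", "b"],
                    "x": [1.0, None, 10.0, None]})
print(impute_class_means(toy))   # NaNs become 1.0 (class a) and 10.0 (class b)
```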

Instance Spaces for Performance Evaluation 68 / 89


Algorithms A

We consider 10 supervised learners:
- Naive Bayes (NB)
- Linear Discriminant (LD)
- Quadratic Discriminant (QD)
- Classification and Regression Trees (CART)
- J48 Decision Tree (J48)
- k-Nearest Neighbor (kNN)
- Support Vector Machines with linear (L-SVM), polynomial (poly-SVM) and radial basis (RB-SVM) kernels
- Random Forests (RF)

R packages used were e1071, MASS, rpart, RWeka, kknn, with default parameters

Instance Spaces for Performance Evaluation 69 / 89


Performance Metric Y

For each algorithm running on each instance, we record:
- error rate (1 − classification accuracy)
- precision
- recall
- F-measure

(a minimal computation of these metrics is sketched below)
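For completeness, the sketch below shows how these four metrics can be obtained from predictions on a held-out set; it uses scikit-learn purely as an illustration (the study itself used R packages), with macro-averaging as an assumed choice for multiclass data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def performance_metrics(y_true, y_pred):
    """Error rate, precision, recall and F-measure for one algorithm on one instance."""
    error_rate = 1.0 - accuracy_score(y_true, y_pred)
    precision, recall, f_measure, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"error_rate": error_rate, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Example on a tiny 3-class prediction vector.
print(performance_metrics([0, 1, 2, 2, 1], [0, 2, 2, 2, 1]))
```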

Instance Spaces for Performance Evaluation 70 / 89


Possible Features

We generate a set of 509 candidate features from 8 categories (a few of the simpler ones are sketched after this list):
- simple (dimensionality, types of attributes, missing values, outliers, class attributes)
- statistical (descriptive statistics and canonical correlations, PCA, etc.)
- information theoretic (entropy, mutual information, etc.)
- landmarking (performance of simple landmarkers such as NB or single node trees)
- model-based (properties of decision trees such as shape and size of tree, width and depth)
- concept characterization (measures of sparsity of input space and irregularity in input-output distributions)
- complexity (separability, geometry, topology and density of manifolds)
- itemsets & association rules (attribute & class relationships)
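As a small illustration of the first three categories, the sketch below computes a handful of simple and information-theoretic meta-features for a dataset (X, y); the names and implementation are ours, not the 509-feature tool chain used in the study.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.feature_selection import mutual_info_classif

def simple_meta_features(X, y):
    """A few 'simple' and 'information theoretic' meta-features for a dataset."""
    n, p = X.shape
    class_counts = np.bincount(y)
    return {
        "n_observations": n,
        "n_attributes": p,
        "n_classes": int(np.count_nonzero(class_counts)),
        "class_entropy": float(entropy(class_counts)),                  # class balance
        "mean_mutual_info": float(np.mean(mutual_info_classif(X, y))),  # attribute-class MI
    }

# Example on a toy dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
print(simple_meta_features(X, y))
```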

Instance Spaces for Performance Evaluation 71 / 89


What makes classification hard?

Instance Spaces for Performance Evaluation 72 / 89


Sensitivity Analysis and Feature Selection

We construct perturbed datasets that intentionally increase or decrease the presence of the challenge

For each instance, 6108 statistical significance tests were conducted (509 × 12) with Bonferroni correction
- settings give a 99% chance to correctly discard a feature, and a 90% chance to correctly select a feature with a cause-effect relationship to the challenge

Repeat this procedure for 6 small instances (balloons, blogger, breast, breast with 2 attributes, iris, iris with 2 attributes)

For each challenge, we select the features that consistently captured the challenge across the 6 instances

Correlations between features (> 0.7) and between features and algorithm performance (< 0.3) were used to eliminate features

(a sketch of the Bonferroni-corrected screening idea follows)
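A minimal sketch of the Bonferroni-corrected screening idea: each candidate feature is tested for a significant shift between the original and perturbed datasets, with the significance level divided by the total number of tests. The two-sample t-test, the α value and the variable names are illustrative assumptions, not the exact procedure of the study.

```python
import numpy as np
from scipy.stats import ttest_ind

def bonferroni_screen(feat_original, feat_perturbed, n_total_tests=6108, alpha=0.05):
    """Flag features whose values shift significantly under the perturbation.

    feat_original, feat_perturbed : (n_repeats, n_features) arrays of feature values
    Returns a boolean mask over features, using alpha / n_total_tests per test.
    """
    corrected_alpha = alpha / n_total_tests        # Bonferroni correction
    _, p_values = ttest_ind(feat_original, feat_perturbed, axis=0, equal_var=False)
    return p_values < corrected_alpha

# Example: only the first feature truly responds to the perturbation.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 5))
pert = base.copy(); pert[:, 0] += 3.0
print(bonferroni_screen(base, pert))   # ~ [ True False False False False ]
```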

Instance Spaces for Performance Evaluation 73 / 89


Selected Features F

The final set consists of 10 features.

Instance Spaces for Performance Evaluation 74 / 89


Performance Prediction using F

Regression predicts the error rate of each algorithm

Classification labels each instance as easy or hard for the algorithm (easy if ER < 0.2, else hard)

SVMs are used, with parameters optimised via grid search under 10-fold cross-validation
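A minimal sketch of this prediction step, assuming F is the matrix of selected meta-features and err the per-instance error rate of one algorithm; the grid values are illustrative rather than those used in the study.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC, SVR

# F: (n_instances, n_features) meta-feature matrix; err: error rate of one
# algorithm on each instance (both assumed to come from the meta-data).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}  # illustrative grid

# Regression: predict the error rate directly.
reg = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=10).fit(F, err)

# Classification: label each instance easy (ER < 0.2) or hard, then predict the label.
labels = (err < 0.2).astype(int)
clf = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10).fit(F, labels)

print(reg.best_params_, clf.best_params_)
```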

Instance Spaces for Performance Evaluation 75 / 89


A new projection algorithm

PCA maximises the variance retained, but this isn't exactly what we need to support insights through visualisation

We want a projection that creates linear trends (interpretable) in both the feature distribution and the algorithm performance

We solve numerically using BIPOP-CMA-ES (note: PCA gives a locally optimal solution only)
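One way such a projection could be searched for numerically is sketched below using the cma package; X (standardised features) and Y (algorithm performance) are assumed inputs, and the mean-squared lack-of-linear-fit objective is an illustrative stand-in for the exact formulation used in the study.

```python
import numpy as np
import cma  # pip install cma

def projection_cost(a_flat, X, targets):
    """Penalise deviation of each feature/performance column from a linear
    function of the 2-D coordinates Z = X A^T (smaller is better)."""
    A = a_flat.reshape(2, X.shape[1])
    Z = np.column_stack([X @ A.T, np.ones(len(X))])   # 2-D coordinates + intercept
    beta, *_ = np.linalg.lstsq(Z, targets, rcond=None)
    return float(np.mean((targets - Z @ beta) ** 2))

# X: (n, d) standardised feature matrix; Y: (n, m) performance matrix (assumed given).
targets = np.column_stack([X, Y])
x0 = 0.1 * np.random.randn(2 * X.shape[1])
xbest, *_ = cma.fmin(projection_cost, x0, 0.5, args=(X, targets),
                     bipop=True, restarts=3)
A_opt = xbest.reshape(2, X.shape[1])   # rows of A_opt define the two axes
```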

Instance Spaces for Performance Evaluation 76 / 89


Instance Space (feature distribution)

Instance Spaces for Performance Evaluation 77 / 89


Instance Space (performance distribution)

Instance Spaces for Performance Evaluation 78 / 89


Size features

Instance Spaces for Performance Evaluation 79 / 89


Algorithm Footprints (good is ER < 20%)

Instance Spaces for Performance Evaluation 80 / 89


Footprint Area Calculations

Instance Spaces for Performance Evaluation 81 / 89


Other views: who is best, where are easy/hard instances?

Instance Spaces for Performance Evaluation 82 / 89


The need for new test instances

The current instances don't enable us to see much difference in algorithm footprints, despite fundamentally different algorithm mechanisms (e.g. kNN, RF, RBF-SVM)

There are areas of the instance space that are unexplored or very sparse; e.g. at [0.744, 2.833] there is only one instance in the area, for which J48 was the only algorithm with ER < 20%. More data is needed to support conclusions about strengths and weaknesses

The boundary of possible instances in the space can be estimated using projections of the min and max features (either theoretical or observed)
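One simple way this estimate could be computed, assuming A is the 2 x d projection matrix and f_min / f_max hold the per-feature bounds; taking the convex hull of the projected bounding-box corners is an illustrative choice rather than the exact construction used here.

```python
import numpy as np
from itertools import product
from scipy.spatial import ConvexHull

def estimate_boundary(A, f_min, f_max):
    """Project every corner of the feature bounding box and return the vertices
    of their convex hull as an estimated boundary of the reachable 2-D space."""
    corners = np.array(list(product(*zip(f_min, f_max))))  # 2^d corner points
    projected = corners @ A.T
    return projected[ConvexHull(projected).vertices]
```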

Instance Spaces for Performance Evaluation 83 / 89


A procedure to generate new instances at target points

We use a Gaussian Mixture Model (GMM) to generate a dataset with κ classes on q attributes

The probability of an observation x being sampled from the GMM is:

$$\mathrm{pr}(\mathbf{x}) = \sum_{k=1}^{\kappa} \phi_k \, \mathcal{N}(\mathbf{x}\,;\,\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \quad \text{where } \phi_k \in \mathbb{R},\ \boldsymbol{\mu}_k \in \mathbb{R}^{q},\ \boldsymbol{\Sigma}_k \in \mathbb{R}^{q \times q}$$

We tune the parameter vector of the GMM so that the distance of its feature vector to the target feature vector is minimised

Tuning is a continuous black-box optimisation problem, and we use BIPOP-CMA-ES to optimise the parameters
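A rough sketch of this generation loop follows; compute_features (the meta-feature extractor) and target (the desired feature vector) are assumed to exist, covariances are restricted to diagonal form for simplicity, and the flat encoding of the GMM parameters is an illustrative choice.

```python
import numpy as np
import cma

rng = np.random.default_rng(0)

def sample_gmm(params, n, kappa, q):
    """Decode a flat parameter vector into mixture weights, means and diagonal
    standard deviations, then sample an n-point dataset with kappa class labels."""
    params = np.asarray(params)
    w = np.abs(params[:kappa]) + 1e-6
    w /= w.sum()
    means = params[kappa:kappa + kappa * q].reshape(kappa, q)
    sds = np.abs(params[kappa + kappa * q:]).reshape(kappa, q) + 1e-3
    labels = rng.choice(kappa, size=n, p=w)
    X = means[labels] + rng.standard_normal((n, q)) * sds[labels]
    return X, labels

def cost(params, n, kappa, q, target):
    X, y = sample_gmm(params, n, kappa, q)
    return float(np.linalg.norm(compute_features(X, y) - target))  # distance to target

kappa, q, n = 3, 4, 150            # e.g. an Iris-sized dataset (assumed settings)
x0 = rng.standard_normal(kappa + 2 * kappa * q)
xbest, *_ = cma.fmin(cost, x0, 0.5, args=(n, kappa, q, target), bipop=True)
new_X, new_y = sample_gmm(xbest, n, kappa, q)   # dataset at the target location
```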

Instance Spaces for Performance Evaluation 84 / 89


Two initial experiments

Can we reproduce a dataset that lives at the location of Iris (Iris size and features)?

Can we generate datasets elsewhere (Iris size, different features)?

Instance Spaces for Performance Evaluation 85 / 89


Discussion

Computational efficiency issues (is there a better encoding of a problem instance than a GMM?)

The boundary of all instances is not the same as the boundary of instances of a given size (since size can affect feature ranges)

We need some theoretical work on these boundaries, like the graph-theory results we have drawn upon in other work

There is much value in generating challenging smaller instances, to understand how structural properties affect complexity, not just size

The instance space depends on the chosen features, which were selected based on the current instances, so iteration is required as we generate new instances

Instance Spaces for Performance Evaluation 86 / 89


Conclusions

The proposed methodology is a first step towards providing researchers with a tool to:

- report the strengths and weaknesses of their algorithms
- show the relative power of an algorithm either across the entire instance space, or in a particular region of interest (e.g. real-world problems)
- evaluate the suitability of existing benchmark instances
- evolve new interesting and challenging test instances

Instance Spaces for Performance Evaluation 87 / 89


Next Steps

We are currently developing the key components of the methodology (evolved instances, feature sets) for a number of broad classes of optimization problems, as well as machine learning, time series forecasting, etc.

We are planning a web resource where researchers can download instances that span the instance space, upload their algorithm performance results, and download footprint metrics and visualisations to support their analysis

The approach generalises to parameter selection within algorithms as well, and to the choice of formulation.

We hope to be providing a free lunch for researchers soon!

Instance Spaces for Performance Evaluation 88 / 89


Further Reading

Methodology
- K. Smith-Miles and S. Bowly, "Generating new test instances by evolving in instance space", Comp. & Oper. Res., vol. 63, pp. 102-113, 2015.
- K. Smith-Miles et al., "Towards Objective Measures of Algorithm Performance across Instance Space", Comp. & Oper. Res., vol. 45, pp. 12-24, 2014.
- L. Lopes and K. Smith-Miles, "Generating Applicable Synthetic Instances for Branch Problems", Operations Research, vol. 61, no. 3, pp. 563-577, 2013.
- K. Smith-Miles and L. Lopes, "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39, no. 5, pp. 875-889, 2012.
- K. Smith-Miles, "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41, no. 1, article 6, 2008.

Applications
- Machine Learning: L. Villanova, M. A. Muñoz, D. Baatar, and K. Smith-Miles, "Instance Spaces for Machine Learning Classification", Machine Learning, vol. 107, no. 1, pp. 109-147, 2018.
- Time Series Forecasting: Y. Kang, R. Hyndman, and K. Smith-Miles, "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.
- Continuous Optimisation: M. A. Muñoz and K. Smith-Miles, "Performance analysis of continuous black-box optimization algorithms via footprints in instance space", Evolutionary Computation, vol. 25, no. 4, pp. 529-554, 2017.
- Travelling Salesman Problem: K. Smith-Miles and J. van Hemert, "Discovering the Suitability of Optimisation Algorithms by Learning from Evolved Instances", Annals of Mathematics and Artificial Intelligence, vol. 61, no. 2, pp. 87-104, 2011.
- and others on the Quadratic Assignment Problem, Job Shop Scheduling, Timetabling, Graph Colouring: see kate.smithmiles.wixsite.com/home

Instance Spaces for Performance Evaluation 89 / 89
