Outline: Introduction · Methodology · Case Study: Graph Colouring · Case Study: Black-Box Optimisation · Case Study: Machine Learning · Conclusions
Instance Spaces for Objective Assessment of Algorithms and Benchmark Test Suites
Kate Smith-Miles
School of Mathematics and Statistics, University of Melbourne
Instance Spaces for Performance Evaluation 1 / 89
Acknowledgements
This research is funded by ARC Discovery Project grant DP120103678 and ARC Australian Laureate Fellowship FL140100012.
The instance space and evolving instances methodology is joint work with Dr. Jano van Hemert (University of Edinburgh), Dr. Davaa Baatar, Dr. Mario Andrés Muñoz Acosta, and students Simon Bowly and Thomas Tan.
The generalisation to machine learning is joint work with Dr. Laura Villanova, Dr. Mario Andrés Muñoz Acosta, and Dr. Davaa Baatar.
Motivation · Aims · Framework
The Importance of Test Instances
Standard practice: use benchmark instances to report algorithm strengths (but rarely weaknesses!)
The NFL Theorem (Wolpert & Macready, 1997) warns against expecting an algorithm to perform well on all instances, regardless of their structure and characteristics.
The properties (or measurable features) of instances may provide explanations of an algorithm's behaviour across a range of instances → predictions, insights.
Requires the right kinds of test instances (diverse, challenging, real-world-like, etc.) and suitable features.
Reference
Smith-Miles, K. & Lopes, L., "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39(5), pp. 875-889, 2012.
Travelling Salesman Problem (TSP) Example
[Figure: two example TSP instances, labelled Easy and Hard]
What makes the TSP easy or hard?
A TSP Formulation (not the only one)
Let X_{i,j} = 1 if city i is followed by city j in the tour, and 0 otherwise.

minimise    ∑_{i=1}^{N} ∑_{j=1}^{N} D_{i,j} X_{i,j}

subject to  ∑_i X_{i,j} = 1   ∀ j
            ∑_j X_{i,j} = 1   ∀ i
            ∑_{i∈S} ∑_{j∈S} X_{i,j} ≤ |S| − 1   ∀ S: ∅ ≠ S ⊂ {1, 2, …, N}
TSP is NP-hard, but some instances are easy depending on properties of the inter-city distance matrix D.
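As an illustration of features derived from D, the sketch below computes two simple summary statistics of the distance matrix of a random Euclidean instance. These particular features (coefficient of variation, fraction of distinct distances) are hypothetical stand-ins for illustration, not the specific features used in the cited studies.

```python
import math
import random

def distance_matrix(cities):
    """Pairwise Euclidean distances between 2-D city coordinates."""
    n = len(cities)
    return [[math.dist(cities[i], cities[j]) for j in range(n)]
            for i in range(n)]

def distance_features(D):
    """Two illustrative summary features of the inter-city distance
    matrix D: the coefficient of variation of the distances, and the
    fraction of (approximately) distinct distances."""
    n = len(D)
    vals = [D[i][j] for i in range(n) for j in range(n) if i < j]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    cv = math.sqrt(var) / mean
    distinct = len({round(v, 6) for v in vals}) / len(vals)
    return {"cv_distance": cv, "frac_distinct": distinct}

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(20)]
D = distance_matrix(cities)
print(distance_features(D))
```

Feature vectors like this, computed per instance, are the raw material for the instance-space analysis described next.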
Questions
How do instance features help us understand the strengths and weaknesses of algorithms?
How can we infer and visualise algorithm performance across a huge "instance space"?
How easy or hard are the benchmark instances in the literature? How diverse are existing instances?
How can we measure objectively the relative performance of algorithms?
How can we generate new test instances to gain insights into algorithmic power?
Aims
Develop a new methodology to:
• visualise the "instance space" based on instance features
• visualise algorithm performance across the instance space
• define where algorithm performance is expected to be "good" (called the "algorithm footprint")
• measure the relative size of an algorithm's footprint
• evolve new instances at target locations in instance space

Enable objective assessment of algorithmic power.
Enable useful test instances to be generated with controllable characteristics to drive insights.
Understand and report the boundary of good performance of an algorithm: essential for good research practice, and to avoid deployment disasters.
Algorithm Selection Problem, Rice (1976)
Applications of Rice's Framework
Rice and colleagues used this approach to predict the performance of the many methods (A) for the numerical solution of elliptic partial differential equations (PDEs).
Reference
Weerawarana, Rice, et al., "PYTHIA: a knowledge-based system to select scientific algorithms", ACM Trans. on Math. Software, vol. 22(4), pp. 447-468, 1996.
It has also been used for pre-conditioners for linear system solvers, and extensively for machine learning (meta-learning).
Reference
Smith-Miles, K. A., "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41(1), 2008.
Applications to Optimisation
Represents a relatively new direction for the optimisation community (combinatorial, continuous, black-box, etc.)
Much needed, given:
• the huge range of algorithms
• frequent statements like "currently there is still a strong lack of . . . understanding of how exactly the relative performance of different meta-heuristics depends on instance characteristics."

Can also resolve a longstanding debate about how instance choice affects evaluation of algorithm performance.
Reference
Hooker, J.N., "Testing heuristics: We have it all wrong", Journal of Heuristics, vol. 1, pp. 33-42, 1995.
Extending Rice's Framework
{I,F,Y,A} is the meta-data from which we learn
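A minimal sketch of the selection mapping S: F → A that this framework learns from the meta-data {I, F, Y, A}. The feature values, performance scores, and algorithm names below are made up for illustration, and the 1-nearest-neighbour selector is just one possible learner; the framework does not prescribe a particular one.

```python
# Toy meta-data: instance -> (feature vector f(x), {algorithm: performance y})
meta_data = {
    "inst1": ((0.2, 0.9), {"algoA": 0.95, "algoB": 0.60}),
    "inst2": ((0.8, 0.1), {"algoA": 0.55, "algoB": 0.90}),
    "inst3": ((0.3, 0.8), {"algoA": 0.92, "algoB": 0.65}),
    "inst4": ((0.9, 0.2), {"algoA": 0.50, "algoB": 0.88}),
}

def select_algorithm(features, meta_data):
    """1-nearest-neighbour selector: recommend the algorithm that
    performed best on the most similar known instance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, (_, perfs) = min(meta_data.items(),
                        key=lambda kv: sq_dist(kv[1][0], features))
    return max(perfs, key=perfs.get)

print(select_algorithm((0.25, 0.85), meta_data))  # nearest inst1 -> algoA
```

A selector like this predicts well only where the meta-data covers the instance space, which is exactly why the diversity questions in STEP 1 matter.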
STEP 1: Collect meta-data {I,F,Y,A}
What makes the problem hard?
What features capture the difficulty of instances?
Which instances show sufficient diversity in features as well as algorithm performance?
Which algorithms will show sufficient diversity of performance that we can learn something about the effectiveness of their underlying mechanism?
What performance metric(s) are most relevant?
STEP 2: Create instance space
Which dimension-reduction method should be used to lose minimal information and create a visualisation that separates easy and hard instances in interpretable ways?
Which features should be selected?
Can the selected features accurately predict algorithmperformance?
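The slides do not fix a particular dimension-reduction method, so as a purely illustrative sketch, here is a minimal PCA-style projection (standardise the features, then keep the top two principal components via an SVD). The synthetic feature matrix and the function name `project_2d` are my own choices for illustration, not the method used in the talk:

```python
import numpy as np

def project_2d(F):
    """Project an (instances x features) matrix to 2-D: standardise each
    feature, then keep the top two principal components (via SVD).
    Assumes no feature is constant (non-zero standard deviation)."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T        # one (x, y) coordinate per instance

rng = np.random.default_rng(0)
F = rng.normal(size=(60, 6))   # 60 synthetic instances, 6 features
coords = project_2d(F)
```

By construction the first coordinate captures at least as much variance as the second, which is what makes the resulting 2-D "instance space" a least-loss linear view of the feature data.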
STEP 3: Measure algorithm footprints and gain insights into strengths and weaknesses

- In which parts of the space is an algorithm expected to perform well or poorly?
- How large is its footprint, relative to other algorithms?
- Does its footprint overlap real-world instances?
- Is it unique anywhere?
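As a toy illustration of how a footprint's size might be quantified (the methodology's actual footprint calculation is more sophisticated), one crude estimate is the area of the convex hull of the 2-D instance-space points where an algorithm performed well, computed here with the standard monotone-chain hull and the shoelace formula:

```python
def footprint_area(points):
    """Crude footprint-size estimate: area of the convex hull of the 2-D
    instance-space points where an algorithm performed well
    (monotone-chain convex hull + shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                      # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    n = len(hull)
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                         - hull[(i + 1) % n][0] * hull[i][1]
                         for i in range(n)))

# Hypothetical "good performance" points: a unit square plus an interior point
good_points = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
area = footprint_area(good_points)
```

Comparing such areas across algorithms gives a first-cut answer to "how large is its footprint, relative to other algorithms?".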
STEP 4: Generate new test instances to fill gaps in the instance space

- Is there a theoretical boundary beyond which instances can't exist?
- Where are the benchmark instances located?
- How diverse and challenging are they?
- How can we set target points in the instance space and evolve new instances?
- Which target points could provide important new information to influence our assessment?
- Return to STEP 1 to check whether the features distinguish the new instances
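The mechanics of "set a target point and evolve instances towards it" can be sketched with a tiny (1+1)-style hill-climber. For illustration only, the instance space is collapsed here to a single feature (edge density); the real methodology targets points in the projected 2-D space and uses a full evolutionary algorithm:

```python
import random

def evolve_instance(n, target, iters=500, seed=1):
    """(1+1)-style hill-climb: flip one potential edge at a time, keeping the
    mutant whenever its feature value (edge density) is no further from the
    target than the incumbent's."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = {p for p in pairs if rng.random() < 0.5}   # random start instance
    fitness = lambda E: abs(len(E) / len(pairs) - target)
    best = fitness(edges)
    for _ in range(iters):
        mutant = edges ^ {rng.choice(pairs)}           # toggle one edge
        if fitness(mutant) <= best:
            edges, best = mutant, fitness(mutant)
    return edges, best

edges, gap = evolve_instance(n=10, target=0.2)
```

The same loop generalises directly: replace the one-feature fitness with the distance between an instance's projected (x, y) coordinates and a chosen target point in the instance space.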
Meta-Data | Creating the instance space | Measuring algorithm footprints and gaining insights | Evolving New Instances
Graph Colouring
- Given an undirected graph G(V, E) with |V| = n, colour the vertices such that no two vertices connected by an edge share the same colour
- Try to find the minimum number of colours needed to colour the graph (the chromatic number)
- NP-hard problem → numerous heuristics for large n
- Many applications, such as timetabling, where edges represent conflicts between events
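For concreteness, a proper colouring (though not necessarily a minimum one) can be produced by the classic first-fit greedy heuristic, the idea underlying the RandGr algorithm mentioned later. This sketch assumes the graph is stored as a dict of neighbour lists:

```python
def greedy_colouring(adj, order):
    """First-fit greedy: scan nodes in the given order and assign each the
    smallest colour index not already used by a coloured neighbour."""
    colour = {}
    for v in order:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour

# A 4-cycle: bipartite, so its chromatic number is 2
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
cols = greedy_colouring(c4, [0, 1, 2, 3])
```

The visit order matters: on some graphs a bad order makes first-fit use far more colours than the chromatic number, which is why randomised and saturation-based orderings are studied.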
What makes graph colouring hard?
In total we have 18 features that describe a graph instance G(V, E).

5 features relating to the nodes and edges:
- The number of nodes or vertices in a graph: n = |V|
- The number of edges in a graph: m = |E|
- The density of a graph: the ratio of the number of edges to the number of possible edges
- Mean node degree: the degree of a node is the number of connections it has to other nodes
- SD of node degree: the average node degree and its standard deviation give an idea of how connected a graph is
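These five node/edge features are cheap to compute directly from the graph. A minimal sketch (assuming, as above, a simple undirected graph stored as a dict of neighbour lists):

```python
def node_edge_features(adj):
    """The five node/edge features of a simple undirected graph given as a
    dict of neighbour lists."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2                          # each edge counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_deg = sum(degrees) / n
    sd_deg = (sum((d - mean_deg) ** 2 for d in degrees) / n) ** 0.5
    return {"n": n, "m": m, "density": density,
            "mean_degree": mean_deg, "sd_degree": sd_deg}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}       # complete graph K3
feats = node_edge_features(triangle)
```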
Graph features (continued)
8 features related to cycles and paths on the graph:
- The diameter of a graph: the maximum shortest-path distance between any two nodes
- Average path length: the average length of the shortest paths between all node pairs
- The girth of a graph: the length of the shortest cycle
- The clustering coefficient: a measure of node clustering
- Mean betweenness centrality: the average fraction of all shortest paths connecting all pairs of nodes that pass through a given node
- SD of betweenness centrality: with the mean, the SD gives a measure of how central the nodes are in a graph
- Szeged index / revised Szeged index: a generalisation of the Wiener number to cyclic graphs (correlates with bipartivity)
- Beta: the proportion of even closed walks to all closed walks (correlates with bipartivity)
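Several of these path features reduce to all-pairs shortest paths, which BFS gives directly on unweighted graphs. A small sketch of the diameter, average path length, and mean local clustering coefficient (assuming a connected graph stored as a dict of neighbour lists; the more involved features like betweenness and the Szeged index are omitted here):

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_features(adj):
    """Diameter and average shortest-path length (assumes connectivity)."""
    lengths, eccentricities = [], []
    for s in adj:
        d = bfs_distances(adj, s)
        eccentricities.append(max(d.values()))
        lengths.extend(d[t] for t in adj if t != s)
    return {"diameter": max(eccentricities),
            "avg_path_length": sum(lengths) / len(lengths)}

def mean_clustering(adj):
    """Mean local clustering: the fraction of each node's neighbour pairs
    that are themselves connected (degree < 2 nodes count as 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```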
Graph features (continued)
5 features related to the adjacency and Laplacian matrices:
- Mean eigenvector centrality: the Perron-Frobenius eigenvector of the adjacency matrix, averaged across all components
- SD of eigenvector centrality: together with the mean, the standard deviation of eigenvector centrality gives a measure of the importance of a node inside a graph
- Mean spectrum: the mean of the absolute values of the eigenvalues of the adjacency matrix (a.k.a. the "energy" of the graph)
- SD of the set of absolute values of the eigenvalues of the adjacency matrix
- Algebraic connectivity: the 2nd-smallest eigenvalue of the Laplacian matrix, reflecting how well connected a graph is. Cheeger's constant, another important graph property, is bounded by half the algebraic connectivity.
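The spectral features fall out of two standard eigendecompositions. A sketch (assuming the graph is given as a symmetric 0/1 numpy adjacency matrix; eigenvector centrality is omitted for brevity):

```python
import numpy as np

def spectral_features(A):
    """Spectral features from a symmetric 0/1 adjacency matrix A:
    the mean/SD of |eigenvalues| of A (the graph 'energy' statistics),
    and the 2nd-smallest eigenvalue of the Laplacian L = D - A."""
    spectrum = np.abs(np.linalg.eigvalsh(A))
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    lap_eigs = np.sort(np.linalg.eigvalsh(L))
    return {"mean_spectrum": spectrum.mean(),
            "sd_spectrum": spectrum.std(),
            "algebraic_connectivity": lap_eigs[1]}

K3 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]], dtype=float)            # complete graph on 3 nodes
feats = spectral_features(K3)
```

For K3 the adjacency eigenvalues are {2, -1, -1}, so the mean spectrum is 4/3, and the Laplacian eigenvalues {0, 3, 3} give algebraic connectivity 3.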
Graph Colouring Instances
We use a set of 6788 instances from a variety of well-studied sources, and others we have generated to explore bipartivity.

DataSet   # instances   Description
B         1000          Bipartivity Controlled
C1        1000          Culberson: cycle-driven
C2         932          Culberson: geometric
C3        1000          Culberson: girth and degree inhibited
C4        1000          Culberson: IID edge probabilities
C5        1000          Culberson: weight-biased
D          743          DIMACS instances
E           20          Social Network graphs
F           80          Sports Scheduling
G           13          Exam Timetabling
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.
- DSATUR: Brélaz's greedy algorithm (exact for bipartite graphs)
- RandGr: simple greedy first-fit colouring of random permutations of the nodes
- Bktr: a backtracking version of DSATUR (Culberson)
- HillClimb: a hill-climbing improvement on the initial DSATUR solution
- HEA: hybrid evolutionary algorithm
- TabuCol: tabu search algorithm
- PartCol: like TabuCol, but does not restrict the search to feasible colourings
- AntCol: ant colony meta-heuristic
Reference

Lewis, R. et al. "A wide-ranging computational comparison of high-performance graph colouring algorithms". Computers & Operations Research 39(9), pp. 1933-1950, 2012.
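The DSATUR selection rule is compact enough to sketch: repeatedly colour the uncoloured vertex whose neighbours already use the most distinct colours (its "saturation"), breaking ties by degree, and give it the smallest feasible colour. A minimal (unoptimised) version, again assuming a dict-of-neighbour-lists graph:

```python
def dsatur(adj):
    """DSATUR (Brélaz): pick the uncoloured vertex with maximum saturation
    (number of distinct colours among its neighbours), tie-break by degree,
    and assign it the smallest feasible colour."""
    colour, uncoloured = {}, set(adj)
    while uncoloured:
        def saturation(u):
            return len({colour[w] for w in adj[u] if w in colour})
        v = max(uncoloured, key=lambda u: (saturation(u), len(adj[u])))
        used = {colour[w] for w in adj[v] if w in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
        uncoloured.remove(v)
    return colour

# Odd cycle C5: chromatic number 3, so DSATUR needs exactly 3 colours here
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
cols = dsatur(c5)
```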
Instance Spaces for Performance Evaluation 21 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.I DSATUR: Brelaz's greedy algorithm (exact for bipartite
graphs)I RandGr: Simple greedy �rst-�t colouring of random
permutations of nodesI Bktr: a backtracking version of DSATUR (Culberson)I HillClimb: a hill-climbing improvement on initial DSATUR
solutionI HEA: Hybrid evolutionary algorithmI TabuCol: Tabu search algorithmI PartCol: Like TabuCol, but doesn't restricts to feasible spaceI AntCol: Ant Colony meta-heuristic
Reference
Lewis, R. et al. �A wide-ranging computational comparison of high-performance graphcolouring algorithms�. Computers & Operations Research 39(9), pp. 1933-1950, 2012.
Instance Spaces for Performance Evaluation 21 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.I DSATUR: Brelaz's greedy algorithm (exact for bipartite
graphs)I RandGr: Simple greedy �rst-�t colouring of random
permutations of nodesI Bktr: a backtracking version of DSATUR (Culberson)I HillClimb: a hill-climbing improvement on initial DSATUR
solutionI HEA: Hybrid evolutionary algorithmI TabuCol: Tabu search algorithmI PartCol: Like TabuCol, but doesn't restricts to feasible spaceI AntCol: Ant Colony meta-heuristic
Reference
Lewis, R. et al. �A wide-ranging computational comparison of high-performance graphcolouring algorithms�. Computers & Operations Research 39(9), pp. 1933-1950, 2012.
Instance Spaces for Performance Evaluation 21 / 89
HEA reported as best overall
Creating the Instance Space: Process

Examine correlations to eliminate useless features

Label instances as easy or hard based on the algorithm portfolio

Project instances from the R^m feature space to 2-d

Use a GA to select the optimal subset of m features (for 2 ≤ m ≤ 18) that best separates easy and hard instances
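The first step above — dropping features that carry no extra information — can be sketched as a pairwise-correlation filter. This is a minimal illustration, not the talk's actual code; the feature names and the 0.95 threshold are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.95):
    """Greedily keep a feature only if it is not highly correlated
    with any feature already kept.  `features` maps name -> list of
    per-instance values (illustrative data layout)."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, a feature that is a scaled copy of graph density would be filtered out, while an uncorrelated feature such as graph energy survives.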
98% of variation explained by top 2 axes
Visualising the instance space
Defining goodness of algorithm performance

Acknowledging the arbitrariness of this definition, here we define an algorithm's performance to be "good" if the gap between the number of colours it needs to colour the graph and the portfolio winner's count is less than ε% within a fixed computational budget of 5×10^10 constraint checks.

We consider the cases ε = 0 (the algorithm is best) and ε = 0.05 (within 5% of the best).
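The "goodness" test reduces to a relative-gap check against the portfolio's best result. A minimal sketch, with assumptions: the function name is mine, and I use ≤ rather than strict < so that the ε = 0 case ("the algorithm is best") includes exact ties.

```python
def is_good(colours_used, best_in_portfolio, eps=0.05):
    """Label an algorithm's run on one instance as 'good' if its colour
    count is within a relative gap eps of the portfolio's winner,
    assuming both runs respected the fixed constraint-check budget."""
    gap = (colours_used - best_in_portfolio) / best_in_portfolio
    return gap <= eps
```

So an algorithm needing 21 colours when the winner needs 20 is "good" at ε = 0.05 (5% gap) but not at ε = 0.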
Footprints with ε = 0 (blue is good)
Defining difficulty of instances

If fewer than a given fraction β of the 8 algorithms find an instance easy, then we label the instance as hard for the portfolio of algorithms
- e.g. if β = 0.5, an instance is labelled hard if fewer than half (at most 3 of the eight algorithms) find it easy

It is important to understand where good algorithm performance is uninteresting (all algorithms find the instances easy) or interesting (other algorithms struggle)
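The β rule above can be written as a one-line portfolio test (a sketch; the function name and boolean-flag input format are mine):

```python
def is_hard_for_portfolio(easy_flags, beta=0.5):
    """easy_flags: one bool per algorithm, True if that algorithm found
    the instance easy.  The instance is hard for the portfolio when
    fewer than beta * (number of algorithms) found it easy."""
    return sum(easy_flags) < beta * len(easy_flags)
```

With 8 algorithms and β = 0.5, an instance that only 3 algorithms find easy is hard; one that 4 find easy is not.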
How many algorithms find an instance hard? (α = 0)
Defining Boundary of Algorithm Footprints

For a given algorithm, we take the points labelled as good, and
- remove outliers through clustering,
- calculate the convex hull to define a generalised area of expected good performance,
- remove the convex hull of contradicting points,
- validate the accuracy of the remaining "footprint" through out-of-sample testing
Measuring the Area of Algorithm Footprints

Now we need only calculate the area of the footprint
- our metric of the power of an algorithm is the ratio of this area to the total area of the instance space

Area of Algorithm Footprint

Let H(S) be the convex hull of a region defined by a set of points S = {(x_i, y_i), i = 1, ..., η}. Then

Area(H(S)) = (1/2) [ Σ_{j=1}^{k−1} (x_j y_{j+1} − y_j x_{j+1}) + (x_k y_1 − y_k x_1) ]

with the subset {(x_j, y_j), j = 1, ..., k}, k ≤ η, the extreme points of H(S), listed in order around the hull.
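The hull-plus-shoelace computation can be sketched in a few lines. This is an illustrative stand-in, not the authors' code: it uses Andrew's monotone-chain algorithm for the hull and takes an absolute value so the point ordering does not matter.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns the extreme points
    in counter-clockwise order (collinear points are dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def footprint_area(points):
    """Shoelace formula over the hull's extreme points, i.e. the
    slide's Area(H(S))."""
    h = convex_hull(points)
    k = len(h)
    if k < 3:
        return 0.0
    s = sum(h[j][0] * h[(j + 1) % k][1] - h[j][1] * h[(j + 1) % k][0]
            for j in range(k))
    return abs(s) / 2.0
```

Dividing `footprint_area` by the area of the whole instance-space bounding region then gives the relative "power" metric described above.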
Algorithm Footprint Areas (% of instance space)
Learning to predict easy or hard instances for a given ε,β
Naive Bayes classifier in R^2 is 85% accurate
Recommending algorithms
Each SVM is 75-90% accurate but fails to identify winner in some regions
On which instance classes is each algorithm best suited?
Characterising algorithm suitability based on features

This enables us to see what properties (not instance class labels) explain algorithm performance.

The representation of the instance space (the location of instances) depends on the feature set.

We have used a GA to select the optimal feature subset to maximise separability (reduce contradictions) in the footprints, enabling cleaner calculation of footprint areas.

Considering all 18 features again, some interesting feature distributions clearly show the properties that make instances easy or hard for each algorithm.
Feature Distributions in Instance Space
Reference
Pisanski, T., & Randić, M. "Use of the Szeged index and the revised Szeged index for measuring network bipartivity". Discrete Applied Mathematics, vol. 158, pp. 1936-1944, 2010.
Reference
Estrada, E., & Rodríguez-Velázquez, J. A. "Spectral measures of bipartivity in complex networks". Physical Review E, vol. 72(4), 046105, 2005.
References
Balakrishnan, R. "The energy of a graph". Linear Algebra and its Applications, vol. 387, pp. 287-295, 2004.
HEA is not best everywhere (No Free Lunch) ... why not?
References
Smith-Miles, K. A., Baatar, D., Wreford, B. and Lewis, R. "Towards Objective Measures of Algorithm Performance across Instance Space". Computers & Operations Research, vol. 45, pp. 12-24, 2014.
Where instances are, and are not, and why?

The instances are projected into the 2-d instance space by the linear transformation

[ v1 ]   [  0.559   0.614   0.557 ]   [ density                ]
[ v2 ] = [ -0.702  -0.007   0.712 ] · [ algebraic connectivity ]
                                      [ energy                 ]

The upper and lower bounds on the features give us a bounding region in the instance space in which a valid instance could lie.

We can select target points within this valid instance space, and use a GA to evolve random graphs so that we minimise their distance to the target point when projected.

This is a new method for instance generation, enabling non-trivial features to be controlled.
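Applying the slide's 2×3 projection matrix is a single matrix-vector product. A sketch (the function name is mine, and I assume the three features have already been normalised as in the talk):

```python
# Projection matrix from the slide: rows give coordinates v1 and v2,
# columns correspond to (density, algebraic connectivity, energy).
P = [[0.559, 0.614, 0.557],
     [-0.702, -0.007, 0.712]]

def project(features):
    """Map a 3-d feature vector to its 2-d instance-space coordinates
    (v1, v2) via the linear transformation above."""
    return [sum(p * f for p, f in zip(row, features)) for row in P]
```

A GA evolving graphs toward a target point would then minimise the Euclidean distance between `project(features_of(graph))` and that target.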
Evolving new instances at target points (n = 100)

References

Smith-Miles, K. A. and Bowly, S., "Generating new test instances by evolving in instance space", Computers & Operations Research, vol. 63, pp. 102-113, 2015.

Instance Spaces for Performance Evaluation 46 / 89
Summary

How do instance features help us understand the strengths and weaknesses of optimisation algorithms?
- Provided we have the right feature set, we can create a topology-preserving instance space
- The boundary between good and bad performance can be seen
- Feature selection methods may improve topology-preservation

How can we infer and visualise algorithm performance across a huge "instance space"?
- PCA has been used to visualise instances in 2-d (or 3-d)
- More than 90% of the variation in the data was preserved, but some important information (as well as noise) is naturally lost
- If the 4th largest eigenvalue is still large, then we lose too much detail, and other dimension reduction methods are needed

Instance Spaces for Performance Evaluation 47 / 89
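The PCA step and its eigenvalue diagnostic can be sketched with a plain SVD; this is an illustration of the idea above, not the authors' exact pipeline, and the random meta-data matrix is purely for demonstration:

```python
import numpy as np

def pca_2d(X):
    """Project instance feature vectors to 2-d and report the variance
    spectrum, so we can check how much variation 2-d preserves."""
    Xc = X - X.mean(axis=0)                     # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)                   # variance explained per component
    Z = Xc @ Vt[:2].T                           # 2-d instance-space coordinates
    return Z, var

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))                   # hypothetical 9-feature meta-data
Z, var = pca_2d(X)
# Diagnostic from the slide: if components beyond the first two or three
# still carry large variance, a 2-d projection loses too much detail.
print(var[:2].sum())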
Summary, continued

How can we objectively measure algorithm performance?
- Relative size of the area of algorithm footprints
- Convex or concave hulls can be used depending on generalisation comfort (out-of-sample testing can help)
- The area of the footprint depends on the definition of "good"

How easy or hard are the benchmark instances?
- Randomly generated instances tend to be in the middle (average features), and are usually not discriminating
- Discriminating instances can be generated intentionally using a GA (fitness is algorithm performance, but this blows up for harder instances)
- Diversity of instances is critical for a meaningful instance space

Alternatively, can we generate new test instances at target points in the instance space (more scalable)?

Instance Spaces for Performance Evaluation 48 / 89
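The footprint-area measure above can be realised with a convex hull over the 2-d coordinates of instances where an algorithm performed "well". A sketch under that assumption (the point sets are illustrative; a concave hull would be the more conservative variant mentioned above):

```python
import numpy as np
from scipy.spatial import ConvexHull

def footprint_area(points_2d):
    """Area of the convex hull of a set of 2-d instance-space points.
    Note: for 2-d input, ConvexHull.volume is the enclosed area
    (ConvexHull.area would be the perimeter)."""
    return ConvexHull(np.asarray(points_2d)).volume

rng = np.random.default_rng(1)
good = rng.uniform(size=(50, 2))      # instances where the algorithm did well
total = rng.uniform(size=(200, 2))    # all instances in the space
rel = footprint_area(good) / footprint_area(total)
print(rel)                            # relative footprint size
```

Dividing by the area covered by all instances gives the relative footprint size used to compare algorithms objectively.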
Black Box Optimisation

We are given only a sample of points from the continuous decision (input) space, and known objective function values (output space).

We have no analytical expression of the objective function.

We need to find the best point in the decision space to minimise the objective function with minimal function evaluations:
- Input space, X ⊂ R^D
- Output space, Y ⊂ R
- Problem dimensionality, D ∈ Z+
- Candidate solutions, x ∈ X
- Candidate cost, y ∈ Y
- Target solution, x_t ∈ X
- Target cost, y_t ∈ Y

Instance Spaces for Performance Evaluation 49 / 89
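The setting above can be made concrete with the simplest possible black-box baseline: we may only query the objective at points, never inspect it analytically, and must respect an evaluation budget. This sketch is for illustration only, not one of the benchmarked algorithms:

```python
import numpy as np

def random_search(f, D, budget, lo=-5.0, hi=5.0, seed=0):
    """Minimise a black-box objective f over [lo, hi]^D by uniform random
    sampling, keeping the best candidate found within the budget."""
    rng = np.random.default_rng(seed)
    best_x, best_y = None, np.inf
    for _ in range(budget):
        x = rng.uniform(lo, hi, size=D)   # candidate solution x in X
        y = f(x)                          # one function evaluation
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

sphere = lambda x: float(np.sum(x**2))    # a simple test objective
x, y = random_search(sphere, D=2, budget=1000)
print(y)
```

Every method studied later improves on this loop by exploiting structure in the sampled landscape, which is exactly what the ELA features try to measure.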
What makes BBO hard?

We depend on a sample to provide knowledge of the landscape.

Algorithms perform differently and can struggle with certain landscape characteristics:
- multimodality, poor conditioning, deceptiveness, etc.

We use sample-based Exploratory Landscape Analysis (ELA) metrics to learn what makes BBO hard.

These features will also form our instance space, enabling algorithm footprints to be seen and new test instances to be generated.

Instance Spaces for Performance Evaluation 50 / 89
BBO meta-data: instances

The noiseless COCO benchmark set is used: 24 basis functions defined within X = [-5, 5]^D.

The functions are divided into five categories:
- Separable (f1-f5)
- Low or moderately conditioned (f6-f9)
- Unimodal with high conditioning (f10-f14)
- Multimodal with adequate global structure (f15-f19)
- Multimodal with weak global structure (f20-f24)

New instances are generated by scaling and transforming the basis functions (translations, rotations, oscillations).
- We generated instances [1, ..., 15] at D = 2, 5, 10, 20, resulting in 1440 problem instances.

Instance Spaces for Performance Evaluation 51 / 89
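The instance-generation idea above can be sketched for the rotation and translation transformations (a simplified illustration; the real COCO suite also applies the oscillations and scalings mentioned, and the sphere basis function here is just an example):

```python
import numpy as np

def make_instance(base_f, D, seed):
    """Derive a new problem instance from a basis function by composing it
    with a random rotation and a shifted optimum location."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(D, D)))   # random rotation matrix
    x_opt = rng.uniform(-4, 4, size=D)             # translated optimum
    f = lambda x: base_f(Q @ (np.asarray(x, dtype=float) - x_opt))
    return f, x_opt

sphere = lambda z: float(np.sum(z**2))
f, x_opt = make_instance(sphere, D=2, seed=3)
print(f(x_opt))   # the transformed instance keeps its optimum at x_opt
```

Each (basis function, seed) pair yields a distinct instance sharing the basis function's structural character, which is why 24 functions times 15 instances times 4 dimensions gives the 1440 instances above.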
BBO meta-data: features

Samples of size D × 10^3 are drawn from the input space X using a Latin hypercube design (LHD).

Feature selection was applied to 18 candidate features (9 were chosen) to maximise performance prediction accuracy using an SVM.

Instance Spaces for Performance Evaluation 52 / 89
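The sampling step above can be sketched with scipy's Latin hypercube generator; this is an assumption about tooling (any LHD implementation works), and the sphere objective is only a stand-in:

```python
import numpy as np
from scipy.stats import qmc

def sample_landscape(f, D, lo=-5.0, hi=5.0, seed=0):
    """Draw the D x 10^3 Latin hypercube sample used to compute landscape
    features, and evaluate the black-box objective at each point."""
    n = D * 1000
    sampler = qmc.LatinHypercube(d=D, seed=seed)
    # Map the unit-cube design onto the search domain [-5, 5]^D.
    X = qmc.scale(sampler.random(n), [lo] * D, [hi] * D)
    Y = np.apply_along_axis(f, 1, X)
    return X, Y

sphere = lambda x: float(np.sum(x**2))
X, Y = sample_landscape(sphere, D=2)
print(X.shape, Y.shape)
```

All ELA features in the table that follows are computed from such an (X, Y) sample, never from the function itself.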
Method            | Feature | Description                           | Transformations
------------------|---------|---------------------------------------|----------------
Surrogate models  | R̄²_LI   | Fit of linear regression model        | Unit scaling
                  | R̄²_Q    | Fit of quadratic regression model     | Unit scaling
                  | CN      | Ratio of min to max quadratic coeff.  | Unit scaling
Significance      | ξ(D)    | Significance of D-th order            | z-score, tanh
                  | ξ(1)    | Significance of first order           | z-score, tanh
Cost distribution | γ(Y)    | Skewness of the cost distribution     | z-score, tanh
                  | κ(Y)    | Kurtosis of the cost distribution     | log10, z-score
                  | H(Y)    | Entropy of the cost distribution      | log10, z-score
Fitness sequences | Hmax    | Maximum information content with nearest neighbor sorting | z-score
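The cost-distribution row of the table can be sketched directly from a sampled cost vector Y. This is an illustration, not the authors' exact computation: the z-score step (taken across the whole meta-data set) is omitted, and the 20-bin histogram used for the entropy is an assumption:

```python
import numpy as np
from scipy import stats

def cost_distribution_features(Y):
    """Skewness, kurtosis, and entropy of the sampled costs, with the
    tanh / log10 transformations listed in the table applied."""
    gamma = np.tanh(stats.skew(Y))                      # skewness -> tanh
    kappa = np.log10(stats.kurtosis(Y, fisher=False))   # kurtosis -> log10
    hist, _ = np.histogram(Y, bins=20)                  # assumed binning
    p = hist[hist > 0] / len(Y)
    H = np.log10(-np.sum(p * np.log2(p)))               # entropy -> log10
    return gamma, kappa, H

rng = np.random.default_rng(0)
Y = rng.normal(size=2000)          # illustrative cost sample
print(cost_distribution_features(Y))
```

The transformations squash heavy-tailed statistics into comparable ranges before the features enter the projection.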
BBO Algorithms

We consider a variety of algorithms, selected using ICARUS to avoid overlapping performance.

Reference

Muñoz, M. (2013). Decision support systems for the automatic selection of algorithms for continuous optimization problems. PhD thesis, The University of Melbourne.

Instance Spaces for Performance Evaluation 53 / 89
Visualising the instance space
Instance Spaces for Performance Evaluation 54 / 89
Algorithm Footprints
An instance is considered solved if at least 1 of 15 runs comes within 10⁻⁸ of the target value y_t within a budget of 10⁴ × D function evaluations.
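The solved criterion above can be sketched as a small predicate; the function and argument names here are ours, not from the study:

```python
def solved(best_costs, y_target, D, evals_used, tol=1e-8, budget_per_dim=10_000):
    """An instance counts as solved if at least one run gets within tol of the
    target cost without exceeding the 10^4 * D evaluation budget."""
    budget = budget_per_dim * D
    return any(
        abs(b - y_target) <= tol and fe <= budget
        for b, fe in zip(best_costs, evals_used)
    )

# 15 runs on a D = 2 instance with target cost 0.0: one run reached the target
best = [1e-9] + [0.5] * 14
fes  = [12_000] + [20_000] * 14
assert solved(best, 0.0, 2, fes)
assert not solved([0.5] * 15, 0.0, 2, fes)
```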
Recommended algorithms
Feature Distributions in Instance Space
Methodology - Evolving New Instances
- We focus on 2-d functions for ease of visualisation
- We generate 720 instances (instances 1, ..., 30 of each of the 24 basis functions at D = 2)
- Sample based on X ⊂ 𝒳, of size 2×10⁴, using LHD
- Each function is summarised as a 9-d feature vector, then projected to 2-d using PCA
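The projection step — z-scored feature vectors onto the first two principal components — might look like this in Python. This is an illustration only: random features stand in for the real meta-data, and the function name is ours.

```python
import numpy as np

def project_2d(F):
    """Project an (n_instances x n_features) feature matrix onto its
    first two principal components after z-score standardisation."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    # principal axes = eigenvectors of the feature covariance matrix
    cov = np.cov(Z, rowvar=False)
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1]       # eigh returns eigenvalues in ascending order
    return Z @ V[:, order[:2]]

rng = np.random.default_rng(0)
F = rng.normal(size=(720, 9))         # 720 instances x 9 selected features (stand-in data)
coords = project_2d(F)                # one 2-d point per instance
```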
Methodology - Evolving New Instances
- We use Genetic Programming (GP) to evolve a program (function), represented as a binary tree:
  - leaves are variables or constants
  - nodes are operations {×, +, −, (·)², sin, cos, tanh, exp}
- Used GPTIPS v1.0 in MATLAB (GP for symbolic regression):
  - Population size: 400
  - Number of generations: 100
  - Tournament size: 7
  - Elite fraction: 0.1
  - Target cost: √ε, where ε is the machine precision
  - Number of inputs: D = 2
  - Max tree depth: 10
  - Constant range: [−1000, 1000]
  - Tournament selection: lexicographic
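An expression tree over this operation set can be encoded and evaluated as follows. This is a Python sketch for clarity only: GPTIPS itself is a MATLAB toolbox, and the tuple encoding here is our assumption.

```python
import math

# Internal nodes carry an operation from the slide's set; leaves are
# input variables ("x0", "x1") or numeric constants.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "sq":  lambda a: a * a,
    "sin": math.sin, "cos": math.cos, "tanh": math.tanh, "exp": math.exp,
}

def evaluate(node, x):
    """Recursively evaluate an expression tree at the input vector x."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):                 # variable leaf, e.g. "x0"
        return x[int(node[1:])]
    op, *children = node
    return OPS[op](*(evaluate(c, x) for c in children))

# sin(x0)^2 + 0.5 * x1, encoded as nested tuples
tree = ("add", ("sq", ("sin", "x0")), ("mul", 0.5, "x1"))
value = evaluate(tree, (0.0, 2.0))
```

Evolving such trees toward a target point in the instance space then amounts to scoring each candidate by the distance between its feature vector and the target.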
Recreating Existing Functions (S1)
- We attempt to recreate a known function from COCO by selecting a target point coinciding with that function
- We perform 5 iterations for each of 50 randomly selected target instances
- A few examples ...
Recreating Existing Functions - Sphere
Sphere - unimodal
Recreating Existing Functions - Discus
Discus - poor conditioning
Recreating Existing Functions - Katsuura
Katsuura - highly multimodal with periodic structure
Generating Functions across the Instance Space (S2)
- rugged instances in top left corner
- conditioning worsens from left to right
- large plateaus at bottom of space
New Test Functions - Examples
How hard are these new test functions?
- Comparing BIPOP-CMA-ES on COCO, evolved COCO-like (S1), and evolved diverse (S2) functions
- Probability of solving within the function-evaluation budget:
  - 0.94 for COCO
  - 0.67 for S1
  - 0.61 for S2

solid line: function evaluations (FEs) to reach the experimental optimum
dashed line: FEs to reach within 10⁻⁸ of the experimental optimum
Returning to Machine Learning
The UCI repository needs to be re-evaluated:
- does it support insights into algorithm performance?
- where are the really challenging (not just large) instances that stress the best algorithms?
- data quality has also been questioned

References:
N. Macià and E. Bernadó-Mansilla (2014). "Towards UCI+: A mindful repository design", Information Sciences, vol. 261, pp. 237-262.
S. L. Salzberg (1997). "On comparing classifiers: Pitfalls to avoid and a recommended approach", Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 317-328.
Problem Instances I
We use a total of 236 classification instances (binary and multiclass), comprising:
- 211 UCI instances (University of California, Irvine)
- 19 KEEL instances (Knowledge Extraction based on Evolutionary Learning)
- 6 DCoL instances (Data Complexity Library)

Instances contain up to 11,055 observations and 1,558 attributes:
- larger ones have been excluded from this study due to the computational budget

Instances with missing values are retained, and are also duplicated with the missing values estimated by the attribute means for the class.
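The per-class mean imputation in the last bullet can be sketched as follows; the function name and the None-for-missing convention are ours:

```python
def impute_class_means(rows, labels):
    """Return a copy of the data with each missing value (None) replaced
    by the mean of that attribute over instances of the same class."""
    means = {}
    n_attr = len(rows[0])
    for c in set(labels):
        for j in range(n_attr):
            vals = [r[j] for r, y in zip(rows, labels) if y == c and r[j] is not None]
            means[c, j] = sum(vals) / len(vals)
    return [
        [means[y, j] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]

X = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0]]
y = ["a", "a", "b", "b"]
X_imp = impute_class_means(X, y)
```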
Algorithms A
We consider 10 supervised learners:
- Naive Bayes (NB)
- Linear Discriminant (LD)
- Quadratic Discriminant (QD)
- Classification and Regression Trees (CART)
- J48 Decision Tree (J48)
- k-Nearest Neighbour (kNN)
- Support Vector Machines with linear (L-SVM), polynomial (poly-SVM), and radial basis (RB-SVM) kernels
- Random Forests (RF)

R packages used were e1071, MASS, rpart, RWeka, and kknn, with default parameters.
Performance Metric Y
For each algorithm running on each instance, we record:
- error rate (1 − classification accuracy)
- precision
- recall
- F-measure
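For a binary task, these four metrics can be computed from prediction counts as follows — a minimal sketch, not the evaluation code used in the study:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Error rate, precision, recall and F-measure for binary predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return errors / len(y_true), precision, recall, f_measure

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
err, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

For multiclass instances, precision, recall and F-measure are typically averaged over the classes.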
Possible Features
We generate a set of 509 candidate features from 8 categories:
- simple (dimensionality, types of attributes, missing values, outliers, class attributes)
- statistical (descriptive statistics, canonical correlations, PCA, etc.)
- information theoretic (entropy, mutual information, etc.)
- landmarking (performance of simple landmarkers such as NB or single-node trees)
- model-based (properties of decision trees, such as shape and size of tree, width and depth)
- concept characterisation (measures of sparsity of the input space and irregularity in input-output distributions)
- complexity (separability, geometry, topology and density of manifolds)
- itemsets & association rules (attribute & class relationships)
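A few of the "simple" and "information theoretic" meta-features can be sketched directly; this is an illustration only — the study's 509 candidates come from dedicated meta-learning feature extractors, and the names below are ours:

```python
import math

def simple_meta_features(X, y):
    """Dimensionality, missing-value rate, class count, and class entropy
    for a dataset X (rows of attributes, None = missing) with labels y."""
    n, d = len(X), len(X[0])
    missing_rate = sum(v is None for row in X for v in row) / (n * d)
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    class_entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "n_instances": n,
        "n_attributes": d,
        "missing_rate": missing_rate,
        "n_classes": len(counts),
        "class_entropy": class_entropy,
    }

X = [[0.1, None], [0.4, 1.2], [0.5, 0.9], [0.2, 1.1]]
y = ["a", "a", "b", "b"]
feats = simple_meta_features(X, y)
```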
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
What makes classification hard?
Instance Spaces for Performance Evaluation 72 / 89
Sensitivity Analysis and Feature Selection

We construct perturbed datasets that intentionally increase or decrease the presence of the challenge

For each instance, 6,108 statistical significance tests were conducted (509 × 12) with Bonferroni correction
- the settings give a 99% chance of correctly discarding a feature, and a 90% chance of correctly selecting a feature with a cause-effect relationship to the challenge

Repeat this procedure for 6 small instances (balloons, blogger, breast, breast with 2 attributes, iris, iris with 2 attributes)

For each challenge, we select the features that consistently captured the challenge across the 6 instances

Correlations between features (> 0.7) and between features and algorithm performance (< 0.3) were used to eliminate features
Instance Spaces for Performance Evaluation 73 / 89
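The correlation-based pruning rule can be sketched as follows. The thresholds mirror the slide (drop one of any feature pair correlated above 0.7; drop features correlating below 0.3 with algorithm performance), but `prune_features` and its greedy pair-elimination order are an illustrative reading of the rule, not the study's exact procedure.

```python
import numpy as np

def prune_features(F, perf, between=0.7, with_perf=0.3):
    """F: (n_instances, n_features) feature matrix; perf: (n_instances,)
    algorithm performance. Returns the indices of the features kept."""
    n_feat = F.shape[1]
    # 1) drop features weakly correlated with algorithm performance
    keep = [j for j in range(n_feat)
            if abs(np.corrcoef(F[:, j], perf)[0, 1]) >= with_perf]
    # 2) greedily drop the later feature of any highly correlated pair
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(F[:, j], F[:, i])[0, 1]) <= between
               for i in selected):
            selected.append(j)
    return selected

# Synthetic check: feature 1 nearly duplicates feature 0; feature 2 is noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
F = np.column_stack([x, x + 0.01 * rng.normal(size=200), rng.normal(size=200)])
perf = x + 0.1 * rng.normal(size=200)
print(prune_features(F, perf))
```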
Selected Features F
The final set of 10 features is:
Instance Spaces for Performance Evaluation 74 / 89
Performance Prediction using F

Regression predicts the error rate (ER) of each algorithm

Classification labels each instance as easy or hard for the algorithm (easy if ER < 0.2, else hard)

An SVM is used, with parameters optimised via 10-fold cross-validation grid search
Instance Spaces for Performance Evaluation 75 / 89
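The classification step can be sketched with scikit-learn (assumed available here). The meta-data, the error-rate model, and the parameter grid below are synthetic placeholders, not those of the study; only the shape of the procedure — label easy/hard at ER < 0.2, then fit an SVM tuned by 10-fold cross-validated grid search — follows the slide.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy meta-data: one row per dataset (its feature vector), plus a synthetic
# error rate driven by the first feature.
rng = np.random.default_rng(42)
F = rng.normal(size=(60, 4))
error_rate = np.clip(0.2 + 0.15 * F[:, 0], 0.0, 1.0)
labels = (error_rate < 0.2).astype(int)   # easy (1) if ER < 0.2, else hard (0)

# SVM with parameters optimised by 10-fold cross-validated grid search
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=10,
)
grid.fit(F, labels)
print(grid.best_params_, grid.best_score_)
```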
A new projection algorithm

PCA maximises the variance retained, but this isn't exactly what we need to support insights through visualisation

We want a projection that creates linear trends (interpretable) in both the feature distribution and algorithm performance

We solve numerically using BIPOP-CMA-ES (note: PCA gives a locally optimal solution only)
Instance Spaces for Performance Evaluation 76 / 89
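Searching for a projection whose axes carry linear trends can be posed as a generic optimisation over the entries of a projection matrix. In this sketch, scipy's Nelder-Mead stands in for BIPOP-CMA-ES, and `projection_loss` is an illustrative objective (how well the features and the performance measure are reproduced as linear functions of the 2-D coordinates), not the exact formulation used in the study.

```python
import numpy as np
from scipy.optimize import minimize

def projection_loss(w_flat, F, Y):
    """Residual error when each feature and the performance measure are
    fitted as linear functions of the 2-D coordinates Z = F @ W."""
    n, d = F.shape
    W = w_flat.reshape(d, 2)
    Z = F @ W
    A = np.column_stack([Z, np.ones(n)])       # linear model in Z (+ intercept)
    T = np.column_stack([F, Y])                # targets: features and performance
    coef, *_ = np.linalg.lstsq(A, T, rcond=None)
    return float(np.sum((T - A @ coef) ** 2))

rng = np.random.default_rng(1)
F = rng.normal(size=(100, 5))
F = (F - F.mean(0)) / F.std(0)                 # standardise the features
Y = F[:, 0] - 0.5 * F[:, 1]                    # synthetic performance measure

w0 = rng.normal(size=10)                       # 5 x 2 projection, flattened
res = minimize(projection_loss, w0, args=(F, Y), method="Nelder-Mead",
               options={"maxiter": 5000})
W = res.x.reshape(5, 2)
Z = F @ W                                      # 2-D instance-space coordinates
print(res.fun)
```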
Instance Space (feature distribution)
Instance Spaces for Performance Evaluation 77 / 89
Instance Space (performance distribution)
Instance Spaces for Performance Evaluation 78 / 89
Size features
Instance Spaces for Performance Evaluation 79 / 89
Algorithm Footprints ('good' = ER < 20%)
Instance Spaces for Performance Evaluation 80 / 89
Footprint Area Calculations
Instance Spaces for Performance Evaluation 81 / 89
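A footprint can be summarised by an area in the 2-D instance space. As a simplified sketch (assuming scipy is available): take the convex hull of the instances where the algorithm is good (ER < 20%) and report its area. The published footprint calculation additionally imposes density and purity conditions on the region, which this sketch omits.

```python
import numpy as np
from scipy.spatial import ConvexHull

def footprint_area(Z, error_rate, good_threshold=0.2):
    """Area of the convex hull of the instances (2-D coordinates Z) on
    which the algorithm is 'good', i.e. its error rate is below 20%."""
    good = Z[error_rate < good_threshold]
    if len(good) < 3:
        return 0.0                      # no 2-D region to measure
    return float(ConvexHull(good).volume)  # in 2-D, .volume is the area

# Unit square of good instances, plus two bad instances outside it
Z = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [2, 2], [3, 0]], float)
er = np.array([0.1, 0.1, 0.1, 0.1, 0.5, 0.9])
print(footprint_area(Z, er))  # area of the unit square
```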
Other views: who is best, where are easy/hard instances?
Instance Spaces for Performance Evaluation 82 / 89
The need for new test instances

The current instances don't enable us to see much difference in algorithm footprints, despite fundamentally different algorithm mechanisms (e.g. kNN, RF, RBF-SVM)

There are areas of the instance space that are unexplored, or very sparse
- e.g. at [0.744, 2.833] there is only one instance in the area for which J48 was the only algorithm with ER < 20%. More data is needed to support conclusions about strengths and weaknesses

The boundary of possible instances in the space can be estimated using projections of the min and max features (either theoretical or observed)
Instance Spaces for Performance Evaluation 83 / 89
A procedure to generate new instances at target points

We use a Gaussian Mixture Model (GMM) to generate a dataset with κ classes on q attributes

The probability of an observation x being sampled from the GMM is:

pr(x) = ∑_{k=1}^{κ} φ_k N(µ_k, Σ_k), where φ_k ∈ ℝ, µ_k ∈ ℝ^q, Σ_k ∈ ℝ^{q×q}

We tune the parameter vector of the GMM so that the distance of its feature vector to the target feature vector is minimised

Tuning is a continuous black-box optimisation problem, and we use BIPOP-CMA-ES to optimise the parameters
Instance Spaces for Performance Evaluation 84 / 89
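A minimal numpy sketch of this generator: `sample_gmm` draws a labelled dataset from the mixture, and `distance_to_target` is the objective one would hand to a black-box optimiser such as BIPOP-CMA-ES (the optimiser itself is omitted here). The `feat` function below (dataset mean and standard deviation) is a stand-in for the real feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(phi, mu, sigma, n):
    """Draw n labelled observations from a kappa-component GMM: component k
    is chosen with probability phi_k, then x ~ N(mu_k, Sigma_k)."""
    phi = np.asarray(phi, float) / np.sum(phi)   # normalise mixture weights
    ks = rng.choice(len(phi), size=n, p=phi)     # component index = class label
    X = np.array([rng.multivariate_normal(mu[k], sigma[k]) for k in ks])
    return X, ks

def distance_to_target(params, target, feature_fn, n=200):
    """Tuning objective: distance between the generated dataset's feature
    vector and the target point in feature space (to be minimised)."""
    X, y = sample_gmm(*params, n)
    return float(np.linalg.norm(feature_fn(X, y) - target))

# Two well-separated classes on q = 2 attributes
phi = [0.5, 0.5]
mu = [np.zeros(2), np.full(2, 3.0)]
sigma = [np.eye(2), np.eye(2)]
X, y = sample_gmm(phi, mu, sigma, 200)

feat = lambda X, y: np.array([X.mean(), X.std()])  # stand-in feature vector
print(distance_to_target((phi, mu, sigma), np.array([1.5, 1.8]), feat))
```

Because each evaluation resamples the dataset, the objective is stochastic, which is one reason a robust black-box method like BIPOP-CMA-ES is a natural choice for the tuning step.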
Two initial experiments

Can we reproduce a dataset that lives at the location of Iris (Iris size and features)?

Can we generate datasets elsewhere (Iris size, different features)?
Instance Spaces for Performance Evaluation 85 / 89
Discussion

Computational efficiency issues (is there a better encoding of a problem instance than a GMM?)

The boundary of all instances is not the same as the boundary of instances of a given size (since size can affect feature ranges)

We need some theoretical work on these boundaries, like the graph theory results we have drawn upon in other work

There is much value in generating challenging smaller instances, to understand how structural properties, not just size, affect complexity

The instance space depends on the chosen features, which were selected based on the current instances, so iteration is required as we generate new instances
Instance Spaces for Performance Evaluation 86 / 89
Conclusions

The proposed methodology is a first step towards providing researchers with a tool to
- report the strengths and weaknesses of their algorithms
- show the relative power of an algorithm either
  - across the entire instance space, or
  - in a particular region of interest (e.g. real-world problems)
- evaluate the suitability of existing benchmark instances
- evolve new interesting and challenging test instances
Instance Spaces for Performance Evaluation 87 / 89
Next Steps

We are currently developing the key components of the methodology (evolved instances, feature sets) for a number of broad classes of optimization problems, as well as machine learning, time series forecasting, etc.

We are planning a web resource where researchers can download instances that span the instance space, upload their algorithm performance results, and download footprint metrics and visualisations to support their analysis

The approach also generalises to parameter selection within algorithms, and to the choice of formulation

We hope to be providing a free lunch for researchers soon!
Instance Spaces for Performance Evaluation 88 / 89
Further Reading

Methodology
- K. Smith-Miles and S. Bowly, "Generating new test instances by evolving in instance space", Comp. & Oper. Res., vol. 63, pp. 102-113, 2015.
- K. Smith-Miles et al., "Towards Objective Measures of Algorithm Performance across Instance Space", Comp. & Oper. Res., vol. 45, pp. 12-24, 2014.
- L. Lopes and K. Smith-Miles, "Generating Applicable Synthetic Instances for Branch Problems", Operations Research, vol. 61, no. 3, pp. 563-577, 2013.
- K. Smith-Miles and L. Lopes, "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39, no. 5, pp. 875-889, 2012.
- K. Smith-Miles, "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41, no. 1, article 6, 2008.

Applications
- Machine Learning: L. Villanova, M. A. Muñoz, D. Baatar, and K. Smith-Miles, "Instance Spaces for Machine Learning Classification", Machine Learning, vol. 107, no. 1, pp. 109-147, 2018.
- Time Series Forecasting: Y. Kang, R. Hyndman, and K. Smith-Miles, "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.
- Continuous Optimisation: M. A. Muñoz and K. Smith-Miles, "Performance analysis of continuous black-box optimization algorithms via footprints in instance space", Evolutionary Computation, vol. 25, no. 4, pp. 529-554, 2017.
- Travelling Salesman Problem: K. Smith-Miles and J. van Hemert, "Discovering the Suitability of Optimisation Algorithms by Learning from Evolved Instances", Annals of Mathematics and Artificial Intelligence, vol. 61, no. 2, pp. 87-104, 2011.
- and others on the Quadratic Assignment Problem, Job Shop Scheduling, Timetabling, and Graph Colouring: see kate.smithmiles.wixsite.com/home
Instance Spaces for Performance Evaluation 89 / 89