Outline: Introduction · Methodology · Case Study: Graph Colouring · Case Study: Black-Box Optimisation · Case Study: Machine Learning · Conclusions
Instance Spaces for Objective Assessment of Algorithms and Benchmark Test Suites
Kate Smith-Miles
School of Mathematics and Statistics, University of Melbourne
Instance Spaces for Performance Evaluation 1 / 89
Acknowledgements
This research is funded by ARC Discovery Project grant DP120103678 and ARC Australian Laureate Fellowship FL140100012.
The instance space and evolving instances methodology is joint work with Dr. Jano van Hemert (University of Edinburgh), Dr. Davaa Baatar, Dr. Mario Andrés Muñoz Acosta, and students Simon Bowly and Thomas Tan.
The generalisation to machine learning is joint work with Dr. Laura Villanova, Dr. Mario Andrés Muñoz Acosta, and Dr. Davaa Baatar.
Motivation · Aims · Framework
The Importance of Test Instances
Standard practice: use benchmark instances to report algorithm strengths (but rarely weaknesses!)
The NFL Theorem (Wolpert & Macready, 1997) warns against expecting an algorithm to perform well on all instances, regardless of their structure and characteristics.
The properties (or measurable features) of instances may provide explanations of an algorithm's behaviour across a range of instances → predictions, insights.
Requires the right kinds of test instances (diverse, challenging, real-world-like, etc.) and suitable features.
Reference
Smith-Miles, K. & Lopes, L., "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39(5), pp. 875-889, 2012.
Travelling Salesman Problem (TSP) Example
[Figure: two example TSP instances, labelled Easy and Hard]
What makes the TSP easy or hard?
A TSP Formulation (not the only one)
Let X_{i,j} = 1 if city i is followed by city j in the tour, and 0 otherwise.

minimise    ∑_{i=1}^{N} ∑_{j=1}^{N} D_{i,j} X_{i,j}

subject to  ∑_i X_{i,j} = 1   ∀ j
            ∑_j X_{i,j} = 1   ∀ i
            ∑_{i∈S} ∑_{j∈S} X_{i,j} ≤ |S| − 1   ∀ S: ∅ ≠ S ⊂ {1, 2, …, N}
TSP is NP-hard, but some instances are easy depending on properties of the inter-city distance matrix D.
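As an illustration of features derived from D, the sketch below computes two simple summary statistics of the distance matrix of a random Euclidean instance. These particular features (coefficient of variation, fraction of distinct distances) are hypothetical stand-ins for illustration, not the specific features used in the cited studies.

```python
import math
import random

def distance_matrix(cities):
    """Pairwise Euclidean distances between 2-D city coordinates."""
    n = len(cities)
    return [[math.dist(cities[i], cities[j]) for j in range(n)]
            for i in range(n)]

def distance_features(D):
    """Two illustrative summary features of the inter-city distance
    matrix D: the coefficient of variation of the distances, and the
    fraction of (approximately) distinct distances."""
    n = len(D)
    vals = [D[i][j] for i in range(n) for j in range(n) if i < j]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    cv = math.sqrt(var) / mean
    distinct = len({round(v, 6) for v in vals}) / len(vals)
    return {"cv_distance": cv, "frac_distinct": distinct}

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(20)]
D = distance_matrix(cities)
print(distance_features(D))
```

Feature vectors like this, computed per instance, are the raw material for the instance-space analysis described next.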
Questions
How do instance features help us understand the strengths and weaknesses of algorithms?
How can we infer and visualise algorithm performance across a huge "instance space"?
How easy or hard are the benchmark instances in the literature? How diverse are existing instances?
How can we measure objectively the relative performance of algorithms?
How can we generate new test instances to gain insights into algorithmic power?
Aims
Develop a new methodology to:
• visualise the "instance space" based on instance features
• visualise algorithm performance across the instance space
• define where algorithm performance is expected to be "good" (called the "algorithm footprint")
• measure the relative size of an algorithm's footprint
• evolve new instances at target locations in instance space

Enable objective assessment of algorithmic power.
Enable useful test instances to be generated with controllable characteristics to drive insights.
Understand and report the boundary of good performance of an algorithm: essential for good research practice, and to avoid deployment disasters.
Algorithm Selection Problem, Rice (1976)
Applications of Rice's Framework
Rice and colleagues used this approach to predict the performance of the many methods (A) for the numerical solution of elliptic partial differential equations (PDEs).
Reference
Weerawarana, Rice, et al., "PYTHIA: a knowledge-based system to select scientific algorithms", ACM Trans. on Math. Software, vol. 22(4), pp. 447-468, 1996.
It has also been used for pre-conditioners for linear system solvers, and extensively for machine learning (meta-learning).
Reference
Smith-Miles, K. A., "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41(1), 2008.
Applications to Optimisation
Represents a relatively new direction for the optimisation community (combinatorial, continuous, black-box, etc.)
Much needed, given:
• the huge range of algorithms
• frequent statements like "currently there is still a strong lack of . . . understanding of how exactly the relative performance of different meta-heuristics depends on instance characteristics."

Can also resolve a longstanding debate about how instance choice affects evaluation of algorithm performance.
Reference
Hooker, J.N., "Testing heuristics: We have it all wrong", Journal of Heuristics, vol. 1, pp. 33-42, 1995.
Extending Rice's Framework
{I,F,Y,A} is the meta-data from which we learn
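A minimal sketch of the selection mapping S: F → A that this framework learns from the meta-data {I, F, Y, A}. The feature values, performance scores, and algorithm names below are made up for illustration, and the 1-nearest-neighbour selector is just one possible learner; the framework does not prescribe a particular one.

```python
# Toy meta-data: instance -> (feature vector f(x), {algorithm: performance y})
meta_data = {
    "inst1": ((0.2, 0.9), {"algoA": 0.95, "algoB": 0.60}),
    "inst2": ((0.8, 0.1), {"algoA": 0.55, "algoB": 0.90}),
    "inst3": ((0.3, 0.8), {"algoA": 0.92, "algoB": 0.65}),
    "inst4": ((0.9, 0.2), {"algoA": 0.50, "algoB": 0.88}),
}

def select_algorithm(features, meta_data):
    """1-nearest-neighbour selector: recommend the algorithm that
    performed best on the most similar known instance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, (_, perfs) = min(meta_data.items(),
                        key=lambda kv: sq_dist(kv[1][0], features))
    return max(perfs, key=perfs.get)

print(select_algorithm((0.25, 0.85), meta_data))  # nearest inst1 -> algoA
```

A selector like this predicts well only where the meta-data covers the instance space, which is exactly why the diversity questions in STEP 1 matter.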
STEP 1: Collect meta-data {I,F,Y,A}
What makes the problem hard?
What features capture the difficulty of instances?
Which instances show sufficient diversity in features as well as algorithm performance?
Which algorithms will show sufficient diversity of performance that we can learn something about the effectiveness of their underlying mechanism?
What performance metric(s) are most relevant?
STEP 2: Create instance space
Which dimension-reduction method should be used to lose minimal information and create a visualisation that separates easy and hard instances in interpretable ways?
Which features should be selected?
Can the selected features accurately predict algorithmperformance?
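The slides do not fix a particular dimension-reduction method, so as a purely illustrative sketch, here is a minimal PCA-style projection (standardise the features, then keep the top two principal components via an SVD). The synthetic feature matrix and the function name `project_2d` are my own choices for illustration, not the method used in the talk:

```python
import numpy as np

def project_2d(F):
    """Project an (instances x features) matrix to 2-D: standardise each
    feature, then keep the top two principal components (via SVD).
    Assumes no feature is constant (non-zero standard deviation)."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T        # one (x, y) coordinate per instance

rng = np.random.default_rng(0)
F = rng.normal(size=(60, 6))   # 60 synthetic instances, 6 features
coords = project_2d(F)
```

By construction the first coordinate captures at least as much variance as the second, which is what makes the resulting 2-D "instance space" a least-loss linear view of the feature data.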
STEP 3: Measure algorithm footprints and gain insights into strengths and weaknesses

- In which parts of the space is an algorithm expected to perform well or poorly?
- How large is its footprint, relative to other algorithms?
- Does its footprint overlap real-world instances?
- Is it unique anywhere?
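As a toy illustration of how a footprint's size might be quantified (the methodology's actual footprint calculation is more sophisticated), one crude estimate is the area of the convex hull of the 2-D instance-space points where an algorithm performed well, computed here with the standard monotone-chain hull and the shoelace formula:

```python
def footprint_area(points):
    """Crude footprint-size estimate: area of the convex hull of the 2-D
    instance-space points where an algorithm performed well
    (monotone-chain convex hull + shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                      # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    n = len(hull)
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                         - hull[(i + 1) % n][0] * hull[i][1]
                         for i in range(n)))

# Hypothetical "good performance" points: a unit square plus an interior point
good_points = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
area = footprint_area(good_points)
```

Comparing such areas across algorithms gives a first-cut answer to "how large is its footprint, relative to other algorithms?".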
STEP 4: Generate new test instances to fill gaps in the instance space

- Is there a theoretical boundary beyond which instances can't exist?
- Where are the benchmark instances located?
- How diverse and challenging are they?
- How can we set target points in the instance space and evolve new instances?
- Which target points could provide important new information to influence our assessment?
- Return to STEP 1 to check whether the features distinguish the new instances
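The mechanics of "set a target point and evolve instances towards it" can be sketched with a tiny (1+1)-style hill-climber. For illustration only, the instance space is collapsed here to a single feature (edge density); the real methodology targets points in the projected 2-D space and uses a full evolutionary algorithm:

```python
import random

def evolve_instance(n, target, iters=500, seed=1):
    """(1+1)-style hill-climb: flip one potential edge at a time, keeping the
    mutant whenever its feature value (edge density) is no further from the
    target than the incumbent's."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = {p for p in pairs if rng.random() < 0.5}   # random start instance
    fitness = lambda E: abs(len(E) / len(pairs) - target)
    best = fitness(edges)
    for _ in range(iters):
        mutant = edges ^ {rng.choice(pairs)}           # toggle one edge
        if fitness(mutant) <= best:
            edges, best = mutant, fitness(mutant)
    return edges, best

edges, gap = evolve_instance(n=10, target=0.2)
```

The same loop generalises directly: replace the one-feature fitness with the distance between an instance's projected (x, y) coordinates and a chosen target point in the instance space.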
Meta-Data | Creating the instance space | Measuring algorithm footprints and gaining insights | Evolving New Instances
Graph Colouring
- Given an undirected graph G(V, E) with |V| = n, colour the vertices such that no two vertices connected by an edge share the same colour
- Try to find the minimum number of colours needed to colour the graph (the chromatic number)
- NP-hard problem → numerous heuristics for large n
- Many applications, such as timetabling, where edges represent conflicts between events
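For concreteness, a proper colouring (though not necessarily a minimum one) can be produced by the classic first-fit greedy heuristic, the idea underlying the RandGr algorithm mentioned later. This sketch assumes the graph is stored as a dict of neighbour lists:

```python
def greedy_colouring(adj, order):
    """First-fit greedy: scan nodes in the given order and assign each the
    smallest colour index not already used by a coloured neighbour."""
    colour = {}
    for v in order:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour

# A 4-cycle: bipartite, so its chromatic number is 2
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
cols = greedy_colouring(c4, [0, 1, 2, 3])
```

The visit order matters: on some graphs a bad order makes first-fit use far more colours than the chromatic number, which is why randomised and saturation-based orderings are studied.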
What makes graph colouring hard?
In total we have 18 features that describe a graph instance G(V, E).

5 features relating to the nodes and edges:
- The number of nodes or vertices in a graph: n = |V|
- The number of edges in a graph: m = |E|
- The density of a graph: the ratio of the number of edges to the number of possible edges
- Mean node degree: the degree of a node is the number of connections it has to other nodes
- SD of node degree: the average node degree and its standard deviation give an idea of how connected a graph is
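These five node/edge features are cheap to compute directly from the graph. A minimal sketch (assuming, as above, a simple undirected graph stored as a dict of neighbour lists):

```python
def node_edge_features(adj):
    """The five node/edge features of a simple undirected graph given as a
    dict of neighbour lists."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2                          # each edge counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_deg = sum(degrees) / n
    sd_deg = (sum((d - mean_deg) ** 2 for d in degrees) / n) ** 0.5
    return {"n": n, "m": m, "density": density,
            "mean_degree": mean_deg, "sd_degree": sd_deg}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}       # complete graph K3
feats = node_edge_features(triangle)
```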
Graph features (continued)
8 features related to cycles and paths on the graph:
- The diameter of a graph: the maximum shortest-path distance between any two nodes
- Average path length: the average length of the shortest paths between all node pairs
- The girth of a graph: the length of the shortest cycle
- The clustering coefficient: a measure of node clustering
- Mean betweenness centrality: the average fraction of all shortest paths connecting all pairs of nodes that pass through a given node
- SD of betweenness centrality: with the mean, the SD gives a measure of how central the nodes are in a graph
- Szeged index / revised Szeged index: a generalisation of the Wiener number to cyclic graphs (correlates with bipartivity)
- Beta: the proportion of even closed walks to all closed walks (correlates with bipartivity)
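Several of these path features reduce to all-pairs shortest paths, which BFS gives directly on unweighted graphs. A small sketch of the diameter, average path length, and mean local clustering coefficient (assuming a connected graph stored as a dict of neighbour lists; the more involved features like betweenness and the Szeged index are omitted here):

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_features(adj):
    """Diameter and average shortest-path length (assumes connectivity)."""
    lengths, eccentricities = [], []
    for s in adj:
        d = bfs_distances(adj, s)
        eccentricities.append(max(d.values()))
        lengths.extend(d[t] for t in adj if t != s)
    return {"diameter": max(eccentricities),
            "avg_path_length": sum(lengths) / len(lengths)}

def mean_clustering(adj):
    """Mean local clustering: the fraction of each node's neighbour pairs
    that are themselves connected (degree < 2 nodes count as 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```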
Graph features (continued)
5 features related to the adjacency and Laplacian matrices:
- Mean eigenvector centrality: the Perron-Frobenius eigenvector of the adjacency matrix, averaged across all components
- SD of eigenvector centrality: together with the mean, the standard deviation of eigenvector centrality gives a measure of the importance of a node inside a graph
- Mean spectrum: the mean of the absolute values of the eigenvalues of the adjacency matrix (a.k.a. the "energy" of the graph)
- SD of the set of absolute values of the eigenvalues of the adjacency matrix
- Algebraic connectivity: the 2nd-smallest eigenvalue of the Laplacian matrix, reflecting how well connected a graph is. Cheeger's constant, another important graph property, is bounded by half the algebraic connectivity.
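The spectral features fall out of two standard eigendecompositions. A sketch (assuming the graph is given as a symmetric 0/1 numpy adjacency matrix; eigenvector centrality is omitted for brevity):

```python
import numpy as np

def spectral_features(A):
    """Spectral features from a symmetric 0/1 adjacency matrix A:
    the mean/SD of |eigenvalues| of A (the graph 'energy' statistics),
    and the 2nd-smallest eigenvalue of the Laplacian L = D - A."""
    spectrum = np.abs(np.linalg.eigvalsh(A))
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    lap_eigs = np.sort(np.linalg.eigvalsh(L))
    return {"mean_spectrum": spectrum.mean(),
            "sd_spectrum": spectrum.std(),
            "algebraic_connectivity": lap_eigs[1]}

K3 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]], dtype=float)            # complete graph on 3 nodes
feats = spectral_features(K3)
```

For K3 the adjacency eigenvalues are {2, -1, -1}, so the mean spectrum is 4/3, and the Laplacian eigenvalues {0, 3, 3} give algebraic connectivity 3.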
Graph Colouring Instances
We use a set of 6788 instances from a variety of well-studied sources, and others we have generated to explore bipartivity.

DataSet   # instances   Description
B         1000          Bipartivity Controlled
C1        1000          Culberson: cycle-driven
C2         932          Culberson: geometric
C3        1000          Culberson: girth and degree inhibited
C4        1000          Culberson: IID edge probabilities
C5        1000          Culberson: weight-biased
D          743          DIMACS instances
E           20          Social Network graphs
F           80          Sports Scheduling
G           13          Exam Timetabling
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.
- DSATUR: Brélaz's greedy algorithm (exact for bipartite graphs)
- RandGr: simple greedy first-fit colouring of random permutations of the nodes
- Bktr: a backtracking version of DSATUR (Culberson)
- HillClimb: a hill-climbing improvement on the initial DSATUR solution
- HEA: hybrid evolutionary algorithm
- TabuCol: tabu search algorithm
- PartCol: like TabuCol, but does not restrict the search to feasible colourings
- AntCol: ant colony meta-heuristic
Reference

Lewis, R. et al. "A wide-ranging computational comparison of high-performance graph colouring algorithms". Computers & Operations Research 39(9), pp. 1933-1950, 2012.
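The DSATUR selection rule is compact enough to sketch: repeatedly colour the uncoloured vertex whose neighbours already use the most distinct colours (its "saturation"), breaking ties by degree, and give it the smallest feasible colour. A minimal (unoptimised) version, again assuming a dict-of-neighbour-lists graph:

```python
def dsatur(adj):
    """DSATUR (Brélaz): pick the uncoloured vertex with maximum saturation
    (number of distinct colours among its neighbours), tie-break by degree,
    and assign it the smallest feasible colour."""
    colour, uncoloured = {}, set(adj)
    while uncoloured:
        def saturation(u):
            return len({colour[w] for w in adj[u] if w in colour})
        v = max(uncoloured, key=lambda u: (saturation(u), len(adj[u])))
        used = {colour[w] for w in adj[v] if w in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
        uncoloured.remove(v)
    return colour

# Odd cycle C5: chromatic number 3, so DSATUR needs exactly 3 colours here
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
cols = dsatur(c5)
```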
Instance Spaces for Performance Evaluation 21 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.I DSATUR: Brelaz's greedy algorithm (exact for bipartite
graphs)I RandGr: Simple greedy �rst-�t colouring of random
permutations of nodesI Bktr: a backtracking version of DSATUR (Culberson)I HillClimb: a hill-climbing improvement on initial DSATUR
solutionI HEA: Hybrid evolutionary algorithmI TabuCol: Tabu search algorithmI PartCol: Like TabuCol, but doesn't restricts to feasible spaceI AntCol: Ant Colony meta-heuristic
Reference
Lewis, R. et al. �A wide-ranging computational comparison of high-performance graphcolouring algorithms�. Computers & Operations Research 39(9), pp. 1933-1950, 2012.
Instance Spaces for Performance Evaluation 21 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Meta-DataCreating the instance spaceMeasuring algorithm footprints and gaining insightsEvolving New Instances
Graph Colouring Algorithms
We use the same 8 algorithms considered by Lewis et al.I DSATUR: Brelaz's greedy algorithm (exact for bipartite
graphs)I RandGr: Simple greedy �rst-�t colouring of random
permutations of nodesI Bktr: a backtracking version of DSATUR (Culberson)I HillClimb: a hill-climbing improvement on initial DSATUR
solutionI HEA: Hybrid evolutionary algorithmI TabuCol: Tabu search algorithmI PartCol: Like TabuCol, but doesn't restricts to feasible spaceI AntCol: Ant Colony meta-heuristic
Reference
Lewis, R. et al. �A wide-ranging computational comparison of high-performance graphcolouring algorithms�. Computers & Operations Research 39(9), pp. 1933-1950, 2012.
Instance Spaces for Performance Evaluation 21 / 89
HEA reported as best overall
Creating the Instance Space: Process

Examine correlations to eliminate useless features

Label instances as easy or hard based on the algorithm portfolio

Project instances from the R^m feature space to 2-d

Use a GA to select the optimal subset of m features (for 2 ≤ m ≤ 18) that best separates easy and hard instances
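The first step above — dropping features that carry no extra information — can be sketched as a pairwise-correlation filter. This is a minimal illustration, not the talk's actual code; the feature names and the 0.95 threshold are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.95):
    """Greedily keep a feature only if it is not highly correlated
    with any feature already kept.  `features` maps name -> list of
    per-instance values (illustrative data layout)."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, a feature that is a scaled copy of graph density would be filtered out, while an uncorrelated feature such as graph energy survives.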
98% of variation explained by top 2 axes
Visualising the instance space
Defining goodness of algorithm performance

Acknowledging the arbitrariness of this definition, here we define an algorithm's performance to be "good" if the gap between the number of colours it needs to colour the graph and the portfolio winner's count is less than ε% within a fixed computational budget of 5×10^10 constraint checks.

We consider the cases ε = 0 (the algorithm is best) and ε = 0.05 (within 5% of the best).
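The "goodness" test reduces to a relative-gap check against the portfolio's best result. A minimal sketch, with assumptions: the function name is mine, and I use ≤ rather than strict < so that the ε = 0 case ("the algorithm is best") includes exact ties.

```python
def is_good(colours_used, best_in_portfolio, eps=0.05):
    """Label an algorithm's run on one instance as 'good' if its colour
    count is within a relative gap eps of the portfolio's winner,
    assuming both runs respected the fixed constraint-check budget."""
    gap = (colours_used - best_in_portfolio) / best_in_portfolio
    return gap <= eps
```

So an algorithm needing 21 colours when the winner needs 20 is "good" at ε = 0.05 (5% gap) but not at ε = 0.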
Footprints with ε = 0 (blue is good)
Defining difficulty of instances

If fewer than a given fraction β of the 8 algorithms find an instance easy, then we label the instance as hard for the portfolio of algorithms
- e.g. if β = 0.5, an instance is labelled hard if fewer than half (at most 3 of the eight algorithms) find it easy

It is important to understand where good algorithm performance is uninteresting (all algorithms find the instances easy) or interesting (other algorithms struggle)
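The β rule above can be written as a one-line portfolio test (a sketch; the function name and boolean-flag input format are mine):

```python
def is_hard_for_portfolio(easy_flags, beta=0.5):
    """easy_flags: one bool per algorithm, True if that algorithm found
    the instance easy.  The instance is hard for the portfolio when
    fewer than beta * (number of algorithms) found it easy."""
    return sum(easy_flags) < beta * len(easy_flags)
```

With 8 algorithms and β = 0.5, an instance that only 3 algorithms find easy is hard; one that 4 find easy is not.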
How many algorithms find an instance hard? (α = 0)
Defining Boundary of Algorithm Footprints

For a given algorithm, we take the points labelled as good, and
- remove outliers through clustering,
- calculate the convex hull to define a generalised area of expected good performance,
- remove the convex hull of contradicting points,
- validate the accuracy of the remaining "footprint" through out-of-sample testing
Measuring the Area of Algorithm Footprints

Now we need only calculate the area of the footprint
- our metric of the power of an algorithm is the ratio of this area to the total area of the instance space

Area of Algorithm Footprint

Let H(S) be the convex hull of a region defined by a set of points S = {(x_i, y_i), i = 1, ..., η}. Then

Area(H(S)) = (1/2) [ Σ_{j=1}^{k−1} (x_j y_{j+1} − y_j x_{j+1}) + (x_k y_1 − y_k x_1) ]

with the subset {(x_j, y_j), j = 1, ..., k}, k ≤ η, the extreme points of H(S), listed in order around the hull.
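The hull-plus-shoelace computation can be sketched in a few lines. This is an illustrative stand-in, not the authors' code: it uses Andrew's monotone-chain algorithm for the hull and takes an absolute value so the point ordering does not matter.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns the extreme points
    in counter-clockwise order (collinear points are dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def footprint_area(points):
    """Shoelace formula over the hull's extreme points, i.e. the
    slide's Area(H(S))."""
    h = convex_hull(points)
    k = len(h)
    if k < 3:
        return 0.0
    s = sum(h[j][0] * h[(j + 1) % k][1] - h[j][1] * h[(j + 1) % k][0]
            for j in range(k))
    return abs(s) / 2.0
```

Dividing `footprint_area` by the area of the whole instance-space bounding region then gives the relative "power" metric described above.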
Algorithm Footprint Areas (% of instance space)
Learning to predict easy or hard instances for a given ε,β
Naive Bayes classifier in R^2 is 85% accurate
Recommending algorithms
Each SVM is 75-90% accurate but fails to identify winner in some regions
On which instance classes is each algorithm best suited?
Characterising algorithm suitability based on features

This enables us to see what properties (not instance class labels) explain algorithm performance.

The representation of the instance space (the location of instances) depends on the feature set.

We have used a GA to select the optimal feature subset to maximise separability (reduce contradictions) in the footprints, enabling cleaner calculation of footprint areas.

Considering all 18 features again, some interesting feature distributions clearly show the properties that make instances easy or hard for each algorithm.
Feature Distributions in Instance Space
Reference
Pisanski, T., & Randić, M. "Use of the Szeged index and the revised Szeged index for measuring network bipartivity". Discrete Applied Mathematics, vol. 158, pp. 1936-1944, 2010.
Reference
Estrada, E., & Rodríguez-Velázquez, J. A. "Spectral measures of bipartivity in complex networks". Physical Review E, vol. 72(4), 046105, 2005.
References
Balakrishnan, R. "The energy of a graph". Linear Algebra and its Applications, vol. 387, pp. 287-295, 2004.
HEA is not best everywhere (No Free Lunch) ... why not?
References
Smith-Miles, K. A., Baatar, D., Wreford, B. and Lewis, R. "Towards Objective Measures of Algorithm Performance across Instance Space". Computers & Operations Research, vol. 45, pp. 12-24, 2014.
Where instances are, and are not, and why?

The instances are projected into the 2-d instance space by the linear transformation

[ v1 ]   [  0.559   0.614   0.557 ]   [ density                ]
[ v2 ] = [ -0.702  -0.007   0.712 ] · [ algebraic connectivity ]
                                      [ energy                 ]

The upper and lower bounds on the features give us a bounding region in the instance space in which a valid instance could lie.

We can select target points within this valid instance space, and use a GA to evolve random graphs so that we minimise their distance to the target point when projected.

This is a new method for instance generation, enabling non-trivial features to be controlled.
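Applying the slide's 2×3 projection matrix is a single matrix-vector product. A sketch (the function name is mine, and I assume the three features have already been normalised as in the talk):

```python
# Projection matrix from the slide: rows give coordinates v1 and v2,
# columns correspond to (density, algebraic connectivity, energy).
P = [[0.559, 0.614, 0.557],
     [-0.702, -0.007, 0.712]]

def project(features):
    """Map a 3-d feature vector to its 2-d instance-space coordinates
    (v1, v2) via the linear transformation above."""
    return [sum(p * f for p, f in zip(row, features)) for row in P]
```

A GA evolving graphs toward a target point would then minimise the Euclidean distance between `project(features_of(graph))` and that target.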
Evolving new instances at target points (n = 100)

References

Smith-Miles, K. A. and Bowly, S., "Generating new test instances by evolving in instance space", Computers & Operations Research, vol. 63, pp. 102-113, 2015.

Instance Spaces for Performance Evaluation 46 / 89
Summary

How do instance features help us understand the strengths and weaknesses of optimisation algorithms?
- Provided we have the right feature set, we can create a topology-preserving instance space
- The boundary between good and bad performance can be seen
- Feature selection methods may improve topology-preservation

How can we infer and visualise algorithm performance across a huge "instance space"?
- PCA has been used to visualise instances in 2-d (or 3-d)
- More than 90% of the variation in the data was preserved, but some important information (as well as noise) is naturally lost
- If the 4th largest eigenvalue is still large, then we lose too much detail, and other dimension reduction methods are needed

Instance Spaces for Performance Evaluation 47 / 89
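The PCA step and its eigenvalue diagnostic can be sketched with a plain SVD; this is an illustration of the idea above, not the authors' exact pipeline, and the random meta-data matrix is purely for demonstration:

```python
import numpy as np

def pca_2d(X):
    """Project instance feature vectors to 2-d and report the variance
    spectrum, so we can check how much variation 2-d preserves."""
    Xc = X - X.mean(axis=0)                     # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)                   # variance explained per component
    Z = Xc @ Vt[:2].T                           # 2-d instance-space coordinates
    return Z, var

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))                   # hypothetical 9-feature meta-data
Z, var = pca_2d(X)
# Diagnostic from the slide: if components beyond the first two or three
# still carry large variance, a 2-d projection loses too much detail.
print(var[:2].sum())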
Summary, continued

How can we objectively measure algorithm performance?
- Relative size of the area of algorithm footprints
- Convex or concave hulls can be used depending on generalisation comfort (out-of-sample testing can help)
- The area of the footprint depends on the definition of "good"

How easy or hard are the benchmark instances?
- Randomly generated instances tend to be in the middle (average features), and are usually not discriminating
- Discriminating instances can be generated intentionally using a GA (fitness is algorithm performance, but this blows up for harder instances)
- Diversity of instances is critical for a meaningful instance space

Alternatively, can we generate new test instances at target points in the instance space (more scalable)?

Instance Spaces for Performance Evaluation 48 / 89
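The footprint-area measure above can be realised with a convex hull over the 2-d coordinates of instances where an algorithm performed "well". A sketch under that assumption (the point sets are illustrative; a concave hull would be the more conservative variant mentioned above):

```python
import numpy as np
from scipy.spatial import ConvexHull

def footprint_area(points_2d):
    """Area of the convex hull of a set of 2-d instance-space points.
    Note: for 2-d input, ConvexHull.volume is the enclosed area
    (ConvexHull.area would be the perimeter)."""
    return ConvexHull(np.asarray(points_2d)).volume

rng = np.random.default_rng(1)
good = rng.uniform(size=(50, 2))      # instances where the algorithm did well
total = rng.uniform(size=(200, 2))    # all instances in the space
rel = footprint_area(good) / footprint_area(total)
print(rel)                            # relative footprint size
```

Dividing by the area covered by all instances gives the relative footprint size used to compare algorithms objectively.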
Black Box Optimisation

We are given only a sample of points from the continuous decision (input) space, and known objective function values (output space).

We have no analytical expression of the objective function.

We need to find the best point in the decision space to minimise the objective function with minimal function evaluations:
- Input space, X ⊂ R^D
- Output space, Y ⊂ R
- Problem dimensionality, D ∈ Z+
- Candidate solutions, x ∈ X
- Candidate cost, y ∈ Y
- Target solution, x_t ∈ X
- Target cost, y_t ∈ Y

Instance Spaces for Performance Evaluation 49 / 89
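The setting above can be made concrete with the simplest possible black-box baseline: we may only query the objective at points, never inspect it analytically, and must respect an evaluation budget. This sketch is for illustration only, not one of the benchmarked algorithms:

```python
import numpy as np

def random_search(f, D, budget, lo=-5.0, hi=5.0, seed=0):
    """Minimise a black-box objective f over [lo, hi]^D by uniform random
    sampling, keeping the best candidate found within the budget."""
    rng = np.random.default_rng(seed)
    best_x, best_y = None, np.inf
    for _ in range(budget):
        x = rng.uniform(lo, hi, size=D)   # candidate solution x in X
        y = f(x)                          # one function evaluation
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

sphere = lambda x: float(np.sum(x**2))    # a simple test objective
x, y = random_search(sphere, D=2, budget=1000)
print(y)
```

Every method studied later improves on this loop by exploiting structure in the sampled landscape, which is exactly what the ELA features try to measure.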
What makes BBO hard?

We depend on a sample to provide knowledge of the landscape.

Algorithms perform differently and can struggle with certain landscape characteristics:
- multimodality, poor conditioning, deceptiveness, etc.

We use sample-based Exploratory Landscape Analysis (ELA) metrics to learn what makes BBO hard.

These features will also form our instance space, enabling algorithm footprints to be seen and new test instances to be generated.

Instance Spaces for Performance Evaluation 50 / 89
BBO meta-data: instances

The noiseless COCO benchmark set is used: 24 basis functions defined within X = [-5, 5]^D.

The functions are divided into five categories:
- Separable (f1-f5)
- Low or moderately conditioned (f6-f9)
- Unimodal with high conditioning (f10-f14)
- Multimodal with adequate global structure (f15-f19)
- Multimodal with weak global structure (f20-f24)

New instances are generated by scaling and transforming the basis functions (translations, rotations, oscillations).
- We generated instances [1, ..., 15] at D = 2, 5, 10, 20, resulting in 1440 problem instances.

Instance Spaces for Performance Evaluation 51 / 89
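The instance-generation idea above can be sketched for the rotation and translation transformations (a simplified illustration; the real COCO suite also applies the oscillations and scalings mentioned, and the sphere basis function here is just an example):

```python
import numpy as np

def make_instance(base_f, D, seed):
    """Derive a new problem instance from a basis function by composing it
    with a random rotation and a shifted optimum location."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(D, D)))   # random rotation matrix
    x_opt = rng.uniform(-4, 4, size=D)             # translated optimum
    f = lambda x: base_f(Q @ (np.asarray(x, dtype=float) - x_opt))
    return f, x_opt

sphere = lambda z: float(np.sum(z**2))
f, x_opt = make_instance(sphere, D=2, seed=3)
print(f(x_opt))   # the transformed instance keeps its optimum at x_opt
```

Each (basis function, seed) pair yields a distinct instance sharing the basis function's structural character, which is why 24 functions times 15 instances times 4 dimensions gives the 1440 instances above.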
BBO meta-data: features

Samples of size D × 10^3 are drawn from the input space X using a Latin hypercube design (LHD).

Feature selection was applied to 18 candidate features (9 were chosen) to maximise performance prediction accuracy using an SVM.

Instance Spaces for Performance Evaluation 52 / 89
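The sampling step above can be sketched with scipy's Latin hypercube generator; this is an assumption about tooling (any LHD implementation works), and the sphere objective is only a stand-in:

```python
import numpy as np
from scipy.stats import qmc

def sample_landscape(f, D, lo=-5.0, hi=5.0, seed=0):
    """Draw the D x 10^3 Latin hypercube sample used to compute landscape
    features, and evaluate the black-box objective at each point."""
    n = D * 1000
    sampler = qmc.LatinHypercube(d=D, seed=seed)
    # Map the unit-cube design onto the search domain [-5, 5]^D.
    X = qmc.scale(sampler.random(n), [lo] * D, [hi] * D)
    Y = np.apply_along_axis(f, 1, X)
    return X, Y

sphere = lambda x: float(np.sum(x**2))
X, Y = sample_landscape(sphere, D=2)
print(X.shape, Y.shape)
```

All ELA features in the table that follows are computed from such an (X, Y) sample, never from the function itself.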
Method            | Feature | Description                           | Transformations
------------------|---------|---------------------------------------|----------------
Surrogate models  | R̄²_LI   | Fit of linear regression model        | Unit scaling
                  | R̄²_Q    | Fit of quadratic regression model     | Unit scaling
                  | CN      | Ratio of min to max quadratic coeff.  | Unit scaling
Significance      | ξ(D)    | Significance of D-th order            | z-score, tanh
                  | ξ(1)    | Significance of first order           | z-score, tanh
Cost distribution | γ(Y)    | Skewness of the cost distribution     | z-score, tanh
                  | κ(Y)    | Kurtosis of the cost distribution     | log10, z-score
                  | H(Y)    | Entropy of the cost distribution      | log10, z-score
Fitness sequences | Hmax    | Maximum information content with nearest neighbor sorting | z-score
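The cost-distribution row of the table can be sketched directly from a sampled cost vector Y. This is an illustration, not the authors' exact computation: the z-score step (taken across the whole meta-data set) is omitted, and the 20-bin histogram used for the entropy is an assumption:

```python
import numpy as np
from scipy import stats

def cost_distribution_features(Y):
    """Skewness, kurtosis, and entropy of the sampled costs, with the
    tanh / log10 transformations listed in the table applied."""
    gamma = np.tanh(stats.skew(Y))                      # skewness -> tanh
    kappa = np.log10(stats.kurtosis(Y, fisher=False))   # kurtosis -> log10
    hist, _ = np.histogram(Y, bins=20)                  # assumed binning
    p = hist[hist > 0] / len(Y)
    H = np.log10(-np.sum(p * np.log2(p)))               # entropy -> log10
    return gamma, kappa, H

rng = np.random.default_rng(0)
Y = rng.normal(size=2000)          # illustrative cost sample
print(cost_distribution_features(Y))
```

The transformations squash heavy-tailed statistics into comparable ranges before the features enter the projection.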
BBO Algorithms

We consider a variety of algorithms, selected using ICARUS to avoid overlapping performance.

Reference

Muñoz, M. (2013). Decision support systems for the automatic selection of algorithms for continuous optimization problems. PhD thesis, The University of Melbourne.

Instance Spaces for Performance Evaluation 53 / 89
Visualising the instance space
Instance Spaces for Performance Evaluation 54 / 89
Algorithm Footprints
An instance is considered solved if at least 1 of 15 runs comes within 10⁻⁸ of the target value y_t within a budget of 10⁴ × D function evaluations.
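The solved criterion above can be sketched as a small predicate; the function and argument names here are ours, not from the study:

```python
def solved(best_costs, y_target, D, evals_used, tol=1e-8, budget_per_dim=10_000):
    """An instance counts as solved if at least one run gets within tol of the
    target cost without exceeding the 10^4 * D evaluation budget."""
    budget = budget_per_dim * D
    return any(
        abs(b - y_target) <= tol and fe <= budget
        for b, fe in zip(best_costs, evals_used)
    )

# 15 runs on a D = 2 instance with target cost 0.0: one run reached the target
best = [1e-9] + [0.5] * 14
fes  = [12_000] + [20_000] * 14
assert solved(best, 0.0, 2, fes)
assert not solved([0.5] * 15, 0.0, 2, fes)
```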
Recommended algorithms
Feature Distributions in Instance Space
Methodology - Evolving New Instances
- We focus on 2-d functions for ease of visualisation
- We generate 720 instances (instances 1, ..., 30 of each of the 24 basis functions at D = 2)
- Sample based on X ⊂ 𝒳, of size 2×10⁴, using LHD
- Each function is summarised as a 9-d feature vector, then projected to 2-d using PCA
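The projection step — z-scored feature vectors onto the first two principal components — might look like this in Python. This is an illustration only: random features stand in for the real meta-data, and the function name is ours.

```python
import numpy as np

def project_2d(F):
    """Project an (n_instances x n_features) feature matrix onto its
    first two principal components after z-score standardisation."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    # principal axes = eigenvectors of the feature covariance matrix
    cov = np.cov(Z, rowvar=False)
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1]       # eigh returns eigenvalues in ascending order
    return Z @ V[:, order[:2]]

rng = np.random.default_rng(0)
F = rng.normal(size=(720, 9))         # 720 instances x 9 selected features (stand-in data)
coords = project_2d(F)                # one 2-d point per instance
```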
Methodology - Evolving New Instances
- We use Genetic Programming (GP) to evolve a program (function), represented as a binary tree:
  - leaves are variables or constants
  - nodes are operations {×, +, −, (·)², sin, cos, tanh, exp}
- Used GPTIPS v1.0 in MATLAB (GP for symbolic regression):
  - Population size: 400
  - Number of generations: 100
  - Tournament size: 7
  - Elite fraction: 0.1
  - Target cost: √ε, where ε is the machine precision
  - Number of inputs: D = 2
  - Max tree depth: 10
  - Constant range: [−1000, 1000]
  - Tournament selection: lexicographic
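An expression tree over this operation set can be encoded and evaluated as follows. This is a Python sketch for clarity only: GPTIPS itself is a MATLAB toolbox, and the tuple encoding here is our assumption.

```python
import math

# Internal nodes carry an operation from the slide's set; leaves are
# input variables ("x0", "x1") or numeric constants.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "sq":  lambda a: a * a,
    "sin": math.sin, "cos": math.cos, "tanh": math.tanh, "exp": math.exp,
}

def evaluate(node, x):
    """Recursively evaluate an expression tree at the input vector x."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):                 # variable leaf, e.g. "x0"
        return x[int(node[1:])]
    op, *children = node
    return OPS[op](*(evaluate(c, x) for c in children))

# sin(x0)^2 + 0.5 * x1, encoded as nested tuples
tree = ("add", ("sq", ("sin", "x0")), ("mul", 0.5, "x1"))
value = evaluate(tree, (0.0, 2.0))
```

Evolving such trees toward a target point in the instance space then amounts to scoring each candidate by the distance between its feature vector and the target.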
Recreating Existing Functions (S1)
- We attempt to recreate a known function from COCO by selecting a target point coinciding with that function
- We perform 5 iterations for each of 50 randomly selected target instances
- A few examples ...
Recreating Existing Functions - Sphere
Sphere - unimodal
Recreating Existing Functions - Discus
Discus - poor conditioning
Recreating Existing Functions - Katsuura
Katsuura - highly multimodal with periodic structure
Generating Functions across the Instance Space (S2)
- rugged instances in top left corner
- conditioning worsens from left to right
- large plateaus at bottom of space
New Test Functions - Examples
How hard are these new test functions?
- Comparing BIPOP-CMA-ES on COCO, evolved COCO-like (S1), and evolved diverse (S2) functions
- Probability of solving within the function-evaluation budget:
  - 0.94 for COCO
  - 0.67 for S1
  - 0.61 for S2

solid line: function evaluations (FEs) to reach the experimental optimum
dashed line: FEs to reach within 10⁻⁸ of the experimental optimum
Returning to Machine Learning
The UCI repository needs to be re-evaluated:
- does it support insights into algorithm performance?
- where are the really challenging (not just large) instances that stress the best algorithms?
- data quality has also been questioned

References:
N. Macià and E. Bernadó-Mansilla (2014). "Towards UCI+: A mindful repository design", Information Sciences, vol. 261, pp. 237-262.
S. L. Salzberg (1997). "On comparing classifiers: Pitfalls to avoid and a recommended approach", Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 317-328.
Problem Instances I
We use a total of 236 classification instances (binary and multiclass), comprising:
- 211 UCI instances (University of California, Irvine)
- 19 KEEL instances (Knowledge Extraction based on Evolutionary Learning)
- 6 DCoL instances (Data Complexity Library)

Instances contain up to 11,055 observations and 1,558 attributes:
- larger ones have been excluded from this study due to the computational budget

Instances with missing values are retained, and are also duplicated with the missing values estimated by the attribute means for the class.
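The per-class mean imputation in the last bullet can be sketched as follows; the function name and the None-for-missing convention are ours:

```python
def impute_class_means(rows, labels):
    """Return a copy of the data with each missing value (None) replaced
    by the mean of that attribute over instances of the same class."""
    means = {}
    n_attr = len(rows[0])
    for c in set(labels):
        for j in range(n_attr):
            vals = [r[j] for r, y in zip(rows, labels) if y == c and r[j] is not None]
            means[c, j] = sum(vals) / len(vals)
    return [
        [means[y, j] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]

X = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0]]
y = ["a", "a", "b", "b"]
X_imp = impute_class_means(X, y)
```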
Algorithms A
We consider 10 supervised learners:
- Naive Bayes (NB)
- Linear Discriminant (LD)
- Quadratic Discriminant (QD)
- Classification and Regression Trees (CART)
- J48 Decision Tree (J48)
- k-Nearest Neighbour (kNN)
- Support Vector Machines with linear (L-SVM), polynomial (poly-SVM), and radial basis (RB-SVM) kernels
- Random Forests (RF)

R packages used were e1071, MASS, rpart, RWeka, and kknn, with default parameters.
Performance Metric Y
For each algorithm running on each instance, we record:
- error rate (1 − classification accuracy)
- precision
- recall
- F-measure
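For a binary task, these four metrics can be computed from prediction counts as follows — a minimal sketch, not the evaluation code used in the study:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Error rate, precision, recall and F-measure for binary predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return errors / len(y_true), precision, recall, f_measure

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
err, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

For multiclass instances, precision, recall and F-measure are typically averaged over the classes.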
Possible Features
We generate a set of 509 candidate features from 8 categories:
- simple (dimensionality, types of attributes, missing values, outliers, class attributes)
- statistical (descriptive statistics, canonical correlations, PCA, etc.)
- information theoretic (entropy, mutual information, etc.)
- landmarking (performance of simple landmarkers such as NB or single-node trees)
- model-based (properties of decision trees, such as shape and size of tree, width and depth)
- concept characterisation (measures of sparsity of the input space and irregularity in input-output distributions)
- complexity (separability, geometry, topology and density of manifolds)
- itemsets & association rules (attribute & class relationships)
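A few of the "simple" and "information theoretic" meta-features can be sketched directly; this is an illustration only — the study's 509 candidates come from dedicated meta-learning feature extractors, and the names below are ours:

```python
import math

def simple_meta_features(X, y):
    """Dimensionality, missing-value rate, class count, and class entropy
    for a dataset X (rows of attributes, None = missing) with labels y."""
    n, d = len(X), len(X[0])
    missing_rate = sum(v is None for row in X for v in row) / (n * d)
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    class_entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "n_instances": n,
        "n_attributes": d,
        "missing_rate": missing_rate,
        "n_classes": len(counts),
        "class_entropy": class_entropy,
    }

X = [[0.1, None], [0.4, 1.2], [0.5, 0.9], [0.2, 1.1]]
y = ["a", "a", "b", "b"]
feats = simple_meta_features(X, y)
```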
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
IntroductionMethodology
Case Study: Graph ColouringCase Study: Black-Box Optimisation
Case Study: Machine LearningConclusions
Collecting Meta-DataCreating the Instance SpaceAlgorithm FootprintsGenerating New Test Instances
Possible Features
We generate a set of 509 candidate features from 8 categories:I simple (dimensionality, types of attributes, missing values,
outliers, class attributes)I statistical (descriptive statistics and canonical correlations,
PCA, etc.)I information theoretic (entropy, mutual information, etc.)I landmarking (performance of simple landmarkers such as NB
or single node trees)I model-based (properties of decision trees such as shape and
size of tree, width and depth)I concept characterization (measures of sparsity of input space
and irregularity in input-output distributions)I complexity (separability, geometry, topology and density of
manifolds)I itemsets & association rules (attribute & class relationships)
Instance Spaces for Performance Evaluation 71 / 89
What makes classification hard?
Instance Spaces for Performance Evaluation 72 / 89
Sensitivity Analysis and Feature Selection

We construct perturbed datasets that intentionally increase or decrease the presence of the challenge

For each instance, 6,108 statistical significance tests were conducted (509 × 12) with Bonferroni correction
- the settings give a 99% chance of correctly discarding a feature, and a 90% chance of correctly selecting a feature with a cause-effect relationship to the challenge

Repeat this procedure for 6 small instances (balloons, blogger, breast, breast with 2 attributes, iris, iris with 2 attributes)

For each challenge, we select the features that consistently captured the challenge across the 6 instances

Correlations between features (> 0.7) and between features and algorithm performance (< 0.3) were used to eliminate features
Instance Spaces for Performance Evaluation 73 / 89
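The correlation-based pruning rule can be sketched as follows. The thresholds mirror the slide (drop one of any feature pair correlated above 0.7; drop features correlating below 0.3 with algorithm performance), but `prune_features` and its greedy pair-elimination order are an illustrative reading of the rule, not the study's exact procedure.

```python
import numpy as np

def prune_features(F, perf, between=0.7, with_perf=0.3):
    """F: (n_instances, n_features) feature matrix; perf: (n_instances,)
    algorithm performance. Returns the indices of the features kept."""
    n_feat = F.shape[1]
    # 1) drop features weakly correlated with algorithm performance
    keep = [j for j in range(n_feat)
            if abs(np.corrcoef(F[:, j], perf)[0, 1]) >= with_perf]
    # 2) greedily drop the later feature of any highly correlated pair
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(F[:, j], F[:, i])[0, 1]) <= between
               for i in selected):
            selected.append(j)
    return selected

# Synthetic check: feature 1 nearly duplicates feature 0; feature 2 is noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
F = np.column_stack([x, x + 0.01 * rng.normal(size=200), rng.normal(size=200)])
perf = x + 0.1 * rng.normal(size=200)
print(prune_features(F, perf))
```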
Selected Features F
The final set of 10 features is:
Instance Spaces for Performance Evaluation 74 / 89
Performance Prediction using F

Regression predicts the error rate (ER) of each algorithm

Classification labels each instance as easy or hard for the algorithm (easy if ER < 0.2, else hard)

An SVM is used, with parameters optimised via 10-fold cross-validation grid search
Instance Spaces for Performance Evaluation 75 / 89
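The classification step can be sketched with scikit-learn (assumed available here). The meta-data, the error-rate model, and the parameter grid below are synthetic placeholders, not those of the study; only the shape of the procedure — label easy/hard at ER < 0.2, then fit an SVM tuned by 10-fold cross-validated grid search — follows the slide.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy meta-data: one row per dataset (its feature vector), plus a synthetic
# error rate driven by the first feature.
rng = np.random.default_rng(42)
F = rng.normal(size=(60, 4))
error_rate = np.clip(0.2 + 0.15 * F[:, 0], 0.0, 1.0)
labels = (error_rate < 0.2).astype(int)   # easy (1) if ER < 0.2, else hard (0)

# SVM with parameters optimised by 10-fold cross-validated grid search
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=10,
)
grid.fit(F, labels)
print(grid.best_params_, grid.best_score_)
```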
A new projection algorithm

PCA maximises the variance retained, but this isn't exactly what we need to support insights through visualisation

We want a projection that creates linear trends (interpretable) in both the feature distribution and algorithm performance

We solve numerically using BIPOP-CMA-ES (note: PCA gives a locally optimal solution only)
Instance Spaces for Performance Evaluation 76 / 89
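Searching for a projection whose axes carry linear trends can be posed as a generic optimisation over the entries of a projection matrix. In this sketch, scipy's Nelder-Mead stands in for BIPOP-CMA-ES, and `projection_loss` is an illustrative objective (how well the features and the performance measure are reproduced as linear functions of the 2-D coordinates), not the exact formulation used in the study.

```python
import numpy as np
from scipy.optimize import minimize

def projection_loss(w_flat, F, Y):
    """Residual error when each feature and the performance measure are
    fitted as linear functions of the 2-D coordinates Z = F @ W."""
    n, d = F.shape
    W = w_flat.reshape(d, 2)
    Z = F @ W
    A = np.column_stack([Z, np.ones(n)])       # linear model in Z (+ intercept)
    T = np.column_stack([F, Y])                # targets: features and performance
    coef, *_ = np.linalg.lstsq(A, T, rcond=None)
    return float(np.sum((T - A @ coef) ** 2))

rng = np.random.default_rng(1)
F = rng.normal(size=(100, 5))
F = (F - F.mean(0)) / F.std(0)                 # standardise the features
Y = F[:, 0] - 0.5 * F[:, 1]                    # synthetic performance measure

w0 = rng.normal(size=10)                       # 5 x 2 projection, flattened
res = minimize(projection_loss, w0, args=(F, Y), method="Nelder-Mead",
               options={"maxiter": 5000})
W = res.x.reshape(5, 2)
Z = F @ W                                      # 2-D instance-space coordinates
print(res.fun)
```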
Instance Space (feature distribution)
Instance Spaces for Performance Evaluation 77 / 89
Instance Space (performance distribution)
Instance Spaces for Performance Evaluation 78 / 89
Size features
Instance Spaces for Performance Evaluation 79 / 89
Algorithm Footprints ('good' = ER < 20%)
Instance Spaces for Performance Evaluation 80 / 89
Footprint Area Calculations
Instance Spaces for Performance Evaluation 81 / 89
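A footprint can be summarised by an area in the 2-D instance space. As a simplified sketch (assuming scipy is available): take the convex hull of the instances where the algorithm is good (ER < 20%) and report its area. The published footprint calculation additionally imposes density and purity conditions on the region, which this sketch omits.

```python
import numpy as np
from scipy.spatial import ConvexHull

def footprint_area(Z, error_rate, good_threshold=0.2):
    """Area of the convex hull of the instances (2-D coordinates Z) on
    which the algorithm is 'good', i.e. its error rate is below 20%."""
    good = Z[error_rate < good_threshold]
    if len(good) < 3:
        return 0.0                      # no 2-D region to measure
    return float(ConvexHull(good).volume)  # in 2-D, .volume is the area

# Unit square of good instances, plus two bad instances outside it
Z = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [2, 2], [3, 0]], float)
er = np.array([0.1, 0.1, 0.1, 0.1, 0.5, 0.9])
print(footprint_area(Z, er))  # area of the unit square
```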
Other views: who is best, where are easy/hard instances?
Instance Spaces for Performance Evaluation 82 / 89
The need for new test instances

The current instances don't enable us to see much difference in algorithm footprints, despite fundamentally different algorithm mechanisms (e.g. kNN, RF, RBF-SVM)

There are areas of the instance space that are unexplored, or very sparse
- e.g. at [0.744, 2.833] there is only one instance in the area for which J48 was the only algorithm with ER < 20%. More data is needed to support conclusions about strengths and weaknesses

The boundary of possible instances in the space can be estimated using projections of the min and max features (either theoretical or observed)
Instance Spaces for Performance Evaluation 83 / 89
A procedure to generate new instances at target points

We use a Gaussian Mixture Model (GMM) to generate a dataset with κ classes on q attributes

The probability of an observation x being sampled from the GMM is:

pr(x) = ∑_{k=1}^{κ} φ_k N(µ_k, Σ_k), where φ_k ∈ ℝ, µ_k ∈ ℝ^q, Σ_k ∈ ℝ^{q×q}

We tune the parameter vector of the GMM so that the distance of its feature vector to the target feature vector is minimised

Tuning is a continuous black-box optimisation problem, and we use BIPOP-CMA-ES to optimise the parameters
Instance Spaces for Performance Evaluation 84 / 89
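A minimal numpy sketch of this generator: `sample_gmm` draws a labelled dataset from the mixture, and `distance_to_target` is the objective one would hand to a black-box optimiser such as BIPOP-CMA-ES (the optimiser itself is omitted here). The `feat` function below (dataset mean and standard deviation) is a stand-in for the real feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(phi, mu, sigma, n):
    """Draw n labelled observations from a kappa-component GMM: component k
    is chosen with probability phi_k, then x ~ N(mu_k, Sigma_k)."""
    phi = np.asarray(phi, float) / np.sum(phi)   # normalise mixture weights
    ks = rng.choice(len(phi), size=n, p=phi)     # component index = class label
    X = np.array([rng.multivariate_normal(mu[k], sigma[k]) for k in ks])
    return X, ks

def distance_to_target(params, target, feature_fn, n=200):
    """Tuning objective: distance between the generated dataset's feature
    vector and the target point in feature space (to be minimised)."""
    X, y = sample_gmm(*params, n)
    return float(np.linalg.norm(feature_fn(X, y) - target))

# Two well-separated classes on q = 2 attributes
phi = [0.5, 0.5]
mu = [np.zeros(2), np.full(2, 3.0)]
sigma = [np.eye(2), np.eye(2)]
X, y = sample_gmm(phi, mu, sigma, 200)

feat = lambda X, y: np.array([X.mean(), X.std()])  # stand-in feature vector
print(distance_to_target((phi, mu, sigma), np.array([1.5, 1.8]), feat))
```

Because each evaluation resamples the dataset, the objective is stochastic, which is one reason a robust black-box method like BIPOP-CMA-ES is a natural choice for the tuning step.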
Two initial experiments

Can we reproduce a dataset that lives at the location of Iris (Iris size and features)?

Can we generate datasets elsewhere (Iris size, different features)?
Instance Spaces for Performance Evaluation 85 / 89
Discussion

Computational efficiency issues (is there a better encoding of a problem instance than a GMM?)

The boundary of all instances is not the same as the boundary of instances of a given size (since size can affect feature ranges)

We need some theoretical work on these boundaries, like the graph theory results we have drawn upon in other work

There is much value in generating challenging smaller instances, to understand how structural properties, not just size, affect complexity

The instance space depends on the chosen features, which were selected based on the current instances, so iteration is required as we generate new instances
Instance Spaces for Performance Evaluation 86 / 89
Conclusions

The proposed methodology is a first step towards providing researchers with a tool to
- report the strengths and weaknesses of their algorithms
- show the relative power of an algorithm either
  - across the entire instance space, or
  - in a particular region of interest (e.g. real-world problems)
- evaluate the suitability of existing benchmark instances
- evolve new interesting and challenging test instances
Instance Spaces for Performance Evaluation 87 / 89
Next Steps

We are currently developing the key components of the methodology (evolved instances, feature sets) for a number of broad classes of optimization problems, as well as machine learning, time series forecasting, etc.

We are planning a web resource where researchers can download instances that span the instance space, upload their algorithm performance results, and download footprint metrics and visualisations to support their analysis

The approach also generalises to parameter selection within algorithms, and to the choice of formulation

We hope to be providing a free lunch for researchers soon!
Instance Spaces for Performance Evaluation 88 / 89
Further Reading

Methodology
- K. Smith-Miles and S. Bowly, "Generating new test instances by evolving in instance space", Comp. & Oper. Res., vol. 63, pp. 102-113, 2015.
- K. Smith-Miles et al., "Towards Objective Measures of Algorithm Performance across Instance Space", Comp. & Oper. Res., vol. 45, pp. 12-24, 2014.
- L. Lopes and K. Smith-Miles, "Generating Applicable Synthetic Instances for Branch Problems", Operations Research, vol. 61, no. 3, pp. 563-577, 2013.
- K. Smith-Miles and L. Lopes, "Measuring Instance Difficulty for Combinatorial Optimization Problems", Comp. & Oper. Res., vol. 39, no. 5, pp. 875-889, 2012.
- K. Smith-Miles, "Cross-disciplinary perspectives on meta-learning for algorithm selection", ACM Computing Surveys, vol. 41, no. 1, article 6, 2008.

Applications
- Machine Learning: L. Villanova, M. A. Muñoz, D. Baatar, and K. Smith-Miles, "Instance Spaces for Machine Learning Classification", Machine Learning, vol. 107, no. 1, pp. 109-147, 2018.
- Time Series Forecasting: Y. Kang, R. Hyndman, and K. Smith-Miles, "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.
- Continuous Optimisation: M. A. Muñoz and K. Smith-Miles, "Performance analysis of continuous black-box optimization algorithms via footprints in instance space", Evolutionary Computation, vol. 25, no. 4, pp. 529-554, 2017.
- Travelling Salesman Problem: K. Smith-Miles and J. van Hemert, "Discovering the Suitability of Optimisation Algorithms by Learning from Evolved Instances", Annals of Mathematics and Artificial Intelligence, vol. 61, no. 2, pp. 87-104, 2011.
- and others on the Quadratic Assignment Problem, Job Shop Scheduling, Timetabling, and Graph Colouring: see kate.smithmiles.wixsite.com/home
Instance Spaces for Performance Evaluation 89 / 89