Open Machine Learning

the open experiment database meta-learning for the masses Joaquin Vanschoren @joavanschoren

description

This talk explores the possibility of turning machine learning research into open science and proposes concrete approaches to achieve this goal.

Transcript of Open Machine Learning

Page 1: Open Machine Learning

the open experiment database

meta-learning for the masses

Joaquin Vanschoren @joavanschoren

Page 2: Open Machine Learning

The Polymath story

Tim Gowers

Page 3: Open Machine Learning

Machine Learning: are we doing it right?

Page 4: Open Machine Learning

Computer Science

• The scientific method

• Make a hypothesis about the world

• Generate predictions based on this hypothesis

• Design experiments to verify/falsify the prediction

• Predictions verified: hypothesis might be true

• Predictions falsified: hypothesis is wrong

Page 5: Open Machine Learning

Computer Science

• The scientific method (for ML)

• Make a hypothesis about (the structure of) given data

• Generate models based on this hypothesis

• Design experiments to measure accuracy of the models

• Good performance: It works (on this data)

• Bad performance: It doesn’t work on this data

• Aggregates (it works 60% of the time) not useful

Page 6: Open Machine Learning

Computer Science

• The scientific method (for ML)

• Make a hypothesis about (the structure of) given data

• Generate models based on this hypothesis

• Design experiments to measure accuracy of the models

• Good performance: It works (on this data)

• Bad performance: It doesn’t work on this data

• Aggregates (it works 60% of the time) not useful

How can data be characterized on which the algorithm works well?

Page 7: Open Machine Learning

Computer Science

• The scientific method (for ML)

• Make a hypothesis about (the structure of) given data

• Generate models based on this hypothesis

• Design experiments to measure accuracy of the models

• Good performance: It works (on this data)

• Bad performance: It doesn’t work on this data

• Aggregates (it works 60% of the time) not useful

How can data be characterized on which the algorithm works well?

What is the effect of parameter settings?
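The point about aggregates can be illustrated with a minimal sketch. The result rows, the dataset names, and the size threshold below are invented for illustration and do not come from the talk; the sketch only shows how one overall average can hide the data property that separates good from bad performance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (dataset, number of training examples, accuracy).
results = [
    ("d1", 100, 0.62),
    ("d2", 150, 0.58),
    ("d3", 5000, 0.91),
    ("d4", 8000, 0.94),
]


def aggregate_accuracy(rows):
    """One number for all datasets: easy to report, hard to interpret."""
    return mean(acc for _, _, acc in rows)


def accuracy_by_size(rows, threshold=1000):
    """Group results by a simple data characteristic (dataset size)."""
    groups = defaultdict(list)
    for _, n_examples, acc in rows:
        key = "small" if n_examples < threshold else "large"
        groups[key].append(acc)
    return {key: mean(values) for key, values in groups.items()}


overall = aggregate_accuracy(results)
per_group = accuracy_by_size(results)
```

Here the overall average sits between the two groups and describes neither of them, while the grouped view suggests where the algorithm works well.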

Page 8: Open Machine Learning

Meta-Learning

• The science of understanding which algorithms work well on which types of data

• Hard: thorough understanding of data and algorithms

• Requires good data: extensive experimentation

• Why is this separate from other ML research?

• A thorough algorithm evaluation = a meta-learning study

• Original authors know algorithms and data best, have large sets of experiments, are (presumably) interested in knowing on which data their algorithms work well (or not)

Page 9: Open Machine Learning

Meta-Learning

With the right tools, can we make everyone a meta-learner?

ML algorithm design

meta-learning

Large sets of experiments

algorithm selection

algorithm characterization

data characterization

bias-variance analysis

learning curves

data insight

algorithm insight

algorithm comparison

datasets

source code

Page 10: Open Machine Learning

Open Machine Learning

Page 11: Open Machine Learning

Open science

World-wide Telescope

Page 12: Open Machine Learning

Open science

Microarray Databases

Page 13: Open Machine Learning

Open science

GenBank

Page 14: Open Machine Learning

Open machine learning?

• We can also be 'open'

• Simple, common formats to describe experiments, workflows, algorithms,...

• Platform to share, store, query, interact

• We can go (much) further

• Share experiments automatically (open source ML tools)

• Experiment on-the-fly (cheap, no expensive instruments)

• Controlled experimentation (experimentation engine)

Page 15: Open Machine Learning

Formalizing machine learning

• Unique names for algorithms, datasets, evaluation measures, data characterizations,... (ontology)

• Based on DMOP, OntoDM, KDOntology, EXPO,...

• Simple, structured way to describe algorithm setups, workflows and experiment runs

• Detailed enough to reproduce all experiments

Page 16: Open Machine Learning

Run

run

Page 17: Open Machine Learning

Run

run

Execution of a predefined setup

Page 18: Open Machine Learning

Run

run

Execution of a predefined setup

setup

Page 19: Open Machine Learning

Run

setup

run

Page 20: Open Machine Learning

Run

in

setup

data run

Page 21: Open Machine Learning

Run

in

setup

data

machine

run

Page 22: Open Machine Learning

Run

in out

setup

data data

machine

run

Page 23: Open Machine Learning

Run

in out

setup

data data

machine

run

Also: start time, author, status,...
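As a rough sketch of the run description built up on these slides, a run could be recorded as follows. The field names, the example identifiers, and the `is_reproducible_record` helper are assumptions made for illustration; they are not the schema used by the experiment database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    """Execution of a predefined setup (illustrative field names)."""
    setup: str                      # identifier of the setup that was executed
    machine: str                    # where the run was executed
    input_data: List[str] = field(default_factory=list)
    output_data: List[str] = field(default_factory=list)
    start_time: Optional[datetime] = None
    author: Optional[str] = None
    status: str = "unknown"


def is_reproducible_record(run: Run) -> bool:
    """Hypothetical check: a run can only be repeated if setup and inputs are known."""
    return bool(run.setup) and len(run.input_data) > 0


# Hypothetical example values.
run = Run(
    setup="setup-42",
    machine="node-1",
    input_data=["dataset-a"],
    output_data=["evaluations-1", "predictions-1"],
    author="someone",
    status="done",
)
```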

Page 24: Open Machine Learning

Setup

setup

Page 25: Open Machine Learning

Setup

Plan of what we want to do

setup

Page 26: Open Machine Learning

Setup

Plan of what we want to do

setup

algorithm setup

f(x) function setup

workflow

experiment

Page 27: Open Machine Learning

Setup

setup

algorithm setup

f(x) function setup

workflow

experiment

part of

Hierarchical

Page 28: Open Machine Learning

Setup

setup

algorithm setup

f(x) function setup

workflow

experiment

part of

p=! parameter setting

Hierarchical

Parameterized

Page 29: Open Machine Learning

Setup

setup

algorithm setup

f(x) function setup

workflow

experiment

part of

p=! parameter setting

Hierarchical

Parameterized

Abstract/concrete

Page 30: Open Machine Learning

Algorithm Setup

algorithm setup

Page 31: Open Machine Learning

Algorithm Setup

Fully defined algorithm configuration

algorithm setup

part of

Page 32: Open Machine Learning

Algorithm Setup

Fully defined algorithm configuration

algorithm setup

p=! parameter setting

implementation

part of

f(x) function setup

Page 33: Open Machine Learning

Algorithm Setup

Fully defined algorithm configuration

algorithm setup

p=! parameter setting

implementation

part of

f(x) function setup

Page 34: Open Machine Learning

Algorithm Setup

algorithm setup

p=! parameter setting

part of

f(x) function setup

implementation

Page 35: Open Machine Learning

Algorithm Setup

algorithm setup

algorithm

p=! parameter setting

algorithm quality

part of

f(x) function setup

p=? parameter

f(x) mathematical function

implementation

Page 36: Open Machine Learning

Algorithm Setup

algorithm setup

algorithm

p=! parameter setting

algorithm quality

part of

f(x) function setup

p=? parameter

f(x) mathematical function

implementation

unique names

Page 37: Open Machine Learning

Algorithm Setup

algorithm setup

algorithm

p=! parameter setting

algorithm quality

part of

f(x) function setup

p=? parameter

f(x) mathematical function

implementation

unique names

Roles: learner, base-learner, kernel,...
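A minimal sketch of such a hierarchical, parameterized algorithm setup might look like this. The class and field names are assumptions for illustration; the example values (Weka.SMO with C=0.01 and a Weka.RBF kernel with G=0.01) are borrowed from the workflow example slide of this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FunctionSetup:
    """A configured function, e.g. a kernel (illustrative structure)."""
    implementation: str
    parameters: Dict[str, float] = field(default_factory=dict)
    role: str = "kernel"


@dataclass
class AlgorithmSetup:
    """A fully defined algorithm configuration; parts makes it hierarchical."""
    implementation: str
    parameters: Dict[str, float] = field(default_factory=dict)
    role: str = "learner"
    parts: List[object] = field(default_factory=list)  # nested algorithm/function setups


def describe(setup, indent=0):
    """Return one text line per setup, indented by nesting depth."""
    pad = "  " * indent
    params = ", ".join(f"{k}={v}" for k, v in sorted(setup.parameters.items()))
    lines = [f"{pad}{setup.role}: {setup.implementation}({params})"]
    for part in getattr(setup, "parts", []):
        lines.extend(describe(part, indent + 1))
    return lines


svm = AlgorithmSetup(
    "Weka.SMO",
    {"C": 0.01},
    parts=[FunctionSetup("Weka.RBF", {"G": 0.01})],
)
description = describe(svm)
```

Keeping the role on each part is what lets the same implementation appear as a learner in one setup and as a base-learner in another.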

Page 38: Open Machine Learning

Setup

setup

algorithm setup

f(x) function setup

workflow

experiment

part of

Page 39: Open Machine Learning

Workflow Setup

setup

algorithm setup

workflow

part of

Page 40: Open Machine Learning

Workflow Setup

setup

algorithm setup

workflow

part of

source

connection

target

Workflow: components, connections, and parameters (inputs)

Page 41: Open Machine Learning

Workflow Setup

setup

algorithm setup

workflow

part of

source

connection

target

Workflow: components, connections, and parameters (inputs)

Also: ports, datatype
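The workflow description (components, connections from a source to a target, ports with datatypes) can be sketched as a small data structure. All class names and the validity check are assumptions for illustration; the component and datatype names are taken from the workflow example in this talk, and the assignment of ports to components is a guess.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    name: str
    implementation: str
    inputs: Dict[str, str] = field(default_factory=dict)   # port name -> datatype
    outputs: Dict[str, str] = field(default_factory=dict)  # port name -> datatype


@dataclass
class Connection:
    source: Tuple[str, str]  # (component name, output port)
    target: Tuple[str, str]  # (component name, input port)


@dataclass
class Workflow:
    components: Dict[str, Component]
    connections: List[Connection]

    def invalid_connections(self) -> List[Connection]:
        """Connections whose ports are missing or whose datatypes differ."""
        bad = []
        for c in self.connections:
            src = self.components.get(c.source[0])
            dst = self.components.get(c.target[0])
            src_type = src.outputs.get(c.source[1]) if src else None
            dst_type = dst.inputs.get(c.target[1]) if dst else None
            if src_type is None or dst_type is None or src_type != dst_type:
                bad.append(c)
        return bad


load = Component("loadData", "Weka.ARFFLoader", outputs={"data": "Weka.Instances"})
cv = Component(
    "crossValidate",
    "Weka.Evaluation",
    inputs={"data": "Weka.Instances"},
    outputs={"eval": "Evaluations", "pred": "Predictions"},
)
flow = Workflow(
    components={"loadData": load, "crossValidate": cv},
    connections=[Connection(("loadData", "data"), ("crossValidate", "data"))],
)
```

Typed ports make it possible to reject a malformed workflow before any experiment is run.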

Page 42: Open Machine Learning

Workflow Example

[Workflow diagram]

1:mainFlow

2:loadData: Weka.ARFFLoader (location= http://...)

3:crossValidate: Weka.Evaluation (F=10)

4:learner: Weka.SMO (C=0.01)

5:kernel: Weka.RBF, f(x) (G=0.01)

Also shown: S=1; logRuns=true, logRuns=false, logRuns=true

Ports: data, data, eval, pred, url, evaluations, predictions, par

Page 43: Open Machine Learning

Workflow Example

[Workflow diagram]

1:mainFlow

2:loadData: Weka.ARFFLoader (location= http://...)

3:crossValidate: Weka.Evaluation (F=10)

4:learner: Weka.SMO (C=0.01)

5:kernel: Weka.RBF, f(x) (G=0.01)

Also shown: S=1; logRuns=true, logRuns=false, logRuns=true

Ports: data, data, eval, pred, url, evaluations, predictions, par

Data items (numbered 6 to 8 on the slide): Evaluations, Predictions, Weka.Instances

Page 44: Open Machine Learning

Setup

setup

algorithm setup

f(x) function setup

workflow

experiment

part of

Page 45: Open Machine Learning

Experiment Setup

setup

algorithm setup

workflow

experiment

part of

<X> experiment variable

Page 46: Open Machine Learning

Experiment Setup

setup

algorithm setup

workflow

experiment

part of

<X> experiment variable

setup

Also: experiment design, description, literature reference, author,...

Page 47: Open Machine Learning

Experiment Setup

Page 48: Open Machine Learning

Experiment Setup

Variables: labeled tuples which can be referenced in setups
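One way to read "labeled tuples referenced in setups" is as named value lists that a setup template refers to by label. The sketch below assumes a `<NAME>` reference syntax (modeled on the `<X>` icon in the slides) and a full cross-product design; both are illustrative assumptions, as are the dataset names.

```python
from itertools import product


def resolve(value, binding):
    """Replace a '<NAME>' reference by the value bound to NAME."""
    if isinstance(value, str) and value.startswith("<") and value.endswith(">"):
        return binding[value[1:-1]]
    return value


def expand(template, variables):
    """Turn one abstract setup into concrete setups, one per value combination."""
    names = sorted(variables)
    setups = []
    for values in product(*(variables[name] for name in names)):
        binding = dict(zip(names, values))
        setups.append({key: resolve(val, binding) for key, val in template.items()})
    return setups


# Hypothetical experiment: vary the SMO complexity parameter over two datasets.
template = {"learner": "Weka.SMO", "C": "<C>", "dataset": "<D>"}
variables = {"C": (0.01, 0.1), "D": ("data_a", "data_b")}
concrete_setups = expand(template, variables)
```

This also illustrates the abstract/concrete distinction: the template is an abstract setup, and each expanded dictionary is a concrete one that can be run.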

Page 49: Open Machine Learning

Run

in out

setup

data data

machine

run

Also: start time, author, status,...

Page 50: Open Machine Learning

Run

data

dataset evaluation model predictions

Page 51: Open Machine Learning

Run

source data run

dataset evaluation model predictions

Page 52: Open Machine Learning

Run

source data run

dataset evaluation model predictions

data quality

Page 53: Open Machine Learning

EXPML

[Workflow diagram]

1:mainFlow

2:loadData: Weka.ARFFLoader (location= http://...)

3:crossValidate: Weka.Evaluation (F=10)

4:learner: Weka.SMO (C=0.01)

5:kernel: Weka.RBF, f(x) (G=0.01)

Also shown: S=1; logRuns=true, logRuns=false, logRuns=true

Ports: data, data, eval, pred, url, evaluations, predictions, par

Page 54: Open Machine Learning

Demo (preview)

Page 55: Open Machine Learning

Learning curves

[Plot: predictive accuracy (0.2 to 1) against percentage of original dataset size (10 to 100) for RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes, SVM-RBF]

Examples

Page 56: Open Machine Learning

When does one algorithm outperform another?

Examples

Page 57: Open Machine Learning

When does one algorithm outperform another?

Examples

Page 58: Open Machine Learning

Bias-variance profile + effect of dataset size

Examples

Page 59: Open Machine Learning

Bias-variance profile + effect of dataset size

boosting

bagging

Examples

Page 60: Open Machine Learning

Bias-variance profile + effect of dataset size

Examples

Page 61: Open Machine Learning

Taking it further

Seamless integration

• Webservice for sharing, querying experiments

• Integrate experiment sharing in ML tools (WEKA, KNIME, RapidMiner, R, ....)

• Mapping implementations, evaluation measures,...

• Online platform for custom querying, community interaction

• Semantic wiki: algorithm/data descriptions, rankings, ...

Page 62: Open Machine Learning

Experimentation Engine

• Controlled experimentation (Delve, MLComp)

• Download datasets, build training/test sets

• Feed training and test sets to algorithms, retrieve predictions/models

• Run broad set of evaluation measures

• Benchmarking (Cross-Validation), learning curve analysis, bias-variance analysis, workflows(!)

• Compute data properties for new datasets
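The core loop of such an engine (build training/test sets, feed them to an algorithm, collect predictions, evaluate) can be sketched in a few lines. This is a plain k-fold cross-validation written from scratch under simplifying assumptions: the function names and the majority-class learner are illustrative and are not part of Delve, MLComp, or the experiment database.

```python
import random


def k_fold_indices(n_examples, k, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(examples, labels, train_fn, predict_fn, k=10, seed=1):
    """Return overall accuracy of an algorithm under k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(examples), k, seed):
        model = train_fn([examples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, examples[i]) == labels[i] for i in test)
    return correct / len(examples)


# Hypothetical baseline learner: always predict the most frequent training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


examples = list(range(100))
labels = ["a"] * 70 + ["b"] * 30
accuracy = cross_validate(examples, labels, train_majority, predict_majority, k=10)
```

Because the engine controls the splits and the seed, every algorithm is evaluated on exactly the same folds, which is what makes the stored results comparable.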

Page 63: Open Machine Learning

Why would you use it? (seeding)

• Let the system run the experiments for you

• Immediate, highly detailed benchmarks (no repeats)

• Up to date, detailed results (vs. static, aggregated in journals)

• All your results organized online (private?), anytime, anywhere

• Interact with people (weird results?)

• Get credit for all your results (e.g. citations), unexpected results

• Visibility, new collaborations

• Check if your algorithm is really the best (e.g. active testing)

• On which datasets does it perform well/badly?

Page 64: Open Machine Learning

Question

Is open

machine learning possible?

Page 65: Open Machine Learning

http://expdb.cs.kuleuven.be

Thanks

Gracias

Xie Xie

Danke

Dank U

Merci

Efharisto

Dhanyavaad

Grazie

Spasiba

Kia ora

Tesekkurler

Diolch

Köszönöm

Arigato

Hvala

Toda