Dr Athanasios Tsanas (‘Thanasis’)
Oxford Centre for Industrial and Applied Mathematics
Institute of Biomedical Engineering, Dept. of Engineering Science
Sleep and Circadian Neuroscience Institute, Dept. of Medicine

Transcript of: Dr Athanasios Tsanas (‘Thanasis’)… · Indicative performance: Classifier accuracy as proxy for FS accuracy

Page 1

Dr Athanasios Tsanas (‘Thanasis’)

Oxford Centre for Industrial and Applied Mathematics Institute of Biomedical Engineering, Dept. of Engineering Science Sleep and Circadian Neuroscience Institute, Dept. of Medicine

Page 2

Tidal Generation: Rolls-Royce EMEC 500kW

Page 3

Tidal Generation: Rolls-Royce EMEC 500kW

Feature generation → Feature selection/transformation → Statistical mapping

[Figure: speech waveform, amplitude vs. time (s), annotated with the signal amplitude and the pitch period]

Characterise the signal

Page 4

Supervised learning setting

Samples   feature 1   feature 2   ...   feature M   Outcome
S1        3.1         1.3         ...   0.9         type 1
S2        3.7         1.0         ...   1.3         type 2
S3        2.9         2.6         ...   0.6         type 1
...
SN        1.7         2.0         ...   0.7         type 5

X: design matrix, N (samples) × M (features or characteristics); y: outcome

y = f(X)
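To make the y = f(X) notation concrete, here is a stdlib-only Python sketch (not from the slides) in which the learned mapping f is a 1-nearest-neighbour rule; the feature values reuse the toy design matrix above.

```python
# Minimal illustration of the supervised learning setting y = f(X):
# X is an N x M design matrix, y the outcomes, and f a learned mapping.
# Here f is a 1-nearest-neighbour rule (a toy choice for illustration).

def nearest_neighbour_fit(X, y):
    """Return a mapping f that labels a new sample like its closest training sample."""
    def f(x_new):
        dists = [sum((a - b) ** 2 for a, b in zip(x, x_new)) for x in X]
        return y[dists.index(min(dists))]
    return f

# Design matrix with N = 4 samples and M = 3 shown features (values from the table)
X = [[3.1, 1.3, 0.9],
     [3.7, 1.0, 1.3],
     [2.9, 2.6, 0.6],
     [1.7, 2.0, 0.7]]
y = ["type 1", "type 2", "type 1", "type 5"]

f = nearest_neighbour_fit(X, y)
print(f([3.0, 1.2, 0.8]))   # closest to S1 -> type 1
```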

Page 5

Thin and fat datasets (thin: samples outnumber features, N ≫ M; fat: features outnumber samples, M ≫ N)

Page 6

Curse of dimensionality

Many features M ⇒ curse of dimensionality

Page 7

Solution to the problem

Reduce the initial feature space of M features to m features, where m ≪ M, via:

Feature selection

Feature transformation

Page 8

Feature transformation

Manifold embedding (e.g. PCA)

Not easily interpretable

Page 9

Feature selection

Minimal feature subset with maximal predictive power

Interpretable

Page 10

Reminder: supervised learning

Samples   feature 1   feature 2   ...   feature M   Outcome
S1        3.1         1.3         ...   0.9         type 1
S2        3.7         1.0         ...   1.3         type 2
S3        2.9         2.6         ...   0.6         type 1
...
SN        1.7         2.0         ...   0.7         type 5

X: design matrix, N (samples) × M (features or characteristics); y: outcome

y = f(X)

Page 11

Functional mapping

y = f (X)

Simple approaches (LDA, kNN…)

Support Vector Machines (SVM)

Ensembles

Page 12

Ensembles

Ensemble = combination of learners

Sequential vs. parallel

Sequential: Learner1 → … → LearnerK → final model
Parallel: Learner1, …, LearnerK, combined into the final model
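A parallel combination can be sketched in a few lines of stdlib Python (not from the slides): each of the K learners predicts independently and the final model takes a majority vote. The three toy "learners" are hypothetical stand-ins for trained models.

```python
from collections import Counter

def majority_vote(learners, x):
    """Parallel ensemble: every learner votes independently; majority wins."""
    votes = [f(x) for f in learners]
    return Counter(votes).most_common(1)[0][0]

learners = [
    lambda x: 1 if x > 0 else -1,      # learner 1
    lambda x: 1 if x > -0.5 else -1,   # learner 2
    lambda x: 1 if x > 2 else -1,      # learner 3 (stricter threshold)
]
print(majority_vote(learners, 1.0))   # two of three vote +1 -> 1
```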

Page 13

Ensembles founding question

“Can a set of weak learners create a single strong learner?”

Page 14


A brief history of ensembles

Resampling data: bootstrapping → bagging
Classifier design: boosting → adaptive boosting (Adaboost) (Freund and Schapire, 1997)

Page 15

Decision trees

Powerful, conceptually simple learners

Page 16

Decision trees: an example

Page 17

Tree growing process I

Find best split & partitioning data into two sub-regions (nodes)

Exhaustive search

Determine the pair of half-planes $R_1(j,s)$ and $R_2(j,s)$:

$R_1(j,s) = \{\mathbf{X} \mid \mathbf{f}_j \le s\}$, $R_2(j,s) = \{\mathbf{X} \mid \mathbf{f}_j > s\}$

Stop when the number of samples in a node falls below some threshold (minimum node size)

Page 18

Tree growing process II

Optimal feature $\mathbf{f}_j$ and splitting point $s$ according to the loss function:

$$\min_{j,s} \left[ \min_{c_1} \sum_{\mathbf{x}_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{\mathbf{x}_i \in R_2(j,s)} (y_i - c_2)^2 \right]$$

where $c_1, c_2$ are the mean values of the $y_i$ in each node when using the sum of squares as the loss function:

$c_1 = \mathrm{mean}(y_i \mid \mathbf{x}_i \in R_1(j,s))$, $c_2 = \mathrm{mean}(y_i \mid \mathbf{x}_i \in R_2(j,s))$

Equations differ depending on the loss function to be optimized
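The exhaustive split search above can be sketched in stdlib Python (toy regression data, not from the slides): try every feature j and candidate split point s, set c1, c2 to the mean outcome in each half-plane, and keep the pair minimizing the sum-of-squares loss.

```python
# Exhaustive search for the best split (feature j, split point s),
# minimizing the sum-of-squares loss with c1, c2 the node means.

def best_split(X, y):
    best = None  # (loss, j, s)
    M = len(X[0])
    for j in range(M):
        values = sorted({x[j] for x in X})
        for s in values[:-1]:            # the largest value would leave R2 empty
            r1 = [yi for x, yi in zip(X, y) if x[j] <= s]
            r2 = [yi for x, yi in zip(X, y) if x[j] > s]
            c1 = sum(r1) / len(r1)       # c1 = mean(y_i | x_i in R1(j, s))
            c2 = sum(r2) / len(r2)       # c2 = mean(y_i | x_i in R2(j, s))
            loss = sum((yi - c1) ** 2 for yi in r1) + sum((yi - c2) ** 2 for yi in r2)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best

# Outcome depends on feature 0 only; the search should split on j = 0
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 2.0]]
y = [0.0, 0.0, 10.0, 10.0]
loss, j, s = best_split(X, y)
print(j, s)   # -> 0 1.0
```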

Page 19

Random Forests (RF)

Parallel combination of decision trees

Use many decision trees (typically 500)

Bagging

Each tree, at each node, accesses a random subset of $\sqrt{M}$ features

Majority voting
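The recipe above can be sketched with stdlib Python: bagging (bootstrap resampling), a random subset of ~sqrt(M) features per tree, and majority voting. For brevity each "tree" here is a single-split stump rather than a fully grown tree, and the well-separated toy data is invented, not from the slides.

```python
import math
import random
from collections import Counter

def fit_stump(X, y, feats):
    best = None  # (errors, feature j, split s, left label, right label)
    for j in feats:
        for s in sorted({x[j] for x in X})[:-1]:
            left = [yi for x, yi in zip(X, y) if x[j] <= s]
            right = [yi for x, yi in zip(X, y) if x[j] > s]
            ll = Counter(left).most_common(1)[0][0]
            rl = Counter(right).most_common(1)[0][0]
            err = sum(yi != (ll if x[j] <= s else rl) for x, yi in zip(X, y))
            if best is None or err < best[0]:
                best = (err, j, s, ll, rl)
    if best is None:  # degenerate bootstrap: chosen features were constant
        lbl = Counter(y).most_common(1)[0][0]
        return (feats[0], X[0][feats[0]], lbl, lbl)
    return best[1:]

def fit_forest(X, y, n_trees=51, seed=0):
    rng = random.Random(seed)
    M = len(X[0])
    m = max(1, round(math.sqrt(M)))                # features seen at each split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bagging: bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        forest.append(fit_stump(Xb, yb, rng.sample(range(M), m)))
    return forest

def forest_predict(forest, x):
    votes = [ll if x[j] <= s else rl for j, s, ll, rl in forest]
    return Counter(votes).most_common(1)[0][0]     # majority voting

# Two well-separated classes; every feature is informative in this toy example
X = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.3, 0.2],
     [0.3, 0.3, 0.2, 0.1], [0.1, 0.1, 0.1, 0.2],
     [2.1, 2.2, 2.0, 2.3], [2.2, 2.1, 2.3, 2.0],
     [2.0, 2.3, 2.1, 2.2], [2.3, 2.0, 2.2, 2.1]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(X, y)
print(forest_predict(forest, [0.2, 0.2, 0.2, 0.2]))
```

A real Random Forest grows each tree to full depth; the stump keeps the example short while preserving the three ingredients named on the slide.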

Page 20

Some results

Comparing Logistic Regression (LR) with Random Forests (RF)

Page 21

Some results

Comparing Support Vector Machines with Random Forests on one dataset

[Figure: misclassification rate vs. number of features (1–30) on the Sonar dataset; one panel for SVM, one for RF; feature selection methods compared: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Page 22

Final remarks on Random Forests

A state-of-the-art learner

Very robust to hyper-parameter selection (just use it off-the-shelf!)

Bonus: provides feature importance scores

Weakness: cannot capture linear relationships well (it works in piecewise-constant steps internally)

Powerful in multi-class classification settings

Page 23

Boosting and adaptive boosting

Boosting: reduce bias by combining sequential learners

Adaptive boosting: adjust the weights of the samples

Page 24

Sequential ensemble: Adaboost

Train sequential learners

Introduce weights on misclassified samples

Sensitive to noise and outliers in the data

Early claims of no overfitting; later results showed it can overfit the data

Can be used with different types of base learners

Converges provided every weak learner is better than chance

Page 25

Adaboost

Given $\{(\mathbf{x}_i, y_i)\}_{i=1\ldots N}$, with $\mathbf{x}_i \in \mathbb{R}^M$ and $y_i \in \{+1, -1\}$

Initialize the weights $\mathbf{D}_1$ for each of the samples: $\mathbf{D}_1(i) = \frac{1}{N}$

For $t = 1 \ldots T$ sequential weak learners:

- Find the learner $f(\mathbf{D}_t)$ minimizing the weighted loss $\sum_i \mathbf{D}_t(i)\, L(y_i, \hat{y}_i)$

- Ensure that the weak learner is better than chance (error < 0.5)

- Update the weights $\mathbf{D}_{t+1}(i)$, increasing them for the samples with $\hat{y}_i \neq y_i$

Free parameter: influence of misclassified samples
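The loop above can be sketched in stdlib Python. This is a hedged sketch, not the slide's exact formulation: decision stumps serve as the weak learners, and the learner-weight and reweighting formulas are the standard AdaBoost choices (the slide leaves them unspecified); the toy data is invented.

```python
import math

def stump(j, s, pol, x):
    """Weak learner: predict +pol if x[j] <= s, else -pol."""
    return pol if x[j] <= s else -pol

def fit_adaboost(X, y, T=5):
    N = len(X)
    D = [1.0 / N] * N                              # D_1(i) = 1/N
    model = []
    for _ in range(T):
        # find the stump minimizing the D-weighted misclassification loss
        best = None
        for j in range(len(X[0])):
            for s in {x[j] for x in X}:
                for pol in (+1, -1):
                    err = sum(w for w, x, yi in zip(D, X, y)
                              if stump(j, s, pol, x) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, s, pol)
    # wrong indentation would break the loop; keep the update inside it
        err, j, s, pol = best
        if err >= 0.5:                             # must beat chance
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        model.append((alpha, j, s, pol))
        # increase the weights of misclassified samples, then renormalize
        D = [w * math.exp(-alpha * yi * stump(j, s, pol, x))
             for w, x, yi in zip(D, X, y)]
        Z = sum(D)
        D = [w / Z for w in D]
    return model

def predict(model, x):
    score = sum(a * stump(j, s, pol, x) for a, j, s, pol in model)
    return 1 if score >= 0 else -1                 # weighted voting

X = [[0.0], [1.0], [2.0], [3.0]]
y = [1, 1, -1, -1]
model = fit_adaboost(X, y)
print([predict(model, x) for x in X])   # -> [1, 1, -1, -1]
```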

Page 26

Summary of Adaboost

Easy to implement

Generic framework – can work with any type of learner

Competitive with Random Forests – no clear winner

Variants for robustness (see Matlab's native function “fitensemble”)

Page 27

Adaboost versus RF

Instead of bootstrapping (RF), uses sample reweighting

Instead of majority voting (RF), uses weighted voting

Explicitly revisits samples misclassified by previous weak learners

Page 28

Choosing algorithms

What kind of data do you have? Labeled data? Data growing daily?

Often the biggest practical challenge is creating or obtaining enough training data

For many problems and algorithms, hundreds or thousands of examples from each class are required to produce a high-performance classifier

Number of samples, features, correlations, interactions… use plots.

No free lunch theorem: no universally best learner!

Page 29

Thanasis’ rules of thumb

Get to know your data before any processing!

• (1) Plot densities and scatter plots – univariate analysis and an intuitive “feel”; perhaps these suggest a simple data transformation

• (2) Compute correlations and the correlation matrix; identify interactions (e.g. via partial correlations and conditional mutual information)

• (3) Feature selection – a simple guide: if there are low correlations use LASSO; if there are few (low) interactions use mRMR

• (4) In my experience, SVMs may work better for binary-class classification problems, and RF may work better for multi-class classification problems

I would strongly advise testing both SVM and RF (and possibly other statistical mapping algorithms) on the problems you study

Page 30

Appendix: food for thought

A Unifying Review of Linear Gaussian Models

http://www.cs.nyu.edu/~roweis/papers/NC110201.pdf

an excellent paper by Sam Roweis and Zoubin Ghahramani unifying linear models

Statistical modelling: the two cultures
http://faculty.smu.edu/tfomby/eco5385/lecture/Breiman%27s%20Two%20Cultures%20paper.pdf
A great introduction to first principles versus statistical modelling, by Leo Breiman

Relations Between Machine Learning Problems

http://videolectures.net/nipsworkshops2011_williamson_machine/

Check out these lectures!

Page 31

Additional Slides

Page 32

Feature selection concepts

Relevance · Redundancy · Complementarity

Page 33

LASSO (L1 regularization)

Start with classical ordinary least squares regression

L1 penalty: sparsity promoting, some coefficients become 0
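The sparsity effect of the L1 penalty can be sketched with stdlib Python via cyclic coordinate descent with soft-thresholding. This is a hedged sketch: the lambda scaling convention is one of several in use, and the toy data is invented, not from the slides.

```python
def soft_threshold(z, g):
    """The L1 proximal step: shrink z toward 0 by g; exact zeros give sparsity."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso(X, y, lam, n_sweeps=100):
    M = len(X[0])
    beta = [0.0] * M
    for _ in range(n_sweeps):
        for j in range(M):
            # residual with feature j's own contribution removed
            r = [yi - sum(beta[k] * x[k] for k in range(M) if k != j)
                 for x, yi in zip(X, y)]
            rho = sum(x[j] * ri for x, ri in zip(X, r))
            zj = sum(x[j] ** 2 for x in X)
            beta[j] = soft_threshold(rho, lam) / zj
    return beta

# Outcome depends only on feature 0 (y = 2 * f1); the L1 penalty zeroes out f2
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.1], [5.0, 0.4]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
beta = lasso(X, y, lam=0.5)
print(beta[1])   # -> 0.0 (coefficient driven exactly to zero)
```

The features whose coefficients survive the shrinkage form the selected subset.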

Page 34

RELIEF

Concept: work with nearest neighbours

Nearest hit (NH) and nearest miss (NM)

Great for datasets with interactions, but does not account for information redundancy
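The nearest-hit/nearest-miss idea can be sketched in stdlib Python (a hedged, one-pass sketch with squared Euclidean distances; the toy data is invented, not from the slides): features that separate a sample from its nearest miss (NM) more than from its nearest hit (NH) gain weight.

```python
def relief(X, y):
    M = len(X[0])
    w = [0.0] * M
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    for i, (xi, yi) in enumerate(zip(X, y)):
        hits = [x for k, (x, yk) in enumerate(zip(X, y)) if k != i and yk == yi]
        misses = [x for x, yk in zip(X, y) if yk != yi]
        nh = min(hits, key=lambda x: sq_dist(x, xi))    # nearest hit (NH)
        nm = min(misses, key=lambda x: sq_dist(x, xi))  # nearest miss (NM)
        for j in range(M):
            # reward separation from NM, penalize separation from NH
            w[j] += abs(xi[j] - nm[j]) - abs(xi[j] - nh[j])
    return w

# Feature 0 separates the classes; feature 1 is noise and gets a low weight
X = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
w = relief(X, y)
print(w[0] > w[1])   # -> True
```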

Page 35

mRMR

minimum Redundancy Maximum Relevance (mRMR)

Generally works very well
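The slide gives no formula, so here is a hedged stdlib-Python sketch of the greedy mRMR idea in its relevance-minus-mean-redundancy ("MID") form, taking precomputed absolute associations as input; all the numbers below are invented for illustration.

```python
def mrmr(relevance, redundancy, m):
    """relevance[i]: association of feature i with the target;
    redundancy[i][j]: association between features i and j.
    Greedily pick m features, trading relevance against mean redundancy."""
    selected = [max(range(len(relevance)), key=lambda i: relevance[i])]
    while len(selected) < m:
        rest = [i for i in range(len(relevance)) if i not in selected]
        score = lambda i: relevance[i] - sum(redundancy[i][j] for j in selected) / len(selected)
        selected.append(max(rest, key=score))
    return selected

relevance = [0.9, 0.85, 0.5]         # feature 1 is relevant but...
redundancy = [[1.0, 0.95, 0.1],      # ...nearly a copy of feature 0,
              [0.95, 1.0, 0.1],      # so mRMR prefers feature 2 instead
              [0.1, 0.1, 1.0]]
print(mrmr(relevance, redundancy, 2))   # -> [0, 2]
```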

Page 36

My new algorithm RRCT

$$\mathrm{RRCT} \;\triangleq\; \max_{i \,\in\, Q-S} \Bigg[ \underbrace{r_{\mathrm{IT}}(\mathbf{f}_i;\mathbf{y})}_{\text{relevance}} \;-\; \underbrace{\frac{1}{|S|}\sum_{s \in S} r_{\mathrm{IT}}(\mathbf{f}_i;\mathbf{f}_s)}_{\text{redundancy}} \;+\; \underbrace{\mathrm{sign}\big(r_{\mathrm{p}}(\mathbf{f}_i;\mathbf{y}|S)\big) \cdot \mathrm{sign}\big(r_{\mathrm{p}}(\mathbf{f}_i;\mathbf{y}|S) - r(\mathbf{f}_i;\mathbf{y})\big) \cdot r_{\mathrm{p,IT}}}_{\text{complementarity}} \Bigg]$$

https://www.youtube.com/watch?v=LJS7Igvk6ZM

“The best result will come when everyone in the group is doing what is best for himself… and the group.”

Page 37

$$r_{\mathrm{p}}(X, Y|Z) = \frac{r(X,Y) - r(X,Z) \cdot r(Y,Z)}{\sqrt{\big(1 - r^2(X,Z)\big)\big(1 - r^2(Y,Z)\big)}}$$
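The first-order partial correlation formula can be checked with a few lines of stdlib Python (the toy vectors are illustrative only, not from the slides):

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def partial_corr(x, y, z):
    """r_p(X, Y | Z): correlation of X and Y with Z's linear effect removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]   # Y = 2X, so r(X, Y) = 1 and conditioning cannot reduce it
Z = [1, 3, 2, 5, 4]
print(round(partial_corr(X, Y, Z), 6))   # -> 1.0
```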

My new algorithm RRCT

$$\mathbf{D} = -0.5 \cdot \log \begin{pmatrix} 1 - r_1^2 & 1 - \rho_{12}^2 & \cdots & 1 - \rho_{1M}^2 \\ 1 - \rho_{12}^2 & 1 - r_2^2 & \cdots & 1 - \rho_{2M}^2 \\ \vdots & \vdots & \ddots & \vdots \\ 1 - \rho_{1M}^2 & 1 - \rho_{2M}^2 & \cdots & 1 - r_M^2 \end{pmatrix}$$

Relevance & Redundancy trade-off

Complementarity

Page 38

My new algorithm RRCT

[Figure: information content vs. correlation coefficient (−1 to 1); the curve asymptotically tends to infinity as |r| → 1]

Normalizing the pdfs of variables

Information-theoretic transformation: $r_{\mathrm{IT}} = -0.5 \cdot \log(1 - r^2)$
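The transformation is a one-liner in stdlib Python, which makes its behaviour easy to verify: it is 0 for r = 0 and grows without bound as |r| approaches 1.

```python
import math

def r_it(r):
    """Information-theoretic transformation of a correlation coefficient:
    r_IT = -0.5 * log(1 - r^2); 0 when uncorrelated, unbounded as |r| -> 1."""
    return -0.5 * math.log(1 - r ** 2)

print(r_it(0.9) > r_it(0.5))   # stronger correlation -> more information content
```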

Page 39

Validation setting

Matching ‘true’ feature subset

o Possible only for artificial datasets

Maximize the out of sample prediction performance

o adds an additional ‘layer’: the learner

o potential problem: feature exportability

o BUT… in practice this is really what is of most interest!

Page 40

Dataset info

Dataset                  Design matrix   Associated task               Type
MONK1 [35]               124×6           Classification (2 classes)    D (6)
Artificial 1             500×150         Classification (2 classes)    C (150)
Artificial 2             1000×100        Classification (10 classes)   C (100)
Hepatitis                155×19          Classification (2 classes)    C (17), D (2)
Parkinson's [35]         195×22          Classification (2 classes)    C (22)
Sonar [35]               208×60          Classification (2 classes)    C (60)
Wine [35]                178×13          Classification (3 classes)    C (13)
Image segmentation [35]  2310×19         Classification (7 classes)    C (16), D (3)
Cardiotocography [35]    2129×21         Classification (10 classes)   C (14), D (7)
Ovarian cancer           72×592          Classification (2 classes)    C (592)
SRBCT                    88×2308         Classification (4 classes)    C (2308)

(C: continuous features, D: discrete features)

Page 41

False Discovery Rate (FDR)

[Figure: False Discovery Rate (FDR) vs. number of features; one panel for Artificial1 (features 1–10), one for Artificial2 (features 1–20); methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Artificial datasets with known ground truth

Page 42

Indicative performance

Classifier accuracy as proxy for FS accuracy

[Figure: misclassification rate vs. number of features (1–30) on the Sonar dataset; one panel for RF, one for SVM; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Page 43

Indicative performance

Classifier accuracy as proxy for FS accuracy

[Figure: misclassification rate vs. number of features (1–19) on the Image segmentation dataset; one panel for SVM, one for RF; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Page 44

Indicative performance

Classifier accuracy as proxy for FS accuracy

[Figure: misclassification rate vs. number of features (1–20) on the Cardiotocography dataset; one panel for SVM, one for RF; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Page 45

[Figure: misclassification rate (RF) vs. number of features on the Ovarian cancer (1–30) and SRBCT (1–50) datasets; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]

Performance in fat datasets

Very challenging setting for many algorithms

Page 46

Supervised learning setting

f: mapping   X: design matrix   y: outcome

Samples   feature 1   feature 2   ...   feature M
S1        3.1         1.3         ...   0.9
S2        3.7         1.0         ...   1.3
S3        2.9         2.6         ...   0.6
...
SN        1.7         2.0         ...   0.7