Feature selection, SVM-based classification and application to mass spectrometry data analysis. Elena Marchiori, Department of Computer Science, Vrije Universiteit Amsterdam
Posted: 21-Dec-2015

Page 1

Feature selection, SVM-based classification and application to

mass spectrometry data analysis

Elena Marchiori

Department of Computer Science

Vrije Universiteit Amsterdam

Page 2

Overview

• Support Vector Machines

• Variable selection

• Application in Bioinformatics

Page 3

Support Vector Machines

• Advantages:
– maximize the margin between two classes in the feature space characterized by a kernel function
– are robust with respect to high input dimension

• Disadvantages:
– difficult to incorporate background knowledge
– sensitive to outliers

Page 4

Linear Separators

Page 5

Hyperplane Classifiers

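$$\langle w, x_i\rangle + b \ge +1 \quad \text{for } y_i = +1$$

$$\langle w, x_i\rangle + b \le -1 \quad \text{for } y_i = -1$$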
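The two inequalities combine into the canonical form y_i(⟨w, x_i⟩ + b) ≥ 1. Under that scaling, the distance between the hyperplanes ⟨w, x⟩ + b = +1 and ⟨w, x⟩ + b = −1 is 2/‖w‖, which is why the next slide minimizes ½‖w‖². The sketch below checks the constraints and computes the margin width. It is written in plain Python, and the function names and toy points are hypothetical.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def is_canonical_separator(w, b, X, y):
    """True if y_i * (<w, x_i> + b) >= 1 holds for every training point."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))

def margin_width(w):
    """Distance between the hyperplanes <w, x> + b = +1 and <w, x> + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical toy data: two points per class in the plane.
X = [[2.0, 2.0], [3.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
y = [1, 1, -1, -1]

w, b = [1.0, 1.0], -3.0
print(is_canonical_separator(w, b, X, y))  # True
print(margin_width(w))                     # 2 / sqrt(2), about 1.414

# Halving w and b describes the same hyperplane but breaks the canonical scaling.
print(is_canonical_separator([0.5, 0.5], -1.5, X, y))  # False
```

Both (w, b) pairs describe the same geometric hyperplane. Only the canonical one makes ‖w‖ measure the margin.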

Page 6

SVM

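• To construct the optimal hyperplane

– Minimize
$$\tau(w) = \frac{1}{2}\|w\|^2$$

– Subject to
$$y_i\big(\langle w, x_i\rangle + b\big) \ge 1, \quad i = 1,\dots,l$$

• Constrained optimization problem with Lagrangian (multipliers $\alpha_i \ge 0$)
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l}\alpha_i\Big(y_i\big(\langle x_i, w\rangle + b\big) - 1\Big)$$

and stationarity conditions
$$\frac{\partial}{\partial b}L(w, b, \alpha) = 0, \qquad \frac{\partial}{\partial w}L(w, b, \alpha) = 0$$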

Page 7

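SVM

• Primal variables vanish: setting the derivatives of L to zero gives
$$\sum_{i=1}^{l}\alpha_i y_i = 0, \qquad w = \sum_{i=1}^{l}\alpha_i y_i x_i$$

• KKT condition
$$\alpha_i\Big[y_i\big(\langle x_i, w\rangle + b\big) - 1\Big] = 0, \quad i = 1,\dots,l$$

• Support vectors: the $x_i$ whose $\alpha_i$ is nonzero

• Optimization problem (dual)

– Maximize
$$W(\alpha) = \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j y_i y_j\langle x_i, x_j\rangle$$

– Subject to
$$\alpha_i \ge 0,\ i = 1,\dots,l, \qquad \sum_{i=1}^{l}\alpha_i y_i = 0$$

– Decision function
$$f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{l}\alpha_i y_i\langle x, x_i\rangle + b\Big)$$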
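Here is a tiny worked instance of these formulas on a hypothetical two-point dataset: x1 = (1, 0) with y1 = +1, and x2 = (−1, 0) with y2 = −1. The constraint Σ α_i y_i = 0 forces α1 = α2 = α. The dual objective then reduces to W(α) = 2α − 2α², which is maximized at α = 1/2. The sketch below does three things:

* It recovers w from the α's.
* It recovers b from the KKT condition at a support vector.
* It checks that the dual optimum W(α) equals the primal value ½‖w‖², and that complementary slackness holds.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy problem: one point per class.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]  # maximizer of W(alpha) for this data (see text)
n_train = len(X)

# Dual constraints: alpha_i >= 0 and sum_i alpha_i y_i = 0.
assert all(a >= 0 for a in alpha)
assert sum(a * yi for a, yi in zip(alpha, y)) == 0

# Primal weights from dL/dw = 0: w = sum_i alpha_i y_i x_i.
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n_train)) for k in range(len(X[0]))]

# Bias from the KKT condition at a support vector k (alpha_k > 0):
# y_k (<x_k, w> + b) = 1  =>  b = y_k - <x_k, w>  (because y_k is +1 or -1).
k = 0
b = y[k] - dot(X[k], w)

# Dual objective W(alpha) and primal objective (1/2)||w||^2.
W = sum(alpha) - 0.5 * sum(
    alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n_train) for j in range(n_train)
)
primal = 0.5 * dot(w, w)

# Complementary slackness: alpha_i [y_i(<x_i, w> + b) - 1] = 0 for every i.
kkt = [alpha[i] * (y[i] * (dot(X[i], w) + b) - 1) for i in range(n_train)]

print(w, b)       # [1.0, 0.0] 0.0
print(W, primal)  # 0.5 0.5
print(kkt)        # [0.0, 0.0]
```

Both points are support vectors here, so either choice of k gives the same b.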

Page 8

SVM: separable classes

[Figure: optimal hyper-plane with margin ρ and support vectors]

Support vectors uniquely characterize the optimal hyper-plane

Page 9

SVM and outliers

[Figure: an outlier]

Page 10

Soft Margin Classification

• What if the training set is not linearly separable?

• Slack variables ξi can be added to allow misclassification of difficult or noisy examples.

[Figure: examples with slack variables ξj and ξk]

Page 11

Weakening the constraints

Allow the objects not to satisfy the constraints strictly

Introduce ‘slack’ variables

Page 12

SVC with slacks

The optimization problem changes into:
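The formula on this slide did not survive extraction. The version below is the standard soft-margin (C-SVC) formulation. It is consistent with the hard-margin problem on the earlier slides and with the tradeoff parameter C discussed next, but that it is what the slide showed is an assumption:

```latex
\min_{w,\,b,\,\xi}\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i
\quad\text{subject to}\quad
y_i\big(\langle w, x_i\rangle + b\big) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,l
```

In the corresponding dual, the only change from the separable case is that each multiplier becomes box-constrained: 0 ≤ α_i ≤ C.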

Page 13

Tradeoff parameter C

Note that the tradeoff parameter C must be chosen before training.

It weighs the training error (the sum of the slack variables ξi) against the structural error (the ‖w‖² term, which controls the width of the margin).

Its value is often optimized using cross-validation.

Page 14

Influence of C

Erroneous objects can still have a (large) influence on the solution
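To see how C shifts the balance, the sketch below evaluates the soft-margin objective ½w² + C Σ ξ_i, with ξ_i = max(0, 1 − y_i(w x_i + b)). It does so for two hand-picked one-dimensional hyperplanes on hypothetical data containing one mislabelled point:

* Hyperplane A (w = 1, b = 0) has a wide margin and absorbs the outlier with slack 1.5.
* Hyperplane B (w = 4, b = −3) separates every point but has a narrow margin.

Their objectives are 0.5 + 1.5C and 8, so they tie at C = 5. Below that A is preferred; above it, a single erroneous object pulls the solution to B. The sketch only compares these two candidates; it does not solve the optimization problem.

```python
def soft_margin_objective(w, b, X, y, C):
    """(1/2) w^2 + C * sum of slacks, for a 1-D linear classifier f(x) = w x + b."""
    slacks = [max(0.0, 1.0 - yi * (w * xi + b)) for xi, yi in zip(X, y)]
    return 0.5 * w * w + C * sum(slacks)

# Hypothetical 1-D data; the last point (x = 0.5, y = -1) plays the outlier.
X = [-2.0, -1.0, 1.0, 2.0, 0.5]
y = [-1, -1, 1, 1, -1]

# Two hand-picked candidate hyperplanes (w, b).
candidates = {"A": (1.0, 0.0), "B": (4.0, -3.0)}

def preferred(C):
    """Name of the candidate with the lower soft-margin objective for this C."""
    scores = {name: soft_margin_objective(w, b, X, y, C)
              for name, (w, b) in candidates.items()}
    return min(scores, key=scores.get)

for C in (1.0, 10.0):
    print(C, preferred(C))  # 1.0 A, then 10.0 B
```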

Page 15

Classifying new examples

• Once the parameters (α*, b*) are found by solving the required quadratic optimisation on the training set of points, the SVM is ready to be used for classifying new points.

• Given a new point x, its class membership is sign[f(x, α*, b*)], where
$$f(x, \alpha^*, b^*) = \langle w^*, x\rangle + b^* = \sum_{i=1}^{N}\alpha_i^* y_i\langle x_i, x\rangle + b^* = \sum_{i\in SV}\alpha_i^* y_i\langle x_i, x\rangle + b^*$$

Data enters only in the form of dot products!
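A minimal Python sketch of this decision rule follows. The function names and the two-support-vector model are hypothetical. The only operation applied to the new point x is an inner product with each support vector, so the `kernel` argument can later be swapped for any other kernel function.

```python
def linear_kernel(u, v):
    """Plain dot product; another kernel K(u, v) can be passed in its place."""
    return sum(a * b for a, b in zip(u, v))

def svm_score(x, support_vectors, alphas, labels, b, kernel=linear_kernel):
    """f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x), plus b."""
    return sum(a * yi * kernel(xi, x)
               for xi, a, yi in zip(support_vectors, alphas, labels)) + b

def svm_classify(x, support_vectors, alphas, labels, b, kernel=linear_kernel):
    """Class membership sign[f(x)]; a score of exactly 0 is assigned to +1 here."""
    score = svm_score(x, support_vectors, alphas, labels, b, kernel)
    return 1 if score >= 0 else -1

# Hypothetical trained model: two support vectors with alpha* = 1/2 and b* = 0,
# which corresponds to w* = (1, 0).
sv = [[1.0, 0.0], [-1.0, 0.0]]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0

print(svm_score([2.0, 3.0], sv, alphas, labels, b))      # 2.0, equal to <w*, x> + b*
print(svm_classify([-0.5, 4.0], sv, alphas, labels, b))  # -1
```

Points with α_i = 0 contribute nothing, so storing only the support vectors is enough.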

Page 16

Non-linear SVMs

• Datasets that are linearly separable with some noise work out great.

• But what are we going to do if the dataset is just too hard?

• How about… mapping data to a higher-dimensional space:

[Figures: one-dimensional datasets on the x axis; the hard dataset mapped to the (x, x²) plane]

Page 17

Non-linear SVMs: Feature Spaces

• Map the original feature space to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

Page 18

The “Kernel Trick”

• The linear classifier relies on the inner product between vectors, $K(x_i,x_j) = x_i^T x_j$.

• If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes
$$K(x_i,x_j) = \phi(x_i)^T\phi(x_j)$$

• A kernel function is a function that corresponds to an inner product in some expanded feature space.

• Example: 2-dimensional vectors $x = [x_1\ x_2]$; let $K(x_i,x_j) = (1 + x_i^T x_j)^2$.

Need to show that $K(x_i,x_j) = \phi(x_i)^T\phi(x_j)$:
$$\begin{aligned}
K(x_i,x_j) &= (1 + x_i^T x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2\,x_{i1}x_{j1}x_{i2}x_{j2} + x_{i2}^2 x_{j2}^2 + 2\,x_{i1}x_{j1} + 2\,x_{i2}x_{j2} \\
&= [1\ \ x_{i1}^2\ \ \sqrt{2}\,x_{i1}x_{i2}\ \ x_{i2}^2\ \ \sqrt{2}\,x_{i1}\ \ \sqrt{2}\,x_{i2}]^T\,[1\ \ x_{j1}^2\ \ \sqrt{2}\,x_{j1}x_{j2}\ \ x_{j2}^2\ \ \sqrt{2}\,x_{j1}\ \ \sqrt{2}\,x_{j2}] \\
&= \phi(x_i)^T\phi(x_j), \quad\text{where } \phi(x) = [1\ \ x_1^2\ \ \sqrt{2}\,x_1x_2\ \ x_2^2\ \ \sqrt{2}\,x_1\ \ \sqrt{2}\,x_2]
\end{aligned}$$
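The identity can also be checked numerically. The sketch below compares the kernel value with the explicit inner product in the 6-dimensional feature space. It uses plain Python, and the test points are arbitrary. The two values agree up to floating-point rounding, yet the kernel never has to build φ(x).

```python
import math

def poly2_kernel(xi, xj):
    """K(xi, xj) = (1 + xi^T xj)^2 for 2-dimensional inputs."""
    return (1.0 + xi[0] * xj[0] + xi[1] * xj[1]) ** 2

def phi(x):
    """Explicit 6-dimensional feature map from the slide."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

pairs = [([1.0, 2.0], [3.0, -1.0]), ([0.5, -1.5], [2.0, 0.25])]
for xi, xj in pairs:
    k = poly2_kernel(xi, xj)
    explicit = dot(phi(xi), phi(xj))
    print(k, explicit, math.isclose(k, explicit))
```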

Page 19

Examples of kernels

• Example 1: 2D input space, 3D feature space
$$\phi(x) = \big(x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2\big), \qquad K(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle = \langle x_i, x_j\rangle^2$$

• Example 2:
$$K(x_i,x_j) = \exp\{-\|x_i - x_j\|^2 / 2\sigma^2\}$$
In this case the dimension of φ is infinite.

• Note: not every function is a proper kernel. Mercer's theorem characterises proper kernels.

• To test a new input x when working with kernels:
$$f(x) = \operatorname{sign}\Big(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\Big)$$
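Mercer's theorem implies that any Gram matrix built from a proper kernel is positive semidefinite. The sketch below computes the Gaussian kernel of Example 2 and its Gram matrix on a few hypothetical points. It then tests the matrix with a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. That is slightly stronger than the semidefiniteness Mercer guarantees, so a singular but valid Gram matrix would also be rejected. For contrast, plain Euclidean distance is not a kernel: its Gram matrix has a zero diagonal and positive off-diagonal entries, so it is indefinite.

```python
import math

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """K[i][j] = kernel(X[i], X[j])."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def is_positive_definite(K, eps=1e-12):
    """Try a Cholesky factorization K = L L^T of a symmetric matrix K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:  # non-positive pivot: factorization fails
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # hypothetical inputs
print(is_positive_definite(gram_matrix(X, rbf_kernel)))  # True
print(is_positive_definite(gram_matrix(X, math.dist)))   # False: distance is not a kernel
```

A passing check on one Gram matrix does not prove a function is a kernel. A failing check on a non-singular matrix does show that it is not one.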

Page 20
Page 21

SVM applications

• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.

• SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.

• SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. ’97], principal component analysis [Schölkopf et al. ’99], etc.

• The most popular optimization algorithms for SVMs are SMO [Platt ’99] and SVMlight [Joachims ’99]; both use decomposition to hill-climb over a subset of the αi at a time.

• Tuning SVMs remains a black art: selecting a specific kernel and parameters is usually done in a try-and-see manner.

Page 22

Variable Selection

• Select a subset of “relevant” input variables

• Advantages:
– it is cheaper to measure fewer variables
– the resulting classifier is simpler and potentially faster
– prediction accuracy may improve by discarding irrelevant variables
– identifying relevant variables gives more insight into the nature of the corresponding classification problem (biomarker detection)

Page 23

Approaches

• Wrapper
– feature selection takes into account the contribution to the performance of a given type of classifier

• Filter
– feature selection is based on an evaluation criterion for quantifying how well features (or feature subsets) discriminate the two classes

• Embedded
– feature selection is part of the training procedure of a classifier (e.g. decision trees)

Page 24

SVM-RFE: wrapper

• Recursive Feature Elimination:
– Train a linear SVM, obtaining a linear decision function
– Use the absolute values of the variable weights to rank the variables
– Remove the lower-ranked half of the variables
– Repeat the above steps (train, rank, remove) on the data restricted to the variables not removed

• Output: subset of variables

Page 25

SVM-RFE

• Linear binary classifier decision function
$$f(x_1,\dots,x_N) = \sum_{i=1}^{N} w_i x_i + b$$

• Score of variable $x_i$: $|w_i|$

• Recursive Feature Elimination (SVM-RFE), at each iteration:
1) eliminate threshold% of the variables with the lowest score
2) recompute the scores of the remaining variables
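Below is a dependency-free Python sketch of SVM-RFE. It makes several assumptions:

* The linear SVM is trained by a simple batch subgradient method on λ/2‖w‖² plus the mean hinge loss, where λ plays the role of 1/(nC). This solver is a stand-in chosen to keep the example self-contained; it is not the solver used by Guyon et al.
* The score of each variable is |w_i|.
* `fraction = 0.5` reproduces the "remove half" rule of the previous slide.

The function names, hyperparameters and toy data are hypothetical. In the toy data only variables 0 and 1 carry class information.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimize lam/2 ||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # decreasing (Pegasos-style) step size
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term active: subgradient is -y_i x_i
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def svm_rfe(X, y, n_keep, fraction=0.5):
    """Indices of the n_keep variables that survive recursive feature elimination."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        # Retrain on the data restricted to the surviving variables.
        Xk = [[row[j] for j in kept] for row in X]
        w, _ = train_linear_svm(Xk, y)
        # Rank surviving variables by score |w_j|, lowest first (stable on ties).
        order = sorted(range(len(kept)), key=lambda k: abs(w[k]))
        n_drop = min(max(1, int(len(kept) * fraction)), len(kept) - n_keep)
        dropped = set(order[:n_drop])
        kept = [f for k, f in enumerate(kept) if k not in dropped]
    return kept

# Hypothetical toy data: variables 0 and 1 follow the label, 2 and 3 are noise.
X = [[1.0, 0.5, 0.25, 0.75],
     [1.0, 0.5, 0.75, 0.25],
     [-1.0, -0.5, 0.25, 0.75],
     [-1.0, -0.5, 0.75, 0.25]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y, n_keep=2))  # [0, 1]
print(svm_rfe(X, y, n_keep=1))  # [0]
```

Retraining after every elimination is what distinguishes RFE from ranking the variables once by the weights of a single SVM.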

Page 26

SVM-RFE

I. Guyon et al., Machine Learning, 46, 389-422, 2002

Page 27

RELIEF: filter

• Idea: relevant variables make nearest examples of the same class closer and push nearest examples of opposite classes farther apart.

• Algorithm RELIEF:
1. Initialize the weights of the variables to zero.
2. For all examples in the training set:
– find the nearest example from the same class (hit) and from the opposite class (miss)
– update the weight of each variable by adding abs(example - miss) - abs(example - hit)
3. Rank variables using weights
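The sketch below is a direct Python transcription of the three steps above. It assumes:

* two classes, with at least two examples per class;
* squared Euclidean distance for finding the nearest hit and miss (the slide does not specify the metric);
* no rescaling of the variables.

Like the slide, it loops over all training examples rather than a random sample. The names and toy data are hypothetical. Variable 0 separates the classes while variable 1 is noise, so variable 0 should rank first.

```python
def relief(X, y):
    """Basic two-class RELIEF: returns (weights, ranking of variables, best first)."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Nearest example of the same class (hit) and of the opposite class (miss).
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: sq_dist(X[i], X[j]))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: sq_dist(X[i], X[j]))
        for f in range(d):
            weights[f] += abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
    ranking = sorted(range(d), key=lambda f: weights[f], reverse=True)
    return weights, ranking

# Hypothetical data: variable 0 separates the classes, variable 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights, ranking = relief(X, y)
print(weights, ranking)  # weights close to [3.1, -1.0], ranking [0, 1]
```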

Page 28

Application in Bioinformatics

Biomarker detection with Mass Spectrometric data of mixed quality

Page 29

What does a mass spectrometer do?

1. It measures mass better than any other technique.

2. It can give information about chemical structures.

What are mass measurements good for?

To identify, verify, and quantitate: metabolites, recombinant proteins, proteins isolated from natural sources, oligonucleotides, drug candidates, peptides, synthetic organic chemicals, polymers

Slides from University of California San Francisco

Page 30

Applications of Mass Spectrometry

• Pharmaceutical analysis
– Bioavailability studies
– Drug metabolism studies, pharmacokinetics
– Characterization of potential drugs
– Drug degradation product analysis
– Screening of drug candidates
– Identifying drug targets

• Biomolecule characterization
– Proteins and peptides
– Oligonucleotides

• Environmental analysis
– Pesticides on foods
– Soil and groundwater contamination

• Forensic analysis/clinical

Slides from University of California San Francisco

Page 31

Summary: acquiring a mass spectrum

Inlet (solid, liquid, vapor) → Ion source (ionization: form ions, i.e. charged molecules) → Mass analyzer (mass sorting/filtering: sort ions by mass, m/z) → Ion detector (detection: detect ions) → Mass spectrum

[Figure: example mass spectrum, relative abundance (0 to 100) vs. m/z (about 1330 to 1350)]

Slides from University of California San Francisco

Page 32

MALDI: Matrix Assisted Laser Desorption Ionization

1. Sample is mixed with matrix (X) and dried on plate.

2. Laser flash ionizes matrix molecules.

3. Sample molecules (M) are ionized by proton transfer: XH+ + M → MH+ + X.

[Figure: laser (hν) pulse on the sample plate (+/- 20 kV); MH+ ions accelerated toward a grid (0 V)]

Slides from University of California San Francisco

Page 33

Time-of-flight (TOF) Mass Analyzer

[Figure: positive ions accelerated from the source by voltage V through the drift region (flight tube) to the detector]

• Measures the time for ions to reach the detector.

• Small ions reach the detector before large ones.

Slides adapted from University of California San Francisco
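The second bullet follows from energy conservation in an idealized linear TOF analyzer. An ion of mass m and charge z accelerated through a potential V leaves the source with kinetic energy zeV = ½mv². It therefore crosses a drift region of length L in t = L·√(m / (2zeV)), so flight time grows with the square root of m/z.

The sketch below assumes ions start at rest and ignores the time spent in the acceleration region. The 20 kV voltage (borrowed from the MALDI slide) and the 1 m tube length are illustrative values only.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)
DALTON = 1.66053906660e-27           # kg (CODATA 2018 atomic mass constant)

def flight_time(mass_da, charge, voltage, length):
    """Ideal linear TOF: z e V = m v^2 / 2, so t = L / v = L * sqrt(m / (2 z e V))."""
    m = mass_da * DALTON
    v = math.sqrt(2.0 * charge * ELEMENTARY_CHARGE * voltage / m)
    return length / v

# Hypothetical instrument: 20 kV acceleration, 1 m drift region, singly charged ions.
for mass in (1000.0, 4000.0):
    print(mass, flight_time(mass, 1, 20e3, 1.0) * 1e6, "microseconds")  # roughly 16 and 32
```

Quadrupling the mass doubles the flight time. A doubly charged ion of mass 2M arrives together with a singly charged ion of mass M, which is why the spectrum axis is m/z.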

Page 34

The mass spectrum shows the results

[Figure: MALDI TOF spectrum of IgG; relative abundance vs. mass (m/z, up to 200000), with peaks labelled MH+, (M+2H)2+ and (M+3H)3+]

Slides from University of California San Francisco

Page 35

Dataset

• MALDI-TOF data.

• Samples of mixed quality, due to different storage times.

• Controlled spiking of molecules was used to generate two classes.

Page 36

Profiles of one spiked sample

Page 37

Comparison of ML algorithms

• Feature selection + classification:
1. RFE+SVM
2. RFE+kNN
3. RELIEF+SVM
4. RELIEF+kNN

Page 38

LOOCV results

• Misclassified samples are of bad quality (longer storage time)

• The selected features do not always correspond to the m/z of the spiked molecules
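LOOCV trains on all samples but one, tests on the held-out sample, and repeats this for every sample. When feature selection is part of the pipeline, it should be rerun inside each fold on the training part only. Otherwise the held-out sample influences which features are chosen.

Below is a minimal sketch with a 1-NN classifier (the 1NN classifier also appears in the RFE+1NN result reported later in the slides), assuming squared Euclidean distance. `select` is a hypothetical hook where RFE or RELIEF would be plugged in, and the toy data are made up.

```python
def nearest_neighbour_label(train_X, train_y, x):
    """1-NN prediction with squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def project(row, features):
    """Keep only the selected variables of one sample."""
    return [row[f] for f in features]

def loocv_accuracy(X, y, select=None):
    """Leave-one-out accuracy of 1-NN; select(train_X, train_y) returns variable indices."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Feature selection only sees the training part of this fold.
        features = select(train_X, train_y) if select else list(range(len(X[0])))
        pred = nearest_neighbour_label(
            [project(r, features) for r in train_X], train_y, project(X[i], features))
        correct += int(pred == y[i])
    return correct / len(X)

def only_first_variable(train_X, train_y):
    """Hypothetical stand-in for a real feature selector."""
    return [0]

# Hypothetical data: variable 0 is informative, variable 1 is large-scale noise.
X = [[0.0, 5.0], [0.2, 0.0], [1.0, 0.1], [1.2, 5.1]]
y = [0, 0, 1, 1]
print(loocv_accuracy(X, y))                              # 0.0
print(loocv_accuracy(X, y, select=only_first_variable))  # 1.0
```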

Page 39

LOOCV results

• The variables selected by RELIEF correspond to the spiked peptides

• RFE is less robust than RELIEF over LOOCV runs and also selects “irrelevant” variables

RELIEF-based feature selection yields results that are easier to interpret than those of RFE
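The slides do not say how robustness over LOOCV runs was quantified. One simple option, given here only as an illustration, is the average pairwise Jaccard similarity between the feature subsets selected in different runs. A value of 1 means every run picked the same variables; values near 0 mean the selections barely overlap. The names and subsets below are hypothetical, and at least two runs are needed.

```python
from itertools import combinations

def jaccard(a, b):
    """|A intersect B| / |A union B| for two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def selection_stability(subsets):
    """Average pairwise Jaccard similarity of subsets selected in different runs."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical feature subsets selected in three LOOCV runs.
stable_runs = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]
unstable_runs = [{1, 2, 3}, {1, 2, 4}, {1, 5, 6}]
print(selection_stability(stable_runs))    # 1.0
print(selection_stability(unstable_runs))  # about 0.3
```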

Page 40

BUT...

• RFE+SVM yields higher LOOCV accuracy than RELIEF+SVM

• RFE+kNN yields higher accuracy than RELIEF+kNN (perfect LOOCV classification for RFE+1NN)

RFE-based feature selection yields better predictive performance than RELIEF

Page 41

Conclusion

• Better predictive performance does not necessarily correspond to stability and interpretability of results

• Open issues:
– How can the reliability of potential biomarkers identified by feature selection algorithms be measured?
– Is the stability of feature selection algorithms more important than predictive accuracy?