1
Classification and Feature Selection Algorithms for Multi-class
CGH data
Jun Liu, Sanjay Ranka, Tamer Kahveci
http://www.cise.ufl.edu
2
Gene copy number
• The number of copies of genes can vary from person to person.
  – About 0.4% of gene copy numbers differ between pairs of people.
• Variations in copy number can alter resistance to disease.
  – EGFR copy number can be higher than normal in non-small cell lung cancer.
[Figure: healthy vs. cancer lung images (ALA)]
3
Comparative Genomic Hybridization (CGH)
4
Raw and smoothed CGH data
5
Example CGH dataset
862 genomic intervals in the Progenetix database
6
Problem description
• Given a new sample, which class does this sample belong to?
• Which features should we use to make this decision?
7
Outline
• Support Vector Machine (SVM)
• SVM for CGH data
• Maximum Influence Feature Selection algorithm
• Results
8
SVM in a nutshell
9
Classification with SVM
• Consider a two-class, linearly separable classification problem.
• Many decision boundaries are possible!
• The decision boundary should be as far away from the data of both classes as possible – we should maximize the margin, m.
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]
10
SVM Formulation
• Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi.
• Maximize J over the αi:

  J = Σi αi − ½ Σi Σj αi αj yi yj (xi · xj),  subject to αi ≥ 0 and Σi αi yi = 0,

  where the inner product xi · xj measures the similarity between xi and xj.
• The decision boundary can be constructed as f(x) = sign(Σi αi yi (xi · x) + b).
11
SVM for CGH data
12
Pairwise similarity measures
• Raw measure
  – Count the number of genomic intervals at which both samples have a gain (or both have a loss).
  – Example: Raw = 3
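The Raw measure can be sketched in a few lines of Python. The encoding (1 = gain, 0 = no change, -1 = loss) follows the later slides; the sample vectors below are made up for illustration, chosen so the count comes out to 3 as in the slide's example:

```python
def raw_similarity(x, y):
    """Count the genomic intervals at which both samples show the
    same aberration: gain (+1) in both, or loss (-1) in both."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

# Two hypothetical samples over six genomic intervals.
x = [1, 1, 0, -1, 1, 0]
y = [1, 0, 0, -1, 1, 1]
print(raw_similarity(x, y))  # prints 3
```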
13
SVM based on Raw kernel
• Using SVM with the Raw kernel amounts to solving the same quadratic program: maximize J over the αi, with the Raw kernel Raw(xi, xj) replacing the inner product xi · xj.
• The resulting decision function is f(x) = sign(Σi αi yi Raw(xi, x) + b).
14
Is Raw kernel valid?
• Not every similarity function can serve as a kernel: the underlying kernel matrix M must be “positive semi-definite”.
• M is positive semi-definite if vᵀMv ≥ 0 for all vectors v.
15
Is Raw kernel valid?
• Proof: define a mapping Φ: a ∈ {1, 0, -1}ᵐ → b ∈ {1, 0}²ᵐ, where
  – Φ(gain) = Φ(1) = 01
  – Φ(no-change) = Φ(0) = 00
  – Φ(loss) = Φ(-1) = 10
• Then Raw(X, Y) = Φ(X)ᵀ Φ(Y). For example:
  X = 0 1 1 0 1 -1
  Y = 0 1 0 -1 -1 -1
  Φ(X) = 00 01 01 00 01 10
  Φ(Y) = 00 01 00 10 10 10
  Raw(X, Y) = 2 = Φ(X)ᵀ Φ(Y)
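The slide's mapping and example can be checked directly; a minimal sketch using the X and Y vectors from the slide:

```python
def phi(x):
    """Map each interval status to two bits:
    gain (1) -> (0, 1), no change (0) -> (0, 0), loss (-1) -> (1, 0)."""
    table = {1: (0, 1), 0: (0, 0), -1: (1, 0)}
    return [bit for v in x for bit in table[v]]

def raw(x, y):
    """The Raw measure: intervals where both samples share an aberration."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]
dot = sum(a * b for a, b in zip(phi(X), phi(Y)))
print(raw(X, Y), dot)  # prints 2 2 -- the two quantities agree
```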
16
Raw kernel is valid!
• The Raw kernel can be written as Raw(X, Y) = Φ(X)ᵀ Φ(Y).
• Define the 2m-by-n matrix A = [Φ(x1), ..., Φ(xn)]. The kernel matrix M of Raw is then M = AᵀA.
• Therefore, for any vector v, vᵀMv = vᵀAᵀAv = ‖Av‖² ≥ 0, so M is positive semi-definite.
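The argument can also be checked numerically: build M = AᵀA from a few samples and verify that vᵀMv is never negative for random v. The sample vectors are made up for illustration:

```python
import random

def phi(x):
    """gain -> (0,1), no change -> (0,0), loss -> (1,0)."""
    table = {1: (0, 1), 0: (0, 0), -1: (1, 0)}
    return [bit for v in x for bit in table[v]]

samples = [[0, 1, 1, 0, 1, -1], [0, 1, 0, -1, -1, -1], [1, 1, 0, 0, -1, 0]]
cols = [phi(s) for s in samples]   # columns of the 2m-by-n matrix A
n = len(samples)

# Kernel matrix M = A^T A, i.e. M[i][j] = Raw(sample_i, sample_j).
M = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
     for i in range(n)]

for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(n)]
    quad = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
    assert quad >= -1e-9   # v^T M v never goes (meaningfully) negative
print("M is positive semi-definite on all sampled v")
```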
17
MIFS algorithm
18
MIFS for multi-class data
• Train one-versus-all SVMs, one per class. Each SVM orders the features by their contribution, from high to low (e.g. Feature 8, Feature 4, Feature 9, Feature 33, Feature 2, Feature 48, Feature 27, Feature 1, ...).
• Collect the ranks each feature receives across the classifiers: Feature 1 = [8, 1, 3], Feature 2 = [2, 31, 1], Feature 3 = [12, 4, 3], Feature 4 = [5, 15, 8].
• Sort each feature's rank list: [1, 3, 8], [1, 2, 31], [3, 4, 12], [5, 8, 15].
• Sort the features by their sorted rank lists and insert the most promising feature (here, Feature 4) into the feature set.
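The sorting steps on this slide can be reproduced mechanically. How MIFS turns the sorted rank vectors into a final choice is not fully recoverable from the slide alone, so this sketch stops at the sorted ordering; the rank numbers are the slide's toy values:

```python
# Ranks each feature receives from three one-versus-all SVMs (toy numbers).
ranks = {
    "Feature 1": [8, 1, 3],
    "Feature 2": [2, 31, 1],
    "Feature 3": [12, 4, 3],
    "Feature 4": [5, 15, 8],
}

# Step 1: sort each feature's rank list.
sorted_ranks = {f: sorted(r) for f, r in ranks.items()}

# Step 2: order the features by their sorted rank vectors.
for f, r in sorted(sorted_ranks.items(), key=lambda kv: kv[1]):
    print(f, r)
```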
19
Results
20
Dataset Details
Data taken from the Progenetix database
21
Datasets

Dataset size at each similarity level:

| #cancers | best | good | fair | poor |
|----------|------|------|------|------|
| 2        | 478  | 466  | 351  | 373  |
| 4        | 1160 | 790  | 800  | 800  |
| 6        | 1100 | 850  | 880  | 810  |
| 8        | 1000 | 830  | 750  | 760  |
22
Experimental results
• Comparison of the linear and Raw kernels.
[Chart: predictive accuracy of the linear vs. Raw kernel; y-axis from 0.3 to 0.9]
• On average, the Raw kernel improves predictive accuracy by 6.4% over the linear kernel across the sixteen datasets.
23
Experimental results
[Chart: accuracy (y-axis, 0.5 to 0.8) vs. number of features (x-axis, 4 to 862) for All features, MIFS, MRMR, and SVM-RFE (Fu and Fu-Liu, 2005; Ding and Peng, 2005)]
• Using 40 features results in accuracy comparable to using all features.
• Using 80 features results in accuracy comparable to or better than using all features.
24
Using MIFS for feature selection
• Results testing the hypothesis that 40 features are sufficient and 80 features are better.
25
A Web Server for Mining CGH Data
http://cghmine.cise.ufl.edu:8007/CGH/Default.html
26
Thank you
27
Appendix
28
Minimum Redundancy and Maximum Relevance (MRMR)
Example dataset (rows are samples, columns are features):

| Sample | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Class |
|--------|-----------|-----------|-----------|-----------|-------|
| x1     | 0         | 1         | 1         | 0         | 1     |
| x2     | 0         | 1         | 1         | 0         | 1     |
| x3     | 0         | 1         | 1         | 0         | 1     |
| x4     | 0         | 0         | 0         | 1         | 1     |
| x5     | 0         | 0         | 0         | 0         | -1    |
| x6     | 0         | 0         | 0         | 0         | -1    |

• Relevance V is defined as the average mutual information between features and class labels.
• Redundancy W is defined as the average mutual information between all pairs of features.
• Incrementally select features by maximizing (V / W) or (V − W).
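Relevance and redundancy can be computed from the toy table with a small mutual-information helper. This sketch evaluates V and W for the full feature set (the incremental greedy step would score each candidate feature the same way); the (V − W) variant is used:

```python
from math import log
from collections import Counter
from itertools import combinations

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

# Toy dataset from the slide: rows are samples x1..x6, columns features 1..4.
X = [[0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
y = [1, 1, 1, 1, -1, -1]
cols = list(zip(*X))

# Relevance: average MI between each feature and the class labels.
V = sum(mutual_info(c, y) for c in cols) / len(cols)
# Redundancy: average MI between all pairs of features.
pairs = list(combinations(cols, 2))
W = sum(mutual_info(c1, c2) for c1, c2 in pairs) / len(pairs)
print(V - W)
```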
29
Support Vector Machine Recursive Feature Elimination (SVM-RFE)
1. Train a linear SVM on the current feature set and compute its weight vector.
2. Compute the ranking coefficient wi² for the ith feature.
3. Remove the feature with the smallest ranking coefficient.
4. If the feature set is not empty, return to step 1.
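The loop above can be sketched as follows. A full SVM solver is beyond a slide, so a perceptron stands in for the linear SVM (plainly a substitution; only the weight vector is needed), and the data are made up:

```python
def train_linear(X, y, epochs=50):
    """Stand-in linear classifier (perceptron); SVM-RFE trains a linear SVM here."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
    return w

def svm_rfe(X, y):
    """Return feature indices ordered from least to most important."""
    active = list(range(len(X[0])))
    eliminated = []
    while active:
        Xa = [[row[j] for j in active] for row in X]   # restrict to active features
        w = train_linear(Xa, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)  # smallest w_i^2
        eliminated.append(active.pop(worst))
    return eliminated

# Made-up data: feature 1 predicts the class, feature 0 is noise.
X = [[1, 1], [0, 1], [1, -1], [0, -1]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y))  # prints [0, 1]: the noise feature is eliminated first
```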
30
Pairwise similarity measures
• Sim measure
  – A segment is a contiguous block of aberrations of the same type.
  – Count the number of overlapping segment pairs.
  – Example: Sim = 2
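A sketch of the Sim measure (segment extraction plus overlap counting). The slide's figure is not reproduced here, so the example vectors are made up, chosen to give Sim = 2:

```python
def segments(x):
    """Maximal runs of identical non-zero aberrations, as (start, end, type)."""
    segs, start = [], None
    for i, v in enumerate(x + [0]):        # trailing 0 is a sentinel closing the last run
        if start is not None and v != x[start]:
            segs.append((start, i - 1, x[start]))
            start = None
        if start is None and v != 0:
            start = i
    return segs

def sim(x, y):
    """Count pairs of same-type segments from x and y that overlap."""
    return sum(1 for (s1, e1, t1) in segments(x)
                 for (s2, e2, t2) in segments(y)
                 if t1 == t2 and s1 <= e2 and s2 <= e1)

x = [1, 1, 0, -1, -1, 0]
y = [1, 0, 0, -1, 0, 0]
print(sim(x, y))  # prints 2: one overlapping gain pair, one overlapping loss pair
```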
31
Non-linear Decision Boundary
• How can SVM be generalized when the two-class classification problem is not linearly separable?
• Key idea: transform xi to a higher-dimensional space to “make life easier”.
  – Input space: the space where the points xi are located.
  – Feature space: the space of Φ(xi) after the transformation.
[Figure: points mapped by Φ(·) from the input space to the feature space, where a linear decision boundary can be found]
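The “transform to make life easier” idea can be illustrated with a classic toy mapping Φ(x) = (x, x²); the example data are entirely made up. Points that no single threshold separates on the line become linearly separable in the plane:

```python
# 1-D data that no single threshold separates: class -1 sits between class +1 points.
points = [-3.0, -1.0, 1.0, 3.0]
labels = [1, -1, -1, 1]

def phi(x):
    """Map to a 2-D feature space where a linear boundary exists."""
    return (x, x * x)

# In feature space the second coordinate alone separates the classes:
# x^2 >= 9 for class +1 and x^2 <= 1 for class -1, so the line x2 = 2.5 works.
for x, y in zip(points, labels):
    predicted = 1 if phi(x)[1] > 2.5 else -1
    assert predicted == y
print("linearly separable in feature space")
```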