Transcript of Classification (SVMs / Kernel method)
cseweb.ucsd.edu/classes/sp12/cse283-a/lecturenotes/... · 2012. 5. 4.
Sp’10 Bafna/Ideker
Classification (SVMs / Kernel method)
LP versus Quadratic programming
LP:  min cᵀx   s.t.  Ax ≥ b,  x ≥ 0
QP:  min xᵀQx + cᵀx   s.t.  Ax ≥ b,  x ≥ 0
• LP: linear constraints, linear objective function.
• LP can be solved in polynomial time.
• In QP, the objective function contains a quadratic form.
• For positive semidefinite Q, the QP can be solved in polynomial time.
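The LP form above can be tried on a tiny instance with SciPy (an illustrative sketch with made-up data; `scipy.optimize.linprog` expects constraints as `A_ub x <= b_ub`, so the `Ax >= b` side is negated):

```python
import numpy as np
from scipy.optimize import linprog

# LP in the slide's form: min c^T x  s.t.  Ax >= b, x >= 0.
# linprog minimizes subject to A_ub @ x <= b_ub, so negate A and b.
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])   # single constraint: x1 + x2 >= 1
b = np.array([1.0])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)        # optimum lies on x1 + x2 = 1, objective 1.0
```

The same interface idea extends to QP solvers, which additionally take the matrix Q of the quadratic form.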
Margin of separation
• Suppose we find a separating hyperplane (β, β₀) s.t.
  – For all +ve points x:  βᵀx − β₀ ≥ 1
  – For all −ve points x:  βᵀx − β₀ ≤ −1
• What is the margin of separation?
The figure shows the hyperplane βᵀx − β₀ = 0 and the two margin boundaries βᵀx − β₀ = 1 and βᵀx − β₀ = −1.
Separating by a wider margin
• Solutions with a wider margin are better.
• Maximize 2/‖β‖, or minimize ‖β‖²/2.
Separating via misclassification
• In general, data is not linearly separable.
• What if we also wanted to minimize the number of misclassified points?
• Recall that each sample xᵢ in our training set has a label yᵢ ∈ {−1, 1}.
• For each point i, yᵢ(βᵀxᵢ − β₀) should be positive.
• Define ξᵢ ≥ max{0, 1 − yᵢ(βᵀxᵢ − β₀)}.
• If point i is correctly classified with margin (yᵢ(βᵀxᵢ − β₀) ≥ 1), then ξᵢ = 0.
• If point i is incorrectly classified, or close to the boundary, then ξᵢ > 0.
• We want to minimize Σᵢ ξᵢ.
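The slack definition above is a one-liner in NumPy; here is a minimal sketch with an assumed toy hyperplane (β, β₀ are illustrative, not from the lecture):

```python
import numpy as np

# Toy data: rows are samples x_i, labels y_i in {-1, +1}.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])

beta = np.array([1.0, 0.0])   # assumed hyperplane normal (illustrative)
beta0 = 0.0

# Slack xi_i = max(0, 1 - y_i (beta^T x_i - beta_0))
xi = np.maximum(0.0, 1.0 - y * (X @ beta - beta0))
print(xi)   # only the point inside the margin gets a nonzero slack
```

Points beyond the margin get ξ = 0; the middle point sits inside the margin, so it picks up slack 0.5.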
Support Vector machines (wide margin and misclassification)
• Maximize the margin while minimizing misclassification:
  min  ‖β‖²/2 + C Σᵢ ξᵢ
• Solved using non-linear optimization techniques.
• The problem can be reformulated to use only dot products of the variables, which allows us to employ the kernel method.
• This gives the method a lot of power.
Reformulating the optimization
min  ‖β‖²/2 + C Σᵢ ξᵢ
s.t.  ξᵢ ≥ 0
      ξᵢ ≥ 1 − yᵢ(βᵀxᵢ − β₀)
Lagrangian relaxation
• Goal:
  min  ‖β‖²/2 + C Σᵢ ξᵢ
• S.t.:
  ξᵢ ≥ 0
  ξᵢ ≥ 1 − yᵢ(βᵀxᵢ − β₀)
• We minimize the Lagrangian (with multipliers αᵢ ≥ 0, μᵢ ≥ 0):
  L = ‖β‖²/2 + C Σᵢ ξᵢ + Σᵢ αᵢ(1 − ξᵢ − yᵢ(βᵀxᵢ − β₀)) − Σᵢ μᵢξᵢ
Simplifying
• For fixed α ≥ 0, μ ≥ 0, we minimize the Lagrangian:
  L = βᵀβ/2 + C Σᵢ ξᵢ + Σᵢ αᵢ(1 − ξᵢ − yᵢ(βᵀxᵢ − β₀)) − Σᵢ μᵢξᵢ
    = βᵀβ/2 − βᵀ Σᵢ αᵢyᵢxᵢ + C Σᵢ ξᵢ − Σᵢ (αᵢ + μᵢ)ξᵢ + Σᵢ αᵢ + β₀ Σᵢ αᵢyᵢ
• Setting the partial derivatives to zero:
  ∂L/∂β = 0   ⇒   β = Σᵢ yᵢαᵢxᵢ   (1)
  ∂L/∂β₀ = 0  ⇒   Σᵢ yᵢαᵢ = 0   (2)
  ∂L/∂ξᵢ = 0  ⇒   C − αᵢ − μᵢ = 0   (3)
Substituting
• Substituting (1) into
  L = βᵀβ/2 − βᵀ Σᵢ αᵢyᵢxᵢ + C Σᵢ ξᵢ − Σᵢ (αᵢ + μᵢ)ξᵢ + Σᵢ αᵢ + β₀ Σᵢ αᵢyᵢ
  gives
  L = −(1/2) Σᵢ,ⱼ αᵢαⱼyᵢyⱼxᵢᵀxⱼ + C Σᵢ ξᵢ − Σᵢ (αᵢ + μᵢ)ξᵢ + Σᵢ αᵢ + β₀ Σᵢ αᵢyᵢ
• Substituting (2) and (3) into
  L = −(1/2) Σᵢ,ⱼ αᵢαⱼyᵢyⱼxᵢᵀxⱼ + C Σᵢ ξᵢ − Σᵢ (αᵢ + μᵢ)ξᵢ + Σᵢ αᵢ + β₀ Σᵢ αᵢyᵢ
  we have the minimization problem
  min  (1/2) Σᵢ,ⱼ αᵢαⱼyᵢyⱼxᵢᵀxⱼ − Σᵢ αᵢ
  s.t.  Σᵢ yᵢαᵢ = 0
        0 ≤ αᵢ ≤ C
Classification using SVMs
• Under these conditions, the problem is a quadratic programming problem and can be solved using known techniques.
• Quiz: When we have solved this QP, how do we classify a point x?
  f(x) = βᵀx − β₀ = Σᵢ yᵢαᵢxᵢᵀx − β₀
  (assign x to the +ve class if f(x) > 0, and to the −ve class otherwise)
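As a sanity check, here is a minimal sketch with assumed toy data (not from the lecture): for x₁ = (1, 0) with y₁ = +1 and x₂ = (−1, 0) with y₂ = −1, the dual can be solved by hand (α₁ = α₂ = 0.5, β₀ = 0, giving β = (1, 0)), and f(x) classifies by sign:

```python
import numpy as np

# Two training points; for this symmetric toy problem the dual
# solution is known in closed form: alpha = [0.5, 0.5], beta0 = 0.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
beta0 = 0.0

def f(x):
    # f(x) = sum_i y_i alpha_i x_i^T x - beta0
    return np.sum(y * alpha * (X @ x)) - beta0

print(f(np.array([2.0, 3.0])))   # positive -> +ve class
print(f(np.array([-0.5, 1.0])))  # negative -> -ve class
```

Note that the support vectors sit exactly on the margin: f(x₁) = 1 and f(x₂) = −1.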
The kernel method
• The SVM formulation can be solved using QP on dot products.
• As these are wide-margin classifiers, they provide a more robust solution.
• However, the true power of the SVM approach comes from 'the kernel method', which allows us to move to higher-dimensional (and non-linear) spaces.
kernel
• Let X be the set of objects.
  – Ex: X = the set of samples in micro-arrays.
  – Each object x ∈ X is a vector of gene expression values.
• k: X × X → R is a positive semidefinite kernel if
  – k is symmetric:  k(x,x') = k(x',x)
  – k is positive semidefinite: the matrix k of pairwise kernel values satisfies cᵀkc ≥ 0 for all c ∈ Rᵖ
Kernels as dot-product
• Quiz: Suppose the objects x are all real vectors (as in gene expression).
• Define  k_L(x,x') = xᵀx'
• Is k_L a kernel? It is symmetric, but is it positive semidefinite?
Linear kernel is positive semidefinite
• Write X as a matrix such that each column is a sample:
  – X = [x₁ x₂ …]
• By definition, the linear kernel (Gram) matrix is k_L = XᵀX.
• For any c:
  cᵀk_Lc = cᵀXᵀXc = ‖Xc‖² ≥ 0
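The argument above is easy to check numerically; a small sketch with random data (the tolerance on the smallest eigenvalue allows for floating-point round-off):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # 5-dim features, 8 samples (columns)

K = X.T @ X                      # linear-kernel Gram matrix, k_L = X^T X

# Symmetric, and all eigenvalues >= 0 (up to numerical tolerance).
eigs = np.linalg.eigvalsh(K)
print(np.allclose(K, K.T), eigs.min() >= -1e-10)

# c^T K c = ||Xc||^2 >= 0 for any c.
c = rng.normal(size=8)
print(np.isclose(c @ K @ c, np.sum((X @ c) ** 2)))
```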
Generalizing kernels
• Any object can be represented by a feature vector in real space:
  φ: X → Rᵖ
  k(x,x') = φ(x)ᵀφ(x')
Generalizing
• Note that the feature mapping φ could actually be non-linear.
• Conversely, every kernel can be represented as a dot product in a high-dimensional space.
• Sometimes the kernel is easier to define than the mapping.
The kernel trick
• If an algorithm for vectorial data is expressed exclusively in the form of dot products, it can be changed into an algorithm on an arbitrary kernel:
  – Simply replace the dot product by the kernel.
Kernel trick example
• Consider a kernel k defined via a mapping φ:
  k(x,x') = φ(x)ᵀφ(x')
• It could be that φ is very difficult to compute explicitly, but k is easy to compute.
• Suppose we define a (squared) distance function between two objects as
  d(x,x')² = ‖φ(x) − φ(x')‖²
• How do we compute this distance?
  d(x,x')² = ‖φ(x) − φ(x')‖² = φ(x)ᵀφ(x) + φ(x')ᵀφ(x') − 2φ(x)ᵀφ(x')
           = k(x,x) + k(x',x') − 2k(x,x')
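For the linear kernel, φ is the identity, so the kernel-only formula above must agree with the ordinary squared Euclidean distance; a quick sketch:

```python
import numpy as np

def k_lin(x, y):
    # Linear kernel: phi(x) = x, so the kernel distance should
    # match the ordinary squared Euclidean distance exactly.
    return x @ y

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, -1.0, 1.0])

d2_kernel = k_lin(x, x) + k_lin(y, y) - 2 * k_lin(x, y)
d2_direct = np.sum((x - y) ** 2)
print(d2_kernel, d2_direct)   # both 14.0
```

The same three-kernel-evaluations formula works unchanged for kernels whose φ is infinite-dimensional (e.g. the Gaussian RBF kernel).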
Kernels and SVMs
• Recall that SVM-based classification is described as
  min  (1/2) Σᵢ,ⱼ αᵢαⱼyᵢyⱼxᵢᵀxⱼ − Σᵢ αᵢ
  s.t.  Σᵢ yᵢαᵢ = 0
        0 ≤ αᵢ ≤ C
Kernels and SVMs
• Applying the kernel trick:
  min  (1/2) Σᵢ,ⱼ αᵢαⱼyᵢyⱼ k(xᵢ,xⱼ) − Σᵢ αᵢ
  s.t.  Σᵢ yᵢαᵢ = 0
        0 ≤ αᵢ ≤ C
• We can try kernels that are biologically relevant.
Examples of kernels for vectors
linear kernel:        k_L(x,x') = xᵀx'
poly kernel:          k_p(x,x') = (xᵀx' + c)ᵈ
Gaussian RBF kernel:  k_G(x,x') = exp(−‖x − x'‖² / 2σ²)
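The three kernels above are each a few characters of NumPy; a sketch with arbitrary parameter choices (c = 1, d = 2, σ = 1 are illustrative defaults, not from the lecture):

```python
import numpy as np

def k_linear(x, xp):
    return x @ xp

def k_poly(x, xp, c=1.0, d=2):
    return (x @ xp + c) ** d

def k_rbf(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
xp = np.array([0.0, 1.0])
print(k_linear(x, xp), k_poly(x, xp), k_rbf(x, x))   # 0.0 1.0 1.0
```

Note k_G(x, x) = 1 for any x: under the RBF kernel every point has unit "self-similarity".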
String kernel
• Consider a string s = s₁s₂…
• Define an index set I as a subset of indices (in increasing order).
• s[I] is the substring limited to those indices.
• l(I) = span of I (last index − first index + 1).
• W(I) = c^l(I), with c < 1.
  – The weight decreases as the span increases.
• For any string u of length k:
  φᵤ(s) = Σ_{I : s[I] = u} c^l(I)
String Kernel
• Map every string s to a |Σ|ⁿ-dimensional space, indexed by all strings u of length up to n:
  φ(s) = (φᵤ(s))ᵤ
• The mapping φ is expensive to compute, but given two strings s, t, the dot-product kernel k(s,t) = φ(s)ᵀφ(t) can be computed in O(n |s| |t|) time.
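The definition of φᵤ can be translated directly into code for tiny strings. This sketch enumerates all index sets explicitly, so it is exponential-time, purely for illustration; it is not the O(n |s| |t|) dynamic program mentioned above:

```python
from itertools import combinations
from collections import defaultdict

def phi(s, k, c):
    """phi_u(s) = sum over index sets I with s[I] == u of c^(span of I)."""
    feats = defaultdict(float)
    for I in combinations(range(len(s)), k):
        u = "".join(s[i] for i in I)
        span = I[-1] - I[0] + 1          # l(I)
        feats[u] += c ** span            # W(I) = c^l(I), c < 1
    return feats

def string_kernel(s, t, k, c=0.5):
    fs, ft = phi(s, k, c), phi(t, k, c)
    return sum(fs[u] * ft[u] for u in fs if u in ft)

print(string_kernel("ab", "ab", k=2))    # 0.0625 (= (0.5^2)^2)
print(string_kernel("cat", "car", k=2))  # shared subsequence "ca" only
```

"cat" and "car" share only the length-2 subsequence "ca" (span 2 in both), so their kernel value is also (0.5²)² = 0.0625.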
SVM conclusion
• SVMs are a generic scheme for classifying data with wide margins and few misclassifications.
• For data that is not easily represented as vectors, the kernel trick provides a standard recipe for classification:
  – Define a meaningful kernel, and solve using an SVM.
• Many standard kernels are available (linear, polynomial, RBF, string).
Classification review
• We started out by treating the classification problem as one of separating points in high-dimensional space.
• Obvious for gene expression data, but applicable to any kind of data.
• Questions of separability, linear separation.
• Algorithms for classification:
  – Perceptron
  – Linear discriminant
  – Maximum likelihood
  – Linear programming
  – SVMs
  – Kernel methods & SVMs
Classification review
• Recall that we considered 3 problems:
  – Group samples together in an unsupervised fashion (clustering).
  – Classify based on training data (often by learning a hyperplane that separates).
  – Select marker genes that are diagnostic for the class; all other genes can be discarded, leading to lower dimensionality.
Dimensionality reduction
• Many genes have highly correlated expression profiles.
• By discarding some of the genes, we can greatly reduce the dimensionality of the problem.
• There are other, more principled ways to do such dimensionality reduction.
Why is high dimensionality bad?
• With a high enough dimensionality, all points can be linearly separated.
• Recall that a point xᵢ is misclassified if
  – it is +ve, but βᵀxᵢ − β₀ ≤ 0
  – it is −ve, but βᵀxᵢ − β₀ > 0
• In the first case, choose δᵢ (the coefficient of a new dimension that is nonzero only for point i) s.t.
  – βᵀxᵢ − β₀ + δᵢ ≥ 0
• By adding a dimension for each misclassified point, we create a higher-dimensional hyperplane that perfectly separates all of the points!
Principal Components Analysis
• We get the intrinsic dimensionality of a data set.
Principal Components Analysis
• Consider the expression values of 2 genes over 6 samples.
• Clearly, the expression of the two genes is highly correlated.
• Projecting all the genes on a single line could explain most of the data.
• This is a generalization of “discarding the gene”.
Projecting
• Consider the mean m of all points, and a unit vector e emanating from the mean.
• Algebraically, this projection onto e means that each sample x can be represented by a single value eᵀ(x − m).
[Figure: a point x, its offset x − m from the mean m, and its projection eᵀ(x − m) onto the line through m in direction e]
Higher dimensions
• Consider a set of 2 (in general, k) orthonormal vectors e₁, e₂, …
• Once projected, each sample x can be represented by a 2 (k) dimensional vector:
  – (e₁ᵀ(x − m), e₂ᵀ(x − m), …)
[Figure: a point x projected onto the orthonormal directions e₁ and e₂ through the mean m]
How to project
• The generic scheme allows us to project m-dimensional data onto a k-dimensional subspace.
• How do we select the k 'best' dimensions?
• The strategy used by PCA is to maximize the variance of the projected points around the mean.
PCA
• Suppose all of the data were to be reduced by projecting onto a single line e through the mean m.
• How do we select the line e?
PCA cont’d
• Let each point xₖ map to x'ₖ = m + aₖ·e. We want to minimize the error
  Σₖ ‖xₖ − x'ₖ‖²
• Observation 1: Each point xₖ maps to x'ₖ = m + eᵀ(xₖ − m)·e
  – i.e., aₖ = eᵀ(xₖ − m)
[Figure: a point xₖ and its projection x'ₖ onto the line through m]
Proof of Observation 1
min over aₖ of  ‖xₖ − x'ₖ‖²
= min  ‖(xₖ − m) − (x'ₖ − m)‖²
= min  ‖xₖ − m‖² + ‖x'ₖ − m‖² − 2(x'ₖ − m)ᵀ(xₖ − m)
= min  ‖xₖ − m‖² + aₖ²‖e‖² − 2aₖ·eᵀ(xₖ − m)
= min  ‖xₖ − m‖² + aₖ² − 2aₖ·eᵀ(xₖ − m)      (since ‖e‖ = 1)
Differentiating w.r.t. aₖ and setting to zero:
2aₖ − 2eᵀ(xₖ − m) = 0   ⇒   aₖ = eᵀ(xₖ − m)
Substituting back:
‖xₖ − x'ₖ‖² = ‖xₖ − m‖² − eᵀ(xₖ − m)(xₖ − m)ᵀe
Minimizing PCA Error
• Summing over all points:
  Σₖ ‖xₖ − x'ₖ‖² = C − eᵀSe,  where  S = Σₖ (xₖ − m)(xₖ − m)ᵀ  and C is a constant.
• To minimize the error, we must maximize eᵀSe.
• Maximizing eᵀSe over unit vectors e gives Se = λe: λ is an eigenvalue of S, and e the corresponding eigenvector.
• Therefore, we must choose the eigenvector corresponding to the largest eigenvalue.
PCA steps
• X = starting matrix with n columns (samples) and m rows; xⱼ = the j-th column of X.
1. m = (1/n) Σⱼ₌₁ⁿ xⱼ
2. hᵀ = [1 1 … 1]
3. M = X − m·hᵀ
4. S = M·Mᵀ = Σⱼ₌₁ⁿ (xⱼ − m)(xⱼ − m)ᵀ
5. Diagonalize: Λ = BᵀSB  (the columns of B are the eigenvectors of S)
6. Return BᵀM
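Steps 1–6 above can be sketched directly in NumPy (an illustrative implementation; `eigh` does the diagonalization of step 5, and the toy data are assumed, echoing the 2-genes/6-samples example):

```python
import numpy as np

def pca(X, k):
    """Follow the slide's steps: X has n columns (samples), m rows (features)."""
    mean = X.mean(axis=1, keepdims=True)     # step 1: m
    M = X - mean                             # steps 2-3: subtract mean from every column
    S = M @ M.T                              # step 4: scatter matrix S = M M^T
    evals, B = np.linalg.eigh(S)             # step 5: diagonalize S
    order = np.argsort(evals)[::-1][:k]      # keep the k largest eigenvalues
    return B[:, order].T @ M                 # step 6: B^T M

# Two perfectly correlated "genes" over 6 samples.
g = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.vstack([g, 2 * g])
Y = pca(X, 1)
print(Y.shape)    # (1, 6): one coordinate now explains all the data
```

Because the two rows are perfectly correlated, the scatter matrix has rank 1 and the single retained component captures all of the variance; the discarded component is identically zero.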
End of Lecture
ALL-AML classification
• The two leukemias need different therapeutic regimens.
• Usually distinguished through hematopathology.
• Can gene expression be used for a more definitive test?
  – 38 bone marrow samples
  – Total mRNA was hybridized against probes for 6817 genes
  – Q: Are these classes separable?
Neighborhood analysis (cont’d)
• Each gene is represented by an expression vector v(g) = (e₁, e₂, …, eₙ).
• Choose an idealized expression vector as the center.
• Discriminating genes will be 'closer' to the center (any distance measure can be used).
[Figure: a discriminating gene lying near the idealized center]
Neighborhood analysis
• Q: Are there genes whose expression correlates with one of the two classes?
• A: For each class, create an idealized vector c.
  – Compute the number of genes N_c whose expression 'matches' the idealized expression vector.
  – Is N_c significantly larger than N_c* for a random c*?
Neighborhood test
• Distance measure used:
  – For any binary vector c, let the 1 entries denote class 1, and the 0 entries denote class 2.
  – Compute the mean and std. dev. [μ₁(g), σ₁(g)] of expression in class 1, and likewise [μ₂(g), σ₂(g)] for class 2.
  – P(g,c) = [μ₁(g) − μ₂(g)] / [σ₁(g) + σ₂(g)]
  – N₁(c,r) = {g | P(g,c) = r}
  – High density for some r is indicative of correlation with the class distinction.
  – A neighborhood is significant if a random center does not produce the same density.
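The score P(g,c) above is a per-gene signal-to-noise ratio; a minimal sketch with assumed toy expression values (genes as rows, samples as columns):

```python
import numpy as np

def neighborhood_score(expr, c):
    """P(g,c) = (mu1 - mu2) / (sigma1 + sigma2) for each gene (row of expr).

    expr: genes x samples matrix; c: binary class label per sample.
    """
    cls1, cls2 = expr[:, c == 1], expr[:, c == 0]
    mu1, mu2 = cls1.mean(axis=1), cls2.mean(axis=1)
    s1, s2 = cls1.std(axis=1), cls2.std(axis=1)
    return (mu1 - mu2) / (s1 + s2)

c = np.array([1, 1, 1, 0, 0, 0])
expr = np.array([
    [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],   # up in class 1 -> large positive P
    [2.0, 1.0, 2.0, 2.0, 1.0, 2.0],   # uninformative -> P near 0
])
P = neighborhood_score(expr, c)
print(P)
```

The significance test then compares the counts #{g | P(g,c) > r} against the same counts for randomly permuted labels c*.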
Neighborhood analysis
• #{g |P(g,c) > 0.3} > 709 (ALL) vs 173 by chance.
• Class prediction should be possible using micro-array expression values.
Class prediction
• Choose a fixed set of informative genes (based on their correlation with the class distinction).
  – The predictor is uniquely defined by the sample and the subset of informative genes.
• For each informative gene g, define (w_g, b_g):
  – w_g = P(g,c)  (When is this positive?)
  – b_g = [μ₁(g) + μ₂(g)] / 2
• Given a new sample X:
  – x_g is the normalized expression value at g
  – Vote of gene g: v_g = w_g(x_g − b_g)  (a positive value is a vote for class 1, a negative one for class 2)
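The voting rule above, together with the prediction strength PS from the next slide, fits in a few lines; a sketch with assumed toy weights and thresholds (the values of w, b, x are illustrative, not from the study):

```python
import numpy as np

def predict(x, w, b):
    """Weighted voting: gene g casts vote v_g = w_g * (x_g - b_g).

    Positive total favors class 1, negative favors class 2;
    prediction strength PS = (V_win - V_lose) / (V_win + V_lose).
    """
    v = w * (x - b)
    v1, v2 = v[v > 0].sum(), -v[v < 0].sum()   # vote totals for each class
    winner = 1 if v1 > v2 else 2
    ps = abs(v1 - v2) / (v1 + v2)
    return winner, ps

w = np.array([1.5, -0.8, 0.4])   # w_g = P(g,c) for 3 informative genes
b = np.array([2.0, 1.0, 3.0])    # b_g = (mu1(g) + mu2(g)) / 2
x = np.array([3.0, 2.0, 3.5])    # new sample

winner, ps = predict(x, w, b)
print(winner, ps)                # class 1 wins, PS = 0.9 / 2.5 = 0.36
```

A low PS (a narrow margin of victory) is the basis for declaring a prediction "uncertain" rather than forcing a call.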
Prediction Strength
• PS = [V_win − V_lose] / [V_win + V_lose]
  – Reflects the margin of victory.
• A 50-gene predictor is correct on 36/38 samples (cross-validation).
• Prediction accuracy on other samples: 100% (predictions made for 29/34 samples).
• Median PS = 0.73.
• Other predictors, between 10 and 200 genes, all worked well.
Performance
Differentially expressed genes?
• Do the predictive genes reveal any biology?
• The initial expectation was that most genes would be of a hematopoietic lineage.
• However, many genes encode:
  – Cell cycle progression genes
  – Chromatin remodelling
  – Transcription
  – Known oncogenes
  – Leukemia targets (etoposide)
Relationship between ML, and Golub predictor
• ML classification, when the covariance matrix is a diagonal matrix with identical variances for the different classes, is similar to Golub's classifier.
  p(x | ωᵢ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μᵢ)ᵀΣ⁻¹(x − μᵢ))
  gᵢ(x) = ln p(x | ωᵢ) + ln P(ωᵢ)
  Compute argmaxᵢ gᵢ(x)
• With Σ = diag(σ₁², …, σₚ²), and dropping terms that do not depend on the class:
  gᵢ(x) = −Σⱼ₌₁ᵖ (xⱼ − μᵢⱼ)² / (2σⱼ²)
  g₁(x) − g₂(x) = Σⱼ [(μ₁ⱼ − μ₂ⱼ)/σⱼ²] · [xⱼ − (μ₁ⱼ + μ₂ⱼ)/2]
• Each term is a weight (μ₁ⱼ − μ₂ⱼ)/σⱼ² times an offset xⱼ − (μ₁ⱼ + μ₂ⱼ)/2, the same form as Golub's per-gene votes.
Automatic class discovery
• The classification of different cancers is the result of years of hypothesis-driven research.
• Suppose you were given unlabeled samples of ALL/AML. Would you be able to distinguish the two classes?
Self Organizing Maps
• SOMs were applied to group the 38 samples.
• Class A1 contained 24/25 ALL samples and 3/13 AML samples.
• How can we validate this?
• Use the labels to do supervised classification via cross-validation.
• A 20-gene predictor gave 34 accurate predictions, 1 error, and 2 of 3 uncertain calls.
Comparing various error models
Conclusion