Kernel Logistic Regression and the Import Vector Machine
Ji Zhu and Trevor Hastie, Journal of Computational and Graphical Statistics, 2005
Presented by Mingtao Ding, Duke University
December 8, 2011
Mingtao Ding Kernel Logistic Regression and the Import Vector Machine December 8, 2011 1 / 24
Summary
The authors propose a new approach for classification, called the import vector machine (IVM).
Provides estimates of the class probabilities; often these are more useful than the classifications.
Generalizes naturally to M-class classification through kernel logistic regression (KLR).
Problem and Objective
Supervised learning problem: a set of training data {(x_i, y_i)}, where x_i ∈ R^p is an input vector and y_i (dependent on x_i) is a univariate continuous output for the regression problem, or a binary output for the classification problem.
Objective: learn a predictive function f(x) from the training data
\[
\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L\left(y_i, f(x_i)\right) + \frac{\lambda}{2}\, \Phi\left(\|f\|_{\mathcal{F}}\right) \right\}.
\]
In this presentation, F is assumed to be a reproducing kernel Hilbert space (RKHS) H_K.
SVM and KLR
The standard SVM can be fitted via Loss + Regularization
\[
\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ 1 - y_i f(x_i) \right]_+ + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.
\]
Under very general conditions, the solution has the form
\[
f(x) = \sum_{i=1}^{n} a_i K(x, x_i).
\]
Example conditions: an arbitrary loss L((x_1, y_1, f(x_1)), (x_2, y_2, f(x_2)), ..., (x_n, y_n, f(x_n))) and a strictly monotonically increasing function Φ.
The points with y_i f(x_i) > 1 have no influence on the loss function. As a consequence, it often happens that a sizeable fraction of the n values a_i are zero. The points corresponding to nonzero a_i are called support points.
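To make the expansion concrete, here is a minimal sketch of evaluating f(x) = Σ a_i K(x, x_i); the RBF kernel and the coefficient values are illustrative assumptions, not taken from the paper:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def decision(x, train_x, a, kernel=rbf_kernel):
    # f(x) = sum_i a_i K(x, x_i); terms with a_i == 0 (the non-support
    # points) contribute nothing and can be skipped
    return sum(ai * kernel(x, xi) for ai, xi in zip(a, train_x) if ai != 0.0)

train_x = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
a = [0.5, -0.5, 0.0]   # the third point is not a support point
f = decision((0.0, 0.0), train_x, a)
```

Only the support points need to be kept at prediction time, which is exactly the sparsity the SVM exploits.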
SVM and KLR
The loss function (1 − yf)_+ is plotted in Fig. 1, along with the negative log-likelihood (NLL) of the binomial distribution (of y over {1, −1}):
\[
\mathrm{NLL} = \ln\left(1 + e^{-yf}\right) =
\begin{cases}
-\ln p, & \text{if } y = 1, \\
-\ln(1 - p), & \text{if } y = -1,
\end{cases}
\]
where p ≡ P(Y = 1 | X = x) = 1/(1 + e^{−f}).
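The two losses are easy to compare numerically; a small sketch (the evaluation points are arbitrary):

```python
import math

def hinge(y, f):
    # SVM loss: (1 - y f)_+
    return max(0.0, 1.0 - y * f)

def nll(y, f):
    # binomial NLL for y in {1, -1}: ln(1 + e^{-y f})
    return math.log1p(math.exp(-y * f))

# a point with y * f > 1 incurs zero hinge loss but still a positive NLL
h, n = hinge(1, 2.0), nll(1, 2.0)
```

At yf = 0 the NLL equals ln 2, and it stays strictly positive for all finite yf, unlike the hinge loss.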
SVM and KLR
The SVM only estimates sign[p(x) − 1/2] (by computing the distance between x and the separating hyperplane), without providing the class probability p(x).
The NLL of y has a similar shape to that of the SVM.
If we let y ∈ {0, 1}, then
\[
\mathrm{NLL} = -\left( y \ln p + (1 - y) \ln(1 - p) \right) = -\left( yf - \ln\left(1 + e^{f}\right) \right).
\]
This is the loss function of the classical KLR.
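The two forms of the NLL above can be checked against each other numerically; a short sketch:

```python
import math

def nll_binary(y01, f):
    # -(y ln p + (1 - y) ln(1 - p)) with p = 1 / (1 + e^{-f})
    p = 1.0 / (1.0 + math.exp(-f))
    return -(y01 * math.log(p) + (1 - y01) * math.log(1.0 - p))

def nll_compact(y01, f):
    # the equivalent compact form: -(y f - ln(1 + e^f))
    return -(y01 * f - math.log1p(math.exp(f)))

# the two expressions agree for both labels and a range of scores
for y01 in (0, 1):
    for f in (-2.0, -0.5, 0.0, 1.5):
        assert abs(nll_binary(y01, f) - nll_compact(y01, f)) < 1e-12
```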
SVM and KLR
If we replace (1 − yf)_+ with ln(1 + e^{−yf}), the SVM becomes a KLR problem with the objective function
\[
\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.
\]
Advantages:
Offers a natural estimate of the class probability p(x).
Can naturally be generalized to the M-class case through kernel multi-logit regression.
Disadvantage: for the KLR solution f(x) = Σ_{i=1}^n a_i K(x, x_i), all of the a_i's are nonzero.
KLR as a Margin Maximizer
Suppose the basis functions of the transformed feature space h(x) are rich enough that the hyperplane f(x) = h(x)^T β + β_0 = 0 can separate the training data.
Theorem 1. Denote by β̂(λ) the solution to KLR
\[
\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\};
\]
then lim_{λ→0} β̂(λ)/‖β̂(λ)‖ = β*/‖β*‖, where β* is the margin-maximizing SVM solution.
Import Vector Machine
The objective function of KLR can be written as
\[
H = \frac{1}{n} \mathbf{1}^T \ln\left(1 + e^{-y \cdot (K_1 a)}\right) + \frac{\lambda}{2}\, a^T K_2 a.
\]
To find a, we set the derivative of H with respect to a equal to zero and use Newton's method to solve the score equation iteratively. The Newton update is a standard iteratively reweighted least squares (IRLS) step.
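As a hedged sketch of such an IRLS/Newton iteration (assuming K_1 = K_2 = K, an RBF kernel, labels y ∈ {0, 1}, and a tiny illustrative dataset; this is a generic regularized kernel logistic fit, not the paper's exact update):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    # Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klr(X, y01, lam=0.1, gamma=1.0, n_iter=20):
    # Newton (IRLS) iterations for the regularized KLR objective, f = K a
    n = len(y01)
    K = rbf_gram(X, gamma)
    a = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(K @ a)))   # current class probabilities
        W = p * (1.0 - p)                    # IRLS weights
        grad = K @ (p - y01) / n + lam * (K @ a)
        hess = (K * W) @ K / n + lam * K + 1e-8 * np.eye(n)  # ridge keeps it invertible
        a -= np.linalg.solve(hess, grad)
    return a, K

X = np.array([[0.0], [0.2], [0.9], [1.1]])
y01 = np.array([0.0, 0.0, 1.0, 1.0])
a, K = fit_klr(X, y01)
p = 1.0 / (1.0 + np.exp(-(K @ a)))
```

Each iteration solves an n × n linear system, which is where the O(n³) cost quoted on the next slide comes from.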
Import Vector Machine
The computational cost of KLR is O(n³). To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR.
The sub-model has the form
\[
f(x) = \sum_{x_i \in S} a_i K(x, x_i),
\]
where S is a subset of the training data, and the data in S are called import points.
Import Vector Machine
Algorithm 1:
Import Vector Machine
The algorithm can be accelerated by revising Step (2).
Stopping rule for adding points to S: run the algorithm until |H_k − H_{k−Δk}| / |H_k| < ε.
Choosing the regularization parameter λ: we can split all the datainto a training set and a tuning set, and use the misclassificationerror on the tuning set as a criterion for choosing λ (Algorithm 3).
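A compact sketch of the greedy sub-model search together with the stopping rule above (the kernel, the toy data, and fitting each candidate sub-model with a few Newton steps are all illustrative assumptions, not the paper's exact Algorithm 1):

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    # Gram matrix of the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1))

def sub_model_objective(X, y01, S, lam, gamma=1.0, n_newton=10):
    # Fit f(x) = sum_{x_i in S} a_i K(x, x_i) by a few Newton steps and
    # return the regularized objective H (smaller is better)
    n, q = len(y01), len(S)
    K1 = rbf(X, X[S], gamma)      # n x q
    K2 = rbf(X[S], X[S], gamma)   # q x q
    a = np.zeros(q)
    for _ in range(n_newton):
        p = 1.0 / (1.0 + np.exp(-(K1 @ a)))
        W = p * (1.0 - p)
        grad = K1.T @ (p - y01) / n + lam * (K2 @ a)
        hess = (K1.T * W) @ K1 / n + lam * K2 + 1e-8 * np.eye(q)
        a -= np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-(K1 @ a)))
    nll = -np.mean(y01 * np.log(p) + (1.0 - y01) * np.log(1.0 - p))
    return nll + 0.5 * lam * a @ K2 @ a

def ivm_select(X, y01, lam=0.1, eps=1e-3):
    # Greedily add the training point whose inclusion lowers H the most;
    # stop once the relative change in H falls below eps
    S, H_hist = [], []
    remaining = set(range(len(y01)))
    while remaining:
        best = min(remaining, key=lambda l: sub_model_objective(X, y01, S + [l], lam))
        S.append(best)
        remaining.discard(best)
        H_hist.append(sub_model_objective(X, y01, S, lam))
        if len(H_hist) > 1 and abs(H_hist[-1] - H_hist[-2]) < eps * abs(H_hist[-1]):
            break
    return S, H_hist

X = np.array([[0.0], [0.2], [1.0], [1.2]])
y01 = np.array([0.0, 0.0, 1.0, 1.0])
S, H_hist = ivm_select(X, y01)
```

Because each sub-model fit only solves |S| × |S| systems, the search stays far cheaper than the full O(n³) KLR fit when few import points suffice.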
Import Vector Machine
Algorithm 3:
1. Start with a large regularization parameter λ.
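The tuning-set criterion described above can be sketched as follows; a plain one-dimensional ridge-penalized logistic model stands in for the IVM fit here, and the model, data, and λ grid are all illustrative assumptions:

```python
import math

def fit_ridge_logit_1d(xs, ys, lam, lr=0.05, steps=2000):
    # minimize (1/n) * sum ln(1 + e^{-y w x}) + (lam/2) w^2 by gradient descent
    w, n = 0.0, len(xs)
    for _ in range(steps):
        g = sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(xs, ys)) / n
        w -= lr * (g + lam * w)
    return w

def choose_lambda(train, tune, lambdas):
    # sweep lam from large to small; keep the value with the lowest
    # misclassification count on the tuning set
    best_lam, best_err = None, float("inf")
    for lam in sorted(lambdas, reverse=True):
        w = fit_ridge_logit_1d(*train, lam)
        err = sum(1 for x, y in zip(*tune) if y * w * x <= 0)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

train = ([-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1])
tune = ([-1.5, 1.5], [-1, 1])
best = choose_lambda(train, tune, [10.0, 1.0, 0.1])
```

Starting from a large λ and decreasing matches the slide's Step 1; on ties this sketch keeps the first (largest) λ visited.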
Simulation Results
Real Data Results
Generalization to M-Class Case
Similar to kernel multi-logit regression, we define the class probabilities as
\[
p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \qquad \text{subject to } \sum_{c=1}^{C} f_c(x) = 0.
\]
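The sum-to-zero constraint removes the redundancy in the softmax representation; a quick numerical check (the score values are arbitrary):

```python
import math

def class_probs(f):
    # p_c = exp(f_c) / sum_c' exp(f_c')
    z = [math.exp(fc) for fc in f]
    s = sum(z)
    return [zc / s for zc in z]

f = [1.0, -0.5, 2.0]
# centering enforces sum_c f_c = 0 and leaves the probabilities unchanged
f_centered = [fc - sum(f) / len(f) for fc in f]
p1, p2 = class_probs(f), class_probs(f_centered)
```

Since adding a constant to every f_c cancels in the ratio, the constraint pins down one representative from each equivalent family of score functions.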
Generalization to M-Class Case
The M-class KLR fits a model that minimizes the regularized NLL of the multinomial distribution.
The approximate solution can be obtained by an M-class IVM procedure, similar to the two-class case.
Conclusion
The IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.
Computational cost: KLR O(n³); SVM O(n² n_s); IVM O(n² n_I²) for two classes and O(C n² n_I²) for the M-class case (n_s = number of support points, n_I = number of import points).
IVM has limiting optimal margin properties.