Page 1

Kernel Logistic Regression and the Import Vector Machine

Ji Zhu and Trevor Hastie, Journal of Computational and Graphical Statistics, 2005

Presented by Mingtao Ding, Duke University

December 8, 2011

Mingtao Ding Kernel Logistic Regression and the Import Vector Machine December 8, 2011 1 / 24

Page 2

Summary

The authors propose a new approach for classification, called the import vector machine (IVM).

It provides estimates of the class probabilities, which are often more useful than the classifications themselves.

It generalizes naturally to M-class classification through kernel logistic regression (KLR).

Page 3

Problem and Objective

Supervised learning problem: a set of training data $\{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^p$ is an input vector, and $y_i$ (dependent on $x_i$) is a univariate continuous output for the regression problem or a binary output for the classification problem.

Objective: learn a predictive function $f(x)$ from the training data,

$$\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)) + \frac{\lambda}{2} \Phi(\|f\|_{\mathcal{F}}) \right\}.$$

In this presentation, $\mathcal{F}$ is assumed to be a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$.
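To make the Loss + Regularization form concrete, here is a small illustrative sketch (not from the slides): for $f$ in an RKHS, $f(x_i) = (Ka)_i$ and $\|f\|_{\mathcal{H}_K}^2 = a^T K a$, so the objective can be evaluated directly. The squared loss, the RBF kernel, and $\Phi(t) = t^2$ are my assumptions for illustration; the slides leave $L$ and $\Phi$ general.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def objective(a, K, y, lam):
    # (1/n) * sum_i L(y_i, f(x_i)) + (lam/2) * Phi(||f||), with squared
    # loss, Phi(t) = t^2, f(x_i) = (K a)_i, and ||f||^2 = a^T K a
    f = K @ a
    loss = np.mean((y - f) ** 2)
    penalty = 0.5 * lam * (a @ K @ a)
    return loss + penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0])
K = rbf_kernel(X, X)
print(objective(np.zeros(20), K, y, lam=0.1))  # with a = 0: mean(y^2) = 1.0
```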

Page 4

SVM and KLR

The standard SVM can be fitted via Loss + Regularization

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.$$

Under very general conditions, the solution has the form

$$f(x) = \sum_{i=1}^{n} a_i K(x, x_i).$$

Example conditions: an arbitrary loss $L((x_1, y_1, f(x_1)), (x_2, y_2, f(x_2)), \dots, (x_n, y_n, f(x_n)))$ and a strictly monotonically increasing function $\Phi$.

The points with $y_i f(x_i) > 1$ have no influence on the loss function. As a consequence, it often happens that a sizeable fraction of the $n$ values of $a_i$ are zero. The points corresponding to nonzero $a_i$ are called support points.

Page 5

SVM and KLR

The loss function $(1 - yf)_+$ is plotted in Fig. 1, along with the negative log-likelihood (NLL) of the binomial distribution (of $y$ over $\{1, -1\}$):

$$\mathrm{NLL} = \ln(1 + e^{-yf}) = \begin{cases} -\ln p, & \text{if } y = 1, \\ -\ln(1 - p), & \text{if } y = -1, \end{cases}$$

where $p \equiv P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-f}}$.
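The two losses can be compared numerically. A small sketch of my own (natural logs, as on the slide): the hinge loss vanishes once the margin $yf$ exceeds 1, while the NLL stays strictly positive everywhere.

```python
import numpy as np

def hinge(y, f):
    # SVM loss (1 - y f)_+; exactly zero once the margin y f exceeds 1
    return np.maximum(0.0, 1.0 - y * f)

def nll(y, f):
    # binomial NLL ln(1 + e^{-y f}) for y in {1, -1}
    return np.log1p(np.exp(-y * f))

margins = np.array([-2.0, 0.0, 1.0, 3.0])
print(hinge(1, margins))  # [3. 1. 0. 0.]
print(nll(1, margins))    # smooth and strictly positive, ~0 at large margins
```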

Page 6

SVM and KLR

The SVM only estimates $\mathrm{sign}[p(x) - 1/2]$ (by calculating the distances between $x$ and the hyperplanes), without providing the class probability $p(x)$.

The NLL of y has a similar shape to that of the SVM.

If we let $y \in \{0, 1\}$, then

$$\mathrm{NLL} = -\left(y \ln p + (1 - y) \ln(1 - p)\right) = -\left(yf - \ln(1 + e^f)\right).$$

This is the loss function of the classical KLR.
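A quick numerical check of my own that the $\{0,1\}$-coded loss above agrees with the $\{1,-1\}$-coded loss $\ln(1 + e^{-\tilde{y}f})$ under the recoding $\tilde{y} = 2y - 1$:

```python
import numpy as np

def nll_pm1(y_pm, f):
    # y in {1, -1}: ln(1 + e^{-y f})
    return np.log1p(np.exp(-y_pm * f))

def nll_01(y01, f):
    # y in {0, 1}: -(y f - ln(1 + e^f))
    return -(y01 * f - np.log1p(np.exp(f)))

f = np.linspace(-4, 4, 9)
for y01 in (0.0, 1.0):
    y_pm = 2 * y01 - 1            # recode {0, 1} -> {-1, 1}
    assert np.allclose(nll_01(y01, f), nll_pm1(y_pm, f))
print("both codings give the same loss")
```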

Page 7

SVM and KLR

If we replace $(1 - yf)_+$ with $\ln(1 + e^{-yf})$, the SVM becomes a KLR problem with the objective function

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.$$

Advantages:

It offers a natural estimate of the class probability $p(x)$.

It can naturally be generalized to the M-class case through kernel multi-logit regression.

Disadvantage: for the KLR solution $f(x) = \sum_{i=1}^{n} a_i K(x, x_i)$, all the $a_i$'s are nonzero.

Page 8

SVM and KLR

Page 9

KLR as a Margin Maximizer

Suppose the basis functions of the transformed feature space $h(x)$ are rich enough, so that the hyperplane $f(x) = h(x)^T \beta + \beta_0 = 0$ can separate the training data.

Theorem 1: Denote by $\hat{\beta}(\lambda)$ the solution to the KLR problem

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\},$$

then $\lim_{\lambda \to 0} \hat{\beta}(\lambda) = \beta^*$, where $\beta^*$ is the margin-maximizing SVM solution.

Page 10

Import Vector Machine

The objective function of KLR can be written as

$$H = \frac{1}{n} \mathbf{1}^T \ln\left(1 + e^{-y \cdot (K_1 a)}\right) + \frac{\lambda}{2} a^T K_2 a.$$

To find $a$, we set the derivative of $H$ with respect to $a$ equal to 0 and use Newton's method to solve the score equation iteratively. The Newton update takes the form of an iteratively reweighted least squares (IRLS) step.
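The update equation itself did not survive extraction; the following is a standard Newton (IRLS) sketch of my own for regularized KLR in the full-model case $K_1 = K_2 = K$, with $y \in \{0,1\}$: gradient $K(p - y)/n + \lambda K a$ and Hessian $K W K/n + \lambda K$, where $W = \mathrm{diag}(p(1-p))$. The toy data and kernel bandwidth are illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_klr(K, y01, lam=0.1, n_iter=20):
    # Newton / IRLS for regularized KLR: minimize
    # (1/n) sum_i [-y_i f_i + ln(1 + e^{f_i})] + (lam/2) a^T K a, with f = K a
    n = len(y01)
    a = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(K @ a)                       # current probabilities
        grad = K @ (p - y01) / n + lam * (K @ a)
        W = p * (1 - p)                          # IRLS weights
        hess = (K * W) @ K / n + lam * K         # K diag(W) K / n + lam K
        a -= np.linalg.solve(hess + 1e-10 * np.eye(n), grad)
    return a

# toy usage on an RBF kernel
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y01 = (X[:, 0] + X[:, 1] > 0).astype(float)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2)
a = fit_klr(K, y01)
acc = np.mean((sigmoid(K @ a) > 0.5) == (y01 == 1))
print("training accuracy:", acc)
```

Note that every $a_i$ produced this way is generically nonzero, which is exactly the KLR disadvantage the IVM addresses.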

Page 11

Import Vector Machine

The computational cost of KLR is $O(n^3)$. To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR.

The sub-model has the form

$$f(x) = \sum_{x_i \in S} a_i K(x, x_i),$$

where $S$ is a subset of the training data, and the points in $S$ are called import points.

Page 12

Import Vector Machine

Algorithm 1:
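The algorithm figure did not survive extraction. Based on the paper's description, Algorithm 1 grows $S$ greedily, at each step adding the training point whose inclusion most reduces the regularized NLL $H$. The sketch below is my own naive version: it refits the sub-model coefficients for every candidate with a few Newton steps, which a practical implementation avoids; the stopping rule is the relative-improvement criterion from the slides.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def submodel_obj(K, y01, S, lam, n_newton=8):
    # Fit the sub-model f(x) = sum_{i in S} a_i K(x, x_i) with a few Newton
    # steps and return the regularized NLL (the objective H) it attains.
    n = len(y01)
    K1 = K[:, S]                      # n x |S| regression matrix
    K2 = K[np.ix_(S, S)]              # |S| x |S| penalty matrix
    a = np.zeros(len(S))
    for _ in range(n_newton):
        p = sigmoid(K1 @ a)
        grad = K1.T @ (p - y01) / n + lam * (K2 @ a)
        W = p * (1 - p)
        hess = (K1.T * W) @ K1 / n + lam * K2
        a -= np.linalg.solve(hess + 1e-8 * np.eye(len(S)), grad)
    p = np.clip(sigmoid(K1 @ a), 1e-12, 1 - 1e-12)
    nll = -np.mean(y01 * np.log(p) + (1 - y01) * np.log(1 - p))
    return nll + 0.5 * lam * (a @ K2 @ a)

def ivm_greedy(K, y01, lam=0.1, max_points=5, eps=1e-3):
    # Greedily add the import point that most reduces H; stop once the
    # relative improvement falls below eps (the slides' ratio criterion).
    S, H_prev = [], np.inf
    while len(S) < max_points:
        cand = [j for j in range(len(y01)) if j not in S]
        scores = [submodel_obj(K, y01, S + [j], lam) for j in cand]
        H_best = min(scores)
        if np.isfinite(H_prev) and abs(H_prev - H_best) / abs(H_best) < eps:
            break
        S.append(cand[int(np.argmin(scores))])
        H_prev = H_best
    return S

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y01 = (X[:, 0] > 0).astype(float)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = ivm_greedy(np.exp(-d2), y01)
print("import points:", S)
```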

Page 13

Import Vector Machine

Algorithm 1:

Page 14

Import Vector Machine

The algorithm can be accelerated by revising Step (2).

Stopping rule for adding points to $S$: run the algorithm until $\frac{|H_k - H_{k - \Delta k}|}{|H_k|} < \epsilon$.

Choosing the regularization parameter $\lambda$: we can split all the data into a training set and a tuning set, and use the misclassification error on the tuning set as a criterion for choosing $\lambda$ (Algorithm 3).
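The tuning-set criterion can be sketched as follows (my own sketch, not the paper's Algorithm 3, which decreases $\lambda$ gradually from a large starting value; here I simply scan a small grid from large to small and keep the $\lambda$ with the lowest tuning-set misclassification error):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_klr(K, y01, lam, n_iter=15):
    # a few Newton steps for the full regularized KLR model f = K a
    n = len(y01)
    a = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(K @ a)
        W = p * (1 - p)
        grad = K @ (p - y01) / n + lam * (K @ a)
        hess = (K * W) @ K / n + lam * K
        a -= np.linalg.solve(hess + 1e-10 * np.eye(n), grad)
    return a

def rbf(X, Z):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y01 = (X[:, 0] > 0).astype(float)
train, tune = np.arange(40), np.arange(40, 60)   # simple split

best_lam, best_err = None, np.inf
for lam in [10.0, 1.0, 0.1]:                      # start large, then decrease
    a = fit_klr(rbf(X[train], X[train]), y01[train], lam)
    p_tune = sigmoid(rbf(X[tune], X[train]) @ a)  # probabilities on tuning set
    err = np.mean((p_tune > 0.5) != (y01[tune] == 1))
    if err < best_err:
        best_lam, best_err = lam, err
print("chosen lambda:", best_lam, "tuning error:", best_err)
```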

Page 15

Import Vector Machine

Algorithm 3:

1. Start with a large regularization parameter λ.

Page 16

Simulation Results

Page 17

Simulation Results

Page 18

Real Data Results

Page 19

Real Data Results

Page 20

Real Data Results

Page 21

Generalization to M-Class Case

Similar to kernel multi-logit regression, we define the class probabilities (assuming the usual multi-logit link) as

$$p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \quad c = 1, \dots, C, \quad \text{subject to} \quad \sum_{c=1}^{C} f_c(x) = 0.$$
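A small sketch of my own, assuming the usual multi-logit (softmax) link: adding a constant to every $f_c(x)$ leaves the probabilities unchanged, which is exactly why the sum-to-zero constraint is needed to identify the $f_c$.

```python
import numpy as np

def class_probs(F):
    # multi-logit (softmax) probabilities from per-class functions f_c(x);
    # rows index samples, columns index classes
    E = np.exp(F - F.max(axis=1, keepdims=True))   # numerically stabilized
    return E / E.sum(axis=1, keepdims=True)

F = np.array([[2.0, 0.5, -2.5]])    # per-class f_c values; they sum to zero
P = class_probs(F)
# shifting every f_c by a constant changes the f_c but not the probabilities
assert np.allclose(class_probs(F + 7.0), P)
assert np.allclose(P.sum(axis=1), 1.0)
print(P)
```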

Page 22

Generalization to M-Class Case

The M-class KLR fits a model that minimizes the regularized NLL of the multinomial distribution.

The approximate solution can be obtained by an M-class IVM procedure, similar to the two-class case.

Page 23

Generalization to M-Class Case

Page 24

Conclusion

The IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.

Computational cost: KLR $O(n^3)$; SVM $O(n^2 n_s)$; IVM $O(n^2 n_I^2)$ (two-class) and $O(C n^2 n_I^2)$ (M-class), where $n_s$ and $n_I$ are the numbers of support and import points.

IVM has limiting optimal margin properties.
