
Kernel Logistic Regression and the Import Vector Machine

Ji Zhu and Trevor Hastie, Journal of Computational and Graphical Statistics, 2005

Presented by Mingtao Ding, Duke University

December 8, 2011


Summary

The authors propose a new approach for classification, called the import vector machine (IVM).

Provides estimates of the class probabilities. Often these are more useful than the classifications.

Generalizes naturally to M-class classification through kernel logistic regression (KLR).


Problem and Objective

Supervised Learning Problem: a set of training data $\{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^p$ is an input vector and $y_i$ (dependent on $x_i$) is a univariate continuous output for the regression problem or a binary output for the classification problem.

Objective: learn a predictive function $f(x)$ from the training data by solving

$$\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \frac{\lambda}{2} \Phi\big(\|f\|_{\mathcal{F}}\big) \right\}.$$

In this presentation, $\mathcal{F}$ is assumed to be a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$.


SVM and KLR

The standard SVM can be fitted via Loss + Regularization

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\}.$$

Under very general conditions, the solution has the form

$$f(x) = \sum_{i=1}^{n} a_i K(x, x_i).$$

Example conditions: an arbitrary loss $L\big((x_1, y_1, f(x_1)), (x_2, y_2, f(x_2)), \cdots, (x_n, y_n, f(x_n))\big)$ and a strictly monotonically increasing function $\Phi$.

The points with $y_i f(x_i) > 1$ have no influence on the loss function. As a consequence, it often happens that a sizeable fraction of the $n$ values of $a_i$ are zero. The points corresponding to nonzero $a_i$ are called support points.
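A minimal sketch of this sparsity using scikit-learn's SVC with an RBF kernel; the toy dataset and hyperparameters are illustrative assumptions, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy two-class problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

# Fit a kernel SVM; the dual solution is f(x) = sum_i a_i K(x, x_i).
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Only the support points carry nonzero a_i; all other training
# points drop out of the fitted function entirely.
print(f"{clf.n_support_.sum()} support points out of {len(X)}")
```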


SVM and KLR

The loss function $(1 - yf)_+$ is plotted in Fig. 1, along with the negative log-likelihood (NLL) of the binomial distribution (of $y$ over $\{1, -1\}$).

$$\mathrm{NLL} = \ln\big(1 + e^{-yf}\big) = \begin{cases} -\ln p, & \text{if } y = 1, \\ -\ln(1 - p), & \text{if } y = -1, \end{cases}$$

where $p \equiv P(Y = 1 \mid X = x) = \dfrac{1}{1 + e^{-f}}$.
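A small numeric comparison of the two losses over a grid of margin values (plain NumPy; the grid itself is an illustrative choice):

```python
import numpy as np

yf = np.linspace(-3, 3, 7)          # margin values y * f(x)
hinge = np.maximum(0.0, 1.0 - yf)   # SVM hinge loss (1 - yf)_+
nll = np.log1p(np.exp(-yf))         # binomial NLL ln(1 + e^{-yf})

for m, h, l in zip(yf, hinge, nll):
    print(f"yf = {m:+.1f}   hinge = {h:.3f}   NLL = {l:.3f}")
# Both losses penalize negative margins nearly linearly; the hinge is
# exactly zero for yf > 1, while the NLL decays smoothly toward zero.
```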


SVM and KLR

The SVM only estimates $\mathrm{sign}[p(x) - 1/2]$ (by calculating the distances between $x$ and the hyperplanes), without defining the class probability $p(x)$.

The NLL of $y$ has a similar shape to the SVM's hinge loss.

If we let $y \in \{0, 1\}$, then

$$\mathrm{NLL} = -\big(y \ln p + (1 - y) \ln(1 - p)\big) = -\big(yf - \ln(1 + e^f)\big),$$

since $\ln p = f - \ln(1 + e^f)$ and $\ln(1 - p) = -\ln(1 + e^f)$. This is the loss function of classical KLR.


SVM and KLR

If we replace $(1 - yf)_+$ with $\ln(1 + e^{-yf})$, the SVM becomes a KLR problem with the objective function

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\}.$$

Advantages:

Offers a natural estimate of the class probability $p(x)$.

Can naturally be generalized to the M-class case through kernel multi-logit regression.

Disadvantage: for the KLR solution $f(x) = \sum_{i=1}^{n} a_i K(x, x_i)$, all of the $a_i$'s are nonzero.


KLR as a Margin Maximizer

Suppose the basis functions of the transformed feature space $h(x)$ are rich enough that the hyperplane $f(x) = h(x)^T \beta + \beta_0 = 0$ can separate the training data.

Theorem 1. Denote by $\hat{\beta}(\lambda)$ the solution to the KLR problem

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\};$$

then $\lim_{\lambda \to 0} \hat{\beta}(\lambda) = \beta^*$, where $\beta^*$ is the margin-maximizing SVM solution.


Import Vector Machine

The objective function of KLR can be written as

$$H = \frac{1}{n} \mathbf{1}^T \ln\big(1 + e^{-y \cdot (K_1 a)}\big) + \frac{\lambda}{2} a^T K_2 a,$$

where $\cdot$ and the exponential are taken elementwise.

To find $a$, we set the derivative of $H$ with respect to $a$ equal to 0 and use the Newton method to solve the score equation iteratively. The Newton update can be written as an iteratively reweighted least-squares (IRLS) step; a sketch follows.
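A minimal sketch of this Newton iteration in its standard IRLS form; the explicit update below is an assumption derived from the objective above rather than copied from the slide, and the tolerances are illustrative. It covers both the full model ($K_1 = K_2 = K$) and the sub-model introduced next:

```python
import numpy as np

def klr_newton(K1, K2, y, lam, n_iter=50, tol=1e-8):
    """Newton/IRLS for (sub-model) KLR with labels y in {0, 1}.

    K1 : (n, q) matrix K(x_i, x_j) for training x_i, import x_j
    K2 : (q, q) matrix K(x_i, x_j) over the import points only
    (the full KLR model is the special case K1 = K2 = K, n x n)
    """
    n, q = K1.shape
    a = np.zeros(q)
    for _ in range(n_iter):
        f = K1 @ a
        p = np.clip(1 / (1 + np.exp(-f)), 1e-9, 1 - 1e-9)  # P(Y=1|x_i)
        w = p * (1 - p)                                     # IRLS weights
        z = f + (y - p) / w                                 # working response
        # Assumed standard update: a <- (K1'WK1 + n*lam*K2)^{-1} K1'Wz
        a_new = np.linalg.solve(K1.T @ (w[:, None] * K1) + n * lam * K2,
                                K1.T @ (w * z))
        if np.max(np.abs(a_new - a)) < tol:
            return a_new
        a = a_new
    return a
```

For the full model each iteration solves an $n \times n$ system, which is where the $O(n^3)$ cost cited on the next slide comes from.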


Import Vector Machine

The computational cost of KLR is $O(n^3)$. To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR.

The sub-model has the form

$$f(x) = \sum_{x_i \in S} a_i K(x, x_i),$$

where $S$ is a subset of the training data, and the points in $S$ are called import points.


Import Vector Machine

Algorithm 1 (basic IVM): greedily add import points to $S$ one at a time, refitting the sub-model at each step; a sketch follows.
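A compact sketch of this greedy selection, based on the sub-model form and stopping rule described on the surrounding slides rather than on the algorithm listing itself (which is not reproduced here); klr_newton is the IRLS sketch above:

```python
import numpy as np

def ivm_fit(K, y, lam, eps=1e-3, dk=1):
    """Greedy IVM: grow the import set S one point at a time, each
    time adding the candidate whose sub-model most decreases the
    regularized NLL H, and stop once H flattens out."""
    n = len(y)
    S, R, H_path, a = [], list(range(n)), [], None
    while R:
        best_H, best_l, best_a = np.inf, None, None
        for l in R:                              # try each candidate point
            idx = S + [l]
            K1, K2 = K[:, idx], K[np.ix_(idx, idx)]
            a_l = klr_newton(K1, K2, y, lam)     # refit the sub-model
            f = K1 @ a_l
            H = np.mean(np.logaddexp(0, f) - y * f) \
                + 0.5 * lam * a_l @ K2 @ a_l     # regularized NLL
            if H < best_H:
                best_H, best_l, best_a = H, l, a_l
        S.append(best_l)
        R.remove(best_l)
        H_path.append(best_H)
        a = best_a
        k = len(H_path) - 1      # stopping rule from the next slide:
        if k >= dk and abs(H_path[k] - H_path[k - dk]) < eps * abs(H_path[k]):
            break
    return S, a
```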


Import Vector Machine

The algorithm can be accelerated by revising Step (2); the revised procedure is Algorithm 2.

Stopping rule for adding points to $S$: run the algorithm until $\dfrac{|H_k - H_{k - \Delta k}|}{|H_k|} < \varepsilon$.

Choosing the regularization parameter $\lambda$: we can split all the data into a training set and a tuning set, and use the misclassification error on the tuning set as a criterion for choosing $\lambda$ (Algorithm 3).


Import Vector Machine

Algorithm 3:

1. Start with a large regularization parameter λ.
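The remaining steps of Algorithm 3 did not survive in this transcript. A plausible sketch, consistent with the tuning-set criterion on the previous slide: decrease $\lambda$ from its large starting value and score each fit on the tuning set. The $\lambda$ grid, the kernel helper, and the reuse of the ivm_fit sketch are assumptions:

```python
import numpy as np

def choose_lambda(X_train, y_train, X_tune, y_tune, kernel,
                  lambdas=np.logspace(1, -4, 12)):
    """Pick lambda by misclassification error on a held-out tuning
    set, sweeping lambda from large to small as in step 1."""
    K = kernel(X_train, X_train)             # (n, n) kernel matrix
    best = (np.inf, None, None)
    for lam in lambdas:                      # large -> small
        S, a = ivm_fit(K, y_train, lam)      # IVM sketch above
        # Predict on the tuning set using only the import points.
        K_tune = kernel(X_tune, X_train[S])  # (n_tune, |S|)
        p = 1 / (1 + np.exp(-(K_tune @ a)))
        err = np.mean((p > 0.5) != y_tune)
        if err < best[0]:
            best = (err, lam, (S, a))
    return best[1], best[2]                  # chosen lambda and fit
```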


Simulation Results


Real Data Results


Generalization to M-Class Case

Similarly to kernel multi-logit regression, we define the class probabilities as

$$p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \qquad c = 1, \ldots, C,$$

with the identifiability constraint

$$\sum_{c=1}^{C} f_c(x) = 0.$$


Generalization to M-Class Case

The M-class KLR fits a model to minimize the regularized NLL of the multinomial distribution, written out below.

The approximate solution can be obtained by an M-class IVM procedure, similar to the two-class case.
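The slide's formula is not reproduced here; written out from the class-probability definitions above (an assumption), with indicator responses $y_{ic} = 1$ if $y_i = c$ and $0$ otherwise, the regularized multinomial NLL takes the form

$$\min_{f_1, \ldots, f_C \in \mathcal{H}_K} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{ic} \ln p_c(x_i) + \frac{\lambda}{2} \sum_{c=1}^{C} \|f_c\|^2_{\mathcal{H}_K} \right\}.$$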



Conclusion

IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.

Computational cost: KLR $O(n^3)$; SVM $O(n^2 n_s)$; IVM $O(n^2 n_I^2)$ for two classes and $O(C n^2 n_I^2)$ for $C$ classes, where $n_s$ is the number of support points and $n_I$ the number of import points.

IVM has limiting optimal margin properties.
