
Kernel Logistic Regression and the Import Vector Machine

Ji Zhu and Trevor Hastie, Journal of Computational and Graphical Statistics, 2005

Presented by Mingtao Ding, Duke University

December 8, 2011


Summary

The authors propose a new approach for classification, called the import vector machine (IVM).

Provides estimates of the class probabilities. Often these are more useful than the classifications.

Generalizes naturally to M-class classification through kernel logistic regression (KLR).


Problem and Objective

Supervised Learning Problem: a set of training data $\{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^p$ is an input vector and $y_i$ (dependent on $x_i$) is a univariate continuous output for the regression problem or a binary output for the classification problem.

Objective: learn a predictive function $f(x)$ from the training data by solving

$$\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \frac{\lambda}{2} \Phi\big(\|f\|_{\mathcal{F}}\big) \right\}.$$

In this presentation, $\mathcal{F}$ is assumed to be a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$.


SVM and KLR

The standard SVM can be fitted via Loss + Regularization

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\}.$$

Under very general conditions, the solution has the form

$$f(x) = \sum_{i=1}^{n} a_i K(x, x_i).$$

Example conditions: an arbitrary loss $L\big((x_1, y_1, f(x_1)), (x_2, y_2, f(x_2)), \cdots, (x_n, y_n, f(x_n))\big)$ and a strictly monotonically increasing function $\Phi$.

The points with $y_i f(x_i) > 1$ have no influence on the loss function. As a consequence, it often happens that a sizeable fraction of the $n$ values of $a_i$ are zero. The points corresponding to nonzero $a_i$ are called support points.
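A minimal sketch of this sparsity using scikit-learn's SVC with an RBF kernel; the toy dataset and hyperparameters are illustrative assumptions, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy two-class problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

# Fit a kernel SVM; the dual solution is f(x) = sum_i a_i K(x, x_i).
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

# Only the support points carry nonzero a_i; all other training
# points drop out of the fitted function entirely.
print(f"{clf.n_support_.sum()} support points out of {len(X)}")
```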


SVM and KLR

The loss function $(1 - yf)_+$ is plotted in Fig. 1, along with the negative log-likelihood (NLL) of the binomial distribution (of $y$ over $\{1, -1\}$).

$$\mathrm{NLL} = \ln\big(1 + e^{-yf}\big) = \begin{cases} -\ln p, & \text{if } y = 1, \\ -\ln(1 - p), & \text{if } y = -1, \end{cases}$$

where $p \equiv P(Y = 1 \mid X = x) = \dfrac{1}{1 + e^{-f}}$.
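A small numeric comparison of the two losses over a grid of margin values (plain NumPy; the grid itself is an illustrative choice):

```python
import numpy as np

yf = np.linspace(-3, 3, 7)          # margin values y * f(x)
hinge = np.maximum(0.0, 1.0 - yf)   # SVM hinge loss (1 - yf)_+
nll = np.log1p(np.exp(-yf))         # binomial NLL ln(1 + e^{-yf})

for m, h, l in zip(yf, hinge, nll):
    print(f"yf = {m:+.1f}   hinge = {h:.3f}   NLL = {l:.3f}")
# Both losses penalize negative margins nearly linearly; the hinge is
# exactly zero for yf > 1, while the NLL decays smoothly toward zero.
```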


SVM and KLR

The SVM only estimates $\mathrm{sign}[p(x) - 1/2]$ (by calculating the distances between $x$ and the hyperplanes), without defining the class probability $p(x)$.

The NLL of $y$ has a similar shape to the SVM's hinge loss.

If we let $y \in \{0, 1\}$, then

$$\mathrm{NLL} = -\big(y \ln p + (1 - y) \ln(1 - p)\big) = -\big(yf - \ln(1 + e^f)\big),$$

since $\ln p = f - \ln(1 + e^f)$ and $\ln(1 - p) = -\ln(1 + e^f)$. This is the loss function of classical KLR.


SVM and KLR

If we replace $(1 - yf)_+$ with $\ln(1 + e^{-yf})$, the SVM becomes a KLR problem with the objective function

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\}.$$

Advantages:

Offers a natural estimate of the class probability $p(x)$.

Can naturally be generalized to the M-class case through kernel multi-logit regression.

Disadvantage: for the KLR solution $f(x) = \sum_{i=1}^{n} a_i K(x, x_i)$, all of the $a_i$'s are nonzero.


KLR as a Margin Maximizer

Suppose the basis functions of the transformed feature space $h(x)$ are rich enough that the hyperplane $f(x) = h(x)^T \beta + \beta_0 = 0$ can separate the training data.

Theorem 1. Denote by $\hat{\beta}(\lambda)$ the solution to the KLR problem

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|f\|^2_{\mathcal{H}_K} \right\};$$

then $\lim_{\lambda \to 0} \hat{\beta}(\lambda) = \beta^*$, where $\beta^*$ is the margin-maximizing SVM solution.


Import Vector Machine

The objective function of KLR can be written as

$$H = \frac{1}{n} \mathbf{1}^T \ln\big(1 + e^{-y \cdot (K_1 a)}\big) + \frac{\lambda}{2} a^T K_2 a,$$

where $\cdot$ and the exponential are taken elementwise.

To find $a$, we set the derivative of $H$ with respect to $a$ equal to 0 and use the Newton method to solve the score equation iteratively. The Newton update can be written as an iteratively reweighted least-squares (IRLS) step; a sketch follows.
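A minimal sketch of this Newton iteration in its standard IRLS form; the explicit update below is an assumption derived from the objective above rather than copied from the slide, and the tolerances are illustrative. It covers both the full model ($K_1 = K_2 = K$) and the sub-model introduced next:

```python
import numpy as np

def klr_newton(K1, K2, y, lam, n_iter=50, tol=1e-8):
    """Newton/IRLS for (sub-model) KLR with labels y in {0, 1}.

    K1 : (n, q) matrix K(x_i, x_j) for training x_i, import x_j
    K2 : (q, q) matrix K(x_i, x_j) over the import points only
    (the full KLR model is the special case K1 = K2 = K, n x n)
    """
    n, q = K1.shape
    a = np.zeros(q)
    for _ in range(n_iter):
        f = K1 @ a
        p = np.clip(1 / (1 + np.exp(-f)), 1e-9, 1 - 1e-9)  # P(Y=1|x_i)
        w = p * (1 - p)                                     # IRLS weights
        z = f + (y - p) / w                                 # working response
        # Assumed standard update: a <- (K1'WK1 + n*lam*K2)^{-1} K1'Wz
        a_new = np.linalg.solve(K1.T @ (w[:, None] * K1) + n * lam * K2,
                                K1.T @ (w * z))
        if np.max(np.abs(a_new - a)) < tol:
            return a_new
        a = a_new
    return a
```

For the full model each iteration solves an $n \times n$ system, which is where the $O(n^3)$ cost cited on the next slide comes from.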


Import Vector Machine

The computational cost of KLR is $O(n^3)$. To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR.

The sub-model has the form

$$f(x) = \sum_{x_i \in S} a_i K(x, x_i),$$

where $S$ is a subset of the training data, and the points in $S$ are called import points.


Import Vector Machine

Algorithm 1 (basic IVM): greedily add import points to $S$ one at a time, refitting the sub-model at each step; a sketch follows.
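A compact sketch of this greedy selection, based on the sub-model form and stopping rule described on the surrounding slides rather than on the algorithm listing itself (which is not reproduced here); klr_newton is the IRLS sketch above:

```python
import numpy as np

def ivm_fit(K, y, lam, eps=1e-3, dk=1):
    """Greedy IVM: grow the import set S one point at a time, each
    time adding the candidate whose sub-model most decreases the
    regularized NLL H, and stop once H flattens out."""
    n = len(y)
    S, R, H_path, a = [], list(range(n)), [], None
    while R:
        best_H, best_l, best_a = np.inf, None, None
        for l in R:                              # try each candidate point
            idx = S + [l]
            K1, K2 = K[:, idx], K[np.ix_(idx, idx)]
            a_l = klr_newton(K1, K2, y, lam)     # refit the sub-model
            f = K1 @ a_l
            H = np.mean(np.logaddexp(0, f) - y * f) \
                + 0.5 * lam * a_l @ K2 @ a_l     # regularized NLL
            if H < best_H:
                best_H, best_l, best_a = H, l, a_l
        S.append(best_l)
        R.remove(best_l)
        H_path.append(best_H)
        a = best_a
        k = len(H_path) - 1      # stopping rule from the next slide:
        if k >= dk and abs(H_path[k] - H_path[k - dk]) < eps * abs(H_path[k]):
            break
    return S, a
```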


Import Vector Machine

The algorithm can be accelerated by revising Step (2); the revised procedure is Algorithm 2.

Stopping rule for adding points to $S$: run the algorithm until $\dfrac{|H_k - H_{k - \Delta k}|}{|H_k|} < \varepsilon$.

Choosing the regularization parameter $\lambda$: we can split all the data into a training set and a tuning set, and use the misclassification error on the tuning set as a criterion for choosing $\lambda$ (Algorithm 3).


Import Vector Machine

Algorithm 3:

1. Start with a large regularization parameter λ.
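The remaining steps of Algorithm 3 did not survive in this transcript. A plausible sketch, consistent with the tuning-set criterion on the previous slide: decrease $\lambda$ from its large starting value and score each fit on the tuning set. The $\lambda$ grid, the kernel helper, and the reuse of the ivm_fit sketch are assumptions:

```python
import numpy as np

def choose_lambda(X_train, y_train, X_tune, y_tune, kernel,
                  lambdas=np.logspace(1, -4, 12)):
    """Pick lambda by misclassification error on a held-out tuning
    set, sweeping lambda from large to small as in step 1."""
    K = kernel(X_train, X_train)             # (n, n) kernel matrix
    best = (np.inf, None, None)
    for lam in lambdas:                      # large -> small
        S, a = ivm_fit(K, y_train, lam)      # IVM sketch above
        # Predict on the tuning set using only the import points.
        K_tune = kernel(X_tune, X_train[S])  # (n_tune, |S|)
        p = 1 / (1 + np.exp(-(K_tune @ a)))
        err = np.mean((p > 0.5) != y_tune)
        if err < best[0]:
            best = (err, lam, (S, a))
    return best[1], best[2]                  # chosen lambda and fit
```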


Simulation Results


Real Data Results


Generalization to M-Class Case

Similarly to kernel multi-logit regression, we define the class probabilities as

$$p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \qquad c = 1, \ldots, C,$$

with the identifiability constraint

$$\sum_{c=1}^{C} f_c(x) = 0.$$


Generalization to M-Class Case

The M-class KLR fits a model to minimize the regularized NLL of the multinomial distribution, written out below.

The approximate solution can be obtained by an M-class IVM procedure, similar to the two-class case.
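The slide's formula is not reproduced here; written out from the class-probability definitions above (an assumption), with indicator responses $y_{ic} = 1$ if $y_i = c$ and $0$ otherwise, the regularized multinomial NLL takes the form

$$\min_{f_1, \ldots, f_C \in \mathcal{H}_K} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{ic} \ln p_c(x_i) + \frac{\lambda}{2} \sum_{c=1}^{C} \|f_c\|^2_{\mathcal{H}_K} \right\}.$$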



Conclusion

IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.

Computational cost: KLR $O(n^3)$; SVM $O(n^2 n_s)$; IVM $O(n^2 n_I^2)$ for two classes and $O(C n^2 n_I^2)$ for $C$ classes, where $n_s$ is the number of support points and $n_I$ the number of import points.

IVM has limiting optimal margin properties.
