Medical Image Analysis Machine learning 4 KALLE ÅSTRÖM
Today – wrap up machine learning topics
• ISOMAP – non-linear dimensionality reduction
• Logistic regression
• Classification, where parameter estimation becomes a convex optimization problem
• ANN, where parameter estimation becomes a non-convex optimization problem
• Boosting (feature selection)
ISOMAP
• Idea (illustrated on blackboard)
– For each point, choose the k nearest neighbours
– Form a weighted graph using the distances to the k nearest neighbours
– Calculate the distance matrix D containing all pairwise distances d_ij between feature vectors, using the shortest-path distance in the graph
– Use multi-dimensional scaling to embed the points in, e.g., R^2 (a minimal sketch follows below)
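As a concrete illustration of these steps, here is a minimal sketch using scikit-learn's Isomap, which implements exactly this k-NN graph / shortest-path / MDS pipeline. The S-curve data and k = 10 are illustrative choices, not from the lecture.

```python
# A minimal sketch of the ISOMAP steps above using scikit-learn.
# The S-curve data and k = 10 are illustrative choices, not from the lecture.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, random_state=0)  # points in R^3

# Internally: k-NN graph -> shortest-path distances d_ij -> MDS embedding
isomap = Isomap(n_neighbors=10, n_components=2)
X_2d = isomap.fit_transform(X)  # points embedded in R^2
print(X_2d.shape)  # (1000, 2)
```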
Estimating p(y|x) using Bayes' theorem (discussion leading to logistic regression)
[Figure: conditional class probabilities p(y|x) estimated with Parzen windows. Source: An Introduction to Machine Learning, slide 44/49.]
Logistic regression
• Ideas and derivations discussed on the blackboard
More machine learning algorithms where parameter estimation becomes a convex optimization problem:
• SVM (L2-regularized, L1 loss)
• SVM (L2-regularized, L2 loss)
• LR (L2-regularized)
• SVM (L1-regularized, L2 loss)
• LR (L1-regularized)
• Efficient implementations exist, e.g. in the 'liblinear' package (a usage sketch follows below)
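scikit-learn wraps liblinear, so all five formulations in the list above can be tried directly. The synthetic data and C = 1.0 below are placeholder choices.

```python
# The five formulations in the list above, via scikit-learn's liblinear bindings.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "SVM (L2-regularized, L1/hinge loss)":
        LinearSVC(penalty="l2", loss="hinge", dual=True, C=1.0),
    "SVM (L2-regularized, L2/squared hinge loss)":
        LinearSVC(penalty="l2", loss="squared_hinge", dual=True, C=1.0),
    "LR (L2-regularized)":
        LogisticRegression(penalty="l2", solver="liblinear", C=1.0),
    "SVM (L1-regularized, L2 loss)":
        LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0),
    "LR (L1-regularized)":
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))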
LIBLINEAR: A Library for Large Linear Classification
Acknowledgments
This work was supported in part by the National Science Council of Taiwan via the grant NSC 95-2221-E-002-205-MY3.
Appendix: Implementation Details and Practical Guide
Appendix A. Formulations
This section briefly describes the classifiers supported in LIBLINEAR. Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, in two classes, and a vector $y \in \mathbb{R}^l$ such that $y_i \in \{1, -1\}$, a linear classifier generates a weight vector $w$ as the model. The decision function is $\operatorname{sgn}(w^T x)$. LIBLINEAR allows the classifier to include a bias term $b$; see Section 2 for details.
A.1 L2-regularized L1- and L2-loss Support Vector Classification
L2-regularized L1-loss SVC solves the following primal problem:

$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i w^T x_i\bigr),$$
whereas L2-regularized L2-loss SVC solves the following primal problem:
$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; 1 - y_i w^T x_i\bigr)\bigr)^2. \tag{2}$$
Their dual forms are:

$$\min_\alpha \;\; \frac{1}{2} \alpha^T \bar{Q} \alpha - e^T \alpha \qquad \text{subject to} \quad 0 \le \alpha_i \le U, \; i = 1, \ldots, l,$$

where $e$ is the vector of all ones, $\bar{Q} = Q + D$, $D$ is a diagonal matrix, and $Q_{ij} = y_i y_j x_i^T x_j$. For L1-loss SVC, $U = C$ and $D_{ii} = 0, \; \forall i$. For L2-loss SVC, $U = \infty$ and $D_{ii} = 1/(2C), \; \forall i$.
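For concreteness, here is a direct NumPy evaluation of the two primal objectives above on synthetic data. Function and variable names are my own; this is a sanity-check sketch, not how LIBLINEAR's solvers work internally.

```python
import numpy as np

# Evaluate the L2-regularized L1- and L2-loss SVC primal objectives.
def svc_primal(w, X, y, C, squared=False):
    """(1/2) w^T w + C * sum_i max(0, 1 - y_i w^T x_i), optionally squared."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))  # one hinge term per sample
    loss = hinge**2 if squared else hinge
    return 0.5 * w @ w + C * loss.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                       # l = 100, n = 5
y = np.where(rng.standard_normal(100) > 0, 1.0, -1.0)   # labels in {+1, -1}
w = rng.standard_normal(5)
print(svc_primal(w, X, y, C=1.0))                 # L1-loss objective
print(svc_primal(w, X, y, C=1.0, squared=True))   # L2-loss objective, Eq. (2)
```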
A.2 L2-regularized Logistic Regression
L2-regularized LR solves the following unconstrained optimization problem:

$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \log\bigl(1 + e^{-y_i w^T x_i}\bigr). \tag{3}$$
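A minimal NumPy sketch of objective (3) and its gradient follows; names and data are illustrative, and liblinear's actual Newton-type solver is far more elaborate.

```python
import numpy as np

def lr_objective(w, X, y, C):
    """(1/2) w^T w + C * sum_i log(1 + exp(-y_i w^T x_i))."""
    z = y * (X @ w)
    # logaddexp(0, -z) = log(1 + e^{-z}), computed stably for large |z|
    return 0.5 * w @ w + C * np.logaddexp(0.0, -z).sum()

def lr_gradient(w, X, y, C):
    """Gradient of (3): w - C * sum_i sigma(-y_i w^T x_i) y_i x_i."""
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))  # sigma(-z_i)
    return w - C * X.T @ (y * s)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.where(rng.standard_normal(100) > 0, 1.0, -1.0)
w = np.zeros(5)
print(lr_objective(w, X, y, C=1.0))   # equals 100 * log(2) at w = 0
print(np.linalg.norm(lr_gradient(w, X, y, C=1.0)))
```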
Its dual form is:

$$\min_\alpha \;\; \frac{1}{2} \alpha^T Q \alpha + \sum_{i:\, \alpha_i > 0} \alpha_i \log \alpha_i + \sum_{i:\, \alpha_i < C} (C - \alpha_i) \log(C - \alpha_i) - \sum_{i=1}^{l} C \log C \tag{4}$$

subject to $0 \le \alpha_i \le C, \; i = 1, \ldots, l$.
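The dual objective (4) can also be transcribed directly; the sketch below only evaluates it at a feasible point (it is not a solver), and the names are mine.

```python
import numpy as np

def lr_dual(alpha, Q, C):
    """Dual objective (4); assumes 0 <= alpha_i <= C, skipping boundary terms."""
    pos, below = alpha > 0, alpha < C
    return (0.5 * alpha @ Q @ alpha
            + np.sum(alpha[pos] * np.log(alpha[pos]))
            + np.sum((C - alpha[below]) * np.log(C - alpha[below]))
            - alpha.size * C * np.log(C))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)
Q = (y[:, None] * X) @ (y[:, None] * X).T  # Q_ij = y_i y_j x_i^T x_j
alpha = np.full(20, 0.5)                   # interior feasible point for C = 1
print(lr_dual(alpha, Q, C=1.0))
```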
A.3 L1-regularized L2-loss Support Vector Classification
L1 regularization generates a sparse solution $w$. L1-regularized L2-loss SVC solves the following primal problem:

$$\min_w \;\; \|w\|_1 + C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; 1 - y_i w^T x_i\bigr)\bigr)^2, \tag{5}$$

where $\|\cdot\|_1$ denotes the 1-norm.
A.4 L1-regularized Logistic Regression
L1-regularized LR solves the following unconstrained optimization problem:

$$\min_w \;\; \|w\|_1 + C \sum_{i=1}^{l} \log\bigl(1 + e^{-y_i w^T x_i}\bigr), \tag{6}$$

where $\|\cdot\|_1$ denotes the 1-norm.
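The sparsity claim of A.3/A.4 is easy to see empirically: compare the number of nonzero weights under L1 vs. L2 regularization using the liblinear solver via scikit-learn. The data and C = 0.1 below are placeholder choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Many features, few of them informative, so L1 has something to zero out.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=0.1)
    clf.fit(X, y)
    nnz = np.count_nonzero(clf.coef_)
    print(f"{penalty}: {nnz} of {clf.coef_.size} weights nonzero")
```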
A.5 L2-regularized L1- and L2-loss Support Vector Regression
Support vector regression (SVR) considers a problem similar to (1), but $y_i$ is a real value instead of $+1$ or $-1$. L2-regularized SVR solves the following primal problems:

$$\min_w \;\; \frac{1}{2} w^T w + \begin{cases} C \sum_{i=1}^{l} \max\bigl(0,\; |y_i - w^T x_i| - \epsilon\bigr) & \text{if using L1 loss,} \\[4pt] C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; |y_i - w^T x_i| - \epsilon\bigr)\bigr)^2 & \text{if using L2 loss,} \end{cases}$$

where $\epsilon \ge 0$ is a parameter specifying the sensitivity of the loss. Their dual forms are:
$$\begin{aligned} \min_{\alpha^+,\, \alpha^-} \quad & \frac{1}{2} \begin{bmatrix} \alpha^+ \\ \alpha^- \end{bmatrix}^T \begin{bmatrix} \bar{Q} & -Q \\ -Q & \bar{Q} \end{bmatrix} \begin{bmatrix} \alpha^+ \\ \alpha^- \end{bmatrix} - y^T (\alpha^+ - \alpha^-) + \epsilon\, e^T (\alpha^+ + \alpha^-) \\ \text{subject to} \quad & 0 \le \alpha_i^+,\, \alpha_i^- \le U, \; i = 1, \ldots, l, \end{aligned} \tag{7}$$

where $e$ is the vector of all ones, $\bar{Q} = Q + D$, $Q \in \mathbb{R}^{l \times l}$ is a matrix with $Q_{ij} \equiv x_i^T x_j$, $D$ is a diagonal matrix with

$$D_{ii} = \begin{cases} 0 & \text{if using L1-loss SVR,} \\ \frac{1}{2C} & \text{if using L2-loss SVR,} \end{cases} \qquad \text{and} \qquad U = \begin{cases} C & \text{if using L1-loss SVR,} \\ \infty & \text{if using L2-loss SVR.} \end{cases}$$
Rather than (7), LIBLINEAR considers the following problem:

$$\min_\beta \;\; \frac{1}{2} \beta^T \bar{Q} \beta - y^T \beta + \epsilon \|\beta\|_1 \qquad \text{subject to} \quad -U \le \beta_i \le U, \; i = 1, \ldots, l, \tag{8}$$

where $\beta \in \mathbb{R}^l$ and $\|\cdot\|_1$ denotes the 1-norm. It can be shown that an optimal solution of (8) leads to the following optimal solution of (7):

$$\alpha_i^+ \equiv \max(\beta_i,\, 0) \quad \text{and} \quad \alpha_i^- \equiv \max(-\beta_i,\, 0).$$
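scikit-learn's liblinear-based LinearSVR exposes both SVR losses above; the sketch below shows basic usage plus a check of the $\beta \to (\alpha^+, \alpha^-)$ substitution. The epsilon, C, and synthetic data are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# loss="epsilon_insensitive" is the L1 loss above;
# loss="squared_epsilon_insensitive" is the L2 loss.
svr = LinearSVR(epsilon=0.5, C=1.0, loss="epsilon_insensitive")
svr.fit(X, y)
print(svr.coef_.shape, svr.score(X, y))

# The variable substitution behind (8): recover (alpha+, alpha-) from beta.
beta = np.array([1.5, -0.3, 0.0])
alpha_plus, alpha_minus = np.maximum(beta, 0.0), np.maximum(-beta, 0.0)
assert np.allclose(alpha_plus - alpha_minus, beta)
```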