Medical Image Analysis Machine learning 4 KALLE ÅSTRÖM
Today – wrap up machine learning topics
• ISOMAP – non-linear dimensionality reduction
• Logistic regression
• Classification, where parameter estimation becomes a convex optimization problem
• ANN, where parameter estimation becomes a non-convex optimization problem
• Boosting (feature selection)
ISOMAP
• Idea (illustrated on blackboard)
– For each point, choose the k nearest neighbours
– Form a weighted graph using the distances to the k nearest neighbours
– Calculate the distance matrix D containing all pairwise distances d_ij between feature vectors, using the shortest-path distance in the graph
– Use multi-dimensional scaling to embed the points in, e.g., R^2 (a minimal sketch follows below)
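As a concrete illustration of these steps, here is a minimal sketch using scikit-learn's Isomap, which implements exactly this k-NN graph / shortest-path / MDS pipeline. The S-curve data and k = 10 are illustrative choices, not from the lecture.

```python
# A minimal sketch of the ISOMAP steps above using scikit-learn.
# The S-curve data and k = 10 are illustrative choices, not from the lecture.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, random_state=0)  # points in R^3

# Internally: k-NN graph -> shortest-path distances d_ij -> MDS embedding
isomap = Isomap(n_neighbors=10, n_components=2)
X_2d = isomap.fit_transform(X)  # points embedded in R^2
print(X_2d.shape)  # (1000, 2)
```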
Estimating p(y|x) using Bayes' theorem (discussion leading to logistic regression)
[Figure: conditional class probabilities p(y|x) estimated with Parzen windows. Source: An Introduction to Machine Learning, slide 44/49.]
Logistic regression
• Ideas and derivations discussed on the blackboard
More machine learning algorithms where parameter estimation becomes a convex optimization problem:
• SVM (L2-regularized, L1 loss)
• SVM (L2-regularized, L2 loss)
• LR (L2-regularized)
• SVM (L1-regularized, L2 loss)
• LR (L1-regularized)
• Efficient implementations exist, e.g. in the 'liblinear' package (a usage sketch follows below)
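scikit-learn wraps liblinear, so all five formulations in the list above can be tried directly. The synthetic data and C = 1.0 below are placeholder choices.

```python
# The five formulations in the list above, via scikit-learn's liblinear bindings.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "SVM (L2-regularized, L1/hinge loss)":
        LinearSVC(penalty="l2", loss="hinge", dual=True, C=1.0),
    "SVM (L2-regularized, L2/squared hinge loss)":
        LinearSVC(penalty="l2", loss="squared_hinge", dual=True, C=1.0),
    "LR (L2-regularized)":
        LogisticRegression(penalty="l2", solver="liblinear", C=1.0),
    "SVM (L1-regularized, L2 loss)":
        LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0),
    "LR (L1-regularized)":
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))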
LIBLINEAR: A Library for Large Linear Classification
Acknowledgments
This work was supported in part by the National Science Council of Taiwan via the grant NSC 95-2221-E-002-205-MY3.
Appendix: Implementation Details and Practical Guide
Appendix A. Formulations
This section briefly describes the classifiers supported in LIBLINEAR. Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, in two classes, and a vector $y \in \mathbb{R}^l$ such that $y_i \in \{1, -1\}$, a linear classifier generates a weight vector $w$ as the model. The decision function is $\operatorname{sgn}(w^T x)$. LIBLINEAR allows the classifier to include a bias term $b$; see Section 2 for details.
A.1 L2-regularized L1- and L2-loss Support Vector Classification
L2-regularized L1-loss SVC solves the following primal problem:

$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i w^T x_i\bigr),$$
whereas L2-regularized L2-loss SVC solves the following primal problem:
$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; 1 - y_i w^T x_i\bigr)\bigr)^2. \tag{2}$$
Their dual forms are:

$$\min_\alpha \;\; \frac{1}{2} \alpha^T \bar{Q} \alpha - e^T \alpha \qquad \text{subject to} \quad 0 \le \alpha_i \le U, \; i = 1, \ldots, l,$$

where $e$ is the vector of all ones, $\bar{Q} = Q + D$, $D$ is a diagonal matrix, and $Q_{ij} = y_i y_j x_i^T x_j$. For L1-loss SVC, $U = C$ and $D_{ii} = 0, \; \forall i$. For L2-loss SVC, $U = \infty$ and $D_{ii} = 1/(2C), \; \forall i$.
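For concreteness, here is a direct NumPy evaluation of the two primal objectives above on synthetic data. Function and variable names are my own; this is a sanity-check sketch, not how LIBLINEAR's solvers work internally.

```python
import numpy as np

# Evaluate the L2-regularized L1- and L2-loss SVC primal objectives.
def svc_primal(w, X, y, C, squared=False):
    """(1/2) w^T w + C * sum_i max(0, 1 - y_i w^T x_i), optionally squared."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))  # one hinge term per sample
    loss = hinge**2 if squared else hinge
    return 0.5 * w @ w + C * loss.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                       # l = 100, n = 5
y = np.where(rng.standard_normal(100) > 0, 1.0, -1.0)   # labels in {+1, -1}
w = rng.standard_normal(5)
print(svc_primal(w, X, y, C=1.0))                 # L1-loss objective
print(svc_primal(w, X, y, C=1.0, squared=True))   # L2-loss objective, Eq. (2)
```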
A.2 L2-regularized Logistic Regression
L2-regularized LR solves the following unconstrained optimization problem:

$$\min_w \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \log\bigl(1 + e^{-y_i w^T x_i}\bigr). \tag{3}$$
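A minimal NumPy sketch of objective (3) and its gradient follows; names and data are illustrative, and liblinear's actual Newton-type solver is far more elaborate.

```python
import numpy as np

def lr_objective(w, X, y, C):
    """(1/2) w^T w + C * sum_i log(1 + exp(-y_i w^T x_i))."""
    z = y * (X @ w)
    # logaddexp(0, -z) = log(1 + e^{-z}), computed stably for large |z|
    return 0.5 * w @ w + C * np.logaddexp(0.0, -z).sum()

def lr_gradient(w, X, y, C):
    """Gradient of (3): w - C * sum_i sigma(-y_i w^T x_i) y_i x_i."""
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))  # sigma(-z_i)
    return w - C * X.T @ (y * s)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.where(rng.standard_normal(100) > 0, 1.0, -1.0)
w = np.zeros(5)
print(lr_objective(w, X, y, C=1.0))   # equals 100 * log(2) at w = 0
print(np.linalg.norm(lr_gradient(w, X, y, C=1.0)))
```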
Its dual form is:

$$\min_\alpha \;\; \frac{1}{2} \alpha^T Q \alpha + \sum_{i:\, \alpha_i > 0} \alpha_i \log \alpha_i + \sum_{i:\, \alpha_i < C} (C - \alpha_i) \log(C - \alpha_i) - \sum_{i=1}^{l} C \log C \tag{4}$$

subject to $0 \le \alpha_i \le C, \; i = 1, \ldots, l$.
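The dual objective (4) can also be transcribed directly; the sketch below only evaluates it at a feasible point (it is not a solver), and the names are mine.

```python
import numpy as np

def lr_dual(alpha, Q, C):
    """Dual objective (4); assumes 0 <= alpha_i <= C, skipping boundary terms."""
    pos, below = alpha > 0, alpha < C
    return (0.5 * alpha @ Q @ alpha
            + np.sum(alpha[pos] * np.log(alpha[pos]))
            + np.sum((C - alpha[below]) * np.log(C - alpha[below]))
            - alpha.size * C * np.log(C))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)
Q = (y[:, None] * X) @ (y[:, None] * X).T  # Q_ij = y_i y_j x_i^T x_j
alpha = np.full(20, 0.5)                   # interior feasible point for C = 1
print(lr_dual(alpha, Q, C=1.0))
```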
A.3 L1-regularized L2-loss Support Vector Classification
L1 regularization generates a sparse solution $w$. L1-regularized L2-loss SVC solves the following primal problem:

$$\min_w \;\; \|w\|_1 + C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; 1 - y_i w^T x_i\bigr)\bigr)^2, \tag{5}$$

where $\|\cdot\|_1$ denotes the 1-norm.
A.4 L1-regularized Logistic Regression
L1-regularized LR solves the following unconstrained optimization problem:

$$\min_w \;\; \|w\|_1 + C \sum_{i=1}^{l} \log\bigl(1 + e^{-y_i w^T x_i}\bigr), \tag{6}$$

where $\|\cdot\|_1$ denotes the 1-norm.
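The sparsity claim of A.3/A.4 is easy to see empirically: compare the number of nonzero weights under L1 vs. L2 regularization using the liblinear solver via scikit-learn. The data and C = 0.1 below are placeholder choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Many features, few of them informative, so L1 has something to zero out.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=0.1)
    clf.fit(X, y)
    nnz = np.count_nonzero(clf.coef_)
    print(f"{penalty}: {nnz} of {clf.coef_.size} weights nonzero")
```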
A.5 L2-regularized L1- and L2-loss Support Vector Regression
Support vector regression (SVR) considers a problem similar to (1), but $y_i$ is a real value instead of $+1$ or $-1$. L2-regularized SVR solves the following primal problems:

$$\min_w \;\; \frac{1}{2} w^T w + \begin{cases} C \sum_{i=1}^{l} \max\bigl(0,\; |y_i - w^T x_i| - \epsilon\bigr) & \text{if using L1 loss,} \\[4pt] C \sum_{i=1}^{l} \bigl(\max\bigl(0,\; |y_i - w^T x_i| - \epsilon\bigr)\bigr)^2 & \text{if using L2 loss,} \end{cases}$$

where $\epsilon \ge 0$ is a parameter specifying the sensitivity of the loss. Their dual forms are:
$$\begin{aligned} \min_{\alpha^+,\, \alpha^-} \quad & \frac{1}{2} \begin{bmatrix} \alpha^+ \\ \alpha^- \end{bmatrix}^T \begin{bmatrix} \bar{Q} & -Q \\ -Q & \bar{Q} \end{bmatrix} \begin{bmatrix} \alpha^+ \\ \alpha^- \end{bmatrix} - y^T (\alpha^+ - \alpha^-) + \epsilon\, e^T (\alpha^+ + \alpha^-) \\ \text{subject to} \quad & 0 \le \alpha_i^+,\, \alpha_i^- \le U, \; i = 1, \ldots, l, \end{aligned} \tag{7}$$

where $e$ is the vector of all ones, $\bar{Q} = Q + D$, $Q \in \mathbb{R}^{l \times l}$ is a matrix with $Q_{ij} \equiv x_i^T x_j$, $D$ is a diagonal matrix with

$$D_{ii} = \begin{cases} 0 & \text{if using L1-loss SVR,} \\ \frac{1}{2C} & \text{if using L2-loss SVR,} \end{cases} \qquad \text{and} \qquad U = \begin{cases} C & \text{if using L1-loss SVR,} \\ \infty & \text{if using L2-loss SVR.} \end{cases}$$
Rather than (7), LIBLINEAR considers the following problem:

$$\min_\beta \;\; \frac{1}{2} \beta^T \bar{Q} \beta - y^T \beta + \epsilon \|\beta\|_1 \qquad \text{subject to} \quad -U \le \beta_i \le U, \; i = 1, \ldots, l, \tag{8}$$

where $\beta \in \mathbb{R}^l$ and $\|\cdot\|_1$ denotes the 1-norm. It can be shown that an optimal solution of (8) leads to the following optimal solution of (7):

$$\alpha_i^+ \equiv \max(\beta_i,\, 0) \quad \text{and} \quad \alpha_i^- \equiv \max(-\beta_i,\, 0).$$
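scikit-learn's liblinear-based LinearSVR exposes both SVR losses above; the sketch below shows basic usage plus a check of the $\beta \to (\alpha^+, \alpha^-)$ substitution. The epsilon, C, and synthetic data are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# loss="epsilon_insensitive" is the L1 loss above;
# loss="squared_epsilon_insensitive" is the L2 loss.
svr = LinearSVR(epsilon=0.5, C=1.0, loss="epsilon_insensitive")
svr.fit(X, y)
print(svr.coef_.shape, svr.score(X, y))

# The variable substitution behind (8): recover (alpha+, alpha-) from beta.
beta = np.array([1.5, -0.3, 0.0])
alpha_plus, alpha_minus = np.maximum(beta, 0.0), np.maximum(-beta, 0.0)
assert np.allclose(alpha_plus - alpha_minus, beta)
```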