Page 1: Structured Perceptrons & Structural SVMs

CS 159: Advanced Topics in Machine Learning
4/6/2017
Page 2: Recall: Sequence Prediction

• Input: x = (x1, …, xM)
• Predict: y = (y1, …, yM)
  – Each yi is one of L labels.

• x = "Fish Sleep" → y = (N, V)
• x = "The Dog Ate My Homework" → y = (D, N, V, D, N)
• x = "The Fox Jumped Over The Fence" → y = (D, N, V, P, D, N)

POS tags: Det, Noun, Verb, Adj, Adv, Prep (L = 6)
Page 3: Recall: 1st Order HMM

• x = (x1, x2, x3, x4, …, xM) (sequence of words)
• y = (y1, y2, y3, y4, …, yM) (sequence of POS tags)

• P(xj | yj): probability of state yj generating xj
• P(yj+1 | yj): probability of state yj transitioning to yj+1
• P(y1 | y0): y0 is defined to be the Start state
• P(End | yM): probability of yM being the final state
  – Not always used
Page 4: HMM Graphical Model Representation

Chain-structured graphical model: Y0 → Y1 → Y2 → … → YM → YEnd, with each state Yj emitting the observation Xj.

$$P(x, y) = P(\text{End} \mid y_M)\,\prod_{i=1}^{M} P(y_i \mid y_{i-1})\,\prod_{i=1}^{M} P(x_i \mid y_i)$$

(The P(End | yM) factor is optional.)
Page 5: Most Common Prediction Problem

• Given an input sentence, predict the POS tag sequence.
• Solve using Viterbi
  – Special case of the max-product algorithm

$$h(x) = \operatorname{argmax}_y P(y \mid x) = \operatorname{argmax}_y \log P(y \mid x)$$

$$\log P(y \mid x) = \sum_{j=1}^{M} \big[ \log P(x_j \mid y_j) + \log P(y_j \mid y_{j-1}) \big] + \text{const}$$

(The constant, -log P(x), does not depend on y and so does not affect the argmax.)
Page 6: Simple Example

• x = "Fish Sleep"
• y = (N, V)

$$F(y, x) \equiv \log P(y \mid x) = \sum_{j=1}^{M} \big[ \log P(x_j \mid y_j) + \log P(y_j \mid y_{j-1}) \big] + \text{const}$$

Transition log-probabilities, log P(yj | yj-1):

| from \ to | N  | V  |
|-----------|----|----|
| N         | -2 | 1  |
| V         | 2  | -2 |
| Start     | 1  | -1 |

Emission log-probabilities, log P(xj | yj):

| word \ tag | N | V |
|------------|---|---|
| Fish       | 2 | 1 |
| Sleep      | 1 | 0 |
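The tables above plug directly into Viterbi decoding (slide 5). A minimal sketch, working entirely in log space; the function and variable names are my own:

```python
def viterbi(words, tags, log_trans, log_emit):
    """Viterbi decoding for a 1st-order HMM, in log space.

    log_trans[(t, p)] = log P(y_j = t | y_{j-1} = p), with p = 'Start' at j = 1
    log_emit[(t, w)]  = log P(x_j = w | y_j = t)
    Returns the highest-scoring tag sequence (max-product in log space).
    """
    # delta[t] = best score over tag prefixes ending in tag t
    delta = {t: log_trans[(t, 'Start')] + log_emit[(t, words[0])] for t in tags}
    backptrs = []
    for word in words[1:]:
        best_prev, new_delta = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: delta[q] + log_trans[(t, q)])
            best_prev[t] = p
            new_delta[t] = delta[p] + log_trans[(t, p)] + log_emit[(t, word)]
        backptrs.append(best_prev)
        delta = new_delta
    # Backtrace from the best final tag.
    tag = max(tags, key=lambda t: delta[t])
    path = [tag]
    for best_prev in reversed(backptrs):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path

# Log-probability tables from the slide.
LOG_TRANS = {('N', 'Start'): 1, ('V', 'Start'): -1,
             ('N', 'N'): -2, ('V', 'N'): 1,
             ('N', 'V'): 2, ('V', 'V'): -2}
LOG_EMIT = {('N', 'Fish'): 2, ('V', 'Fish'): 1,
            ('N', 'Sleep'): 1, ('V', 'Sleep'): 0}
```

With these tables, `viterbi(['Fish', 'Sleep'], ['N', 'V'], LOG_TRANS, LOG_EMIT)` recovers the tag sequence (N, V) with score 4, matching the table of F values on a later slide.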
Page 7: New Notation

• "Unary features" φ1
• "Pairwise transition features" φ2

$$F(y, x) \equiv \sum_{j=1}^{M} w^T \phi_j(y_j, y_{j-1} \mid x), \qquad \phi_j(a, b \mid x) = \begin{bmatrix} \phi_1^j(a \mid x) \\ \phi_2(a, b) \end{bmatrix}, \qquad w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

$$\phi_1^j(a \mid x) = \begin{bmatrix} 1_{[(a=\text{Noun}) \wedge (x^j=\text{'Fish'})]} \\ 1_{[(a=\text{Noun}) \wedge (x^j=\text{'Sleep'})]} \\ 1_{[(a=\text{Verb}) \wedge (x^j=\text{'Fish'})]} \\ 1_{[(a=\text{Verb}) \wedge (x^j=\text{'Sleep'})]} \end{bmatrix}
\qquad
\phi_2(a, b) = \begin{bmatrix} 1_{[(a=\text{Noun}) \wedge (b=\text{Start})]} \\ 1_{[(a=\text{Noun}) \wedge (b=\text{Noun})]} \\ 1_{[(a=\text{Noun}) \wedge (b=\text{Verb})]} \\ 1_{[(a=\text{Verb}) \wedge (b=\text{Start})]} \\ 1_{[(a=\text{Verb}) \wedge (b=\text{Noun})]} \\ 1_{[(a=\text{Verb}) \wedge (b=\text{Verb})]} \end{bmatrix}$$
Page 8: New Notation: Duplicate word features for each label

The first two coordinates of φ1 are the Noun-class features, the last two the Verb-class features:

$$\phi_1^1(\text{Noun} \mid \text{"Fish Sleep"}) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\phi_1^2(\text{Noun} \mid \text{"Fish Sleep"}) = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\phi_1^1(\text{Verb} \mid \text{"Fish Sleep"}) = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\phi_1^2(\text{Verb} \mid \text{"Fish Sleep"}) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

In general, stack a copy of the word features φ1(x^j) for each of the L labels:

$$\phi_1^j(a \mid x) = \begin{bmatrix} 1_{[a=1]}\,\varphi_1(x^j) \\ \vdots \\ 1_{[a=L]}\,\varphi_1(x^j) \end{bmatrix}$$
Page 9: New Notation: One feature for every transition

$$\phi_2(\text{Noun}, \text{Start}) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad
\phi_2(\text{Verb}, \text{Start}) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad
\phi_2(\text{Verb}, \text{Noun}) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

(Coordinates are ordered (N,Start), (N,N), (N,V), (V,Start), (V,N), (V,V), as in the definition of φ2.)
Page 10

$$F(y, x) \equiv \sum_{j=1}^{M} w^T \phi_j(y_j, y_{j-1} \mid x), \qquad w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad
w_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \qquad
w_2 = \begin{bmatrix} 1 \\ -2 \\ 2 \\ -1 \\ 1 \\ -2 \end{bmatrix}$$

w1 holds the emission scores in the coordinate order of φ1, and w2 the transition scores in the coordinate order of φ2.

Old notation (transition log-probabilities):

| from \ to | N  | V  |
|-----------|----|----|
| N         | -2 | 1  |
| V         | 2  | -2 |
| Start     | 1  | -1 |

Old notation (emission log-probabilities):

| word \ tag | N | V |
|------------|---|---|
| Fish       | 2 | 1 |
| Sleep      | 1 | 0 |
Page 11: Recap: 1st Order Sequential Model

• Input: x = (x1, …, xM)
• Predict: y = (y1, …, yM)
  – Each yi is one of L labels. (POS tags: Det, Noun, Verb, Adj, Adv, Prep; L = 6)

• Linear model w.r.t. pairwise features φj(a, b | x)
• Prediction via maximizing F:

$$h(x) = \operatorname{argmax}_y F(y, x) = \operatorname{argmax}_y w^T \Psi(y, x)$$

Ψ encodes the structure.
Page 12

x = "Fish Sleep", y = (N, V). Prediction: argmax_y F(y, x).

$$F(y{=}(N,V),\, x{=}\text{"Fish Sleep"}) = w_1^T \phi_1^1(N, x) + w_2^T \phi_2(N, \text{Start}) + w_1^T \phi_1^2(V, x) + w_2^T \phi_2(V, N)$$
$$= w_{1,1} + w_{2,1} + w_{1,4} + w_{2,5} = 2 + 1 + 0 + 1 = 4$$

| y      | F(y, x)            |
|--------|--------------------|
| (N, N) | 2 + 1 + 1 - 2 = 2  |
| (N, V) | 2 + 1 + 0 + 1 = 4  |
| (V, N) | 1 - 1 + 1 + 2 = 3  |
| (V, V) | 1 - 1 + 0 - 2 = -2 |

(w1, w2, φ1, and φ2 are as defined on the previous slides.)
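Because every feature is a 0/1 indicator, each inner product w^T φ just picks out one weight. A small sketch that reproduces the table above (names are my own):

```python
# Unary weights w1 indexed by (tag, word) and pairwise weights w2 indexed
# by (tag, previous tag), taken from the slide's tables.
W1 = {('N', 'Fish'): 2, ('N', 'Sleep'): 1, ('V', 'Fish'): 1, ('V', 'Sleep'): 0}
W2 = {('N', 'Start'): 1, ('N', 'N'): -2, ('N', 'V'): 2,
      ('V', 'Start'): -1, ('V', 'N'): 1, ('V', 'V'): -2}

def F(y, x):
    """F(y, x) = sum_j [w1 . phi1_j(y_j | x) + w2 . phi2(y_j, y_{j-1})].
    With indicator features, each term reduces to a single table lookup."""
    total, prev = 0, 'Start'
    for tag, word in zip(y, x.split()):
        total += W1[(tag, word)] + W2[(tag, prev)]
        prev = tag
    return total

scores = {y: F(y, 'Fish Sleep')
          for y in [('N', 'N'), ('N', 'V'), ('V', 'N'), ('V', 'V')]}
```

`scores` matches the table: (N,N) → 2, (N,V) → 4, (V,N) → 3, (V,V) → -2, so the argmax is (N, V).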
Page 13: Why New Notation?

• Easier to reason about:
  – Computing predictions
  – Learning (it's a linear model!)
  – Extensions (just generalize φ)

(φj, φ1, φ2, and w are as defined on the previous slides.)
Page 14: Generalizes Multiclass

• Stack weight vectors for each class:

$$w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_L \end{bmatrix}, \qquad
\Psi(y, x) = \begin{bmatrix} 1_{[y=1]}\, x \\ 1_{[y=2]}\, x \\ \vdots \\ 1_{[y=L]}\, x \end{bmatrix}, \qquad
F(y, x) \equiv w^T \Psi(y, x)$$

$$h(x) = \operatorname{argmax}_y w^T \Psi(y, x) = \operatorname{argmax}_y w_y^T x$$
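The reduction is mechanical: since Ψ(y, x) places x in the y-th block of zeros, w^T Ψ(y, x) is just w_y · x. A tiny sketch with hypothetical weights:

```python
def multiclass_predict(w, x):
    """h(x) = argmax_y w^T Psi(y, x). With Psi(y, x) stacking x into the
    y-th block, this reduces to scoring x against each class's weight
    vector w_y and taking the argmax. w is a list of L weight vectors."""
    scores = [sum(wy_k * x_k for wy_k, x_k in zip(wy, x)) for wy in w]
    return max(range(len(w)), key=lambda y: scores[y])

# Hypothetical weights for L = 3 classes over 2-dimensional inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
```

For example, `multiclass_predict(W, [2, 1])` scores the classes as 2, 1, -3 and returns class 0.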
Page 15: Learning for Structured Prediction
Page 16: Perceptron Learning Algorithm (Linear Classification Model)

Training set: S = {(xi, yi)}, i = 1…N, with y ∈ {+1, -1}; predictor h(x) = sign(w^T x).
Go through the training set in arbitrary order (e.g., randomly):

• w1 = 0
• For t = 1, …
  – Receive example (x, y)
  – If h(x | wt) = y: wt+1 = wt
  – Else: wt+1 = wt + y x
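The update rule above is a few lines of code. A minimal sketch on made-up separable data (names are my own):

```python
def sign(v):
    return 1 if v >= 0 else -1

def perceptron(examples, dim, epochs=10):
    """Perceptron updates as on the slide: keep w on a correct
    prediction, add y*x to w on a mistake."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if sign(sum(wi * xi for wi, xi in zip(w, x))) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable data, made up for illustration.
DATA = [([1.0, 1.0], 1), ([2.0, 1.0], 1),
        ([-1.0, -1.0], -1), ([-2.0, -1.0], -1)]
```

After `w = perceptron(DATA, 2)`, every training example is classified correctly.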
Page 17: Structured Perceptron (Linear Classification Model)

Training set: S = {(xi, yi)} with yi structured; predictor h(x) = argmax_{y'} w^T Ψ(y', x).
Go through the training set in arbitrary order (e.g., randomly):

• w1 = 0
• For t = 1, …
  – Receive example (x, y)
  – If h(x | wt) = y: wt+1 = wt
  – Else: wt+1 = wt + Ψ(y, x) - Ψ(h(x | wt), x)

The argmax predictor, and the update toward Ψ of the true output and away from Ψ of the predicted one, are the only things that change!
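A minimal sketch of the structured perceptron on the running POS example. The feature map and names are my own; the argmax enumerates all label sequences, which only works at toy sizes (a Viterbi pass would replace it in practice):

```python
from itertools import product

def psi(y, x):
    """Joint feature map Psi(y, x): sparse counts of unary (tag, word)
    pairs and pairwise (tag, previous-tag) transitions."""
    feats, prev = {}, 'Start'
    for tag, word in zip(y, x):
        for key in (('U', tag, word), ('P', tag, prev)):
            feats[key] = feats.get(key, 0) + 1
        prev = tag
    return feats

def predict(w, x, tags):
    """argmax_y' w^T Psi(y', x), by brute-force enumeration."""
    return max(product(tags, repeat=len(x)),
               key=lambda y: sum(w.get(k, 0) * v for k, v in psi(y, x).items()))

def structured_perceptron(data, tags, epochs=5):
    """On a mistake, apply Collins' update: w += Psi(y, x) - Psi(y_hat, x)."""
    w = {}
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x, tags)
            if y_hat != y:
                for k, v in psi(y, x).items():
                    w[k] = w.get(k, 0) + v
                for k, v in psi(y_hat, x).items():
                    w[k] = w.get(k, 0) - v
    return w
```

Training on the single example ("Fish", "Sleep") → (N, V) drives the weights to predict (N, V).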
Page 18: Structured Perceptron

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Michael Collins, EMNLP 2002.
http://www.cs.columbia.edu/~mcollins/papers/tagperc.pdf

Excerpt (Figure 4 of the paper: results on development data; all scores are error percentages; Numits is the number of training iterations at which the best score is achieved; Perc is the perceptron algorithm, ME is the maximum entropy method; avg/noavg is with or without averaged parameter vectors; cc=5 keeps only features occurring 5 times or more in training, cc=0 keeps all features):

NP chunking results:

| Method            | F-Measure | Numits |
|-------------------|-----------|--------|
| Perc, avg, cc=0   | 93.53     | 13     |
| Perc, noavg, cc=0 | 93.04     | 35     |
| Perc, avg, cc=5   | 93.33     | 9      |
| Perc, noavg, cc=5 | 91.88     | 39     |
| ME, cc=0          | 92.34     | 900    |
| ME, cc=5          | 92.65     | 200    |

POS tagging results:

| Method            | Error rate (%) | Numits |
|-------------------|----------------|--------|
| Perc, avg, cc=0   | 2.93           | 10     |
| Perc, noavg, cc=0 | 3.68           | 20     |
| Perc, avg, cc=5   | 3.03           | 6      |
| Perc, noavg, cc=5 | 4.04           | 17     |
| ME, cc=0          | 3.4            | 100    |
| ME, cc=5          | 3.28           | 200    |

4.3 Results

We applied both maximum-entropy models and the perceptron algorithm to the two tagging problems. We tested several variants for each algorithm on the development set, to gain some understanding of how the algorithms' performance varied with various parameter settings, and to allow optimization of free parameters so that the comparison on the final test set is a fair one. For both methods, we tried the algorithms with feature count cut-offs set at 0 and 5 (i.e., we ran experiments with all features in training data included, or with all features occurring 5 times or more included; (Ratnaparkhi 96) uses a count cut-off of 5). In the perceptron algorithm, the number of iterations T over the training set was varied, and the method was tested with both averaged and unaveraged parameter vectors, as defined in section 2.5, for a variety of values for T. In the maximum entropy model the number of iterations of training using Generalized Iterative Scaling was varied.

Figure 4 shows results on development data on the two tasks. The trends are fairly clear: averaging improves results significantly for the perceptron method, as does including all features rather than imposing a count cut-off of 5. In contrast, the ME models' performance suffers when all features are included. The best perceptron configuration gives improvements over the maximum-entropy models in both cases: an improvement in F-measure from 92.65% to 93.53% in chunking, and a reduction from 3.28% to 2.93% error rate in POS tagging. In looking at the results for different numbers of iterations on development data we found that averaging not only improves the best result, but also gives much greater stability of the tagger (the non-averaged variant has much greater variance in its scores).

As a final test, the perceptron and ME taggers were applied to the test sets, with the optimal parameter settings on development data. On POS tagging the perceptron algorithm gave 2.89% error compared to 3.28% error for the maximum-entropy model (an 11.9% relative reduction in error). In NP chunking the perceptron algorithm achieves an F-measure of 93.63%, in contrast to an F-measure of 93.29% for the ME model (a 5.1% relative reduction in error).

5 Proofs of the Theorems

This section gives proofs of theorems 1 and 2. The proofs are adapted from proofs for the classification case in (Freund & Schapire 99).

Proof of Theorem 1: Let $\bar\alpha^k$ be the weights before the k-th mistake is made. It follows that $\bar\alpha^1 = 0$. Suppose the k-th mistake is made at the i-th example. Take z to be the output proposed at this example, $z = \operatorname{argmax}_{y \in \mathrm{GEN}(x_i)} \Phi(x_i, y) \cdot \bar\alpha^k$. It follows from the algorithm updates that $\bar\alpha^{k+1} = \bar\alpha^k + \Phi(x_i, y_i) - \Phi(x_i, z)$. We take inner products of both sides with the vector U:

$$U \cdot \bar\alpha^{k+1} = U \cdot \bar\alpha^k + U \cdot \Phi(x_i, y_i) - U \cdot \Phi(x_i, z) \ge U \cdot \bar\alpha^k + \delta$$

where the inequality follows because of the property of U assumed in Eq. 3. Because $\bar\alpha^1 = 0$, and therefore $U \cdot \bar\alpha^1 = 0$, it follows by induction on k that for all k, $U \cdot \bar\alpha^{k+1} \ge k\delta$. Because $U \cdot \bar\alpha^{k+1} \le \|U\|\,\|\bar\alpha^{k+1}\|$, it follows that $\|\bar\alpha^{k+1}\| \ge k\delta$.

We also derive an upper bound for $\|\bar\alpha^{k+1}\|^2$:

$$\|\bar\alpha^{k+1}\|^2 = \|\bar\alpha^k\|^2 + \|\Phi(x_i, y_i) - \Phi(x_i, z)\|^2 + 2\,\bar\alpha^k \cdot \big(\Phi(x_i, y_i) - \Phi(x_i, z)\big) \le \|\bar\alpha^k\|^2 + R^2$$

(The excerpt ends here.)
Page 19: Limitations of Perceptron

• Not all mistakes are created equal
  – One POS tag wrong is as bad as five!
  – Even worse for more complicated problems
Page 20: Comparison

Large Margin Methods for Structured and Interdependent Output Variables. Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun. Journal of Machine Learning Research, Volume 6, Pages 1453-1484.

Excerpt:

Table 2: Results of various algorithms on the named entity recognition task.

| Method | HMM  | CRF  | Perceptron | SVM  |
|--------|------|------|------------|------|
| Error  | 9.36 | 5.17 | 5.94       | 5.08 |

Table 3: Results for various SVM formulations on the named entity recognition task (ε = 0.01, C = 1).

| Method  | Train Err | Test Err | Const    | Avg Loss  |
|---------|-----------|----------|----------|-----------|
| SVM2    | 0.2±0.1   | 5.1±0.6  | 2824±106 | 1.02±0.01 |
| SVM△s2  | 0.4±0.4   | 5.1±0.8  | 2626±225 | 1.10±0.08 |
| SVM△m2  | 0.3±0.2   | 5.1±0.7  | 2628±119 | 1.17±0.12 |

The label set in this corpus consists of non-name and the beginning and continuation of person names, organizations, locations and miscellaneous names, resulting in a total of |Σ| = 9 different labels. In the setup followed in Altun et al. (2003), the joint feature map Ψ(x, y) is the histogram of state transitions plus a set of features describing the emissions. An adapted version of the Viterbi algorithm is used to solve the argmax in line 6. For both perceptron and SVM a second degree polynomial kernel was used.

The results given in Table 2 for the zero-one loss compare the generative HMM with conditional random fields (CRF) (Lafferty et al., 2001), Collins' perceptron and the SVM algorithm. All discriminative learning methods substantially outperform the standard HMM. In addition, the SVM performs slightly better than the perceptron and CRFs, demonstrating the benefit of a large margin approach. Table 3 shows that all SVM formulations perform comparably, attributed to the fact that the vast majority of the support label sequences end up having Hamming distance 1 to the correct label sequence. Notice that for 0-1 loss functions all three SVM formulations are equivalent.

5.3 Sequence Alignment

To analyze the behavior of the algorithm for sequence alignment, we constructed a synthetic dataset according to the following sequence and local alignment model. The native sequence and the decoys are generated by drawing randomly from a 20 letter alphabet Σ = {1, .., 20} so that letter c ∈ Σ has probability c/210. Each sequence has length 50, and there are 10 decoys per native sequence. To generate the homologous sequence, we generate an alignment string of length 30 consisting of 4 characters "match", "substitute", "insert", "delete". For simplicity of illustration, substitutions are always c → (c mod 20) + 1. In the following experiments, matches occur with probability 0.2, substitutions with 0.4, insertion with 0.2, deletion with 0.2. The homologous sequence is created by applying the alignment string to a randomly selected substring of the native. The shortening of the sequences through insertions and deletions is padded by additional random characters.

We model this problem using local sequence alignment with the Smith-Waterman algorithm. Table 4 shows the test error rates (i.e. the percentage of times a decoy is selected instead of the homologous sequence) depending on the number of training examples. The results are averaged over 10 train/test samples. The model contains 400 parameters in the substitution matrix Π and a cost δ for "insert/delete". We train this model using the SVM2 and compare against a generative model.
Page 21: Hamming Loss

• Hamming loss:

$$\ell(y, x, F) = \sum_{j=1}^{M} 1_{[h(x)_j \ne y_j]}$$

• True y = (D, N, V, D, N)
  – y' = (D, N, V, N, N)
  – y'' = (V, D, N, V, V)

y'' has much worse Hamming loss (loss of 5 vs. loss of 1). (But the loss is not continuous!)

Need to define a continuous surrogate, in the style of the hinge loss!
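The Hamming loss on the slide's example is a one-liner (the function name is my own):

```python
def hamming_loss(y_true, y_pred):
    """Number of positions where the predicted label differs from the truth."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p)

# The slide's example: y'' is wrong at every position, y' at only one.
y = ('D', 'N', 'V', 'D', 'N')
```

`hamming_loss(y, ('D','N','V','N','N'))` is 1 while `hamming_loss(y, ('V','D','N','V','V'))` is 5.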
Page 22: Original Hinge Loss (Support Vector Machine)

[Figure: 0/1 loss and hinge loss plotted against f(x), for target y = +1.]

$$\operatorname{argmin}_{w, b, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i: \; y_i (w^T x_i - b) \ge 1 - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

$$\ell(y_i, f(x_i)) = \max(0,\, 1 - y_i f(x_i)) = \xi_i$$
Page 23: Property of Hinge Loss

$$\operatorname{argmin}_{w, b, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i: \; y_i (w^T x_i - b) \ge 1 - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

$$\ell(y_i, f(x_i)) = \max(0,\, 1 - y_i f(x_i)) = \xi_i$$

Hinge loss = continuous upper bound on 0/1 loss:

$$h(x) = \operatorname{argmax}_{y \in \{-1, +1\}} y f(x) = \operatorname{sign}(f(x)) \;\Rightarrow\; \xi_i \ge 1_{[h(x_i) \ne y_i]}$$
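The upper-bound property is easy to check numerically; a tiny sketch (names are my own):

```python
def hinge(y, fx):
    """max(0, 1 - y*f(x)): zero once the margin y*f(x) reaches 1, and an
    upper bound on the 0/1 loss 1[sign(f(x)) != y]: any misclassified
    point has y*f(x) <= 0, hence hinge >= 1."""
    return max(0.0, 1.0 - y * fx)
```

For example, hinge(+1, 2.0) = 0 (large correct margin), hinge(+1, 0.5) = 0.5 (correct but inside the margin), and hinge(+1, -0.1) = 1.1 ≥ 1 (a misclassification).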
Page 24: Hamming Hinge Loss (Structural SVM)

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

(The Hamming term is sometimes normalized by M.)

$$\ell(y_i, x_i, F) = \max_{y'} \Big\{ \sum_j 1_{[y'_j \ne y_i^j]} - \big( F(y_i, x_i) - F(y', x_i) \big) \Big\} = \xi_i$$

Continuous upper bound on the Hamming loss! For the learned predictor h(x) = argmax_y F(y, x) we have F(y_i, x_i) - F(h(x_i), x_i) ≤ 0, hence

$$\xi_i \ge \sum_j 1_{[h(x_i)_j \ne y_i^j]}$$
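The max over y' above can be evaluated by brute force at toy sizes. A sketch, using a hypothetical score table (the numbers happen to match the "Example 2" table on a later slide, where the slack comes out to 2); names are my own:

```python
from itertools import product

def structural_hinge(F, x, y_true, tags):
    """max over y' of [Hamming(y', y_true) - (F(y_true, x) - F(y', x))],
    by brute-force enumeration over label sequences. The y' = y_true term
    contributes 0, so the result is always >= 0."""
    f_true = F(y_true, x)
    return max(sum(a != b for a, b in zip(yp, y_true)) - (f_true - F(yp, x))
               for yp in product(tags, repeat=len(y_true)))

# Hypothetical score table for x = "Fish Sleep" with y_true = (N, V).
SCORES = {('N', 'N'): 4, ('N', 'V'): 3, ('V', 'N'): 0, ('V', 'V'): 1}
```

Here the maximizer is y' = (N, N): Hamming loss 1 minus margin (3 - 4) gives slack 2.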
Page 25: Hamming Hinge Loss

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \frac{1}{M_i} \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

Suppose for an incorrect y' = h(x_i): [Figure: bar chart with Score(y_i) = 1.5, Score(y') = 1.6, Loss(y') = 0.5.] Then the slack is 0.5 - (1.5 - 1.6) = 0.6:

$$\xi_i = 0.6 \ge 0.5 = \frac{1}{M_i} \sum_j 1_{[h(x_i)_j \ne y_i^j]}$$

The slack variable upper-bounds the (normalized) Hamming loss!
Page 26: Structural SVM

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i, \qquad \forall i: \; \xi_i \ge 0 \quad \text{("slack"; sometimes normalized by } M\text{)}$$

Consider the prediction of the learned model, y' = argmax_y F(y, x):

• If y' = y_i, the constraint gives ξ_i ≥ 0.
• If y' ≠ y_i, then F(y_i, x_i) - F(y', x_i) ≤ 0, so

$$\xi_i \ge \sum_j 1_{[y'_j \ne y_i^j]}$$

The slack is a continuous upper bound on the Hamming loss!
Page 27: Reduction to Independent Multiclass

Suppose there are no pairwise features:

$$F(y, x) \equiv \sum_{j=1}^{M} w^T \phi_j(y_j \mid x), \qquad
\phi_j(y_j \mid x) = \begin{bmatrix} 1_{[y_j = 1]}\,\varphi_1(x^j) \\ \vdots \\ 1_{[y_j = L]}\,\varphi_1(x^j) \end{bmatrix} \quad \text{(stack } \varphi_1(x^j) \text{ } L \text{ times)}$$

Then the structural SVM constraints

$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

decompose into a multiclass hinge loss per token:

$$\forall i, j, a: \; w_{y_i^j}^T \varphi_1(x^j) - w_a^T \varphi_1(x^j) \ge 1 - \xi_{ij}$$
Page 28: Example 1

x_i = "Fish Sleep", y_i = (N, V).

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 2          | 2                        | 1    |
| (N, V) | 4          | 0                        | 0    |
| (V, N) | 1          | 3                        | 2    |
| (V, V) | 1          | 3                        | 1    |

Every constraint F(y_i, x_i) - F(y', x_i) ≥ Loss - ξ_i is satisfied with ξ_i = 0.
Page 29: Example 2

x_i = "Fish Sleep", y_i = (N, V).

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 4          | -1                       | 1    |
| (N, V) | 3          | 0                        | 0    |
| (V, N) | 0          | 3                        | 2    |
| (V, V) | 1          | 2                        | 1    |

The (N, N) constraint requires -1 ≥ 1 - ξ_i, so ξ_i = 2.
Page 30: Example 3

x_i = "Fish Sleep", y_i = (N, V).

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 2          | 2                        | 1    |
| (N, V) | 4          | 0                        | 0    |
| (V, N) | 3          | 1                        | 2    |
| (V, V) | 1          | 3                        | 1    |

The (V, N) constraint requires 1 ≥ 2 - ξ_i, so ξ_i = 1.
Page 31: When is Slack Positive?

• Whenever the margin is not big enough!

$$\xi_i = \max_{y'} \Big\{ \sum_j 1_{[y'_j \ne y_i^j]} - \big( F(y_i, x_i) - F(y', x_i) \big) \Big\} = \ell(y_i, x_i, F)$$

(Verify that this definition is ≥ 0: the choice y' = y_i already contributes 0.)

$$\xi_i > 0 \iff \exists y': \; F(y_i, x_i) - F(y', x_i) < \sum_j 1_{[y'_j \ne y_i^j]}$$
Page 32: Structural SVM Geometric Interpretation

w is a high-dimensional point, and F(y, x) = w^T Ψ(y, x). Each incorrect y' contributes a constraint

$$F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i$$

so whenever the margin term is ≤ 0, we get ξ_i ≥ Σ_j 1[y'_j ≠ y_i^j]. The objective trades off the size of the margin vs. the size of the margin violations (C controls the trade-off), with the margin scaled by the Hamming loss.
Page 33: Structural SVM Training

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i, \qquad \forall i: \; \xi_i \ge 0$$

• Strictly convex optimization problem
  – Same form as the standard SVM optimization
  – Easy, right?
• Intractable number of constraints: often exponentially many!
Page 34: Structural SVM Training

• The trick is to not enumerate all constraints:

$$\forall y': \; F(y_i, x_i) \ge F(y', x_i) + \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i$$

• Only solve the SVM objective over a small subset of constraints (the working set).
  – Efficient!
• But some constraints might be violated.
Page 35: Example

x_i = "Fish Sleep", y_i = (N, V). Solving over only the working set {(N, N), (N, V)}:

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 2          | 2                        | 1    |
| (N, V) | 4          | 0                        | 0    |

gives ξ_i = 0, but the full constraint set also contains:

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 2          | 2                        | 1    |
| (N, V) | 4          | 0                        | 0    |
| (V, N) | 3          | 1                        | 2    |
| (V, V) | 1          | 3                        | 1    |

where the (V, N) constraint (1 ≥ 2 - ξ_i) is violated at ξ_i = 0.
Page 36: Approximate Hinge Loss

• Choose tolerance ε > 0:

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y': \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i - \varepsilon, \qquad \forall i: \; \xi_i \ge 0$$

Consider the prediction of the learned model, y' = argmax_y F(y, x):

• If y' = y_i, the constraint gives ξ_i ≥ 0.
• If y' ≠ y_i, then F(y_i, x_i) - F(y', x_i) ≤ 0, so

$$\xi_i \ge \sum_j 1_{[y'_j \ne y_i^j]} - \varepsilon$$

The slack is a continuous upper bound on (Hamming loss - ε)!
Page 37: Example

x_i = "Fish Sleep", y_i = (N, V), ε = 1.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss |
|--------|------------|--------------------------|------|
| (N, N) | 2          | 2                        | 1    |
| (N, V) | 4          | 0                        | 0    |
| (V, N) | 3          | 1                        | 2    |
| (V, V) | 1          | 3                        | 1    |

With the ε-relaxed constraints F(y_i, x_i) - F(y', x_i) ≥ Loss - ξ_i - ε, all rows are satisfied with ξ_i = 0 (the tightest, (V, N), needs 1 ≥ 2 - ξ_i - 1).
Page 38: Structural SVM Training

• STEP 0: Specify tolerance ε.
• STEP 1: Solve the SVM objective function using only the working set of constraints W (initially empty). The trained model is w.
• STEP 2: Using w, find the y' whose constraint is most violated.
• STEP 3: If the constraint is violated by more than ε, add it to W.
• Repeat STEPs 1-3 until no additional constraints are added. Return the most recent model w trained in STEP 1.

Constraint violation formula (Loss - Slack - margin; the 1/M_i normalization is optional):

$$\frac{1}{M_i} \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i - \big( F(y_i, x_i) - F(y', x_i) \big) \ge \varepsilon$$

*This is known as a "cutting plane" method.
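The steps above can be sketched as a loop. `solve_qp` and `most_violated` are placeholders for a real QP solver and for loss-augmented inference; the toy stubs at the bottom (entirely made up) just exercise the control flow:

```python
def cutting_plane(examples, solve_qp, most_violated, eps=0.1, max_rounds=100):
    """Cutting-plane training loop: grow per-example working sets of
    constraints until no constraint is violated by more than eps.

    solve_qp(working)            -> (w, slacks) for the restricted SVM
    most_violated(w, x, y, slack) -> (y_bad, violation)
    """
    working = [set() for _ in examples]          # STEP 1: W initially empty
    w, slacks = solve_qp(working)
    for _ in range(max_rounds):
        added = False
        for i, (x, y) in enumerate(examples):
            y_bad, viol = most_violated(w, x, y, slacks[i])   # STEP 2
            if viol > eps and y_bad not in working[i]:        # STEP 3
                working[i].add(y_bad)
                added = True
        if not added:                            # no constraint added: done
            return w
        w, slacks = solve_qp(working)            # re-solve STEP 1
    return w

# Toy stubs, made up purely to demonstrate termination of the loop.
def solve_qp(working):
    n = sum(len(s) for s in working)   # "w" is just a constraint count here
    return n, [0.0] * len(working)

def most_violated(w, x, y, slack):
    return ('y', w), 0.5 / (w + 1)     # violation shrinks as w "improves"

w_final = cutting_plane([('x', 'y')], solve_qp, most_violated, eps=0.1)
```

With these stubs the loop adds four constraints (violations 0.5, 0.25, 0.167, 0.125) and stops once the violation reaches 0.1, so `w_final` is 4.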
Page 39: Example

x_i = "Fish Sleep", y_i = (N, V). Choose ε = 0.1.

$$\operatorname{argmin}_{w, \xi} \; \frac{1}{2} w^T w + \frac{C}{N} \sum_i \xi_i$$
$$\forall i, y' \in W_i: \; F(y_i, x_i) - F(y', x_i) \ge \sum_j 1_{[y'_j \ne y_i^j]} - \xi_i - \varepsilon, \qquad \forall i: \; \xi_i \ge 0$$

Constraint violation: Loss - Slack - (F(y_i, x) - F(y', x)) = Viol.

Init: W_i = ∅. Solve: ξ_i = 0.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss | Viol. |
|--------|------------|--------------------------|------|-------|
| (N, N) | 0          | 0                        | 1    | 1     |
| (N, V) | 0          | 0                        | 0    | 0     |
| (V, N) | 0          | 0                        | 2    | 2     |
| (V, V) | 0          | 0                        | 1    | 1     |
Page 40: Example (continued)

x_i = "Fish Sleep", y_i = (N, V), ε = 0.1. Constraint violation: Loss - Slack - (F(y_i, x) - F(y', x)) = Viol.

Update: W_i = {(V, N)}. Solve: ξ_i = 0.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss | Viol. |
|--------|------------|--------------------------|------|-------|
| (N, N) | 0          | 0                        | 1    | 1     |
| (N, V) | 0          | 0                        | 0    | 0     |
| (V, N) | 0          | 0                        | 2    | 2     |
| (V, V) | 0          | 0                        | 1    | 1     |
Page 41: Example (continued)

x_i = "Fish Sleep", y_i = (N, V), ε = 0.1. Constraint violation: Loss - Slack - (F(y_i, x) - F(y', x)) = Viol.

Re-solve with W_i = {(V, N)}: ξ_i = 0.5.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss | Viol. |
|--------|------------|--------------------------|------|-------|
| (N, N) | 0.7        | 0.2                      | 1    | 0.2   |
| (N, V) | 0.9        | 0                        | 0    | 0     |
| (V, N) | -0.6       | 1.5                      | 2    | 0     |
| (V, V) | 0          | 0.9                      | 1    | 0.4   |
Page 42: Example (continued)

x_i = "Fish Sleep", y_i = (N, V), ε = 0.1. Constraint violation: Loss - Slack - (F(y_i, x) - F(y', x)) = Viol.

Update: W_i = {(V, N), (N, N)}. ξ_i = 0.5.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss | Viol. |
|--------|------------|--------------------------|------|-------|
| (N, N) | 0.7        | 0.2                      | 1    | 0.2   |
| (N, V) | 0.9        | 0                        | 0    | 0     |
| (V, N) | -0.6       | 1.5                      | 2    | 0     |
| (V, V) | 0          | 0.9                      | 1    | 0.4   |
Page 43: Example (continued)

x_i = "Fish Sleep", y_i = (N, V), ε = 0.1. Constraint violation: Loss - Slack - (F(y_i, x) - F(y', x)) = Viol.

Re-solve with W_i = {(V, N), (N, N)}: ξ_i = 0.55.

| y'     | F(y', x_i) | F(y_i, x_i) - F(y', x_i) | Loss | Viol. |
|--------|------------|--------------------------|------|-------|
| (N, N) | 0.55       | 0.45                     | 1    | 0     |
| (N, V) | 1          | 0                        | 0    | 0     |
| (V, N) | -0.65      | 1.65                     | 2    | 0     |
| (V, V) | -0.05      | 0.95                     | 1    | 0.05  |

No constraint is violated by more than ε, so the algorithm terminates.
Page 44: Geometric Example

Naïve SVM problem:
• Exponentially many constraints
• Most are dominated by a small set of "important" constraints

Structural SVM approach:
• Repeatedly finds the next most violated constraint…
• …until the set of constraints is a good approximation.

*This is known as a "cutting plane" method.
[Slide 48] Linear Convergence Rate

Guarantee for any ε > 0, over the full constraint set (every y', not just the working sets):

$$\min_{w,\,\xi}\ \frac{1}{2}w^Tw + \frac{C}{N}\sum_i \xi_i$$

$$\forall i,\ \forall y':\quad F(y_i,x_i)-F(y',x_i)\ \geq\ \sum_j 1_{[y'_j\neq y_{i,j}]}-\xi_i-\varepsilon \qquad\qquad \forall i:\ \xi_i\geq 0$$

Terminates after $O\!\left(\frac{1}{\varepsilon}\right)$ iterations.

Proof found in: http://www.cs.cornell.edu/people/tj/publications/joachims_etal_09a.pdf
[Slide 49] Finding Most Violated Constraint

A constraint is violated when:

$$F(y',x_i)-F(y_i,x_i)+\sum_j 1_{[y'_j\neq y_{i,j}]}-\xi_i > 0$$

Finding the most violated constraint reduces to:

$$\arg\max_{y'}\ F(y',x_i)+\sum_j 1_{[y'_j\neq y_{i,j}]}$$

This is called "loss-augmented inference", and it is highly related to prediction:

$$\arg\max_{y}\ F(y,x_i)$$
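For a tiny label set, loss-augmented inference can be done by brute-force enumeration. This is exponential in the sequence length, so it is purely illustrative (the slides' real answer is Viterbi); the scores below are taken from the example table, and the scoring function `F` is otherwise hypothetical:

```python
from itertools import product

def most_violated_constraint(F, x, y_true, labels):
    """Brute-force loss-augmented inference:
    argmax over y' of F(y', x) + Hamming(y', y_true)."""
    best, best_score = None, float("-inf")
    for y in product(labels, repeat=len(y_true)):
        # score plus number of positions where y disagrees with the truth
        score = F(y, x) + sum(a != b for a, b in zip(y, y_true))
        if score > best_score:
            best, best_score = y, score
    return best

# Scores for x = "Fish Sleep" from the earlier table
scores = {("N", "N"): 0.7, ("N", "V"): 0.9, ("V", "N"): -0.6, ("V", "V"): 0.0}
F = lambda y, x: scores[y]
print(most_violated_constraint(F, "Fish Sleep", ("N", "V"), ["N", "V"]))
# -> ('N', 'N')   (augmented score 0.7 + 1 = 1.7 beats all others)
```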
[Slide 50] "Augmented" Scoring Function

Goal: $\arg\max_{y'}\ F(y',x_i)+\sum_j 1_{[y'_j\neq y_{i,j}]}$, where

$$F(y,x_i)\equiv\sum_{j=1}^M w^T\varphi_j(y_j,y_{j-1}\mid x_i)$$

Define an augmented scoring function:

$$\tilde{F}(y,x_i,y_i)\equiv\sum_{j=1}^M \tilde{w}^T\tilde{\varphi}_j(y_j,y_{j-1}\mid x_i,y_i)$$

$$\tilde{\varphi}_j(a,b\mid x_i,y_i)=\begin{bmatrix}\varphi_j(a,b\mid x_i)\\ 1_{[a\neq y_{i,j}]}\end{bmatrix}\qquad\qquad \tilde{w}=\begin{bmatrix}w\\ 1\end{bmatrix}$$

Goal becomes: $\arg\max_{y'}\ \tilde{F}(y',x_i,y_i)$

The loss is just an additional unary feature. Solve using Viterbi!
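A sketch of that idea: standard Viterbi where the Hamming loss is folded into the unary scores. Labels are integers 0..L-1, and the `unary`/`trans` score tables (the w-phi dot products) are assumed given:

```python
def loss_augmented_viterbi(unary, trans, y_true, L):
    """Viterbi over augmented unaries: unary[j][a] + 1[a != y_true[j]].

    unary[j][a]: unary score of label a at position j (assumed precomputed)
    trans[b][a]: transition score from label b to label a
    Returns the highest-scoring label sequence under the augmented model.
    """
    M = len(y_true)
    # fold the Hamming loss into the unary scores (the "extra feature")
    aug = [[unary[j][a] + (1 if a != y_true[j] else 0) for a in range(L)]
           for j in range(M)]
    delta = [aug[0][:]]   # delta[j][a]: best prefix score ending in label a
    back = []             # backpointers
    for j in range(1, M):
        row, ptr = [], []
        for a in range(L):
            best_b = max(range(L), key=lambda b: delta[j-1][b] + trans[b][a])
            row.append(delta[j-1][best_b] + trans[best_b][a] + aug[j][a])
            ptr.append(best_b)
        delta.append(row)
        back.append(ptr)
    # backtrack from the best final label
    a = max(range(L), key=lambda a: delta[-1][a])
    path = [a]
    for ptr in reversed(back):
        a = ptr[a]
        path.append(a)
    return path[::-1]
```

With all scores zero, the loss term alone drives the answer, so the decoder returns the sequence that disagrees with the truth everywhere, exactly the most-lossy labeling.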
[Slide 51] Structural SVM Recipe

- Feature map: Ψ(y, x)
- Inference: h(x) = argmax_y F(y, x) ≡ w^T Ψ(y, x)
- Loss function: Δ_{y_i}(y)
- Loss-augmented inference: argmax_y w^T Ψ(y, x) + Δ_{y_i}(y) (most violated constraint)
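The four ingredients above fit together as in this skeleton. It is illustrative only (the names are mine, not a real library's API), and it enumerates candidate outputs rather than running a structured decoder, so it only works when `labelings(x)` is small:

```python
class StructuralSVM:
    """Recipe skeleton: the user supplies the feature map Psi, the candidate
    outputs for an input, and the loss Delta; w is learned elsewhere."""

    def __init__(self, feature_map, labelings, loss):
        self.psi = feature_map      # Psi(y, x) -> list of feature values
        self.labelings = labelings  # labelings(x) -> candidate outputs y
        self.loss = loss            # Delta(y_true, y)
        self.w = None               # weight vector, set by training

    def score(self, y, x):
        """F(y, x) = w^T Psi(y, x)."""
        return sum(wi * fi for wi, fi in zip(self.w, self.psi(y, x)))

    def predict(self, x):
        """Inference: h(x) = argmax_y F(y, x)."""
        return max(self.labelings(x), key=lambda y: self.score(y, x))

    def most_violated(self, x, y_true):
        """Loss-augmented inference: argmax_y F(y, x) + Delta(y_true, y)."""
        return max(self.labelings(x),
                   key=lambda y: self.score(y, x) + self.loss(y_true, y))
```

Note how `predict` and `most_violated` differ only by the loss term, which is the whole point of the loss-augmented formulation.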