Transcript of NIPS 2007: Structured Prediction
Structured Prediction: A Large Margin Approach
Ben Taskar, University of Pennsylvania
Acknowledgments
Drago Anguelov, Vassil Chatalbashev, Carlos Guestrin, Michael Jordan
Dan Klein, Daphne Koller, Simon Lacoste-Julien, Paul Vernaza
Structured Prediction: prediction of complex outputs
Structured outputs: multivariate, correlated, constrained
Novel, general way to solve many learning problems
Handwriting Recognition
brace
Sequential structure (input x, output y)
Object Segmentation
Spatial structure (input x, output y)
Natural Language Parsing
The screen was a sea of red
Recursive structure (input x, output y)
Bilingual Word Alignment
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
x: What is the anticipated cost of collecting fees under the new proposal?
y: En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?
Combinatorial structure
Protein Structure and Disulfide Bridges
Protein: 1IMT
AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRMHHTCPCAPNLACVQTSPKKFKCLSK
Local Prediction
Classify using local information: ignores correlations & constraints!
Local Prediction (labels: building, tree, shrub, ground)
Structured Prediction
Use local information and exploit correlations
Structured Prediction (labels: building, tree, shrub, ground)
Outline
Structured prediction models: sequences (CRFs), trees (CFGs), associative Markov networks (special MRFs), matchings
Structured large margin estimation: margins and structure, min-max formulation, linear programming inference, certificate formulation
Structured Models
Mild assumption: the scoring function is a linear combination of features, s(x, y) = w·f(x, y), and prediction maximizes the score over the space of feasible outputs Y(x)
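Concretely, the linear scoring assumption can be sketched in a few lines of Python (the feature function and the enumerated output space below are toy stand-ins, not from the tutorial):

```python
import itertools

def score(w, f, x, y):
    """Linear scoring function: s(x, y) = w . f(x, y), with sparse dict features."""
    return sum(w.get(k, 0.0) * v for k, v in f(x, y).items())

def predict(w, f, x, outputs):
    """Prediction = argmax of the score over the feasible outputs Y(x)."""
    return max(outputs, key=lambda y: score(w, f, x, y))

# Toy instance: label each character of a 2-letter input; one emission feature per position
def f(x, y):
    return {("emit", i, xi, yi): 1.0 for i, (xi, yi) in enumerate(zip(x, y))}

w = {("emit", 0, "b", "b"): 1.0, ("emit", 1, "r", "r"): 1.0}
outputs = ["".join(p) for p in itertools.product("br", repeat=2)]
print(predict(w, f, "br", outputs))  # br
```

Brute-force enumeration of Y(x) is only for illustration; the tutorial's whole point is replacing it with combinatorial or LP inference.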
Chain Markov Net (aka CRF*)
(Chain of label variables y, each ranging over a-z, with observed inputs x)
*Lafferty et al. 01
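For chain models like this, the argmax decoding factors into node and edge scores and is solved exactly by Viterbi dynamic programming; a minimal sketch (made-up score tables, position-independent transitions for brevity):

```python
def viterbi(node_scores, edge_scores):
    """Exact argmax for a chain-structured score.
    node_scores[t][y]: score of label y at position t.
    edge_scores[y][y2]: score of transition y -> y2 (position-independent here)."""
    labels = list(node_scores[0])
    best = [dict(node_scores[0])]   # best[t][y]: best score of a prefix ending in y
    back = []                       # backpointers
    for t in range(1, len(node_scores)):
        cur, bp = {}, {}
        for y2 in labels:
            prev = max(labels, key=lambda y: best[-1][y] + edge_scores[y][y2])
            cur[y2] = best[-1][prev] + edge_scores[prev][y2] + node_scores[t][y2]
            bp[y2] = prev
        best.append(cur)
        back.append(bp)
    y = max(labels, key=lambda l: best[-1][l])  # best final label, then backtrace
    path = [y]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]

zero = {"a": {"a": 0.0, "b": 0.0}, "b": {"a": 0.0, "b": 0.0}}
print(viterbi([{"a": 2.0, "b": 0.0}, {"a": 0.0, "b": 2.0}], zero))  # ['a', 'b']
```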
Associative Markov Nets
Point features: spin-images, point height
Edge features: length of edge, edge orientation
“associative” restriction
CFG Parsing
#(NP → DT NN)
…
#(PP → IN NP)
…
#(NN → ‘sea’)
Bilingual Word Alignment
Features: position, orthography, association
(Figure: English words aligned to the tokenized French ‘En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?’)
Disulfide Bonds: Non-bipartite Matching
RSCCPCYWGGCPWGQNCYPEGCSGPKV (cysteines numbered 1-6)
(Figure: the six cysteines paired by disulfide bridges, a matching on the graph over nodes 1-6)
Fariselli & Casadio `01, Baldi et al. ‘04
Scoring Function
RSCCPCYWGGCPWGQNCYPEGCSGPKV (cysteines 1-6)
Features: amino acid identities, phys/chem properties
Structured Models
Mild assumptions: the scoring function is a linear combination of features and decomposes into a sum of part scores; prediction maximizes the score over the space of feasible outputs Y(x)
Supervised Structured Prediction
Learning: estimate w from data
Prediction: combinatorial optimization (example: weighted matching)
Estimation options: Likelihood (can be intractable), Margin, Local (ignores structure)
Local Estimation
Treat edges as independent decisions
Estimate w locally, use globally: e.g., naïve Bayes, SVM, logistic regression; cf. [Matusov+al, 03] for matchings
Simple and cheap, but not well-calibrated for the matching model; ignores correlations & constraints
Conditional Likelihood Estimation
Estimate w jointly by maximizing conditional likelihood; the normalizer (denominator) is #P-complete [Valiant 79, Jerrum & Sinclair 93]
Tractable model, intractable learning
Need a tractable learning method: margin-based estimation
Outline
Structured prediction models: sequences (CRFs), trees (CFGs), associative Markov networks (special MRFs), matchings
Structured large margin estimation: margins and structure, min-max formulation, linear programming inference, certificate formulation
OCR Example
We want: the score of the true labeling “brace” to beat every alternative
Equivalently: “brace” must out-score “aaaaa”, “aaaab”, …, “zzzzz”: a lot of constraints!
Parsing Example
We want: the score of the correct parse of ‘It was red’ to beat every alternative
Equivalently: the correct tree must out-score every other tree for ‘It was red’: a lot of constraints!
Alignment Example
We want: the score of the correct alignment of ‘What is the’ / ‘Quel est le’ to beat every alternative
Equivalently: the correct alignment must out-score every other alignment: a lot of constraints!
Structured Loss
Structured loss = Hamming distance, the number of wrong parts. Against the truth ‘b r a c e’: ‘b c a r e’ 2, ‘b r o r e’ 2, ‘b r o c e’ 1, ‘b r a c e’ 0. Analogous part counts apply to alternative parses of ‘It was red’ and alternative alignments of ‘What is the’ / ‘Quel est le’.
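The Hamming counts above are straightforward to compute; a tiny helper for the letter-sequence case:

```python
def hamming(y_true, y_pred):
    """Structured (Hamming) loss: number of parts where prediction and truth disagree."""
    assert len(y_true) == len(y_pred)
    return sum(a != b for a, b in zip(y_true, y_pred))

print(hamming("brace", "bcare"))  # 2
print(hamming("brace", "broce"))  # 1
print(hamming("brace", "brace"))  # 0
```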
Large margin estimation: given training examples (x_i, y_i), we want the truth to win:
w·f(x_i, y_i) ≥ w·f(x_i, y) + γ for all alternative y
Maximize the margin γ
Mistake-weighted margin: require a margin scaled by ℓ(y_i, y), the # of mistakes in y
*Collins 02, Altun et al 03, Taskar 03
Large margin estimation
Eliminate the margin variable: fix its scale and minimize ||w||² subject to the margin constraints
Add slacks ξ_i for the inseparable case (hinge loss)
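The resulting per-example hinge loss is the loss-augmented best score minus the score of the truth. A brute-force sketch, where enumeration stands in for the combinatorial inference the tutorial later replaces with an LP (names are illustrative):

```python
def structured_hinge(score, loss, x, y_true, outputs):
    """Loss-augmented hinge: max_y [score(x, y) + loss(y_true, y)] - score(x, y_true).
    Zero exactly when the truth beats every y by a margin of at least loss(y_true, y)."""
    worst = max(score(x, y) + loss(y_true, y) for y in outputs)
    return worst - score(x, y_true)

# Toy setup: fixed scores over 2-letter outputs, Hamming loss
scores = {"aa": 1.0, "ab": 2.0, "ba": 0.0, "bb": 0.0}
score = lambda x, y: scores[y]
loss = lambda y_true, y: sum(a != b for a, b in zip(y_true, y))
print(structured_hinge(score, loss, None, "ab", scores))  # 0.0
print(structured_hinge(score, loss, None, "aa", scores))  # 2.0
```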
Large margin estimation options:
Brute-force enumeration
Min-max formulation: ‘plug-in’ linear program for inference
Min-max formulation
Structured loss (Hamming)
Key step: replace inference, a discrete optimization, with an equivalent linear program, a continuous optimization, and plug the LP into the margin constraints
Alternatives: Perceptron
Simple iterative method; unstable for structured output: fewer instances, big updates
May not converge if non-separable; noisy
Voted / averaged perceptron [Freund & Schapire 99, Collins 02]: regularize / reduce variance by aggregating over iterations
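An averaged structured perceptron can be sketched generically (the sparse-dict weights and the predict/features callables are assumptions of this sketch, not the tutorial's notation):

```python
from collections import defaultdict

def averaged_perceptron(data, features, predict, epochs=5):
    """Structured perceptron with parameter averaging.
    data: list of (x, y) pairs; features(x, y): sparse dict; predict(w, x): argmax output."""
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum of weights, for averaging
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x)
            if y_hat != y:       # update toward the truth, away from the mistake
                for k, v in features(x, y).items():
                    w[k] += v
                for k, v in features(x, y_hat).items():
                    w[k] -= v
            for k, v in w.items():
                total[k] += v
    n = epochs * len(data)
    return {k: v / n for k, v in total.items()}

# Toy problem: copy a single character; indicator features, argmax over two labels
data = [("a", "a"), ("b", "b")]
feats = lambda x, y: {(x, y): 1.0}
pred = lambda w, x: max("ab", key=lambda y: w.get((x, y), 0.0))
w_avg = averaged_perceptron(data, feats, pred)
print([pred(w_avg, x) for x in "ab"])  # ['a', 'b']
```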
Alternatives: Constraint Generation [Collins 02; Altun et al, 03]
Add the most violated constraint; handles several more general loss functions, but must re-solve the QP many times
Theorem: only a polynomial # of constraints needed to achieve ε-error [Tsochantaridis et al, 04]
Worst-case # of constraints larger than the factored formulation
Outline
Structured prediction models: sequences (CRFs), trees (CFGs), associative Markov networks (special MRFs), matchings
Structured large margin estimation: margins and structure, min-max formulation, linear programming inference, certificate formulation
Matching Inference LP
Degree constraints on the alignment variables z; has integral solutions z (the constraint matrix A is totally unimodular) [Nemhauser+Wolsey 88]
Need Hamming-like loss
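The same argmax over matchings can be checked on toy data by brute force over permutations (a stand-in for the LP or a combinatorial matching algorithm; the score matrix is made up):

```python
from itertools import permutations

def best_matching(score):
    """score[j][k]: score of matching source j to target k (square matrix).
    Returns the permutation (tuple of target indices) with maximum total score."""
    n = len(score)
    return max(permutations(range(n)),
               key=lambda perm: sum(score[j][perm[j]] for j in range(n)))

S = [[3, 1, 0],
     [1, 3, 0],
     [0, 0, 3]]
print(best_matching(S))  # (0, 1, 2)
```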
y → z Map for Markov Nets
(Figure: the labeling y is encoded as indicators z: one indicator vector per node over labels a-z, and one indicator matrix per edge over label pairs)
Markov Net Inference LP
Normalization and agreement constraints on the node and edge marginals
Has integral solutions z for chains and (hyper)trees; can be fractional for untriangulated networks [Chekuri+al 01, Wainwright+al 02]
Associative MN Inference LP
For K=2 labels, solutions are always integral (optimal); for K>2, within a factor of 2 of optimal (results exist for larger cliques)
“associative” restriction
[Greig+al 89, Boykov+al 99, Kolmogorov & Zabih 02, Taskar+al 04]
CFG Chart
CNF tree = set of two types of parts: constituents (A, s, e) and CF rules (A → B C, s, m, e)
CFG Inference LP
Inside, outside, and root constraints; has integral solutions z
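The argmax that the inference LP encodes is also computable by max-score CKY over a weighted CNF grammar; a compact sketch with a hypothetical grammar:

```python
def cky_score(words, lexicon, rules):
    """Best-parse scores with a weighted CNF grammar.
    lexicon[(A, word)]: score of A -> word; rules[(A, B, C)]: score of A -> B C.
    Returns chart[(s, e)][A] = best score of an A spanning words[s:e]."""
    n = len(words)
    chart = {}
    for i, w in enumerate(words):  # width-1 spans from the lexicon
        cell = {}
        for (A, word), sc in lexicon.items():
            if word == w and sc > cell.get(A, float("-inf")):
                cell[A] = sc
        chart[(i, i + 1)] = cell
    for width in range(2, n + 1):  # wider spans combine two adjacent children
        for s in range(n - width + 1):
            e = s + width
            cell = {}
            for m in range(s + 1, e):
                for (A, B, C), sc in rules.items():
                    if B in chart[(s, m)] and C in chart[(m, e)]:
                        total = sc + chart[(s, m)][B] + chart[(m, e)][C]
                        if total > cell.get(A, float("-inf")):
                            cell[A] = total
            chart[(s, e)] = cell
    return chart

lexicon = {("N", "it"): 0.0, ("V", "was"): 0.0, ("A", "red"): 0.0}
rules = {("VP", "V", "A"): 1.0, ("S", "N", "VP"): 1.0}
print(cky_score(["it", "was", "red"], lexicon, rules)[(0, 3)]["S"])  # 2.0
```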
LP Duality
Primal variables become dual constraints; primal constraints become dual variables
Optimal values are the same when both feasible regions are bounded
Min-max Formulation
LP duality
Min-max formulation summary
Formulation produces a concise QP for: low-treewidth Markov networks, associative MNs (K=2), context-free grammars, bipartite matchings; approximate for untriangulated MNs and AMNs with K>2
*Taskar et al 04
Unfactored Primal/Dual
QP duality
Exponentially many constraints/variables
Factored Primal/Dual
By QP duality
Dual inherits structure from problem-specific inference LP
Variables correspond to a decomposition of variables of the flat case
The Connection
(Figure: dual variables for the candidates ‘b c a r e’, ‘b r o r e’, ‘b r o c e’, ‘b r a c e’, with losses 2, 2, 1, 0, combine into fractional node and edge marginals, e.g. values .2, .35, .65 on competing letters)
Duals and Kernels
Kernel trick works in the factored dual: local functions (log-potentials) can use kernels
3D Mapping
Laser Range Finder
GPS
IMU
Data provided by: Michael Montemerlo & Sebastian Thrun
Labels: ground, building, tree, shrub. Training: 30 thousand points. Testing: 3 million points.
Segmentation results (hand-labeled 180K test points)
Model   Accuracy
SVM     68%
V-SVM   73%
M3N     93%
Fly-through
LAGRbot: Real-time Navigation
LAGRbot: Paul Vernaza & Dan Lee
Range of stereo vision limited to approximately 15 m
LAGRbot: Real-time Navigation
Model        Error
Local        17%
Structured   8%
160x120 images: real-time prediction/learning (~100 ms). Current work with Paul Vernaza, Dan Lee.
Hypertext Classification: WebKB dataset *Taskar et al 02
Four CS department websites: 1300 pages / 3500 links; classify each page as faculty, course, student, project, or other; train on three universities, test on the fourth
(Chart: test error, lower is better, comparing SVMs, RMNs with loopy belief propagation, and M^3Ns with relaxed LP)
53% error reduction over SVMs; 38% error reduction over RMNs
Word Alignment Results [Taskar+al 05]
Data: Hansards (Canadian Parliament); features induced on 1 mil unsupervised sentences; trained on 100 sentences (10,000 edges); tested on 350 sentences (35,000 edges)
Model                        *Error
GIZA/IBM4 [Och & Ney 03]     6.5
Local learning + matching    5.4
Our approach                 4.9
Our approach + QAP           4.5
*Error: weighted combination of precision/recall [Lacoste-Julien+Taskar+al 06]
Modeling First-Order Effects
Monotonicity, local inversion, local fertility
The resulting QAP is NP-complete, but sentences (30 words, 1k vars) solve in a few seconds (Mosek); learning uses the LP relaxation; at test time the LP is integral for 83.5% of sentences and 99.85% of edges
Outline
Structured prediction models: sequences (CRFs), trees (CFGs), associative Markov networks (special MRFs), matchings
Structured large margin estimation: margins and structure, min-max formulation, linear programming inference, certificate formulation
Certificate formulation
Non-bipartite matchings: O(n³) combinatorial algorithm, but no polynomial-size LP known
Spanning trees: no polynomial-size LP known, but a simple certificate of optimality
Intuition: verifying optimality is easier than optimizing
Compact optimality condition of the target output w.r.t. alternatives
Certificate for non-bipartite matching
Alternating cycle: every other edge is in the matching
Augmenting alternating cycle: score of the edges not in the matching greater than the edges in the matching
Negate the score of edges not in the matching; an augmenting alternating cycle becomes a negative-length alternating cycle
Matching is optimal iff there are no negative alternating cycles
Edmonds ‘65
Certificate for non-bipartite matching
Pick any node r as root; let d_j = length of the shortest alternating path from r to j
Triangle inequality: d_k ≤ d_j + length(j, k)
Theorem: no negative-length cycle iff such a distance function d exists
Can be expressed as linear constraints: O(n) distance variables, O(n²) constraints
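This certificate is Bellman-Ford feasibility in disguise: shortest-path distances satisfying the triangle inequality exist iff there is no negative cycle. A generic sketch on a directed graph (standing in for the alternating-path construction):

```python
def has_negative_cycle(n, edges):
    """Bellman-Ford check: edges are (u, v, length) on nodes 0..n-1.
    Returns True iff some negative-length cycle exists; otherwise distances d
    satisfying d[v] <= d[u] + length (the triangle inequality) have been found."""
    d = [0.0] * n  # virtual source at distance 0 to every node
    for _ in range(n - 1):
        for u, v, length in edges:
            if d[u] + length < d[v]:
                d[v] = d[u] + length
    # Any still-violated triangle inequality certifies a negative cycle
    return any(d[u] + length < d[v] for u, v, length in edges)

print(has_negative_cycle(3, [(0, 1, 1), (1, 2, 1), (2, 0, -3)]))  # True
print(has_negative_cycle(3, [(0, 1, 1), (1, 2, 1), (2, 0, -1)]))  # False
```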
Certificate formulation
Formulation produces a compact QP for: spanning trees, non-bipartite matchings, any problem with a compact optimality condition
*Taskar et al. ‘05
Disulfide Bonding Prediction
Data: [Swiss Prot 39], 450 sequences (4-10 cysteines); features: windows around C-C pairs, physical/chemical properties [Taskar+al 05]
Model                                 *Acc
Local learning + matching              41%
Recursive Neural Net [Baldi+al ’04]    52%
Our approach (certificate)             55%
*Accuracy: % of proteins with all bonds correct
Formulation summary
Brute force enumeration
Min-max formulation: ‘plug-in’ convex program for inference
Certificate formulation: directly guarantee optimality of the target output
Scalable Algorithms
Convex quadratic program: # of variables and constraints linear in # of parameters and edges; can solve with off-the-shelf software (Matlab, CPLEX, Mosek, etc.) with superlinear convergence
Problem: linear is still too large; second-order methods run out of memory (quadratic)
Need scalable, memory-efficient methods (space/time tradeoff): structured SMO [Taskar+al 04], structured exponentiated gradient [Bartlett+al 04, Collins+al 07]; these don’t work for matchings and min-cuts
Saddle-point Problem
Extragradient Method [Korpelevich 76]
Prediction step, then correction step: each is a projected gradient move (Π = Euclidean projection, η = step size)
Theorem: extragradient converges linearly
Key computation is the Euclidean projection: usually easy for the weights, harder for the structured variables
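One extragradient iteration on a toy bilinear saddle point, min over w and max over z of w·z, with box projections standing in for the harder structured projections:

```python
def project_box(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto a box (toy stand-in for the structured projections)."""
    return [min(hi, max(lo, x)) for x in v]

def extragradient_step(w, z, grad_w, grad_z, eta):
    """Prediction step at (w, z), then a correction using gradients at the predicted point."""
    w_pred = project_box([wi - eta * g for wi, g in zip(w, grad_w(w, z))])
    z_pred = project_box([zi + eta * g for zi, g in zip(z, grad_z(w, z))])
    w_new = project_box([wi - eta * g for wi, g in zip(w, grad_w(w_pred, z_pred))])
    z_new = project_box([zi + eta * g for zi, g in zip(z, grad_z(w_pred, z_pred))])
    return w_new, z_new

# Toy saddle point: L(w, z) = w * z, saddle at (0, 0). Plain gradient
# descent-ascent spirals outward here; extragradient contracts inward.
grad_w = lambda w, z: [z[0]]   # dL/dw
grad_z = lambda w, z: [w[0]]   # dL/dz
w, z = [0.5], [0.5]
for _ in range(200):
    w, z = extragradient_step(w, z, grad_w, grad_z, eta=0.1)
print(w[0] ** 2 + z[0] ** 2 < 0.25)  # True
```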
Projection for Bipartite Matchings: Min-Cost Flow
Min-cost quadratic flow computes the projection; O(N^1.5) complexity for fixed precision (N = # of edges); a reduction to flow for min-cuts is also possible [Taskar+al 06]
(Figure: source s and sink t connected through the English and French words, all capacities = 1; the flow cost encodes the projection)
Structured Extragradient [Taskar+al 06]
Extragradient method [Korpelevich 76, Nesterov 03]: linear convergence; key computation is the projection, a min-cost quadratic flow for matchings & cuts
Extensions (using Bregman divergence): dynamic programming for decomposable models
“Online-envy”: want memory proportional to # of parameters, independent of # of examples; solves problems with a million edges
Other approaches
Online methods: online updates with respect to the most violated constraints [Crammer+al 05, 06]
Regression-based methods: regression from input to a transformed output space [Cortes+al 07]
Learning to search: learn a classifier to guide local search for the structured solution [Daume+al 05]
Many others
Generalization Bounds
“If the past is any indication of the future, he’ll have a cruller.”
Generalization Bounds
Several Pointers
Perceptron bound [Collins 01]: assumes separability with margin; bound on 0-1 loss
Covering-number bound [Taskar+al 03]: bound on Hamming loss; logarithmic dependence on the # of variables in each y
Regret bounds [Crammer+al 06]: online-style guarantees for more general losses
PAC-Bayes bound [McAllester 07]: tighter analysis, consistency
Bounds for learning with approximate inference [Kulesza & Pereira, today]
Open Questions for Large-Margin Estimation
Statistical consistency: hinge loss is not consistent for non-binary output [see Tewari & Bartlett 05, McAllester 07]
Semi-supervised: Laplacian regularization [Altun+McAllester 05], co-regularization [Brefeld+al 05]
Latent variables: machine translation [Liang+al 06], CCG parsing to logical form [Zettlemoyer+Collins 07]
Learning with approximate inference
Learning with LP relaxations: does constant-factor approximate inference guarantee anything a priori about learning?
No [see Kulesza & Pereira, tonight]: a simple 3-node counterexample is separable with exact inference but not separable with approximate inference
Question: what other (stronger?) approximate-inference guarantees translate into learning guarantees?
References
Edited collection: G. Bakir+al 07, Predicting Structured Data, MIT Press
Code: SVMstruct by Thorsten Joachims
Slides, more papers at: http://www.cis.upenn.edu/~taskar
Thanks!
Segmentation Model: Min-Cut
Local evidence + spatial smoothness over binary labels {0, 1}
Computing the optimum is hard in general, but if the edge potentials are attractive, use a min-cut algorithm; multiway-cut for the multiclass case, via LP relaxation
[Greig+al 89, Boykov+al 99, Kolmogorov & Zabih 02, Taskar+al 04]