Transcript of lecture slides: dasmith/inlp2009/lect15-cs... (2009-11-04)
Log-Linear Models with Structured Outputs
(continued)
Introduction to Natural Language Processing
Computer Science 585, Fall 2009
University of Massachusetts Amherst
David Smith
Overview
• What computations do we need?
• Smoothing log-linear models
• MEMMs vs. CRFs again
• Action-based parsing and dependency parsing
Recipe for Conditional Training of p(y | x)

1. Gather constraints/features from training data
2. Initialize all parameters to zero
3. Classify training data with current parameters; calculate expectations
4. Gradient is observed minus expected feature counts:
   ∇λ = Σᵢ [ f(xᵢ, yᵢ) − Σ_y p(y | xᵢ) f(xᵢ, y) ]
5. Take a step in the direction of the gradient
6. Repeat from 3 until convergence
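The recipe above can be sketched in a few lines. This is a toy batch gradient-ascent trainer for a conditional maxent classifier; the indicator features and the tiny spam/ham data set are illustrative assumptions, not from the lecture.

```python
import math
from collections import defaultdict

def features(x, y):
    # One indicator feature per (attribute of x, candidate label y) pair.
    return [(attr, y) for attr in x]

def predict_probs(x, labels, weights):
    # p(y | x) proportional to exp(sum of weights of active features).
    scores = {y: sum(weights[f] for f in features(x, y)) for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

def train(data, labels, rate=0.5, iters=200):
    weights = defaultdict(float)              # step 2: all parameters zero
    for _ in range(iters):                    # step 6: repeat to convergence
        grad = defaultdict(float)
        for x, y in data:
            for f in features(x, y):          # observed feature counts...
                grad[f] += 1.0
            probs = predict_probs(x, labels, weights)   # step 3: classify
            for y2 in labels:                 # ...minus expected counts
                for f in features(x, y2):
                    grad[f] -= probs[y2]
        for f, g in grad.items():             # step 5: gradient step
            weights[f] += rate * g
    return weights

data = [({"viagra"}, "spam"), ({"meeting"}, "ham"),
        ({"viagra", "free"}, "spam"), ({"lunch", "meeting"}, "ham")]
weights = train(data, ["spam", "ham"])
probs = predict_probs({"viagra"}, ["spam", "ham"], weights)
```

Note that a feature seen only with one label keeps growing without bound here, which is exactly the overfitting problem discussed below.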
Where have we seen expected counts before?

EM!
Gradient-Based Training
• λ <- λ + rate * Gradient(F)
• After all training examples? (batch)
• After every example? (on-line)
• Use second derivative?
• A big field: numerical optimization
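The batch-versus-on-line choice above amounts to where the update happens. A minimal sketch of the on-line (per-example) option, assuming hypothetical `features` and `predict_probs` helpers like those a maxent classifier would provide:

```python
import math
from collections import defaultdict

def sgd_step(weights, x, y, labels, features, predict_probs, rate=0.1):
    # On-line variant: take a gradient step after this one example alone.
    probs = predict_probs(x, labels, weights)
    for f in features(x, y):                  # observed counts (one example)
        weights[f] += rate
    for y2 in labels:                         # ...minus expected counts
        for f in features(x, y2):
            weights[f] -= rate * probs[y2]
    return weights

# Toy instantiation to exercise the update.
def features(x, y):
    return [(w, y) for w in x]

def predict_probs(x, labels, weights):
    scores = {t: sum(weights[f] for f in features(x, t)) for t in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

w = defaultdict(float)
for _ in range(100):
    w = sgd_step(w, {"free"}, "spam", ["spam", "ham"], features, predict_probs)
p = predict_probs({"free"}, ["spam", "ham"], w)
```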
Overfitting

• If we have too many features, we can choose weights to model the training data perfectly.
• If we have a feature that appears only in spam training data, never in ham, it will get weight ∞ to push p(spam | feature) to 1.
• These behaviors:
  • overfit the training data
  • will probably do poorly on test data
Solutions to Overfitting

• Throw out rare features.
  • Require every feature to occur > 4 times, and > 0 times with ling, and > 0 times with spam.
• Only keep, e.g., 1000 features.
  • Add one at a time, always greedily picking the one that most improves performance on held-out data.
• Smooth the observed feature counts.
• Smooth the weights by using a prior.
  • max p(λ | data) = max p(λ, data) = p(λ) p(data | λ)
  • decree p(λ) to be high when most weights are close to 0
Smoothing with Priors

• What if we had a prior expectation that parameter values wouldn't be very large?
• We could then balance evidence suggesting large (or infinite) parameters against our prior expectation.
• The evidence would never totally defeat the prior, and parameters would be smoothed (and kept finite).
• We can do this explicitly by changing the optimization objective to maximum posterior likelihood:

  log P(y, λ | x)  =  log P(λ)  +  log P(y | x, λ)
     (posterior)       (prior)       (likelihood)
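One standard instantiation (an assumption here, since the slide does not name the prior) is a Gaussian prior with mean 0 and variance σ². It adds a quadratic penalty on the weights, and the gradient becomes the familiar observed-minus-expected counts with one extra term pulling each weight toward zero:

```latex
\log P(\lambda) = -\sum_k \frac{\lambda_k^2}{2\sigma^2} + \text{const}
\qquad\Rightarrow\qquad
\frac{\partial}{\partial \lambda_k}\log P(y,\lambda \mid x)
  = \tilde{E}[f_k] \;-\; E_{p(y\mid x,\lambda)}[f_k] \;-\; \frac{\lambda_k}{\sigma^2}
```

Smaller σ² means a stronger pull toward zero, i.e. more smoothing.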
Parsing as Structured Prediction
Shift-reduce parsing

Stack              | Input remaining  | Action
()                 | Book that flight | shift
(Book)             | that flight      | reduce, Verb → book (choice #1 of 2)
(Verb)             | that flight      | shift
(Verb that)        | flight           | reduce, Det → that
(Verb Det)         | flight           | shift
(Verb Det flight)  |                  | reduce, Noun → flight
(Verb Det Noun)    |                  | reduce, NOM → Noun
(Verb Det NOM)     |                  | reduce, NP → Det NOM
(Verb NP)          |                  | reduce, VP → Verb NP
(VP)               |                  | reduce, S → VP
(S)                |                  | SUCCESS!

Ambiguity may lead to the need for backtracking.

Train a log-linear model of p(action | context).
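The trace above can be replayed with a minimal shift-reduce loop. The grammar and the greedy "always reduce if you can" policy are illustrative assumptions; a trained log-linear model of p(action | context) would make these choices instead, and this sketch ignores the ambiguity of "book" (choice #1 of 2) that would require backtracking.

```python
RULES = [  # (lhs, rhs) reductions, tried in this order
    ("NP", ("Det", "NOM")),
    ("VP", ("Verb", "NP")),
    ("S", ("VP",)),
    ("NOM", ("Noun",)),
    ("Verb", ("book",)),
    ("Det", ("that",)),
    ("Noun", ("flight",)),
]

def parse(words):
    stack, buffer, actions = [], [w.lower() for w in words], []
    while True:
        reduced = False
        for lhs, rhs in RULES:        # reduce whenever the stack top matches
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                reduced = True
                break
        if reduced:
            continue
        if buffer:                    # otherwise shift the next word
            stack.append(buffer.pop(0))
            actions.append("shift")
        else:
            break
    return stack, actions

stack, actions = parse(["Book", "that", "flight"])
```

Running it on "Book that flight" reproduces the shift and reduce sequence in the table, ending with S alone on the stack.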
Word Dependency Parsing

Raw sentence:
He reckons the current account deficit will narrow to only 1.8 billion in September.

Part-of-speech tagging:
He/PRP reckons/VBZ the/DT current/JJ account/NN deficit/NN will/MD narrow/VB to/TO only/RB 1.8/CD billion/CD in/IN September/NNP ./.

Word dependency parsing:
[dependency-tree figure: the same sentence with labeled head-dependent arcs such as ROOT, SUBJ, S-COMP, COMP, SPEC, and MOD]

slide adapted from Yuji Matsumoto
Great ideas in NLP: Log-linear models
(Berger, della Pietra, & della Pietra 1996; Darroch & Ratcliff 1972)

In the beginning, we used generative models:
p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * …
where each choice depends on a limited part of the history.

But which dependencies to allow (given limited training data)? What if they're all worthwhile?
p(D | A,B,C)? … p(D | A,B) * p(C | A,B,D)?

Throw them all in!
(1/Z) * Φ(A) * Φ(B,A) * Φ(C,A) * Φ(C,B) * Φ(D,A,B) * Φ(D,B,C) * Φ(D,A,C) * …

Solution: log-linear (max-entropy) modeling. Features may interact in arbitrary ways; iterative scaling keeps adjusting the feature weights until the model agrees with the training data.
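The "throw them all in" product can be made concrete over four binary variables. The potential function below is a made-up assumption (it just favors configurations whose arguments agree); the point is that overlapping factors are fine because the single global normalizer Z makes everything sum to one.

```python
import itertools

def phi(*vals):
    # Hypothetical potential: favors configurations with matching values.
    return 2.0 if len(set(vals)) == 1 else 1.0

def score(a, b, c, d):
    # Product of overlapping factors, exactly as in the slide.
    return (phi(a) * phi(b, a) * phi(c, a) * phi(c, b)
            * phi(d, a, b) * phi(d, b, c) * phi(d, a, c))

configs = list(itertools.product([0, 1], repeat=4))
Z = sum(score(*cfg) for cfg in configs)            # global normalizer
p = {cfg: score(*cfg) / Z for cfg in configs}
```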
How about structured outputs?

Log-linear models are great for n-way classification.

Also good for predicting sequences ("find preferred tags": v a n), but to allow fast dynamic programming, only use n-gram features.

Also good for dependency parsing ("find preferred links"), but to allow fast dynamic programming or MST parsing, only use single-edge features.
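Why n-gram features buy fast dynamic programming: with only (previous-tag, tag) and (tag, word) features, the best tag sequence is found by Viterbi in O(n·|T|²). A sketch for the "find preferred tags" example, with toy weights as assumptions:

```python
def viterbi(words, tags, score):
    # score(prev_tag, tag, word) -> local log-linear score of one step.
    best = {t: (score(None, t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        # For each tag, keep only the best-scoring path ending in it.
        best = {t: max(((s + score(p, t, w), path + [t])
                        for p, (s, path) in best.items()),
                       key=lambda x: x[0])
                for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]

# Toy weights: (tag, word) emission-like and (prev_tag, tag) bigram features.
WEIGHTS = {("v", "find"): 2.0, ("a", "preferred"): 2.0, ("n", "tags"): 2.0,
           ("v", "a"): 0.5, ("a", "n"): 0.5}

def score(prev, tag, word):
    return WEIGHTS.get((tag, word), 0.0) + WEIGHTS.get((prev, tag), 0.0)

best_tags = viterbi(["find", "preferred", "tags"], ["v", "a", "n"], score)
```

With these weights, the decoder recovers the tag sequence v a n from the slides.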
Edge-Factored Parsers (McDonald et al. 2005)

Byl jasný studený dubnový den a hodiny odbíjely třináctou
"It was a bright cold day in April and the clocks were striking thirteen"
POS tags: V A A A N J N V C

Is this a good edge? (yes, lots of green in the figure ...)

Features for the candidate edge, from most specific to most backed-off:
jasný → den ("bright day")
jasný → N ("bright NOUN")
A → N
20
Edge-Factored Parsers (McDonald et al. 2005)
Byl jasný studený dubnový den a hodiny odbíjely třináctou
“It bright cold day April and clocks were thirteen”was a in the striking
Is this a good edge?
jasný den(“bright day”)
jasný N(“bright NOUN”)
V A A A N J N V C
A N
![Page 56: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/56.jpg)
20
Edge-Factored Parsers (McDonald et al. 2005)
Byl jasný studený dubnový den a hodiny odbíjely třináctou
“It bright cold day April and clocks were thirteen”was a in the striking
Is this a good edge?
jasný den(“bright day”)
jasný N(“bright NOUN”)
V A A A N J N V C
A Npreceding
conjunction A N
![Page 57: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/57.jpg)
Edge-Factored Parsers (McDonald et al. 2005)

How about this competing edge? (jasný → hodiny)
Not as good, lots of red (bad) features fire:
  word pair: jasný hodiny (“bright clocks”) ... undertrained ...
  stem pair: jasn hodi (“bright clock,” stems only)
  (stems: byl jasn stud dubn den a hodi odbí třin)
  tag pair with number: Aplural → Nsingular
  tag pair in context: A → N where N follows a conjunction
Edge-Factored Parsers (McDonald et al. 2005)

Which edge is better, “bright day” or “bright clocks”?
Score of an edge e = θ ⋅ features(e), where θ is our current weight vector
Standard algorithms then find the valid parse with max total score
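The edge score θ ⋅ features(e) can be made concrete with a small sketch. This is only an illustration: the feature templates and weight values below are hypothetical, not the actual McDonald et al. (2005) feature set.

```python
# A minimal sketch of edge-factored feature scoring; the feature templates
# and weights below are hypothetical, not the real McDonald et al. feature set.

def edge_features(words, tags, stems, head, child):
    """A few binary features for a candidate dependency edge head -> child."""
    feats = {
        ("word-pair", words[head], words[child]),   # e.g. jasny den
        ("word-tag", words[head], tags[child]),     # e.g. jasny N
        ("tag-pair", tags[head], tags[child]),      # e.g. A N
        ("stem-pair", stems[head], stems[child]),   # backoff for rare full forms
    }
    if child > 0 and tags[child - 1] == "J":        # J = conjunction
        feats.add(("tag-pair-after-conj", tags[head], tags[child]))
    return feats

def score_edge(theta, feats):
    """Score of an edge e = theta . features(e)."""
    return sum(theta.get(f, 0.0) for f in feats)

words = ["Byl", "jasný", "studený", "dubnový", "den", "a", "hodiny", "odbíjely", "třináctou"]
tags  = ["V", "A", "A", "A", "N", "J", "N", "V", "C"]
stems = ["byl", "jasn", "stud", "dubn", "den", "a", "hodi", "odbí", "třin"]

theta = {("tag-pair", "A", "N"): 2.0,              # adjectives like modifying nouns
         ("word-pair", "jasný", "den"): 1.0,       # seen in training data
         ("tag-pair-after-conj", "A", "N"): -1.5}  # A rarely attaches across "a"

good = score_edge(theta, edge_features(words, tags, stems, 1, 4))  # jasný -> den
bad  = score_edge(theta, edge_features(words, tags, stems, 1, 6))  # jasný -> hodiny
```

With these toy weights, jasný → den outscores jasný → hodiny, matching the intuition on the slides.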
Edge-Factored Parsers (McDonald et al. 2005)

Score of an edge e = θ ⋅ features(e), where θ is our current weight vector
Standard algorithms find the valid parse with max total score, subject to hard constraints:
  one parent per word: two edges into the same word can’t both be kept
  no crossing links: two crossing edges can’t both be kept
  no cycles: a set of edges forming a cycle can’t all be kept
Thus, an edge may lose (or win) because of a consensus of other edges.
Finding Highest-Scoring Parse

The cat in the hat wore a stovepipe. ROOT

Convert to a context-free grammar (CFG), then use dynamic programming
(let’s vertically stretch this graph drawing: each subtree is a linguistic constituent, here a noun phrase)
CKY algorithm for CFG parsing is O(n³)
Unfortunately, O(n⁵) in this case:
  to score the “cat ← wore” link, it is not enough to know this is an NP;
  we must know it’s rooted at “cat”
  so expand the nonterminal set by O(n): {NPthe, NPcat, NPhat, ...}
  so CKY’s “grammar constant” is no longer constant
Finding Highest-Scoring Parse

Convert to a context-free grammar (CFG), then use dynamic programming
CKY algorithm for CFG parsing is O(n³); unfortunately, O(n⁵) in this case
Solution: use a different decomposition (Eisner 1996), back to O(n³)
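Eisner's O(n³) decomposition can be sketched as a dynamic program over spans. This is a minimal max-score version under the usual complete/incomplete span formulation; `score[h][c]` is an assumed precomputed edge-score table, with index 0 playing the role of ROOT.

```python
# A compact sketch of Eisner's (1996) O(n^3) dynamic program for projective
# dependency parsing. Returns only the max total score; recovering the tree
# itself would additionally require backpointers.

NEG = float("-inf")

def eisner_max_score(score):
    n = len(score)
    # I[s][t][d]: best incomplete span over s..t (one new edge between s and t);
    # C[s][t][d]: best complete span; d = 0 means the head is the right end (t),
    # d = 1 means the head is the left end (s).
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        C[s][s][0] = C[s][s][1] = 0.0
    for k in range(1, n):                       # span width
        for s in range(n - k):
            t = s + k
            # join two back-to-back complete spans, then add one new edge
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + score[t][s]     # new edge t -> s
            I[s][t][1] = best + score[s][t]     # new edge s -> t
            # extend an incomplete span with a complete one
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]                       # everything attached under ROOT

# toy example, 3 positions: ROOT, a, b;
# the best projective tree is ROOT -> a -> b, scoring 1 + 5 = 6
toy = [[0.0, 1.0, 1.0],
       [0.0, 0.0, 5.0],
       [0.0, 0.1, 0.0]]
```

The three nested loops (k, s, and the split point r) give the O(n³) bound, versus O(n⁵) for head-lexicalized CKY.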
Spans vs. Constituents

Two kinds of substring:
» Constituent of the tree: links to the rest only through its headword (root).
» Span of the tree: links to the rest only through its endwords.

The cat in the hat wore a stovepipe. ROOT
Decomposing a Tree into Spans

The cat in the hat wore a stovepipe. ROOT
  = “The cat” + “cat in the hat wore a stovepipe. ROOT”
“cat in the hat wore a stovepipe. ROOT”
  = “cat in the hat wore” + “wore a stovepipe. ROOT”
“cat in the hat wore” = “cat in” + “in the hat wore”
“in the hat wore” = “in the hat” + “hat wore”
Finding Highest-Scoring Parse

Convert to a context-free grammar (CFG), then use dynamic programming
CKY algorithm for CFG parsing is O(n³); unfortunately, O(n⁵) in this case
Solution: use a different decomposition (Eisner 1996), back to O(n³)
Can play the usual tricks for dynamic programming parsing:
  further refine the constituents or spans
  allow the probability model to keep track of even more internal information
  A*, best-first, coarse-to-fine
  training by EM etc. requires “outside” probabilities of constituents, spans, or links
Hard Constraints on Valid Trees

Score of an edge e = θ ⋅ features(e), where θ is our current weight vector
Standard algorithms find the valid parse with max total score, subject to hard constraints: one parent per word, no crossing links, no cycles
Thus, an edge may lose (or win) because of a consensus of other edges.
Non-Projective Parses

The “no crossing links” constraint is the “projectivity” restriction. Do we really want it?

ROOT I ’ll give a talk tomorrow on bootstrapping

Here the subtree rooted at “talk” (“a talk on bootstrapping”) is a discontiguous noun phrase: the edge talk → on crosses the edge give → tomorrow.
37

Non-Projective Parses

ROOT  I 'll give a talk tomorrow on bootstrapping

(the arc from "talk" to "on bootstrapping" crosses the arc from "give" to "tomorrow")

occasional non-projectivity in English

ROOT  ista meam norit gloria canitiem
      that.NOM my.ACC may-know glory.NOM going-gray.ACC

"That glory may-know my going-gray" (i.e., it shall last till I go gray)

frequent non-projectivity in Latin, etc.
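Non-projectivity has a simple combinatorial characterization: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A minimal check, sketched in Python (the `heads` encoding and function name are illustrative, not from the slides):

```python
def is_projective(heads):
    """heads[i] is the head of word i (words are 1..n; head 0 is ROOT).
    heads[0] is unused.  A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross iff one endpoint of the second lies strictly
            # inside the span of the first and the other strictly outside
            if l1 < l2 < r1 < r2:
                return False
    return True
```

On the English example above, the arc from "talk" to "on" crosses the arc from "give" to "tomorrow", so the check reports non-projectivity.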
![Page 110: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/110.jpg)
38

Finding the Highest-Scoring Non-Projective Tree

Consider the sentence "John saw Mary". The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree (which may be non-projective) in O(n²) time.
[Figure: the complete weighted arc graph over ROOT, John, saw, Mary (left; arc weights 9, 10, 30, 20, 30, 0, 11, 3, 9) and its maximum-weight spanning tree (right): ROOT → saw (10), saw → John (30), saw → Mary (30).]
slide thanks to Dragomir Radev
Every node selects its best parent; if there are cycles, contract them and repeat.
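The contract-and-recur recipe on this slide can be sketched directly. Below is a minimal recursive Chu-Liu-Edmonds implementation in Python (the naive version, so O(n³) rather than the O(n²) of the optimized algorithm; the `scores` dict encoding and helper names are illustrative):

```python
def _find_cycle(parent):
    """Return the set of nodes on some cycle of parent pointers, or None."""
    for start in parent:
        seen, u = set(), start
        while u in parent and u not in seen:
            seen.add(u)
            u = parent[u]
        if u in parent and u in seen:      # walked back onto the path
            cycle, v = {u}, parent[u]
            while v != u:
                cycle.add(v)
                v = parent[v]
            return cycle
    return None

def chu_liu_edmonds(scores, root=0):
    """Maximum-weight spanning arborescence rooted at `root`.
    scores: dict (head, dep) -> weight.  Returns dict dep -> head."""
    deps = {d for (_, d) in scores if d != root}
    # 1. every non-root node greedily picks its highest-scoring incoming arc
    parent = {d: max((h for (h, d2) in scores if d2 == d),
                     key=lambda h: scores[h, d])
              for d in deps}
    cycle = _find_cycle(parent)
    if cycle is None:
        return parent
    # 2. contract the cycle into a supernode c, reweighting each arc that
    #    enters it by the score of the cycle arc it would displace
    c = frozenset(cycle)
    new_scores, enter_arc, leave_arc = {}, {}, {}
    for (h, d), w in scores.items():
        if h in cycle and d in cycle:
            continue
        if d in cycle:                     # arc entering the cycle
            adj = w - scores[parent[d], d]
            if (h, c) not in new_scores or adj > new_scores[h, c]:
                new_scores[h, c] = adj
                enter_arc[h] = (h, d)
        elif h in cycle:                   # arc leaving the cycle
            if (c, d) not in new_scores or w > new_scores[c, d]:
                new_scores[c, d] = w
                leave_arc[d] = (h, d)
        else:
            new_scores[h, d] = w
    # 3. solve the contracted problem recursively
    sub = chu_liu_edmonds(new_scores, root)
    # 4. expand the supernode back into the cycle
    result, broken = {}, None
    for d, h in sub.items():
        if d == c:                         # the arc chosen to enter the cycle
            eh, ed = enter_arc[h]
            result[ed] = eh
            broken = ed                    # this node's cycle arc is dropped
        elif h == c:                       # arcs leaving the cycle
            lh, ld = leave_arc[d]
            result[ld] = lh
        else:
            result[d] = h
    for d in cycle:                        # keep the rest of the cycle's arcs
        if d != broken:
            result[d] = parent[d]
    return result
```

On the "John saw Mary" example from the figure (ROOT=0, John=1, saw=2, Mary=3), greedy selection yields the cycle John ↔ saw, which is contracted and resolved to the tree ROOT → saw → {John, Mary}.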
![Page 111: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/111.jpg)
39

Summing Over All Non-Projective Trees

Finding the highest-scoring tree is not enough: what about the total weight Z of all trees? What about outside probabilities or gradients? These can be computed in O(n³) time using matrix determinants and inverses (Smith & Smith, 2007).
slide thanks to Dragomir Radev
![Page 112: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/112.jpg)
40

Graph Theory to the Rescue!

Tutte's Matrix-Tree Theorem (1948): the determinant of the Kirchhoff (a.k.a. Laplacian) adjacency matrix of a directed graph G, with row and column r deleted, equals the sum of the scores of all directed spanning trees of G rooted at node r.

Exactly the Z we need, and computable in O(n³) time!
![Page 115: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/115.jpg)
41

Building the Kirchhoff (Laplacian) Matrix

• Negate the edge scores off the diagonal
• Put each column's sum of incoming (child) scores on the diagonal
• Strike out the root row and column
• Take the determinant

N.B.: This allows multiple children of root, but see Koo et al. 2007.
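The four bullet steps above translate into code almost line by line. A sketch in Python, assuming multiplicative arc weights (e.g. exponentiated log-linear scores) stored as `scores[h][d]`; the function and helper names are illustrative:

```python
def kirchhoff_z(scores, root=0):
    """Total weight Z of all directed spanning trees rooted at `root`,
    via the Matrix-Tree Theorem.  scores[h][d] is the (multiplicative)
    weight of the arc h -> d; 0 means no arc."""
    n = len(scores)
    K = [[0] * n for _ in range(n)]
    for h in range(n):
        for d in range(n):
            if h != d:
                K[h][d] = -scores[h][d]     # negate edge scores
                K[d][d] += scores[h][d]     # column sums on the diagonal
    # strike out the root row and column
    minor = [[K[i][j] for j in range(n) if j != root]
             for i in range(n) if i != root]
    return _det(minor)                      # take the determinant

def _det(M):
    """Cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * _det(sub)
    return total
```

As a sanity check: with nodes {root, A, B} and weights root→A = 2, root→B = 1, A→B = 3, B→A = 4, the three arborescences score 2·1, 2·3, and 1·4, so Z = 12, and the 2×2 minor [[6, −3], [−4, 4]] indeed has determinant 24 − 12 = 12.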
![Page 120: Log-Linear Models with Structured Outputs (continued)dasmith/inlp2009/lect15-cs... · 2009-11-04 · Log-Linear Models with Structured Outputs (continued) Introduction to Natural](https://reader036.fdocuments.in/reader036/viewer/2022070811/5f09e38a7e708231d428fc0d/html5/thumbnails/120.jpg)
42

Why Should This Work?

Chu-Liu-Edmonds analogy: every node selects its best parent; if there are cycles, contract them and recur.

Proof sketch: clear for a 1×1 matrix; use induction for larger ones.

The undirected case comes first; the directed case needs special handling at the root.