Forest-based Algorithms in - Oregon State...
Transcript of Forest-based Algorithms in - Oregon State...
![Page 1: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/1.jpg)
Forest-based Algorithms in
Natural Language Processing
includes joint work with David Chiang, Kevin Knight, Aravind Joshi, Haitao Mi and Qun Liu
CMU LTI Seminar, Pittsburgh, PA, May 14, 2009
INSTITUTE OFCOMPUTING
TECHNOLOGY
Liang Huang overview of Ph.D. work done at Penn (and ISI, ICT)
![Page 2: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/2.jpg)
Forest Algorithms
NLP is all about ambiguities
• to middle school kids: what does this sentence mean?
2
Aravind Joshi
I saw her duck.
![Page 3: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/3.jpg)
Forest Algorithms
NLP is all about ambiguities
• to middle school kids: what does this sentence mean?
2
Aravind Joshi
I saw her duck.
![Page 4: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/4.jpg)
Forest Algorithms
NLP is all about ambiguities
3
Aravind Joshi
I eat sushi with tuna.
• to middle school kids: what does this sentence mean?
![Page 5: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/5.jpg)
Forest Algorithms
NLP is all about ambiguities
4
I saw her duck.
![Page 6: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/6.jpg)
Forest Algorithms
NLP is all about ambiguities
4
I saw her duck.
![Page 7: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/7.jpg)
Forest Algorithms
NLP is all about ambiguities
• how about...
• I saw her duck with a telescope.
• I saw her duck with a telescope in the garden...4
...
I saw her duck.
![Page 8: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/8.jpg)
Forest Algorithms
NLP is HARD!
• exponential explosion of the search space
• non-local dependencies (context)
5
...
S
NP
PRP
I
VP
VBD
saw
NP
PRP$
her
NN
duck
PP
IN
with
NP
DT
a
NN
telescope
![Page 9: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/9.jpg)
Forest Algorithms
Ambiguities in Translation
6
zi zhu zhong duan自 助 终 端
self help terminal device
needs context to disambiguate!
![Page 10: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/10.jpg)
Forest Algorithms
Evil Rubbish; Safety Export
7
needs context for fluency!
![Page 11: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/11.jpg)
Forest Algorithms
Key Problem
8
![Page 12: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/12.jpg)
Forest Algorithms
Key Problem• How to efficiently incorporate non-local information?
8
![Page 13: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/13.jpg)
Forest Algorithms
Key Problem• How to efficiently incorporate non-local information?
• Solution 1: pipelined reranking / rescoring
• postpone disambiguation by propagating k-best lists
• examples: tagging => parsing => semantics
• (open) need efficient algorithms for k-best search
8
![Page 14: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/14.jpg)
Forest Algorithms
Key Problem• How to efficiently incorporate non-local information?
• Solution 1: pipelined reranking / rescoring
• postpone disambiguation by propagating k-best lists
• examples: tagging => parsing => semantics
• (open) need efficient algorithms for k-best search
• Solution 2: exact joint search on a much larger space
• examples: head/parent annotations; often intractable
8
![Page 15: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/15.jpg)
Forest Algorithms
Key Problem• How to efficiently incorporate non-local information?
• Solution 1: pipelined reranking / rescoring
• postpone disambiguation by propagating k-best lists
• examples: tagging => parsing => semantics
• (open) need efficient algorithms for k-best search
• Solution 2: exact joint search on a much larger space
• examples: head/parent annotations; often intractable
• Solution 3: approximate joint search (focus of this talk)
• (open) integrate non-local information on the fly 8
![Page 16: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/16.jpg)
Forest Algorithms
Outline• Forest: Packing Exponential Ambiguities
• Exact k-best Search in Forest (Solution 1)
• Approximate Joint Search withNon-Local Features (Solution 3)
• Forest Reranking
• Forest Rescoring
• Forest-based Translation (Solutions 2+3+1)
• Tree-based Translation
• Forest-based Decoding9
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
![Page 17: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/17.jpg)
Forest Algorithms
Packed Forests• a compact representation of many parses
• by sharing common sub-derivations
• polynomial-space encoding of exponentially large set
10(Klein and Manning, 2001; Huang and Chiang, 2005)
0 I 1 saw 2 him 3 with 4 a 5 mirror 6
nodes hyperedges
a hypergraph
![Page 18: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/18.jpg)
Forest Algorithms
Weight Functions
• Each hyperedge e has a weight function fe
• monotonic in each argument
• e.g. in CKY, fe(a, b) = a x b x Pr (rule)
• optimal subproblem property in dynamic programming
• optimal solutions include optimal sub-solutions
11
B: b C: c
A: f(b, c)
B: b’ ≤ b C: c
A: f(b’, c)≤ f(b, c)
![Page 19: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/19.jpg)
Forest Algorithms
1-best Viterbi on Forest1. topological sort (assumes acyclicity)
2. visit each node v in sorted order and do updates
• for each incoming hyperedge e = ((u1, .., u|e|), v, fe)
• use d(ui)’s to update d(v)
• key observation: d(ui)’s are fixed to optimal at this time
• time complexity: O(V+E) = O(E) for CKY: O(n3)12
vu1
u2
fe
u’
d(v) ! = fe(d(u1), · · · , d(u|e|))
fe’
![Page 20: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/20.jpg)
Forest Algorithms
Outline• Forest: Packing Exponential Ambiguities
• Exact k-best Search in Forest (Solution 1)
• Approximate Joint Search withNon-Local Features (Solution 3)
• Forest Reranking
• Forest Rescoring
• Forest-based Translation (Solutions 2+3)
• Tree-based Translation
• Forest-based Decoding13
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
![Page 21: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/21.jpg)
Forest Algorithms
k-best Viterbi Algorithm 0
• straightforward k-best extension
• a vector of k (sorted) values for each node
• now what’s the result of fe (a, b) ?
• k x k = k2 possibilities! => then choose top k
• time complexity: O(k2 E)14
.1a
b
vu1
u2
fea
b
![Page 22: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/22.jpg)
Forest Algorithms
k-best Viterbi Algorithm 1• key insight: do not need to enumerate all k2
• since vectors a and b are sorted
• and the weight function fe is monotonic
• (a1, b1) must be the best
• either (a2, b1) or (a1, b2) is the 2nd-best
• use a priority queue for the frontier
• extract best
• push two successors
• time complexity: O(k log k E)15
.1a
b
![Page 23: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/23.jpg)
Forest Algorithms
k-best Viterbi Algorithm 1• key insight: do not need to enumerate all k2
• since vectors a and b are sorted
• and the weight function fe is monotonic
• (a1, b1) must be the best
• either (a2, b1) or (a1, b2) is the 2nd-best
• use a priority queue for the frontier
• extract best
• push two successors
• time complexity: O(k log k E)16
.1a
b
![Page 24: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/24.jpg)
Forest Algorithms
k-best Viterbi Algorithm 1• key insight: do not need to enumerate all k2
• since vectors a and b are sorted
• and the weight function fe is monotonic
• (a1, b1) must be the best
• either (a2, b1) or (a1, b2) is the 2nd-best
• use a priority queue for the frontier
• extract best
• push two successors
• time complexity: O(k log k E)17
.1a
b
![Page 25: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/25.jpg)
Forest Algorithms
k-best Viterbi Algorithm 1• key insight: do not need to enumerate all k2
• since vectors a and b are sorted
• and the weight function fe is monotonic
• (a1, b1) must be the best
• either (a2, b1) or (a1, b2) is the 2nd-best
• use a priority queue for the frontier
• extract best
• push two successors
• time complexity: O(k log k E)18
.1a
b
![Page 26: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/26.jpg)
Forest Algorithms
k-best Viterbi Algorithm 2• Algorithm 1 works on each hyperedge sequentially
• O(k log k E) is still too slow for big k
• Algorithm 2 processes all hyperedges in parallel
• dramatic speed-up: O(E + V k log k)
19
VP1, 6
PP1, 3 VP3, 6 PP1, 4 VP4, 6 PP3, 6VP2, 3
hyperedge
NP1, 2
![Page 27: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/27.jpg)
Forest Algorithms
k-best Viterbi Algorithm 3
• Algorithm 2 computes k-best for each node
• but we are only interested in k-best of the root node
• Algorithm 3 computes as many as really needed
• forward-phase
• same as 1-best Viterbi, but stores the forest (keeping alternative hyperedges)
• backward-phase
• recursively asking “what’s your 2nd-best” top-down
• asks for more when need more
20
![Page 28: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/28.jpg)
Forest Algorithms
Summary of Algorithms
• Algorithms 1 => 2 => 3
• lazier and lazier (computation on demand)
• larger and larger locality
• Algorithm 3 is very fast, but requires storing forest
21
locality time space
Algorithm 1
Algorithm 2
Algorithm 3
hyperedge O( E k log k ) O(k V)node O( E + V k log k ) O(k V)global O( E + D k log k ) O(E + k D)
E - hyperedges: O(n3); V - nodes: O(n2); D - derivation: O(n)
![Page 29: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/29.jpg)
Forest Algorithms
Experiments - Efficiency
• on state-of-the-art Collins/Bikel parser (Bikel, 2004)
• average parsing time per sentence using Algs. 0, 1, 3
22
O( E + D k log k )
![Page 30: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/30.jpg)
Forest Algorithms
Reranking and Oracles• oracle - the candidate closest to the correct parse
among the k-best candidates
• measures the potential of real reranking
23
Collins 2000
our Algorithms
kOra
cle
Pars
eval
sco
re
(turns off DP)
![Page 31: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/31.jpg)
Forest Algorithms
Outline
• Packed Forests and Hypergraph Framework
• Exact k-best Search in Forest (Solution 1)
• Approximate Joint Search withNon-Local Features (Solution 3)
• Forest Reranking
• Forest Rescoring
• Application: Forest-based Translation
• Tree-based Translation
• Forest-based Decoding24
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
![Page 32: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/32.jpg)
Forest Algorithms
Why not k-best reranking?
• too few variations (limited scope)
• 41% correct parses are not in ~30-best (Collins, 2000)
• worse for longer sentences
• too many redundancies
• 50-best usually encodes 5-6 binary decisions (25<50<26)25
...
![Page 33: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/33.jpg)
Forest Algorithms
Redundancies in n-best lists
26
(TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (RB all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .)))
Not all those who wrote oppose the changes.
...
![Page 34: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/34.jpg)
Forest Algorithms
Redundancies in n-best lists
26
(TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (NP (NP (RB Not) (RB all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .)))
(TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .)))
Not all those who wrote oppose the changes.
packed forest
...
![Page 35: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/35.jpg)
Forest Algorithms
Reranking on a Forest?
• with only local features (Solution 2)
• dynamic programming, exact, tractable (Taskar et al. 2004; McDonald et al., 2005)
• with non-local features (Solution 3)
• on-the-fly reranking at internal nodes
• top k derivations at each node
• use as many non-local features as possible at each node
• chart parsing + discriminative reranking
• we use perceptron for simplicity
27
![Page 36: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/36.jpg)
Forest Algorithms
Features
• a feature f is a function from tree y to a real number
• f1(y)=log Pr(y) is the log Prob from generative parser
• every other feature counts the number of times a particular configuration occurs in y
28
instances of Rule feature
f 100 (y) = f S → NP VP . (y) = 1f 200 (y) = f NP → DT NN (y) = 2
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
our features are from (Charniak & Johnson, 2005)
(Collins, 2000)
![Page 37: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/37.jpg)
Forest Algorithms
Local vs. Non-Local Features
• a feature is local iff. it can be factored among local productions of a tree (i.e., hyperedges in a forest)
• local features can be pre-computed on each hyperedge in the forest; non-locals can not
29
Rule is local
ParentRule is non-local
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
![Page 38: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/38.jpg)
Forest Algorithms
Local vs. Non-Local: Examples
• CoLenPar feature captures the difference in lengths of adjacent conjuncts (Charniak and Johnson, 2005)
30CoLenPar: 2
local!
![Page 39: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/39.jpg)
Forest Algorithms
Local vs. Non-Local: Examples
• CoPar feature captures the depth to which adjacent conjuncts are isomorphic (Charniak and Johnson, 2005)
31CoPar: 4
non-local!
(violates DPprinciple)
![Page 40: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/40.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
32
unit instance of ParentRule feature at VP node
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 41: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/41.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
32
unit instance of ParentRule feature at VP node
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 42: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/42.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
32
unit instance of ParentRule feature at VP node
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 43: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/43.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
33
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
unit instance of ParentRule feature at S node
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 44: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/44.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
33
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
unit instance of ParentRule feature at S node
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 45: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/45.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
33
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
unit instance of ParentRule feature at S node
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 46: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/46.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
34
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
unit instance of ParentRule feature at TOP node
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 47: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/47.jpg)
Forest Algorithms
Factorizing non-local features• going bottom-up, at each node
• compute (partial values of) feature instances that become computable at this level
• postpone those uncomputable to ancestors
34
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
unit instance of ParentRule feature at TOP node
non-local features factor across nodes dynamically
local features factor across hyperedges statically
![Page 48: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/48.jpg)
Forest Algorithms
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
NGramTree (C&J 05)
• an NGramTree captures the smallest tree fragment that contains a bigram (two consecutive words)
• unit instances are boundary words between subtrees
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
unit instance of node A
35
![Page 49: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/49.jpg)
Forest Algorithms
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
NGramTree (C&J 05)
• an NGramTree captures the smallest tree fragment that contains a bigram (two consecutive words)
• unit instances are boundary words between subtrees
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
unit instance of node A
36
![Page 50: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/50.jpg)
Forest Algorithms
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
NGramTree (C&J 05)
• an NGramTree captures the smallest tree fragment that contains a bigram (two consecutive words)
• unit instances are boundary words between subtrees
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
unit instance of node A
36
![Page 51: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/51.jpg)
Forest Algorithms
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
NGramTree (C&J 05)
• an NGramTree captures the smallest tree fragment that contains a bigram (two consecutive words)
• unit instances are boundary words between subtrees
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
unit instance of node A
37
![Page 52: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/52.jpg)
Forest Algorithms
TOP
S
NP
PRP
I
VP
VBD
saw
NP
DT
the
NN
boy
PP
IN
with
NP
DT
a
NN
telescope
.
.
NGramTree (C&J 05)
• an NGramTree captures the smallest tree fragment that contains a bigram (two consecutive words)
• unit instances are boundary words between subtrees
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
unit instance of node A
37
![Page 53: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/53.jpg)
Forest Algorithms
Approximate Decoding
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
38
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.0 + 0.5 4.0 + 5.0 9.0 + 0.5
2.1 + 0.3 4.1 + 5.4 9.1 + 0.3
4.5 + 0.6 6.5 +10.5 11.5 + 0.6
![Page 54: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/54.jpg)
Forest Algorithms
Approximate Decoding
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
38
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.0 + 0.5 4.0 + 5.0 9.0 + 0.5
2.1 + 0.3 4.1 + 5.4 9.1 + 0.3
4.5 + 0.6 6.5 +10.5 11.5 + 0.6
![Page 55: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/55.jpg)
Forest Algorithms
Approximate Decoding
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
39
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.5 9.0 9.5
2.4 9.5 9.4
5.1 17.0 12.1
![Page 56: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/56.jpg)
Forest Algorithms
Algorithm 2 => Cube Pruning
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
40
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.5 9.0 9.5
2.4 9.5 9.4
5.1 17.0 12.1
![Page 57: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/57.jpg)
Forest Algorithms
Algorithm 2 => Cube Pruning
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
41
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.5 9.0 9.5
2.4 9.5 9.4
5.1 17.0 12.1
![Page 58: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/58.jpg)
Forest Algorithms
Algorithm 2 => Cube Pruning
• bottom-up, keeps top k derivations at each node
• non-monotonic grid due to non-local features
42
Ai,k
Bi,j
wi . . . wj!1
Cj,k
wj . . . wk!1
w・fN( ) = 0.5
1.0 3.0 8.0
1.0
1.1
3.5
2.5 9.0 9.5
2.4 9.5 9.4
5.1 17.0 12.1
![Page 59: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/59.jpg)
Forest Algorithms
Algorithm 2 => Cube Pruning
43
VP
PP1, 3 VP3, 6 PP1, 4 VP4, 6 PP3, 6VP2, 3
hyperedge
NP1, 2
• process all hyperedges simultaneously!significant savings of computation
there are search errors, but the trade-off is favorable.
![Page 60: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/60.jpg)
Forest Algorithms
Forest vs. k-best Oracles• on top of Charniak parser (modified to dump forest)
• forests enjoy higher oracle scores than k-best lists
• with much smaller sizes
44
96.7
97.8
97.2
98.6
![Page 61: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/61.jpg)
Forest Algorithms
Forest vs. k-best Oracles• on top of Charniak parser (modified to dump forest)
• forests enjoy higher oracle scores than k-best lists
• with much smaller sizes
44
96.7
97.8
97.2
98.6
![Page 62: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/62.jpg)
Forest Algorithms
Main Results
baseline: 1-best Charniak parser 89.72 feature extract
approach training time F1% space time
50-best reranking 4 x 0.3h 91.43 2.4G 19h
100-best reranking 4 x 0.7h 91.49 5.3G 44h
forest reranking 4 x 6.1h 91.69 1.2G 2.9h
• forest reranking beats 50-best & 100-best reranking
• can be trained on the whole treebank in ~1 day even with a pure Python implementation!
• most previous work only scaled to short sentences (<=15 words) and local features
45
![Page 63: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/63.jpg)
Forest Algorithms
Main Results
baseline: 1-best Charniak parser 89.72 feature extract
approach training time F1% space time
50-best reranking 4 x 0.3h 91.43 2.4G 19h
100-best reranking 4 x 0.7h 91.49 5.3G 44h
forest reranking 4 x 6.1h 91.69 1.2G 2.9h
• forest reranking beats 50-best & 100-best reranking
• can be trained on the whole treebank in ~1 day even with a pure Python implementation!
• most previous work only scaled to short sentences (<=15 words) and local features
45
![Page 64: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/64.jpg)
Forest Algorithms
Comparison with Others
46
type system F1%
D
G
S
Collins (2000) 89.7
Charniak and Johnson (2005) 91.0
updated (2006) 91.4
Petrov and Klein (2008) 88.3
this work 91.7
Carreras et al. (2008) 91.1
Bod (2000) 90.7
Petrov and Klein (2007) 90.1
McClosky et al. (2006) 92.1
best accuracy to date on the Penn Treebank, and fast training
n-bestreranking
dynamicprogramming
semi-supervised
![Page 65: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/65.jpg)
on to Machine Translation...
applying the same ideas of non-locality...
![Page 66: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/66.jpg)
Forest Algorithms
Translate Server Error
48
![Page 67: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/67.jpg)
Forest Algorithms
Translate Server Error
48clear evidence that MT is used in real life.
![Page 68: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/68.jpg)
Forest Algorithms
Context in Translation
49
![Page 69: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/69.jpg)
Forest Algorithms
Context in Translation
49
Algorithm 2 => cube pruning
fluency problem (n-gram)
![Page 70: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/70.jpg)
Forest Algorithms
Context in Translation
49
xiaoxin
小心 X <=> be careful not to X
syntax problem (SCFG)
Algorithm 2 => cube pruning
fluency problem (n-gram)
![Page 71: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/71.jpg)
Forest Algorithms
Context in Translation
49
xiaoxin
小心 X <=> be careful not to X
syntax problem (SCFG)
Algorithm 2 => cube pruning
fluency problem (n-gram)
xiaoxin gou
小心 狗 <=> be aware of dog
![Page 72: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/72.jpg)
Forest Algorithms
Context in Translation
49
xiaoxin
小心 X <=> be careful not to X
syntax problem (SCFG)
Algorithm 2 => cube pruning
fluency problem (n-gram)
小心 VP <=> be careful not to VP
小心 NP <=> be careful of NP xiaoxin gou
小心 狗 <=> be aware of dog
![Page 73: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/73.jpg)
Forest Algorithms
How do people translate?1. understand the source language sentence
2. generate the target language translation
50
Bush holdand/with
meetingSharon [past.]
布什 举行与 会谈沙龙 了
Bùshí juxíngyu huìtánShalóng le
![Page 74: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/74.jpg)
Forest Algorithms
How do people translate?1. understand the source language sentence
2. generate the target language translation
50
Bush holdand/with
meetingSharon [past.]
布什 举行与 会谈沙龙 了
Bùshí juxíngyu huìtánShalóng le
![Page 75: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/75.jpg)
Forest Algorithms
How do people translate?1. understand the source language sentence
2. generate the target language translation
50
Bush holdand/with
meetingSharon [past.]
“Bush held a meeting with Sharon”
布什 举行与 会谈沙龙 了
Bùshí juxíngyu huìtánShalóng le
![Page 76: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/76.jpg)
Forest Algorithms
How do compilers translate?1. parse high-level language program into a syntax tree
2. generate intermediate or machine code accordingly
51
x3 = y + 3;
![Page 77: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/77.jpg)
Forest Algorithms
How do compilers translate?1. parse high-level language program into a syntax tree
2. generate intermediate or machine code accordingly
51
x3 = y + 3;
![Page 78: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/78.jpg)
Forest Algorithms
How do compilers translate?1. parse high-level language program into a syntax tree
2. generate intermediate or machine code accordingly
51
x3 = y + 3;
LD R1, id2ADDF R1, R1, #3.0 // add floatRTOI R2, R1 // real to intST id1, R2
![Page 79: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/79.jpg)
Forest Algorithms
How do compilers translate?1. parse high-level language program into a syntax tree
2. generate intermediate or machine code accordingly
51
x3 = y + 3;
LD R1, id2ADDF R1, R1, #3.0 // add floatRTOI R2, R1 // real to intST id1, R2
syntax-directed translation (~1960)
![Page 80: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/80.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• get 1-best parse tree; then convert to English
52
Bush holdand/with
meetingSharon [past.]
“Bush held a meeting with Sharon”
![Page 81: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/81.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
53
• recursive rewrite by pattern-matching
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 82: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/82.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
53
• recursive rewrite by pattern-matching
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 83: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/83.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
53
• recursive rewrite by pattern-matching
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 84: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/84.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
53
• recursive rewrite by pattern-matching
with
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 85: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/85.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• recursively solve unfinished subproblems
54
with
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 86: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/86.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• recursively solve unfinished subproblems
54
with
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 87: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/87.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• recursively solve unfinished subproblems
54
withBush
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 88: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/88.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• recursively solve unfinished subproblems
54
held
withBush
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 89: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/89.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• continue pattern-matching
55
Bush held with
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 90: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/90.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• continue pattern-matching
55
Bush held with
a meeting Sharon
(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)
![Page 91: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/91.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• continue pattern-matching
56
Bush held witha meeting Sharon
![Page 92: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/92.jpg)
Forest Algorithms
Syntax-Directed Machine Translation
• continue pattern-matching
56
this method is simple, fast, and expressive.
but... crucial difference between PL and NL:
ambiguity!
using 1-best parse causes error propagation!
idea: use k-best parses?
use a parse forest!
Bush held witha meeting Sharon
![Page 93: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/93.jpg)
Forest Algorithms
Forest-based Translation
57“and” / “with”
![Page 94: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/94.jpg)
Forest Algorithms
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 95: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/95.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 96: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/96.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 97: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/97.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 98: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/98.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 99: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/99.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
![Page 100: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/100.jpg)
Forest Algorithms
“and”
Forest-based Translation
58
pattern-matching on forest
“and” / “with”
directed by underspecified syntax
![Page 101: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/101.jpg)
Forest Algorithms
Translation Forest
59
![Page 102: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/102.jpg)
Forest Algorithms
Translation Forest
59
![Page 103: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/103.jpg)
Forest Algorithms
Translation Forest
59
“held a meeting”
“Sharon”“Bush”
![Page 104: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/104.jpg)
Forest Algorithms
Translation Forest
59
“held a meeting”
“Sharon”“Bush”
“Bush held a meeting with Sharon”
![Page 105: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/105.jpg)
Forest Algorithms
The Whole Pipeline
60
parse forest
translation forest
translation+LM forest
parser
pattern-matching w/translation rules (exact)
Algorithm 2 => cube pruning (approx.)
Algorithm 3 (exact)best derivation
(Huang and Chiang, 2005; 2007; Chiang, 2007)
input sentence
1-best translation k-best translations
![Page 106: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/106.jpg)
Forest Algorithms
The Whole Pipeline
60
parse forest
translation forest
translation+LM forest
parser
pattern-matching w/translation rules (exact)
Algorithm 2 => cube pruning (approx.)
Algorithm 3 (exact)best derivation
pack
ed fo
rest
s
(Huang and Chiang, 2005; 2007; Chiang, 2007)
input sentence
1-best translation k-best translations
![Page 107: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/107.jpg)
Forest Algorithms
k-best trees vs. forest-based
61
1.7 Bleu improvement over 1-best, 0.8 over 30-best, and even faster!
![Page 108: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/108.jpg)
Forest Algorithms
forest as virtual ∞-best list• how often is the ith-best tree picked by the decoder?
62
32%
bey
ond
100-
best
20%
bey
ond
1000
-bes
t
1000
suggested byMark Johnson
![Page 109: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/109.jpg)
Forest Algorithms
Larger Decoding Experiments• 2.2M sentence pairs (57M Chinese and 62M English words)
• larger trigram models (1/3 of Xinhua Gigaword)
• also use bilingual phrases (BP) as flat translation rules
• phrases that are consistent with syntactic constituents
• forest enables larger improvement with BP
63
T2S T2S+BP1-best tree
30-best treesforest
improvement
0.2666 0.2939
0.2755 0.3084
0.2839 0.3149
1.7 2.1
![Page 110: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/110.jpg)
Forest Algorithms
Conclusions: Dynamic Programming
• A general framework of DP on monotonic hypergraphs
• Exact k-best DP algorithms (monotonic)
• Approximate DP with non-local features (non-monotonic)
• Forest Reranking for discriminative parsing
• Forest Rescoring for MT decoding
• Forest-based Translation
• translates a parse forest of millions of trees
• even faster than translating top-30 trees (and better)
• Future Directions: even faster search with richer info...64
![Page 111: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/111.jpg)
Forest is your friend. Save the forest.
Thank you!
![Page 112: Forest-based Algorithms in - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/slides/thesis-CMU.pdfForest-based Algorithms in Natural Language Processing includes joint work](https://reader033.fdocuments.in/reader033/viewer/2022042105/5e83350aab6175700857953b/html5/thumbnails/112.jpg)
Forest Algorithms
Global Feature - RightBranch
66
• length of rightmost (non-punctuation) path
• English has a right-branching tendency
(Charniak and Johnson, 2005)
can not be factored anywherehave to wait till root
(punctuation or not is ambiguous:’: possessive or right quote?)