Recent Advances in Dependency Parsing
Tutorial, EACL, April 27th, 2014
Ryan McDonald (1), Joakim Nivre (2)
(1) Google Inc., USA/UK. E-mail: [email protected]
(2) Uppsala University, Sweden. E-mail: [email protected]
Introduction
Overview of the Tutorial
- Introduction to Dependency Parsing (Joakim)
- Graph-based parsing post-2008 (Ryan)
- Transition-based parsing post-2008 (Joakim)
- Summary and final thoughts (Ryan)
Introduction
Transition-Based Dependency Parsing
Configuration: (S, B, A)
Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)
Shift: (S, i|B, A) ⇒ (S|i, B, A)
Reduce: (S|i, B, A) ⇒ (S, B, A)
Right-Arc(k): (S|i, j|B, A) ⇒ (S|i|j, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i, j|B, A) ⇒ (S, j|B, A ∪ {(j, i, k)})
⇔
[Figure: dependency tree for "Economic news had little effect on financial markets ." with POS tags adj noun verb adj noun prep adj noun . and arc labels amod, nsubj, dobj, amod, prep, pmod, amod, p]
Introduction
Overview
- Improved learning and inference
  - Beam search and structured prediction
  - Dynamic programming
  - Easy-first parsing
  - Dynamic oracles
- Non-projective parsing
  - Online reordering
  - Multiplanar parsing
- Joint morphological and syntactic analysis
Transition-Based Dependency Parsing
The Basic Idea
- Define a transition system for dependency parsing
- Learn a model for scoring possible transitions
- Parse by searching for the optimal transition sequence
Transition-Based Dependency Parsing
Arc-Eager Transition System [Nivre 2003]
Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]
Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)
Shift: (S, i|B, A) ⇒ (S|i, B, A)
Reduce: (S|i, B, A) ⇒ (S, B, A) [precondition: h(i, A)]
Right-Arc(k): (S|i, j|B, A) ⇒ (S|i|j, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i, j|B, A) ⇒ (S, j|B, A ∪ {(j, i, k)}) [precondition: ¬h(i, A) ∧ i ≠ 0]
Notation: S|i = stack with top i and remainder S
j|B = buffer with head j and remainder B
h(i, A) = i has a head in A
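The four arc-eager transitions can be written as pure functions over a configuration triple. The following is a minimal sketch, not the tutorial's implementation: the function names, the tuple layout and the use of assertions for the preconditions are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Arc = Tuple[int, int, str]                             # (head, dependent, label)
Config = Tuple[List[int], List[int], FrozenSet[Arc]]   # (stack S, buffer B, arcs A)


def has_head(i: int, arcs: FrozenSet[Arc]) -> bool:
    """h(i, A): token i already has a head in A."""
    return any(dep == i for _, dep, _ in arcs)


def initial(n: int) -> Config:
    """([ ], [0, 1, ..., n], { })"""
    return ([], list(range(n + 1)), frozenset())


def is_terminal(c: Config) -> bool:
    """Terminal configurations have an empty buffer."""
    return not c[1]


def shift(c: Config) -> Config:
    stack, buf, arcs = c
    return (stack + [buf[0]], buf[1:], arcs)


def reduce_(c: Config) -> Config:
    stack, buf, arcs = c
    assert has_head(stack[-1], arcs), "Reduce requires h(i, A)"
    return (stack[:-1], buf, arcs)


def right_arc(c: Config, k: str) -> Config:
    stack, buf, arcs = c
    i, j = stack[-1], buf[0]
    return (stack + [j], buf[1:], arcs | {(i, j, k)})


def left_arc(c: Config, k: str) -> Config:
    stack, buf, arcs = c
    i, j = stack[-1], buf[0]
    assert i != 0 and not has_head(i, arcs), "Left-Arc requires not h(i, A) and i != 0"
    return (stack[:-1], buf, arcs | {(j, i, k)})
```

Configurations are treated as immutable values here, so every transition returns a new triple. That is convenient for beam search, at the cost of copying the stack and buffer on each step.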
Transition-Based Dependency Parsing
Example Transition Sequence
Sentence: ROOT Economic news had little effect on financial markets .
POS tags: adj noun verb adj noun prep adj noun .

Step  Transition        Stack S                              Buffer B
0     (initial)         [ROOT]                               [Economic, news, had, little, effect, on, financial, markets, .]
1     Shift             [ROOT, Economic]                     [news, had, little, effect, on, financial, markets, .]
2     Left-Arc(amod)    [ROOT]                               [news, had, little, effect, on, financial, markets, .]
3     Shift             [ROOT, news]                         [had, little, effect, on, financial, markets, .]
4     Left-Arc(nsubj)   [ROOT]                               [had, little, effect, on, financial, markets, .]
5     Right-Arc(root)   [ROOT, had]                          [little, effect, on, financial, markets, .]
6     Shift             [ROOT, had, little]                  [effect, on, financial, markets, .]
7     Left-Arc(amod)    [ROOT, had]                          [effect, on, financial, markets, .]
8     Right-Arc(dobj)   [ROOT, had, effect]                  [on, financial, markets, .]
9     Right-Arc(prep)   [ROOT, had, effect, on]              [financial, markets, .]
10    Shift             [ROOT, had, effect, on, financial]   [markets, .]
11    Left-Arc(amod)    [ROOT, had, effect, on]              [markets, .]
12    Right-Arc(pmod)   [ROOT, had, effect, on, markets]     [.]
13    Reduce            [ROOT, had, effect, on]              [.]
14    Reduce            [ROOT, had, effect]                  [.]
15    Reduce            [ROOT, had]                          [.]
16    Right-Arc(p)      [ROOT, had, .]                       [ ]
Transition-Based Dependency Parsing
Arc-Standard Transition System [Nivre 2004]
Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]
Initial: ([ ], [0, 1, ..., n], { })
Terminal: ([0], [ ], A)
Shift: (S, i|B, A) ⇒ (S|i, B, A)
Right-Arc(k): (S|i|j, B, A) ⇒ (S|i, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i|j, B, A) ⇒ (S|j, B, A ∪ {(j, i, k)}) [precondition: i ≠ 0]
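For comparison with arc-eager, the arc-standard transitions operate on the two topmost stack items rather than on the stack top and the buffer front. A minimal sketch under the same assumptions as before (illustrative function names, configurations as immutable triples):

```python
def initial(n):
    """([ ], [0, 1, ..., n], { })"""
    return ([], list(range(n + 1)), frozenset())


def shift(c):
    stack, buf, arcs = c
    return (stack + [buf[0]], buf[1:], arcs)


def right_arc(c, k):
    """(S|i|j, B, A) => (S|i, B, A + {(i, j, k)}): the top j becomes a dependent of i."""
    stack, buf, arcs = c
    i, j = stack[-2], stack[-1]
    return (stack[:-1], buf, arcs | {(i, j, k)})


def left_arc(c, k):
    """(S|i|j, B, A) => (S|j, B, A + {(j, i, k)}): i becomes a dependent of the top j."""
    stack, buf, arcs = c
    i, j = stack[-2], stack[-1]
    assert i != 0, "Left-Arc requires i != 0"
    return (stack[:-2] + [j], buf, arcs | {(j, i, k)})
```

For a two-word sentence such as "He sleeps" (a made-up example, with tokens 0 = ROOT, 1 = He, 2 = sleeps), three Shifts followed by Left-Arc and Right-Arc reach the terminal configuration ([0], [ ], A).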
Transition-Based Dependency Parsing
Greedy Inference
- Given an oracle o that correctly predicts the next transition o(c), parsing is deterministic:

  Parse(w_1, ..., w_n)
    c ← ([ ]_S, [0, 1, ..., n]_B, { })
    while B_c ≠ [ ]
      t ← o(c)
      c ← t(c)
    return G = ({0, 1, ..., n}, A_c)

- Complexity given by upper bound on number of transitions
- Parsing in O(n) time for the arc-eager transition system
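The Parse loop above translates almost line by line into code. In this sketch the oracle is any callable from configurations to transitions; `apply_transition` and `scripted_oracle` are hypothetical helpers introduced only so the example runs on its own, and preconditions are not checked.

```python
def apply_transition(c, t):
    """Apply an arc-eager transition t = (name, label) to configuration c."""
    (stack, buf, arcs), (name, label) = c, t
    if name == "SH":
        return (stack + [buf[0]], buf[1:], arcs)
    if name == "RE":
        return (stack[:-1], buf, arcs)
    if name == "RA":
        return (stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0], label)})
    if name == "LA":
        return (stack[:-1], buf, arcs | {(buf[0], stack[-1], label)})
    raise ValueError(f"unknown transition {name!r}")


def greedy_parse(n, oracle):
    """Parse(w_1, ..., w_n): follow the oracle until the buffer is empty."""
    c = ([], list(range(n + 1)), frozenset())
    while c[1]:                      # while B_c != [ ]
        t = oracle(c)                # t <- o(c)
        c = apply_transition(c, t)   # c <- t(c)
    return set(range(n + 1)), set(c[2])


def scripted_oracle(transitions):
    """Stand-in oracle that replays a fixed transition sequence."""
    it = iter(transitions)
    return lambda c: next(it)
```

Each iteration does a constant amount of work apart from list copying, so the number of transitions bounds the running time, as the slide states.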
Transition-Based Dependency Parsing
From Oracles to Classifiers
- An oracle can be approximated by a (linear) classifier:

  o(c) = argmax_t w · f(c, t)

- History-based feature representation f(c, t)
- Weight vector w learned from treebank data
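With sparse binary features, the argmax reduces to summing the weights of the active features for each candidate transition. A small sketch, assuming features are strings and w is a plain dictionary; `toy_f` is a made-up feature function, not one from the tutorial.

```python
def dot(w, feats):
    """w . f(c, t) for a sparse binary feature vector given as a list of feature names."""
    return sum(w.get(name, 0.0) for name in feats)


def predict(w, c, legal, f):
    """o(c) = argmax_t w . f(c, t), restricted to the transitions in `legal`."""
    return max(legal, key=lambda t: dot(w, f(c, t)))


def toy_f(c, t):
    """Hypothetical feature function: conjoin one property of c with the transition."""
    return [f"s0={c}&t={t}"]
```

Ties are broken in favour of the transition listed first in `legal`, because Python's `max` returns the first maximal element.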
Transition-Based Dependency Parsing
Feature Representation
- Features over input tokens relative to S and B
- Features over the (partial) dependency graph defined by A
- Features over the (partial) transition sequence

Configuration Features
[ROOT, had, effect]_S [on, financial, markets, .]_B
[Figure: partial dependency tree over "ROOT Economic news had little effect on financial markets ." (POS tags: ROOT adj noun verb adj noun prep adj noun .) with arcs amod, nsubj, dobj, amod, root]

Part-of-speech features:
  pos(S2) = ROOT, pos(S1) = verb, pos(S0) = noun, pos(B0) = prep, pos(B1) = adj, pos(B2) = noun
Word features:
  word(S2) = ROOT, word(S1) = had, word(S0) = effect, word(B0) = on, word(B1) = financial, word(B2) = markets
Dependency features:
  dep(S1) = root, dep(lc(S1)) = nsubj, dep(rc(S1)) = dobj, dep(S0) = dobj, dep(lc(S0)) = amod, dep(rc(S0)) = NIL
Transition history features:
  t_{i−1} = Right-Arc(dobj), t_{i−2} = Left-Arc(amod), t_{i−3} = Shift, t_{i−4} = Right-Arc(root), t_{i−5} = Left-Arc(nsubj), t_{i−6} = Shift

- Feature representation unconstrained by parsing algorithm
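The configuration features listed on this slide can be read off a configuration with a few lookups. The sketch below assumes that lc and rc denote the leftmost left dependent and the rightmost right dependent, which is consistent with dep(rc(S0)) = NIL in the example; the function names and the dictionary output are illustrative choices.

```python
WORDS = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
POS = ["ROOT", "adj", "noun", "verb", "adj", "noun", "prep", "adj", "noun", "."]


def dep(i, arcs):
    """Label of the arc entering i, or NIL if i is None or has no head yet."""
    for _, d, label in arcs:
        if d == i:
            return label
    return "NIL"


def lc(i, arcs):
    """Leftmost left dependent of i, or None (assumed reading of lc)."""
    left = [d for h, d, _ in arcs if h == i and d < i]
    return min(left) if left else None


def rc(i, arcs):
    """Rightmost right dependent of i, or None (assumed reading of rc)."""
    right = [d for h, d, _ in arcs if h == i and d > i]
    return max(right) if right else None


def extract_features(stack, buf, arcs):
    feats = {}
    for k in range(3):
        if k < len(stack):                 # S0 is the top of the stack
            i = stack[-1 - k]
            feats[f"pos(S{k})"] = POS[i]
            feats[f"word(S{k})"] = WORDS[i]
        if k < len(buf):                   # B0 is the front of the buffer
            j = buf[k]
            feats[f"pos(B{k})"] = POS[j]
            feats[f"word(B{k})"] = WORDS[j]
    for k in range(min(2, len(stack))):
        i = stack[-1 - k]
        feats[f"dep(S{k})"] = dep(i, arcs)
        feats[f"dep(lc(S{k}))"] = dep(lc(i, arcs), arcs)
        feats[f"dep(rc(S{k}))"] = dep(rc(i, arcs), arcs)
    return feats
```

Because the features are computed from the whole configuration, nothing in the parsing algorithm restricts which of them may be used, which is the point of the last bullet.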
Transition-Based Dependency Parsing
Local Learning
- Given a treebank:
  - Reconstruct oracle transition sequence for each sentence
  - Construct training data set D = {(c, t) | o(c) = t}
  - Maximize accuracy of local predictions o(c) = t
- Any (unstructured) classifier will do (SVMs are popular)
- Training is local and restricted to oracle configurations
Transition-Based Dependency Parsing
Greedy, Local, Transition-Based Parsing
- Advantages:
  - Highly efficient parsing: linear time complexity with constant-time oracles and transitions
  - Rich history-based feature representations: no rigid constraints from inference algorithm
- Drawback:
  - Sensitive to search errors and error propagation due to greedy inference and local learning
- The major question in transition-based parsing has been how to improve learning and inference, while maintaining high efficiency and rich feature models
Improved Learning and Inference
Beam Search
- Maintain the k best hypotheses [Johansson and Nugues 2006]:

  Parse(w_1, ..., w_n)
    Beam ← {([ ]_S, [0, 1, ..., n]_B, { })}
    while ∃c ∈ Beam [B_c ≠ [ ]]
      foreach c ∈ Beam
        foreach t
          Add(t(c), NewBeam)
      Beam ← Top(k, NewBeam)
    return G = ({0, 1, ..., n}, A_Top(1, Beam))

- Note:
  - Score(c_0, ..., c_m) = ∑_{i=1}^{m} w · f(c_{i−1}, t_i)
  - Simple combination of locally normalized classifier scores
  - Marginal gains in accuracy
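The beam search loop can be sketched independently of any particular transition system. In the version below the transition system and the scoring function are passed in as callables; one assumption beyond the pseudocode is that terminal hypotheses are carried over unchanged so that sequences of different lengths can compete. The toy search space at the end is made up to show a case where greedy search (k = 1) misses the best sequence.

```python
def beam_search(init, is_terminal, legal, apply, score, k):
    """Keep the k best partial transition sequences, ranked by summed transition scores."""
    beam = [(0.0, init)]
    while any(not is_terminal(c) for _, c in beam):
        new_beam = []
        for total, c in beam:
            if is_terminal(c):
                new_beam.append((total, c))   # assumption: finished hypotheses stay in the beam
                continue
            for t in legal(c):
                new_beam.append((total + score(c, t), apply(c, t)))
        # Top(k, NewBeam): stable sort, so earlier candidates win ties
        beam = sorted(new_beam, key=lambda item: item[0], reverse=True)[:k]
    return beam[0]


# Hypothetical toy search space: a configuration is the tuple of choices made so far.
TOY_SCORES = {((), 0): 1.0, ((), 1): 0.5, ((1,), 1): 5.0}


def toy_score(c, t):
    return TOY_SCORES.get((c, t), 0.0)


def toy_search(k):
    return beam_search(
        init=(),
        is_terminal=lambda c: len(c) == 2,
        legal=lambda c: [0, 1],
        apply=lambda c, t: c + (t,),
        score=toy_score,
        k=k,
    )
```

With k = 1 the search commits to the locally best first choice and ends with score 1.0, while k = 2 keeps the second-best first choice alive and finds the sequence with score 5.5.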
Improved Learning and Inference
Structured Prediction
- Parsing as structured prediction [Zhang and Clark 2008]:
  - Minimize loss over entire transition sequence
  - Use beam search to find highest-scoring sequence
- Factored feature representations:

  f(c_0, ..., c_m) = ∑_{i=1}^{m} f(c_{i−1}, t_i)

- Online learning from oracle transition sequences:
  - Structured perceptron [Collins 2002]
  - Early update [Collins and Roark 2004]
  - Max-violation update [Huang et al. 2012]
Improved Learning and Inference
Beam Size and Training Iterations
[Zhang and Clark 2008]
Improved Learning and Inference
The Best of Two Worlds?
- Like graph-based dependency parsing (MSTParser):
  - Global learning: minimize loss over entire sentence
  - Non-greedy search: accuracy increases with beam size
- Like (old school) transition-based parsing (MaltParser):
  - Highly efficient: complexity still linear for fixed beam size
  - Rich features: no constraints from parsing algorithm
Improved Learning and Inference
Precision by Dependency Length
[Figure: precision by dependency length for MST, Malt and ZPar; x-axis: dependency length (2 to 14), y-axis: precision (0.4 to 0.9)]
[Zhang and Nivre 2012]
Improved Learning and Inference
Even Richer Feature Models
               ZPar     Malt
Baseline       92.18    89.37
+distance      +0.07    −0.14
+valency       +0.24     0.00
+unigrams      +0.40    −0.29
+third-order   +0.18     0.00
+label set     +0.07    +0.06
Extended       93.14    89.00

[Zhang and Nivre 2011, Zhang and Nivre 2012]
- Adding graph-based features may require special techniques [Zhang and Clark 2008, Bohnet and Kuhn 2012]
Improved Learning and Inference
Dynamic Programming
- If beam search reduces search errors, why not exact inference?
- Dynamic programming for transition-based parsers:
  - Using a graph-structured stack [Huang and Sagae 2010]
  - Using push-computations [Kuhlmann et al. 2011]
- Adds constraints on feature representations
Improved Learning and Inference
Deduction System for Arc-Eager Parsing
Items: [i^b, j] ⇔ (S, i|B, A) ⇒* (S|i, j|B′, A′)
       b = 1 if i has a head in A′, 0 otherwise
Goal: [0^0, n + 1]
Axiom: [0^0, 1]
Rules:
  Shift:     [i^b, j] ⇒ [j^0, j + 1]
  Reduce:    [i^b, m] ∧ [m^1, j] ⇒ [i^b, j]
  Right-Arc: [i^b, j] ⇒ [j^1, j + 1]
  Left-Arc:  [i^b, m] ∧ [m^0, j] ⇒ [i^b, j]
[Kuhlmann et al. 2011]
Improved Learning and Inference
Theory and Practice
- Theoretical results:
  - Arc-eager parsing in O(n^3) time (cf. Eisner)
  - Arc-standard parsing in O(n^5) time (cf. CKY)
- In practice:
  - Results hold only for very simplistic feature models
  - Practical implementations use beam search
  - Benefits from ambiguity packing

[Figure 5, four plots comparing DP and non-DP: (a) search quality (avg. model score) vs. time, full model; (b) parsing accuracy (dependency accuracy) vs. time, full model; (c) search quality vs. time, edge-factored model; (d) correlation between parsing accuracy (y) and search quality (x)]
Figure 5: Speed comparisons between DP and non-DP, with beam size b ranging from 2 to 16 for DP and from 2 to 64 for non-DP. Speed is measured by avg. parsing time (secs) per sentence on the x axis. With the same level of search quality or parsing accuracy, DP (at b=16) is about 4.8 times faster than non-DP (at b=64) with the full model in plots (a)-(b), or about 8 times faster with the simplified edge-factored model in plot (c). Plot (d) shows the (roughly linear) correlation between parsing accuracy and search quality (avg. model score).

[Figure 6, two plots: (a) sizes of search spaces (number of trees explored vs. sentence length, DP forest vs. non-DP with b = 16); (b) oracle precision on dev as a function of k (DP forest (98.15), DP k-best in forest, non-DP k-best in beam)]
Figure 6: DP searches over a forest of exponentially many trees, which also produces better and longer k-best lists with higher oracles, while non-DP only explores b trees allowed in the beam (b = 16 here).

[Huang and Sagae 2010]
Improved Learning and Inference
The Need for Speed
- Beam search helps but slows down the parser
- Dynamic programming in addition constrains feature model
- What can we do to maintain the highest speed?
  - Easy-first parsing: give up left-to-right incremental search
  - Dynamic oracles: learn how to recover from errors
- These two ideas can be combined
Improved Learning and Inference
Easy-First Non-Directional Parsing
- Process dependencies from easy to hard (not left to right) and from local to global (bottom up) [Goldberg and Elhadad 2010]

Configuration: (L, A) [L = List, A = Arcs]
Initial: ([0, 1, ..., n], { })
Terminal: ([0], A)
Attach-Right(i, k): ([v_1, ..., v_m], A) ⇒ ([v_1, ..., v_{i−1}, v_{i+1}, ..., v_m], A ∪ {(v_{i+1}, v_i, k)})
Attach-Left(i, k): ([v_1, ..., v_m], A) ⇒ ([v_1, ..., v_i, v_{i+2}, ..., v_m], A ∪ {(v_i, v_{i+1}, k)})
Improved Learning and Inference
Parsing Algorithm
- Given an oracle o that selects the highest-confidence transition o(c), parsing is deterministic:

  Parse(w_1, ..., w_n)
    c ← ([0, 1, ..., n], { })
    while length(L_c) > 1
      t ← o(c)
      c ← t(c)
    return G = ({0, 1, ..., n}, A_c)

- Number of possible transitions grows with sentence length
- Parsing in O(n log n) time with priority heap
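The easy-first loop can be sketched as follows. This is a simplified illustration, not the algorithm of Goldberg and Elhadad as implemented: it rescans all candidate attachments at every step instead of using a priority heap (so it is not O(n log n)), it omits arc labels, it uses 0-based list positions, and it assumes that the root node 0 may never be attached as a dependent. The scoring function `gold_score` is a hypothetical stand-in for a learned model.

```python
def attach_right(L, A, i):
    """Attach-Right(i): L[i] becomes a dependent of its right neighbour L[i+1]."""
    return L[:i] + L[i + 1:], A | {(L[i + 1], L[i])}


def attach_left(L, A, i):
    """Attach-Left(i): L[i+1] becomes a dependent of its left neighbour L[i]."""
    return L[:i + 1] + L[i + 2:], A | {(L[i], L[i + 1])}


ACTIONS = {"AR": attach_right, "AL": attach_left}


def easy_first_parse(n, score):
    L, A = list(range(n + 1)), frozenset()
    while len(L) > 1:
        candidates = [
            (name, i)
            for i in range(len(L) - 1)
            for name in ACTIONS
            if not (name == "AR" and L[i] == 0)   # assumption: node 0 never gets a head
        ]
        # highest-confidence transition, wherever it is in the sentence
        name, i = max(candidates, key=lambda a: score(L, A, a[0], a[1]))
        L, A = ACTIONS[name](L, A, i)
    return A


# "a brown fox jumped with joy", tokens 1..6, node 0 = ROOT (dependent -> head)
GOLD_HEAD = {1: 3, 2: 3, 3: 4, 4: 0, 5: 4, 6: 5}


def gold_score(L, A, name, i):
    """Toy scorer: prefer gold attachments whose dependent has collected all its children."""
    head, dep = (L[i + 1], L[i]) if name == "AR" else (L[i], L[i + 1])
    attached = {d for _, d in A}
    children_done = all(d in attached for d, h in GOLD_HEAD.items() if h == dep)
    return 1.0 if GOLD_HEAD[dep] == head and children_done else -1.0
```

Because the candidate set spans the whole list, the number of possible transitions grows with sentence length, which is why the original algorithm keeps the scores in a priority heap and only rescoring the neighbourhood of the last attachment.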
Improved Learning and Inference
Parsing Example
[Figure: six parsing steps for "a brown fox jumped with joy", with candidate actions scored at each step: (1) ATTACHRIGHT(2), (2) ATTACHRIGHT(1), (3) ATTACHRIGHT(1), (4) ATTACHLEFT(2), (5) ATTACHLEFT(1), (6) complete tree]
Figure 1: Parsing the sentence “a brown fox jumped with joy”. Rounded arcs represent possible actions.
tionally intensive sampling-based methods (Nakagawa, 2007). As a result, these models, while accurate, are slow (O(n^3) for projective, first-order models, higher polynomials for higher-order models, and worse for richer tree-feature models).

We propose a new category of dependency parsing algorithms, inspired by (Shen et al., 2007): non-directional easy-first parsing. This is a greedy, deterministic parsing approach, which relaxes the left-to-right processing order of transition-based parsing algorithms. By doing so, we allow the explicit incorporation of rich structural features derived from both sides of the attachment point, and implicitly take into account the entire previously derived structure of the whole sentence. This extension allows the incorporation of much richer features than those available to transition- and especially to graph-based parsers, and greatly reduces the locality of transition-based algorithm decisions. On the other hand, it is still a greedy, best-first algorithm leading to an efficient implementation.

We present a concrete O(n log n) parsing algorithm, which significantly outperforms state-of-the-art transition-based parsers, while closing the gap to graph-based parsers.

2 Easy-first parsing

When humans comprehend a natural language sentence, they arguably do it in an incremental, left-to-right manner. However, when humans consciously annotate a sentence with syntactic structure, they hardly ever work in fixed left-to-right order. Rather, they start by building several isolated constituents by making easy and local attachment decisions and only then combine these constituents into bigger constituents, jumping back-and-forth over the sentence and proceeding from easy to harder phenomena to analyze. When getting to the harder decisions a lot of structure is already in place, and this structure can be used in deciding a correct attachment.

Our parser follows a similar kind of annotation process: starting from easy attachment decisions, and proceeding to harder and harder ones. When making later decisions, the parser has access to the entire structure built in earlier stages. During the training process, the parser learns its own notion of easy and hard, and learns to defer specific kinds of decisions until more structure is available.

3 Parsing algorithm

Our (projective) parsing algorithm builds the parse tree bottom up, using two kinds of actions: ATTACHLEFT(i) and ATTACHRIGHT(i). These actions are applied to a list of partial structures p_1, ..., p_k, called pending, which is initialized with the n words of the sentence w_1, ..., w_n. Each ac-
[Goldberg and Elhadad 2010]
Improved Learning and Inference
Oracles Revisited
- How do we train the easy-first parser?
- Recall our training procedure for greedy parsers:
  - Reconstruct oracle transition sequence for each sentence
  - Construct training data set D = {(c, t) | o(c) = t}
  - Maximize accuracy of local predictions o(c) = t
- Presupposes a unique optimal transition for each configuration
  - Does not make sense for the easy-first parser
  - Turns out to be a bad idea in general
Improved Learning and Inference
Online Learning with a Conventional Oracle
Learn({T_1, ..., T_N})
  w ← 0.0
  for i in 1..K
    for j in 1..N
      c ← ([ ], [0, 1, ..., n_j], { })
      while B_c ≠ [ ]
        t* ← argmax_t w · f(c, t)
        t_o ← o(c, T_j)
        if t* ≠ t_o
          w ← w + f(c, t_o) − f(c, t*)
        c ← t_o(c)
  return w

- Oracle o(c, T_j) returns the optimal transition for c and T_j
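The Learn procedure is a perceptron that always follows the oracle transition, so training only ever visits oracle configurations. The sketch below keeps the transition system abstract by passing it in as callables. The toy task at the end is hypothetical and is not a parser: it tags symbols one at a time, which is enough to exercise the update rule.

```python
from collections import defaultdict


def learn(treebank, K, initial, is_terminal, legal, apply, oracle, f):
    """Perceptron training with a conventional (static) oracle."""
    w = defaultdict(float)                        # w <- 0.0
    for _ in range(K):                            # K training iterations
        for x, gold in treebank:                  # one pass over the N training items
            c = initial(x)
            while not is_terminal(c):
                # t* <- argmax_t w . f(c, t)
                t_star = max(legal(c), key=lambda t: sum(w[name] for name in f(c, t)))
                t_o = oracle(c, gold)             # t_o <- o(c, T_j)
                if t_star != t_o:                 # w <- w + f(c, t_o) - f(c, t*)
                    for name in f(c, t_o):
                        w[name] += 1.0
                    for name in f(c, t_star):
                        w[name] -= 1.0
                c = apply(c, t_o)                 # always follow the oracle
    return w


# Hypothetical toy task: a configuration is (symbols, position); a transition is a tag.
def toy_learn(K=2):
    treebank = [(("a", "b", "a"), ("X", "Y", "X"))]
    return learn(
        treebank, K,
        initial=lambda x: (x, 0),
        is_terminal=lambda c: c[1] == len(c[0]),
        legal=lambda c: ["Z", "X", "Y"],
        apply=lambda c, t: (c[0], c[1] + 1),
        oracle=lambda c, gold: gold[c[1]],
        f=lambda c, t: [f"sym={c[0][c[1]]}&t={t}"],
    )
```

Since the next configuration is always t_o(c), the model never sees configurations that result from its own mistakes.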
Improved Learning and Inference
Conventional Oracle for Arc-Eager Parsing
o(c, T) =
  Left-Arc   if top(S_c) ← first(B_c) in T
  Right-Arc  if top(S_c) → first(B_c) in T
  Reduce     if ∃v < top(S_c) : v ↔ first(B_c) in T
  Shift      otherwise

- Correct:
  - Derives T in a configuration sequence C_{o,T} = c_0, ..., c_m
- Problems:
  - Deterministic: Ignores other derivations of T
  - Incomplete: Valid only for configurations in C_{o,T}
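The four cases of the conventional oracle map directly onto a chain of tests. The sketch below assumes the gold tree T is given as a dictionary from dependents to heads, uses unlabeled arcs, and returns Shift on an empty stack (the initial configuration); `apply_transition` and `oracle_sequence` are helper names introduced for illustration.

```python
def static_oracle(c, head):
    """Conventional arc-eager oracle o(c, T); head maps each dependent to its gold head."""
    stack, buf, _ = c
    if not stack:
        return "SH"
    top, first = stack[-1], buf[0]
    if head.get(top) == first:            # top(S) <- first(B) in T
        return "LA"
    if head.get(first) == top:            # top(S) -> first(B) in T
        return "RA"
    if any(head.get(v) == first or head.get(first) == v for v in range(top)):
        return "RE"                       # some v < top(S) is linked to first(B) in T
    return "SH"


def apply_transition(c, t):
    """Unlabeled arc-eager transitions; preconditions are not checked here."""
    stack, buf, arcs = c
    if t == "SH":
        return (stack + [buf[0]], buf[1:], arcs)
    if t == "RE":
        return (stack[:-1], buf, arcs)
    if t == "RA":
        return (stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])})
    return (stack[:-1], buf, arcs | {(buf[0], stack[-1])})   # "LA"


def oracle_sequence(n, head):
    """Run the oracle from the initial configuration and record its transitions."""
    c = ([], list(range(n + 1)), frozenset())
    seq = []
    while c[1]:
        t = static_oracle(c, head)
        seq.append(t)
        c = apply_transition(c, t)
    return seq, c


# "He sent her a letter ." with 0 = ROOT (dependent -> head)
HEAD = {1: 2, 2: 0, 3: 2, 4: 5, 5: 2, 6: 2}
```

The oracle returns exactly one transition per configuration and is only meaningful on configurations from which the gold tree is still reachable, which are the two problems listed above.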
Improved Learning and Inference
Oracle Parse
Sentence: ROOT He sent her a letter .
POS tags: ROOT pron verb pron det noun .
Gold arcs: ROOT −root→ sent, He ←sbj− sent, sent −iobj→ her, a ←det− letter, sent −dobj→ letter, sent −p→ .

Transition  Stack                    Buffer
(initial)   [ ]                      [ROOT, He, sent, her, a, letter, .]
SH          [ROOT]                   [He, sent, her, a, letter, .]
SH          [ROOT, He]               [sent, her, a, letter, .]
LA          [ROOT]                   [sent, her, a, letter, .]
RA          [ROOT, sent]             [her, a, letter, .]
RA          [ROOT, sent, her]        [a, letter, .]
SH          [ROOT, sent, her, a]     [letter, .]
LA          [ROOT, sent, her]        [letter, .]
RE          [ROOT, sent]             [letter, .]
RA          [ROOT, sent, letter]     [.]
RE          [ROOT, sent]             [.]
RA          [ROOT, sent, .]          [ ]

Transitions: SH-SH-LA-RA-RA-SH-LA-RE-RA-RE-RA
Improved Learning and Inference
Non-Determinism
Sentence: ROOT He sent her a letter .
POS tags: ROOT pron verb pron det noun .
Gold arcs: ROOT −root→ sent, He ←sbj− sent, sent −iobj→ her, a ←det− letter, sent −dobj→ letter, sent −p→ .

Transitions (oracle):      SH-SH-LA-RA-RA-SH-LA-RE-RA-RE-RA
Transitions (alternative): SH-SH-LA-RA-RA-RE-SH-LA-RA-RE-RA

Alternative derivation of the same tree, from the fifth transition on:

Transition  Stack                    Buffer
RA          [ROOT, sent, her]        [a, letter, .]
RE          [ROOT, sent]             [a, letter, .]
SH          [ROOT, sent, a]          [letter, .]
LA          [ROOT, sent]             [letter, .]
RA          [ROOT, sent, letter]     [.]
RE          [ROOT, sent]             [.]
RA          [ROOT, sent, .]          [ ]
Improved Learning and Inference
Non-Optimality
Transitions: SH-RA-LA-SH-RA-SH-LA-RE-RA-RE-RA

Sequences that deviate from it, one configuration per step ([k/6] = k of the 6 gold arcs recovered):

Transitions                               Stack                          Buffer
SH-RA-LA-SH                               [ROOT, sent]                   [her, a, letter, .]
SH-RA-LA-SH-SH                            [ROOT, sent, her]              [a, letter, .]
SH-RA-LA-SH-SH-SH                         [ROOT, sent, her, a]           [letter, .]
SH-RA-LA-SH-SH-SH-LA                      [ROOT, sent, her]              [letter, .]
SH-RA-LA-SH-SH-SH-LA-SH                   [ROOT, sent, her, letter]      [.]
SH-RA-LA-SH-SH-SH-LA-SH-SH [3/6]          [ROOT, sent, her, letter, .]   [ ]

SH-RA-LA-SH-SH-SH-LA                      [ROOT, sent, her]              [letter, .]
SH-RA-LA-SH-SH-SH-LA-LA                   [ROOT, sent]                   [letter, .]
SH-RA-LA-SH-SH-SH-LA-LA-RA                [ROOT, sent, letter]           [.]
SH-RA-LA-SH-SH-SH-LA-LA-RA-RE             [ROOT, sent]                   [.]
SH-RA-LA-SH-SH-SH-LA-LA-RA-RE-RA [5/6]    [ROOT, sent, .]                [ ]

Arcs: ROOT -root-> sent, He <-sbj- sent, a <-det- letter, her <-?- letter, sent -dobj-> letter, sent -p-> .

[Figure: dependency tree for "ROOT She sent him a letter ." with POS tags ROOT pron verb pron det noun . and arc labels root, nsubj, iobj, det, dobj, p]
Improved Learning and Inference
Dynamic Oracles
- Optimality:
  - A transition is optimal if the best tree remains reachable
  - Best tree = argmin_{T'} L(T, T')
- Oracle:
  - Boolean function o(c, t, T) = true if t is optimal for c and T
  - Non-deterministic: more than one transition can be optimal
  - Complete: correct for all configurations
- New problem:
  - How do we know which trees are reachable?
Improved Learning and Inference
Reachability for Arcs and Trees
- Arc reachability:
  - An arc w_i → w_j is reachable in c iff w_i → w_j ∈ A_c, or w_i ∈ S_c ∪ B_c and w_j ∈ B_c (same for w_i ← w_j)
- Tree reachability:
  - A (projective) tree T is reachable in c iff every arc in T is reachable in c
- Arc-decomposable systems [Goldberg and Nivre 2013]:
  - Tree reachability reduces to arc reachability
  - Holds for some transition systems but not all
    - Arc-eager and easy-first are arc-decomposable
    - Arc-standard is not decomposable
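The reachability test stated on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: arcs are unlabeled `(head, dependent)` pairs of word positions, the function names are hypothetical, and the condition is read as "the left endpoint is still in S ∪ B and the right endpoint is still in B", which assumes w_i precedes w_j.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, int]  # (head, dependent); labels omitted for brevity


def arc_reachable(arc: Arc, stack: List[int], buffer: List[int], arcs: Set[Arc]) -> bool:
    """Slide condition: the arc is already built, or its left endpoint is
    still in S or B and its right endpoint is still in B."""
    if arc in arcs:
        return True
    left, right = min(arc), max(arc)
    return (left in stack or left in buffer) and right in buffer


def tree_reachable(tree: Set[Arc], stack: List[int], buffer: List[int], arcs: Set[Arc]) -> bool:
    """For an arc-decomposable system, a tree is reachable iff all its arcs are."""
    return all(arc_reachable(a, stack, buffer, arcs) for a in tree)


# Word positions: ROOT=0 He=1 sent=2 her=3 a=4 letter=5 .=6
GOLD = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5), (2, 6)}

# Configuration ([ROOT, sent], [her, a, letter, .]): everything is still reachable.
ok = tree_reachable(GOLD, [0, 2], [3, 4, 5, 6], {(0, 2), (2, 1)})

# Configuration ([ROOT, sent], [letter, .]) after attaching "her" to "letter":
# the gold arc sent -> her can no longer be built.
lost = arc_reachable((2, 3), [0, 2], [5, 6], {(0, 2), (2, 1), (5, 4), (5, 3)})
```

Here `ok` is true and `lost` is false, which matches the Non-Optimality example: once "her" has left both the stack and the buffer without its gold head, that arc is gone for good.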
Improved Learning and Inference
Oracles for Arc-Decomposable Systems
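\[
o(c, t, T) =
\begin{cases}
\text{true} & \text{if } [R(c) \setminus R(t(c))] \cap T = \emptyset \\
\text{false} & \text{otherwise}
\end{cases}
\]
where $R(c) \equiv \{a \mid a \text{ is an arc reachable in } c\}$

Arc-Eager

\[
o(c, \mathrm{LA}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in B_c : s \leftrightarrow w \in T \text{ (except } s \leftarrow b\text{)} \\
\text{true} & \text{otherwise}
\end{cases}
\]
\[
o(c, \mathrm{RA}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in S_c : w \leftrightarrow b \in T \text{ (except } s \rightarrow b\text{)} \\
\text{true} & \text{otherwise}
\end{cases}
\]
\[
o(c, \mathrm{RE}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in B_c : s \rightarrow w \in T \\
\text{true} & \text{otherwise}
\end{cases}
\]
\[
o(c, \mathrm{SH}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in S_c : w \leftrightarrow b \in T \\
\text{true} & \text{otherwise}
\end{cases}
\]

Notation: s = node on top of the stack S
b = first node in the buffer B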
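The four arc-eager conditions can be transcribed almost literally into Python. The sketch below is illustrative only: `is_optimal` and `GOLD` are hypothetical names, arcs are unlabeled `(head, dependent)` pairs, and the function checks only what the slide states. It does not test transition preconditions (for example whether s already has a head), and the Right-Arc case inspects only the stack, exactly as printed.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, int]  # (head, dependent)


def is_optimal(t: str, stack: List[int], buffer: List[int], gold: Set[Arc]) -> bool:
    """o(c, t, T) for the arc-eager system, following the slide's four cases."""
    s = stack[-1] if stack else None
    b = buffer[0] if buffer else None

    def linked(x, y):
        # x <-> y: a gold arc in either direction
        return (x, y) in gold or (y, x) in gold

    if t == "LA":  # adds b -> s and pops s; the gold arc s <- b is exempt
        return not any(linked(s, w) and not (w == b and (b, s) in gold) for w in buffer)
    if t == "RA":  # adds s -> b and pushes b; the gold arc s -> b is exempt
        return not any(linked(w, b) and not (w == s and (s, b) in gold) for w in stack)
    if t == "RE":  # pops s; s loses its gold dependents still in the buffer
        return not any((s, w) in gold for w in buffer)
    if t == "SH":  # pushes b; b loses all gold links to nodes on the stack
        return not any(linked(w, b) for w in stack)
    raise ValueError(f"unknown transition {t!r}")


# Word positions: ROOT=0 He=1 sent=2 her=3 a=4 letter=5 .=6
GOLD = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5), (2, 6)}

# ([ROOT, sent, her], [a, letter, .]): both Shift and Reduce are optimal.
both = [t for t in ("SH", "RE") if is_optimal(t, [0, 2, 3], [4, 5, 6], GOLD)]

# ([ROOT, sent], [her, a, letter, .]): only Right-Arc keeps the gold tree reachable.
only_ra = [t for t in ("SH", "LA", "RA", "RE") if is_optimal(t, [0, 2], [3, 4, 5, 6], GOLD)]
```

The first configuration is the source of the spurious ambiguity shown on the Non-Determinism slides; the second is the point where the Non-Optimality example goes wrong by shifting.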
Improved Learning and Inference
Online Learning with a Dynamic Oracle
Learn({T_1, ..., T_N})
  w ← 0.0
  for i in 1..K
    for j in 1..N
      c ← ([ ]_S, [w_1, ..., w_{n_j}]_B, { })
      while B_c ≠ [ ]
        t* ← argmax_t w · f(c, t)
        t_o ← argmax_{t ∈ {t | o(c, t, T_j)}} w · f(c, t)
        if t* ≠ t_o
          w ← w + f(c, t_o) − f(c, t*)
        c ← choice(t_o(c), t*(c))
  return w

- Ambiguity: use model score to break ties
- Exploration: follow model prediction even if not optimal
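A compact Python sketch of this training loop follows. Everything beyond the pseudocode is an assumption made for illustration: the unlabeled arc-eager system, the three toy feature templates, the `explore` probability used for `choice`, and the `cost` function. `cost` counts gold arcs that a transition makes unreachable, and it also counts a gold head of b that is still in the buffer for Right-Arc and ignores dependents that already have a head, neither of which is spelled out on the slides. A transition is treated as optimal when its cost is minimal among the legal ones.

```python
import random
from collections import defaultdict

TRANSITIONS = ("SH", "LA", "RA", "RE")


def is_legal(t, stack, heads):
    """Arc-eager preconditions; the buffer is assumed to be non-empty."""
    if t == "SH":
        return True
    if not stack:
        return False
    s = stack[-1]
    if t == "LA":
        return s != 0 and s not in heads
    if t == "RE":
        return s in heads
    return True  # RA


def apply(t, stack, buffer, heads):
    stack, buffer, heads = list(stack), list(buffer), dict(heads)
    if t == "SH":
        stack.append(buffer.pop(0))
    elif t == "RE":
        stack.pop()
    elif t == "RA":
        heads[buffer[0]] = stack[-1]
        stack.append(buffer.pop(0))
    else:  # LA
        heads[stack.pop()] = buffer[0]
    return stack, buffer, heads


def cost(t, stack, buffer, heads, gold):
    """Gold arcs (gold: dependent -> head) lost by taking t in this configuration."""
    s = stack[-1] if stack else None
    b = buffer[0]

    def lost(h, d):  # gold arc h -> d whose dependent is still headless
        return gold.get(d) == h and d not in heads

    if t == "SH":
        return sum(lost(k, b) or lost(b, k) for k in stack)
    if t == "RE":
        return sum(lost(s, k) for k in buffer)
    if t == "LA":
        return sum(lost(s, k) for k in buffer) + sum(lost(k, s) for k in buffer[1:])
    return (sum(lost(k, b) for k in stack[:-1] + buffer[1:])  # RA
            + sum(lost(b, k) for k in stack))


def features(t, stack, buffer, words):
    s = words[stack[-1]] if stack else "<empty>"
    b = words[buffer[0]]
    return [(t, "s", s), (t, "b", b), (t, "sb", s, b)]


def learn(treebank, epochs=5, explore=0.9, seed=0):
    rng = random.Random(seed)
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in treebank:
            stack, buffer, heads = [], list(range(len(words))), {}
            while buffer:
                def score(t):
                    return sum(w[f] for f in features(t, stack, buffer, words))
                legal = [t for t in TRANSITIONS if is_legal(t, stack, heads)]
                costs = {t: cost(t, stack, buffer, heads, gold) for t in legal}
                optimal = [t for t in legal if costs[t] == min(costs.values())]
                t_pred = max(legal, key=score)    # t* in the pseudocode
                t_orc = max(optimal, key=score)   # t_o: best-scoring optimal transition
                if t_pred != t_orc:
                    for f in features(t_orc, stack, buffer, words):
                        w[f] += 1.0
                    for f in features(t_pred, stack, buffer, words):
                        w[f] -= 1.0
                # Exploration: usually follow the model, even when it is wrong.
                t_next = t_pred if rng.random() < explore else t_orc
                stack, buffer, heads = apply(t_next, stack, buffer, heads)
    return w


WORDS = ["ROOT", "He", "sent", "her", "a", "letter", "."]
GOLD_HEADS = {1: 2, 2: 0, 3: 2, 4: 5, 5: 2, 6: 2}
weights = learn([(WORDS, GOLD_HEADS)], epochs=3)
```

Because the oracle is defined for every configuration, the loop can keep training after the parser has left the gold path, which is what the Exploration bullet refers to.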
Improved Learning and Inference
[Goldberg and Nivre 2012]
Improved Learning and Inference
Ambiguity and Exploration
- Lessons from dynamic oracles:
  - Do not hide spurious ambiguity from the parser; exploit it
  - Let the parser explore the consequences of its own mistakes
- Related work:
  - Bootstrapping [Choi and Palmer 2011]
  - Selectional branching [Choi and McCallum 2013]
  - Non-monotonic parsing [Honnibal et al. 2013]
  - Dynamic parsing strategy [Sartorio et al. 2013]
Improved Learning and Inference
Summary: Learning and Inference
- Beam search and structured prediction:
  - Explores a larger search space at training and parsing time
  - Can be combined with dynamic programming
- Dynamic oracles:
  - Explores a larger search space only at training time
  - Can be combined with selectional branching and with flexible transition systems (easy-first, dynamic, non-monotonic)
Non-Projective Parsing
Non-Projective Parsing
- So far only projective parsing models
- Non-projective parsing is harder even with greedy inference
  - Non-projective: n(n − 1) arcs to consider, O(n^2)
  - Projective: at most 2(n − 1) arcs to consider, O(n)
- Also harder to construct dynamic oracles
  - Conjecture: arc-decomposability presupposes projectivity
Non-Projective Parsing
Previous Approaches
- Pseudo-projective parsing [Nivre and Nilsson 2005]
  - Preprocess training data, post-process parser output
  - Approximate encoding with incomplete coverage
  - Relatively high precision but low recall
- Extended arc transitions [Attardi 2006]
  - Transitions that add arcs between non-adjacent subtrees
  - Upper bound on arc degree (limited to local relations)
  - Exact dynamic programming algorithm [Cohen et al. 2011]
- List-based algorithms [Covington 2001, Nivre 2007]
  - Consider all word pairs instead of adjacent subtrees
  - Increases parsing complexity (and training time)
  - Improved accuracy and efficiency by adding "projective transitions" [Choi and Palmer 2011]
Non-Projective Parsing
Novel Approaches
- Online reordering [Nivre 2009, Nivre et al. 2009]:
  - Reorder words during parsing to make tree projective
  - Add a special transition for swapping adjacent words
  - Quadratic time in the worst case but linear in the best case
- Multiplanar parsing [Gómez-Rodríguez and Nivre 2010]:
  - Factor dependency trees into k planes without crossing arcs
  - Use k stacks to parse each plane separately
  - Linear time parsing with constant k
Non-Projective Parsing
Projectivity and Word Order
- Projectivity is a property of a dependency tree only in relation to a particular word order
  - Words can always be reordered to make the tree projective
  - Given a dependency tree T = (V, A, <), let the projective order <p be the order defined by an inorder traversal of T with respect to < [Veselá et al. 2004]

Word:            ROOT  A    hearing  is    scheduled  on    the  issue  today  .
POS:             ROOT  det  noun     verb  verb       prep  det  noun   adv    .
Position in <p:  0     1    2        6     7          3     4    5      8      9

[Figure: dependency tree over the sentence with arc labels root, det, aux, nsubj, prep, pobj, det, tmod, p]
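The projective order can be computed with a short recursive traversal. The sketch below is an illustration under stated assumptions: the tree is given as a hypothetical `heads` map from dependent position to head position, read off the example sentence (with the final period attached to "scheduled"), and node 0 is ROOT.

```python
from typing import Dict, List


def projective_order(heads: Dict[int, int], n: int) -> List[int]:
    """Inorder traversal: left dependents, then the node, then right dependents,
    each group visited in the original word order."""
    children: Dict[int, List[int]] = {i: [] for i in range(n)}
    for dep in sorted(heads):
        children[heads[dep]].append(dep)

    order: List[int] = []

    def visit(node: int) -> None:
        for c in children[node]:
            if c < node:
                visit(c)
        order.append(node)
        for c in children[node]:
            if c > node:
                visit(c)

    visit(0)
    return order


# ROOT=0 A=1 hearing=2 is=3 scheduled=4 on=5 the=6 issue=7 today=8 .=9
HEADS = {1: 2, 2: 4, 3: 4, 4: 0, 5: 2, 6: 7, 7: 5, 8: 4, 9: 4}

order = projective_order(HEADS, 10)                    # nodes listed in <p order
position = [order.index(i) for i in range(10)]         # <p position of each word
```

`order` lists the words as ROOT A hearing on the issue is scheduled today ., and `position` reproduces the row of numbers printed under the example sentence.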
Non-Projective Parsing
Transition System for Online Reordering
Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]
Initial: ([ ], [0, 1, ..., n], { })
Terminal: ([0], [ ], A)
Shift: (S, i|B, A) ⇒ (S|i, B, A)
Right-Arc(k): (S|i|j, B, A) ⇒ (S|i, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i|j, B, A) ⇒ (S|j, B, A ∪ {(j, i, k)})    i ≠ 0
Swap: (S|i|j, B, A) ⇒ (S|j, i|B, A)    0 < i < j
- Transition-based parsing with two interleaved processes:
  1. Sort words into projective order <p
  2. Build tree T by connecting adjacent subtrees
- T is projective with respect to <p but not (necessarily) <
Non-Projective Parsing
Example Transition Sequence
Transition        Stack                                        Buffer
(initial)         [ ]                                          [ROOT, A, hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT]                                       [A, hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT, A]                                    [hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT, A, hearing]                           [is, scheduled, on, the, issue, today, .]
Left-Arc(det)     [ROOT, hearing]                              [is, scheduled, on, the, issue, today, .]
Shift             [ROOT, hearing, is]                          [scheduled, on, the, issue, today, .]
Shift             [ROOT, hearing, is, scheduled]               [on, the, issue, today, .]
Left-Arc(aux)     [ROOT, hearing, scheduled]                   [on, the, issue, today, .]
Shift             [ROOT, hearing, scheduled, on]               [the, issue, today, .]
Shift             [ROOT, hearing, scheduled, on, the]          [issue, today, .]
Shift             [ROOT, hearing, scheduled, on, the, issue]   [today, .]
Left-Arc(det)     [ROOT, hearing, scheduled, on, issue]        [today, .]
Right-Arc(pobj)   [ROOT, hearing, scheduled, on]               [today, .]
Swap              [ROOT, hearing, on]                          [scheduled, today, .]
Right-Arc(prep)   [ROOT, hearing]                              [scheduled, today, .]
Shift             [ROOT, hearing, scheduled]                   [today, .]
Left-Arc(nsubj)   [ROOT, scheduled]                            [today, .]
Shift             [ROOT, scheduled, today]                     [.]
Right-Arc(tmod)   [ROOT, scheduled]                            [.]
Shift             [ROOT, scheduled, .]                         [ ]
Right-Arc(p)      [ROOT, scheduled]                            [ ]
Right-Arc(root)   [ROOT]                                       [ ]

[Figure: dependency tree over "ROOT A hearing is scheduled on the issue today ." (POS: ROOT det noun verb verb prep det noun adv .), with arcs det, aux, det, pobj, prep, nsubj, tmod, p, root added in that order]
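The example above can be replayed with a small simulator of the reordering system. This is a sketch for illustration, not the parser from the cited papers: `run` and `EXAMPLE` are hypothetical names, words are represented by their positions, and the transition sequence is the one implied by the successive stack and buffer contents of the example.

```python
def run(transitions, n):
    """Apply Shift / Left-Arc / Right-Arc / Swap to ([ ], [0..n-1], { })."""
    stack, buffer, arcs = [], list(range(n)), set()
    for name, label in transitions:
        if name == "SH":
            stack.append(buffer.pop(0))
        elif name == "RA":          # (S|i|j, B, A) => (S|i, B, A + (i, j, k))
            j = stack.pop()
            arcs.add((stack[-1], j, label))
        elif name == "LA":          # (S|i|j, B, A) => (S|j, B, A + (j, i, k)), i != 0
            j, i = stack.pop(), stack.pop()
            assert i != 0
            arcs.add((j, i, label))
            stack.append(j)
        elif name == "SWAP":        # (S|i|j, B, A) => (S|j, i|B, A), 0 < i < j
            j, i = stack.pop(), stack.pop()
            assert 0 < i < j
            stack.append(j)
            buffer.insert(0, i)
        else:
            raise ValueError(name)
    return stack, buffer, arcs


# ROOT=0 A=1 hearing=2 is=3 scheduled=4 on=5 the=6 issue=7 today=8 .=9
EXAMPLE = [
    ("SH", None), ("SH", None), ("SH", None), ("LA", "det"),
    ("SH", None), ("SH", None), ("LA", "aux"),
    ("SH", None), ("SH", None), ("SH", None), ("LA", "det"), ("RA", "pobj"),
    ("SWAP", None), ("RA", "prep"),
    ("SH", None), ("LA", "nsubj"),
    ("SH", None), ("RA", "tmod"),
    ("SH", None), ("RA", "p"),
    ("RA", "root"),
]

stack, buffer, arcs = run(EXAMPLE, 10)
heads = {dep: head for head, dep, _ in arcs}
```

The single Swap moves "scheduled" back to the buffer so that "on" becomes adjacent to "hearing"; after that, every arc is built between neighbouring stack items, and the run ends in the terminal configuration ([0], [ ], A).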
Non-Projective Parsing
Analysis
- Correctness:
  - Sound and complete for the class of non-projective trees
- Complexity for greedy or beam search parsing:
  - Quadratic running time in the worst case
  - Linear running time in the average case
- Works well with beam search and structured prediction

              Czech          German
              LAS    UAS     LAS    UAS
Projective    80.8   86.3    86.2   88.5
Reordering    83.9   89.1    88.7   90.9

[Bohnet and Nivre 2012]
Non-Projective Parsing
Multiplanarity
- Multiplanarity is based on the notion of planarity:
  - A dependency graph is planar if it has no crossing arcs
  - A dependency graph is k-planar if it can be decomposed into (at most) k planar graphs [Yli-Jyrä 2003]
- In most treebanks, well over 99% of the trees are at most 2-planar [Gómez-Rodríguez and Nivre 2010]
- We can parse k-planar graphs in linear time using k stacks
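Planarity as defined here is easy to test by brute force. The following sketch is illustrative only: `crosses` and `is_planar` are hypothetical helper names, arcs are unlabeled pairs of word positions, and the quadratic pairwise check is chosen for clarity rather than speed. The arcs are those of the example sentence "A hearing is scheduled on the issue today ."

```python
from itertools import combinations


def crosses(a, b):
    """Two arcs cross if exactly one endpoint of one lies strictly inside the other."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def is_planar(arcs):
    return not any(crosses(a, b) for a, b in combinations(arcs, 2))


# ROOT=0 A=1 hearing=2 is=3 scheduled=4 on=5 the=6 issue=7 today=8 .=9
ARCS = [(0, 4), (2, 1), (4, 3), (4, 2), (2, 5), (5, 7), (7, 6), (4, 8), (4, 9)]

plane1 = [a for a in ARCS if a != (2, 5)]   # everything except hearing -> on
plane2 = [(2, 5)]
two_planar = (not is_planar(ARCS)) and is_planar(plane1) and is_planar(plane2)
```

The arc from "hearing" to "on" crosses three arcs leaving "scheduled", so the full graph is not planar, but moving that one arc to a second plane leaves two planar graphs. This is one possible decomposition; it need not be the one a 2-planar parser would produce.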
Non-Projective Parsing
1-Planar Transition System
Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]
Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)
Shift: (S, i|B, A) ⇒ (S|i, B, A)
Reduce: (S|i, B, A) ⇒ (S, B, A)
Right-Arc(k): (S|i, j|B, A) ⇒ (S|i, j|B, A ∪ {(i, j, k)})    ¬h(j, A)
Left-Arc(k): (S|i, j|B, A) ⇒ (S|i, j|B, A ∪ {(j, i, k)})    ¬h(i, A) ∧ i ≠ 0
- Similar to the arc-eager system except:
  - Reduce does not require popped node to have a head
  - Left-Arc/Right-Arc do not affect S or B
Non-Projective Parsing
2-Planar Transition System
Configuration: (S1, S2, B, A) [S1 = Stack 1, S2 = Stack 2]
Initial: ([ ], [ ], [0, 1, ..., n], { })
Terminal: (S1, S2, [ ], A)
Shift: (S1, S2, i|B, A) ⇒ (S1|i, S2|i, B, A)
Reduce: (S1|i, S2, B, A) ⇒ (S1, S2, B, A)
Right-Arc(k): (S1|i, S2, j|B, A) ⇒ (S1|i, S2, j|B, A ∪ {(i, j, k)})    ¬h(j, A)
Left-Arc(k): (S1|i, S2, j|B, A) ⇒ (S1|i, S2, j|B, A ∪ {(j, i, k)})    ¬h(i, A) ∧ i ≠ 0
Switch: (S1, S2, B, A) ⇒ (S2, S1, B, A)
- Similar to 1-planar system except:
  - Shift pushes a node to both stacks
  - Left-Arc/Right-Arc/Reduce only affect S1
  - Switch swaps S1 and S2
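A minimal Python sketch of the 2-planar transitions, written directly from the definitions above. The names `step` and `PREFIX` are hypothetical, words are represented by their positions, and only the preconditions printed on the slide are checked. The transition prefix is an assumption read off the stack contents of the example that follows for "A hearing is scheduled on the issue today ."

```python
def step(config, t, label=None):
    """Apply one 2-planar transition to (S1, S2, B, A) and return the new configuration."""
    s1, s2, buf, arcs = (list(config[0]), list(config[1]), list(config[2]), set(config[3]))

    def has_head(d):
        return any(dep == d for _, dep, _ in arcs)

    if t == "SH":                       # push the first buffer node onto both stacks
        i = buf.pop(0)
        s1.append(i)
        s2.append(i)
    elif t == "RE":                     # pop the active stack only
        s1.pop()
    elif t == "RA":                     # arc from top of S1 to first buffer node
        i, j = s1[-1], buf[0]
        assert not has_head(j)
        arcs.add((i, j, label))
    elif t == "LA":                     # arc from first buffer node to top of S1
        i, j = s1[-1], buf[0]
        assert i != 0 and not has_head(i)
        arcs.add((j, i, label))
    elif t == "SW":                     # Switch: exchange the roles of the two stacks
        s1, s2 = s2, s1
    else:
        raise ValueError(t)
    return s1, s2, buf, arcs


# ROOT=0 A=1 hearing=2 is=3 scheduled=4 on=5 the=6 issue=7 today=8 .=9
PREFIX = [
    ("SH",), ("SH",), ("LA", "det"), ("RE",),
    ("SH",), ("SH",), ("LA", "aux"), ("RE",),
    ("LA", "nsubj"), ("RE",), ("RA", "root"), ("SH",),
    ("SW",), ("RE",), ("RE",), ("RA", "prep"), ("SH",), ("SH",),
]

config = ([], [], list(range(10)), set())
for move in PREFIX:
    config = step(config, *move)
s1, s2, buf, arcs = config
```

After the Switch, the arc from "hearing" to "on" is built on the second stack, where "scheduled" no longer blocks it; the two stacks then hold [ROOT, A, hearing, on, the] and [ROOT, scheduled, on, the].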
Non-Projective Parsing
Example Transition Sequence
Transition        S1                                  S2                                  Buffer
(initial)         [ ]                                 [ ]                                 [ROOT, A, hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT]                              [ROOT]                              [A, hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT, A]                           [ROOT, A]                           [hearing, is, scheduled, on, the, issue, today, .]
Left-Arc(det)     [ROOT, A]                           [ROOT, A]                           [hearing, is, scheduled, on, the, issue, today, .]
Reduce            [ROOT]                              [ROOT, A]                           [hearing, is, scheduled, on, the, issue, today, .]
Shift             [ROOT, hearing]                     [ROOT, A, hearing]                  [is, scheduled, on, the, issue, today, .]
Shift             [ROOT, hearing, is]                 [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Left-Arc(aux)     [ROOT, hearing, is]                 [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Reduce            [ROOT, hearing]                     [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Left-Arc(nsubj)   [ROOT, hearing]                     [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Reduce            [ROOT]                              [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Right-Arc(root)   [ROOT]                              [ROOT, A, hearing, is]              [scheduled, on, the, issue, today, .]
Shift             [ROOT, scheduled]                   [ROOT, A, hearing, is, scheduled]   [on, the, issue, today, .]
Switch            [ROOT, A, hearing, is, scheduled]   [ROOT, scheduled]                   [on, the, issue, today, .]
Reduce            [ROOT, A, hearing, is]              [ROOT, scheduled]                   [on, the, issue, today, .]
Reduce            [ROOT, A, hearing]                  [ROOT, scheduled]                   [on, the, issue, today, .]
Right-Arc(prep)   [ROOT, A, hearing]                  [ROOT, scheduled]                   [on, the, issue, today, .]
Shift             [ROOT, A, hearing, on]              [ROOT, scheduled, on]               [the, issue, today, .]
Shift             [ROOT, A, hearing, on, the]         [ROOT, scheduled, on, the]          [issue, today, .]

[Figure: dependency tree over "ROOT A hearing is scheduled on the issue today ." (POS: ROOT det noun verb verb prep det noun adv .), with arcs det, aux, nsubj, root, prep added so far]
Non-Projective Parsing
Example Transition Sequence
[ROOT, A, hearing, on, the]S1 [issue, today, .]B
[ROOT, scheduled, on, the]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, A, hearing, on]S1 [issue, today, .]B
[ROOT, scheduled, on, the]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, A, hearing, on]S1 [issue, today, .]B
[ROOT, scheduled, on, the]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, A, hearing, on, issue]S1 [today, .]B
[ROOT, scheduled, on, the, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled, on, the, issue]S1 [today, .]B
[ROOT, A, hearing, on, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled, on, the]S1 [today, .]B
[ROOT, A, hearing, on, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled, on]S1 [today, .]B
[ROOT, A, hearing, on, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled]S1 [today, .]B
[ROOT, A, hearing, on, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled]S1 [today, .]B
[ROOT, A, hearing, on, issue]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
tmod
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled, today]S1 [.]B
[ROOT, A, hearing, on, issue, today]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
tmod
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled]S1 [.]B
[ROOT, A, hearing, on, issue, today]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
tmod
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled]S1 [.]B
[ROOT, A, hearing, on, issue, today]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
tmod
p
Recent Advances in Dependency Parsing 50(54)
Non-Projective Parsing
Example Transition Sequence
[ROOT, scheduled, .]S1 [ ]B
[ROOT, A, hearing, on, issue, today, .]S2
ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .
root
det aux
nsubj
prep
pobj
det
tmod
p
Recent Advances in Dependency Parsing 50(54)
Joint Morphological and Syntactic Analysis
Morphology and Syntax
- Morphological analysis in dependency parsing:
  - Crucially assumed as input, not predicted by the parser
  - Pipeline approach may lead to error propagation
  - Most PCFG-based parsers at least predict their own tags
- Recent interest in joint models for morphology and syntax:
  - Graph-based [McDonald 2006, Lee et al. 2011, Li et al. 2011]
  - Transition-based [Hatori et al. 2011, Bohnet and Nivre 2012]
- Can improve both morphology and syntax
Transition System for Morphology and Syntax
Configuration: (S, B, M, A)   [M = Morphology]
Initial:  ([ ], [0, 1, ..., n], { }, { })
Terminal: ([0], [ ], M, A)

Shift(m):     (S, i|B, M, A) ⇒ (S|i, B, M ∪ {(i, m)}, A)
Right-Arc(k): (S|i|j, B, M, A) ⇒ (S|i, B, M, A ∪ {(i, j, k)})
Left-Arc(k):  (S|i|j, B, M, A) ⇒ (S|j, B, M, A ∪ {(j, i, k)})   if i ≠ 0
Swap:         (S|i|j, B, M, A) ⇒ (S|j, i|B, M, A)               if 0 < i < j

- Transition-based parsing with three interleaved processes:
  - Assign morphology when words are shifted onto the stack
  - Optionally sort words into projective order <_p
  - Build dependency tree T by connecting adjacent subtrees
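The four transitions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The names `Config`, `initial`, `is_terminal`, `shift`, `right_arc`, `left_arc`, `swap` and `demo` are all hypothetical, and the argument of Shift is treated as the morphological tag recorded in M. The transition sequence in `demo` is written by hand, and the head-dependent pairs in its comments are one reading of the tutorial's example sentence. A real parser would choose transitions with a learned scoring model.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser configuration (S, B, M, A); all names here are illustrative."""
    stack: list = field(default_factory=list)    # S, top of stack is the last element
    buffer: list = field(default_factory=list)   # B, next input token is the first element
    morph: set = field(default_factory=set)      # M, pairs (token, tag)
    arcs: set = field(default_factory=set)       # A, triples (head, dependent, label)


def initial(n):
    """Initial configuration ([ ], [0, 1, ..., n], { }, { })."""
    return Config(buffer=list(range(n + 1)))


def is_terminal(c):
    """Terminal configuration ([0], [ ], M, A)."""
    return c.stack == [0] and not c.buffer


def shift(c, m):
    """Move the first buffer token i to the stack and record its tag m in M."""
    i = c.buffer.pop(0)
    c.stack.append(i)
    c.morph.add((i, m))


def right_arc(c, k):
    """Add arc (i, j, k) for the two topmost stack items i, j and pop j."""
    i, j = c.stack[-2], c.stack[-1]
    c.stack.pop()
    c.arcs.add((i, j, k))


def left_arc(c, k):
    """Add arc (j, i, k) and remove i from the stack; requires i != 0."""
    i, j = c.stack[-2], c.stack[-1]
    if i == 0:
        raise ValueError("Left-Arc requires i != 0")
    del c.stack[-2]
    c.arcs.add((j, i, k))


def swap(c):
    """Move i back to the front of the buffer; requires 0 < i < j."""
    i, j = c.stack[-2], c.stack[-1]
    if not 0 < i < j:
        raise ValueError("Swap requires 0 < i < j")
    del c.stack[-2]
    c.buffer.insert(0, i)


def demo():
    """Parse 'ROOT A hearing is scheduled' (tokens 0..4) with a hand-written sequence."""
    c = initial(4)
    shift(c, "ROOT")
    shift(c, "det")
    shift(c, "noun")
    left_arc(c, "det")      # hearing -> A
    shift(c, "verb")
    shift(c, "verb")
    left_arc(c, "aux")      # scheduled -> is
    left_arc(c, "nsubj")    # scheduled -> hearing
    right_arc(c, "root")    # ROOT -> scheduled
    return c


if __name__ == "__main__":
    final = demo()
    print(is_terminal(final), sorted(final.arcs))
```

The preconditions are checked before the configuration is modified, so a rejected transition leaves the state unchanged. Swap is not needed for this projective fragment; it only comes into play when two stack items have to be reordered.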
Parsing Richly Inflected Languages
- Full morphological analysis: lemma + postag + features
  - Beam search and structured prediction
  - Parser selects from k best tags + features
  - Rule-based morphology provides additional features
- Evaluation metrics:
  - PM = morphology (postag + features)
  - LAS = labeled attachment score

          Czech       Finnish     German      Hungarian   Russian
          PM    LAS   PM    LAS   PM    LAS   PM    LAS   PM    LAS
Pipeline  93.0  83.1  88.8  79.9  89.1  91.8  96.1  88.4  92.6  87.4
Joint     94.4  83.5  91.6  82.5  91.2  92.1  97.4  89.1  95.1  88.0
[Bohnet et al. 2013]
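As a rough illustration of the two metrics, the sketch below scores a toy prediction. It assumes that LAS counts a token as correct when both its head and its label match the gold standard, and that PM counts a token as correct when both its postag and its features match. The function names, the toy data and the feature strings such as `Num=Sg` are hypothetical. Published evaluations may differ in details such as the treatment of punctuation.

```python
def accuracy(gold, pred):
    """Percentage of positions where the predicted item equals the gold item."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


def las(gold_arcs, pred_arcs):
    """LAS: each token is represented as (head, label); both must match."""
    return accuracy(gold_arcs, pred_arcs)


def pm(gold_morph, pred_morph):
    """PM: each token is represented as (postag, features); both must match."""
    return accuracy(gold_morph, pred_morph)


# Toy data for "A hearing is scheduled" (hypothetical predictions).
gold_arcs = [(2, "det"), (4, "nsubj"), (4, "aux"), (0, "root")]
pred_arcs = [(2, "det"), (4, "dobj"), (4, "aux"), (0, "root")]   # wrong label on token 2
gold_morph = [("det", "_"), ("noun", "Num=Sg"), ("verb", "_"), ("verb", "_")]
pred_morph = [("det", "_"), ("noun", "Num=Pl"), ("verb", "_"), ("noun", "_")]

print(las(gold_arcs, pred_arcs))   # 75.0
print(pm(gold_morph, pred_morph))  # 50.0
```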
Conclusion
Summary
- Transition-based parsing:
  - Efficient parsing using heuristic inference
  - Unconstrained history-based feature models
- Recent advances in synergy:
  - Beam search and structured prediction
  - Easy-first parsing and dynamic oracles
  - Online reordering for non-projective trees
  - Joint morphological and syntactic analysis
References and Further Reading
- Giuseppe Attardi. 2006. Experiments with a multilanguage non-projective dependency parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 166–170.
- Bernd Bohnet and Jonas Kuhn. 2012. The best of both worlds – a graph-based completion model for transition-based parsers. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 77–87.
- Bernd Bohnet and Joakim Nivre. 2012. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1455–1465.
- Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richárd Farkas, Filip Ginter, and Jan Hajič. 2013. Joint morphological and syntactic analysis for richly inflected languages. Transactions of the Association for Computational Linguistics, 1:415–428.
- Jinho D. Choi and Andrew McCallum. 2013. Transition-based dependency parsing with selectional branching. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1052–1062.
- Jinho D. Choi and Martha Palmer. 2011. Getting the most out of transition-based dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 687–692.
- Shay B. Cohen, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Exact inference for generative probabilistic non-projective dependency parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1234–1245.
- Michael Collins and Brian Roark. 2004. Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 112–119.
- Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8.
- Michael A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual ACM Southeast Conference, pages 95–102.
- Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), pages 742–750.
- Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of COLING 2012, pages 959–976.
- Yoav Goldberg and Joakim Nivre. 2013. Training deterministic parsers with non-deterministic oracles. Transactions of the Association for Computational Linguistics, 1:403–414.
- Carlos Gómez-Rodríguez and Joakim Nivre. 2010. A transition-based parser for 2-planar dependency structures. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1492–1501.
- Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of 5th International Joint Conference on Natural Language Processing (IJCNLP), pages 1216–1224.
- Matthew Honnibal, Yoav Goldberg, and Mark Johnson. 2013. A non-monotonic arc-eager transition system for dependency parsing. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 163–172.
- Liang Huang and Kenji Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1077–1086.
- Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151.
- Richard Johansson and Pierre Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 206–210.
- Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682.
- John Lee, Jason Naradowsky, and David A. Smith. 2011. A discriminative model for joint morphological disambiguation and dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 885–894.
- Zhenghua Li, Min Zhang, Wanxiang Che, Ting Liu, Wenliang Chen, and Haizhou Li. 2011. Joint models for Chinese POS tagging and dependency parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1180–1191.
- Ryan McDonald. 2006. Discriminative Training and Spanning Tree Algorithms for Dependency Parsing. Ph.D. thesis, University of Pennsylvania.
- Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.
- Joakim Nivre, Marco Kuhlmann, and Johan Hall. 2009. An improved oracle for dependency parsing with online reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09), pages 73–76.
- Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Gertjan Van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.
- Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Frank Keller, Stephen Clark, Matthew Crocker, and Mark Steedman, editors, Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 50–57.
- Joakim Nivre. 2007. Incremental non-projective dependency parsing. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 396–403.
- Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL), pages 351–359.
- Francesco Sartorio, Giorgio Satta, and Joakim Nivre. 2013. A transition-based dependency parser using a dynamic parsing strategy. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 135–144.
- Kateřina Veselá, Jiří Havelka, and Eva Hajičová. 2004. Condition of projectivity in the underlying dependency structures. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 289–295.
- Anssi Yli-Jyrä. 2003. Multiplanarity – a model for dependency structures in treebanks. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), pages 189–200.
- Yue Zhang and Stephen Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 562–571.
- Yue Zhang and Joakim Nivre. 2011. Transition-based parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 188–193.
- Yue Zhang and Joakim Nivre. 2012. Analyzing the effect of global learning and beam-search on transition-based dependency parsing. In Proceedings of COLING 2012: Posters, pages 1391–1400.