Statistical NLP Winter 2009
Lecture 10: Parsing I
Roger Levy
Thanks to Jason Eisner & Dan Klein for slides
Why is natural language parsing hard?
• As language structure gets more abstract, computing it gets harder
• Document classification
  • Finite number of classes
  • Fast computation at test time
• Part-of-speech tagging (recovering label sequences)
  • Exponentially many possible tag sequences
  • But exact computation possible in O(n)
• Parsing (recovering labeled trees)
  • Exponentially many, or even infinite, possible trees
  • Exact inference worse than tagging, but still within reach
Why is parsing harder than tagging?
• How many trees are there for a given string?
  • Imagine a rule VP → VP
  • …∞!
• This is not a problem for inferring availability of structures (why?)
• Nor is this a problem for inferring the most probable structure in a PCFG (why?)
Why parsing is harder than tagging II
• Ingredient 1: syntactic category ambiguity
  • Exponentially many category sequences, like tagging
• Ingredient 2: attachment ambiguity
  • Classic case: prepositional-phrase (PP) attachment
  • 1 PP: no ambiguity
  • 2 PPs: some ambiguity
Why parsing is harder than tagging III
• 3 PPs: much more attachment ambiguity!
• 5 PPs: 14 trees, 6 PPs: 42 trees, 7 PPs: 132 trees…
Why parsing is harder than tagging IV
• Tree-structure ambiguity grows like the Catalan numbers (Knuth, 1975; Church & Patil, 1982)
• This is exponential growth (the n-th Catalan number grows roughly like 4^n / n^(3/2)) on top of the exponential growth associated with sequence label ambiguity
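The tree counts quoted above (14, 42, 132) are consecutive Catalan numbers. The short Python sketch below computes them in two ways: from the closed form, and by counting binary bracketings of a fixed leaf sequence. The function names are illustrative and not from the lecture, and how the number of PPs maps onto the Catalan index depends on the construction being analyzed, which the sketch does not settle.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_binary_trees(num_leaves: int) -> int:
    """Count binary trees over a fixed sequence of leaves by trying every split point."""
    if num_leaves <= 1:
        return 1
    return sum(
        count_binary_trees(k) * count_binary_trees(num_leaves - k)
        for k in range(1, num_leaves)
    )


for n in range(1, 8):
    # A binary tree with n + 1 leaves has n internal (branching) nodes.
    print(n, catalan(n), count_binary_trees(n + 1))
```

The two columns agree, which is one way to see that attachment ambiguity over a fixed word sequence is a bracketing-counting problem.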
Why parsing is still tractable
• This all makes parsing look really bad
• But there’s still hope
• Those exponentially many parses are different combinations of common subparts
How to parse tractably
• Recall that we did HMM part-of-speech tagging by storing partial results in a trellis
• An HMM is a special type of grammar with essentially two types of rules:
  • “Category Y can follow category X (with cost π)”
  • “Category X can be realized as word w (with cost η)”
• The trellis is a graph whose structure reflects its rules
  • Edges between all sequentially adjacent category pairs
How to parse tractably II
• But a (weighted) CFG has more complicated rules:
  1. “Category X can rewrite as categories α (with cost π)”
  2. “Preterminal X can be realized as word w (with cost η)”
• (2 is really a special case of 1)
• A graph is not rich enough to reflect CFG/tree structure
  • Phrases need to be stored as partial results
  • We also need rule combination structure
• We’ll do this with hypergraphs
How to parse tractably III
• Hypergraphs are like graphs, but have hyper-edges instead of edges
• “We observe a DT as word 1 and an NN as word 2.”
• “Together, these let us infer an NP spanning words 1–2.”
[Figure: hypergraph fragment. The start state allows us to infer each of the DT and NN nodes; both of these are needed to infer the NP node.]
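One possible way to hold such a hypergraph in code is sketched below in Python. The `Node` and `HyperEdge` classes, the `START` node, and the `derivable` helper are illustrative names chosen for this sketch, not part of the lecture. The key point is that a hyperedge has several tail nodes, and its head can be inferred only when all of them are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str   # category such as "DT", "NN", "NP"
    first: int   # index of the first word covered (1-based, as on the slide)
    last: int    # index of the last word covered


@dataclass(frozen=True)
class HyperEdge:
    tails: tuple  # every premise node; all are required
    head: Node    # the node that can be inferred from the tails


START = Node("START", 0, 0)   # hypothetical start state
DT = Node("DT", 1, 1)
NN = Node("NN", 2, 2)
NP = Node("NP", 1, 2)

EDGES = [
    HyperEdge((START,), DT),    # the start state lets us infer the DT...
    HyperEdge((START,), NN),    # ...and the NN
    HyperEdge((DT, NN), NP),    # both DT and NN are needed to infer the NP
]


def derivable(goal, edges, axioms=(START,)):
    """Forward-chain over the hyperedges until no new node can be added."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for edge in edges:
            if edge.head not in known and all(t in known for t in edge.tails):
                known.add(edge.head)
                changed = True
    return goal in known


print(derivable(NP, EDGES))      # all three hyperedges available
print(derivable(NP, EDGES[1:]))  # DT observation removed
```

With an ordinary graph edge, either premise alone would suffice to reach the NP; the hyperedge makes the conjunction explicit.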
How to parse tractably IV
• Hypergraph for Bird shot flies
  • (only partial)
[Figure: partial hypergraph, with groups of nodes spanning words 1–2, spanning words 2–3, and spanning words 1–3, leading to the Goal node.]
Grammar:
S → NP VP
VP → V NP
VP → V
NP → N
NP → N N
How to parse tractably V
• The nodes in the hypergraph can be thought of as being arranged in a triangle
• For a sentence of length N, this is the upper right triangle of an N×N matrix
• This matrix is called the parse chart
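As a rough illustration of the chart layout, the Python sketch below enumerates the cells of the upper-right triangle for a sentence of length N, shortest spans first. It assumes positions are numbered between words (0 before the first word, N after the last), so a cell `(start, end)` sits in row `start` and column `end`; the function name is hypothetical.

```python
def chart_cells(n):
    """Yield (start, end) for every span of an n-word sentence, shortest spans first."""
    for width in range(1, n + 1):
        for start in range(n - width + 1):
            yield start, start + width


def show_triangle(n):
    """Render the chart as text: row = start position, column = end position."""
    lines = []
    for start in range(n):
        row = [
            f"({start},{end})" if end > start else "     "
            for end in range(1, n + 1)
        ]
        lines.append(" ".join(row))
    return "\n".join(lines)


cells = list(chart_cells(5))
print(len(cells))          # N(N+1)/2 cells for N = 5
print(show_triangle(5))
```

Visiting cells in this order guarantees that every shorter span inside a cell has already been filled when the cell itself is processed.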
How to parse tractably VI
• Before we study examples of parsing, let’s linger on the hypergraph for a moment
• The goal of parsing is to fully interconnect all the evidence (words) and the goal
• This could be done from the bottom up…
• …or from the top down & left to right
• These correspond to different parse strategies
• Today: bottom-up (later: top-down)
Bottom-up (CKY) parsing
• Bottom-up is the most straightforward efficient parsing algorithm to implement
• Known as the Cocke-Kasami-Younger (CKY) algorithm
• We’ll illustrate it for the weighted CFG instance
  • Each rule has a weight (negative log-probability) associated with it
  • We’re looking for the “lightest” (lowest-weight or, equivalently, highest-probability) tree T for sentence S
  • Implicitly this is Bayes’ rule!
CKY parsing II
• Here’s the (partial) grammar we’ll use (each rule is preceded by its weight):
1 S → NP VP
6 S → Vst NP
2 S → S PP
1 VP → V NP
2 VP → VP PP
1 NP → Det N
2 NP → NP PP
3 NP → NP NP
0 PP → P NP
3 NP → time
4 NP → flies
4 VP → flies
3 Vst → time
2 P → like
5 V → like
1 Det → an
8 N → arrow
• The sentence we’ll parse (see the ambiguity?):
Time flies like an arrow
Imperative verb: “Do the dishes!”
Chart for: 0 time 1 flies 2 like 3 an 4 arrow 5 (row = start position of a span, column = end position)
• Start with the lexical categories and their weights on the diagonal:
  • 0–1 (time): NP 3, Vst 3
  • 1–2 (flies): NP 4, VP 4
  • 2–3 (like): P 2, V 5
  • 3–4 (an): Det 1
  • 4–5 (arrow): N 8
• Cell 0–2 (time flies), combining 0–1 with 1–2:
  • NP 10 (3 NP → NP NP, plus NP 3 and NP 4)
  • S 8 (1 S → NP VP, plus NP 3 and VP 4)
  • S 13 (6 S → Vst NP, plus Vst 3 and NP 4)
• Cell 3–5 (an arrow), combining 3–4 with 4–5:
  • NP 10 (1 NP → Det N, plus Det 1 and N 8)
• Cell 2–5 (like an arrow), combining 2–3 with 3–5:
  • PP 12 (0 PP → P NP, plus P 2 and NP 10)
  • VP 16 (1 VP → V NP, plus V 5 and NP 10)
• Cell 1–5 (flies like an arrow), combining 1–2 with 2–5:
  • NP 18 (2 NP → NP PP, plus NP 4 and PP 12)
  • S 21 (1 S → NP VP, plus NP 4 and VP 16)
  • VP 18 (2 VP → VP PP, plus VP 4 and PP 12)
• Cell 0–5 (time flies like an arrow):
  • Combining 0–1 with 1–5:
    • NP 24 (3 NP → NP NP, plus NP 3 and NP 18)
    • S 22 (1 S → NP VP, plus NP 3 and VP 18)
    • S 27 (6 S → Vst NP, plus Vst 3 and NP 18)
  • Combining 0–2 with 2–5:
    • NP 24 (2 NP → NP PP, plus NP 10 and PP 12)
    • S 27 (1 S → NP VP, plus NP 10 and VP 16)
    • S 22 (2 S → S PP, plus S 8 and PP 12)
    • S 27 (2 S → S PP, plus S 13 and PP 12)
• The complete chart (all other cells are empty):
  • 0–1: NP 3, Vst 3
  • 0–2: NP 10, S 8, S 13
  • 0–5: NP 24, S 22, S 27, NP 24, S 27, S 22, S 27
  • 1–2: NP 4, VP 4
  • 1–5: NP 18, S 21, VP 18
  • 2–3: P 2, V 5
  • 2–5: PP 12, VP 16
  • 3–4: Det 1
  • 3–5: NP 10
  • 4–5: N 8
Follow backpointers … starting from S in cell 0–5:
• S (0–5) → NP (0–1) VP (1–5)
• VP (1–5) → VP (1–2) PP (2–5)
• PP (2–5) → P (2–3) NP (3–5)
• NP (3–5) → Det (3–4) N (4–5)
Which entries do we need?
• Not worth keeping (“inferior stock”): a heavier duplicate of a category within a cell, e.g. S 13 next to S 8 in cell 0–2 …
• … since it just breeds worse options
• Keep only best-in-class! (and backpointers so you can recover the parse)
• The chart with only the lightest entry per category in each cell:
  • 0–1: NP 3, Vst 3
  • 0–2: NP 10, S 8
  • 0–5: NP 24, S 22
  • 1–2: NP 4, VP 4
  • 1–5: NP 18, S 21, VP 18
  • 2–3: P 2, V 5
  • 2–5: PP 12, VP 16
  • 3–4: Det 1
  • 3–5: NP 10
  • 4–5: N 8
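The worked example can be reproduced with a compact weighted CKY sketch in Python. The rule weights and lexical entries are copied from the slides; the data layout, the function names `cky` and `build_tree`, and the tie-breaking (the first derivation found wins when two have equal weight) are assumptions of this sketch rather than anything prescribed in the lecture.

```python
from collections import defaultdict

# Binary rules as (weight, parent, left child, right child), from the slide grammar.
BINARY_RULES = [
    (1, "S", "NP", "VP"), (6, "S", "Vst", "NP"), (2, "S", "S", "PP"),
    (1, "VP", "V", "NP"), (2, "VP", "VP", "PP"),
    (1, "NP", "Det", "N"), (2, "NP", "NP", "PP"), (3, "NP", "NP", "NP"),
    (0, "PP", "P", "NP"),
]
# Lexical entries as word -> [(weight, category)], also from the slides.
LEXICON = {
    "time": [(3, "NP"), (3, "Vst")],
    "flies": [(4, "NP"), (4, "VP")],
    "like": [(2, "P"), (5, "V")],
    "an": [(1, "Det")],
    "arrow": [(8, "N")],
}


def cky(words):
    """Fill the chart bottom-up, keeping only the lightest entry per category and cell."""
    n = len(words)
    best = defaultdict(dict)  # (start, end) -> {category: lowest weight}
    back = {}                 # (start, end, category) -> (split, left cat, right cat)
    for i, word in enumerate(words):
        for weight, cat in LEXICON.get(word, []):
            best[i, i + 1][cat] = weight
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                left, right = best[start, split], best[split, end]
                for weight, parent, lcat, rcat in BINARY_RULES:
                    if lcat in left and rcat in right:
                        total = weight + left[lcat] + right[rcat]
                        # Strict "<": on ties the earlier derivation is kept.
                        if total < best[start, end].get(parent, float("inf")):
                            best[start, end][parent] = total
                            back[start, end, parent] = (split, lcat, rcat)
    return best, back


def build_tree(back, words, start, end, cat):
    """Follow backpointers to recover the lightest tree as a bracketed string."""
    if (start, end, cat) not in back:  # no backpointer: a lexical entry
        return f"({cat} {words[start]})"
    split, lcat, rcat = back[start, end, cat]
    left = build_tree(back, words, start, split, lcat)
    right = build_tree(back, words, split, end, rcat)
    return f"({cat} {left} {right})"


words = "time flies like an arrow".split()
best, back = cky(words)
print(best[0, 5])
print(build_tree(back, words, 0, len(words), "S"))
```

Note that the unpruned chart contains two derivations of S with weight 22 in cell 0–5; this sketch keeps whichever it finds first, which here is the one built with S → NP VP.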
Computational complexity of parsing
• This approach has good space complexity
  • O(GN²), where G is the number of categories in the grammar
• What is the time complexity of the algorithm?
  • It’s cubic in N…why?
• What about time complexity in G?
  • First, a clarification is in order
  • CFG rules can have right-hand sides of arbitrary length: X → α
  • But CKY works only with right-hand sides of max length 2
• So we need to convert the CFG for use with CKY
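One way to get a feel for the cubic dependence on N is to count how many (start, split, end) combinations a bottom-up chart parser of this kind has to consider. The Python sketch below does only that counting; it is an illustration under the assumption of binary rules and a width-by-width traversal, and the function name is hypothetical.

```python
from math import comb


def count_combinations(n):
    """Number of (start, split, end) triples visited for an n-word sentence."""
    return sum(
        1
        for width in range(2, n + 1)
        for start in range(n - width + 1)
        for split in range(start + 1, start + width)
    )


for n in (5, 10, 20, 40):
    # Choosing start < split < end among n + 1 positions gives C(n + 1, 3) triples.
    print(n, count_combinations(n), comb(n + 1, 3))
```

Each triple is then checked against the binary rules, so the total work is roughly this count times the number of rules.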
Computational complexity II
• Any CFG can be transformed into a new CFG whose rules are at most binary-branching (|α| ≤ 2)
  • (Look up Chomsky normal form in the book for an example)
• This transformation is reversible with no loss of information
• It’s also possible to similarly transform weighted CFGs
• This makes CKY possible, and it is cubic in G
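A minimal sketch of one such transformation for weighted rules is given below in Python. It only shortens long right-hand sides (it does not remove unary or empty rules, which full Chomsky normal form would also require). The naming scheme for the intermediate categories and the choice to put the whole original weight on the topmost rule are assumptions of this sketch, not details from the lecture.

```python
def binarize(rules):
    """Turn (weight, parent, children) rules into rules with at most two children.

    A rule X -> A B C D becomes
        X -> A X|B_C_D,   X|B_C_D -> B X|C_D,   X|C_D -> C D
    The original weight stays on the first new rule; the rest get weight 0,
    so every tree keeps its total weight.
    """
    out = []
    for weight, parent, children in rules:
        children = list(children)
        head, w = parent, weight
        while len(children) > 2:
            # Hypothetical naming: parent plus the children still to be generated.
            new_cat = f"{parent}|{'_'.join(children[1:])}"
            out.append((w, head, [children[0], new_cat]))
            head, children, w = new_cat, children[1:], 0
        out.append((w, head, children))
    return out


example = [
    (4, "VP", ["V", "NP", "PP", "PP"]),   # illustrative long rule, not from the slides
    (1, "S", ["NP", "VP"]),
]
for weight, parent, children in binarize(example):
    print(weight, parent, "->", " ".join(children))
```

Because every intermediate category name contains "|", a parse produced with the binarized grammar can be mapped back to the original rules by splicing those nodes out.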