Chapter 9: Parsing with Context-Free Grammars Heshaam Faili [email protected] University of...
-
Upload
kenzie-fairall -
Category
Documents
-
view
224 -
download
2
Transcript of Chapter 9: Parsing with Context-Free Grammars Heshaam Faili [email protected] University of...
2
Context-Free Grammars Context-Free Grammars are of the form:
A , where is a string of terminals and/or non-terminals
Note that the regular grammars are a proper subset of the context-free grammars.
This means that every regular grammar is context-free, but there are context-free grammars that aren’t regular
CFGs only specify what trees look like, not how they should be computationally derived We need to parse the sentences
3
Parsing Intro Input: a string Output: a (single) parse tree
A useful step in the process of obtaining meaning We can view the problem as searching through all
possible parses (tree structures) to find the right one Strategies
Top-Down (goal-directed) vs. Bottom-Up (data-directed)
Breadth-First vs. Depth-First Adding Bottom-Up to Top-Down: Left-Corner Parsing
Example Book that flight!
5
Top-Down Parsing Expand rules, starting with S and working
down to leaves Replace the left-most non-terminal with each of
its possible expansions. The full search is on p. 361, Fig. 10.3
While we guarantee that any parse in progress will be S-rooted, it will expand non-terminals that can’t lead to the existing input e.g., 5 of 6 trees in third ply = level of the
search space None of the trees take the properties of the
lexical items into account until the last stage
6
Top-down (breadth-first) parsing
S
S
NP VP
S
VP
S
Aux VPNP
S
NP VP
Det Nom
S
NP VP
PropN
S
NP VP
Det Nom
Aux
S
NP VPAux
PropN
S
VP
V NP
S
VP
V
7
Expansion techniques Breadth-First Expansion (shown in figure)
All the nodes at each level are expanded once before going to the next (lower) level.
This is memory intensive when many grammar rules are involved
Depth-First (shown on p. 367, Fig. 10.7) Expand a particular node at a level, only
considering an alternate node at that level if the parser fails as a result of the earlier expansion
i.e., expand the tree all the way down until you can’t expand any more
8
Top-down (depth-first) parsing
S
NP VP
S
S
NP VPAuxDoes
S
NP VP
Det Nom
DoesAux
this
S
NP VP
Det NomDoes
FAIL
Does this flight include a meal ?
9
Top-Down Depth-First Parsing There are still some choices that have to be made:
1. Which leaf node should be expanded first? Left-to-right strategy moves through the leaf nodes in a
left-to-right fashion 2. Which rule should be applied first?
There are multiple NP rules; which should be used first? Can just use the textual order of rules from the grammar
There may be reasons to take rules in a particular order (e.g., probabilities)
10
Parsing with an agenda Search states are kept in an agenda
Search states consist of partial trees and a pointer to the next input word in the sentence
Based on what we’ve seen before, apply the next item on the agenda to the current tree
Add new items to (the front of) the agenda, based on the rules in the grammar which can expand at the (leftmost) node
We maintain the depth-first strategy by adding new hypotheses (rules) to the front of the agenda
If we added them to the back, we would have a breadth-first strategy
See figure 10.6 pg. 366E
11
Bottom-Up Parsing Bottom-Up Parsing is input-driven start from the
words and move up to form a tree Here we match one or more nodes on the upper
fringe of the parse tree against the right-hand side of a CFG rule, building the left-hand side as a parent node of those nodes.
We can also have breadth-first and depth-first approaches The example on the next slide (p. 362, Fig. 10.4) moves in
a breadth-first fashion While any parse in progress will be tied to the
input, many may not lead to an S! e.g., left-most trees in plies 1-4 of Fig 10.4
12
Book that flight
Book that flight
Noun Det Noun
Book that flight
Verb Det Noun
Book that flight
Noun Det Noun
NOM NOM
Book that flight
Verb Det Noun
NOM
Book that flight
Verb Det Noun
NOM
VP NP
Book that flight
Noun Det Noun
NOMNOM
NP
Book that flight
Verb Det Noun
NOM
NP
Book that flight
Verb Det Noun
NOMVP
Bottom-up parsing
Book that flight
Verb Det Noun
NOM
NP
VP
13
Comparing Top-Down and Bottom-Up Parsing Top-Down:
While we guarantee that any parse in progress will be S-rooted, it will expand non-terminals that can’t lead to the existing input, e.g., first 4 trees in third ply.
Bottom-Up: While any parse in progress will be tied to the
input, many may not lead to an S, e.g., left-most trees in plies 1-4 of p. 362, Fig 10.4.
So, both pure top-down and pure bottom up approaches are highly inefficient.
14
Left-Corner Parsing Motivation:
Both pure top-down and bottom-up approaches are inefficient
The correct top-down parse has to be consistent with the left-most word of the input
Left-corner parsing: a way of using bottom-up constraints as part of a top-down strategy.
Left-corner rule: expand a node with a grammar rule only if the current input can serve as the left corner from this rule.
Left-corner from a rule: first word along the left edge of a derivation from the rule
Put the left-corners into a table, which can then guide parsing
15
Left-Corner Example
S NP VPS VPS Aux NP VPNP Det Nominal | ProperNounNominal Noun Nominal |
NounVP Verb | Verb NPNoun book | flight | meal |
moneyVerb book | include | preferAux doesProperNoun Houston | TWA
Left Corners S => NP …=> Det,
ProperNoun VP => Verb Aux … => Aux NP => Det,
ProperNounVP => VerbNominal => Noun
16
Other problems: Left-Recursion Left-corner parsers still guided by top-down parsing Consider rules like:
S S and SNP NP PP A top-down left-to-right depth-first parser could apply a rule to
expand a node (e.g., S), and then apply that same rule again, and again, ad infinitum.
Left Recursion: A grammar is left-recursive if a non-terminal leads to a derivation that includes itself as its leftmost immediate or non-immediate child (i.e., along its leftmost branch).
PROBLEM: Top-Down parsers may not terminate on a left-recursive grammar
17
Other problems: Repeated Parsing of Subtrees When parser backtracks to an alternative expansion
of a non-terminal, it loses all parses of subconstituents that it built.
There is a good chance that it will rebuild the parses of some of those constituents again.
This can occur repeatedly. a flight from Indianopolis to Houston on TWA
NP Det Nom Will build an NP for a flight, before failing when the parser
realizes the input PPs aren’t covered NP NP PP
Will again build an NP for a flight, before failing when the parser realizes the two remaining PPs in the input aren’t covered
18
Other problems: Ambiguity Repeated parsing of subtrees is even more of a
problem for ambiguous sentences PP attachment:
NP or VP: I shot an elephant in my pajamas. NP bracketing: the meal [on flight 286] [from SF] [to
Denver] Coordination:
[old men] and women vs. old [men and women]
3 kinds of ambiguities: attachment, coordination, noun-phrase bracketing.
Parsers have to disambiguate between lots of valid parses or return all parses
Will repeat a lot of work parsing the commonalities of each ambiguity
20
Addressing the problems: Chart Parsing More or less a standard method for carrying
out parsing; keeps tables of constituents that have been parsed earlier, so it doesn’t reduplicate the work.
Each possible sub-tree is represented exactly once.
This makes it a form of dynamic programming (which we saw with minimum edit distance and the Viterbi algorithm)
Combines bottom-up and top-down parsing Rather simple and elegant in the way it works!
21
Earley Chart Parsing Representation
The parser uses a representation for parse state based on dotted rules. S NP VP
Dotted rules distinguish what has been seen so far from what has not been seen (i.e., the remainder).
The constituents seen so far are to the left of the dot in the rule, the remainder is to the right.
Parse information is stored in a chart, represented as a graph.
The nodes represent word positions. The labels represent the portion (using the dot notation)
of the grammar rule that spans that word position. In other words, at each position, there is a set of labels
(each of which is a dotted rule, also called a state), indicating the partial parse tree produced until then.
22
Example: Chart for A Dog Given a trivial grammar
NP D ND aN dog
Here’s the chart for the complete parse of “a dog”[0] D a [1] (scan)
[1] N dog [2] (scan)
[ 0] NP D N [0] (predict)
[0] NP D N [1] (complete)
[0] NP D N [2] (complete)
23
More Chart Parsing Terminology A state is complete if it has a dot at the right-
hand side of its rule. Otherwise, it is incomplete. At each position, there is a list (actually, a
queue) of states. The parser moves through the N+1 sets of states in the
chart left-to-right, processing the states in each set in order.
States will be stored in a FIFO (first-in first-out) queue at each start position
The processing applies one of three operators, each of which takes a state and produces new states added to the chart.
Scanner, Predictor, Completer There is no backtracking.
24
Earley Parsing Algorithm The parsing algorithm is just a few lines long,
as can be seen on p. 381, Figure 10.16 In the top level loop, for each position, for
each state, it calls the predictor, or else the scanner, or else the completer. The algorithm never backtracks and never
removes states, so we don’t redo any work The goal is to have S α• as the last chart entry,
i.e. the dot has moved over the entire input to derive an S
26
The 3 Operators: Predictor, Scanner, CompleterProcedure PREDICTOR((AB, [i, j]))For each (B) in grammar do Enqueue((B , [j, j]), chart[j])End
Procedure SCANNER ((AB, [i, j]))If B is a part-of-speech for word[j] then Enqueue((B word[j], [j, j+1]), chart[j+1])
Procedure COMPLETER((B, [j, k]))For each (AB, [i, j]) in chart[j] do Enqueue((A B, [i, k]), chart[k])End
27
PredictionProcedure PREDICTOR((AB, [i, j]))For each (B) in grammar do Enqueue((B , [j, j]), chart[j])End Predicting is the task of saying we kinds of
input we expect to see Add a rule to the chart saying that we have not
seen , but when we do, it will form a B The rule covers no input, so it goes from j to j
Such rules provide the top-down aspect of the algorithm
28
Scanning
Procedure SCANNER ((AB, [i, j]))If B is a part-of-speech for word[j] then Enqueue((B word[j], [j, j+1]), chart[j+1])Scanning reads in lexical items
We add a dotted rule indicating that a word has been seen between j and j+1
This is then added to the following (j+1) chart Such a completed dotted rule can be used to
complete other dotted rules These rules also show how the Earley parser
has a bottom-up component
29
CompletionProcedure COMPLETER((B, [j, k]))For each (AB, [i, j]) in chart[j] do Enqueue((A B, [i, k]), chart[k])End Completion combines two rules in order to move
the dot, i.e., indicate that something has been seen A rule covering B has been seen, so any rule A which refers
to B in its RHS moves the dot Instead of spanning from i to j, A now spans from i to k,
which is where B ended Once the dot is moved, the rule will not be created
again
35
Earley parsing The Earley algorithm is efficient,
running in polynomial time. Technically, however, it is a recognizer,
not a parser To make it a parser, each state needs to be
augmented with a pointer to the states that its rule covers
For example, a VP would point to the state where its V was completed and the state where its NP was completed
36
Other Dynamic Programming methods
CYK (Cocke-Kasami-Younger) Parser Using CNF grammar rules
Chart Parsing Modified version of Earley parsing
with dynamic ordering of states in the algorithm
37
CYK Parsing The DP method by using CNF grammar
ABC Am
Any CFG can be converted to CNF, So, don’t loss anything … AB : unit productions (can be rewrited by A for any
A) Like other DP methods, a simple (n+1)*(n+1) matrix
used to encode the structure of the sentence (n: sentence length)
Indexed is the gap between words [0 Book 1 that 2 flight 3 ]
[i,j] : is a set of non-terminals that represent all the constituents that span positions i through j of the input
38
CYK Parsing, cont,d Since our grammar is in CNF, the non-
terminal entries in the table have exactly two daughters in the parse.
for each constituent represented by an entry [i, j] in the table there must be a position in the input, k, where it can be split into two parts such that i < k < j. Given such a k, the first constituent [i,k]
must lie to the left of entry [i, j] somewhere along row i, and the second entry [k, j] must lie beneath it, along column j
43
CYK in practice Does not have major problem
theoretically The resulted parse tree are not
consistent to syntacticians…(because of CNF formal)
Syntax to Semantic approach complicated …
Post-processing needed to return-back the result to more acceptable form
44
Chart Parser In both the CKY and Earley algorithms, the
order in which events occur (adding entries to the table, reading words, making predictions, etc.) is statically determined by the procedures that make up these algorithms.
Unfortunately, dynamically determining the order in which events occur based on the current information is often necessary
Chart Parsing facilitates just such dynamic determination of the order in which chart entries are processed.
Using Agenda
45
Chart Parser fundamental rule: generalized the ideas
in CYK and Earley: if the chart contains two edges A → α • B β ,
[i, j] and B → γ •, [ j,k] then we should add the new edge A →α B • β [i,k] to the chart
Prediction can be top-down of botton-up
48
Inadequacies of parsing with plain CFGs
While the Earley algorithm works well for CFGs, we have to at some point question the validity of using plain CFGs
We’ll show this by looking at two phenomena (although, there are many more): Subject-verb agreement Subcategorization frames
49
Modeling Subject-Verb Agreement in CFGsThe flights leave vs. The flight leaves.S 3sgNP 3sgVPS PluralNP PluralVP3sgVP 3sgVerb # flies
| 3sgVerb NP # wants + a flight| 3sgVerb NP PP # leaves + Boston + in the morning| 3sgVerb PP # leaves + on Thursday
3sgNP Pronoun # I| ProperNoun # Denver| Det 3sgNominal # a + flight
3sgNominal Noun 3sgNominal # morning + flight | 3sgNoun #flight
50
Problems with Modeling Agreement in CFGs You can see how messy this is, resulting in a
massive increase in the size of the grammar. Of course, once we add in determiner-noun agreement
(e.g., “a flight” vs. “(the) flights”), it would get even larger.
Other languages which have gender agreement (e.g., French) will make it even worse.
Furthermore, we miss generalizations: all transitive verbs have an NP object, regardless of whether the verb is 3rd singular or not
We will need to go to feature-based grammars to address these problems.
51
Subcategorization Frames in CFGs V1. eat, sleep I want to eat V2. NP prefer, find, leave Find [NP the flight from Pittsburgh to
Boston] V3. NP NP show, give, find Show [NP me] [NP the airlines with
flights from Pittsburgh] V4. PPfrom PPto fly, travel I would like to fly [PP from Boston]
[PP to Philadelphia] V5. NP PPwith help, load Can you help [NP me] [PP with a
flight] V6. VPto prefer, want, need I would prefer [VP to go by
United Airlines] V7. VPbare_stem can, would, might I can [VP go from Boston] V8. V_S mean, imply Does this mean [S American has a hub in
Boston]
52
CFG Grammar For Subcategorization VP V1
| V2 NP| V3 NP NP| V4 PPfrom PPto| V5 NP PPwith|V6 VPto|V7 VPbare_stem|V8 S
V1 eat | sleep,
53
Problem with Modeling Subcat in CFGs Again, this results in an explosion in the
number of rules, especially when a full set of subcategorization frames is included. If we combine these rules with the agreement
rules, it gets even worse Also, nouns, adjectives, and prepositions can
also subcategorize for complements. And again, we have no way to state what’s
in common about these rules So, we turn to feature-based grammars