CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 29: CYK; Inside Probability; Parse Tree Construction)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
22nd March, 2011
Penn POS Tags
[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]
• John wrote those words in the Book of Proverbs.
Penn Treebank
(S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in
(NP (NP-TTL (NP the Book)(PP of (NP Proverbs)))
• John wrote those words in the Book of Proverbs.
PSG Parse Tree
Official trading in the shares will start in Paris on Nov 6.
[Figure: phrase-structure parse tree of the sentence. Node labels in the figure: S, NP, VP, N, AP, A, PP, P, V, Aux.]
Penn POS Tags
[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
• Official trading in the shares will start in Paris on Nov 6.
Penn POS Tag Set
Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PRP
Proper Noun: NNP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP
CYK Parsing
(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)
Shared Sub-Problems
Observation: ambiguous parses still share sub-trees
We don’t want to redo work that’s already been done
Unfortunately, naïve backtracking leads to duplicate work
Shared Sub-Problems: Example
Efficient Parsing
Dynamic programming to the rescue!
Intuition: store partial results in tables, thereby:
• Avoiding repeated work on shared sub-problems
• Efficiently storing ambiguous structures with shared sub-parts
Two algorithms:
• CKY: roughly, bottom-up
• Earley: roughly, top-down
CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules, i.e. Chomsky Normal Form.
All rules are of the form A → B C or A → a.
What does the tree look like? (Example rules: A → B C and D → w.)
What if my CFG isn’t in CNF?
CKY Parsing with Arbitrary CFGs
Problem: my grammar has rules like VP → NP PP PP
• Can’t apply CKY!
Solution: rewrite the grammar into CNF
• Introduce new intermediate non-terminals into the grammar
• A → B C D becomes A → X D and X → B C (where X is a symbol that doesn’t occur anywhere else in the grammar)
What does this mean?
• Weak equivalence: the rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
• But the resulting derivations (trees) are different
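The rewriting step for over-long rules can be sketched as follows. This is a minimal illustration, not a full CNF converter: it only splits rules with more than two right-hand-side symbols, and it assumes that the fresh names X1, X2, ... (a hypothetical naming scheme) do not already occur in the grammar. Unit rules and ε-rules would need separate handling.

```python
from itertools import count


def binarize(rules):
    """Split every rule with more than two RHS symbols into binary rules.

    `rules` is a list of (lhs, rhs_tuple) pairs. Fresh non-terminals are
    named X1, X2, ... (assumed not to clash with existing symbols).
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"
            # Fold the two leftmost symbols into a fresh non-terminal:
            # A -> B C D   becomes   X -> B C   and   A -> X D
            out.append((new_sym, rhs[:2]))
            rhs = (new_sym,) + rhs[2:]
        out.append((lhs, rhs))
    return out


# The problematic rule from the slide: VP -> NP PP PP
cnf_rules = binarize([("VP", ("NP", "PP", "PP"))])
for lhs, rhs in cnf_rules:
    print(lhs, "->", " ".join(rhs))
```

The new grammar generates the same strings, but every tree built with it contains the extra X nodes, which is exactly the weak (not strong) equivalence noted above.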
CKY Parsing: Intuition
Consider the rule D → w
• Terminal (word) forms a constituent
• Trivial to apply
Consider the rule A → B C
• If there is an A somewhere in the input then there must be a B followed by a C in the input
• First, precisely define span [ i, j ]
• If A spans from i to j in the input then there must be some k such that i<k<j, with B spanning [ i, k ] and C spanning [ k, j ]
• Easy to apply: we just need to try different values for k
[Figure: a span from i to j with split point k]
CKY Parsing: Table
Any constituent can conceivably span [ i, j ] for all 0≤i<j≤N, where N = length of input string
• We need an N × N table to keep track of all spans…
• But we only need half of the table
Semantics of table: cell [ i, j ] contains A iff A spans i to j in the input string
• Of course, must be allowed by the grammar!
CKY Parsing: Table-Filling
In order for A to span [ i, j ]:
• A → B C is a rule in the grammar, and
• There must be a B in [ i, k ] and a C in [ k, j ] for some i<k<j
Operationally:
• To apply rule A → B C, look for a B in [ i, k ] and a C in [ k, j ]
• In the table: look left in the row and down in the column
CKY Algorithm
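The table-filling procedure described above can be sketched as a recognizer. This is a minimal sketch assuming a strictly CNF grammar; the function name `cky_recognize`, the dictionary layout, and the small grammar fragment are choices made for this illustration only.

```python
from collections import defaultdict

# A small CNF fragment used only for illustration (an assumption of this sketch).
LEXICAL = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"}, "sprayed": {"VBD"}}
BINARY = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("VBD", "NP"): {"VP"}}


def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `start` spans the whole input.

    lexical: word -> set of A with a rule A -> word
    binary:  (B, C) -> set of A with a rule A -> B C
    table[(i, j)] holds every non-terminal that spans words[i:j].
    """
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):            # fill the table column by column
        # Length-1 span [j-1, j]: apply rules of the form D -> w.
        table[(j - 1, j)] |= lexical.get(words[j - 1], set())
        # Longer spans ending at j, moving up the column (i decreasing).
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):    # try every split point i < k < j
                for b in table[(i, k)]:          # look left in the row
                    for c in table[(k, j)]:      # look down in the column
                        table[(i, j)] |= binary.get((b, c), set())
    return start in table[(0, n)]


accepted = cky_recognize("the gunman sprayed the building".split(), LEXICAL, BINARY)
rejected = cky_recognize("gunman the sprayed".split(), LEXICAL, BINARY)
print(accepted, rejected)
```

Filling column by column and moving up within each column guarantees that both cells a rule needs, [ i, k ] and [ k, j ], are complete before cell [ i, j ] is computed.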
CKY Parsing: Recognize or Parse
Is this really a parser?
Recognizer to parser: add backpointers!
CKY: Algorithmic Complexity
What’s the asymptotic complexity of CKY?
O(n³), where n is the length of the input string
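A rough count behind the cubic bound, assuming a CNF grammar: each cell [ i, j ] is filled by trying every split point k with i<k<j, so the work is proportional to the number of triples (i, k, j). The symbol |G| (the number of binary rules tried per triple) is introduced here for illustration and does not appear on the slides.

```latex
\#\{(i,k,j) : 0 \le i < k < j \le n\}
  = \binom{n+1}{3}
  = \frac{(n+1)\,n\,(n-1)}{6}
  = O(n^3),
\qquad
T(n) = O\!\left(n^3 \cdot |G|\right).
```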
CKY: Analysis
Since it’s bottom-up, CKY populates the table with a lot of “phantom constituents”
• Spans that are constituents, but cannot really occur in the context in which they are suggested
Conversion of grammar to CNF adds additional non-terminal nodes
• Leads to weak equivalence wrt original grammar
• Additional non-terminal nodes not (linguistically) meaningful: but can be cleaned up with post-processing
Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?
Yes: Earley Parsing
Penn Treebank
( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start
(PP-LOC in (NP Paris))
(PP-TMP on (NP (NP Nov 6)
• Official trading in the shares will start in Paris on Nov 6.
Probabilistic Context Free Grammars
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
Example Parse t1
The gunman sprayed the building with bullets.
(S[1.0] (NP[0.5] (DT[1.0] The) (NN[0.5] gunman))
        (VP[0.6] (VP[0.4] (VBD[1.0] sprayed)
                          (NP[0.5] (DT[1.0] the) (NN[0.5] building)))
                 (PP[1.0] (P[1.0] with)
                          (NP[0.3] (NNS[1.0] bullets)))))
P (t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045
Another Parse t2
The gunman sprayed the building with bullets.
(S[1.0] (NP[0.5] (DT[1.0] The) (NN[0.5] gunman))
        (VP[0.4] (VBD[1.0] sprayed)
                 (NP[0.2] (NP[0.5] (DT[1.0] the) (NN[0.5] building))
                          (PP[1.0] (P[1.0] with)
                                   (NP[0.3] (NNS[1.0] bullets))))))
P (t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
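The probability of a parse tree under a PCFG is the product of the probabilities of the rules used in it. A minimal sketch of that computation for the two parses follows. The nested-tuple tree encoding and the names `RULE_PROB` and `tree_prob` are choices made for this illustration; the rule P → with 1.0 is an assumption, since it is not in the listed grammar but is implied by the P node with probability 1.0 over "with" in both trees.

```python
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumed: implied by the trees, not in the rule list
}


def tree_prob(tree):
    """Multiply the rule probabilities of a tree given as (label, child, ...).

    A child is either a subtree (tuple) or a word (str).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p


subj = ("NP", ("DT", "the"), ("NN", "gunman"))
obj = ("NP", ("DT", "the"), ("NN", "building"))
pp = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

t1 = ("S", subj, ("VP", ("VP", ("VBD", "sprayed"), obj), pp))  # PP attached to VP
t2 = ("S", subj, ("VP", ("VBD", "sprayed"), ("NP", obj, pp)))  # PP attached to NP

p1, p2 = tree_prob(t1), tree_prob(t2)
print(f"P(t1) = {p1:.4f}, P(t2) = {p2:.4f}")
```

Multiplying the fourteen factors gives 0.0045 for t1 and 0.0015 for t2, so under this grammar the verb-attachment reading t1 is the more probable parse.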
Illustrating the CYK [Cocke, Younger, Kasami] Algorithm
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
• DT → the 1.0
• NN → gunman 0.5
• NN → building 0.5
• VBD → sprayed 1.0
• NNS → bullets 1.0
CYK: Start with (0,1)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  |     |     |     |     |     |
    1     | --- |     |     |     |     |     |
    2     | --- | --- |     |     |     |     |
    3     | --- | --- | --- |     |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK: Keep filling diagonals
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  |     |     |     |     |     |
    1     | --- | NN  |     |     |     |     |
    2     | --- | --- |     |     |     |     |
    3     | --- | --- | --- |     |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK: Try getting higher-level structures
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  |     |     |     |     |
    1     | --- | NN  |     |     |     |     |
    2     | --- | --- |     |     |     |     |
    3     | --- | --- | --- |     |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK: Diagonal continues
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  |     |     |     |     |
    1     | --- | NN  |     |     |     |     |
    2     | --- | --- | VBD |     |     |     |
    3     | --- | --- | --- |     |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- |     |     |     |
    1     | --- | NN  | --- |     |     |     |
    2     | --- | --- | VBD |     |     |     |
    3     | --- | --- | --- |     |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- |     |     |     |
    1     | --- | NN  | --- |     |     |     |
    2     | --- | --- | VBD |     |     |     |
    3     | --- | --- | --- | DT  |     |     |
    4     | --- | --- | --- | --- |     |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- |     |     |
    1     | --- | NN  | --- | --- |     |     |
    2     | --- | --- | VBD | --- |     |     |
    3     | --- | --- | --- | DT  |     |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK: starts filling the 5th column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- |     |     |
    1     | --- | NN  | --- | --- |     |     |
    2     | --- | --- | VBD | --- |     |     |
    3     | --- | --- | --- | DT  | NP  |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- |     |     |
    1     | --- | NN  | --- | --- |     |     |
    2     | --- | --- | VBD | --- | VP  |     |
    3     | --- | --- | --- | DT  | NP  |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- |     |     |
    1     | --- | NN  | --- | --- | --- |     |
    2     | --- | --- | VBD | --- | VP  |     |
    3     | --- | --- | --- | DT  | NP  |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK: S found in (0,5), but NO termination!
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   |     |
    1     | --- | NN  | --- | --- | --- |     |
    2     | --- | --- | VBD | --- | VP  |     |
    3     | --- | --- | --- | DT  | NP  |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- |     |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   |     |
    1     | --- | NN  | --- | --- | --- |     |
    2     | --- | --- | VBD | --- | VP  |     |
    3     | --- | --- | --- | DT  | NP  |     |
    4     | --- | --- | --- | --- | NN  |     |
    5     | --- | --- | --- | --- | --- | P   |
    6     | --- | --- | --- | --- | --- | --- |
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- |
    2     | --- | --- | VBD | --- | VP  | --- |
    3     | --- | --- | --- | DT  | NP  | --- |
    4     | --- | --- | --- | --- | NN  | --- |
    5     | --- | --- | --- | --- | --- | P   |
    6     | --- | --- | --- | --- | --- | --- |
CYK: Control moves to last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- |
    2     | --- | --- | VBD | --- | VP  | --- |
    3     | --- | --- | --- | DT  | NP  | --- |
    4     | --- | --- | --- | --- | NN  | --- |
    5     | --- | --- | --- | --- | --- | P   |
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- |
    2     | --- | --- | VBD | --- | VP  | --- |
    3     | --- | --- | --- | DT  | NP  | --- |
    4     | --- | --- | --- | --- | NN  | --- |
    5     | --- | --- | --- | --- | --- | P   | PP
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- |
    2     | --- | --- | VBD | --- | VP  | --- |
    3     | --- | --- | --- | DT  | NP  | --- | NP
    4     | --- | --- | --- | --- | NN  | --- | ---
    5     | --- | --- | --- | --- | --- | P   | PP
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- |
    2     | --- | --- | VBD | --- | VP  | --- | VP
    3     | --- | --- | --- | DT  | NP  | --- | NP
    4     | --- | --- | --- | --- | NN  | --- | ---
    5     | --- | --- | --- | --- | --- | P   | PP
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
CYK: filling the last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- |
    1     | --- | NN  | --- | --- | --- | --- | ---
    2     | --- | --- | VBD | --- | VP  | --- | VP
    3     | --- | --- | --- | DT  | NP  | --- | NP
    4     | --- | --- | --- | --- | NN  | --- | ---
    5     | --- | --- | --- | --- | --- | P   | PP
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
CYK: terminates with S in (0,7)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From \ To |  1  |  2  |  3  |  4  |  5  |  6  |  7
    0     | DT  | NP  | --- | --- | S   | --- | S
    1     | --- | NN  | --- | --- | --- | --- | ---
    2     | --- | --- | VBD | --- | VP  | --- | VP
    3     | --- | --- | --- | DT  | NP  | --- | NP
    4     | --- | --- | --- | --- | NN  | --- | ---
    5     | --- | --- | --- | --- | --- | P   | PP
    6     | --- | --- | --- | --- | --- | --- | NP, NNS
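The same chart can carry probabilities instead of bare labels. The sketch below computes, for each cell, the inside probability of every non-terminal (the total probability of all ways it can derive that span), using the same column-by-column fill order. It is an illustration under stated assumptions: the names `inside_chart`, `LEXICAL`, `UNARY` and `BINARY` are made up for this example, the rule P → with 1.0 is assumed (it is implied by the parse trees but not listed in the grammar), and the single unary rule NP → NNS is handled directly at the word level rather than by a general unary closure.

```python
from collections import defaultdict

LEXICAL = {  # word -> {A: P(A -> word)}
    "the": {"DT": 1.0}, "gunman": {"NN": 0.5}, "building": {"NN": 0.5},
    "sprayed": {"VBD": 1.0}, "bullets": {"NNS": 1.0},
    "with": {"P": 1.0},  # assumed rule, not in the listed grammar
}
UNARY = {"NNS": {"NP": 0.3}}  # child -> {A: P(A -> child)}
BINARY = {  # (B, C) -> {A: P(A -> B C)}
    ("NP", "VP"): {"S": 1.0}, ("DT", "NN"): {"NP": 0.5},
    ("NP", "PP"): {"NP": 0.2}, ("P", "NP"): {"PP": 1.0},
    ("VP", "PP"): {"VP": 0.6}, ("VBD", "NP"): {"VP": 0.4},
}


def inside_chart(words):
    """Return beta with beta[(i, j)][A] = inside probability of A over words[i:j]."""
    n = len(words)
    beta = defaultdict(lambda: defaultdict(float))
    for j in range(1, n + 1):
        cell = beta[(j - 1, j)]
        for tag, p in LEXICAL.get(words[j - 1], {}).items():
            cell[tag] += p
            # The only unary rule rewrites a preterminal, so one pass suffices.
            for parent, q in UNARY.get(tag, {}).items():
                cell[parent] += q * p
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, pb in list(beta[(i, k)].items()):
                    for c, pc in list(beta[(k, j)].items()):
                        for a, q in BINARY.get((b, c), {}).items():
                            # Sum over all rules and split points.
                            beta[(i, j)][a] += q * pb * pc
    return beta


beta = inside_chart("the gunman sprayed the building with bullets".split())
sentence_prob = beta[(0, 7)]["S"]
print(f"inside probability of S over (0,7): {sentence_prob:.4f}")
```

For this sentence the value in (0,7) is 0.006, the sum of the probabilities of the two parses of the sentence (0.0045 + 0.0015). Replacing the sum with a max, and recording which rule and split point achieved it, would give the most probable parse instead.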
CYK: Extracting the Parse Tree
The parse tree is obtained by keeping back pointers.
S (0-7)
  NP (0-2)
    DT (0-1) The
    NN (1-2) gunman
  VP (2-7)
    VBD (2-3) sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4) the
        NN (4-5) building
      PP (5-7)
        P (5-6) with
        NP (6-7)
          NNS (6-7) bullets
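Backpointer bookkeeping can be sketched as below. This is a minimal illustration, not the lecture's own implementation: the names `cky_parse` and `build_tree` and the dictionary layout are hypothetical, the rule P → with is assumed, and each (span, label) entry keeps only the first derivation found, which for this sentence happens to be the tree shown above.

```python
LEXICAL = {"the": ["DT"], "gunman": ["NN"], "building": ["NN"],
           "sprayed": ["VBD"], "bullets": ["NNS"],
           "with": ["P"]}  # P -> with is assumed
UNARY = {"NNS": ["NP"]}    # NP -> NNS
BINARY = {("NP", "VP"): ["S"], ("DT", "NN"): ["NP"], ("NP", "PP"): ["NP"],
          ("P", "NP"): ["PP"], ("VP", "PP"): ["VP"], ("VBD", "NP"): ["VP"]}


def cky_parse(words, lexical, unary, binary):
    """Fill back[(i, j, A)] with how A over words[i:j] was first built."""
    n = len(words)
    back = {}
    for j in range(1, n + 1):
        w = words[j - 1]
        for tag in lexical.get(w, ()):
            back[(j - 1, j, tag)] = w                 # A -> word
            for parent in unary.get(tag, ()):
                back[(j - 1, j, parent)] = (tag,)     # A -> B, same span
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for (b, c), parents in binary.items():
                    if (i, k, b) in back and (k, j, c) in back:
                        for a in parents:
                            # Keep only the first derivation found.
                            back.setdefault((i, j, a), (k, b, c))
    return back


def build_tree(back, i, j, label):
    """Follow backpointers from (i, j, label) and return a bracketed tree."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):        # lexical rule
        return f"({label} {entry})"
    if len(entry) == 1:               # unary rule over the same span
        return f"({label} {build_tree(back, i, j, entry[0])})"
    k, b, c = entry                   # binary rule, split at k
    return f"({label} {build_tree(back, i, k, b)} {build_tree(back, k, j, c)})"


words = "the gunman sprayed the building with bullets".split()
back = cky_parse(words, LEXICAL, UNARY, BINARY)
tree = build_tree(back, 0, len(words), "S")
print(tree)
```

Because VP over (2,7) can be built in two ways, storing a single backpointer silently discards one reading. Keeping a list of backpointers per entry would retain both parses.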