Chapter 12: Lexicalized and Probabilistic Parsing
Guoqiang Shan
University of Arizona
November 30, 2006
Probabilistic Context-Free Grammars: The Intuition
Goal: find the "correct" parse for ambiguous sentences, e.g. "Can you book TWA flights?" or "The flights include a book."
Definition of Context-Free Grammar: a 4-tuple G = (N, Σ, P, S)
N: a finite set of non-terminal symbols
Σ: a finite set of terminal symbols, where N ∩ Σ = ∅
P: a set of productions A → β, where A ∈ N and β ∈ (N ∪ Σ)*
S: the start symbol, S ∈ N
Definition of Probabilistic Context-Free Grammar: a 5-tuple G = (N, Σ, P, S, D)
D: a function P → [0, 1] that assigns a probability to each rule in P
Rules are written A → β [p], where p = D(A → β), e.g. A → a B [0.6], B → C D [0.3]
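As an illustration (a sketch added here, not part of the original slides), the rule-probability function D can be represented directly in Python:

```python
# Minimal sketch of a PCFG's rule-probability function D.
# The two rules below are the toy examples from the slide:
#   A -> a B [0.6],  B -> C D [0.3]
RULES = {
    ("A", ("a", "B")): 0.6,   # D(A -> a B) = 0.6
    ("B", ("C", "D")): 0.3,   # D(B -> C D) = 0.3
}

def D(lhs, rhs):
    """D: P -> [0, 1]; returns 0.0 for rules not in the grammar."""
    return RULES.get((lhs, tuple(rhs)), 0.0)
```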
PCFG Example

S → NP VP [.80]
S → Aux NP VP [.15]
S → VP [.05]
NP → Det Nom [.20]
NP → ProperN [.35]
NP → Nom [.05]
NP → Pronoun [.40]
Nom → Noun [.75]
Nom → Noun Nom [.20]
Nom → ProperN Nom [.05]
VP → Verb [.55]
VP → Verb NP [.40]
VP → Verb NP NP [.05]

Det → that [.50]
Det → the [.80]
Det → a [.15]
Noun → book [.10]
Noun → flights [.50]
Noun → meal [.40]
Verb → book [.30]
Verb → include [.30]
Verb → want [.40]
Aux → can [.40]
Aux → does [.30]
Aux → do [.30]
ProperN → TWA [.40]
ProperN → Denver [.60]
Pronoun → you [.40]
Pronoun → I [.60]
Probability of a Sentence in a PCFG

The probability of a parse tree T of sentence S:
P(T, S) = Π D(r(n)), the product taken over every node n of T, where r(n) is the rule used to expand n.
Since P(T, S) = P(T) × P(S | T), and a parse tree T uniquely corresponds to a sentence S (so P(S | T) = 1), we have P(T) = P(T, S).
The probability of a sentence: P(S) = Σ P(T) over T ∈ τ(S), the set of all parse trees of S. In particular, for an unambiguous sentence, P(S) = P(T).
Example
P(Tl) = 0.15 × 0.40 × 0.05 × 0.05 × 0.35 × 0.75 × 0.40 × 0.40 × 0.30 × 0.40 × 0.50 = 3.78 × 10^-7
P(Tr) = 0.15 × 0.40 × 0.40 × 0.05 × 0.05 × 0.75 × 0.40 × 0.40 × 0.30 × 0.40 × 0.50 = 4.32 × 10^-7
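These two products can be checked directly (a quick verification added here, using the rule probabilities listed on the slide):

```python
# Each parse probability is the product of the rule probabilities D(r(n))
# used in the corresponding tree (values taken from the slide).
from math import prod

p_left  = prod([0.15, 0.40, 0.05, 0.05, 0.35, 0.75,
                0.40, 0.40, 0.30, 0.40, 0.50])
p_right = prod([0.15, 0.40, 0.40, 0.05, 0.05, 0.75,
                0.40, 0.40, 0.30, 0.40, 0.50])
# p_left is about 3.78e-7 and p_right about 4.32e-7,
# so the right-hand tree is the more probable parse.
```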
Probabilistic CYK Parsing of PCFGs

A bottom-up approach.
Dynamic programming: fill a table of partial solutions to the sub-problems until it contains the solutions to the entire problem.
Input: a grammar in CNF (ε-free, every production of the form A → β with β a single terminal, or A → B C), and n words w1, w2, …, wn.
Data structures:
Π[i, j, A]: the maximum probability of a constituent with non-terminal A spanning j words starting at wi
β[i, j, A] = {k, B, C}: a back-pointer recording that rule A → B C was used and that B spans the first k words from wi (needed to rebuild the parse tree)
Output: the probability of the maximum-probability parse is Π[1, n, S]; the root of the parse tree is S, spanning the entire string.
Base case: consider input strings of length one. By the lexical rules A → wi, initialize Π[i, 1, A] = D(A → wi).
Recursive case: for a string of j > 1 words starting at wi to be derived from A, there must exist some rule A → B C and some split point k, 0 < k < j, such that B derives the first k words (already known) and C derives the remaining j − k words starting at wi+k (already known). The probability of the span is obtained by multiplying the rule probability by the two known probabilities:
Π[i, j, A] = max over rules A → B C and splits k of D(A → B C) × Π[i, k, B] × Π[i+k, j−k, C]
If more than one rule A → B C (or split point k) applies, pick the one that maximizes the probability.
CYK Algorithm
(Pseudocode figure omitted in this text version; it initializes Π from the lexical rules and fills the table bottom-up, recording back-pointers {k, B, C}.)
My implementation is on lectura, under the directory /home/shan/538share/pcyk.c
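The C source itself is not reproduced in these slides; the following is a hedged Python sketch of the same algorithm (my reconstruction, not the author's pcyk.c), for a CNF grammar given as lexical rules A → w and binary rules A → B C:

```python
from collections import defaultdict

def pcyk(words, lexical, binary):
    """Probabilistic CYK (sketch).
    lexical: dict word -> list of (A, prob)   for rules A -> w
    binary:  list of (A, B, C, prob)          for rules A -> B C
    pi[(i, j, A)]  = max probability of A spanning j words starting at word i
    back[(i, j, A)] = (k, B, C), the back-pointer used to rebuild the tree.
    """
    n = len(words)
    pi = defaultdict(float)              # missing entries read as 0.0
    back = {}
    for i, w in enumerate(words, start=1):   # base case: spans of length 1
        for A, p in lexical.get(w, []):
            pi[(i, 1, A)] = p
    for j in range(2, n + 1):                # span length
        for i in range(1, n - j + 2):        # start position
            for k in range(1, j):            # split: B spans the first k words
                for A, B, C, p in binary:
                    cand = p * pi[(i, k, B)] * pi[(i + k, j - k, C)]
                    if cand > pi[(i, j, A)]:
                        pi[(i, j, A)] = cand
                        back[(i, j, A)] = (k, B, C)
    return pi, back

def tree(words, back, i, j, A):
    """Rebuild the best parse tree from the back-pointers."""
    if j == 1:
        return (A, words[i - 1])
    k, B, C = back[(i, j, A)]
    return (A, tree(words, back, i, k, B),
               tree(words, back, i + k, j - k, C))
```

With the CNF grammar derived later in these slides, pi[(1, 5, "S")] for "can you book TWA flights" comes out to about 4.32 × 10^-7, matching P(Tr) above.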
PCFG Example Revisited, to Rewrite into CNF
(The same grammar as listed above.)
Example (CYK Parsing): Rewriting the Grammar as CNF

S → NP VP [.8]
(S → Aux NP VP [.15]) becomes: S → Aux NV [.15], NV → NP VP [1.0]
(S → VP [.05]) becomes: S → book [.00825], S → include [.00825], S → want [.011], S → Verb NP [.02], S → Verb DNP [.0025]
NP → Det Nom [.2]
(NP → ProperN [.35]) becomes: NP → TWA [.14], NP → Denver [.21]
(NP → Nom [.05]) becomes: NP → book [.00375], NP → flights [.01875], NP → meal [.015], NP → Noun Nom [.01], NP → ProperN Nom [.0025]
(NP → Pronoun [.4]) becomes: NP → you [.16], NP → I [.24]
(Nom → Noun [.75]) becomes: Nom → book [.075], Nom → flights [.375], Nom → meal [.3]
Nom → Noun Nom [.2]
Nom → ProperN Nom [.05]
(VP → Verb [.55]) becomes: VP → book [.165], VP → include [.165], VP → want [.22]
VP → Verb NP [.4]
(VP → Verb NP NP [.05]) becomes: VP → Verb DNP [.05], DNP → NP NP [1.0]
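The collapsed probabilities come from multiplying along the eliminated unit-production chains. A quick arithmetic check (added here; values from the grammar above):

```python
# Unit-production elimination folds chain probabilities together:
# S -> VP [.05], VP -> Verb [.55], Verb -> book [.3]        => S -> book [.00825]
s_book  = 0.05 * 0.55 * 0.30
# NP -> ProperN [.35], ProperN -> TWA [.4]                  => NP -> TWA [.14]
np_twa  = 0.35 * 0.40
# NP -> Nom [.05], Nom -> Noun [.75], Noun -> flights [.5]  => NP -> flights [.01875]
np_flts = 0.05 * 0.75 * 0.50
```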
Example (CYK Parsing): the Π matrix after the base case (spans of length 1)
Cell [i, j] holds the constituents spanning j words starting at word i of "can you book TWA flights".

[1, 1] can:     Aux .4
[2, 1] you:     Pronoun .4, NP .16
[3, 1] book:    Noun .1, Verb .3, VP .165, Nom .075, NP .00375, S .00825
[4, 1] TWA:     ProperN .4, NP .14
[5, 1] flights: Noun .5, Nom .375, NP .01875
Example (CYK Parsing): the Π matrix after spans of length 2

[1, 2] can you:     (none)
[2, 2] you book:    S .02112, NV .0264, DNP .0006
[3, 2] book TWA:    S .00084, VP .0168, DNP .000525
[4, 2] TWA flights: NP .000375, Nom .0075, DNP .002625
(The length-1 entries remain as before.)
Example (CYK Parsing): the Π matrix after spans of length 3

[1, 3] can you book:     S .001584
[2, 3] you book TWA:     S .0021504, NV .002688
[3, 3] book TWA flights: S .00000225, NP .0000075, Nom .00015, VP .000045, DNP .000001406
(The shorter spans remain as before.)
Example (CYK Parsing): the Π matrix after spans of length 4

[1, 4] can you book TWA:     S .00016128
[2, 4] you book TWA flights: S .00000576, NV .0000072, DNP .0000012
(The shorter spans remain as before.)
Example (CYK Parsing): the completed Π matrix

[1, 5] can you book TWA flights: S .000000432
(All shorter spans remain as before.) The best parse probability is Π[1, 5, S] = 4.32 × 10^-7.
Example (CYK Parsing): the β (back-pointer) matrix

[1, 1]: N/A (lexical entry)
[1, 3], [1, 4], [1, 5]: S → Aux NV, k = 1
[2, 2]: S → NP VP, k = 1; NV → NP VP, k = 1; DNP → NP NP, k = 1
[2, 3]: S → NP VP, k = 1; NV → NP VP, k = 1
[2, 4]: S → NP VP, k = 1; NV → NP VP, k = 1; DNP → NP NP, k = 1
[3, 2]: S → Verb NP, k = 1; VP → Verb NP, k = 1; DNP → NP NP, k = 1
[3, 3]: S → Verb NP, k = 1; NP → Noun Nom, k = 1; Nom → Noun Nom, k = 1; VP → Verb NP, k = 1; DNP → NP NP, k = 1
[4, 2]: NP → ProperN Nom, k = 1; Nom → ProperN Nom, k = 1; DNP → NP NP, k = 1

Sentence: can you book TWA flights
PCFG Problems: The Independence Assumption

Assumption: the expansion of one non-terminal is independent of the expansion of others.
However, examination shows that how a node expands depends on its location in the tree.
91% of subjects are pronouns:
  "She's able to take her baby to work with her." (91%)
  "Uh, my wife worked until we had a family." (9%)
But only 34% of objects are pronouns:
  "Some laws absolutely prohibit it." (34%)
  "All the people signed confessions." (66%)
PCFG Problems: Lack of Lexical Sensitivity

Lexical information in a PCFG can only be represented via the probabilities of pre-terminal nodes (such as Verb, Noun, Det).
However, lexical information and dependencies turn out to be important in modeling syntactic probabilities.
Example: "Moscow sent more than 100,000 soldiers into Afghanistan."
In a PCFG, "into Afghanistan" may attach to the NP ("more than 100,000 soldiers") or to the VP ("sent").
Corpus statistics show NP attachment in 67% (or 52%, depending on the study) of cases, so a PCFG prefers NP attachment here and produces an incorrect result. Why? The verb "send" subcategorizes for a destination, which can be expressed with the preposition "into"; in fact, when the verb is "send", "into" always attaches to the verb.
PCFG Problems: Coordination Ambiguity

Consider the phrase "dogs in houses and cats". Semantically, "dogs" is a better conjunct for "cats" than "houses" is.
Thus the parse [dogs in [NP houses and cats]] intuitively sounds unnatural and should be dispreferred.
However, a PCFG assigns both parses the same probability, since the two structures use exactly the same rules.