Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by...
-
Upload
mercy-oneal -
Category
Documents
-
view
227 -
download
0
Transcript of Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by...
Introduction to Syntactic Parsing
Roxana Girju
November 18, 2004
Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Overview
An introduction to the parsing problem
Context free grammars (CFDs)
A brief(!) sketch of the syntax of English
Examples of ambiguous structures
PCFGs, their formal properties
Weaknesses of PCFGs
Heads in CFGs
Chart parsing – algorithm and an example
Syntactic Parsing• Syntax: provides rules to put together words to form
components of sentence and to put together these components to form sentences.
• Knowledge of syntax is useful for:
1. Parsing
2. QA
3. IE
4. Translation, etc.• Grammar: is the formal specification of rules of
a language.• Parsing/Syntactic Parsing: is a method to perform
syntactic analysis of a sentence.
Parsing (Syntactic Structure)
INPUT:Boeing is located in Seattle.
OUTPUT:S
NP
N
Boeing
VP
V
is
VP
V
located
PP
P
in
NP
N
Seattle
Data for Parsing Experiments
Canadian Utilities had 1988 revenue of
NNP NNPSVBD
CDNN
NP
IN
C$ 1.16 billion ,
$ CD CD PUNC,
QP
NP
PP
NP
mainly
RB
ADVP
from its
IN
PRP$
natural gas
JJNN
and electric utility businessesin
CC JJ NN NNS
NP
IN
Alberta , where
NNP PUNC, WHADVP
NP
WRB
the company serves about 800,000 customers .
DTNN
NP
VBZ
RB CD
QP NNS PUNC.
NP
VP
S
SBAR
NP
PP
NP
PP
VP
Penn WSJ Treebank = 50,000 sentences with associated trees
Usual set-up: 40,000 training sentences, 2400 test sentences
An example tree:TOP
S
NP
Canadian Utilities had 1988 revenue of C$ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .
The Information Conveyed by Parse Trees
1) Part of speech (POS) for each word(N/NN = noun, V = verb, D/DT = determiner, P/IN=preposition)
S
NP
D
theburglar
N
VP
V
robbed
NP
D
the apartment
N
2) Phrases S
NP
DT
theburglar
N
VP
V
robbed DT
NP
the apartment
Noun
Phrases (NP): “the burglar”, “the apartment” Verb
Phrases (VP): “robbed the apartment”
Sentences (S): “the burglar robbed the apartment”
N
3) Useful Relationships
SNP
subject V
VP
verb
S
NP
DT
theburglar
N
VP
V
robbed
NP
DT
the apartment
=>“the burglar” is the subject of “robbed”
N
An Example Application: Machine Translation
English word order is
Japanese word order is
subject – verb – object
subject – object – verb
English: Japanese:
IBM bought LotusIBM Lotus bought
English: Japanese:
Sources said that IBM bought Lotus yesterdaySources yesterday IBM Lotus bought that said
1
23
4
5
7
6
NP => NN 8
NOTE: VI/VT=VB
DERIVATION S
RULES USED
DERIVATION S
RULES USED S=>NP VP
NP VP
DERIVATION S
RULES USED S=>NP VPNP=>DT NNP VP
DT N VP
The Problem with Parsing: Ambiguity
INPUT:She announced a program to promote safety in trucks and vans
+
POSSIBLE OUTPUTS:
S
N P
She
V P
a nnounc e d N P
N P
a pr ogr a m
V P
to pr om ote N P
sa fe ty PP
in N P
truc ks a nd va ns
S
N P
She
V P
a nnounc e d N P
N P
N P
a pr ogr a m
V P
to pr om ote N P
sa fe ty PP
in N P
truc ks
a nd N P
va ns
S
N P
She
V P
a nnounc e d N P
N P
a pr ogr a m
V P
to pr om ote N P
N P a nd
sa fe ty PP
in N P
truc ks
N P
va ns
S
N P
She
V P
a nnounc e d N P
N P
a pr ogr a m
V P
to pr om ote N P PP
sa fe ty
in N P
truc ks a nd va ns
S
N P
She
V P
a nnounc e d N P
N P
N P
a pr ogr a m
V P
to pr om ote N P PP
sa fe ty in N P
truc ks
a nd N P
va ns
S
N P
She
V P
a nnounc e d N P
N P
N P V P
a pr ogr a m
to pr om ote N P
sa fe ty
PP
in N P
truc ks a nd va ns
And there are more...
9101112
13
VP
VP
Vt
drove
PP
down the street
PP
in the car
VP
Vt
drove
PP
down NP
theN
street PP
in the car
NP
D
the
N
N
JJ
fast
N
NN
car
N
NNmechanic
PP
IN
under
NP
D
the
N
N
NNpigeon in
PP
IN NP
DN
theNN box
NP
D
the
N
N
N
JJ
fast
N
NN
car
N
NNmechanic
PP
IN
under
NP
D
theN
N
NNpigeon
PP
IN
in
NP
DN
theNN box
Sources of Ambiguity: Noun Premodifiers
fast NN
car
N
NN
Noun premodifiers:᷃
NP
NPD
the
N
JJN
mechanic
D
the
N
N
JJN
fastNN car
N
NNmechanic
A Funny Thing about the Penn Treebank
Leaves NP premodifier structure flat, or underspecified:
NP
DT
the
JJ
fast
NN
car mechanic
NN
NP
NP
DT
the
JJ
fast
NN
car mechanic
NN
PP
IN
under
NP
DT
the pigeon
NN