Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by...

49
Introductio n to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Transcript of Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by...

Page 1: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Introduction to Syntactic Parsing

Roxana Girju

November 18, 2004

Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Page 2: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Overview

An introduction to the parsing problem

Context free grammars (CFDs)

A brief(!) sketch of the syntax of English

Examples of ambiguous structures

PCFGs, their formal properties

Weaknesses of PCFGs

Heads in CFGs

Chart parsing – algorithm and an example

Page 3: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Syntactic Parsing• Syntax: provides rules to put together words to form

components of sentence and to put together these components to form sentences.

• Knowledge of syntax is useful for:

1. Parsing

2. QA

3. IE

4. Translation, etc.• Grammar: is the formal specification of rules of

a language.• Parsing/Syntactic Parsing: is a method to perform

syntactic analysis of a sentence.

Page 4: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Parsing (Syntactic Structure)

INPUT:Boeing is located in Seattle.

OUTPUT:S

NP

N

Boeing

VP

V

is

VP

V

located

PP

P

in

NP

N

Seattle

Page 5: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Data for Parsing Experiments

Canadian Utilities had 1988 revenue of

NNP NNPSVBD

CDNN

NP

IN

C$ 1.16 billion ,

$ CD CD PUNC,

QP

NP

PP

NP

mainly

RB

ADVP

from its

IN

PRP$

natural gas

JJNN

and electric utility businessesin

CC JJ NN NNS

NP

IN

Alberta , where

NNP PUNC, WHADVP

NP

WRB

the company serves about 800,000 customers .

DTNN

NP

VBZ

RB CD

QP NNS PUNC.

NP

VP

S

SBAR

NP

PP

NP

PP

VP

Penn WSJ Treebank = 50,000 sentences with associated trees

Usual set-up: 40,000 training sentences, 2400 test sentences

An example tree:TOP

S

NP

Canadian Utilities had 1988 revenue of C$ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .

Page 6: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

The Information Conveyed by Parse Trees

1) Part of speech (POS) for each word(N/NN = noun, V = verb, D/DT = determiner, P/IN=preposition)

S

NP

D

theburglar

N

VP

V

robbed

NP

D

the apartment

N

Page 7: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

2) Phrases S

NP

DT

theburglar

N

VP

V

robbed DT

NP

the apartment

Noun

Phrases (NP): “the burglar”, “the apartment” Verb

Phrases (VP): “robbed the apartment”

Sentences (S): “the burglar robbed the apartment”

N

Page 8: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

3) Useful Relationships

SNP

subject V

VP

verb

S

NP

DT

theburglar

N

VP

V

robbed

NP

DT

the apartment

=>“the burglar” is the subject of “robbed”

N

Page 9: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

An Example Application: Machine Translation

English word order is

Japanese word order is

subject – verb – object

subject – object – verb

English: Japanese:

IBM bought LotusIBM Lotus bought

English: Japanese:

Sources said that IBM bought Lotus yesterdaySources yesterday IBM Lotus bought that said

Page 10: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 11: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

1

23

4

5

7

6

NP => NN 8

NOTE: VI/VT=VB

Page 12: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 13: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

DERIVATION S

RULES USED

Page 14: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

DERIVATION S

RULES USED S=>NP VP

NP VP

Page 15: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

DERIVATION S

RULES USED S=>NP VPNP=>DT NNP VP

DT N VP

Page 16: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 17: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 18: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 19: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 20: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 21: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 22: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 23: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 24: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 25: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 26: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 27: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 28: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 29: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 30: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 31: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 32: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 33: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 34: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 35: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 36: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 37: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 38: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

The Problem with Parsing: Ambiguity

INPUT:She announced a program to promote safety in trucks and vans

+

POSSIBLE OUTPUTS:

S

N P

She

V P

a nnounc e d N P

N P

a pr ogr a m

V P

to pr om ote N P

sa fe ty PP

in N P

truc ks a nd va ns

S

N P

She

V P

a nnounc e d N P

N P

N P

a pr ogr a m

V P

to pr om ote N P

sa fe ty PP

in N P

truc ks

a nd N P

va ns

S

N P

She

V P

a nnounc e d N P

N P

a pr ogr a m

V P

to pr om ote N P

N P a nd

sa fe ty PP

in N P

truc ks

N P

va ns

S

N P

She

V P

a nnounc e d N P

N P

a pr ogr a m

V P

to pr om ote N P PP

sa fe ty

in N P

truc ks a nd va ns

S

N P

She

V P

a nnounc e d N P

N P

N P

a pr ogr a m

V P

to pr om ote N P PP

sa fe ty in N P

truc ks

a nd N P

va ns

S

N P

She

V P

a nnounc e d N P

N P

N P V P

a pr ogr a m

to pr om ote N P

sa fe ty

PP

in N P

truc ks a nd va ns

And there are more...

Page 39: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 40: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

9101112

Page 41: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 42: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 43: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

13

Page 44: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)
Page 45: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

VP

VP

Vt

drove

PP

down the street

PP

in the car

VP

Vt

drove

PP

down NP

theN

street PP

in the car

Page 46: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

NP

D

the

N

N

JJ

fast

N

NN

car

N

NNmechanic

PP

IN

under

NP

D

the

N

N

NNpigeon in

PP

IN NP

DN

theNN box

Page 47: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

NP

D

the

N

N

N

JJ

fast

N

NN

car

N

NNmechanic

PP

IN

under

NP

D

theN

N

NNpigeon

PP

IN

in

NP

DN

theNN box

Page 48: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

Sources of Ambiguity: Noun Premodifiers

fast NN

car

N

NN

Noun premodifiers:᷃

NP

NPD

the

N

JJN

mechanic

D

the

N

N

JJN

fastNN car

N

NNmechanic

Page 49: Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

A Funny Thing about the Penn Treebank

Leaves NP premodifier structure flat, or underspecified:

NP

DT

the

JJ

fast

NN

car mechanic

NN

NP

NP

DT

the

JJ

fast

NN

car mechanic

NN

PP

IN

under

NP

DT

the pigeon

NN