Probabilistic Parsing


Page 1: Probabilistic Parsing

Probabilistic Parsing

Ling 571, Fei Xia

Week 5: 10/25-10/27/05

Page 2: Probabilistic Parsing

Outline

• Lexicalized CFG (recap)
• Hw5 and Project 2
• Parsing evaluation measures: ParseVal
• Collins’ parser
• TAG
• Parsing summary

Page 3: Probabilistic Parsing

Lexicalized CFG recap

Page 4: Probabilistic Parsing

Important equations

The chain rule:

P(A_1, ..., A_n) = P(A_1) * Π_{i=2..n} P(A_i | A_1, ..., A_{i-1})

Page 5: Probabilistic Parsing

Lexicalized CFG

• Lexicalized rules:

  A(w) → B_1(w_1) ... B_n(w_n)

  e.g., A(h) → L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m)

• Sparse data problem:
  – First generate the head
  – Then generate the unlexicalized rule

Page 6: Probabilistic Parsing

Lexicalized models

P(T, S) = P(lr_1, ..., lr_n)
        = Π_i P(lr_i | lr_1, ..., lr_{i-1})
        = Π_i P(h(r_i), r_i | lr_1, ..., lr_{i-1})
        = Π_i P(h(r_i) | lr_1, ..., lr_{i-1}) * P(r_i | h(r_i), lr_1, ..., lr_{i-1})
        ≈ Π_i P(h(r_i) | lhs(r_i), h(m(r_i))) * P(r_i | lhs(r_i), h(r_i))

where lr_i is the i-th lexicalized rule used in T, r_i is its unlexicalized version, h(r_i) is its head word, lhs(r_i) is its left-hand-side label, and m(r_i) is the rule above it.
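The final decomposition can be read off almost directly as a scoring routine. Below is a minimal sketch, not the course implementation; the flat rule encoding and the two probability tables are assumptions.

    # Minimal sketch of the decomposition above (hypothetical data structures).
    # Each lexicalized rule is encoded as (lhs_label, head_word, unlex_rule, parent_head_word).

    def tree_prob(lex_rules, head_head_prob, head_rule_prob):
        """P(T,S) ~ prod_i P(h(r_i) | lhs(r_i), parent head) * P(r_i | lhs(r_i), h(r_i))."""
        p = 1.0
        for lhs, head, rule, parent_head in lex_rules:
            p *= head_head_prob.get((head, lhs, parent_head), 0.0)  # first generate the head word
            p *= head_rule_prob.get((rule, lhs, head), 0.0)         # then the unlexicalized rule
        return p

    # e.g. for "he likes her" one entry could be ("NP", "he", "NP -> Pron", "likes").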

Page 7: Probabilistic Parsing

An example

• he likes her

P(T, S) = P(likes | S) * P(S → NP VP | S, likes)
        * P(he | NP, likes) * P(NP → Pron | NP, he)
        * P(likes | VP, likes) * P(VP → V NP | VP, likes)
        * P(her | NP, likes) * P(NP → Pron | NP, her)

Page 8: Probabilistic Parsing

An example

• he likes her

P(T, S) = P(likes | TOP) * P(TOP → S | TOP, likes)
        * P(likes | S, likes) * P(S → NP VP | S, likes)
        * P(he | NP, likes) * P(NP → Pron | NP, he)
        * P(likes | VP, likes) * P(VP → V NP | VP, likes)
        * P(her | NP, likes) * P(NP → Pron | NP, her)
        * P(he | Pron, he) * P(Pron → he | Pron, he)
        * P(likes | V, likes) * P(V → likes | V, likes)
        * P(her | Pron, her) * P(Pron → her | Pron, her)

Page 9: Probabilistic Parsing

Head-head probability

P(w_2 | A_2, A_1, w_1) = P(w_2, A_2, A_1, w_1) / P(A_2, A_1, w_1)
                       = C(w_2, A_2, A_1, w_1) / C(A_2, A_1, w_1)
                       ≈ C(X(w_1) → ... A_2(w_2) ...) / Σ_w C(X(w_1) → ... A_2(w) ...)

Example:

P(he | NP, likes) = C(X(likes) → ... NP(he) ...) / Σ_w C(X(likes) → ... NP(w) ...)

Page 10: Probabilistic Parsing

Head-rule probability

P(A → α | A, w) = P(A → α, A, w) / P(A, w)
               = C(A(w) → α) / C(A(w))

Example:

P(NP → Pron | NP, he) = C(NP(he) → Pron(he)) / C(NP(he))

Page 11: Probabilistic Parsing

Estimate parameters

P(A → α | A, w) = C(A(w) → α) / C(A(w))

P(w_2 | A_2, A_1, w_1) ≈ C(X(w_1) → ... A_2(w_2) ...) / Σ_w C(X(w_1) → ... A_2(w) ...)
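Both relative-frequency estimates can be collected in one pass over a treebank. A hedged sketch follows; the rule encoding and helper names are mine, not the course code.

    from collections import defaultdict

    # Hypothetical treebank encoding: each tree is a list of lexicalized rules
    # (lhs_label, head_word, unlex_rule, deps), where deps is a list of (dep_label, dep_head).
    rule_count = defaultdict(int)    # C(A(w) -> alpha)
    lhs_count = defaultdict(int)     # C(A(w))
    dep_count = defaultdict(int)     # C(parent head, dep label, dep head)
    dep_total = defaultdict(int)     # same, summed over dep head

    def count_tree(lex_rules):
        for lhs, head, rule, deps in lex_rules:
            rule_count[(lhs, head, rule)] += 1
            lhs_count[(lhs, head)] += 1
            for dep_label, dep_head in deps:
                dep_count[(head, dep_label, dep_head)] += 1
                dep_total[(head, dep_label)] += 1

    def head_rule_prob(lhs, head, rule):
        # P(A -> alpha | A, w) = C(A(w) -> alpha) / C(A(w))
        return rule_count[(lhs, head, rule)] / max(lhs_count[(lhs, head)], 1)

    def head_head_prob(dep_head, dep_label, parent_head):
        # P(w2 | A2, ..., w1) = C(... A2(w2) ... under w1) / sum_w C(... A2(w) ... under w1)
        return dep_count[(parent_head, dep_label, dep_head)] / max(dep_total[(parent_head, dep_label)], 1)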

Page 12: Probabilistic Parsing

Building a statistical tool

• Design a model:
  – Objective function: generative model vs. discriminative model
  – Decomposition: independence assumptions
  – The types of parameters and the parameter size

• Training: estimate model parameters
  – Supervised vs. unsupervised
  – Smoothing methods

• Decoding: find the most likely tree given the model (dynamic programming, pruning)

Page 13: Probabilistic Parsing

Team Project 1 (Hw5)

• Form a team: programming language, schedule, expertise, etc.

• Understand the lexicalized model

• Design the training algorithm

• Work out the decoding (parsing) algorithm: augment the CYK algorithm (a plain-PCFG sketch follows this list).

• Illustrate the algorithms with a real example.
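As a possible starting point for the decoding step, here is a plain Viterbi-CYK for a PCFG in CNF, before any lexicalization is added; the grammar encoding below is an assumption, not a required format.

    # Viterbi CYK for a PCFG in CNF (sketch; lexicalization would add head-word indices).
    # binary_rules: dict A -> list of (B, C, prob); lexical_rules: dict word -> list of (A, prob).

    def cyk(words, binary_rules, lexical_rules):
        n = len(words)
        best = [[{} for _ in range(n + 1)] for _ in range(n)]   # best[i][j][A] = (prob, backpointer)
        for i, w in enumerate(words):
            for A, p in lexical_rules.get(w, []):
                best[i][i + 1][A] = (p, w)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for A, rhs_list in binary_rules.items():
                        for B, C, p in rhs_list:
                            if B in best[i][k] and C in best[k][j]:
                                prob = p * best[i][k][B][0] * best[k][j][C][0]
                                if prob > best[i][j].get(A, (0.0,))[0]:
                                    best[i][j][A] = (prob, (k, B, C))
        return best[0][n].get("S")   # probability and backpointer of the best S parse, or None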

Page 14: Probabilistic Parsing

Team Project 2

• Task: parse real data with a real grammar extracted from a treebank.

• Parser: PCFG or lexicalized PCFG

• Training data: English Penn Treebank Section 02-21

• Development data: section 00

Page 15: Probabilistic Parsing

Team Project 2 (cont)

• Hw6: extract a PCFG from the treebank
• Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance
• Hw8: improve parsing results
• Hw10: write a report and give a presentation

Page 16: Probabilistic Parsing

Parsing evaluation measures

Page 17: Probabilistic Parsing

Evaluation of parsers: ParseVal

• Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard
• Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output
• Labeled F-measure: harmonic mean of labeled precision and labeled recall

• Complete match: % of sentences where recall and precision are 100%

• Average crossing: # of crossing brackets per sentence
• No crossing: % of sentences which have no crossing brackets

Page 18: Probabilistic Parsing

An example

Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope))))

Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))

Page 19: Probabilistic Parsing

ParseVal measures

• Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• Recall=4/4, Prec=4/5, crossing=0
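A small sketch of how these numbers fall out of the labeled-span lists (illustrative only; EVALB itself applies extra options from its parameter file, such as length cutoffs and label equivalences):

    def parseval(gold, system):
        """gold, system: lists of (label, start, end); start/end are inclusive word positions."""
        matched = len(set(gold) & set(system))
        recall = matched / len(gold)
        precision = matched / len(system)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # A system span crosses a gold span if they overlap but neither contains the other.
        def crosses(a, b):
            return (a[1] < b[1] <= a[2] < b[2]) or (b[1] < a[1] <= b[2] < a[2])
        crossing = sum(1 for s in system if any(crosses(s, g) for g in gold))
        return recall, precision, f1, crossing

    gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
    system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
    print(parseval(gold, system))   # recall 1.0, precision 0.8, crossing 0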

Page 20: Probabilistic Parsing

A different annotation

Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope)))))

Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))

Page 21: Probabilistic Parsing

ParseVal measures (cont)

• Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)

• System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)

• Recall=4/6, Prec=4/6, crossing=1

Page 22: Probabilistic Parsing

EVALB

• A tool that calculates the ParseVal measures
• To run it: evalb –p parameter_file gold_file system_output
• A copy is available in my dropbox
• You will need it for Team Project 2

Page 23: Probabilistic Parsing

Summary of Parsing evaluation measures

• ParseVal is the most widely used; the F-measure is the most important number
• The results depend on the annotation style
• EVALB is a tool that calculates the ParseVal measures
• Other measures are used too, e.g., accuracy of dependency links

Page 24: Probabilistic Parsing

History-based models

Page 25: Probabilistic Parsing

History-based models

• History-based approaches map (T, S) to a decision sequence d_1, ..., d_n

• Probability of tree T for sentence S is:

P(T, S) = P(d_1, ..., d_n)
        = Π_i P(d_i | d_1, ..., d_{i-1})
        = Π_i P(d_i | f(d_1, ..., d_{i-1}))

where f maps the history to an equivalence class (the independence assumption).
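Schematically, the last line says each decision is conditioned only on an equivalence class of the history, computed by some function f. A toy sketch (all names illustrative):

    def derivation_prob(decisions, decision_prob, f):
        """P(T,S) = prod_i P(d_i | f(d_1 ... d_{i-1})), with f truncating / grouping the history."""
        p = 1.0
        history = []
        for d in decisions:
            p *= decision_prob(d, f(history))   # condition only on the equivalence class of the history
            history.append(d)
        return p

    # A PCFG is the special case where f(history) is just the nonterminal being expanded next.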

Page 26: Probabilistic Parsing

History-based models (cont)

• PCFGs can be viewed as a history-based model

• There are other history-based models:
  – Magerman’s parser (1995)
  – Collins’ parsers (1996, 1997, …)
  – Charniak’s parsers (1996, 1997, …)
  – Ratnaparkhi’s parser (1997)

Page 27: Probabilistic Parsing

Collins’ models

• Model 1: Generative model of (Collins, 1996)

• Model 2: Add complement/adjunct distinction

• Model 3: Add wh-movement

Page 28: Probabilistic Parsing

Model 1

• First generate the head constituent label
• Then generate left and right dependents

P(A(h) → L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m) | A, h)
  = P_H(H | A, h)
    * P(L_1(l_1) ... L_n(l_n) | A, H, h)
    * P(R_1(r_1) ... R_m(r_m) | A, H, h, L_1(l_1) ... L_n(l_n))
  ≈ P_H(H | A, h) * P(L_1(l_1) ... L_n(l_n) | A, H, h) * P(R_1(r_1) ... R_m(r_m) | A, H, h)

Page 29: Probabilistic Parsing

Model 1(cont)

P(L_1(l_1) ... L_n(l_n) | A, H, h)
  = Π_i P_L(L_i(l_i) | A, H, h, L_1(l_1) ... L_{i-1}(l_{i-1}))
  ≈ Π_i P_L(L_i(l_i) | A, H, h)

P(R_1(r_1) ... R_m(r_m) | A, H, h)
  = Π_i P_R(R_i(r_i) | A, H, h, R_1(r_1) ... R_{i-1}(r_{i-1}))
  ≈ Π_i P_R(R_i(r_i) | A, H, h)

Page 30: Probabilistic Parsing

An example

Sentence: Last week Marks bought Brooks.

rule: S(bought) → NP(week) NP(Marks) VP(bought)

P(rule | S, bought) = P_H(VP | S, bought)
  * P_L(NP(Marks) | S, VP, bought) * P_L(NP(week) | S, VP, bought) * P_L(STOP | S, VP, bought)
  * P_R(STOP | S, VP, bought)
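A sketch of how the Model 1 factors multiply together for one rule expansion, with STOP generated at each end; the probability tables and the tuple encoding are hypothetical, and Collins’ distance features are omitted.

    STOP = "STOP"

    def model1_rule_prob(parent, head_word, head_label, left_deps, right_deps,
                         p_head, p_left, p_right):
        """P(rule | parent, head_word) = P_H * prod P_L * prod P_R (no distance features)."""
        p = p_head.get((head_label, parent, head_word), 0.0)
        for dep in left_deps + [STOP]:      # dependents listed from the head outwards
            p *= p_left.get((dep, parent, head_label, head_word), 0.0)
        for dep in right_deps + [STOP]:
            p *= p_right.get((dep, parent, head_label, head_word), 0.0)
        return p

    # The slide's rule S(bought) -> NP(week) NP(Marks) VP(bought):
    # model1_rule_prob("S", "bought", "VP",
    #                  left_deps=[("NP", "Marks"), ("NP", "week")], right_deps=[],
    #                  p_head=..., p_left=..., p_right=...)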

Page 31: Probabilistic Parsing

Model 2

• Generate a head label H
• Choose left and right subcat frames
• Generate left and right arguments
• Generate left and right modifiers

Page 32: Probabilistic Parsing

An example

rule: S(bought) → NP(week) NP-C(Marks) VP(bought)

P(rule | S, bought) = P_H(VP | S, bought)
  * P_lc({NP} | S, VP, bought) * P_rc({} | S, VP, bought)
  * P_L(NP-C(Marks) | S, VP, bought, {NP}) * P_L(NP(week) | S, VP, bought, {})
  * P_L(STOP | S, VP, bought, {}) * P_R(STOP | S, VP, bought, {})

Page 33: Probabilistic Parsing

Model 3

• Add traces and wh-movement
• Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
  – Head: S(+gap) → NP VP(+gap)
  – Left: S(+gap) → NP(+gap) VP
  – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)

Page 34: Probabilistic Parsing

Parsing results

LR LP

Model 1 87.4% 88.1%

Model 2 88.1% 88.6%

Model 3 88.1% 88.6%

Page 35: Probabilistic Parsing

Tree Adjoining Grammar (TAG)

Page 36: Probabilistic Parsing

TAG

• TAG basics:

• Extensions of TAG:
  – Lexicalized TAG (LTAG)
  – Synchronous TAG (STAG)
  – Multi-component TAG (MCTAG)
  – …

Page 37: Probabilistic Parsing

TAG basics

• A tree-rewriting formalism (Joshi et al., 1975)

• It can generate mildly context-sensitive languages.

• The primitive elements of a TAG are elementary trees.

• Elementary trees are combined by two operations: substitution and adjoining.

• TAG has been used in:
  – parsing, semantics, discourse, etc.
  – machine translation, summarization, generation, etc.

Page 38: Probabilistic Parsing

Two types of elementary trees

Initial tree: (S NP (VP (V draft) NP))

Auxiliary tree: (VP (ADVP (ADV still)) VP*)

Page 39: Probabilistic Parsing

Substitution operation

Page 40: Probabilistic Parsing

They draft policies

Page 41: Probabilistic Parsing

Adjoining operation

[Schematic: an auxiliary tree rooted in Y, with foot node Y*, is adjoined at a node labelled Y in another tree; the original subtree under Y is re-attached at the foot.]
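A toy sketch of the two operations on bracketed trees, using the slide's "draft" and "still" trees; the list encoding and the "!" / "*" markers are my own conventions, not a standard TAG format.

    # Toy encoding: a tree is [label, child, ...]; leaves are strings.
    # "NP!" marks a substitution site; "VP*" marks the foot node of an auxiliary tree.

    def substitute(tree, site_label, initial):
        """Replace the leftmost substitution site 'site_label!' with an initial tree."""
        done = [False]
        def walk(node):
            if isinstance(node, str):
                if node == site_label + "!" and not done[0]:
                    done[0] = True
                    return initial
                return node
            return [node[0]] + [walk(c) for c in node[1:]]
        return walk(tree)

    def adjoin(tree, node_label, auxiliary):
        """Splice an auxiliary tree in at a node labelled node_label; the original
        subtree is re-attached at the foot node 'node_label*' (assumes one such node)."""
        def attach_at_foot(aux, subtree):
            if aux == node_label + "*":
                return subtree
            if isinstance(aux, str):
                return aux
            return [aux[0]] + [attach_at_foot(c, subtree) for c in aux[1:]]
        if isinstance(tree, str):
            return tree
        if tree[0] == node_label:
            return attach_at_foot(auxiliary, tree)
        return [tree[0]] + [adjoin(c, node_label, auxiliary) for c in tree[1:]]

    # The slide's trees: initial tree for "draft", auxiliary tree for "still".
    draft = ["S", "NP!", ["VP", ["V", "draft"], "NP!"]]
    still = ["VP", ["ADVP", ["ADV", "still"]], "VP*"]
    t = substitute(draft, "NP", ["NP", "they"])     # substitution at the subject NP
    t = substitute(t, "NP", ["NP", "policies"])     # They draft policies
    t = adjoin(t, "VP", still)                      # They still draft policies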

Page 42: Probabilistic Parsing

They still draft policies

Page 43: Probabilistic Parsing

Derivation tree

[Figure: the elementary trees, the derived tree, and the corresponding derivation tree.]

Page 44: Probabilistic Parsing

Derived tree vs. derivation tree

• The mapping is not 1-to-1.
• Finding the best derivation is not the same as finding the best derived tree.

Page 45: Probabilistic Parsing

Wh-movement: What do they draft?

[Figure: elementary trees for “what”, “do” (an auxiliary tree with an S* foot node), “they”, and “draft” (with a co-indexed NP trace), and the derived tree for the question.]

Page 46: Probabilistic Parsing

Long-distance wh-movement: What does John think they draft?

[Figure: auxiliary trees for “does” and “think” (each with an S* foot node), the elementary tree for “draft” with the extracted NP, and the derived tree.]

Page 47: Probabilistic Parsing

Who did you have dinner with?

[Figure: elementary trees for “have” and “who”, an auxiliary tree for “with” (a PP adjoined at VP, with a co-indexed NP), and the derived tree.]

Page 48: Probabilistic Parsing

TAG extensions

• Lexicalized TAG (LTAG)
• Synchronous TAG (STAG)
• Multi-component TAG (MCTAG)
• …

Page 49: Probabilistic Parsing

STAG

• The primitive elements in STAG are elementary tree pairs.

• Used for MT

Page 50: Probabilistic Parsing

Summary of TAG

• A formalism beyond CFG
• Primitive elements are trees, not rules
• Extended domain of locality
• Two operations: substitution and adjoining
• Parsing algorithm: O(n^6)
• Statistical parsers for TAG
• Algorithms for extracting a TAG from treebanks

Page 51: Probabilistic Parsing

Parsing summary

Page 52: Probabilistic Parsing

Types of parsers

• Phrase structure vs. dependency tree
• Statistical vs. rule-based
• Grammar-based or not
• Supervised vs. unsupervised

Our focus:
• Phrase structure
• Mainly statistical
• Mainly grammar-based: CFG, TAG
• Supervised

Page 53: Probabilistic Parsing

Grammars

• Chomsky hierarchy:
  – Unrestricted grammar (type 0)
  – Context-sensitive grammar (type 1)
  – Context-free grammar (type 2)
  – Regular grammar (type 3)
  Human languages are beyond context-free.

• Other formalisms:
  – HPSG, LFG
  – TAG
  – Dependency grammars

Page 54: Probabilistic Parsing

Parsing algorithm for CFG

• Top-down
• Bottom-up
• Top-down with bottom-up filtering
• Earley algorithm
• CYK algorithm
  – Requires the CFG to be in CNF
  – Can be augmented to deal with PCFGs, lexicalized CFGs, etc.
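Since CYK requires CNF, here is a small sketch of the binarization step for long right-hand sides; the rule encoding and the intermediate-symbol naming are assumptions, and unit-rule and terminal handling are omitted.

    def binarize(rules):
        """Split rules with more than two RHS symbols into binary rules
        by introducing intermediate symbols."""
        out = []
        for lhs, rhs in rules:                        # e.g. ("VP", ["V", "NP", "PP"])
            while len(rhs) > 2:
                new_sym = lhs + "|" + "_".join(rhs[1:])   # e.g. "VP|NP_PP"
                out.append((lhs, [rhs[0], new_sym]))
                lhs, rhs = new_sym, rhs[1:]
            out.append((lhs, rhs))
        return out

    print(binarize([("VP", ["V", "NP", "PP"])]))
    # [('VP', ['V', 'VP|NP_PP']), ('VP|NP_PP', ['NP', 'PP'])]
    # For a PCFG, the original rule's probability goes on the first binary rule
    # and the introduced rules get probability 1.0.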

Page 55: Probabilistic Parsing

Extensions of CFG

• PCFG: find the most likely parse trees

• Lexicalized CFG:
  – uses weaker independence assumptions
  – accounts for certain types of lexical and structural dependencies

Page 56: Probabilistic Parsing

Beyond CFG

• History-based models
  – Collins’ parsers

• TAG
  – Tree-rewriting formalism
  – Mildly context-sensitive
  – Many extensions: LTAG, STAG, …

Page 57: Probabilistic Parsing

Statistical approach

• Modeling
  – Choose the objective function
  – Decompose the function:
    • Common equations: joint, conditional, marginal probabilities
    • Independence assumptions

• Training
  – Supervised vs. unsupervised
  – Smoothing

• Decoding
  – Dynamic programming
  – Pruning

Page 58: Probabilistic Parsing

Evaluation of parsers

• Accuracy: ParseVal
• Robustness
• Resources needed
• Efficiency
• Richness

Page 59: Probabilistic Parsing

Other things

• Converting into CNF:
  – CFG
  – PCFG
  – Lexicalized CFG

• Treebank annotation
  – Tagset: syntactic labels, POS tags, function tags, empty categories
  – Format: indentation, brackets