Dynamic Programming for Linear-Time Incremental Parsing


Transcript of Dynamic Programming for Linear-Time Incremental Parsing

Page 1: Dynamic Programming for Linear-Time Incremental Parsing

Dynamic Programming for Linear-Time Incremental Parsing

Liang Huang, Information Sciences Institute

University of Southern California

(Joint work with Kenji Sagae, USC/ICT)

JHU CLSP Seminar September 14, 2010

Page 2: Dynamic Programming for Linear-Time Incremental Parsing

Remembering Fred Jelinek (1932-2010)

Prof. Jelinek hosted my visit and this talk on what turned out to be his last day.

He was very supportive of this work, which is related to his work on structured language models, and I dedicate my work to his memory.

Pages 3-7: Dynamic Programming for Linear-Time Incremental Parsing

Ambiguity and Incrementality

One morning in Africa, I shot an elephant in my pajamas;

how he got into my pajamas I’ll never know.

• NLP is (almost) all about ambiguity resolution

• human beings resolve ambiguity incrementally

Pages 8-12: Dynamic Programming for Linear-Time Incremental Parsing

Ambiguities in Translation

Google translate: carefully slide

Pages 13-14: Dynamic Programming for Linear-Time Incremental Parsing

If you are stolen...

Google translate: Once the theft to the police

Pages 15-16: Dynamic Programming for Linear-Time Incremental Parsing

or even...

clear evidence that NLP is used in real life!

Pages 17-20: Dynamic Programming for Linear-Time Incremental Parsing

Ambiguities in Parsing

I feed cats nearby in the garden ...

• let’s focus on dependency structures for simplicity

• ambiguous attachments of nearby and in

• ambiguity explodes exponentially with sentence length

• must design an efficient (polynomial-time) search algorithm

• typically using dynamic programming (DP), e.g. CKY

Pages 21-23: Dynamic Programming for Linear-Time Incremental Parsing

But full DP is too slow...

I feed cats nearby in the garden ...

• full DP (like CKY) is too slow (cubic-time)

• while human parsing is fast & incremental (linear-time)

• how about incremental parsing then?

• yes, but only with greedy search (accuracy suffers)

• it explores a tiny fraction of possible trees (even w/ beam search)

• can we combine the merits of both approaches?

• a fast, incremental parser with dynamic programming?

• one that explores exponentially many trees in linear time?

Page 24: Dynamic Programming for Linear-Time Incremental Parsing

Linear-Time Incremental DP

• incremental parsing (e.g. shift-reduce): fast (linear-time), but greedy search (Nivre 04; Collins/Roark 04; ...)

• full DP (e.g. CKY): principled search, but slow (cubic-time) (Eisner 96; Collins 99; ...)

• this work: fast shift-reduce parsing with dynamic programming -- principled search in linear time

Pages 25-26: Dynamic Programming for Linear-Time Incremental Parsing

Big Picture

• human + natural languages: psycholinguistics

• computer + natural languages: NLP

• computer + programming languages: compiler theory (LR, LALR, ...)

Pages 27-29: Dynamic Programming for Linear-Time Incremental Parsing

Preview of the Results

• very fast linear-time dynamic programming parser

• best reported dependency accuracy on PTB/CTB

• explores exponentially many trees (and outputs forest)

[scatter plot: parsing time (secs) vs. sentence length for Charniak, Berkeley, MST, and this work]

[log-scale plot: number of trees explored vs. sentence length -- DP: exponential; non-DP beam search: constant]

Page 30: Dynamic Programming for Linear-Time Incremental Parsing

Outline

• Motivation

• Incremental (Shift-Reduce) Parsing

• Dynamic Programming for Incremental Parsing

• Experiments

Pages 31-38: Dynamic Programming for Linear-Time Incremental Parsing

Shift-Reduce Parsing

Example: I feed cats nearby in the garden.

step  action     stack                  queue
0     -          (empty)                I feed cats ...
1     shift      I                      feed cats nearby ...
2     shift      I feed                 cats nearby in ...
3     l-reduce   feed(I)                cats nearby in ...
4     shift      feed(I) cats           nearby in the ...
5a    r-reduce   feed(I, cats)          nearby in the ...
5b    shift      feed(I) cats nearby    in the garden ...

(here feed(I) denotes the tree headed by feed with dependent I)

shift-reduce conflict at step 5: r-reduce (5a) vs. shift (5b)
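To make the trace concrete, here is a minimal Python sketch of the three actions; the function names and the nested-tuple tree representation are illustrative, not the talk's actual implementation.

```python
# A minimal sketch of the shift / l-reduce / r-reduce actions in the
# table above. The nested-tuple tree representation is illustrative.

def shift(stack, queue):
    """Push the next queue word onto the stack."""
    return stack + [queue[0]], queue[1:]

def l_reduce(stack, queue):
    """The second-from-top tree becomes a left dependent of the top tree."""
    s1, s0 = stack[-2], stack[-1]
    return stack[:-2] + [(s0, "<-", s1)], queue

def r_reduce(stack, queue):
    """The top tree becomes a right dependent of the second-from-top tree."""
    s1, s0 = stack[-2], stack[-1]
    return stack[:-2] + [(s1, "->", s0)], queue

# replay steps 1-5a of the trace above
stack, queue = [], "I feed cats nearby in the garden .".split()
for action in (shift, shift, l_reduce, shift, r_reduce):
    stack, queue = action(stack, queue)
print(stack)   # feed heads both I (left) and cats (right), as in step 5a
```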

Page 39: Dynamic Programming for Linear-Time Incremental Parsing

Choosing Parser Actions

• score each action using features f and weights w

• features are drawn from a local window

• abstraction (or signature) of a state -- this inspires DP!

• weights trained by structured perceptron (Collins 02)

... s2 s1 s0 | q0 q1 ...   (← stack | queue →)

features: (s0.w, s0.rc, q0, ...) = (cats, nearby, in, ...)
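A sketch of this linear scoring in Python, continuing the sketch above; the feature templates here are simplified stand-ins for the talk's real templates.

```python
# Sketch of linear action scoring w . f(state, action): features are
# indicator strings drawn from a local window (s0, s1, q0), conjoined
# with the action. Templates are illustrative stand-ins.

def extract_features(stack, queue):
    s0 = stack[-1] if stack else None
    s1 = stack[-2] if len(stack) > 1 else None
    q0 = queue[0] if queue else None
    return (f"s0={s0}", f"s1={s1}", f"q0={q0}", f"s0,q0={s0},{q0}")

def score(weights, feats, action):
    # sum the learned weight of each (feature, action) pair
    return sum(weights.get((f, action), 0.0) for f in feats)
```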

Pages 40-41: Dynamic Programming for Linear-Time Incremental Parsing

Greedy Search

• each state => three new states (shift, l-reduce, r-reduce)

• so the search space is exponential

• greedy search: always pick the best next state
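A sketch of the greedy decoder, reusing the action and scoring helpers from the sketches above; the legality checks are the obvious ones (shift needs a non-empty queue, reduces need two stack items).

```python
# Greedy decoding: follow the single best-scoring legal action at each
# step. Linear-time, but commits to one of exponentially many paths.
# Reuses shift/l_reduce/r_reduce and extract_features/score from above.

ACTIONS = {"shift": shift, "l-reduce": l_reduce, "r-reduce": r_reduce}

def greedy_parse(sentence, weights):
    stack, queue = [], list(sentence)
    while queue or len(stack) > 1:
        legal = [a for a in ACTIONS
                 if (queue if a == "shift" else len(stack) > 1)]
        feats = extract_features(stack, queue)
        best = max(legal, key=lambda a: score(weights, feats, a))
        stack, queue = ACTIONS[best](stack, queue)
    return stack[0]    # the single remaining tree
```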

Page 42: Dynamic Programming for Linear-Time Incremental Parsing

Beam Search

• each state => three new states (shift, l-reduce, r-reduce)

• so the search space is exponential

• beam search: always keep the top-b states
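A sketch of beam search over parser states, again reusing the helpers above; the step count 2n-1 (n shifts, n-1 reduces) is the length of every complete action sequence in this system.

```python
# Beam search: expand every beam state with all legal actions, keep the
# top-b by accumulated score. Still linear-time for fixed b, but only a
# constant number of trees survive. Reuses the helpers sketched above.

import heapq

def beam_parse(sentence, weights, b=8):
    beam = [(0.0, [], list(sentence))]        # (score, stack, queue)
    for _ in range(2 * len(sentence) - 1):    # n shifts + (n-1) reduces
        cands = []
        for sc, stack, queue in beam:
            feats = extract_features(stack, queue)
            if queue:
                cands.append((sc + score(weights, feats, "shift"),
                              *shift(stack, queue)))
            if len(stack) > 1:
                cands.append((sc + score(weights, feats, "l-reduce"),
                              *l_reduce(stack, queue)))
                cands.append((sc + score(weights, feats, "r-reduce"),
                              *r_reduce(stack, queue)))
        beam = heapq.nlargest(b, cands, key=lambda c: c[0])
    return beam[0]                            # best final (score, stack, queue)
```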

Pages 43-47: Dynamic Programming for Linear-Time Incremental Parsing

Dynamic Programming

• each state => three new states (shift, l-reduce, r-reduce)

• key idea of DP: share common subproblems

• merge equivalent states => polynomial space

“graph-structured stack” (Tomita, 1988)

each DP state corresponds to exponentially many non-DP states

[log-scale plot: number of trees explored vs. sentence length -- DP: exponential; non-DP beam search: constant]

Pages 48-60: Dynamic Programming for Linear-Time Incremental Parsing

Merging Equivalent States

• two states are equivalent if they agree on features

• because the same features guarantee the same cost

• back to the shift-reduce conflict after “I feed cats”: the r-reduce branch makes cats a dependent of feed, while the shift branch pushes nearby

• assume features only look at the root of s0: then two states are equivalent if they agree on the root of s0

• as both branches advance, they reach states that agree on s0 (and q0); if features only look at s0 and q0, those states are equivalent and are merged: (local) ambiguity-packing!

• merged states may split again when later reduced; the resulting structure is the graph-structured stack

[diagram: the sh and re branches of the conflict converge into merged states over “... in the garden”, drawn as a graph-structured stack]
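A sketch of the merging step: states are bucketed by a feature signature (here simplified to whatever the features can see, e.g. the root of s0 plus the queue position); within a bucket we keep the best score but remember every predecessor, which is what makes the stack graph-structured. This is an illustration, not the paper's data structures.

```python
# Sketch of DP state merging. A state's signature is what the features
# can see (simplified here); states sharing a signature are merged,
# keeping the best (Viterbi) score but all predecessor links, so the
# merged node can split again on later reduces. Illustrative only.

from collections import defaultdict

def merge(candidates):
    """candidates: iterable of (signature, score, predecessor).
    Returns {signature: (best_score, [predecessors...])}."""
    best = {}
    preds = defaultdict(list)
    for sig, sc, pred in candidates:
        preds[sig].append(pred)          # keep every incoming path
        if sig not in best or sc > best[sig]:
            best[sig] = sc               # keep only the best score
    return {sig: (best[sig], preds[sig]) for sig in best}

# e.g. the two conflict branches end up with the same signature:
merged = merge([(("nearby", 4), -1.2, "branch-5a"),
                (("nearby", 4), -0.7, "branch-5b")])
print(merged)    # {('nearby', 4): (-0.7, ['branch-5a', 'branch-5b'])}
```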

Page 61: Dynamic Programming for Linear-Time Incremental Parsing

Theory: Polynomial-Time DP

• this DP is exact and polynomial-time if features are:

• a) bounded -- for polynomial time

• features can only look at a local window

• b) monotonic -- for correctness (optimal substructure)

• features should draw no more info from trees farther from the stack top than from trees closer to the top

• both are intuitive: a) is always true; b) is almost always true

... s2 s1 s0 | q0 q1 ...   (← stack | queue →)
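One way to picture the monotonicity condition is as a containment requirement on feature templates: the attributes inspected at a deeper stack position must be a subset of those inspected at shallower positions. A toy checker, with made-up attribute names:

```python
# Toy check of the monotonicity condition: the attribute set consulted
# at a deeper stack position must not exceed the set consulted at the
# position above it (s2's ⊆ s1's ⊆ s0's). Attribute names are made up.

def is_monotonic(templates):
    """templates: e.g. {"s0": {"w", "t", "rc"}, "s1": {"w", "t"}, "s2": {"w"}}"""
    chain = ["s0", "s1", "s2"]
    return all(templates.get(deep, set()) <= templates.get(shallow, set())
               for shallow, deep in zip(chain, chain[1:]))

print(is_monotonic({"s0": {"w", "t", "rc"}, "s1": {"w", "t"}, "s2": {"w"}}))  # True
print(is_monotonic({"s0": {"w"}, "s1": {"w", "t"}}))                          # False
```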

Page 62: Dynamic Programming for Linear-Time Incremental Parsing

Theory: Monotonic History

• related: grammar refinement by annotation (Johnson, 1998)

• annotate vertical context history (e.g., parent)

• monotonicity: can’t annotate the grandparent without annotating the parent (otherwise DP would fail)

• our features: left-context history instead of vertical context

• similarly, can’t annotate s2 without annotating s1

• but we can always design a “minimum monotonic superset”

[diagram: vertical context (parent, grandparent) vs. left context on the stack (s0, s1, s2)]

Pages 63-64: Dynamic Programming for Linear-Time Incremental Parsing

Related Work

• Graph-Structured Stack (Tomita 88): Generalized LR

• GSS is just a chart viewed from left to right (e.g. Earley 70)

• this line of work started w/ Lang (1974); stuck since 1990

• b/c an explicit LR table is impossible with modern grammars

• Jelinek (2004) independently rediscovered GSS

• We revived and advanced this line of work in two aspects

• theoretical: implicit LR table based on features

• merge and split on-the-fly; no pre-compilation needed

• monotonic feature functions guarantee correctness (new)

• practical: achieved linear-time performance with pruning

Pages 65-67: Dynamic Programming for Linear-Time Incremental Parsing

Jelinek (2004)

In: M. Johnson, S. Khudanpur, M. Ostendorf, and R. Rosenfeld (eds.): Mathematical Foundations of Speech and Language Processing, 2004

graph-structured stack!

I don’t know anything about this paper...

Pages 68-69: Dynamic Programming for Linear-Time Incremental Parsing

Jelinek (2004)

• structured language model as graph-structured stack

p_SLM(a | has, show)   vs.   p_3gram(a | its, host)

see also (Chelba and Jelinek, 98; 00; Xu, Chelba, Jelinek, 02)

Page 70: Dynamic Programming for Linear-Time Incremental Parsing

Experiments

Page 71: Dynamic Programming for Linear-Time Incremental Parsing

Speed Comparison

• 5 times faster with the same parsing accuracy

[bar chart: time (hours), non-DP vs. DP]

Page 72: Dynamic Programming for Linear-Time Incremental Parsing

Correlation of Search and Parsing

• better search quality <=> better parsing accuracy

[scatter plot: dependency accuracy vs. average model score, DP vs. non-DP]

Page 73: Dynamic Programming for Linear-Time Incremental Parsing

Search Space: Exponential

[log-scale plot: number of trees explored vs. sentence length -- DP: exponential; non-DP: fixed (beam-width)]

Page 74: Dynamic Programming for Linear-Time Incremental Parsing

N-Best / Forest Oracles

[plot: oracle accuracy vs. k -- DP forest oracle (98.15) above DP k-best in forest, above non-DP k-best in beam]

Page 75: Dynamic Programming for Linear-Time Incremental Parsing

Better Search => Better Learning

• DP leads to faster and better learning w/ perceptron

Page 76: Dynamic Programming for Linear-Time Incremental Parsing

Learning Details: Early Updates

• greedy search: update at first error

• beam search: update when gold is pruned (Collins/Roark 04)

• DP search: also update when gold is “merged” (new!)

• b/c we know gold can’t make it to the top again

[plot: learning curves, DP vs. non-DP]
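A self-contained sketch of perceptron learning with early update, on abstract action sequences; the toy prefix features and the beam here are stand-ins for the parser's real states and templates.

```python
# Sketch of structured perceptron with early update: beam-decode the
# action sequence, and the moment the gold prefix falls off the beam,
# promote the gold prefix's features, demote the current best prefix's,
# and stop. Toy prefix features; abstract actions. Illustrative only.

import heapq

ACTIONS = ("shift", "l-reduce", "r-reduce")

def prefix_feats(prefix):
    # toy features: action unigrams and bigrams over the prefix
    fs = [("uni", a) for a in prefix]
    fs += [("bi", x, y) for x, y in zip(prefix, prefix[1:])]
    return fs

def train_early_update(weights, gold, b=4):
    beam = [(0.0, ())]                               # (score, action prefix)
    for t in range(len(gold)):
        cands = [(sum(weights.get(f, 0.0) for f in prefix_feats(p + (a,))),
                  p + (a,))
                 for _, p in beam for a in ACTIONS]
        beam = heapq.nlargest(b, cands, key=lambda c: c[0])
        if gold[: t + 1] not in (p for _, p in beam):
            for f in prefix_feats(gold[: t + 1]):    # promote gold prefix
                weights[f] = weights.get(f, 0.0) + 1.0
            for f in prefix_feats(beam[0][1]):       # demote current best
                weights[f] = weights.get(f, 0.0) - 1.0
            return                                   # early update: stop here
    # gold survived to the end: standard perceptron update (omitted)

# usage: weights = {}; train_early_update(weights, ("shift", "shift", "l-reduce"))
```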

Pages 77-79: Dynamic Programming for Linear-Time Incremental Parsing

Parsing Time vs. Sentence Length

• parsing speed (scatter plot) compared to other parsers

[scatter plot: parsing time (secs) vs. sentence length, with empirical fits -- Charniak O(n^2.5), Berkeley O(n^2.4), MST O(n^2), this work O(n)]

Pages 80-82: Dynamic Programming for Linear-Time Incremental Parsing

Final Results

• much faster than major parsers (even with Python!)

• first linear-time incremental dynamic programming parser

• best reported dependency accuracy on Penn Treebank

                           time (secs)  complexity  trees searched  accuracy
McDonald et al 05 - MST    0.12         O(n^2)      exponential     90.2
Koo et al 08 baseline*     -            O(n^4)      exponential     92.0
Zhang & Clark 08 single    0.11         O(n)        constant        91.4
this work                  0.04         O(n)        exponential     92.1
Charniak 00                0.49         O(n^2.5)    exponential     92.5
Petrov & Klein 07          0.21         O(n^2.4)    exponential     92.4

*at this ACL: Koo & Collins 10: 93.0 with O(n^4)

Page 83: Dynamic Programming for Linear-Time Incremental Parsing

Final Results on Chinese

• also the best parsing accuracy on Chinese

• Penn Chinese Treebank (CTB 5)

• all numbers below use gold-standard POS tags

                            word   non-root   root
Duan et al. 2007            83.9   84.4       73.7
Zhang & Clark 08 (single)   84.3   84.7       76.7
this work                   85.2   85.5       78.3

Pages 84-85: Dynamic Programming for Linear-Time Incremental Parsing

Conclusion

• incremental parsing (e.g. shift-reduce): fast (linear-time), but greedy search

• full dynamic programming (e.g. CKY): principled search, but slow (cubic-time)

• this work: linear-time shift-reduce parsing w/ dynamic programming -- fast and principled

Page 86: Dynamic Programming for Linear-Time Incremental Parsing

Zoom out to Big Picture...

• human + natural languages: psycholinguistics

• computer + programming languages: compiler theory

• computer + natural languages: NLP -- still a long way to go...

Page 87: Dynamic Programming for Linear-Time Incremental Parsing

Thank You

• a general theory of DP for shift-reduce parsing

• as long as features are bounded and monotonic

• fast, accurate DP parser release coming soon:

• http://www.isi.edu/~lhuang

• future work

• adapt to constituency parsing (straightforward)

• other grammar formalisms like CCG and TAG

• integrate POS tagging into the parser

• integrate semantic interpretation

Page 88: Dynamic Programming for Linear-Time Incremental Parsing

How I was invited to give this talk

• Fred attended ACL 2010 in Sweden

• Mark Johnson mentioned this work to him

• Fred saw my co-author Kenji Sagae giving the talk

• but didn’t realize it was Kenji; he thought it was me

• he emailed me (but misspelled my name in the address)

• not getting a reply, he asked Kevin Knight to “forward it to Liang Haung or his student Sagae.”

• Fred complained that my paper is very hard to read: “As you can see, I am completely confused!” And he was right.

• finally he said, “come here to give a talk and explain it.”