
Sentence Parsing

Parsing 3: Dynamic Programming

Acknowledgement

Lecture (Jan 2009) based on Jurafsky and Martin, Speech and Language Processing, Ch. 13 (2nd Edition), and the J&M lecture notes.

Avoiding Repeated Work

- Dynamic Programming
- CKY Parsing
- Earley Algorithm
- Chart Parsing

Dynamic Programming

DP search methods fill tables with partial results and thereby:
- Avoid doing avoidable repeated work
- Solve exponential problems in polynomial time (well, not really)
- Efficiently store ambiguous structures with shared sub-parts

We'll cover two approaches that roughly correspond to top-down and bottom-up approaches:
- CKY (after its authors Cocke, Kasami and Younger)
- Earley (often referred to as chart parsing, because it uses a data structure called a chart)

CKY Parsing

First we'll limit our grammar to epsilon-free, binary rules (more later).

Key intuition: consider the rule A → B C.
- If there is an A somewhere in the input, then there must be a B followed by a C in the input.
- If the A spans from i to j in the input, then there must be some k such that i < k < j, i.e. the B splits from the C someplace.

This intuition plays a role in both the CKY and Earley methods.

Problem

What if your grammar isn't binary? As in the case of the Treebank grammar?

Convert it to binary… any arbitrary CFG can be rewritten into Chomsky Normal Form (CNF) automatically.

What does this mean? The resulting grammar accepts (and rejects) the same set of strings as the original grammar, but the resulting derivations (trees) are different.

Problem

More specifically, we want our rules to be of the form
A → B C
or
A → w

That is, rules can expand to either two non-terminals or to a single terminal.

Binarization Intuition

Eliminate chains of unit productions.

Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules. So…
S → A B C turns into
S → X C and
X → A B
where X is a symbol that doesn't occur anywhere else in the grammar.
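As a small illustration (not from the slides), here is a minimal Python sketch of just this long-rule splitting step, assuming rules are stored as (lhs, rhs-list) pairs; it does not handle unit productions or terminals:

def binarize(rules):
    """Split rules whose RHS is longer than 2 by introducing fresh
    intermediate non-terminals: S -> A B C becomes X1 -> A B, S -> X1 C."""
    new_rules, counter = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            x = "X%d" % counter              # a symbol used nowhere else
            new_rules.append((x, rhs[:2]))   # X1 -> A B
            rhs = [x] + rhs[2:]              # remaining rule now starts with X1
        new_rules.append((lhs, rhs))
    return new_rules

print(binarize([("S", ["A", "B", "C"])]))
# [('X1', ['A', 'B']), ('S', ['X1', 'C'])]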

Sample L1 Grammar

CNF Conversion

CKY

Build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.

So a non-terminal spanning the entire string will sit in cell [0,n] (hopefully an S).

If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j, for some k.

CKY

Meaning that for a rule like A → B C we should look for a B in [i,k] and a C in [k,j].

In other words, if we think there might be an A spanning [i,j] in the input…
AND A → B C is a rule in the grammar,
THEN there must be a B in [i,k] and a C in [k,j] for some i < k < j.

CKY

So to fill the table, loop over the cell [i,j] values in some systematic way.
What constraint should we put on that systematic search?
For each cell, loop over the appropriate k values to search for things to add.

CKY Algorithm
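The algorithm figure is not reproduced in this transcript. As a rough stand-in, here is a minimal Python sketch of a CKY recognizer (a sketch, not the book's exact pseudocode), assuming a CNF grammar given as separate lists of binary and lexical rules:

from collections import defaultdict

def cky_recognize(words, binary_rules, lexical_rules, start="S"):
    """CKY recognizer for a CNF grammar.
    binary_rules:  (A, B, C) triples for rules A -> B C
    lexical_rules: (A, w) pairs for rules A -> w
    table[(i, j)] holds the set of non-terminals spanning words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # pre-terminals A -> word go on the diagonal, cell [j-1, j]
        for A, w in lexical_rules:
            if w == words[j - 1]:
                table[(j - 1, j)].add(A)
        # fill column j bottom-up: cells [i, j] for i = j-2 down to 0
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):                      # split point
                for A, B, C in binary_rules:
                    if B in table[(i, k)] and C in table[(k, j)]:
                        table[(i, j)].add(A)
    return start in table[(0, n)]

# Tiny illustrative CNF grammar and sentence (hypothetical, for this sketch only)
binary = [("S", "NP", "VP"), ("VP", "Verb", "NP"), ("NP", "Det", "Noun")]
lexical = [("NP", "i"), ("Verb", "book"), ("Det", "that"), ("Noun", "flight")]
print(cky_recognize("i book that flight".split(), binary, lexical))   # True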

CKY Parsing

Is that really a parser?

CKY Parsing

Is that really a parser? No, it's a recogniser. But the parse tree can in principle be recovered from the table, provided that each time a left-hand-side symbol A is put into the table via a rule A → B C, a pointer is kept to the right-hand-side instances B and C from which it was derived.

The parse tree is then reconstructed by following the pointers from the top item.
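A sketch of that idea (again an assumption about representation, not the book's code), extending the recognizer above so each table entry keeps a pointer to the split point and the B, C it was built from; only the first derivation found is kept, so ambiguity yields just one tree:

from collections import defaultdict

def cky_parse(words, binary_rules, lexical_rules, start="S"):
    """CKY with back-pointers; returns one parse tree as nested tuples, or None."""
    n = len(words)
    table = defaultdict(dict)                     # (i, j) -> {A: back-pointer}
    for j in range(1, n + 1):
        for A, w in lexical_rules:
            if w == words[j - 1]:
                table[(j - 1, j)][A] = w          # pointer to the word itself
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for A, B, C in binary_rules:
                    if B in table[(i, k)] and C in table[(k, j)]:
                        table[(i, j)].setdefault(A, (k, B, C))

    def build(i, j, A):                           # follow pointers from the top item
        entry = table[(i, j)][A]
        if isinstance(entry, str):                # A -> word
            return (A, entry)
        k, B, C = entry
        return (A, build(i, k, B), build(k, j, C))

    return build(0, n, start) if start in table[(0, n)] else None

Called on the same toy grammar as in the recognizer sketch above, it returns a nested tree such as ('S', ('NP', 'i'), ('VP', ('Verb', 'book'), ('NP', ('Det', 'that'), ('Noun', 'flight')))).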

Note

We arranged the loops to fill the table a column at a time, from left to right, bottom to top.

- This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below).
- It's somewhat natural in that it processes the input left to right, one word at a time (known as online).
- Other ways of filling the table are possible.

Example

(Figures: the CKY table being filled for an example sentence, column by column; one slide shows filling column 5.)

CKY Notes

Since it's bottom-up, CKY populates the table with a lot of phantom constituents: segments that by themselves are constituents but cannot really occur in the context in which they are being suggested.

- To avoid this we can switch to a top-down control strategy.
- In addition, we can add some kind of filtering that blocks constituents where they cannot happen in a final analysis.

Earley Parsing

- Allows arbitrary CFGs (not just binary ones); this requires dotted rules.
- Some top-down control: uses a prediction operation.
- Parallel top-down search.
- Fills a table in a single sweep over the input.
- Table is of length N+1, where N is the number of words.
- Table entries consist of dotted rules.

Earley Parsing

- Dynamic Programming: the solution involves filling in a table of solutions to subproblems.
- Parallel top-down search.
- Worst-case complexity is O(N³) in the length N of the sentence.
- The table is sometimes called a chart, so Earley parsing is also called chart parsing.
- The chart contains N+1 entries: ● book ● that ● flight ● (the dots mark positions 0 1 2 3).

The Chart

- Each table entry contains a list of states.
- Each state represents all partial parses that have been reached so far at that point in the sentence.
- States are represented using dotted rules containing information about:
  - Rule/subtree: which rule has been used
  - Progress: the dot indicates how much of the rule's RHS has been recognised
  - Position: the text segment to which this parse applies

Examples of Dotted Rules

- Initial S rule (incomplete): S -> . VP, [0,0]
- Partially recognised NP (incomplete): NP -> Det . Nominal, [1,2]
- Fully recognised VP (complete): VP -> Verb NP ., [0,3]

These states can also be represented graphically on the chart.
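One concrete way to write these states down (matching the Python sketch given after the Grammar slide further on, and purely illustrative) is as (lhs, rhs, dot, start, end) tuples:

example_states = [
    ("S",  ("VP",),            0, 0, 0),   # S  -> . VP, [0,0]
    ("NP", ("Det", "Nominal"), 1, 1, 2),   # NP -> Det . Nominal, [1,2]
    ("VP", ("Verb", "NP"),     2, 0, 3),   # VP -> Verb NP ., [0,3]
]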

The Chart

Earley Algorithm

Main algorithm: proceeds through each text position, applying one of the three operators below.

- Predictor: creates "initial states" (i.e. states whose RHS is completely unparsed).
- Scanner: checks the current input when the next category to be recognised is a pre-terminal.
- Completer: when a state is "complete" (nothing after the dot), advance all states to the left that are looking for the associated category.

Earley Main Algorithm

Earley Sub Functions

Earley Algorithm – Sub-Functions

Predictor(A -> alpha . B beta, [i,j])
    for each B -> gamma in Grammar(B)
        enqueue((B -> . gamma, [j,j]), chart[j])

Scanner(A -> alpha . B beta, [i,j])
    if B in PartOfSpeech(word[j]) then
        enqueue((B -> word[j] ., [j,j+1]), chart[j+1])

Completer(B -> gamma ., [j,k])
    for each A -> alpha . B beta, [i,j] in chart[j]
        enqueue((A -> alpha B . beta, [i,k]), chart[k])

Grammar

S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nominal
Nominal -> Noun
Nominal -> Noun Nominal
NP -> Proper-Noun
VP -> Verb
VP -> Verb NP
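The chart trace for the example sentence on the following slides is not reproduced in this transcript. As a stand-in, here is a minimal Python sketch of an Earley recognizer over the grammar above (a sketch, not the book's pseudocode); the lexicon and part-of-speech set for "book that flight" are assumptions, since the slides' lexicon is not shown:

GRAMMAR = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": ["Verb", "Noun"], "that": ["Det"], "flight": ["Noun"]}   # assumed
POS = {"Det", "Noun", "Verb", "Aux", "Proper-Noun"}                         # pre-terminals

def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Earley recognizer. A state is (lhs, rhs, dot, origin, end);
    chart[k] is the ordered list of states ending at position k."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    enqueue(("gamma", (start,), 0, 0, 0), 0)          # dummy start state

    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):                      # chart[k] grows as we go
            lhs, rhs, dot, origin, _end = chart[k][i]
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in POS:                        # Scanner
                    if k < n and nxt in lexicon.get(words[k], []):
                        enqueue((nxt, (words[k],), 1, k, k + 1), k + 1)
                else:                                 # Predictor
                    for prod in grammar[nxt]:
                        enqueue((nxt, tuple(prod), 0, k, k), k)
            else:                                     # Completer
                for lhs2, rhs2, dot2, origin2, _ in list(chart[origin]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        enqueue((lhs2, rhs2, dot2 + 1, origin2, k), k)
            i += 1

    return ("gamma", (start,), 1, 0, n) in chart[n]

print(earley_recognize("book that flight".split()))   # True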

Retrieving Trees

To turn the recogniser into a parser, the representation of each state must also include information about the completed states that generated its constituents.
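A small, purely illustrative sketch of that idea in the same Python representation used above, adding a sixth field that lists the completed child states; the example states below are hand-built, not produced by the recognizer sketch:

def build_tree(state):
    """Read a tree off a completed state whose extra field lists the
    completed child states that advanced its dot."""
    lhs, rhs, _dot, _start, _end, children = state
    if not children:                    # a scanned word, e.g. Det -> that .
        return (lhs, rhs[0])
    return (lhs,) + tuple(build_tree(c) for c in children)

det     = ("Det", ("that",), 1, 1, 2, [])
noun    = ("Noun", ("flight",), 1, 2, 3, [])
nominal = ("Nominal", ("Noun",), 1, 2, 3, [noun])
np      = ("NP", ("Det", "Nominal"), 2, 1, 3, [det, nominal])
print(build_tree(np))
# ('NP', ('Det', 'that'), ('Nominal', ('Noun', 'flight')))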

(Figure: Chart[3], with an extra field on each state recording the completed states it was built from.)

Back to Ambiguity

Did we solve it?

Ambiguity

No… Both CKY and Earley will result in multiple S structures for the [0,N] table entry.

They both efficiently store the sub-parts that are shared between multiple parses, and they obviously avoid re-deriving those sub-parts.

But neither can tell us which one is right.