CSCI 5832 Natural Language Processing


Description

CSCI 5832 Natural Language Processing. Jim Martin, Lecture 14. Today (3/4): parsing, CKY again, Earley. Sample grammar. Dynamic programming: DP methods fill tables with partial results, do not do too much avoidable repeated work, and solve exponential problems in polynomial time (sort of).

Transcript of CSCI 5832 Natural Language Processing

CSCI 5832 Natural Language Processing

Jim Martin

Lecture 14

Today 3/4

• Parsing: CKY (again), Earley

Sample Grammar

Dynamic Programming

• DP methods fill tables with partial results and:
  - Do not do too much avoidable repeated work
  - Solve exponential problems in polynomial time (sort of)
  - Efficiently store ambiguous structures with shared sub-parts

CKY Parsing

• First we'll limit our grammar to epsilon-free, binary rules (more later)
• Consider the rule A -> B C:
  - If there is an A in the input, then there must be a B followed by a C in the input
  - If the A spans from i to j in the input, then there must be some k such that i < k < j
  - I.e., the B splits from the C someplace

CKY

• So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table
• So a non-terminal spanning an entire string will sit in cell [0,n]
• If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j

CKY

• Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j]
• In other words, if we think there might be an A spanning i to j in the input… AND
• A -> B C is a rule in the grammar, THEN
• There must be a B in [i,k] and a C in [k,j] for some i < k < j

CKY

• So to fill the table, loop over the cell [i,j] values in some systematic way
  - What constraint should we put on that?
  - For each cell, loop over the appropriate k values to search for things to add

CKY Table

CKY Algorithm
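
The algorithm figure for this slide is not reproduced in the transcript. As a stand-in, here is a minimal CKY recognizer sketch in Python (an illustration, not the lecture's actual code), assuming a CNF grammar supplied as two dictionaries: lexical rules A -> w and binary rules A -> B C. It fills the table a column at a time, left to right, bottom to top, as the Note slide below describes.

    from collections import defaultdict

    def cky_recognize(words, lexical, binary, start="S"):
        """Minimal CKY recognizer sketch for a CNF grammar.

        lexical: dict word -> set of non-terminals A with A -> word
        binary:  dict (B, C) -> set of non-terminals A with A -> B C
        Returns True if `start` ends up spanning the whole input, i.e. cell [0, n].
        """
        n = len(words)
        table = defaultdict(set)                    # table[(i, j)] = non-terminals spanning words[i:j]
        for j in range(1, n + 1):                   # columns left to right
            table[(j - 1, j)] |= lexical.get(words[j - 1], set())
            for i in range(j - 2, -1, -1):          # rows bottom to top within the column
                for k in range(i + 1, j):           # every split point i < k < j
                    for B in table[(i, k)]:
                        for C in table[(k, j)]:
                            table[(i, j)] |= binary.get((B, C), set())
        return start in table[(0, n)]

With a toy lexicon and binary-rule dictionary built from a sample CNF grammar, cky_recognize("book that flight".split(), lexical, binary) would report whether S covers the whole string; the dictionary representation is just an assumption of this sketch.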

CKY Parsing

• Is that really a parser?

Note

• We arranged the loops to fill the table a column at a time, from left to right, bottom to top
  - This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below)

Example

Other Ways to Do It

• Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled?

Other Ways to Do It

Sample Grammar

Problem

• What if your grammar isn't binary?
  - As in the case of the TreeBank grammar
• Convert it to binary… any arbitrary CFG can be rewritten into Chomsky Normal Form automatically
• What does this mean?
  - The resulting grammar accepts (and rejects) the same set of strings as the original grammar
  - But the resulting derivations (trees) are different

Problem

• More specifically, rules have to be of the form:

  A -> B C

  or

  A -> w

• That is, rules can expand to either 2 non-terminals or to a single terminal

Binarization Intuition

• Eliminate chains of unit productions
• Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules
• So… S -> A B C turns into
    S -> X C
    X -> A B
  where X is a symbol that doesn't occur anywhere else in the grammar
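
To make that intuition concrete, here is a small illustrative sketch (not the full textbook CNF conversion, which also has to deal with unit productions and rules mixing terminals and non-terminals) that introduces fresh intermediate symbols for right-hand sides longer than two:

    def binarize(rules):
        """Sketch: rewrite each rule A -> X1 X2 ... Xn (n > 2) as a chain of
        binary rules using fresh non-terminals that occur nowhere else."""
        new_rules, counter = [], 0
        for lhs, rhs in rules:
            rhs = list(rhs)
            while len(rhs) > 2:
                counter += 1
                x = f"X{counter}"                  # fresh intermediate symbol
                new_rules.append((x, rhs[:2]))     # X -> X1 X2
                rhs = [x] + rhs[2:]                # A -> X X3 ... Xn
            new_rules.append((lhs, rhs))
        return new_rules

    # S -> A B C turns into X1 -> A B and S -> X1 C
    print(binarize([("S", ["A", "B", "C"])]))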

CNF Conversion

CKY Algorithm

Example

Filling column 5

Example

Example

Example

Example

CKY Notes

• Since it's bottom-up, CKY populates the table with a lot of phantom constituents
  - Segments that by themselves are constituents but cannot really occur in the context in which they are being suggested
• To avoid this we can switch to a top-down control strategy, or
• We can add some kind of filtering that blocks constituents where they cannot happen in a final analysis

Break

• Quiz pushed back to Tues 3/18
• Schedule:
  - Today: CKY and Earley
  - Thursday: Partial parsing, chunking, and more on statistical sequence processing
  - Next week: statistical parsing

Earley Parsing

• Allows arbitrary CFGs
• Top-down control
• Fills a table in a single sweep over the input words
  - Table is length N+1, where N is the number of words
  - Table entries represent:
    - Completed constituents and their locations
    - In-progress constituents
    - Predicted constituents

States

• The table entries are called states and are represented with dotted rules:

  S -> · VP             A VP is predicted
  NP -> Det · Nominal   An NP is in progress
  VP -> V NP ·          A VP has been found

States/Locations

• S -> · VP [0,0]
• NP -> Det · Nominal [1,2]
• VP -> V NP · [0,3]

• A VP is predicted at the start of the sentence
• An NP is in progress; the Det goes from 1 to 2
• A VP has been found, starting at 0 and ending at 3
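
One plausible way to represent these dotted-rule states in code, sketched as a small Python dataclass (the field names and layout are this sketch's assumptions, not the lecture's implementation):

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class State:
        """A dotted rule plus the input span it covers: lhs -> rhs[:dot] . rhs[dot:], [start, end]."""
        lhs: str
        rhs: Tuple[str, ...]
        dot: int
        start: int
        end: int

        def complete(self):
            return self.dot == len(self.rhs)

        def __str__(self):
            syms = list(self.rhs[:self.dot]) + ["·"] + list(self.rhs[self.dot:])
            return f"{self.lhs} -> {' '.join(syms)} [{self.start},{self.end}]"

    # The three states from this slide:
    print(State("S", ("VP",), 0, 0, 0))              # S -> · VP [0,0]
    print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # NP -> Det · Nominal [1,2]
    print(State("VP", ("V", "NP"), 2, 0, 3))         # VP -> V NP · [0,3]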

Graphically

Earley

• As with most dynamic programming approaches, the answer is found by looking in the table in the right place
• In this case, there should be an S state in the final column that spans from 0 to n and is complete
• If that's the case you're done: S -> α · [0,n]

Earley

• So sweep through the table from 0 to n…
  - New predicted states are created by starting top-down from S
  - New incomplete states are created by advancing existing states as new constituents are discovered
  - New complete states are created in the same way

Earley

• More specifically…
  1. Predict all the states you can up front
  2. Read a word
     1. Extend states based on matches
     2. Generate new predictions
     3. Go to step 2
  3. Look at n to see if you have a winner

Earley Code
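
The code figure for this slide is not included in the transcript. As a rough stand-in, here is a compact Earley recognizer sketch that implements the predict / scan / complete operations over a chart of length N+1; representing states as plain tuples and the grammar and lexicon as Python dicts is an assumption of this sketch, not a claim about the lecture's code (epsilon-free grammar assumed).

    def earley_parse(words, grammar, lexicon, start="S"):
        """Sketch of an Earley recognizer.

        grammar: dict non-terminal -> list of right-hand sides (tuples of categories)
        lexicon: dict word -> set of parts of speech
        A state is (lhs, rhs, dot, origin); chart[i] holds states ending at position i.
        Returns True if a complete `start` state spans [0, n].
        """
        n = len(words)
        chart = [[] for _ in range(n + 1)]

        def add(pos, state):
            if state not in chart[pos]:
                chart[pos].append(state)

        add(0, ("GAMMA", (start,), 0, 0))               # dummy start state: GAMMA -> · S [0,0]
        for i in range(n + 1):
            for lhs, rhs, dot, origin in chart[i]:      # chart[i] may grow while we iterate
                if dot < len(rhs) and rhs[dot] in grammar:           # PREDICTOR
                    for expansion in grammar[rhs[dot]]:
                        add(i, (rhs[dot], tuple(expansion), 0, i))
                elif dot < len(rhs):                                  # SCANNER (rhs[dot] is a POS)
                    if i < n and rhs[dot] in lexicon.get(words[i], set()):
                        add(i + 1, (lhs, rhs, dot + 1, origin))
                else:                                                 # COMPLETER
                    for l2, r2, d2, o2 in chart[origin]:
                        if d2 < len(r2) and r2[d2] == lhs:
                            add(i, (l2, r2, d2 + 1, o2))
        return ("GAMMA", (start,), 1, 0) in chart[n]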

Earley Code

Example

• Book that flight
• We should find… an S from 0 to 3 that is a completed state…
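
As a usage illustration, the earley_parse sketch given after the Earley Code slide above could be run on this sentence with a tiny hypothetical grammar (the lecture's actual grammar for this chart is not reproduced in the transcript):

    # Toy grammar and lexicon for "book that flight" (illustrative only).
    grammar = {
        "S": [("VP",)],
        "VP": [("V", "NP")],
        "NP": [("Det", "Nominal")],
        "Nominal": [("Noun",)],
    }
    lexicon = {"book": {"V"}, "that": {"Det"}, "flight": {"Noun"}}

    # Using the earley_parse sketch above: expects a complete S spanning [0,3].
    print(earley_parse("book that flight".split(), grammar, lexicon))   # True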

Example

Add To Chart

Example

Example

Efficiency

• For such a simple example, there seems to be a lot of useless stuff in there
• Why?
• It's predicting things that aren't consistent with the input
  - That's the flip side to the CKY problem

Details

• As with CKY, that isn't a parser until we add the backpointers so that each state knows where it came from
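
One way to do that, sketched under the same kinds of assumptions as the earlier CKY sketch: have each table or chart entry record where it came from (for CKY, the split point and the two children), so a tree can be read back out once the table is filled. The field layout here is illustrative only:

    # Sketch: a backpointer-bearing CKY table maps each cell (i, j) to a dict
    # from non-terminal A to either the word it covers (A -> w) or a triple
    # (k, B, C) meaning A -> B C with B over [i,k] and C over [k,j].

    def build_tree(table, i, j, symbol):
        """Read one parse tree off a backpointer table (one analysis per cell entry)."""
        back = table[(i, j)][symbol]
        if isinstance(back, str):                      # lexical entry: A -> word
            return (symbol, back)
        k, B, C = back
        return (symbol, build_tree(table, i, k, B), build_tree(table, k, j, C))

Keeping a list of backpointers per non-terminal instead of a single one preserves all the shared analyses, which is exactly the situation the ambiguity slides below return to.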

Back to Ambiguity

• Did we solve it?

Ambiguity

Ambiguity

• No…
  - Both CKY and Earley will result in multiple S structures for the [0,n] table entry
  - They both efficiently store the sub-parts that are shared between multiple parses
  - And they obviously avoid re-deriving those sub-parts
  - But neither can tell us which one is right

Ambiguity

• In most cases, humans don't notice incidental ambiguity (lexical or syntactic). It is resolved on the fly and never noticed.
• We'll try to model that with probabilities
• But note something odd and important about the Groucho Marx example…

Next Time

• Partial parsing and chunking
• After that we'll move on to probabilistic parsing


bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 11: CSCI 5832 Natural Language Processing

041923 11

CKY Parsing

bull Is that really a parser

041923 12

Note

bull We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore filling a

cell the parts needed to fill it are already in the table (to the left and below)

041923 13

Example

041923 14

Other Ways to Do It

bull Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled

041923 15

Other Ways to Do It

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 12: CSCI 5832 Natural Language Processing

041923 12

Note

bull We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever wersquore filling a

cell the parts needed to fill it are already in the table (to the left and below)

041923 13

Example

041923 14

Other Ways to Do It

bull Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled

041923 15

Other Ways to Do It

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 13: CSCI 5832 Natural Language Processing

041923 13

Example

041923 14

Other Ways to Do It

bull Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled

041923 15

Other Ways to Do It

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 14: CSCI 5832 Natural Language Processing

041923 14

Other Ways to Do It

bull Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled

041923 15

Other Ways to Do It

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 15: CSCI 5832 Natural Language Processing

041923 15

Other Ways to Do It

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 16: CSCI 5832 Natural Language Processing

041923 16

Sample Grammar

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 17: CSCI 5832 Natural Language Processing

041923 17

Problem

bull What if your grammar isnrsquot binary As in the case of the TreeBank grammar

bull Convert it to binaryhellip any arbitrary CFG can be rewritten into Chomsky-Normal Form automatically

bull What does this mean The resulting grammar accepts (and rejects) the

same set of strings as the original grammar But the resulting derivations (trees) are different

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 18: CSCI 5832 Natural Language Processing

041923 18

Problem

bull More specifically rules have to be of the formA -gt B C

Or

A -gt w

That is rules can expand to either 2 non-terminals or to a single terminal

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 19: CSCI 5832 Natural Language Processing

041923 19

Binarization Intuition

bull Eliminate chains of unit productionsbull Introduce new intermediate non-terminals into the

grammar that distribute rules with length gt 2 over several rules SohellipS -gt A B C

Turns into

S -gt X CX - A B

Where X is a symbol that doesnrsquot occur anywhere else in the the grammar

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 20: CSCI 5832 Natural Language Processing

041923 20

CNF Conversion

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing

Page 21: CSCI 5832 Natural Language Processing

041923 21

CKY Algorithm

041923 22

Example

Filling column 5

041923 23

Example

041923 24

Example

041923 25

Example

041923 26

Example

041923 27

CKY Notes

bull Since itrsquos bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but

cannot really occur in the context in which they are being suggested

To avoid this we can switch to a top-down control strategy or

We can add some kind of filtering that blocks constituents where they can not happen in a final analysis

041923 28

Break

bull Quiz pushed back to Tues 318 Schedule

Today CKY and Earley Thursday Partial parsing chunking and more on

statistical sequence processing Next week statistical parsing

041923 29

Earley Parsing

bull Allows arbitrary CFGsbull Top-down controlbull Fills a table in a single sweep over the input

words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locations In-progress constituents Predicted constituents

041923 30

States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

041923 31

StatesLocations

bull S -gt VP [00]

bull NP -gt Det Nominal [12]

bull VP -gt V NP [03]

bull A VP is predicted at the start of the sentence

bull An NP is in progress the Det goes from 1 to 2

bull A VP has been found starting at 0 and ending at 3

041923 32

Graphically

041923 33

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n and is complete

bull If thatrsquos the case yoursquore done S ndashgt α [0n]

041923 34

Earley

bull So sweep through the table from 0 to nhellip New predicted states are created by starting

top-down from S New incomplete states are created by

advancing existing states as new constituents are discovered

New complete states are created in the same way

041923 35

Earley

bull More specificallyhellip1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches

2 Generate new predictions

3 Go to step 2

3 Look at n to see if you have a winner

041923 36

Earley Code

041923 37

Earley Code

041923 38

Example

bull Book that flight

bull We should findhellip an S from 0 to 3 that is a completed statehellip

041923 39

Example

041923 40

Add To Chart

041923 41

Example

041923 42

Example

041923 43

Efficiency

bull For such a simple example there seems to be a lot of useless stuff in there

bull Why

bull Itrsquos predicting things that arenrsquot consistent with the input bullThatrsquos the flipside to the CKY problem

041923 44

Details

bull As with CKY that isnrsquot a parser until we add the backpointers so that each state knows where it came from

041923 45

Back to Ambiguity

bull Did we solve it

041923 46

Ambiguity

041923 47

Ambiguity

bull Nohellip Both CKY and Earley will result in multiple S

structures for the [0n] table entry They both efficiently store the sub-parts that

are shared between multiple parses And they obviously avoid re-deriving those

sub-parts But neither can tell us which one is right

041923 48

Ambiguity

bull In most cases humans donrsquot notice incidental ambiguity (lexical or syntactic) It is resolved on the fly and never noticed

bull Wersquoll try to model that with probabilities

bull But note something odd and important about the Groucho Marx examplehellip

041923 49

Next Time

bull Partial Parsing and chunkingbull After that wersquoll move on to probabilistic parsing
