Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

44
1 Christel Kemke 2007/08 COMP 4060 Natural Language Processing PARSING
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    217
  • download

    1

Transcript of Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

Page 1: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

1

Christel Kemke 2007/08

COMP 4060 Natural Language Processing

PARSING

Page 2: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2

2007/08 Christel Kemke

Parsing

Language, Syntax, Parsing Problems in Parsing

Ambiguity Attachment / Binding Bottom vs. Top Down Parsing

Chart-Parsing Earley-Algorithm

Page 3: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

3

2007/08 Christel Kemke

Natural Language - Parsing

Parsing • derive the syntactic structure of a sentence

based on a language model (grammar)

• construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

Page 4: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

4

2007/08 Christel Kemke

Natural Language - Grammar

Natural Language Syntax described through a formal language, often a context-free grammar (CFG):

G=(NT,T,P,S):

• the Start-Symbol SNT ≡ sentence symbol

• Non-Terminals NT ≡ syntactic constituents

• Terminals T ≡ lexical entries/ words

• Production Rules P NT (NTT)+ ≡ grammar rules

Page 5: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

5

Sample GrammarGrammar (S, NT, T, P) Sentence Symbol S NT, Part-of-Speech NT, Constituents NT,Terminals, Word TGrammar Rules P NT (NT T)*

S NP VP statementS Aux NP VP questionS VP commandNP Det Nominal NP Proper-Noun Nominal Noun | Noun Nominal | Nominal PPVP Verb | Verb NP | Verb PP | Verb NP PP PP Prep NP

Det that | this | aNoun book | flight | meal | moneyProper-Noun Houston | American Airlines | TWAVerb book | include | preferAux doesPrep from | to | on

Page 6: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

6

2007/08 Christel Kemke

Parsing Task

Parse

"Does this flight include a meal?"

Page 7: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

7

2007/08 Christel Kemke

Parse "Does this flight include a meal?"

S

Aux NP VP

Det Nominal Verb NP

Noun Det Nominal

does this flight include a meal

Sample Parse Tree

Page 8: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

8

2007/08 Christel Kemke

Problems in Parsing - Ambiguity

Ambiguity

syntactical/structural ambiguity – several parse trees are possible e.g. above sentence

semantic/lexical ambiguity – several word meanings e.g. bank (where you get money) and (river) bank

even different word categories possible (interim) e.g. “He books the flight.” vs. “The books are here.“ or “Fruit flies from the balcony” vs. “Fruit flies are on the balcony.”

“Peter saw Mary with the telescope / her friend / his friend.”

Page 9: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

9

2007/08 Christel Kemke

Problems in Parsing – Attachment 1

Attachment

in particular PP (prepositional phrase) binding; often referred to as binding problem.

See next slides.

Page 10: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

10

2007/08 Christel Kemke

Problems in Parsing – Attachment 2

“One morning, I shot an elephant in my pajamas.”

Binding 2: VP Verb NP and NP Det Nominal and Nominal Nominal PP and Nominal Noun

(S ... (NP (PNoun I )) (VP (Verb shot ) (NP (Det an) (Nominal (Nominal (Noun elephant ) (PP in my pajamas )... )

Binding 1: VP Verb NP PP

(S ... (NP (PNoun I )(VP (Verb shot ) (NP (Det an (Nominal (Noun elephant ))) (PP in my pajamas ))...)

Page 11: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

11

2007/08 Christel Kemke

Problems in Parsing – Attachment 3

“One morning, I shot an elephant in my pajamas.”

Binding 2: VP Verb NP and NP Det Nominal and Nominal Nominal PP and Nominal Noun

(S ... (NP (PNoun I )) (VP (Verb shot ) (NP (Det an) (Nominal (Nominal (Noun elephant ) (PP in my pajamas )... )

“How he got into them, I don’t know.”

Page 12: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

12

2007/08 Christel Kemke

Bottom-up – from word-nodes to sentence-symbol Top-down Parsing – from sentence-symbol to words

S

Aux NP VP

Det Nominal Verb NP

Noun Det

Nominal

does this flight include a meal

Bottom-up and Top-down Parsing

Page 13: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

13

2007/08 Christel Kemke

Problems with Bottom-up and Top-down Parsing

Problems with left-recursive rules like NP NP PP: don’t know how many times recursion is needed

Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid (several grammar rules applicable ‘interim’ ambiguity).

Combine top-down and bottom-up approach:Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right).

Chart-Parsing / Earley-Parser

Page 14: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

14

2007/08 Christel Kemke

Chart Parsing / Early Algorithm

Earley-Parser based on Chart-ParsingEssence: Integrate top-down and bottom-up parsing. Keep

recognized sub-structures (sub-trees) for shared use during parsing.

Top-down: Start with S-symbol. Generate all applicable rules for S. Go further down with left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS).

Bottom-up: Read input word and compare. If word matches, mark as recognized and move parsing on to the next category in the rule(s).

Page 15: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

15

2007/08 Christel Kemke

Chart

A Chart is a graph with n+1 nodes marked 0 to n for a sequence of n input words. Arcs indicate recognized part of RHS of rule.The • indicates recognized constituents in rules.

Jurafsky & Martin, Figure 10.15, p. 380

Page 16: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

16

2007/08 Christel Kemke

Chart Parsing / Earley Parser 1

ChartSequence of n input words; n+1 nodes marked 0 to n.States in chart represent possible rules and recognized

constituents.RHS of recognized rule is covered by arc.Interim stateS • VP, [0,0] top-down look at rule S VP nothing of RHS of rule yet recognized (• is far left) arc at beginning, no coverage (covers no input word;

beginning of arc at node 0 and end of arc at node 0)

Page 17: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

17

2007/08 Christel Kemke

Chart Parsing / Earley Parser 2

Interim statesNP Det • Nominal, [1,2] top-down look with rule NP Det • Nominal Det recognized (• after Det) arc covers one input word which is between node 1 and

node 2 look next for Nominal, top-down

NP Det Nominal • , [1,3] Nominal was recognized, move • after Nominal move end of arc to cover Nominal; change 2 to 3 structure is completely recognized; arc is inactive; mark NP as recognized in other rules (move • ), bottom up

Page 18: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

18

Chart - 0

Book this flight

S . VPVP . V NP

Page 19: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

19

Chart - 1

VP V . NP

V

Book this flight

S . VP

NP . Det Nom

Page 20: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

20

Chart - 2

VP V . NP

V

Book this flight

S . VP

NP Det . Nom

DetNom . Noun

Page 21: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

21

Chart - 3a

VP V . NP

V

Book this flight

S . VP

NP Det . Nom

DetNom Noun .

Noun

Page 22: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

22

Chart - 3b

VP V . NP

V

Book this flight

S . VP

NP Det Nom .

DetNom Noun .

Noun

Page 23: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

23

Chart - 3c

VP V NP .

V

Book this flight

NP Det Nom .

DetNom Noun .

Noun

S . VP

Page 24: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

24

Chart - 3d

VP V NP .

V

Book this flight

S VP .

NP Det Nom .

DetNom Noun .

Noun

Page 25: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

25

Chart – Valid and Invalid Rules/Arcs

NP Det Nom .

VP V . NP

Nom Noun .

V Det Noun

Book this flight

S . VP

VP . V NP

NP . Det Nom

NP Det . Nom

VP V NP .

S VP .

Nom . Noun

Page 26: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

26

Chart - Final States

NP Det Nom .

Nom Noun .

V Det Noun

Book this flight

VP V NP .

S VP .

Page 27: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

27

Chart 0 with two S- and two VP-Rules

Book this flight

S . VP

VP . V NP

additional S-ruleS . VP NP

additional VP-ruleVP . V

Page 28: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

28

Chart 1a with two S- and two VP-Rules

VP V . NP

V

Book this flight

S . VP

NP . Det Nom

S . VP NP

VP V .

Page 29: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

29

Chart 1b with two S- and two VP-Rules

VP V . NP

V

Book this flight

S VP .

NP . Det Nom

S VP . NP

VP V .

Page 30: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

30

Chart 2 with two S- and two VP-Rules

VP V . NP

V

Book this flight

S VP .

NP Det . NomS VP . NP

VP V .

Nom . Noun

Page 31: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

31

VP V NP .

V

Book this flight

S VP .

NP Det Nom .

Det

Nom Noun .

S VP NP .

VP V .

Chart 3 with two S- and two VP-Rules

Noun

Page 32: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

32

NP Det Nom .

Final Chart - with two S-and two VP-Rules

VP V NP .

V

Book this flight

S VP NP .

DetNom Noun .

Noun

S VP .

VP V .

Page 33: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

33

Christel Kemke 2007/08

Earley Parser

Page 34: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

34

Earley Algorithm - Functions

predictorgenerates new rules for partly recognized RHS with constituent right of • (top-down generation)

scannerif word category (POS) is found right of the • , the Scanner reads the next input word and adds a rule for it to the chart (bottom-up mode)

completerif rule is completely recognized (the • is far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition).

Page 35: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

35

2007/08 Christel Kemke

Earley – Chart for

“book that flight”

including references to completed states/rules

Page 36: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

36

2007/08 Christel Kemke

Earley – Chart for

“book that flight”

from 2nd edition

Page 37: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

37

function EARLEY-PARSE(words, grammar) returns chartENQUEUE(( S, [0,0]), chart[0])for i_from 0 to LENGTH(words) do

for each state in chart[i] doif INCOMPLETE?(state) and NEXT-CAT(state) is not a part of

speech then PREDICTOR(state) elseif INCOMPLETE?(state) and NEXT-CAT(state)is a part of

speech then SCANNER(state)else COMPLETER(state)

endendreturn(chart) -

continued -

Earley-Algorithm

Page 38: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

38

procedure PREDICTOR((A B , [i,j]))for each (B ) in GRAMMAR-RULES-FOR(B, grammar) do ENQUEUE((B [j,j], chart[j])

end

procedure SCANNER ((A B , [i,j]))if B PARTS-OF-SPEECH(word[j]) then ENQUEUE((B word[j], [j,j+1]), chart[j+1])

end

procedure COMPLETER ((B , [j,k]))for each (A B , [i,j]) in chart[j] do ENQUEUE((A B , [i,k]), chart[k])

end

procedure ENQUEUE(state, chart-entry)if state is not already in chart-entry then PUSH(state, chart-entry)

end

Earley-Algorithm (continued)

Page 39: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

39

2007/08 Christel Kemke

Earley-Algorithm (copy from 2nd edition)

Earley – Algorithm

main

Page 40: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

40

2007/08 Christel Kemke

Earley-Algorithm (continued)

Earley – Algorithm

processes

Page 41: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

41

2007/08 Christel Kemke

Earley – Algorithm

complete

Page 42: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

42

2007/08 Christel Kemke

Chart-Parser Algorithm

(just FYI)

Page 43: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

Christel Kemke

43

2007/08

Earley Algorithm - Figures

Jurafsky & Martin, 2nd ed., Ch. 13 Figures 13.16, 13.13, 13.14

Page 44: Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

2007/08 Christel Kemke

44

Additional References

Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000. (Chapters 9 and 10)

Earley Algorithm Jurafsky & Martin, Figure 10.16, p.384

Earley Algorithm - ExamplesJurafsky & Martin, Figures 10.17 and 10.18