
Chapter 23 (continued)

Natural Language for Communication


Phrase Structure Grammars
• Probabilistic context-free grammar (PCFG):
  – Context free: the left-hand side of each grammar rule consists of a single nonterminal symbol
  – Probabilistic: the grammar assigns a probability to every string
  – Lexicon: list of allowable words
  – Grammar: a collection of rules that defines a language as a set of allowable strings of words
  – Example: Fish people fish tanks, with the grammar written in Backus–Naur Form (BNF); an illustrative sketch follows below
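For reference, such a grammar might look like this in BNF. The slide's actual rules are not recoverable from the transcript, so the rules below are illustrative assumptions (in this toy language, "fish" and "people" can serve as both nouns and verbs):

    S  -> NP VP
    NP -> NP NP | N
    VP -> V NP | V
    N  -> fish | people | tanks
    V  -> fish | people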


Phrase Structure Grammars (continued)
• The same definitions, with the grammar for the example "Fish people fish tanks" now written as a PCFG: each rule is annotated with a probability


Phrase Structure Grammars (continued)
• Example: Fish people fish tanks
[Figure: the example grammar and lexicon, with a parse tree whose rules and lexical entries carry the probabilities 0.2, 0.5, 0.6, 0.2, 0.1, 0.7, 0.5, 0.9]
Probability of the parse = 0.2 × 0.5 × 0.6 × 0.2 × 0.1 × 0.7 × 0.5 × 0.9 = 0.000378
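A PCFG scores a parse tree as the product of the probabilities of every rule and lexicon entry used in it. A minimal sketch of that computation in Python, using the probabilities shown on the slide (the rule names are omitted because they are not recoverable from the transcript):

    import math

    # Probabilities of the rules and lexical entries used in the parse,
    # as annotated on the slide's parse tree.
    rule_probs = [0.2, 0.5, 0.6, 0.2, 0.1, 0.7, 0.5, 0.9]

    # The probability of the whole parse is the product of its parts.
    parse_probability = math.prod(rule_probs)
    print(parse_probability)  # 0.000378, up to floating-point rounding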


Parsing
• Objective: analyze a string of words to uncover its phrase structure, given the lexicon and grammar
  – The result of parsing is a parse tree
• Top-down parse and bottom-up parse
  – Naïve solutions: left-to-right or right-to-left parse
  – Example: The wumpus is dead (a top-down sketch follows below)
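A minimal sketch of a naïve top-down (recursive-descent) parse of the slide's example. The grammar and lexicon here are assumptions chosen only to make the example run, not the course's actual rules:

    # Hypothetical toy grammar and lexicon for "The wumpus is dead".
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Article", "Noun"]],
        "VP": [["Verb", "Adjective"]],
    }
    LEXICON = {"the": "Article", "wumpus": "Noun", "is": "Verb", "dead": "Adjective"}

    def parse(symbol, words, i):
        """Try to derive `symbol` from words[i:]; return (tree, next position) or None."""
        if symbol in GRAMMAR:                       # nonterminal: try each rule in turn
            for rhs in GRAMMAR[symbol]:
                children, j = [], i
                for child in rhs:
                    result = parse(child, words, j)
                    if result is None:
                        break
                    subtree, j = result
                    children.append(subtree)
                else:                               # every right-hand-side symbol matched
                    return (symbol, children), j
            return None
        # terminal category: consult the lexicon for the current word
        if i < len(words) and LEXICON.get(words[i].lower()) == symbol:
            return (symbol, words[i]), i + 1
        return None

    tree, end = parse("S", "The wumpus is dead".split(), 0)
    assert end == 4                                 # the whole sentence was consumed
    print(tree)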


Parsing (continued)
• Objective: analyze a string of words to uncover its phrase structure, given the lexicon and grammar
  – The result of parsing is a parse tree
• Naïve solutions:
  – Top-down parse and bottom-up parse
  – Example: The wumpus is dead
  – Efficient? No: a naïve parser duplicates work. Example:
      Have the students in section 2 of Computer Science 101 take the exam.
      Have the students in section 2 of Computer Science 101 taken the exam?
    The two sentences share their first ten words but have different structures (a command versus a question); a left-to-right parser cannot tell which until it reaches "take"/"taken", and without caching it re-parses the long shared noun phrase.


Parsing (continued)
• Efficient solutions: chart parsers
  – Use dynamic programming to avoid re-parsing shared substrings
• CYK algorithm
  – A bottom-up chart parser (named after its inventors, John Cocke, Daniel Younger, and Tadao Kasami)
  – Input: lexicon, grammar, and a query string
  – Output: a parse tree
  – Three major steps (a sketch follows below):
    • Assign lexical categories to each word
    • Compute the probabilities of adjacent phrases
    • Resolve grammar conflicts by selecting the most probable phrase
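A minimal sketch of probabilistic CYK for a grammar in Chomsky normal form. The toy rules, lexicon, and probabilities are assumptions standing in for the slide's "Fish people fish tanks" example, not the actual numbers from the deck:

    from collections import defaultdict

    # Binary rules in Chomsky normal form: (A, B, C, p) means A -> B C with probability p.
    RULES = [
        ("S",  "NP", "VP", 0.9),
        ("NP", "NP", "NP", 0.1),   # noun-noun compounds such as "fish tanks"
        ("VP", "V",  "NP", 0.5),
    ]
    # Lexicon: word -> list of (category, probability) pairs.
    LEXICON = {
        "fish":   [("NP", 0.2), ("V", 0.6)],
        "people": [("NP", 0.5), ("V", 0.1)],
        "tanks":  [("NP", 0.2), ("V", 0.3)],
    }

    def cyk(words):
        n = len(words)
        chart = defaultdict(dict)   # (i, j) -> {category: best probability for words[i:j]}
        back = {}                   # backpointers; the best parse tree can be read off these
        # Step 1: assign lexical categories to each word.
        for i, w in enumerate(words):
            for cat, p in LEXICON.get(w.lower(), []):
                chart[(i, i + 1)][cat] = p
        # Steps 2 and 3: combine adjacent phrases bottom-up, keeping only the most
        # probable analysis for each span and category (resolving grammar conflicts).
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length
                for k in range(i + 1, j):           # split point between the two phrases
                    for a, b, c, p in RULES:
                        if b in chart[(i, k)] and c in chart[(k, j)]:
                            prob = p * chart[(i, k)][b] * chart[(k, j)][c]
                            if prob > chart[(i, j)].get(a, 0.0):
                                chart[(i, j)][a] = prob
                                back[(i, j, a)] = (k, b, c)
        return chart[(0, n)].get("S", 0.0)

    print(cyk("Fish people fish tanks".split()))  # probability of the best S parse

Because each (span, category) cell keeps only the best analysis, the chart never re-parses a substring, which is exactly the dynamic-programming saving the slide motivates.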


Parsing (continued)
• CYK algorithm, three major steps:
  • Assign lexical categories to each word
  • Compute the probabilities of adjacent phrases
  • Resolve grammar conflicts by selecting the most probable phrase
[Figure: a CYK chart illustrating the three steps in order]


Parsing (continued)
• Example: Fish people fish tanks
[Figure: the example grammar and lexicon]


Parsing (continued)
• Example by Dr. Christopher Manning from Stanford


Augmented Parsing Methods
• Lexicalized PCFGs
  – BNF notation for grammars is too restrictive
  – Augmented grammar: add logical inference to construct sentence semantics (a sketch follows below)
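A minimal sketch of the semantic-augmentation idea: each rule carries a semantic attachment that composes the meanings of its children into a logical form. The sentence, predicate names, and rules below are hypothetical illustrations, not the chapter's actual grammar:

    # Lexicon with semantic attachments: names denote constants, the verb a predicate.
    LEX_SEM = {
        "john": "John",
        "mary": "Mary",
        "loves": lambda subj, obj: f"Loves({subj}, {obj})",
    }

    def interpret(words):
        """S -> NP VP and VP -> V NP, composing meanings: apply the verb's
        semantics to the meanings of the subject and object."""
        subj, verb, obj = (w.lower() for w in words)
        return LEX_SEM[verb](LEX_SEM[subj], LEX_SEM[obj])

    print(interpret("John loves Mary".split()))  # Loves(John, Mary)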


Real language
• Real human languages present many problems for NLP:
  – Ambiguity
  – Anaphora
  – Indexicality
  – Vagueness
  – Discourse structure
  – Metonymy
  – Metaphor
  – Noncompositionality


Real language
• Real human languages present many problems for NLP
  – Ambiguity: can be lexical (polysemy), syntactic, semantic, or referential
      I ate spaghetti with meatballs
      I ate spaghetti with salad
      I ate spaghetti with abandon
      I ate spaghetti with a fork
      I ate spaghetti with a friend
    (The same surface pattern "with X" plays a different role each time: ingredient, side dish, manner, instrument, companion.)


Real language
• Real human languages present many problems for NLP
  – Anaphora: using pronouns to refer back to entities already introduced in the text
      After Mary proposed to John, they found a preacher and got married.
      For the honeymoon, they went to Hawaii.
      Mary saw a ring through the window and asked John for it.
      Mary threw a rock at the window and broke it.
    (In the last two sentences "it" refers to the ring and the window respectively; resolving the reference requires world knowledge, not just syntax.)


Real language
• Real human languages present many problems for NLP
  – Indexicality: indexical sentences refer to the situation of the utterance (place, time, speaker/hearer, etc.)
      I am over here
      Why did you do that?


Real language
• Real human languages present many problems for NLP
  – Metonymy: using one noun phrase to stand for another
      I've read Shakespeare
      Chrysler announced record profits
      The ham sandwich on Table 4 wants another beer


Real language
• Real human languages present many problems for NLP
  – Metaphor: "non-literal" usage of words and phrases
      I've tried killing the process but it won't die. Its parent keeps it alive.


Real language
• Real human languages present many problems for NLP
  – Noncompositionality: the meaning of a compound is not a simple function of the meanings of its parts
      basketball shoes      red book
      baby shoes            red pen
      alligator shoes       red hair
      designer shoes        red herring
      brake shoes


Real language
• Real human languages present many problems for NLP: ambiguity, anaphora, indexicality, vagueness, discourse structure, metonymy, metaphor, noncompositionality
• Interpreting natural language with computer agents is challenging and still an open problem (but we are doing better)