
Transcript of Fall 2005 Lecture Notes #7

Page 1: Fall 2005 Lecture Notes #7

Fall 2005

Lecture Notes #7

EECS 595 / LING 541 / SI 661

Natural Language Processing

Page 2: Fall 2005 Lecture Notes #7

Natural Language Generation

Page 3: Fall 2005 Lecture Notes #7

What is NLG?

• Mapping meaning to text

• Stages:
  – Content selection
  – Lexical selection
  – Sentence structure: aggregation, referring expressions
  – Discourse structure
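To make the stages concrete, here is a minimal, purely illustrative Python sketch of such a pipeline; the function names and the toy knowledge base are assumptions made for this example, not part of any system described in these notes.

# Hypothetical NLG pipeline sketch; the stage names follow the list above,
# but all function signatures and data are illustrative.
def content_selection(knowledge_base):
    # Pick the facts that should be expressed.
    return [f for f in knowledge_base if f.get("salient")]

def lexical_selection(facts):
    # Map each fact to words (here: a verb plus its arguments).
    return [{"verb": f["predicate"], "args": f["args"]} for f in facts]

def sentence_structure(lexicalized):
    # Build a clause per fact (aggregation and referring expressions omitted).
    return [" ".join([lx["args"][0], lx["verb"] + "s", lx["args"][1]])
            for lx in lexicalized]

def discourse_structure(sentences):
    # Order the sentences into a single text.
    return " ".join(sentences)

kb = [{"predicate": "save", "args": ["the system", "the document"], "salient": True}]
print(discourse_structure(sentence_structure(lexical_selection(content_selection(kb)))))
# -> "the system saves the document"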


Page 4: Fall 2005 Lecture Notes #7

Systemic grammars

• Language is viewed as a resource for expressing meaning in context (Halliday, 1985)

• Layers: mood, transitivity, theme

The system will save the document

Mood:         subject (The system), finite (will), predicator (save), object (the document)
Transitivity: actor (The system), process (will save), goal (the document)
Theme:        theme (The system), rheme (will save the document)
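As a plain illustration (ordinary Python, not systemic-grammar notation), the three layers can be read as parallel labelings of the same clause; the constituent groupings shown are one standard reading of the table above.

# Three parallel analyses of "The system will save the document" (illustrative only).
clause = {
    "mood":         {"subject": "the system", "finite": "will",
                     "predicator": "save", "object": "the document"},
    "transitivity": {"actor": "the system", "process": "will save",
                     "goal": "the document"},
    "theme":        {"theme": "the system", "rheme": "will save the document"},
}
print(clause["transitivity"]["actor"])   # -> the system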


Page 5: Fall 2005 Lecture Notes #7

Example

(:process save-1
 :actor system-1
 :goal document-1
 :speechact assertion
 :tense future)

Input is underspecified
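A rough Python rendering of that input (nested dictionaries standing in for the FD; this is not FUF syntax) makes the underspecification visible: nothing fixes voice, constituent order, or determiners, so the grammar is free to make those decisions.

# Rough, illustrative rendering of the input above as a Python dict (not FUF syntax).
input_fd = {
    "process":   "save-1",
    "actor":     "system-1",
    "goal":      "document-1",
    "speechact": "assertion",
    "tense":     "future",
}
# Voice, word order, determiners, etc. are left to the grammar.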

Page 6: Fall 2005 Lecture Notes #7

The Functional Unification Formalism (FUF)

• Based on Kay's (1983) formalism

• partial information, declarative, uniform, compact

• same framework used for all stages: syntactic realization, lexicalization, and text planning

Page 7: Fall 2005 Lecture Notes #7

Functional analysis

• Functional vs. structural analysis

• "John eats an apple"

• Functional: actor (John), affected (apple), process (eat)

• Structural: NP VP NP

• The functional analysis is the more suitable input for generation
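The contrast can be written out for the example sentence; these are toy data structures for illustration, not the notation of any particular formalism.

# Two analyses of "John eats an apple" (illustrative data structures).
functional = {"process": "eat", "actor": "John", "affected": "apple"}
structural = ["NP", "VP", "NP"]   # constituent sequence, as on the slide

# Generation starts from meaning, so the functional analysis is the natural
# input: it names the roles to be expressed rather than the surface order.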

Page 8: Fall 2005 Lecture Notes #7

Partial vs. complete specification

• Voice: An apple is eaten by John

• Tense: John ate an apple

• Mode: Did John eat an apple?

• Modality: John must eat an apple

• Prolog analogy: p(X, b, c)

action = eat

actor = John

object = apple
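One way to picture this, in plain Python rather than FDs: the core specification is compatible with many sentences, and each added feature narrows the set, much as the Prolog term p(X, b, c) is satisfied by any binding of X. The feature names below mirror the bullets; their exact values are assumptions made for the example.

# Core (partial) specification and some more fully specified variants (illustrative).
core = {"action": "eat", "actor": "John", "object": "apple"}

variants = {
    "An apple is eaten by John": dict(core, voice="passive"),
    "John ate an apple":         dict(core, tense="past"),
    "Did John eat an apple?":    dict(core, mode="interrogative"),
    "John must eat an apple":    dict(core, modality="obligation"),
}
# Like p(X, b, c) in Prolog, the core leaves the remaining choices open.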

Page 9: Fall 2005 Lecture Notes #7

Unification

• Target sentence

• input FD

• grammar

• unification process

• linearization process
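The unification step can be sketched with a toy recursive merge over nested dictionaries; this is a much-simplified stand-in for FUF (no alternations, no paths, no special values such as NONE), meant only to show the idea of combining an input FD with grammar information.

# Toy unification of feature structures represented as nested dicts (not FUF itself).
class UnificationFailure(Exception):
    pass

def unify(fd1, fd2):
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for feature, value in fd2.items():
            # Merge recursively; missing features are simply added.
            result[feature] = unify(result[feature], value) if feature in result else value
        return result
    if fd1 == fd2:
        return fd1
    raise UnificationFailure(f"{fd1!r} conflicts with {fd2!r}")

sentence = unify({"cat": "s", "prot": {"lex": "john"}},
                 {"prot": {"cat": "np"}, "verb": {"lex": "like"}})
print(sentence)
# {'cat': 's', 'prot': {'lex': 'john', 'cat': 'np'}, 'verb': {'lex': 'like'}}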

Page 10: Fall 2005 Lecture Notes #7

Sample input

((cat s)
 (prot ((n ((lex john)))))
 (verb ((v ((lex like)))))
 (goal ((n ((lex mary))))))

Page 11: Fall 2005 Lecture Notes #7

Sample grammar

((alt top
      (((cat s)
        (prot ((cat np)))
        (goal ((cat np)))
        (verb ((cat vp) (number {prot number})))
        (pattern (prot verb goal)))
       ((cat np)
        (n ((cat noun) (number {^ ^ number})))
        (alt (((proper yes)
               (pattern (n)))
              ((proper no)
               (pattern (det n))
               (det ((cat article) (lex "the")))))))
       ((cat vp)
        (pattern (v))
        (v ((cat verb))))
       ((cat noun))
       ((cat verb))
       ((cat article)))))

Page 12: Fall 2005 Lecture Notes #7

Sample output

((cat s)
 (goal ((cat np)
        (n ((cat noun) (lex mary) (number {goal number})))
        (pattern (n))
        (proper yes)))
 (pattern (prot verb goal))
 (prot ((cat np)
        (n ((cat noun) (lex john) (number {verb number})))
        (number {verb number})
        (pattern (n))
        (proper yes)))
 (verb ((cat vp)
        (pattern (v))
        (v ((cat verb) (lex like))))))
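The final string comes from the linearization step: follow the (pattern ...) features top-down, read off the lex values, then apply morphology. A toy sketch of that idea over a simplified version of the output FD (written as nested dicts; the single "-s" rule is an assumption, not SURGE's morphology):

# Toy linearization: walk the pattern features and collect lexical items.
output_fd = {
    "cat": "s",
    "pattern": ["prot", "verb", "goal"],
    "prot": {"cat": "np", "pattern": ["n"], "n": {"lex": "john"}},
    "verb": {"cat": "vp", "pattern": ["v"], "v": {"lex": "like"}},
    "goal": {"cat": "np", "pattern": ["n"], "n": {"lex": "mary"}},
}

def linearize(fd):
    if "pattern" in fd:
        return [word for child in fd["pattern"] for word in linearize(fd[child])]
    return [fd["lex"]]

def morphology(words, fd):
    # Toy rule: third-person-singular "-s" on the verb's lexeme.
    verb_lex = fd["verb"]["v"]["lex"]
    return [w + "s" if w == verb_lex else w for w in words]

print(" ".join(morphology(linearize(output_fd), output_fd)))
# -> "john likes mary"  (capitalization and agreement features left aside)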

Page 13: Fall 2005 Lecture Notes #7

Comparison with Prolog

• Similarities:
  – both have unification at the core
  – Prolog program = FUF grammar
  – Prolog query = FUF input

• Differences:
  – Prolog: first-order term unification
  – FUF: arbitrarily rooted directed graphs are unified

Page 14: Fall 2005 Lecture Notes #7

The SURGE grammar

• Syntactic realization front-end

• variable level of abstraction

• 5600 branches and 1600 alts

Pipeline: Lexical chooser → lexicalized FD → SURGE → syntactic FD → Linearizer / Morphology → text

Page 15: Fall 2005 Lecture Notes #7

Systems developed using FUF/SURGE

• COMET

• MAGIC

• ZEDDOC

• PLANDOC

• FLOWDOC

• SUMMONS

Page 16: Fall 2005 Lecture Notes #7

CFUF

• Fast implementation by Mark Kharitonov (C++)

• Up to 100 times faster than Lisp/FUF

• Speedup higher for larger inputs

Page 17: Fall 2005 Lecture Notes #7

References

• Cole, Mariani, Uszkoreit, Zaenen, Zue (eds.) Survey of the State of the Art in Human Language Technology, 1995

• Elhadad, Using Argumentation to Control Lexical Choice: A Functional Unification Implementation, 1993

• Elhadad, FUF: the Universal Unifier, User Manual, 1993

• Elhadad and Robin, SURGE: a Comprehensive Plug-in Syntactic Realization Component for Text Generation, 1999

• Kharitonov, CFUF: A Fast Interpreter for the Functional Unification Formalism, 1999

• Radev, Language Reuse and Regeneration: Generating Natural Language Summaries from Multiple On-Line Sources, Department of Computer Science, Columbia University, October 1998

Page 18: Fall 2005 Lecture Notes #7

Path notation

• You can view an FD as a tree

• To specify features, you can use a path
  – {feature feature … feature} value
  – e.g. {prot number}

• You can also use relative paths
  – {^ number} value => the feature number of the current node
  – {^ ^ number} value => the feature number of the node above the current node
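A small sketch of how absolute and relative paths might be resolved against an FD stored as nested dicts; this is illustrative code written for these notes, not FUF's implementation, and it follows the reading given above (one ^ stays at the current node, each extra ^ climbs one level).

# Toy path resolution over an FD stored as nested dicts (not FUF's own code).
fd = {"prot": {"cat": "np", "number": "plural", "n": {"cat": "noun"}}}

def resolve(fd, path):
    # Absolute path, e.g. {prot number} -> fd["prot"]["number"].
    node = fd
    for feature in path:
        node = node[feature]
    return node

def resolve_relative(fd, current, path):
    # One "^" refers to the current node; each additional "^" climbs one level.
    ups = path.count("^") - 1
    rest = [step for step in path if step != "^"]
    return resolve(fd, list(current)[:len(current) - ups] + rest)

print(resolve(fd, ["prot", "number"]))                            # plural  ({prot number})
print(resolve_relative(fd, ["prot"], ["^", "number"]))            # plural  ({^ number} at prot)
print(resolve_relative(fd, ["prot", "n"], ["^", "^", "number"]))  # plural  ({^ ^ number} inside n)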

Page 19: Fall 2005 Lecture Notes #7

Sample grammar

((alt top
      (((cat s)
        (prot ((cat np)))
        (goal ((cat np)))
        (verb ((cat vp) (number {prot number})))
        (pattern (prot verb goal)))
       ((cat np)
        (n ((cat noun) (number {^ ^ number})))
        (alt (((proper yes)
               (pattern (n)))
              ((proper no)
               (pattern (det n))
               (det ((cat article) (lex "the")))))))
       ((cat vp)
        (pattern (v))
        (v ((cat verb))))
       ((cat noun))
       ((cat verb))
       ((cat article)))))

Page 20: Fall 2005 Lecture Notes #7

Unification Example

Page 21: Fall 2005 Lecture Notes #7

Unify Prot

Page 22: Fall 2005 Lecture Notes #7

Unify Goal

Page 23: Fall 2005 Lecture Notes #7

Unify vp

Page 24: Fall 2005 Lecture Notes #7

Unify verb

Page 25: Fall 2005 Lecture Notes #7

Finish

Page 26: Fall 2005 Lecture Notes #7

Discourse Analysis

Page 27: Fall 2005 Lecture Notes #7

The problem

• Discourse

• Monologue and dialogue (dialog)

• Human-computer interaction

• Example: John went to Bill's car dealership to check out an Acura Integra. He looked at it for about half an hour.

• Example: I'd like to get from Boston to San Francisco, on either December 5th or December 6th. It's okay if it stops in another city along the way.

Page 28: Fall 2005 Lecture Notes #7

Information extraction and discourse analysis

• Example: First Union Corp. is continuing to wrestle with severe problems unleashed by a botched merger and a troubled business strategy. According to industry insiders at Paine Webber, their president, John R. Georgius, is planning to retire by the end of the year.

• Problems with summarization and generation

Page 29: Fall 2005 Lecture Notes #7

Reference resolution

• The process of reference (associating “John” with “he”).

• Referring expressions and referents.

• Needed: discourse models

• Problem: many types of reference!

Page 30: Fall 2005 Lecture Notes #7

Example (from Webber 91)

• According to John, Bob bought Sue an Integra, and Sue bought Fred a Legend.

• But that turned out to be a lie. (referent is a speech act)

• But that was false. (proposition)

• That struck me as a funny way to describe the situation. (manner of description)

• That caused Sue to become rather poor. (event)

• That caused them both to become rather poor. (combination of several events)

Page 31: Fall 2005 Lecture Notes #7

Reference phenomena

• Indefinite noun phrases: I saw an Acura Integra today.

• Definite noun phrases: The Integra was white.

• Pronouns: It was white.

• Demonstratives: this Acura.

• Inferrables: I almost bought an Acura Integra today, but a door had a dent and the engine seemed noisy.

• Mix the flour, butter, and water. Knead the dough until smooth and shiny.

Page 32: Fall 2005 Lecture Notes #7

Constraints on coreference

• Number agreement: John has an Acura. It is red.

• Person and case agreement: (*) John and Mary have Acuras. We love them (where We = John and Mary)

• Gender agreement: John has an Acura. He/it/she is attractive.

• Syntactic constraints:
  – John bought himself a new Acura.
  – John bought him a new Acura.
  – John told Bill to buy him a new Acura.
  – John told Bill to buy himself a new Acura.
  – He told Bill to buy John a new Acura.
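The agreement constraints above amount to a filter over candidate antecedents. A hedged sketch (the attribute names and the toy candidate list are assumptions made for this example; syntactic binding constraints are not modeled):

# Toy agreement filter over candidate antecedents (illustrative attributes only).
candidates = [
    {"text": "John",   "number": "sg", "gender": "masc", "person": 3},
    {"text": "Mary",   "number": "sg", "gender": "fem",  "person": 3},
    {"text": "Acuras", "number": "pl", "gender": "neut", "person": 3},
]
pronoun = {"text": "he", "number": "sg", "gender": "masc", "person": 3}

compatible = [c for c in candidates
              if all(c[f] == pronoun[f] for f in ("number", "gender", "person"))]
print([c["text"] for c in compatible])   # -> ['John']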

Page 33: Fall 2005 Lecture Notes #7

Preferences in pronoun interpretation

• Recency: John has an Integra. Bill has a Legend. Mary likes to drive it.

• Grammatical role: John went to the Acura dealership with Bill. He bought an Integra.

• (?) John and Bill went to the Acura dealership. He bought an Integra.

• Repeated mention: John needed a car to go to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.

Page 34: Fall 2005 Lecture Notes #7

Preferences in pronoun interpretation

• Parallelism: Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership.

• ??? Mary went with Sue to the Acura dealership. Sally told her not to buy anything.

• Verb semantics:
  – John telephoned Bill. He lost his pamphlet on Acuras.
  – John criticized Bill. He lost his pamphlet on Acuras.

Page 35: Fall 2005 Lecture Notes #7

An algorithm for pronoun resolution

• Two steps: discourse model update and pronoun resolution.

• Salience values are introduced when a noun phrase that evokes a new entity is encountered.

• Salience factors: set empirically.

Page 36: Fall 2005 Lecture Notes #7

Salience weights in Lappin and Leass

Salience factor                                     Weight
Sentence recency                                     100
Subject emphasis                                      80
Existential emphasis                                  70
Accusative emphasis                                   50
Indirect object and oblique complement emphasis       40
Non-adverbial emphasis                                50
Head noun emphasis                                    80
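These weights can be written down directly in code; the dictionary keys below paraphrase the factor names, and the example computes the initial salience of a sentence-initial subject (matching the value that appears for "John" in the worked example a few slides later).

# Salience factor weights from the table above (key names paraphrased).
SALIENCE_WEIGHTS = {
    "sentence_recency": 100,
    "subject":           80,
    "existential":       70,
    "accusative":        50,
    "indirect_object":   40,   # indirect object / oblique complement
    "non_adverbial":     50,
    "head_noun":         80,
}

def initial_salience(factors):
    # Sum the weights of the factors that apply to a new mention.
    return sum(SALIENCE_WEIGHTS[f] for f in factors)

# A non-adverbial subject head noun in the current sentence:
print(initial_salience(["sentence_recency", "subject", "non_adverbial", "head_noun"]))  # 310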

Page 37: Fall 2005 Lecture Notes #7

Lappin and Leass (cont’d)

• Recency: weights are cut in half after each sentence is processed.

• Examples:
  – An Acura Integra is parked in the lot. (subject)
  – There is an Acura Integra parked in the lot. (existential predicate nominal)
  – John parked an Acura Integra in the lot. (object)
  – John gave Susan an Acura Integra. (indirect object)
  – In his Acura Integra, John showed Susan his new CD player. (demarcated adverbial PP)

Page 38: Fall 2005 Lecture Notes #7

Algorithm

1. Collect the potential referents (up to four sentences back).

2. Remove potential referents that do not agree in number or gender with the pronoun.

3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.

4. Compute the total salience value of the referent by adding any applicable values for role parallelism (+35) or cataphora (-175).

5. Select the referent with the highest salience value. In case of a tie, select the closest referent in terms of string position.
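A compact sketch of these five steps over toy data structures (this is illustrative code, not Lappin and Leass's implementation; agreement and syntactic filters are reduced to simple attribute checks, and the salience values are taken as given):

# Sketch of the five-step procedure above over toy candidate records.
ROLE_PARALLELISM = 35
CATAPHORA = -175

def resolve_pronoun(pronoun, candidates, current_sentence):
    # 1. Only referents from the last four sentences.
    pool = [c for c in candidates if current_sentence - c["sentence"] <= 4]
    # 2. Number and gender agreement.
    pool = [c for c in pool
            if c["number"] == pronoun["number"] and c["gender"] == pronoun["gender"]]
    # 3. Intrasentential syntactic (binding) constraints, reduced to a flag here.
    pool = [c for c in pool if not c.get("blocked_by_syntax", False)]

    # 4. Total salience, adjusted for role parallelism and cataphora.
    def total(c):
        score = c["salience"]
        if c.get("same_role_as_pronoun"):
            score += ROLE_PARALLELISM
        if c["position"] > pronoun["position"]:   # candidate follows the pronoun
            score += CATAPHORA
        return score

    # 5. Highest salience wins; on a tie, the closer preceding mention.
    return max(pool, key=lambda c: (total(c), c["position"]))

candidates = [
    {"text": "John",       "sentence": 1, "position": 0, "number": "sg",
     "gender": "masc", "salience": 155},
    {"text": "the dealer", "sentence": 1, "position": 7, "number": "sg",
     "gender": "masc", "salience": 115},
]
pronoun = {"text": "he", "sentence": 2, "position": 12, "number": "sg", "gender": "masc"}
print(resolve_pronoun(pronoun, candidates, current_sentence=2)["text"])   # -> John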

Page 39: Fall 2005 Lecture Notes #7

Example

• John saw a beautiful Acura Integra at the dealership last week. He showed it to Bill. He bought it.

Referent     Rec   Subj   Exist   Obj   IndObj   NonAdv   HeadN   Total
John         100    80      -      -      -        50       80     310
Integra      100     -      -     50      -        50       80     280
dealership   100     -      -      -      -        50       80     230

Page 40: Fall 2005 Lecture Notes #7

Example (cont’d)

Referent Phrases Value

John {John} 155

Integra {a beautiful Acura Integra} 140

dealership {the dealership} 115
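These values are just the Page 39 totals cut in half when the next sentence ("He showed it to Bill.") is reached, per the recency rule:

# Halving the initial salience totals from the previous slide.
totals = {"John": 310, "Integra": 280, "dealership": 230}
after_one_sentence = {referent: value / 2 for referent, value in totals.items()}
print(after_one_sentence)   # {'John': 155.0, 'Integra': 140.0, 'dealership': 115.0}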

Page 41: Fall 2005 Lecture Notes #7

Example (cont’d)

Referent Phrases Value

John {John, he1} 465

Integra {a beautiful Acura Integra} 140

dealership {the dealership} 115

Page 42: Fall 2005 Lecture Notes #7

Example (cont’d)

Referent Phrases Value

John {John, he1} 465

Integra {a beautiful Acura Integra, it} 420

dealership {the dealership} 115

Page 43: Fall 2005 Lecture Notes #7

Example (cont’d)

Referent Phrases Value

John {John, he1} 465

Integra {a beautiful Acura Integra, it} 420

Bill {Bill} 270

dealership {the dealership} 115

Page 44: Fall 2005 Lecture Notes #7

Example (cont’d)

Referent Phrases Value

John {John, he1} 232.5

Integra {a beautiful Acura Integra, it1} 210

Bill {Bill} 135

dealership {the dealership} 57.5

Page 45: Fall 2005 Lecture Notes #7

Observations

• Lappin & Leass - tested on computer manuals - 86% accuracy on unseen data.

• Centering (Grosz, Josh, Weinstein): additional concept of a “center” – at any time in discourse, an entity is centered.

• Backwards looking center; forward looking centers (a set).

• Centering has not been automatically tested on actual data.

Page 46: Fall 2005 Lecture Notes #7

Discourse structure

• (*) Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.

• Coherence principle:
  – John hid Bill's car keys. He was drunk.
  – ?? John hid Bill's car keys. He likes spinach.

• Rhetorical Structure Theory (Mann, Matthiessen, and Thompson)

Page 47: Fall 2005 Lecture Notes #7

Sample rhetorical relations

Relation      Nucleus                                         Satellite
Antithesis    ideas favored by the author                     ideas disfavored by the author
Background    text whose understanding is being facilitated   text for facilitating understanding
Concession    situation affirmed by the author                situation which is apparently inconsistent but also affirmed by the author
Elaboration   basic information                               additional information
Purpose       an intended situation                           the intent behind the situation
Restatement   a situation                                     a reexpression of the situation
Summary       text                                            a short summary of that text

Page 48: Fall 2005 Lecture Notes #7

Example (from MMT)

1) Title: Bouquets in a basket - with living flowers

2) There is a gardening revolution going on.

3) People are planting flower baskets with living plants,

4) mixing many types in one container for a full summer of floral beauty.

5) To create your own "Victorian" bouquet of flowers,

6) choose varying shapes, sizes and forms, besides a variety of complementary colors.

7) Plants that grow tall should be surrounded by smaller ones and filled with others that tumble over the side of a hanging basket.

8) Leaf textures and colors will also be important.

9) There is the silver-white foliage of dusty miller, the feathery threads of lotus vine floating down from above, the deep greens, or chartreuse, even the widely varied foliage colors of the coleus.

Christian Science Monitor, April, 1983
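For illustration, a rhetorical relation can be carried around in code as a labeled nucleus-satellite pair; the representation below is a toy sketch written for these notes, and the Elaboration pairing of segments 3 and 4 is one plausible reading, not the analysis from the original figure.

# Toy representation of a nucleus-satellite rhetorical relation (illustrative).
from dataclasses import dataclass

@dataclass
class RSTRelation:
    relation: str     # e.g. "Elaboration", "Purpose", "Summary"
    nucleus: str      # span carrying the central content
    satellite: str    # span carrying the supporting content

example = RSTRelation(
    relation="Elaboration",
    nucleus="People are planting flower baskets with living plants,",
    satellite="mixing many types in one container for a full summer of floral beauty.",
)
print(example.relation, "::", example.nucleus)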

Page 49: Fall 2005 Lecture Notes #7

Example (cont’d)

Page 50: Fall 2005 Lecture Notes #7

Cross-document structure

Page 51: Fall 2005 Lecture Notes #7

Number   Relationship type            Level   Description
1        Identity                     Any     The same text appears in more than one location
2        Equivalence (paraphrasing)   S, D    Two text spans have the same information content
3        Translation                  P, S    Same information content in different languages
4        Subsumption                  S, D    One sentence contains more information than another
5        Contradiction                S, D    Conflicting information
6        Historical background        S       Information that puts current information in context
7        Cross-reference              P       The same entity is mentioned
8        Citation                     S, D    One sentence cites another document
9        Modality                     S       Qualified version of a sentence
10       Attribution                  S       One sentence repeats the information of another while adding an attribution
11       Summary                      S, D    Similar to Summary in RST: one sentence summarizes another

Page 52: Fall 2005 Lecture Notes #7

Number   Relationship type        Level   Description
12       Follow-up                S       Additional information which reflects facts that have happened since the previous account
13       Elaboration              S       Additional information that wasn't included in the last account
14       Indirect speech          S       Shift from direct to indirect speech or vice versa
15       Refinement               S       Additional information that is
16       Agreement                S       One source expresses agreement with another
17       Judgement                S       A qualified account of a fact
18       Fulfilment               S       A prediction turned true
19       Description              S       Insertion of a description
20       Reader profile           S       Style and background-specific change
21       Contrast                 S       Contrasting two accounts of facts
22       Parallel                 S       Comparing two accounts of facts
23       Generalization           S       Generalization
24       Change of perspective    S, D    The same source presents a fact in a different light