NLP. Introduction to NLP (U)nderstanding and (G)eneration Language Computer (U) Language (G)


Introduction to NLP

Text Generation


Basic NLP Pipeline

• (U)nderstanding and (G)eneration

[Figure: Language → Computer (U); Computer → Language (G)]


Definition

• Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals.

[McDonald 1992]


What is NLG?

• Mapping meaning to text
• Stages:
  – Content selection
  – Lexical choice
  – Sentence structure: aggregation, referring expressions
  – Discourse structure
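The stages above can be illustrated with a toy pipeline. This is a minimal sketch, not taken from any system discussed here: the weather record, its field names, the wind threshold, and all wording are hypothetical. Discourse structure is reduced to a fixed fact order (temperature before wind).

```python
# Hypothetical input record; none of these fields come from a real system.
RECORD = {"city": "Montreal", "high": 21, "low": 9, "wind_kmh": 35, "humidity": 60}


def select_content(rec):
    """Content selection: keep temperature, mention wind only if notable."""
    facts = [("temp", rec["low"], rec["high"])]
    if rec["wind_kmh"] >= 30:  # assumed threshold
        facts.append(("wind", rec["wind_kmh"]))
    return facts  # humidity is never reported


def lexicalize(fact):
    """Lexical choice: map each fact to a verb phrase."""
    if fact[0] == "temp":
        return f"will see temperatures between {fact[1]} and {fact[2]} degrees"
    return "will be windy" if fact[1] < 50 else "will face strong winds"


def aggregate(phrases):
    """Aggregation: conjoin phrases that share the same subject."""
    return [" and ".join(phrases)]


def realize(rec, phrases):
    """Referring expressions: full name first, 'The city' afterwards."""
    subjects = [rec["city"]] + ["The city"] * (len(phrases) - 1)
    return " ".join(f"{s} {p}." for s, p in zip(subjects, phrases))


def generate(rec, do_aggregate=True):
    # The fixed fact order stands in for discourse structure.
    phrases = [lexicalize(f) for f in select_content(rec)]
    if do_aggregate:
        phrases = aggregate(phrases)
    return realize(rec, phrases)


print(generate(RECORD))
print(generate(RECORD, do_aggregate=False))
```

With aggregation, the first call prints "Montreal will see temperatures between 9 and 21 degrees and will be windy." Without it, the second fact becomes its own sentence and needs a referring expression ("The city will be windy.").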


Example of an NLG System

• FOG (Goldberg et al. 1994)
• Weather forecast reports for the Canadian Weather Service
• Input
  – Numerical simulation data annotated by humans


Plandoc

• Function
  – Produces a report describing the simulation options that an engineer has explored
• Input
  – A simulation log file
• Developer
  – Bellcore and Columbia University


Input for Plandoc

• RUNID fiberall FIBER 6/19/93 act yes
• FA 1301 2 1995
• FA 1201 2 1995
• FA 1401 2 1995
• FA 1501 2 1995
• ANF co 1103 2 1995 48
• ANF 1201 1301 2 1995 24
• ANF 1401 1501 2 1995 24
• END. 856.0 670.2


Output

• This saved fiber refinement includes all DLC changes in Run-ID ALLDLC. RUN-ID FIBERALL demanded that PLAN activate fiber for CSAs 1201, 1301, 1401 and 1501 in 1995 Q2. It requested the placement of a 48-fiber cable from the CO to section 1103 and the placement of 24-fiber cables from section 1201 to section 1301 and from section 1401 to section 1501 in the second quarter of 1995. For this refinement, the resulting 20 year route PWE was $856.00K, a $64.11K savings over the BASE plan and the resulting 5 year IFC was $670.20K, a $60.55K savings over the BASE plan.
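The input/output pair above can be approximated by a template-based sketch. It assumes a field layout inferred only by matching the log against the report (FA: CSA, quarter, year; ANF: from, to, quarter, year, fiber count; END.: PWE and IFC in $K). The helper names are hypothetical, and this is not how Plandoc itself is built. The opening sentence about Run-ID ALLDLC and the savings figures are omitted because the log shown does not contain the BASE-plan data they depend on. The sketch still shows two NLG choices: aggregation (both 24-fiber placements share one clause) and referring expressions ("co" becomes "the CO").

```python
from collections import defaultdict

PLANDOC_LOG = """\
RUNID fiberall FIBER 6/19/93 act yes
FA 1301 2 1995
FA 1201 2 1995
FA 1401 2 1995
FA 1501 2 1995
ANF co 1103 2 1995 48
ANF 1201 1301 2 1995 24
ANF 1401 1501 2 1995 24
END. 856.0 670.2"""

ORDINAL = {1: "first", 2: "second", 3: "third", 4: "fourth"}


def join_and(items):
    """Render a list as 'a, b and c'."""
    items = list(items)
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]


def place(name):
    """Referring expression for a location: 'co' becomes 'the CO'."""
    return "the CO" if name == "co" else f"section {name}"


def parse(log):
    """Parse the log using the assumed (inferred) field layout."""
    run = {"fa": [], "anf": []}
    for fields in (line.split() for line in log.splitlines()):
        if fields[0] == "RUNID":
            run["id"] = fields[1].upper()
        elif fields[0] == "FA":    # CSA, quarter, year
            run["fa"].append((fields[1], int(fields[2]), int(fields[3])))
        elif fields[0] == "ANF":   # from, to, quarter, year, fiber count
            run["anf"].append((fields[1], fields[2], int(fields[3]),
                               int(fields[4]), int(fields[5])))
        elif fields[0] == "END.":  # 20-year PWE, 5-year IFC (in $K)
            run["pwe"], run["ifc"] = float(fields[1]), float(fields[2])
    return run


def realize(run):
    # Assumes all FA records share one quarter/year, and likewise all ANF records.
    _, fq, fy = run["fa"][0]
    csas = sorted(csa for csa, _, _ in run["fa"])
    s1 = (f"RUN-ID {run['id']} demanded that PLAN activate fiber for CSAs "
          f"{join_and(csas)} in {fy} Q{fq}.")
    # Aggregation: one clause per cable size, conjoining all of its routes.
    routes = defaultdict(list)
    for src, dst, _, _, size in run["anf"]:
        routes[size].append(f"from {place(src)} to {place(dst)}")
    clauses = [
        f"the placement of a {size}-fiber cable {r[0]}" if len(r) == 1
        else f"the placement of {size}-fiber cables {join_and(r)}"
        for size, r in routes.items()
    ]
    _, _, aq, ay, _ = run["anf"][0]
    s2 = f"It requested {join_and(clauses)} in the {ORDINAL[aq]} quarter of {ay}."
    s3 = (f"For this refinement, the resulting 20 year route PWE was ${run['pwe']:.2f}K "
          f"and the resulting 5 year IFC was ${run['ifc']:.2f}K.")
    return " ".join([s1, s2, s3])


print(realize(parse(PLANDOC_LOG)))
```

The second generated sentence reproduces the corresponding sentence of the Plandoc output word for word; a template approach like this one breaks down quickly once inputs vary more than in this single log.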


Considerations

• NLG is about choices
  – Content
  – Coherence
  – Style
  – Media
  – Syntax
  – Aggregation
  – Referring expressions
  – Lexical choice


Introduction to NLP

Features and Unification


Need for feature-based grammars

• Example (ungrammatical)
  – The dogs bites (agreement)
• Example (ungrammatical)
  – many water (count/mass nouns)
• Idea
  – S → NP VP (only if the NP and the VP agree in person and number)


Unification Grammars

• Types of unification grammars
  – LFG, HPSG, FUG
• Handle agreement
  – e.g., number, gender, person
• Unification
  – Two constituents can be combined only if their features can unify


Feature unification

[CAT NP, PERSON 3, NUMBER SINGULAR] ⊔ [CAT NP, NUMBER SINGULAR, PERSON 3] = [CAT NP, NUMBER SINGULAR, PERSON 3]


Feature unification

[CAT NP, PERSON 3, NUMBER SINGULAR] ⊔ [CAT NP, PERSON 1] = FAILURE (PERSON 3 clashes with PERSON 1)
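Unification of flat feature structures, as in the two examples above, can be sketched in a few lines. Representing a feature structure as a Python dict is an assumption of this sketch; nested structures and shared (reentrant) values are not handled.

```python
def unify(fs1, fs2):
    """Unify two flat feature structures; return None on a clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None  # e.g. PERSON 3 vs. PERSON 1
        result[feature] = value  # features missing from fs1 are added
    return result


a = {"CAT": "NP", "PERSON": 3, "NUMBER": "SINGULAR"}
b = {"CAT": "NP", "NUMBER": "SINGULAR", "PERSON": 3}
c = {"CAT": "NP", "PERSON": 1}
print(unify(a, b))  # {'CAT': 'NP', 'PERSON': 3, 'NUMBER': 'SINGULAR'}
print(unify(a, c))  # None
```

Unification also fills in information: unifying {"CAT": "NP"} with {"NUMBER": "SINGULAR"} yields a structure carrying both features.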


Agreement

• S → NP VP
  {NP PERSON} = {VP PERSON}
• S → Aux NP VP
  {Aux PERSON} = {NP PERSON}
• Verb → bites
  {Verb PERSON} = 3
• Verb → bite
  {Verb PERSON} = 1
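A minimal sketch of how the agreement equations license or block S → NP VP. The lexicon entries for "I" and "the dog" and the rule VP → Verb with {VP PERSON} = {Verb PERSON} are hypothetical additions. Only PERSON is modeled, exactly as in the rules above, so a string like "the dogs bites" would still need a NUMBER constraint to be ruled out.

```python
# Hypothetical lexicon: only PERSON is modeled, as in the rules above.
LEXICON = {
    "I": {"cat": "NP", "PERSON": 1},
    "the dog": {"cat": "NP", "PERSON": 3},
    "bites": {"cat": "Verb", "PERSON": 3},  # Verb -> bites, {Verb PERSON} = 3
    "bite": {"cat": "Verb", "PERSON": 1},   # Verb -> bite,  {Verb PERSON} = 1
}


def unify_atoms(x, y):
    """Unify two atomic values; None means unspecified. Raise on a clash."""
    if x is None:
        return y
    if y is None or x == y:
        return x
    raise ValueError(f"clash: {x} vs {y}")


def make_vp(verb):
    # Assumed rule VP -> Verb with {VP PERSON} = {Verb PERSON}.
    return {"cat": "VP", "PERSON": LEXICON[verb]["PERSON"]}


def make_s(np, verb):
    """S -> NP VP, licensed only if {NP PERSON} = {VP PERSON} unifies."""
    vp = make_vp(verb)
    try:
        person = unify_atoms(LEXICON[np].get("PERSON"), vp["PERSON"])
    except ValueError:
        return None
    return {"cat": "S", "PERSON": person, "words": f"{np} {verb}"}


print(make_s("the dog", "bites"))  # licensed
print(make_s("the dog", "bite"))   # None: PERSON 3 vs. PERSON 1
```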


Subcategorization

• VP → Verb
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = INTRANS
• VP → Verb NP
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = TRANS
• VP → Verb NP NP
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = DITRANS
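The three subcategorization rules can be checked the same way. In this sketch the verbs and their SUBCAT values are hypothetical, and each rule is identified by how many NP complements follow the verb.

```python
# Hypothetical verb lexicon: one SUBCAT value per verb form.
VERB_SUBCAT = {"sleeps": "INTRANS", "sees": "TRANS", "gives": "DITRANS"}

# The rules above, keyed by how many NPs follow the verb:
# VP -> Verb (INTRANS), VP -> Verb NP (TRANS), VP -> Verb NP NP (DITRANS).
VP_RULES = {0: "INTRANS", 1: "TRANS", 2: "DITRANS"}


def make_vp(verb, nps):
    """Apply the matching VP rule if {VP SUBCAT} = {Verb SUBCAT} holds."""
    rule_subcat = VP_RULES.get(len(nps))
    if rule_subcat is None or VERB_SUBCAT.get(verb) != rule_subcat:
        return None  # no rule, or the verb's SUBCAT clashes with the rule's
    return {"cat": "VP", "SUBCAT": rule_subcat, "words": " ".join([verb, *nps])}


print(make_vp("gives", ["Mary", "a book"]))  # DITRANS VP
print(make_vp("sleeps", ["Mary"]))           # None
```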


Systemic Grammars

• Language is viewed as a resource for expressing meaning in context (Halliday, 1985)

• Layers: mood, transitivity, theme

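Example: "The system will save the document"
• Mood: subject (The system), finite (will), predicator (save), object (the document)
• Transitivity: actor (The system), process (will save), goal (the document)
• Theme: theme (The system), rheme (will save the document)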


Example

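(:process save-1
 :actor system-1
 :goal document-1
 :speechact assertion
 :tense future)

• Input is underspecified: the grammar must supply word order, the articles, and the auxiliary "will" (from :tense future)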


The Functional Unification Formalism (FUF)

• Based on Kay’s (1983) formalism
• Partial information, declarative, uniform, compact
• Same framework used for all stages: syntactic realization, lexicalization, and text planning


Functional Analysis

• Functional vs. structural analysis
• "John eats an apple"
• Actor (John), affected (apple), process (eat)
• Suitable for generation


Partial vs. Complete Specification

• Voice: An apple is eaten by John
• Tense: John ate an apple
• Mode: Did John eat an apple?
• Modality: John must eat an apple

Partial specification: action = eat, actor = John, object = apple
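One partial specification can yield each of the sentences above once the missing choices are filled in. The sketch below is hypothetical throughout: the parameter names, their defaults (active, present, declarative, no modality), and the hard-coded forms of "eat" stand in for a real grammar and morphology component, and only the combinations listed above are covered. The verb form "eats" assumes a third-person singular actor.

```python
# Hypothetical stand-in for a morphology component: forms of "eat" only.
VERB_FORMS = {"eat": {"base": "eat", "pres3sg": "eats", "past": "ate", "participle": "eaten"}}


def indefinite(noun):
    """Choose 'a' or 'an' by the first letter (good enough for this example)."""
    return ("an " if noun[0] in "aeiou" else "a ") + noun


def realize(fd, voice="active", tense="present", mode="declarative", modality=None):
    """Fill in the choices the partial FD leaves open, then realize it."""
    forms = VERB_FORMS[fd["action"]]
    subj, obj = fd["actor"], indefinite(fd["object"])
    if voice == "passive":
        aux = "is" if tense == "present" else "was"
        words = [obj.capitalize(), aux, forms["participle"], "by", subj]
    elif modality is not None:
        words = [subj, modality, forms["base"], obj]
    elif mode == "interrogative":
        aux = "Does" if tense == "present" else "Did"
        words = [aux, subj, forms["base"], obj]
    else:
        words = [subj, forms["pres3sg"] if tense == "present" else forms["past"], obj]
    return " ".join(words) + ("?" if mode == "interrogative" else ".")


PARTIAL = {"action": "eat", "actor": "John", "object": "apple"}
for choices in ({}, {"voice": "passive"}, {"tense": "past"},
                {"tense": "past", "mode": "interrogative"}, {"modality": "must"}):
    print(realize(PARTIAL, **choices))
```

With no choices given, the defaults complete the specification as "John eats an apple.", the sentence used in the functional analysis.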


Unification

• Target sentence
• Input FD
• Grammar
• Unification process
• Linearization process


Path notation

• View an FD as a tree
• To specify features, use a path
  – {feature feature … feature} value
  – e.g. {prot number}
• Also use relative paths
  – {^ number} value = the feature number for the current node
  – {^ ^ number} value = the feature number for the node above the current node
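Path lookup can be sketched with an FD stored as nested dicts. Assumptions: paths are Python lists, a relative path is interpreted from the location of the feature that holds it, and each leading "^" moves up one level, so {^ x} names a feature of the current node and {^ ^ x} one of the node above, matching the definitions above. The FD contents and the helper names `get` and `resolve` are hypothetical.

```python
# Hypothetical FD, stored as nested dicts.
FD = {
    "cat": "s",
    "prot": {"cat": "np", "number": "singular",
             "n": {"cat": "noun", "lex": "john"}},
    "verb": {"cat": "vp"},
}


def get(fd, path):
    """Follow an absolute path such as ['prot', 'number'] from the root."""
    node = fd
    for feature in path:
        node = node[feature]
    return node


def resolve(location, path):
    """Make a path absolute.

    location: absolute path of the feature holding the path,
              e.g. ['prot', 'n', 'number'].
    Each leading '^' removes one step from that location, so one '^'
    lands on the current node and two land on the node above it.
    """
    if not path or path[0] != "^":
        return list(path)  # already absolute, e.g. {prot number}
    base, rest = list(location), list(path)
    while rest and rest[0] == "^":
        if not base:
            raise ValueError("path climbs above the root")
        base.pop()
        rest.pop(0)
    return base + rest


# {^ ^ number} written on the noun's number feature -> the NP's number.
print(get(FD, resolve(["prot", "n", "number"], ["^", "^", "number"])))  # singular
# {^ lex} on the same feature -> the noun's own lex.
print(get(FD, resolve(["prot", "n", "number"], ["^", "lex"])))          # john
# {prot number} on the verb's number feature is absolute.
print(get(FD, resolve(["verb", "number"], ["prot", "number"])))         # singular
```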


Sample input

((cat s)
 (prot ((n ((lex john)))))
 (verb ((v ((lex like)))))
 (goal ((n ((lex mary))))))


Sample Grammar

((alt top
  (((cat s)
    (prot ((cat np)))
    (goal ((cat np)))
    (verb ((cat vp) (number {prot number})))
    (pattern (prot verb goal)))
   ((cat np)
    (n ((cat noun) (number {^ ^ number})))
    (alt (((proper yes) (pattern (n)))
          ((proper no) (pattern (det n)) (det ((cat article) (lex "the")))))))
   ((cat vp) (pattern (v)) (v ((cat verb))))
   ((cat noun))
   ((cat verb))
   ((cat article)))))


Sample Output

((cat s)
 (goal ((cat np)
        (n ((cat noun) (lex mary) (number {goal number})))
        (pattern (n))
        (proper yes)))
 (pattern (prot verb goal))
 (prot ((cat np)
        (n ((cat noun) (lex john) (number {verb number})))
        (number {verb number})
        (pattern (n))
        (proper yes)))
 (verb ((cat vp)
        (pattern (v))
        (v ((cat verb) (lex like))))))
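Linearization of the sample output can be sketched as a depth-first walk that follows each node's pattern and emits the lex value at the leaves. The dict encoding is an assumption, and the path-valued number features are dropped because they do not affect word order. Inflection ("likes") and capitalization would be left to a separate morphology step.

```python
# The sample output FD, re-encoded as nested dicts (number paths omitted).
SAMPLE_OUTPUT = {
    "cat": "s",
    "pattern": ["prot", "verb", "goal"],
    "prot": {"cat": "np", "pattern": ["n"], "proper": "yes",
             "n": {"cat": "noun", "lex": "john"}},
    "verb": {"cat": "vp", "pattern": ["v"],
             "v": {"cat": "verb", "lex": "like"}},
    "goal": {"cat": "np", "pattern": ["n"], "proper": "yes",
             "n": {"cat": "noun", "lex": "mary"}},
}


def linearize(fd):
    """Depth-first: follow each node's pattern; emit lex at the leaves."""
    if "pattern" in fd:
        words = []
        for constituent in fd["pattern"]:
            words.extend(linearize(fd[constituent]))
        return words
    return [fd["lex"]]


print(" ".join(linearize(SAMPLE_OUTPUT)))  # john like mary
```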


Unification Example

• Unify Prot
• Unify Goal
• Unify VP
• Unify Verb
• Finish
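The unification steps above (prot, goal, VP, verb) can be approximated with recursive unification over nested dicts. This is a simplified sketch, not FUF's actual algorithm: alternatives are tried in order with no backtracking once one succeeds, only the (proper yes) NP branch is included (so the article branch is dropped), path values such as {prot number} are ignored, and recursion follows each node's pattern (an assumption made for simplicity). The names `unify`, `unify_alt`, and `unify_fd` are hypothetical.

```python
def unify(a, b):
    """Recursively unify two FDs (nested dicts with atomic or list leaves).

    Returns a new merged FD, or None if any feature has clashing values.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else None


def unify_alt(fd, branches):
    """Try each branch in order and keep the first that unifies."""
    for branch in branches:
        result = unify(fd, branch)
        if result is not None:
            return result
    return None


# Simplified version of the sample grammar (paths and the proper-no branch dropped).
GRAMMAR = [
    {"cat": "s", "prot": {"cat": "np"}, "goal": {"cat": "np"},
     "verb": {"cat": "vp"}, "pattern": ["prot", "verb", "goal"]},
    {"cat": "np", "n": {"cat": "noun"}, "proper": "yes", "pattern": ["n"]},
    {"cat": "vp", "pattern": ["v"], "v": {"cat": "verb"}},
    {"cat": "noun"},
    {"cat": "verb"},
]

SAMPLE_INPUT = {"cat": "s", "prot": {"n": {"lex": "john"}},
                "verb": {"v": {"lex": "like"}}, "goal": {"n": {"lex": "mary"}}}


def unify_fd(fd, grammar):
    """Unify a node with the grammar, then recurse into its pattern constituents."""
    fd = unify_alt(fd, grammar)
    if fd is None:
        return None
    for constituent in fd.get("pattern", []):
        sub = unify_fd(fd[constituent], grammar)
        if sub is None:
            return None
        fd[constituent] = sub
    return fd


result = unify_fd(SAMPLE_INPUT, GRAMMAR)
print(result["prot"])
print(result["verb"])
```

The prot constituent first fails against the S branch (cat np clashes with cat s) and then succeeds against the NP branch, which mirrors why each constituent ends up with exactly one category in the sample output.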


The SURGE grammar (Elhadad)

• Syntactic realization front-end
• Variable level of abstraction
• 5,600 branches and 1,600 alts

[Figure: Lexical chooser → Lexicalized FD → SURGE → Syntactic FD → Linearizer → Morphology → Text]
