NATURAL LANGUAGE GENERATION - Helsinki


Transcript of NATURAL LANGUAGE GENERATION - Helsinki

Page 1: NATURAL LANGUAGE GENERATION - Helsinki

LECTURE 6

NATURAL LANGUAGE GENERATION

Leo Leppänen

UNIVERSITY OF HELSINKI
Department of Computer Science
NLP - 2018 - Lecture 6

Page 2: NATURAL LANGUAGE GENERATION - Helsinki

OUTLINE

Introduction

NLG Subtasks

Classifying NLG Systems

A Few Architectures

Evaluating NLG

Dialogue Systems


Page 4: NATURAL LANGUAGE GENERATION - Helsinki

NATURAL LANGUAGE GENERATION

• From here on out: NLG

• Recall from first lecture: reverse of NLU

• A different kind of complexity

• ‘Language understanding is somewhat like counting from one to infinity; language generation is like counting from infinity to one.’ – Wilks, quoted by Dale, Di Eugenio & Scott

• ‘Generation from what?!’ – possibly Longuet-Higgins


Page 9: NATURAL LANGUAGE GENERATION - Helsinki

GENERATION FROM WHAT?

• Seemingly trivial but insufficient definition: ‘Systems that produce natural language as output’

• Commonly split into three subcategories:
  • Text-to-Text Generation
  • Visual-to-Text Generation
  • Data-to-Text Generation


Page 14: NATURAL LANGUAGE GENERATION - Helsinki

TEXT-TO-TEXT NLG

• Machine Translation

• Summarization

• Simplification

• Spelling and grammar correction

• Generation of peer reviews for scientific articles

• Paraphrase generation

• Question generation systems


Page 21: NATURAL LANGUAGE GENERATION - Helsinki

VISUAL-TO-TEXT NLG

• Describe a still image or video in natural language.

• Also known as ‘captioning’.

• NB: Distinct from image/object recognition! Output is not just a classification.

• Alternatively, view object recognition as a subtask of captioning


Page 25: NATURAL LANGUAGE GENERATION - Helsinki

PICTURE-TO-TEXT

COCO 2015 Image Captioning Task


Page 27: NATURAL LANGUAGE GENERATION - Helsinki

VIDEO-TO-TEXT

‘An old man is standing next to a woman in an office. Later, he is walking away from her. Next, an old man is sitting on a chair.’


Page 28: NATURAL LANGUAGE GENERATION - Helsinki

DATA-TO-TEXT

• Go from some non-visual data format to text

• Usually carries an implicit ‘Structured’ at the start, i.e. structured-data-to-text

• Examples

  • Automated journalism (sports, finance, elections etc.)
  • Weather reports
  • Clinical summaries of patient information


Page 34: NATURAL LANGUAGE GENERATION - Helsinki

NOT STRICT CATEGORIES

• Text-to-Text is often excluded from the definition of NLG

• Text-to-Text can be seen as NLU (Text-to-Data) followed by Data-to-Text NLG

• Recall the Vauquois pyramid from lecture 1

• Consider: Are emails ‘data’ or ‘text’?


Page 38: NATURAL LANGUAGE GENERATION - Helsinki

THE KINDA-STANDARD DEFINITION

NLG is ‘the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information’

• Not completely uncontroversial!


Page 40: NATURAL LANGUAGE GENERATION - Helsinki

NLG IN THE REAL WORLD

Discuss with the people around you for 2 minutes

What kinds of NLG systems have you come across? Use the broader meaning of NLG. Try to come up with examples of data-to-text, text-to-text and visual-to-text systems.


Page 42: NATURAL LANGUAGE GENERATION - Helsinki

NLG SUBTASKS

• NLG systems come in all kinds of shapes

• Still, all systems must accomplish the same conceptual tasks (a toy pipeline over these tasks is sketched after the task list below)


Page 43: NATURAL LANGUAGE GENERATION - Helsinki

NLG SUBTASKS

1. Content Determination

2. Text Structuring

3. Sentence Aggregation

4. Lexicalisation

5. Referring Expression Generation

6. Linguistic Realisation
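To make the division of labour concrete, below is a minimal, purely illustrative Python sketch of the classical pipeline. The toy weather domain, the field names and every rule are invented for this example; each stage is reduced to a trivial heuristic so that only the data flow between the six subtasks is visible.

```python
# Toy end-to-end NLG pipeline; every stage is deliberately simplistic.

def content_determination(data):
    # Keep only facts deemed important enough to mention.
    return [f for f in data if f["importance"] >= 0.5]

def text_structuring(messages):
    # Order the selected messages, most important first.
    return sorted(messages, key=lambda m: -m["importance"])

def aggregation(messages):
    # Group messages that share a predicate so they can become one sentence.
    groups = {}
    for m in messages:
        groups.setdefault(m["predicate"], []).append(m)
    return list(groups.values())

def lexicalisation(group):
    # Choose words for a group of messages sharing a predicate.
    verb = {"temperature": "is", "wind": "blows"}[group[0]["predicate"]]
    return {"subject": group[0]["predicate"], "verb": verb,
            "objects": [str(m["value"]) for m in group]}

def referring_expression(spec):
    # Trivial REG: just name the entity.
    return "The " + spec["subject"]

def linguistic_realisation(spec):
    # Glue everything into a sentence string.
    return f'{referring_expression(spec)} {spec["verb"]} {" and ".join(spec["objects"])}.'

def generate(data):
    messages = text_structuring(content_determination(data))
    return " ".join(linguistic_realisation(lexicalisation(g))
                    for g in aggregation(messages))

if __name__ == "__main__":
    weather = [
        {"predicate": "temperature", "value": "-5 C", "importance": 0.9},
        {"predicate": "wind", "value": "from the north", "importance": 0.6},
        {"predicate": "humidity", "value": "80 %", "importance": 0.2},
    ]
    print(generate(weather))
    # -> "The temperature is -5 C. The wind blows from the north."
```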


Page 45: NATURAL LANGUAGE GENERATION - Helsinki

CONTENT DETERMINATION

• Selecting what information to include in the text

• Decisions usually extremely domain dependent

• Hard to identify an algorithm that works for both ice hockey reporting and restaurant recommendation


Page 48: NATURAL LANGUAGE GENERATION - Helsinki

INPUTS

• Decisions based on four factors

• Knowledge source: what the system knows

• Communicative goal: what it’s trying to achieve

• User model: what the user knows and prefers

• Dialogue history: previous interactions and their results


Page 49: NATURAL LANGUAGE GENERATION - Helsinki

MESSAGES

• Making these decisions is only possible if the data is first transformed into messages (a minimal example of one possible message format follows below)

• A meaningful piece of information: something to either include in or exclude from the final text

• Expressed in some formal (non-natural) language

• No universal standard format
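As a concrete illustration (one made-up format among many, since no standard exists), messages can be as simple as small Python objects carrying a predicate, its arguments and an importance estimate; content determination then reduces to filtering them, here against a threshold taken from a hypothetical user model:

```python
# A made-up message format: the lecture stresses there is no universal standard.
from dataclasses import dataclass

@dataclass
class Message:
    predicate: str                 # e.g. "goal_scored"
    arguments: dict                # e.g. {"player": "Koivu", "minute": 12}
    importance: float = 0.5        # domain-specific relevance estimate

def select_content(messages, user_model):
    """Toy content determination: keep messages the user is likely to care about."""
    threshold = user_model.get("importance_threshold", 0.5)
    return [m for m in messages if m.importance >= threshold]

msgs = [
    Message("goal_scored", {"player": "Koivu", "minute": 12}, importance=0.9),
    Message("penalty", {"player": "Smith", "minute": 34}, importance=0.4),
]
print(select_content(msgs, {"importance_threshold": 0.5}))
# Only the goal_scored message survives with this user model.
```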


Page 54: NATURAL LANGUAGE GENERATION - Helsinki

EXAMPLE: KEY-VALUE PAIRS

Meaning Representation

name[The Eagle], eatType[coffee shop], food[French],

priceRange[moderate], customerRating[3/5],

area[riverside], kidsFriendly[yes], near[Burger King]

Possible NL representation

The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.
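A minimal sketch of how a key-value meaning representation like the one above could be verbalised with a single hand-written template. The slot names follow the slide's example; the parsing and the template are invented and produce a far less fluent text than the human-written reference:

```python
# Parse the slide's key-value MR into a dict and fill a hand-written template.
import re

mr = ("name[The Eagle], eatType[coffee shop], food[French], "
      "priceRange[moderate], customerRating[3/5], "
      "area[riverside], kidsFriendly[yes], near[Burger King]")

slots = dict(re.findall(r"(\w+)\[(.+?)\]", mr))

template = ("{name} is a {priceRange}-priced {food} {eatType} in the {area} area, "
            "near {near}, with a customer rating of {customerRating}.")

print(template.format(**slots))
# -> "The Eagle is a moderate-priced French coffee shop in the riverside area,
#     near Burger King, with a customer rating of 3/5."
```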


Page 56: NATURAL LANGUAGE GENERATION - Helsinki

EXAMPLE: SEMANTIC GRAPHS

Meaning Representation

(w / want-01
   :ARG0 (b / boy)
   :ARG1 (b2 / believe-01
      :ARG0 (g / girl)
      :ARG1 b))

Possible NL representation

The boy desires the girl to believe him.
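For illustration only, the same graph can be stored as a nested Python structure; the naive recursive verbalisation below is a toy, not how actual graph-to-text systems work:

```python
# The slide's AMR-style graph as nested Python dicts; variable b is re-entrant.
amr = {
    "var": "w", "concept": "want-01",
    "ARG0": {"var": "b", "concept": "boy"},
    "ARG1": {
        "var": "b2", "concept": "believe-01",
        "ARG0": {"var": "g", "concept": "girl"},
        "ARG1": "b",          # re-entrant reference back to the boy
    },
}

def naive_realise(node):
    """Extremely naive verbalisation of this one graph shape."""
    if isinstance(node, str):                    # a re-entrant variable reference
        return "the boy" if node == "b" else node
    concept = node["concept"].split("-")[0]      # drop the PropBank sense number
    if concept in ("boy", "girl"):
        return f"the {concept}"
    subj = naive_realise(node["ARG0"])
    obj = naive_realise(node["ARG1"])
    verb = {"want": "wants", "believe": "to believe"}[concept]
    return f"{subj} {verb} {obj}"

print(naive_realise(amr))   # -> "the boy wants the girl to believe the boy"
```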


Page 58: NATURAL LANGUAGE GENERATION - Helsinki

TEXT STRUCTURING (a.k.a. Document Structuring)

• Choosing the order/structure of the information

• Very domain-specific → No real standard method

  • Temporal order?
  • Most important first?
  • Standard format for the domain?

• Potentially very complex: X might be actionable/understandable only with Y.
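A toy sketch of two of the ordering strategies above, temporal order versus most-important-first, assuming messages carry hypothetical time and importance fields:

```python
# Two simple document-structuring strategies over the same toy messages.
messages = [
    {"event": "second goal", "time": 58, "importance": 0.7},
    {"event": "first goal", "time": 12, "importance": 0.6},
    {"event": "game ends 2-0", "time": 60, "importance": 0.9},
]

temporal_order = sorted(messages, key=lambda m: m["time"])
importance_first = sorted(messages, key=lambda m: -m["importance"])

print([m["event"] for m in temporal_order])
# ['first goal', 'second goal', 'game ends 2-0']
print([m["event"] for m in importance_first])
# ['game ends 2-0', 'second goal', 'first goal']
```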


Page 64: NATURAL LANGUAGE GENERATION - Helsinki

DOCUMENT PLAN

• Classically, the output is a tree describing the information content of the document

• Various types of relations between different text spans (the nodes of the tree)

• A very common formalism: Rhetorical Structure Theory (RST)
  • Long list of possible relation types
  • Relations are either paratactic (coordinate) or hypotactic (subordinate)
  • The most important parts are nuclei
  • Satellites contain additional information about the nuclei
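One possible (invented) way to encode such an RST-style document plan in code, as a tree whose nodes name a relation and separate nuclei from satellites; only the data structure is sketched, not any planning algorithm:

```python
# A minimal RST-style document plan: each node names a relation and splits
# its children into nuclei (essential) and satellites (supporting material).
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class PlanNode:
    relation: str                                  # e.g. "elaboration", "contrast"
    nuclei: List[Union["PlanNode", str]]           # most important parts
    satellites: List[Union["PlanNode", str]] = field(default_factory=list)

plan = PlanNode(
    relation="elaboration",
    nuclei=["This is a lecture on NLG."],
    satellites=[PlanNode(
        relation="sequence",
        nuclei=["It gives a brief introduction to the subject.",
                "It enables further study."],
    )],
)
```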


Page 73: NATURAL LANGUAGE GENERATION - Helsinki

RST RELATIONS

Sequence

‘Peel oranges and slice crosswise. Arrange in a bowl and sprinkle with rum and coconut.’

Contrast

‘Animals heal, but trees compartmentalize.’

Elaboration

‘This is a lecture on NLG. It gives a brief introduction to the subject and enables further study.’


Page 76: NATURAL LANGUAGE GENERATION - Helsinki

SENTENCE AGGREGATION

• Humans remove redundant information

• A complex phenomenon, partially domain-dependent

• Significant potential to cause misunderstandings if done improperly

• Poorly understood (in NLG) for a long time

• Reape & Mellish 1999: “Just what is aggregation anyway?”


Page 83: NATURAL LANGUAGE GENERATION - Helsinki

SENTENCE AGGREGATION

Original

‘I bought a carton of milk. I bought coffee. I bought some bread. I bought a bit of cheese.’

Aggregation 1

‘I bought a carton of milk, coffee, some bread and a bit of cheese’

Aggregation 2

‘I bought breakfast items’
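The step from the original to ‘Aggregation 1’ can be approximated by a simple syntactic aggregation rule: clauses sharing the same subject and verb are merged into one coordinated object list. A toy sketch (not a general solution):

```python
# Toy syntactic aggregation: merge clauses with identical subject + verb.
clauses = [
    ("I", "bought", "a carton of milk"),
    ("I", "bought", "coffee"),
    ("I", "bought", "some bread"),
    ("I", "bought", "a bit of cheese"),
]

def aggregate(clauses):
    merged = {}
    for subj, verb, obj in clauses:
        merged.setdefault((subj, verb), []).append(obj)
    sentences = []
    for (subj, verb), objs in merged.items():
        if len(objs) > 1:
            listed = ", ".join(objs[:-1]) + " and " + objs[-1]
        else:
            listed = objs[0]
        sentences.append(f"{subj} {verb} {listed}.")
    return " ".join(sentences)

print(aggregate(clauses))
# -> "I bought a carton of milk, coffee, some bread and a bit of cheese."
```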


Page 84: NATURAL LANGUAGE GENERATION - Helsinki

SENTENCE AGGREGATION: Types of aggregation

• Conceptual aggregation
  • {peacock(x), hummingbird(y)} → bird({x, y})

• Semantic aggregation
  • ‘Harry is Jane’s brother. Jane is Harry’s sister’ → ‘Harry and Jane are brother and sister’

• Syntactic aggregation
  • ‘Harry is here. Jack is here.’ → ‘Harry and Jack are here.’


Page 90: NATURAL LANGUAGE GENERATION - Helsinki

SENTENCE AGGREGATION: Types of aggregation (cont.)

• Lexical aggregation
  • ‘Open Monday, Tuesday, ... Friday’ → ‘Open weekdays’
  • ‘more quick’ → ‘quicker’

• Referential aggregation
  • ‘Harry and Jack are here.’ → ‘They are here’

• Discourse aggregation (skipped here)
  • Reducing overall rhetorical complexity by increasing it in a single place


Page 98: NATURAL LANGUAGE GENERATION - Helsinki

LEXICALISATION

• Lexicalisation is about finding the right words and phrases to express information

• For the abstract action of ‘making a goal in football’, what is a suitable verb?
  • ‘make’ – neutral, boring
  • ‘score’ – not for own goals
  • ‘slam’ – not always applicable
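A toy rule-based lexicaliser for the goal-scoring example above; the event fields and the thresholds are invented for illustration:

```python
# Toy lexicalisation: pick a verb for a goal event based on its properties.
def choose_verb(event):
    if event.get("own_goal"):
        return "make"            # 'score' would be misleading for an own goal
    if event.get("shot_speed_kmh", 0) > 100:
        return "slam"            # only when the context supports a dramatic verb
    return "score"

print(choose_verb({"player": "Koivu", "own_goal": False, "shot_speed_kmh": 115}))
# -> "slam"
print(choose_verb({"player": "Smith", "own_goal": True}))
# -> "make"
```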


Page 103: NATURAL LANGUAGE GENERATION - Helsinki

LANGUAGE IS VAGUE

• Decisions cannot be made in isolation
  • The property ‘tall’ is in relation to other objects
  • A tall baby is shorter than a short adult

• Labels and terms are fuzzy
  • Is the timestamp ‘00:00’ late evening, midnight or evening?
  • When does ‘late night’ turn into ‘early morning’?
  • What are ‘some’, ‘many’, and ‘most’ in percentages?


Page 110: NATURAL LANGUAGE GENERATION - Helsinki

FUZZY LOGIC

• ‘Fuzzy logic’ deals with these kinds of issues all the time

• Some work combines NLG with fuzzy logic, but this is still somewhat unexplored
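One fuzzy-logic-flavoured way to handle such vague quantifiers is to give each word a membership function over the underlying percentage and pick the best-fitting word; the trapezoidal functions and thresholds below are entirely made up:

```python
# Map a percentage to a vague quantifier via (made-up) fuzzy membership functions.
def membership(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 inside [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

QUANTIFIERS = {            # completely arbitrary example thresholds
    "some": (0, 5, 25, 45),
    "many": (30, 45, 65, 80),
    "most": (60, 75, 100, 101),
}

def quantify(percentage):
    return max(QUANTIFIERS, key=lambda q: membership(percentage, *QUANTIFIERS[q]))

print(quantify(10), quantify(55), quantify(85))   # -> some many most
```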


Page 111: NATURAL LANGUAGE GENERATION - Helsinki

VARIETY IS GOOD – SOMETIMES

• Humans prefer texts to have variation – but not too much

• The suitable level is domain dependent
  • Football reports allow for good variety
  • Maritime weather reports for almost none

• Generating suitably colored/varied language is an open research question

• Related topics: metaphors (‘All the world’s a stage’), humor, similes (‘he was as daft as a brush’) etc.


Page 119: NATURAL LANGUAGE GENERATION - Helsinki

REFERRING EXPRESSION GENERATION

• The task of selecting how to refer to domain entities

The many names of Winston

Sir Winston Leonard Spencer-Churchill
Winston Churchill
Churchill
The Prime Minister
He/him
...


Page 120: NATURAL LANGUAGE GENERATION - Helsinki

TWO FACTORS

1. Referential form: Has this entity been referenced before? Can we use a pronoun or some similar shortcut?

2. Referential content: Do we need to distinguish it from distractors?


Page 122: NATURAL LANGUAGE GENERATION - Helsinki

DISTRACTORS AND PROPERTIES

• Distinguishing an entity from distractors is done by mentioning properties that isolate it from the distractors

• There are multiple ‘correct’ solutions, some better than others

• What makes a solution ‘better’ is complex


Page 125: NATURAL LANGUAGE GENERATION - Helsinki

TRYING IT OUT IN PRESEMO

Describe the object pointed at by the arrow

From GRE3D7-1.0 by Jette Viethen and Robert Dale


Page 126: NATURAL LANGUAGE GENERATION - Helsinki

EXAMPLE STRATEGIES

Multiple ways to go about this:

1. Find the smallest set of properties that uniquely describes the item

2. Greedily add properties, always selecting the one that rules out the most distractors (sketched below)

3. Select properties from a domain-specific order
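A minimal Java sketch of strategy 2, the greedy heuristic, assuming entities are given as flat attribute maps; the toy scene in main and all names are invented for illustration, and real implementations handle ties and preference orders more carefully.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the greedy strategy: repeatedly pick the property of the target
// that rules out the most remaining distractors, until none remain (or no
// property helps any more). Ties are broken arbitrarily.
public class GreedyREG {

    static Map<String, String> selectProperties(Map<String, String> target,
                                                List<Map<String, String>> distractors) {
        Map<String, String> chosen = new LinkedHashMap<>();
        List<Map<String, String>> remaining = new ArrayList<>(distractors);

        while (!remaining.isEmpty()) {
            String bestAttr = null;
            int bestRuledOut = 0;
            for (String attr : target.keySet()) {
                if (chosen.containsKey(attr)) continue;
                int ruledOut = 0;
                for (Map<String, String> d : remaining) {
                    if (!target.get(attr).equals(d.get(attr))) ruledOut++;
                }
                if (ruledOut > bestRuledOut) {
                    bestRuledOut = ruledOut;
                    bestAttr = attr;
                }
            }
            if (bestAttr == null) break;              // no property rules anything out
            String attr = bestAttr;                   // effectively final copies for the lambda
            String value = target.get(attr);
            chosen.put(attr, value);
            remaining.removeIf(d -> value.equals(d.get(attr)));
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> target = Map.of("type", "cube", "colour", "green", "size", "small");
        List<Map<String, String>> distractors = List.of(
                Map.of("type", "ball", "colour", "green", "size", "large"),
                Map.of("type", "cube", "colour", "blue", "size", "small"));
        System.out.println(selectProperties(target, distractors));
        // e.g. {type=cube, colour=green} -> "the green cube"
    }
}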


LINGUISTIC REALISATION

• Final actions to make text natural language

• Ordering of constituents

• Morphological realisation

- Conjugation
- Agreement between words
- Insertion of auxiliary words (e.g. prepositions)

• A few ways to go about achieving this (later)


CONSTITUENT ORDERING
Example: Adjectives

• Languages have ‘default orders’ for adjectives

• Order can be different based on domain or emphasis

Vote in Presemo: Which is most natural/neutral?

A: It was made from a strange, green, metallic material
B: It was made from a metallic, strange, green material
C: It was made from a green, metallic, strange material


MORPHOLOGICAL REALIZATION

• ‘Making sure the words are in correct forms’

• Different languages present different difficulties

• Eng: *‘she go’ → ‘she goes’ (a toy rule for this is sketched below)

• Fr: ‘Je suis’ (I am) vs. ‘elle est’ (she is)

• Fi: ‘minun taloni’ (my house) vs. ‘sinun talosi’ (your house)

- Figure omitted: ‘The word-forms of the Finnish noun kauppa ’shop’ (N=2,253)’
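To make ‘correct forms’ concrete, here is a toy Java rule for the English third-person-singular case; it is only a handful of suffix heuristics, whereas real morphological realisers rely on lexicons or finite-state transducers, especially for languages like Finnish.

// A toy sketch of one English agreement rule (third person singular present),
// just to show the kind of decision a morphological realiser makes; real
// realisers use full lexicons or finite-state transducers, not suffix rules.
public class ToyAgreement {

    static String thirdPersonSingular(String verb) {
        if (verb.endsWith("s") || verb.endsWith("sh") || verb.endsWith("ch")
                || verb.endsWith("x") || verb.endsWith("o")) {
            return verb + "es";                                   // go -> goes, watch -> watches
        }
        if (verb.endsWith("y") && !verb.matches(".*[aeiou]y")) {
            return verb.substring(0, verb.length() - 1) + "ies";  // try -> tries
        }
        return verb + "s";                                        // chase -> chases
    }

    public static void main(String[] args) {
        System.out.println("she " + thirdPersonSingular("go"));   // she goes
        System.out.println("he " + thirdPersonSingular("chase")); // he chases
    }
}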


OUTLINE

Introduction

NLG Subtasks

Classifying NLG Systems

A Few Architectures

Evaluating NLG

Dialogue Systems


APPROACHES TO NLG

• Various claims about ‘standard’ or ‘consensus’ NLG architectures

• Most famously Reiter & Dale, 2000

• Three major parts:

1. Deciding what to say (Document planning)
2. Deciding how to say it (Microplanning)
3. Realizing the language (Realization)


GROUPING NLG SUBTASKS

• Content Determination, Text Structuring → Document planning

• Sentence Aggregation, Lexicalisation, Referring Expression Generation → Microplanning

• Linguistic Realisation → Realization


REALITY IS MORE COMPLEX

• At best dubious how much of a ‘consensus’ this architecture was even when originally presented

• Clearly not a consensus anymore

• The subtask groupings are still used as terminology

• Gatt & Krahmer’s survey from 2018: NLG systems can be classified on two axes: architecture and method


DIFFERENT ARCHITECTURES

• Whether the NLG process is divided into subtasks

• One end: Architectures that have dedicated components for different NLG subtasks

• Other end: Systems that completely lack a division into subtasks


DIFFERENT METHODS

• How (sub)task(s) is/are achieved

• Gatt & Krahmer’s terminology:

1. Rule-based methods
2. Planning-based methods
3. Data-driven methods

• Some argument over whether it makes sense to distinguish between 1 and 2


RULE-BASED METHODS

• The system consists of a set of rules that govern how the input is transformed

• Input is fed in, rules are used to transform it

• Once there are no more rules to apply, the result is the system’s final output (see the sketch below)

• Usually a pipeline of stages: separate sets of rules for different components
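A minimal sketch of the ‘apply rules until nothing changes’ loop in Java; the String working representation and the two example rules are invented for illustration, since real systems transform richer structures (trees, message objects) rather than raw strings.

import java.util.List;
import java.util.function.Function;

// Minimal sketch of the rule-application loop: each rule either rewrites the
// working representation or leaves it unchanged; we stop once a full pass
// changes nothing. The String representation and the rules are invented.
public class RuleEngine {

    static String applyRules(String input, List<Function<String, String>> rules) {
        String current = input;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<String, String> rule : rules) {
                String next = rule.apply(current);
                if (!next.equals(current)) {
                    current = next;
                    changed = true;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, String>> rules = List.of(
                s -> s.replace("TEMP_MAX", "the maximum temperature"),
                s -> s.replace("RISE", "is expected to rise"));
        System.out.println(applyRules("TEMP_MAX RISE.", rules));
        // -> the maximum temperature is expected to rise.
    }
}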


PLANNING-BASED METHODS

• System consists of a state transition system: states and actions that transition between states

• Alongside input, we have a (communicative) goal

• Planner finds the best series of actions (i.e. path through the state system) to reach the goal

• Actions along that path transform the input into the output


DATA-DRIVEN METHODS

• Terminology not too helpful

• ≈ ‘Statistical’ or ‘ML-based’

• Language Models (recall Lecture 3)

• Neural Networks (soon)

• Extracting rules/templates from corpora (skipped)


REMINDER: SPECTRUMS

• Recall that the previous slides present axes or spectrums

• Systems can share features from both ends of both spectrums

• The ‘rule-based’ vs ‘planning-based’ distinction is not too clear-cut


OUTLINE

Introduction

NLG Subtasks

Classifying NLG Systems

A Few Architectures

Evaluating NLG

Dialogue Systems


CANNED TEXT

• The most trivial architecture

• System chooses from among canned texts (see the sketch below).

• Examples: Error messages, warnings, etc.

• Pro: Simple, can’t go wrong

• Con: No flexibility, doesn’t scale
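A minimal Java sketch of the canned-text idea: the whole ‘generator’ is a lookup table from situations to pre-written strings. The message keys and strings are invented examples.

import java.util.Map;

// Canned text as a plain lookup table: the 'generator' just picks a
// pre-written string per situation. The message keys are invented examples.
public class CannedText {

    static final Map<String, String> MESSAGES = Map.of(
            "FILE_NOT_FOUND", "The file could not be found.",
            "DISK_FULL", "There is not enough space on the disk.");

    public static void main(String[] args) {
        System.out.println(MESSAGES.getOrDefault("DISK_FULL",
                "An unknown error occurred."));
    }
}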


THE PIPELINE ARCHITECTURE

• The platonic ideal of a [rule|planning]-based modular architecture

• A series of components, like a unix pipeline

• Use standard components where possible


STANDARD COMPONENTS

• E.g. Morphological realization

1. Take an FSA morphological analyser that goes from a word to its analysis

2. Reverse the FSA
3. Feed in the ‘analysis’, get back the inflected word

• E.g. Referring Expression Generation

• Saw a few methods before


TEMPLATE-BASED REALISATION

• In reality, very few systems implement the whole pipeline

• Esp. surface realization is often done (in part) using templates (a filling sketch follows below)

• The $measurement is expected to reach $value by $time

→ The mean day-time temperature is expected to reach 25 degrees Celsius by end of next week

• Combines (parts of) lexicalization with realization
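A minimal Java sketch of filling such a $slot template from a data record; the regex-based replacement below is purely illustrative and is not any particular system's template language (real template engines add conditions, morphology hooks, and so on).

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of template-based realisation: slots written as $name are
// replaced with values from a data record. The template language here is just
// a regex and is not any particular system's.
public class TemplateRealiser {

    static String fill(String template, Map<String, String> values) {
        Matcher m = Pattern.compile("\\$(\\w+)").matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String slot = m.group(1);
            String value = values.getOrDefault(slot, m.group(0));  // keep unknown slots as-is
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "The $measurement is expected to reach $value by $time";
        Map<String, String> values = Map.of(
                "measurement", "mean day-time temperature",
                "value", "25 degrees Celsius",
                "time", "end of next week");
        System.out.println(fill(template, values));
    }
}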


GRAMMAR-BASED REALISATION
Example: SimpleNLG

/* ... */
SPhraseSpec p = nlgFactory.createClause();
p.setSubject("Mary");
p.setVerb("chase");
p.setObject("the monkey");
p.setFeature(Feature.TENSE, Tense.PAST);

String output = realiser.realiseSentence(p);
System.out.println(output);

>>> Mary chased the monkey


PROS

• Reusability of components

• Transferability

• Interpretability

• No need for training data

• High level of guaranteed quality


CONS

• High development time

• Generation gap

- What if we end up with a plan that later stages cannot realize?

• Constrained generation

- Consider a tweet generator: the length limit of the text is a constraint

- But modules at the start cannot know exactly how much text their plan will produce

• Variety and variability are very difficult/expensive



NEURAL END-TO-END NLG

• Example of a global, unified, data-driven NLG system

• Input is e.g. a meaning representation

• Output is text

• Highly similar (in the abstract) to machine translation → seq-2-seq models and RNNs are very ‘in’ right now

Meaning Representation

name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]
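Before a meaning representation like the one above reaches a seq-2-seq model it is usually flattened into a plain token sequence; the Java sketch below shows one possible linearisation (a slot marker followed by the value tokens). The marker format is an assumption for illustration, not the E2E challenge's official preprocessing.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of linearising a slot-value meaning representation into a
// token sequence for a sequence-to-sequence model. The <slot> marker format
// is an illustrative assumption, not the E2E challenge's preprocessing.
public class LinearizeMR {

    static List<String> linearize(Map<String, String> mr) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> slot : mr.entrySet()) {
            tokens.add("<" + slot.getKey() + ">");                 // e.g. <name>
            tokens.addAll(Arrays.asList(slot.getValue().split(" ")));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> mr = new LinkedHashMap<>();
        mr.put("name", "The Eagle");
        mr.put("eatType", "coffee shop");
        mr.put("food", "French");
        mr.put("near", "Burger King");
        System.out.println(linearize(mr));
        // [<name>, The, Eagle, <eatType>, coffee, shop, <food>, French, <near>, Burger, King]
    }
}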


RECURRENT NNS

(Figure omitted; image from Towards Data Science)


SEQ-2-SEQ MODELS

(Figure omitted; from Chen, Hongshen, et al. ”A survey on dialogue systems: Recent advances and new frontiers.” ACM SIGKDD Explorations Newsletter 19.2 (2017): 25-35.)


PROS

• Reusable network

• Low development time (given data)

• High(er) variety of output

• Neural systems are very much in


CONS

• Costly in terms of data & processing power

• Interpretability

- Recent work indicating that e.g. attention is not a silver bullet in NLP

• Hallucination: Systems overfit to the training data and produce ungrounded output

- Open question: why is this not a problem for neural MT?

• Tweakability (see XKCD #1838)


THE HIDDEN COSTS

(Figure omitted; from Strubell et al., upcoming)


CLASSICAL OR NEURAL?

Discuss

Can you come up with an example of where a neural end-to-end NLG system is more suitable than a ‘classical’ system? Think about the pros and cons of both. How about the reverse?


THE REAL WORLD

• Classical systems (1970’s - early 2010’s): modular and rule- or planning-based to some degree

• Most systems* combine some components

• Also systems that divide subtasks further

• Industry systems now: largely the same


THE REAL WORLD

• Academia is somewhat split

• Work on individual modules

• Significant interest in global data-driven methods (‘neural networks are cool’)

• Exploring the limits of ‘classical’ systems

• Potential future: Acknowledge pros and cons of both, find ways to combine pros without the cons


OUTLINE

Introduction

NLG Subtasks

Classifying NLG Systems

A Few Architectures

Evaluating NLG

Dialogue Systems


NOT A SOLVED PROBLEM

• Problem 1: System input is not standardized → hard to compare systems to each other

• Problem 2: No clear definition of how to measure output ‘correctness’ → hard to say anything concrete about any system


NO STANDARD INPUT

• Large data sets for comparison are few

• Languages dominated by English

• The few common data sets are highly specific


SPECIFIC CONTEXTS
Example from 2018 E2E NLG Challenge

Input

name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]

Example output

The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.


WHICH IS MORE ‘CORRECT’?

Candidate 1

The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.

Candidate 2

The Eagle, located close to the Riverside Burger King, has a moderately priced French-style coffee shop menu. It’s child-friendly and fairly good.


SIMPLE METRICS FAIL

Example

Reference: ‘The cat jumped on the table’
Candidate 1: ‘The tabby jumped unto the table’
Candidate 2: ‘The kitten leaped up and landed atop the counter’

• Most words have synonyms → recall doesn’t work


SIMPLE METRICS FAIL

Example

Reference: ‘The cat jumped on the table’
Candidate: ‘the the the the the the’

• Unigram precision is 1, because all words in C appear in R.
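A quick Java check of that claim, computing plain (unclipped) unigram precision with naive whitespace tokenisation; the class and method names are invented for illustration.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain (unclipped) unigram precision with naive whitespace tokenisation:
// every token of the degenerate candidate occurs in the reference, so the
// score comes out as 1.0.
public class UnigramPrecision {

    static double precision(String candidate, String reference) {
        List<String> cand = Arrays.asList(candidate.toLowerCase().split(" "));
        Set<String> ref = new HashSet<>(Arrays.asList(reference.toLowerCase().split(" ")));
        long matches = cand.stream().filter(ref::contains).count();
        return (double) matches / cand.size();
    }

    public static void main(String[] args) {
        System.out.println(precision("the the the the the the",
                                     "The cat jumped on the table"));  // 1.0
    }
}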


MORE COMPLEX METRICS

• BLEU – BiLingual Evaluation Understudy

• ROUGE – Recall-Oriented Understudy for Gisting Evaluation

• METEOR – Metric for Evaluation of Translation with Explicit ORdering

• CIDEr – Consensus-based Image Description Evaluation


BLEU: A modified precision score

Example

Ref 1: The cat is on the mat.
Ref 2: There is a cat on the mat.
Candidate: the the the the the the the

count(n-gram) is the number of times the n-gram appears in the candidate.

count(the) = 7


Page 256: NATURAL LANGUAGE GENERATION - Helsinki

BLEU: A modified precision score

Example

Ref 1: The cat is on the mat.
Ref 2: There is a cat on the mat.
Candidate: the the the the the the the

count_clip(n-gram) is the number of times an n-gram appears in the candidate, clipped to the max number of times it appears in any reference

count_clip(the) = 2
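A minimal sketch of the clipping step for the example above, assuming whitespace tokenisation; the function name is ours:

```python
from collections import Counter

def clipped_counts(candidate: str, references: list) -> Counter:
    """Clip each candidate unigram count to its maximum count in any reference."""
    cand_counts = Counter(candidate.lower().split())
    max_ref_counts = Counter()
    for ref in references:
        for tok, cnt in Counter(ref.lower().split()).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], cnt)
    return Counter({tok: min(cnt, max_ref_counts[tok]) for tok, cnt in cand_counts.items()})

refs = ["The cat is on the mat.", "There is a cat on the mat."]
cand = "the the the the the the the"
print(clipped_counts(cand, refs))                      # Counter({'the': 2})
print(sum(clipped_counts(cand, refs).values()) / 7)    # modified unigram precision = 2/7
```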


Page 257: NATURAL LANGUAGE GENERATION - Helsinki

BLEU: A modified precision score

• Calculate over whole corpus as follows:

p_n = \frac{\sum_{C \in \mathrm{Candidates}} \sum_{\text{n-gram} \in C} \mathrm{count}_{\mathrm{clip}}(\text{n-gram})}{\sum_{C' \in \mathrm{Candidates}} \sum_{\text{n-gram}' \in C'} \mathrm{count}(\text{n-gram}')}


Page 258: NATURAL LANGUAGE GENERATION - Helsinki

BLEU: A modified precision score

• Take the geometric mean of the modified precision scores for different n-gram lengths, applying weighting:

\text{almost-BLEU} = \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)

• Baseline is N = 4 and w_n = 1/N


Page 259: NATURAL LANGUAGE GENERATION - Helsinki

BLEU: A modified precision score

• Observation: Shorter candidates get higher scores
• Solution: A brevity penalty for candidates shorter than the references

\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases}

• c is the length of the candidate, r is the “effective reference corpus length”
• The definition of r varies a bit; it can be e.g. the length of the reference closest in length to the candidate

Page 260: NATURAL LANGUAGE GENERATION - Helsinki

BLEU: A modified precision score

• Apply BP by simply multiplying it in

\text{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)
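Putting the pieces together, a minimal corpus-level BLEU sketch under the baseline setting N = 4 and w_n = 1/N, assuming pre-tokenised input and no smoothing; real toolkits (e.g. sacreBLEU) additionally handle tokenisation and smoothing details:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(candidates, references_list, max_n=4):
    """candidates: list of token lists; references_list[i]: list of reference token lists for candidate i."""
    log_p = []
    for n in range(1, max_n + 1):
        clipped, total = 0, 0
        for cand, refs in zip(candidates, references_list):
            cand_counts = Counter(ngrams(cand, n))
            max_ref = Counter()
            for ref in refs:
                for gram, cnt in Counter(ngrams(ref, n)).items():
                    max_ref[gram] = max(max_ref[gram], cnt)
            clipped += sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
            total += sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # any p_n of zero makes the geometric mean zero
        log_p.append(math.log(clipped / total))
    # Brevity penalty: r sums, per candidate, the reference length closest to the candidate length.
    c = sum(len(cand) for cand in candidates)
    r = sum(min((len(ref) for ref in refs), key=lambda length: abs(length - len(cand)))
            for cand, refs in zip(candidates, references_list))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum((1.0 / max_n) * lp for lp in log_p))

cands = ["the cat is on the mat".split()]
refs = [["the cat is on the mat".split(), "there is a cat on the mat".split()]]
print(corpus_bleu(cands, refs))  # 1.0 for an exact match against one reference
```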


Page 261: NATURAL LANGUAGE GENERATION - Helsinki

OTHER METRICS

• ROUGE-N: Overlap of n-grams

• ROUGE-L: Based on the longest common subsequence (see the sketch after this list)

• METEOR: Weighted mean of unigram precision and recall, with a penalty for misalignment
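A minimal sketch of the LCS computation behind ROUGE-L, reporting the LCS-based F-measure; whitespace tokenisation and the function names are our own assumptions, and the real metric exposes a weighting parameter β:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """F-measure over LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

print(rouge_l("the tabby jumped onto the table",
              "the cat jumped on the table"))  # ~0.667
```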


Page 262: NATURAL LANGUAGE GENERATION - Helsinki

LARGE SCALE ONLY

Example

Reference: ‘The cat jumped on the table’
Candidate 1: ‘The tabby jumped unto the table’
Candidate 2: ‘The kitten leaped up and landed atop the table’

• Automated metrics only claim to correlate with human judgements given a sufficiently representative set of references

• OK for short texts in closed domains, exponentially more difficult for longer texts and more open domains


Page 267: NATURAL LANGUAGE GENERATION - Helsinki

THE PROBLEMATIC REFERENCES

• References are human-made

• Large amounts needed (prev. slide) → crowdsourcing

• Crowdsourcing can be a source of errors and bias

Let’s try to replicate van Miltenburg et al., 2017

Individually go to presemo.helsinki.fi/nlp2019 and type in a caption for each of the following pictures.


Page 268: NATURAL LANGUAGE GENERATION - Helsinki

PICTURE 1


Page 269: NATURAL LANGUAGE GENERATION - Helsinki

PICTURE 2


Page 270: NATURAL LANGUAGE GENERATION - Helsinki

PICTURE 3


Page 271: NATURAL LANGUAGE GENERATION - Helsinki

PICTURE 4


Page 272: NATURAL LANGUAGE GENERATION - Helsinki

BLEU PRACTICE

• BLEU is standard, but problematic

• ‘Overall, the evidence supports using BLEU for diagnostic evaluation of MT systems (which is what it was originally proposed for), but does not support using BLEU outwith MT, for evaluation of individual texts, or for scientific hypothesis testing.’ (Reiter, 2017)

• Empirical observation: BLEU’s correlation with human judgements is increasing(!)

• Unclear why


Page 276: NATURAL LANGUAGE GENERATION - Helsinki

OTHER METRICS IN PRACTICE

• Other automated metrics are less comprehensively studied

• In general, automated metrics do not correlate too well with human judgements

• Methods based on n-gram overlap or string distance are problematic

• Esp. for trying to measure performance on a subtask

• Increasing worry about state of automatic evaluation


Page 281: NATURAL LANGUAGE GENERATION - Helsinki

HUMAN EVALUATION
Translation Edit Rate

• Calculate the number of post-edits made by humans to ‘correct’ the text (a minimal sketch follows this list)

• Instruct editors to make the smallest possible set of changes

• Empirical/anecdotal evidence of overestimating errors!

• Editors won’t stick with minimal:

• ‘I prefer it the other way’
• ‘Not really an error, but it was quick to change’
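For context, a minimal word-level sketch of the edit-rate idea: insertions, deletions and substitutions counted via Levenshtein distance and divided by reference length. Real TER additionally allows block shifts, and in the post-editing setting the ‘reference’ is the human-edited text; the function name is ours:

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dp[i][0] = i
    for j in range(len(ref) + 1):
        dp[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[-1][-1] / len(ref)

print(word_edit_rate("the system generated text",
                     "the system generated a text"))  # 1 edit / 5 words = 0.2
```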


Page 287: NATURAL LANGUAGE GENERATION - Helsinki

INTRINSIC HUMAN EVALUATION

• ‘On a scale of 1 to 5, how pleasant is this to read?’

• Captures only some aspects of quality

• Esp. correctness very difficult for complex domains and longer texts

• How can the judge know something was missing, misleading or wrong?


Page 291: NATURAL LANGUAGE GENERATION - Helsinki

EXTRINSIC HUMAN EVALUATION

• Measuring whether the message gets humans to do the correct things

• For example:

• Summary of medical info → Correct treatment
• Info on hazards of smoking → Quitting
• News article about a football game → ???


Page 296: NATURAL LANGUAGE GENERATION - Helsinki

HOW SHOULD WE EVALUATE?

• Acknowledge that evaluation is not a solved problem

• Human evaluations >>> Automated evaluations

• Identify your setting:

• Is your dataset unique?

- I.e. can you compare your system to another

• Do you have a corpus of references?

- I.e. can you use automated metrics


Page 303: NATURAL LANGUAGE GENERATION - Helsinki

HOW DO WE EVALUATE IN PRACTICE?
Unique dataset, no reference corpus

• Human evaluations are the only possibility

• Aim at both intrinsic and extrinsic

• If extrinsic is not possible, TER by an expert is better than nothing


Page 306: NATURAL LANGUAGE GENERATION - Helsinki

HOW DO WE EVALUATE IN PRACTICE?
Unique dataset, have reference

• Problem: Nobody knows in isolation whether “BLEU of 26” is good or not

• Report multiple metrics to allow comparisons in future work

• Still need human evaluations


Page 309: NATURAL LANGUAGE GENERATION - Helsinki

HOW DO WE EVALUATE IN PRACTICE?
Well known dataset

• Report multiple automated metrics

• Only make strong claims if you score significantly higher on all

• Always report intrinsic human evaluations

• Known cases where automated metrics are in disagreement with human evals → human judgements are more convincing

• Conduct extrinsic human evaluation if applicable


Page 314: NATURAL LANGUAGE GENERATION - Helsinki

OUTLINE

Introduction

NLG Subtasks

Classifying NLG Systems

A Few Architectures

Evaluating NLG

Dialogue Systems


Page 315: NATURAL LANGUAGE GENERATION - Helsinki

DIALOGUE SYSTEMS

• Dialogue systems are hard to classify

• On one hand, input is text → text-to-text NLG

• On the other hand, usually seen as a sequence of NLU (understanding the human) and NLG (replying) tasks

• Ignore the classification for now


Page 319: NATURAL LANGUAGE GENERATION - Helsinki

COMPONENTS OF A DIALOGUE SYSTEM

• NLU unit – Interprets the NL input

• Dialogue management – Decides what the system should do next

• NLG unit – Produces the NL output


Page 322: NATURAL LANGUAGE GENERATION - Helsinki

FLAVOURS OF DIALOGUE SYSTEMS

• Dialogue comes in two primary flavours

• Task-oriented dialogue

• Non-task-oriented dialogue


Page 325: NATURAL LANGUAGE GENERATION - Helsinki

TASK-ORIENTED DIALOGUE

• The system and/or the user are trying to achieve something

• Find a good restaurant, book a plane ticket, etc.


Page 327: NATURAL LANGUAGE GENERATION - Helsinki

NON-TASK-ORIENTED DIALOGUE

• There is no specific goal for the conversation

• Previously ‘chatbot’ or ‘chatterbot’

• These days ‘chatbot’ also used for task-oriented systems

• E.g. ELIZA (1966)


Page 331: NATURAL LANGUAGE GENERATION - Helsinki

ELIZA


Page 333: NATURAL LANGUAGE GENERATION - Helsinki

NLU IN DIALOGUE

• Translate the NL input provided by the human using the system into some logical format for the dialogue manager

• Can be preceded by a stage of e.g. speech recognition

Example input

‘Are there any action movies to see this weekend?’

Example output

request movie(genre=action, date=this weekend)
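A minimal sketch of a rule-based NLU step that maps the utterance above to such a frame; the patterns, slot names and frame format are illustrative assumptions, not a real system:

```python
GENRES = ["action", "comedy", "drama", "horror"]
DATES = ["tonight", "today", "tomorrow", "this weekend"]

def nlu(utterance: str) -> dict:
    """Map a movie-related utterance to a crude dialogue-act frame."""
    text = utterance.lower()
    frame = {"act": "request_movie", "slots": {}}
    for genre in GENRES:
        if genre in text:
            frame["slots"]["genre"] = genre
    for date in DATES:
        if date in text:
            frame["slots"]["date"] = date
    return frame

print(nlu("Are there any action movies to see this weekend?"))
# {'act': 'request_movie', 'slots': {'genre': 'action', 'date': 'this weekend'}}
```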


Page 335: NATURAL LANGUAGE GENERATION - Helsinki

DIALOGUE MANAGEMENT

• Keeps track of and updates the dialogue state, dialogue history and user goal

• Decides what should be done next based on the above

• Can be split into two subcomponents along the above division:

• State tracking
• Policy learning


Page 340: NATURAL LANGUAGE GENERATION - Helsinki

DIALOGUE MANAGEMENT

• Keeps track of and updates the dialogue state and history
• Decides what should be done next based on the above
• The DM identifies that it does not know where the user wants to see the movie. It decides the best action is to ask for additional information, and also uses the opportunity to implicitly verify its understanding of the current dialogue state:

Example

request(location, action=request movie(genre=action, date=this weekend))
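A minimal sketch of a hand-written dialogue-management step in this spirit: a state tracker merges the NLU frame into the dialogue state, and a rule-based policy requests the first missing required slot. All names, slots and the act format are illustrative assumptions:

```python
REQUIRED_SLOTS = ["genre", "date", "location"]

def update_state(state: dict, nlu_frame: dict) -> dict:
    """State tracking: merge newly observed slot values into the dialogue state."""
    state = dict(state)
    state.update(nlu_frame.get("slots", {}))
    return state

def policy(state: dict) -> dict:
    """Rule-based policy: request the first missing slot, otherwise answer."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return {"act": "request", "slot": slot, "known": dict(state)}
    return {"act": "inform_results", "known": dict(state)}

state = update_state({}, {"act": "request_movie",
                          "slots": {"genre": "action", "date": "this weekend"}})
print(policy(state))
# {'act': 'request', 'slot': 'location', 'known': {'genre': 'action', 'date': 'this weekend'}}
```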


Page 344: NATURAL LANGUAGE GENERATION - Helsinki

NLG IN DIALOGUE

• Taking the DM’s output as input, produce the textual output

• Can be seen as ‘standard NLG’

• Sometimes followed by an additional realization stage, e.g. text-to-speech

Example output

Where would you like to see the action movie this weekend?
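A minimal sketch of a template-based NLG step that turns the dialogue manager’s act into the surface question above; the templates and act format are our own illustrative assumptions:

```python
TEMPLATES = {
    ("request", "location"): "Where would you like to see the {genre} movie {date}?",
    ("request", "date"): "When would you like to see the {genre} movie?",
}

def nlg(act: dict) -> str:
    """Fill a canned template for the chosen dialogue act, echoing known slots."""
    template = TEMPLATES[(act["act"], act["slot"])]
    return template.format(**act["known"])

print(nlg({"act": "request", "slot": "location",
           "known": {"genre": "action", "date": "this weekend"}}))
# Where would you like to see the action movie this weekend?
```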


Page 345: NATURAL LANGUAGE GENERATION - Helsinki

STATE IN DIALOGUE

• NLU and NLG are stateless → can use fairly standard approaches

• All state about the dialogue lives in the dialogue manager

• Assume the next NL input is ‘In Helsinki’

• NLU’d to inform(location=Helsinki)
• The DM must infer multiple things:

- This is an answer to its previous question
- It contains an implicit verification of the previous state
- Contrast to ‘In Espoo, but I meant the weekend after that’


Page 353: NATURAL LANGUAGE GENERATION - Helsinki

METHODS FOR DIALOGUE

• Classically rules & pipelines

• Dialogue management using e.g. reinforcement learning or human-written rules

• More recently, research into end-to-end systems and neural methods in individual components

• Seq-2-seq neural networks esp. in non-task-oriented dialogue (a minimal sketch follows this list)
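A minimal sequence-to-sequence sketch in this spirit: a toy PyTorch GRU encoder–decoder with greedy decoding. The vocabulary, sizes and the (omitted) training loop are all assumptions, not a production chatbot:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy GRU encoder-decoder mapping a token-id sequence to a reply sequence."""
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, bos_id: int, max_len: int = 20):
        # Encode the user's utterance into a hidden state.
        _, h = self.encoder(self.embed(src_ids))
        # Greedily decode a reply one token at a time, feeding back the argmax.
        token = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.embed(token), h)
            token = self.out(dec_out).argmax(dim=-1)
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq(vocab_size=1000)
reply_ids = model(torch.randint(0, 1000, (1, 6)), bos_id=1)  # untrained: random reply ids
print(reply_ids.shape)  # torch.Size([1, 20])
```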


Page 357: NATURAL LANGUAGE GENERATION - Helsinki

EXAMPLE SEQ2SEQ

From Deep Learning for Chatbots


Page 358: NATURAL LANGUAGE GENERATION - Helsinki

DATA-DRIVEN DANGERS
2016: Microsoft’s Tay

• March 23: First tweet: ‘hellooooooo world!!!’

• March 24: ‘@godblessameriga WE’RE GOING TO BUILD A WALL, AND MEXICO IS GOING TO PAY FOR IT’

• Suspended for a while, reintroduced March 30th

• March 30: starts spamming ‘You are too fast, please take a rest.’ several times per second

• Suspended again, hasn’t returned


Page 363: NATURAL LANGUAGE GENERATION - Helsinki

WHERE FROM HERE?

• Reiter, Ehud, and Robert Dale. Building Natural Language Generation Systems. Cambridge University Press, 2000.

• Gatt, Albert, and Emiel Krahmer. ”Survey of the state of the art in natural language generation: Core tasks, applications and evaluation.” Journal of Artificial Intelligence Research 61 (2018): 65-170.

• Reiter, Ehud, and Anja Belz. ”An investigation into the validity of some metrics for automatically evaluating natural language generation systems.” Computational Linguistics 35.4 (2009): 529-558.

• Proceedings of the International Natural Language Generation Conference
