Summer School on Natural Language Processing and Text Mining 2008 Natural Language Generation An Introductory Tour Anupam Basu Dept. of Computer Science & Engineering IIT Kharagpur
  • date post

    21-Dec-2015

Page 1:

Summer School on Natural Language Processing and Text Mining 2008

Natural Language Generation

An Introductory Tour

Anupam Basu

Dept. of Computer Science & Engineering

IIT Kharagpur

Page 2:

Language Technology

[Diagram: Natural Language Understanding maps Text to Meaning and Natural Language Generation maps Meaning to Text; Speech Recognition maps Speech to Text and Speech Synthesis maps Text to Speech.]

Page 3:

What is NLG?

Thought / conceptualization of the world ------ Expression

The block c is on block a

The block a is under block c

The block b is by the side of a

The block b is on the right of a

The block b has its top free

The block b is alone ………

Page 4:

Conceptualization

Some intermediate form of representation

ON (C, A)

ON (A, TABLE)

ON (B, TABLE)

RIGHT_OF (B,A) …….

What to say?

Page 5:

Conceptualization

[Diagram: semantic network over the blocks A, B and C, with On, Right_of and Is_a (Block) links.]

What to say?

Page 6:

What to say? How to say?

Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals.

[McDonald 1992]

Page 7:

Some of the Applications

Machine Translation

Question Answering

Dialogue Systems

Text Summarization

Report Generation

Page 8:

Thought / Concept ------ Expression

Objective: produce understandable and appropriate texts in human languages

Input: some underlying non-linguistic representation of information

Knowledge sources required: knowledge of language and of the domain

Page 9:

Involved Expertise

Knowledge of Domain: what to say, relevance

Knowledge of Language: lexicon, grammar, semantics

Strategic Rhetorical Knowledge: how to achieve goals, text types, style

Sociolinguistic and Psychological Factors: habits and constraints of the end user as an information processor

Page 10:

Asking for a pen

Situation: have(X, z), not have(Y, z)

Goal: want have(Y, z)

Conceptualization: ask(give(X, z, Y))

Expression: Could you please give me a pen?

Questions posed along the way: Why? What? How?

Page 11:

Some Examples

Page 12:

Example System #1: FoG

Function: Produces textual weather reports in English and French

Input: Graphical/numerical weather depiction

User: Environment Canada (Canadian Weather Service)

Developer: CoGenTex

Status: Fielded, in operational use since 1992

Page 13:

FoG: Input

Page 14:

FoG: Output

Page 15:

Example System #2: STOP

Function: Produces a personalised smoking-cessation leaflet

Input: Questionnaire about smoking attitudes, beliefs, history

User: NHS (British Health Service)

Developer: University of Aberdeen

Status: Undergoing clinical evaluation to determine its effectiveness

Page 16:

STOP: Input

SMOKING QUESTIONNAIRE

Please answer by marking the most appropriate box for each question like this:

Q1 Have you smoked a cigarette in the last week, even a puff? YES / NO

If YES: Please complete the following questions. If NO: Please return the questionnaire unanswered in the envelope provided. Thank you.

Please read the questions carefully. If you are not sure how to answer, just give the best answer you can.

Q2 Home situation: Live alone; Live with husband/wife/partner; Live with other adults; Live with children

Q3 Number of children under 16 living at home: ………………… boys ………1……. girls

Q4 Does anyone else in your household smoke? (If so, please mark all boxes which apply): husband/wife/partner; other family member; others

Q5 How long have you smoked for? …10… years. Tick here if you have smoked for less than a year

Page 17:

STOP: Output

Dear Ms Cameron

Thank you for taking the trouble to return the smoking questionnaire that we sent you. It appears from your answers that although you're not planning to stop smoking in the near future, you would like to stop if it was easy. You think it would be difficult to stop because smoking helps you cope with stress, it is something to do when you are bored, and smoking stops you putting on weight. However, you have reasons to be confident of success if you did try to stop, and there are ways of coping with the difficulties.

Page 18:

Approaches

Page 19:

Template-based generation

• Most common technique

• In simplest form, words fill in slots: “The train from <Source> to <Destination> will leave platform <number> at <time> hours”

• Most common sort of NLG found in commercial systems
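The slot-filling idea can be sketched in a few lines of Python. This is only an illustration: the slot names, the `announce` helper and the station names are assumptions, not part of the original slides.

```python
from string import Template

# Fixed sentence frame; the slot names are illustrative assumptions.
TRAIN_TEMPLATE = Template(
    "The train from $source to $destination will leave "
    "platform number $platform at $time hours."
)

def announce(source, destination, platform, time):
    """Fill every slot of the fixed template; no other linguistic processing is done."""
    return TRAIN_TEMPLATE.substitute(
        source=source, destination=destination, platform=platform, time=time
    )

# Hypothetical slot values, chosen only for the demonstration.
print(announce("Howrah", "Kharagpur", 3, "1430"))
```

Every output has the same shape, which is what makes templates simple to build and monotonous to read.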

Page 20:

Pros and Cons

Pros

Conceptually simple

No specialized knowledge needed

Can be tailored to a domain with good performance

Cons

Not general

No variation in style – monotonous

Not scalable

Page 21:

Modern Approaches

Rule Based approach

Machine Learning Approach

Page 22:

Some Critical Issues

Page 23:

Context Sensitivity in Connected Sentences

X-town was a blooming city. Yet, when the hooligans started to invade the place, __________ . The place was not livable any more.

the place was abandoned by its population

the place was abandoned by them

the city was abandoned by its population

it was abandoned by its population

its population abandoned it……..

Page 24:

Referencing

John is Jane’s friend. He loves to swim with his dog in the pool. It is really lovely.

I am taking the Shatabdi Express tomorrow. It is a much better train than the Rajdhani Express. It has a nice restaurant car, while the other has nice seats.

Page 25:

Referencing

John stole the book from Mary, but he was caught.

John stole the book from Mary, but the fool was caught.

Page 26:

Aggregation

The dress was cheap. The dress was beautiful.

=> The dress was cheap and beautiful.

=> The dress was cheap yet beautiful.

I found the boy. The boy was lost.

=> I found the boy who was lost.

=> I found the lost boy.

Sita bought a story book. Geeta bought a story book.

???? Sita and Geeta bought a story book.

???? Sita bought a story book and Geeta also bought a story book.

Page 27:

Choice of words (Lexicalization)

The bus was in time. The journey was fine. The seats were bad.

The bus was in perfect time. The journey was fantastic. The seats were awful.

The bus was in perfect time. The journey was fantastic. However, the seats were not that good.

Page 28:

General Architecture

Page 29:

Component Tasks in NLG

Content Planning === Macroplanner: Document Structuring

Sentence Planner === Microplanning: Aggregation; Lexicalization; Referring Expression Generation

Surface Form Realization: Linguistic realization; Structure Realization

Page 30:

A Pipelined Architecture

Document Planning => Document Plan => Microplanning => Text Specification => Surface Realization

Page 31:

An Example

Consider two assertions

has (Hotel_Bliss, food (bad))

has (Hotel_Bliss, ambience (good))

Content Planning selects information ordering

Hotel Bliss has bad food but its ambience is good

Hotel Bliss has good ambience but its food is bad

Page 32:

has (Hotel_Bliss, food (bad))

Sentence Planning

choose syntactic templates

choose lexicon: bad or awful; food or cuisine; good or excellent

Aggregate the two propositions

Generate referring expressions: It or this restaurant

Ordering: A big red ball OR A red big ball

[Diagram: syntactic template with Have at the root, an Entity as Subj and a Feature with its Modifier as Obj.]

Page 33:

Realization

correct verb inflection: Have => Has

may require noun inflection (not in this case)

Articles required? Where?

Conversion into final string

Capitalization and Punctuation
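The three stages of the Hotel Bliss example can be strung together as a toy pipeline. The sketch below is a deliberately small illustration under several assumptions: the tuple encoding of the assertions, the dictionary "templates", the function names, and the fixed cue word "but" are all invented for the demonstration and are not the architecture of any particular system.

```python
# The two assertions from the example: has(Hotel_Bliss, food(bad)) and
# has(Hotel_Bliss, ambience(good)). The tuple encoding is an assumption.
ASSERTIONS = [("Hotel_Bliss", "food", "bad"), ("Hotel_Bliss", "ambience", "good")]

def plan_content(assertions, first_feature):
    """Content planning: decide which assertion is said first."""
    return sorted(assertions, key=lambda a: a[1] != first_feature)

def plan_sentence(ordered):
    """Sentence planning: pick a clause template for each proposition,
    refer back to the hotel with 'its', and aggregate with the cue word 'but'."""
    (entity, feat1, mod1), (_, feat2, mod2) = ordered
    first = {"subj": entity.replace("_", " "), "verb": "have", "obj": f"{mod1} {feat1}"}
    second = {"subj": f"its {feat2}", "verb": "be", "obj": mod2}
    return [first, "but", second]

def realize(spec):
    """Realization: inflect verbs for a third-person singular subject,
    linearize, then capitalize and punctuate."""
    inflect = {"have": "has", "be": "is"}
    words = []
    for part in spec:
        if isinstance(part, str):
            words.append(part)
        else:
            words += [part["subj"], inflect[part["verb"]], part["obj"]]
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."

print(realize(plan_sentence(plan_content(ASSERTIONS, "food"))))
print(realize(plan_sentence(plan_content(ASSERTIONS, "ambience"))))
```

Changing only the content-planning decision (which feature comes first) yields the two orderings discussed in the example; the later stages stay the same.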

Page 34:

Content Planning

What to say

Data collection

Making domain specific inferences

Content selection

Proposition formulation: each proposition => a clause

Text structuring: sequential ordering of propositions; specifying rhetorical relations

Page 35:

Content Planning Approaches

Schema based (McKeown 1985): specify what information, in which order. The schema is traversed to generate the discourse plan.

Application of operators (similar to Rule Based approach) --- Hovy 93: the discourse plan is generated dynamically.

Output is a Content Plan Tree

Page 36:

[Diagram: content plan tree rooted at Discourse, with group nodes such as Demograph, Detailed view and Summary, and leaves such as Name, Age, Blood Sugar and Care.]

Page 37:

Content Plan

Plan Tree Generation

Ordering of group nodes and propositions

Rhetorical relations between leaf nodes

Paragraph and sentence boundaries

Page 38:

Rhetorical Relations

[Diagram: rhetorical structure tree over the clauses "You should ...", "I’m in ...", "You can get ...", "The show ..." and "It got a ...", linked by MOTIVATION, MOTIVATION, EVIDENCE and ENABLEMENT relations.]

Page 39:

Rhetorical Relations

Three basic rhetorical relationships: SEQUENCE, ELABORATION, CONTRAST

Others like Justification, Inference

Page 40:

Nucleus and Satellites

I love to collect classic cars

My favourite car is Toyota Innova

I drive my Maruti 800

[Diagram: the three clauses above linked by Elaboration and Contrast relations; N marks the nucleus.]

Page 41:

Target Text

The month was cooler and drier than average, with the average number of rain days, but the total rain for the year so far is well below average. Although there was rain on every day for 8 days from 11th to 18th, rainfall amounts were mostly small.

Page 42:

Document Structuring in WeatherReporter

The Message Set:

MonthlyTempMsg ("cooler than average")

MonthlyRainfallMsg ("drier than average")

RainyDaysMsg ("average number of rain days")

RainSoFarMsg ("well below average")

RainSpellMsg ("8 days from 11th to 18th")

RainAmountsMsg ("amounts mostly small")

Page 43:

Document Structuring in WeatherReporter

[Diagram: document plan tree whose leaves are MonthlyTempMsg, MonthlyRainfallMsg, RainyDaysMsg, RainSoFarMsg, RainSpellMsg and RainAmountsMsg, linked by SEQUENCE, ELABORATION and CONTRAST relations.]

Page 44:

Some Common RST Relationships

Elaboration: The satellite presents more details about the content of the nucleus

Contrast: The nuclei present things that are similar in some respects but different in some other relevant way. Multinuclear: no distinction between N and S

Purpose: S presents the goal of performing the activity presented in the nucleus

Condition: S presents something that must occur before the situation presented in N can occur

Result: N results from S

Page 45:

Planning Approach

Save Document

The system saves the document

Choose Save option

Select Folder

Type Filename

Click Save Button

A dialog box displayed

Dialog box closed

Page 46:

Planning Operator

Name: Expand Purpose

Effect: (COMPETENT hearer (DO-ACTION ?action))

Constraints: (AND (get_all_substeps ?action ?subaction) (NOT (singular list ?subaction)))

Nucleus: (COMPETENT hearer (DO-SEQUENCE ?subaction))

Satellite: (((RST-PURPOSE (INFORM hearer (DO ?action)))

Page 47:

Expand Subactions

Effect: (COMPETENT hearer (DO-SEQUENCE ?actions))

Constraints: NIL

Nucleus: (for each ?actions (RST-SEQUENCE (COMPETENT hearer (DO-ACTION ?actions))))

Satellites: NIL

Page 48:

[Diagram: fragment of the resulting rhetorical structure tree, with Purpose, Result and Sequence relations over "Choose Save", "Dialog Box Opens" and "Choose Folder".]

Page 49:

Discourse

To save a file

1. Choose save option from file menu. A dialog box will appear

2. Choose the folder

3. Type the file name

4. Click the Save button. The system will save the document
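How a plan tree turns into the numbered instructions can be illustrated with a loose Python sketch. It only mimics the spirit of the Expand Purpose and Expand Subactions operators: the dictionary layout of the plan, the function names and the rendering format are assumptions made for this illustration, and real operator-based planners match effects and constraints rather than calling fixed functions.

```python
# Plan for saving a file, transcribed from the slides; the dict layout is an assumption.
PLAN = {
    "goal": "save a file",
    "steps": [
        {"action": "Choose save option from file menu", "result": "A dialog box will appear"},
        {"action": "Choose the folder", "result": None},
        {"action": "Type the file name", "result": None},
        {"action": "Click the Save button", "result": "The system will save the document"},
    ],
}

def expand_subactions(steps):
    """Put the substeps under a single SEQUENCE node, in plan order."""
    return {"relation": "SEQUENCE", "members": steps}

def expand_purpose(plan):
    """Nucleus: the sequence of substeps. Satellite: the purpose of the whole action."""
    return {
        "relation": "PURPOSE",
        "satellite": f"To {plan['goal']}",
        "nucleus": expand_subactions(plan["steps"]),
    }

def render(tree):
    """Linearize the tree: purpose first, then numbered steps with their results."""
    lines = [tree["satellite"]]
    for number, step in enumerate(tree["nucleus"]["members"], start=1):
        lines.append(f"{number}. {step['action']}")
        if step["result"]:
            lines.append(f"   {step['result']}")
    return "\n".join(lines)

print(render(expand_purpose(PLAN)))
```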

Page 50:

Rhetorical Relations – Difficult to infer

John abused the duck. The duck buzzed John.

1. John abused the duck that had buzzed him

2. The duck buzzed John who had abused it

3. The duck buzzed John and he abused it

4. John abused the duck and it buzzed him

Page 51:

On Clause Aggregation

Page 52:

Benefits of Aggregation

Conciseness: same information with fewer words

Cohesion: we want a semantic unit, not a jumble of disconnected phrases

Fluency: less effort to read; unambiguous and according to communication conventions

Page 53:

Complex interactions

Aggregation adds to fluency

The patient was admitted on Monday and released on Friday.

Someone ate apples. Someone ate oranges

Someone, who ate apples also ate oranges

Page 54:

Aggregation Operators

Category | Operators | Resources | Surface markers
Interpretive | Summarization; Inference | Common sense knowledge; Ontology | -
Referential | Ref. expr. generation; Quantified expression | Ontology; Discourse | Each, all, both, some
Syntactic | Paratactic; Hypotactic | Syntactic rules; Lexicon | And, with, who, which
Lexical | Paraphrasing | Lexicon | -

Page 55:

Interpretive

John punched Mary. Mary kicked John. John kicked Mary. => John fought with Mary

Not always meaning preserving

Note use of Ontology

John kicked Mary + John punched Mary =/> John fights with Mary

Page 56:

Referential Aggregation

Reference Expression generation

The patient is Mary [name]. The patient is female [gender]. The patient is 80 years old [age]. The patient has hypertension [med. history].

The patient is Mary. She is an 80 year old female. She has hypertension.

How much info in one sentence?

Page 57:

Reference (Quantification)

John is doing well. Mary is doing well. => All the patients are doing well

Note the use of background knowledge

The patient’s left arm. The patient’s right arm. => Each arm

Note the use of Ontology

Page 58:

Syntactic Aggregation

Paratactic: Entities are of equal syntactic status

Ram likes Sita and Geeta

Main operator is co-ordinating conjunction

Hypotactic: Unequal status; NP modified by a PP

Ram likes Sita, who is a nurse

Page 59:

Lexical Aggregation

In hypotactic aggregation, the satellite propositions are modified.

The Maths score was 99.8%. 99.8% is a record high score. => The Maths score was 99.8%, a record high score (apposition modification)

The Maths score was a record high score of 99.8%

A dog used by police => A police dog

Rise sharply => shoot

Drop sharply => plunge

Page 60:

Rhetorical Relations and Hypotactics

Use of cue operators

RR: Concession. He was fine. He just had an accident. => Although he had an accident he was fine

RR: Evidence. My car is not Indian. My car is a Toyota. => My car is not Indian because it is a Toyota

RR: Elaboration. My car is not Indian. My car is expensive. => My expensive car is not Indian

Page 61:

Hypotactic Operators

If propositions do not share any common entity, the operator can simply join them using a cue phrase

N: Tom is feeling cold. S: The window is open. [Cause]

Tom is feeling cold because the window is open

If the linked propositions share common entities, the internals of the linked propositions undergo modifications

N: The child stopped hunger. S: The child ate an apple. [Purpose]

To stop hunger, the child ate an apple.

Page 62:

Two stage transformation:

RR: Evidence

N: Tom was hungry

S: Tom did not eat dinner

Replace Tom in N by ‘he’

Apply Rule 1

Because Tom did not eat dinner, he was hungry
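The cue-phrase join and the two stage transformation can be sketched as follows. The cue-phrase table, both function names and the satellite-first option are assumptions for illustration; in particular, the slides do not say what "Rule 1" is, so the sketch simply treats it as the plain cue-phrase join.

```python
import re

# Cue phrase per rhetorical relation, read off the slide examples (an assumption
# that one fixed cue per relation is enough for a toy demonstration).
CUE = {"Cause": "because", "Evidence": "because", "Concession": "although"}

def join_with_cue(nucleus, satellite, relation, satellite_first=False):
    """Join two clauses with the cue phrase of the relation.
    Clauses are passed without final punctuation."""
    cue = CUE[relation]
    if satellite_first:
        return f"{cue.capitalize()} {satellite}, {nucleus}."
    return f"{nucleus} {cue} {satellite}."

def pronominalize(clause, entity, pronoun):
    """Stage one: replace whole-word mentions of a shared entity by a pronoun.
    The caller supplies the pronoun; choosing it is outside this sketch."""
    return re.sub(rf"\b{re.escape(entity)}\b", pronoun, clause)

# No shared entity: a plain join is enough.
print(join_with_cue("Tom is feeling cold", "the window is open", "Cause"))

# Shared entity: pronominalize the nucleus first, then join.
nucleus = pronominalize("Tom was hungry", "Tom", "he")
print(join_with_cue(nucleus, "Tom did not eat dinner", "Evidence", satellite_first=True))
```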

Page 63:

Corpus study to Rules

Example: RR: Purpose; N: Lift the cover; S: Install battery

Construction | % | Example
To-infinitive | 59.6 | To install battery, lift the cover
For-Nominalization | 7.5 | Lift the cover for battery installation
For-Gerund | 2.5 | Lift the cover for installing battery
By-Purpose | 10 | Install battery by lifting cover
So-That Purpose | 8.4 | Lift cover so battery can be installed

Page 64:

Syntactic constructions for realizing Elaboration relations

Construction | Verbosity | M-direction | Example
R-Clause | Short | Before | An apple which weighs 3 oz
Reduced R-Clause | Shorter | Before | An apple weighing 3 oz
PP | Shorter | Before | An apple in the basket
Apposition | Shortest | Before | An apple, a small fruit
Prenominalization | Shortest | After | A 3 oz apple
Adjective | Shortest | After | A dark red apple

Page 65:

Lexical Constraints

Except for R-clause and Reduced R-clause, transforming a proposition into an apposition, an adjective or a PP requires that the satellite proposition be of a specific syntactic type (a noun, an adjective or a PP respectively).

N: Jack is a runner. S: Jack is fast.

Jack is a fast runner

"Fast" and "runner" have a possible qualifying relationship.

Qualia Structure (Pustejovsky 91)

Page 66:

Constraints

Linear Ordering

Paratactic: Years 1998, 1999 and 2000; not Years 1999, 1998 and 2000

Hypotactic: Uncommon orderings between premodifiers create disfluencies

A happy old man ---- An old happy man

Page 67:

Linear Ordering and Scope of Modifiers

Problem when multiple modifiers modify the same noun: decide the order; avoid ambiguity

Ms. Jones is a patient of Dr. Smith, undergoing heart surgery

Old men and women should board first

Women and old men should board first

Page 68:

Linear Ordering of Modifiers

A simplex NP is a maximal noun phrase that includes pre-modifiers such as determiners and possessives, but not post-nominals such as PPs and R-Cls.

A POS tagger along with a finite-state (FS) grammar can be used to extract simplex NPs.

A morphology module transforms plurals of nouns and comparative and superlative adjectives into their base forms for frequency counts.

A regular expression filter removes concatenations of NPs, e.g. "Takeover bid last week", "Metformin 500 milligrams"

Page 69:

Three stages of subsequent analysis

Direct Evidence

Modifier sequences are transformed into ordered pairs

Well known traditional brand name drug

Well known < traditional; Well known < brand name; traditional < brand name

Three possibilities: A < B; B < A; B = A (no order)

Page 70:

For n modifiers, nC2 = n(n-1)/2 ordered pairs

Form a w x w matrix, where w is the number of distinct modifiers.

Find Count[A,B] and Count[B,A]

For a small corpus, a binomial distribution of one modifier following the other is observed.
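The direct-evidence counts can be sketched in Python. The tiny corpus below is hypothetical apart from the "well known traditional brand name" sequence, the function names are assumptions, and the final decision uses a simple majority of the two counts as a stand-in for the statistical test over the binomial distribution mentioned above.

```python
from collections import Counter
from itertools import combinations

def ordered_pairs(modifiers):
    """All n*(n-1)/2 pairs (A, B) such that A precedes B in the sequence."""
    return list(combinations(modifiers, 2))

def count_orderings(sequences):
    """Sparse version of the w x w matrix: counts[(A, B)] is Count[A,B]."""
    counts = Counter()
    for sequence in sequences:
        counts.update(ordered_pairs(sequence))
    return counts

def preferred_order(counts, a, b):
    """Simple-majority decision (a simplification, not a significance test).
    Returns the preferred pair, or None when there is no evidence either way."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if ab > ba:
        return (a, b)
    if ba > ab:
        return (b, a)
    return None

# Hypothetical mini-corpus of premodifier sequences.
corpus = [
    ["well known", "traditional", "brand name"],
    ["traditional", "brand name"],
    ["brand name", "traditional"],
]
counts = count_orderings(corpus)
print(ordered_pairs(corpus[0]))
print(counts[("traditional", "brand name")], counts[("brand name", "traditional")])
print(preferred_order(counts, "traditional", "brand name"))
```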

Page 71:

Transitivity

Again from corpus: A < B and B < C => A < C

Long, boring and strenuous stretch

Long strenuous lecture

Clustering: formation of equivalence classes of words with the same ordering with respect to other premodifiers
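The transitivity step amounts to closing the observed precedence pairs under the rule "A < B and B < C gives A < C". A minimal sketch, assuming the observed orderings are held as a set of (earlier, later) pairs; the function name is an assumption and the naive loop is chosen for readability, not efficiency.

```python
def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Orderings read off "long, boring and strenuous stretch".
observed = {("long", "boring"), ("boring", "strenuous")}
print(sorted(transitive_closure(observed)))
```

The inferred pair ("long", "strenuous") is the ordering that licenses "long strenuous lecture" even if that exact pair never appears adjacent in the corpus.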

Page 72:

John is a 74 year old hypertensive diabetic white male patient with a swollen mass in the left groin

John is a diabetic male white 74 year old hypertensive patient with a red swollen mass in the left groin

Page 73:

Other Constraints

For conjunctions

John ate an apple and an orange (NP and NP)

John ate in the morning and in the evening (PP and PP)

X John ate an apple and in the evening (NP and PP)

Moral: Same syntactic category?

John and a hammer broke the window ???

He is Nobel Prize winner and at the peak of his career.

Others: Adj phrase attachment, PP attachment etc.

Page 74:

Conjunctions

Page 75:

Three interesting types

John ate fish on Monday and rice on Tuesday (non-constituent coordination)

John ate fish and Bill rice (gapping)

Right node raising: John caught and Mary killed the spider

Page 76:

A Naïve Algorithm

1. Group propositions and order them according to similarities

Propositions:

1. I sold English books on Monday

2. I sold Hindi books on Wednesday

3. I sold onion on Monday

4. I sold Bengali books on Monday

((1,3,4),2) OR ((1,4),3,2) OR…..

Page 77:

2. Identify recurring elements

3. Determine sentence boundary

4. Delete redundant elements
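The four steps can be sketched on the book-selling propositions. The tuple encoding (subject, verb, object, day), the grouping key and the function names are assumptions for this illustration; the sketch implements only the ((1,3,4),2) grouping and has none of the constraints needed to block odd outputs.

```python
from collections import defaultdict

# The four propositions from the example, encoded as (subject, verb, object, day).
PROPS = [
    ("I", "sold", "English books", "Monday"),
    ("I", "sold", "Hindi books", "Wednesday"),
    ("I", "sold", "onion", "Monday"),
    ("I", "sold", "Bengali books", "Monday"),
]

def conjoin(items):
    """Coordinate a list of phrases: 'a', 'a and b', 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]

def aggregate(props):
    # Steps 1 and 2: group propositions whose subject, verb and day recur.
    groups = defaultdict(list)
    for subj, verb, obj, day in props:
        groups[(subj, verb, day)].append(obj)
    # Steps 3 and 4: one sentence per group; recurring elements are said once.
    return [f"{s} {v} {conjoin(objs)} on {d}." for (s, v, d), objs in groups.items()]

for sentence in aggregate(PROPS):
    print(sentence)
```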

Page 78:

Still Funny Scenarios

The baker baked. The bread baked. => The baker and the bread baked.

I don’t drink. I don’t chew tobacco. => I don’t drink and chew tobacco.

== What should the constraints be?

Page 79:

Morphological Synthesis

Inflections depending on tense, aspect, mood, case, gender, number, person and familiarity.

A typical Bengali verb has 63 different inflected forms (120 if we consider the causative derivations)

Exceptions

Page 80:

Synthesis Approach

Classification of words based on Syllable structure [19 classes for Bengali verbs]

Paradigm tables for each of the classes

Table-driven modification of the words

Exceptions treated separately.

Page 81:

Noun Morphology Synthesis

Different rules are used to inflect qualifiers and headwords

The rule to inflect a proper noun as a headword in a particular SSU:

IF (headword type = proper noun AND the SSU to which the headword belongs = kAke AND the last character of root word = ‘a’)

THEN Rule1: headword = headword + “ke”

rAma => rAmake

IF (Verb1 == verb2 AND the Conjunction = Ebong AND SSU2 to which the headword belongs = kakhana AND the last character of root word = ‘a’)

THEN Rule1: headword = headword - ’a’. Rule2: headword = headword + ’o’.

Aaem gfkal bl /K/leClam ybL Aajo /Klb.

Headword: Aaj + o
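The two IF-THEN rules translate almost directly into code. In the sketch below the function names and argument layout are assumptions; the conditions and the rAma example come from the slide, while the root spelling "Aaja" used for the second rule is only an assumed transliteration consistent with "Aaj + o".

```python
def inflect_proper_noun(headword, headword_type, ssu):
    """First rule: a proper-noun headword in a kAke SSU whose root ends in 'a'
    takes the suffix 'ke'. Any other input is returned unchanged."""
    if headword_type == "proper noun" and ssu == "kAke" and headword.endswith("a"):
        return headword + "ke"
    return headword

def inflect_after_repeated_verb(headword, verb1, verb2, conjunction, ssu2):
    """Second rule: when both clauses share the verb, are joined by Ebong, and the
    headword sits in a kakhana SSU with a root ending in 'a', drop the final 'a'
    and then add 'o'."""
    if (verb1 == verb2 and conjunction == "Ebong"
            and ssu2 == "kakhana" and headword.endswith("a")):
        return headword[:-1] + "o"
    return headword

print(inflect_proper_noun("rAma", "proper noun", "kAke"))
# "Aaja" and the verb label "khel" are assumed spellings, used only to exercise the rule.
print(inflect_after_repeated_verb("Aaja", "khel", "khel", "Ebong", "kakhana"))
```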

Page 82:

Verb Morphology Synthesis

• Depends upon TAM (tense, aspect, mood) option. Category Identification + Table lookup

Category Identification: Structure of root verb: X* V C* $, where X = any character, V = vowel, C = consonant and $ ∈ { Ø, a, A, oYA }.

Each cell gives root [verbal noun] (gloss); "-" marks an empty cell.

V | $ = Ø | $ = a | $ = A | $ = oYA
a | ha [haoYA] (to happen) | kara [karA] (to do) | karA [karAno] (do, causative) | saoYA [saoYAno] (undergo, causative)
A | khA [khAoYA] (to eat) | jAna [jAnA] (to know) | jAnA [jAnAno] (to inform) | khAoYA [khAoYAno] (to feed)
i | di [deoYA] (to give) | likha [lekhA] (to write) | ni~NrA [ni~NrAno] | -
e | - | dekha [dekhA] (to see) | dekhA [dekhAno] (to show) | deoYA [deoYAno] (give, causative)
o | so [so;oYA] (to lie down) | tola [tolA] (to pick) | tolA [tolAno] (pick, causative) | so;oYA [so;oYAno] (lie, causative)
u/au | - | - | ghumA [ghumAno] (to sleep) | -
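Category identification can be approximated with a regular expression over the transliterated root. This is a sketch under stated assumptions: the vowel inventory `a A e i o u` (plus `au`), the restriction of the leading X part to non-vowel characters, and the `categorize` name are choices made here so that the pattern is unambiguous; they are not given on the slide.

```python
import re

# X* V C* $ with $ in {Ø, a, A, oYA}; X and C are restricted to non-vowel
# characters here (an assumption) so that V is the first vowel of the root.
ROOT = re.compile(
    r"^(?P<X>[^aAeiou]*)(?P<V>au|[aAeiou])(?P<C>[^aAeiou]*)(?P<S>oYA|a|A|)$"
)

def categorize(root):
    """Return (V, $) for a root, writing 'Ø' for the empty ending,
    or None when the root does not fit the pattern."""
    match = ROOT.match(root)
    if match is None:
        return None
    return match.group("V"), match.group("S") or "Ø"

for root in ["ha", "kara", "karA", "saoYA", "khA", "dekha", "ghumA", "so;oYA"]:
    print(root, categorize(root))
```

The returned pair would then index the paradigm table for the chosen tense, aspect and mood.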

Page 83:

Table Look Up

The Table Lookup Stage:

i) Pr: Present

ii) Pa: Past

iii) Sim: Simple

iv) Per: Perfect

v) Co: Continuous

vi) Ind: Indicative

vii) Neg: Negation

Page 84:

Questions?