
Introduction to Automatic Summarization

Li Sujian ( 李素建 )

Peking University, China

[email protected]

Room 1108, Winslow Building, RPI

2

Table of contents

1. Motivation

2. Overview of summarization system.

3. Summarization techniques

4. Building a summarization system.

5. Evaluating summaries

6. The future.

Information overload

The problem: 4 billion URLs indexed by Google; 200 TB of data on the Web [Lyman and Varian 03]

Possible approaches: information retrieval, document clustering, information extraction, question answering, text summarization

http://time.com/3522586/texas-hospital-ebola-mistakes-apology/

http://www.telegraph.co.uk/women/womens-life/11167681/John-Grisham-paedophile-row-sex-offenders-are-not-good-guys.html

http://www.reuters.com/article/2014/10/16/us-usa-florida-election-idUSKCN0I51EC20141016

MILAN, Italy, April 18. A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading.

Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.

(Radev, 2004)


How many victims?

Was it a terrorist act?

What was the target?

What happened?

Says who?

When, where?

1. How many people were injured?
2. How many people were killed? (age, number, gender, description)
3. Was the pilot killed?
4. Where was the plane coming from?
5. Was it an accident (technical problem, illness, terrorist act)?
6. Who was the pilot? (age, number, gender, description)
7. When did the plane crash?
8. How tall is the Pirelli building?
9. Who was on the plane with the pilot?
10. Did the plane catch fire before hitting the building?
11. What was the weather like at the time of the crash?
12. When was the building built?
13. What direction was the plane flying?
14. How many people work in the building?
15. How many people were in the building at the time of the crash?
16. How many people were taken to the hospital?
17. What kind of aircraft was used?

10

TV-GUIDES — decision making

11

Abstracts of papers — time saving

12
http://summly.com/technology.html

13

Questions
What kinds of summaries do people want?
What are summarizing, abstracting, gisting, ...?
How sophisticated must summarization systems be? Are statistical techniques sufficient? Or do we need symbolic techniques and deep understanding as well?
What milestones would mark quantum leaps in summarization theory and practice? How do we measure summarization quality?

14

Table of contents

1. Motivation.

2. Overview of summarization system.

3. Summarization techniques

4. Building a summarization system.

5. Evaluating summaries

6. The future.

Summary definition (Sparck Jones, 1999): “a reductive transformation of source text to summary text through content condensation by selection and/or generalization on what is important in the source.”

Definitions

Schematic summary processing model:

Source text → (Interpretation) → Source representation → (Transformation) → Summary representation → (Generation) → Summary text

18

Summarizing factors (Sparck Jones 2007)

Input: subject type (domain); genre (newspaper articles, editorials, letters, reports...); form (regular text structure; free-form); source size (single doc; multiple docs, few or many).
Purpose: situation (embedded in a larger system such as MT or IR, or not); audience (focused or general); usage (IR, sorting, skimming...).
Output: completeness (include all aspects, or focus on some); format (paragraph, table, etc.); style (informative, indicative, aggregative, critical...).

19

Examples
Exercise: summarize the following texts for the following readers:
• text 1: Coup Attempt
• text 2: children's story
• reader 1: your friend, who knows nothing about South Africa.
• reader 2: someone who lives in South Africa and knows the political position.
• reader 3: your 4-year-old niece.
• reader 4: the Library of Congress.

20

90 Soldiers Arrested After Coup Attempt In Tribal Homeland
MMABATHO, South Africa (AP)

About 90 soldiers have been arrested and face possible death sentences stemming from a coup attempt in Bophuthatswana, leaders of the tribal homeland said Friday. Rebel soldiers staged the takeover bid Wednesday, detaining homeland President Lucas Mangope and several top Cabinet officials for 15 hours before South African soldiers and police rushed to the homeland, rescuing the leaders and restoring them to power. At least three soldiers and two civilians died in the uprising. Bophuthatswana's Minister of Justice G. Godfrey Mothibe told a news conference that those arrested have been charged with high treason and if convicted could be sentenced to death. He said the accused were to appear in court Monday. All those arrested in the coup attempt have been described as young troops, the most senior being a warrant officer. During the coup rebel soldiers installed as head of state Rocky Malebane-Metsing, leader of the opposition Progressive Peoples Party. Malebane-Metsing escaped capture and his whereabouts remained unknown, officials said. Several unsubstantiated reports said he fled to nearby Botswana. Warrant Officer M.T.F. Phiri, described by Mangope as one of the coup leaders, was arrested Friday in Mmabatho, capital of the nominally independent homeland, officials said. Bophuthatswana, which has a population of 1.7 million spread over seven separate land blocks, is one of 10 tribal homelands in South Africa. About half of South Africa's 26 million blacks live in the homelands, none of which are recognized internationally. Hennie Riekert, the homeland's defense minister, said South African troops were to remain in Bophuthatswana but will not become a ``permanent presence.'' Bophuthatswana's Foreign Minister Solomon Rathebe defended South Africa's intervention. ``The fact that ... the South African government (was invited) to assist in this drama is not anything new nor peculiar to Bophuthatswana,'' Rathebe said. ``But why South Africa, one might ask? Because she is the only country with whom Bophuthatswana enjoys diplomatic relations and has formal agreements.'' Mangope described the mutual defense treaty between the homeland and South Africa as ``similar to the NATO agreement,'' referring to the Atlantic military alliance. He did not elaborate. Asked about the causes of the coup, Mangope said, ``We granted people freedom perhaps ... to the extent of planning a thing like this.'' The uprising began around 2 a.m. Wednesday when rebel soldiers took Mangope and his top ministers from their homes to the national sports stadium. On Wednesday evening, South African soldiers and police stormed the stadium, rescuing Mangope and his Cabinet. South African President P.W. Botha and three of his Cabinet ministers flew to Mmabatho late Wednesday and met with Mangope, the homeland's only president since it was declared independent in 1977. The South African government has said, without producing evidence, that the outlawed African National Congress may be linked to the coup. The ANC, based in Lusaka, Zambia, dismissed the claims and said South Africa's actions showed that it maintains tight control over the homeland governments. The group seeks to topple the Pretoria government. The African National Congress and other anti-government organizations consider the homelands part of an apartheid system designed to fragment the black majority and deny them political rights in South Africa.

21

If You Give a Mouse a Cookie

Laura Joffe Numeroff © 1985

If you give a mouse a cookie,he’s going to ask for a glass of milk.

When you give him the milk, he’ll probably ask you for a straw.

When he’s finished, he’ll ask for a napkin.

Then he’ll want to look in the mirror to make sure he doesn’t have a milk mustache.

When he looks into the mirror, he might notice his hair needs a trim.

So he’ll probably ask for a pair of nail scissors.

When he’s finished giving himself a trim, he’ll want a broom to sweep up.

He’ll start sweeping.

He might get carried away and sweep every room in the house.

He may even end up washing the floors as well.

When he’s done, he’ll probably want to take a nap.

You’ll have to fix up a little box for him with a blanket and a pillow.

He’ll crawl in, make himself comfortable, and fluff the pillow a few times.

He’ll probably ask you to read him a story.

When you read to him from one of your picture books, he'll ask to see the pictures.

When he looks at the pictures, he’ll get so excited that he’ll want to draw one of his own. He’ll ask for paper and crayons.

He’ll draw a picture. When the picture is finished, he’ll want to sign his name, with a pen.

Then he’ll want to hang his picture on your refrigerator. Which means he’ll need Scotch tape.

He’ll hang up his drawing and stand back to look at it. Looking at the refrigerator will remind him that he’s thirsty.

So…he’ll ask for a glass of milk.

And chances are that if he asks for a glass of milk, he’s going to want a cookie to go with it.

22

‘Genres’ of Summary?
Indicative vs. informative: used for quick categorization vs. content processing.
Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented: provides the author’s view vs. reflects the user’s interest.
Background vs. just-the-news: assumes the reader’s prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source: based on one text vs. fuses together many texts.

23

A Summarization Machine

[Figure: a summarization machine. Input: DOC (or MULTIDOCS) plus an optional QUERY. Output: EXTRACTS, ABSTRACTS, or other forms (case frames, templates, core concepts, core events, relationships, clause fragments, index terms). Output parameters: Extract vs. Abstract; Indicative vs. Informative; Generic vs. Query-oriented; Background vs. Just the news; length from Headline and Very Brief (10%) through Brief (50%) to Long (100%).]

24

The Modules of the Summarization Machine

[Figure: the machine's modules. EXTRACTION produces extracts from the input document(s), with FILTERING applied to multi-doc extracts; INTERPRETATION maps extracts into deeper forms (case frames, templates, core concepts, core events, relationships, clause fragments, index terms); GENERATION produces abstracts from those representations.]

25

Table of contents

1. Motivation

2. Overview of summarization system.

3. Summarization techniques

4. Building a summarization system.

5. Evaluating summaries

6. The future.

26

Computational Approach

Top-Down: I know what I want!
• User needs: only certain types of info.
• System needs: particular criteria of interest, used to focus the search.

Bottom-Up: I'm dead curious: what's in the text?
• User needs: anything that's important.
• System needs: generic importance metrics, used to rate content.

27

Review of Methods

Bottom-up methods:
• Text location: title, position
• Cue phrases
• Word frequencies
• Internal text cohesion: word co-occurrences, local salience, co-reference of names and objects, lexical similarity, semantic representation/graph centrality
• Discourse structure centrality

Top-down methods:
• Information extraction templates
• Query-driven extraction: query expansion lists, co-reference with query names, lexical similarity to the query

28

Query-Driven vs. Text-Driven Focus

Top-down: query-driven focus
• Criteria of interest encoded as search specs.
• System uses the specs to filter or analyze text portions.
• Examples: templates with slots with semantic characteristics; termlists of important terms.

Bottom-up: text-driven focus
• Generic importance metrics encoded as strategies.
• System applies the strategies over a representation of the whole text.
• Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.

29

Bottom-Up, using Info. Retrieval
IR task: Given a query, find the relevant document(s) from a large set of documents.
Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).

• Questions:
1. IR techniques work on large volumes of data; can they scale down accurately enough?
2. IR works on words; do abstracts require abstract representations?


30

Top-Down, using Info. Extraction
IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in.
Summ-IE task: Given a query, select the best template, fill it in, and generate the contents.

• Questions:
1. IE works only for very particular templates; can it scale up?
2. What about information that doesn't fit into any template? Is this a generic limitation of IE?


31

Paradigms: NLP/IE vs. IR/Statistics

NLP/IE:
• Approach: try to ‘understand’ the text; re-represent content using a ‘deeper’ notation, then manipulate that.
• Need: rules for text analysis and manipulation, at all levels.
• Strengths: higher quality; supports abstracting.
• Weaknesses: speed; still needs to scale up to robust open-domain summarization.

IR/Statistics:
• Approach: operate at the lexical level; use word frequency, collocation counts, etc.
• Need: large amounts of text.
• Strengths: robust; good for query-oriented summaries.
• Weaknesses: lower quality; inability to manipulate information at abstract levels.

32

Toward the Final Answer...

Problem: What if neither IR-like nor IE-like methods work?
Solution: semantic analysis of the text (NLP), using adequate knowledge bases that support inference (AI).

Word counting vs. inference:
Mrs. Coolidge: “What did the preacher preach about?”
Coolidge: “Sin.”
Mrs. Coolidge: “What did he say?”
Coolidge: “He’s against it.”
– Sometimes counting and templates are insufficient,
– and then you need to do inference to understand.

33

The Optimal Solution...

Combine strengths of both paradigms…

...use IE/NLP when you have suitable template(s),

...use IR when you don’t…

…but how exactly to do it?

Progress of automatic summarization methods

1950s-1960s: position, frequency, cue-phrase based (Luhn, 1958; Edmundson, 1969)
1970s: document content (Schank and Abelson, 1977; DeJong, 1979)
1990s: information extraction based (Hovy and Lin, 1997; Kupiec et al., 1995)
Current: semantic analysis, graph theory, QA, machine learning ...

35

Overview of Extraction Methods
• Word frequencies throughout the text
• Position in the text: lead method; optimal position policy; title/heading method
• Cue phrases in sentences
• Cohesion: links among words (word co-occurrence, coreference, lexical chains)
• Discourse structure of the text
• Information Extraction: parsing and analysis
• Statistical scoring

Scoring techniques
• Word frequencies throughout the text (Luhn 58)
• Position in the text (Edmundson 69)
• Title method (Edmundson 69)
• Cue phrases in sentences (Edmundson 69)

Luhn 58
• Very first work in automated summarization.
• Computes measures of significance.
• Words: stemming (e.g., differ, difference), bag of words.
[Figure: word frequency vs. words, the resolving power of words (Luhn, 58)]

Luhn 58
• Sentences: scored by the concentration of high-score (significant) words.
• Cutoff values established in experiments with 100 human subjects.
• Example: a span of 7 consecutive words in a sentence, bracketed by significant words, contains 4 significant words; the sentence score is 4² / 7 ≈ 2.3.
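Below is a minimal Python sketch of this style of scoring (an assumed re-implementation, not Luhn's original program): significant words are the frequent non-stopwords, and each sentence is scored as count² / span length over its densest span of significant words. The stopword list and parameters are illustrative.

from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "it"}  # toy list

def luhn_scores(sentences, top_k=10, max_gap=4):
    # significant words = most frequent non-stopwords over the whole text
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(w for w in words if w not in STOPWORDS)
    significant = {w for w, _ in freq.most_common(top_k)}

    def score(sentence):
        toks = [w.lower() for w in sentence.split()]
        flags = [i for i, w in enumerate(toks) if w in significant]
        best = 0.0
        start = 0
        # group significant-word positions into spans with small gaps
        while start < len(flags):
            end = start
            while end + 1 < len(flags) and flags[end + 1] - flags[end] <= max_gap:
                end += 1
            span_len = flags[end] - flags[start] + 1
            count = end - start + 1
            best = max(best, count ** 2 / span_len)
            start = end + 1
        return best

    return [score(s) for s in sentences]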

Edmundson 69
• Cue method: stigma words (“hardly”, “impossible”) and bonus words (“significant”)
• Key method: similar to Luhn
• Title method: title + headings
• Location method: sentences under headings; sentences near the beginning or end of the document and/or paragraphs (also [Baxendale 58])

Edmundson 69
• Linear combination of four features: α₁C + α₂K + α₃T + α₄L (Cue, Key, Title, Location)
• Manually labelled training corpus
• Finding: Key is not important!
[Figure: extraction performance (0-100%) of RANDOM, KEY, TITLE, CUE, LOCATION, C + K + T + L, and C + T + L]

Lin & Hovy 97

Optimum position policy (OPP)

Measuring yield of each sentence position against keywords (signature words) from Ziff-Davis corpus

Preferred order

[(T) (P2,S1) (P3,S1) (P2,S2) {(P4,S1) (P5,S1) (P3,S2)} {(P1,S1) (P6,S1) (P7,S1) (P1,S3)(P2,S3) …]

Kupiec et al. 95
• Extracts of roughly 20% of the original text.
• Feature set:
  - sentence length: |S| > 5
  - fixed phrases: 26 manually chosen
  - paragraph: sentence position in the paragraph
  - thematic words (binary): whether the sentence is included in the set of highest-scoring sentences
  - uppercase words: not common acronyms
• Corpus: 188 document + summary pairs from scientific journals.

Kupiec et al. 95

Uses a Bayesian classifier:

P(s ∈ S | F_1, F_2, ..., F_k) = P(F_1, F_2, ..., F_k | s ∈ S) · P(s ∈ S) / P(F_1, F_2, ..., F_k)

• Assuming statistical independence of the features:

P(s ∈ S | F_1, F_2, ..., F_k) ≈ [ Π_{j=1..k} P(F_j | s ∈ S) ] · P(s ∈ S) / [ Π_{j=1..k} P(F_j) ]
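A minimal Python sketch of such a naive-Bayes sentence classifier (an assumed re-implementation with generic binary features, not Kupiec et al.'s code): probabilities are estimated from a labelled corpus and unseen sentences are scored by the formula above.

from collections import defaultdict

def train(corpus):
    """corpus: list of (features, in_summary) where features is a dict of 0/1 values."""
    n = len(corpus)
    n_pos = sum(1 for _, y in corpus if y)
    p_s = n_pos / n                                  # P(s in S)
    p_f, p_f_given_s = defaultdict(float), defaultdict(float)
    for feats, y in corpus:
        for name, value in feats.items():
            if value:
                p_f[name] += 1 / n                   # P(F_j)
                if y:
                    p_f_given_s[name] += 1 / n_pos   # P(F_j | s in S)
    return p_s, p_f, p_f_given_s

def score(feats, p_s, p_f, p_f_given_s, eps=1e-6):
    """Proportional to P(s in S | F_1..F_k) under the independence assumption."""
    prob = p_s
    for name, value in feats.items():
        if value:
            prob *= (p_f_given_s[name] + eps) / (p_f[name] + eps)
    return prob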

Centroid (Radev, 2004)

Centroids consist of words which are central not only to one article in a cluster, but to all the articles.

Hypothesize that sentences that contain the words from the centroid are more indicative of the topic of the cluster.

A centroid is a pseudo-document consisting of the words whose Count*IDF scores are above a predefined threshold in the documents that constitute the cluster.
• Count: average number of occurrences of a word across the entire cluster.
• IDF: computed from a large corpus.
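A minimal Python sketch of centroid construction and centroid-based sentence scoring as described above (assumed implementation; the IDF table and the threshold value are inputs, not fixed by the method):

from collections import Counter

def build_centroid(documents, idf, threshold=1.0):
    """documents: list of token lists; idf: dict word -> IDF from a large corpus."""
    counts = Counter(w for doc in documents for w in doc)
    n_docs = len(documents)
    centroid = {}
    for word, count in counts.items():
        score = (count / n_docs) * idf.get(word, 0.0)  # average count * IDF
        if score > threshold:
            centroid[word] = score
    return centroid

def centroid_score(sentence_tokens, centroid):
    """Sentence score = sum of centroid values of the words it contains."""
    return sum(centroid.get(w, 0.0) for w in sentence_tokens)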

Centroid example

45

46

Topic Signature Method (Hovy and Lin, 98)
Claim: Can approximate script identification at the lexical level, using automatically acquired ‘word families’.
Idea: Create topic signatures: each concept is defined by the frequency distribution of its related words (concepts):
TS = {topic, signature} = {topic, (t1,w1) (t2,w2) ...}
e.g., restaurant-visit: waiter + menu + food + eat ...
(This is the inverse of query expansion in IR.)

Signature term extraction: likelihood ratio (Dunning 1993), a hypothesis-testing method.

47

48

49

50

Example Signatures

RANK  aerospace   banking        environment    telecommunication
1     contract    bank           epa            at&t
2     air_force   thrift         waste          network
3     aircraft    banking        environmental  fcc
4     navy        loan           water          cbs
5     army        mr.            ozone
6     space       deposit        state          bell
7     missile     board          incinerator    long-distance
8     equipment   fslic          agency         telephone
9     mcdonnell   fed            clean          telecommunication
10    northrop    institution    landfill       mci
11    nasa        federal        hazardous      mr.
12    pentagon    fdic           acid_rain      doctrine
13    defense     volcker        standard       service
14    receive     henkel         federal        news
15    boeing      banker         lake           turner
16    shuttle     khoo           garbage        station
17    airbus      asset          pollution      nbc
18    douglas     brunei         city           sprint
19    thiokol     citicorp       law            communication
20    plane       billion        site           broadcasting
21    engine      regulator      air            broadcast
22    million     national_bank  protection     programming
23    aerospace   greenspan      violation      television
24    corp.       financial      management     abc
25    unit        vatican        reagan         rate

The MEAD Summarizer

• Extractive summarization as a classification task; approximate it by classifying individual sentences: f: s → {0, 1}.
• MEAD's classification function gives each sentence a score based on a linear combination of features (Position, Centroid, First-sentence similarity):

Score(s) = w_P · P(s) + w_C · C(s) + w_F · F(s)

The Redundancy Problem

Sentences with duplicate information content:
• “Solemn ceremony marks handover.”
• “A solemn historic ceremony has marked the resumption of the exercise of sovereignty over Hong Kong.”

MEAD's method for combating redundancy:
• Do until the number of sentences equals the maximum for the summary:
  for each sentence s_i in order of score,
    if s_i is too similar to the already selected sentences, skip it;
    else add s_i.
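A minimal Python sketch of this scoring-plus-selection loop (assumed implementation; the feature weights, similarity function and threshold are placeholders, not MEAD's actual defaults):

def mead_score(position, centroid, first_sim, w_p=1.0, w_c=1.0, w_f=1.0):
    # Score(s) = w_P * P(s) + w_C * C(s) + w_F * F(s)
    return w_p * position + w_c * centroid + w_f * first_sim

def select(sentences, scores, similarity, max_sentences, threshold=0.7):
    """sentences: list of sentence ids; similarity(a, b) -> cosine similarity."""
    ranked = sorted(sentences, key=lambda s: scores[s], reverse=True)
    summary = []
    for s in ranked:
        if len(summary) >= max_sentences:
            break
        if any(similarity(s, t) > threshold for t in summary):
            continue  # too similar to an already selected sentence: skip
        summary.append(s)
    return summary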

Redundancy between sentences

Example: similar sentences are both assigned a high score.
• “The top fine for smoking at work or in public places would be 200 rand (dlrs 35).”
• “The maximum fine for smoking at work or in public places would be 200 rand (dlrs 35).”

MMR (maximal marginal relevance) algorithm: when selecting a new sentence for the summary, measure its similarity to the already extracted sentences; if the similarity is greater than a threshold, the sentence is redundant.
The threshold is set to 0.7.

Carbonell & Goldstein 98
MMR: The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting appropriate passages for text summarization.

MMR =def arg max_{D_i ∈ R\S} [ λ · Sim_1(D_i, Q) − (1 − λ) · max_{D_j ∈ S} Sim_2(D_i, D_j) ]

C = document collection; Q = user query; R = IR(C, Q, ·), the documents retrieved for Q; S = already retrieved (selected) documents; Sim = similarity metric used.
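A minimal Python sketch of MMR selection following the formula above (assumed implementation; sim_to_query and sim stand for Sim_1 and Sim_2, and lam is the lambda trade-off parameter):

def mmr_select(candidates, sim_to_query, sim, k, lam=0.7):
    """candidates: list of ids; sim_to_query[id] -> Sim1(D_i, Q); sim(a, b) -> Sim2(D_i, D_j)."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(d):
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * sim_to_query[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)   # pick the candidate with maximal marginal relevance
        selected.append(best)
        remaining.remove(best)
    return selected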

Paice 90
A survey up to 1990. Techniques that (mostly) failed:
• frequency-keyword
• title-keyword
• syntactic criteria
• indicator phrases (“The purpose of this article is to review...”)

Problems with extracts:
• lack of balance
• lack of cohesion: anaphoric reference, lexical or definite reference, rhetorical connectives

Linguistic/Semantic Methods

Co-reference/lexical chains, rhetorical analysis, information extraction, ...

57

Cohesion-based methods
Claim: Important sentences/paragraphs contain the most highly connected entities in more or less elaborate semantic structures.
Classes of approaches:
• word co-occurrences;
• local salience and grammatical relations;
• co-reference;
• lexical similarity (WordNet, lexical chains);
• combinations of the above.

Barzilay and Elhadad 97

“Dr.Kenny” appears once in both sequences and so does “machine”. But sequence 1 is about the machine, and sequence 2 is about the “doctor”.

Barzilay and Elhadad 97: Chain computing
The procedure for constructing lexical chains uses three types of relations:
• extra-strong (repetitions)
• strong (WordNet relations, within 7 sentences)
• medium-strong (the link between synsets is longer than one, within 3 sentences)

Barzilay and Elhadad 97

Lexical chains [Stairmand 96]

Mr. Kenny is the person that invented the anesthetic machine which uses micro-computers to control the rate at which an anesthetic is pumped into the blood. Such machines are nothing new. But his device uses two micro-computers to achieve much closer monitoring of the pump feeding the anesthetic into the patient.

61

Co-reference/Lexical Chains


Barzilay and Elhadad 97: Scoring chains
• Length of the chain
• Homogeneity index = 1 - (# distinct words in chain) / Length
• Score = Length * Homogeneity
• A chain is selected as strong if Score > Average(scores) + 2 * StDev(scores).
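A minimal Python sketch of this chain-scoring rule (assumed implementation; a chain is given simply as the list of its word occurrences):

from statistics import mean, stdev

def chain_score(chain_words):
    length = len(chain_words)
    homogeneity = 1 - len(set(chain_words)) / length  # 1 - distinct / length
    return length * homogeneity

def strong_chains(chains):
    scores = [chain_score(c) for c in chains]
    cutoff = mean(scores) + 2 * stdev(scores) if len(scores) > 1 else 0.0
    return [c for c, s in zip(chains, scores) if s > cutoff]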

Barzilay and Elhadad 97: Extracting important sentences

Heuristic 1: For each chain in the summary representation, choose the sentence that contains the first appearance of a chain member in the text.

Heuristic 2: A second heuristic based on the notion of representative words: for each chain in the summary representation, choose the sentence that contains the first appearance of a representative chain member in the text.

Heuristic 3: For each chain, find the text unit where the chain is highly concentrated. Extract the sentence with the first chain appearance in this central unit. Concentration is computed as the number of chain-member occurrences in a segment divided by the number of nouns in the segment.

65

Discourse-based method
• Tree-like representation of texts in the style of Rhetorical Structure Theory (Mann and Thompson, 88).
• Claim: The multi-sentence coherence structure of a text can be constructed, and the ‘centrality’ of the textual units in this structure reflects their importance.
• Use the discourse representation to determine the most important textual units. Attempts: (Marcu, 97) for English.

Rhetorical Structure Theory (Mann & Thompson 88)

Rhetorical relation:
• Holds between two non-overlapping text spans.
• Nucleus: the core idea, the writer's purpose.
• Satellite: relates to the nucleus by justifying, evidencing, contradicting it, etc.
• The nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa.
• Not all rhetorical relations are nucleus-satellite relations; Contrast is a multinuclear relation.

Rhetorical Structure Theory

RST parsing:
• Breaks the text into elementary discourse units (EDUs).
• Uses cue phrases (discourse markers) and a notion of semantic similarity to construct rhetorical relations.
• Rhetorical relations can be assembled into rhetorical structure trees (RS-trees) by recursively applying individual relations across the whole text.

68

Rhetorical parsing (Marcu,97)

[With its distant orbit {– 50 percent farther from the sun than Earth –} and slim atmospheric blanket,1] [Mars experiences frigid weather conditions.2] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles.3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,4] [but any liquid water formed that way would evaporate almost instantly5] [because of the low atmospheric pressure.6]

[Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,7] [most Martian weather involves blowing dust or carbon dioxide.8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.9] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water.10]

69

Rhetorical parsing

Use discourse markers to hypothesize rhetorical relations:
rhet_rel(CONTRAST, 4, 5), rhet_rel(CONTRAST, 4, 6), rhet_rel(EXAMPLE, 9, [7,8]), rhet_rel(EXAMPLE, 10, [7,8])

Use semantic similarity to hypothesize rhetorical relations:
if similar(u1, u2) then rhet_rel(ELABORATION, u2, u1) and rhet_rel(BACKGROUND, u1, u2), else rhet_rel(JOIN, u1, u2)
e.g., rhet_rel(JOIN, 3, [1,2]), rhet_rel(ELABORATION, [4,6], [1,2])

Use the hypotheses to derive a valid discourse representation of the original text.

[Figure: the resulting rhetorical structure tree for the Mars text. Unit 2 (“Mars experiences frigid weather conditions”) is the top-level nucleus; units 1 and 3-10 attach through Elaboration, Background/Justification, Example, Contrast, Evidence, Cause, Concession, and Antithesis relations.]

71

Marcu 97

[Figure: the same discourse tree drawn schematically over units 1-10, showing the relations (Elaboration, Background/Justification, Contrast, Evidence, Cause, Example, Concession, Antithesis) and the promotion of nuclei, with unit 2 at the root.]

Summarization = selection of the most important units

2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6

72

Information Extraction Method
Idea: content selection using templates.
• Predefine a template whose slots specify what is of interest.
• Use a canonical IE system to extract from a (set of) document(s) the relevant information; fill the template.
• Generate the content of the template as the summary.

Previous IE work:
• FRUMP (DeJong, 78): ‘sketchy scripts’ of terrorism, natural disasters, political visits...
• (Mauldin, 91): templates for conceptual IR.
• (Rau and Jacobs, 91): templates for business.
• (McKeown and Radev, 95): templates for news.

73

Information Extraction method

Example template:

MESSAGE:ID          TSL-COL-0001
SECSOURCE:SOURCE    Reuters
SECSOURCE:DATE      26 Feb 93, early afternoon
INCIDENT:DATE       26 Feb 93
INCIDENT:LOCATION   World Trade Center
INCIDENT:TYPE       Bombing
HUM TGT:NUMBER      AT LEAST 5

74

Full Generation Example

Challenge: pack content densely!

Example (McKeown and Radev, 95):
• Traverse templates and assign values to ‘realization switches’ that control local choices such as tense and voice.
• Map modified templates into a representation of Functional Descriptions (the input representation to Columbia's NL generation system FUF).
• FUF maps Functional Descriptions into English.

75

Generation Example (McKeown and Radev, 95)

NICOSIA, Cyprus (AP) – Two bombs exploded near government ministries in Baghdad, but there was no immediate word of any casualties, Iraqi dissidents reported Friday. There was no independent confirmation of the claims by the Iraqi National Congress. Iraq's state-controlled media have not mentioned any bombings.

Multiple sources and disagreement

Explicit mentioning of “no information”.

Graph-based Methods

Degree Centrality, LexRank, Continuous LexRank

Degree Centrality

Problem formulation:
• Represent each sentence by a vector.
• Denote each sentence as a node of a graph.
• Cosine similarity determines the edges between nodes.

Degree Centrality

Since we are interested in significant similarities, we can eliminate some low values in this matrix by defining a threshold.

Degree Centrality

Compute the degree of each sentence

Pick the nodes (sentences) with high degrees
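A minimal Python sketch of degree-centrality ranking (assumed implementation; raw term counts stand in for the tf-idf sentence vectors, and the threshold value is illustrative):

import math
from collections import Counter

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def degree_centrality(sentences, threshold=0.1):
    """sentences: list of token lists; returns the degree of each sentence node."""
    vectors = [Counter(tokens) for tokens in sentences]
    degrees = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) > threshold:  # keep only significant edges
                degrees[i] += 1
                degrees[j] += 1
    return degrees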

Degree Centrality

Disadvantage of the degree centrality approach: every edge counts equally, so a group of mutually similar but unimportant sentences can score as high as genuinely central ones; the centrality of a sentence's neighbours is ignored.

LexRank
Centrality vector p, which gives a LexRank value for each sentence (similar to PageRank), defined by p = B^T p, i.e., p is the stationary distribution of the random walk on the sentence similarity graph.

An irreducible and aperiodic Markov chain is guaranteed to converge to a stationary distribution

What Should B Satisfy?

• Stochastic matrix (Markov chain property).
• Irreducible: any state is reachable from any other state.
• Aperiodic: returns to a state are not restricted to multiples of some fixed period greater than 1.

If a Markov chain has reducible or periodic components, a random walker may get stuck in these components and never visit the other parts of the graph.

Reducibility

Aperiodicity

LexRank

LexRank
• B is a stochastic matrix. Is it an irreducible and aperiodic matrix?
• Damping (Page et al. 1998) makes it so.
• Matrix form of p with damping.
• Solve for p using the power method.
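A minimal Python sketch of the dampened power-method computation (assumed implementation following the PageRank-style damping cited above; the damping factor and similarity threshold are illustrative):

import numpy as np

def lexrank(similarity, threshold=0.1, damping=0.15, tol=1e-6, max_iter=100):
    """similarity: (n, n) cosine-similarity matrix between sentences."""
    n = similarity.shape[0]
    adj = (similarity > threshold).astype(float)    # thresholded sentence graph
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    B = adj / row_sums                              # row-stochastic matrix
    M = damping / n + (1 - damping) * B             # dampened transition matrix
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = M.T @ p                            # power-method step
        if np.linalg.norm(p_next - p, 1) < tol:
            p = p_next
            break
        p = p_next
    return p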

Continuous LexRank

Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction

Authors: Xiaojun Wan, Jianwu Yang, Jianguo Xiao

Summarization & keyphrase extraction
• Sentence-sentence relation graph
• Word-word relation graph

3 kinds of relations

SS-Relation: U

WW-Relation: V

SW-Relation: W

Build Sentence-Sentence Graph

Sentence relations: U = [U_ij]_{n×n}
Each sentence is a term vector s_i = (wt_1, wt_2, ...), with

wt_t = tf_t · isf_t,   isf_t = 1 + log(N / n_t)

(N: total number of sentences; n_t: number of sentences containing term t), and

U_ij = sim(s_i, s_j) if i ≠ j;  U_ij = 0 if i = j

Build Word-Word Graph

Word relations: V = [V_ij]_{n×n}
Word similarity computation: based on a dictionary (WordNet) or on a corpus (mutual information).

V_ij = sim(t_i, t_j) if i ≠ j;  V_ij = 0 if i = j

Build Sentence-Word Graph

Relation between the sentences S = {s_i | 1 ≤ i ≤ m} and the words T = {t_j | 1 ≤ j ≤ n}.
Affinity computation:

aff(s_i, t_j) = (tf_{t_j} · isf_{t_j}) / Σ_{t ∈ s_i} (tf_t · isf_t)

W = [W_ij]_{m×n} with W_ij = aff(s_i, t_j)

Document Model
Assumption 1: If a sentence is important, its closely connected sentences are also important; if a word is important, its closely related words are also important.
Assumption 2: The more important words a sentence includes, the more important the sentence is; the more frequently a word occurs in important sentences, the more important the word is.

Reinforcement Algorithm

The assumptions become mutually recursive score equations:

u(s_i) ∝ α Σ_j U_ji · u(s_j) + β Σ_j W_ij · v(t_j)
v(t_j) ∝ α Σ_i V_ij · v(t_i) + β Σ_i W_ij · u(s_i)

In matrix form:

u = α U^T u + β W v
v = α V^T v + β W^T u

Then we can simultaneously rank sentences (u) and words (v).
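A minimal Python sketch of the resulting iteration (an assumed implementation; α, β, the normalization step and the convergence test are illustrative choices, and the matrices are expected to be suitably normalized):

import numpy as np

def reinforce(U, V, W, alpha=0.6, beta=0.4, tol=1e-6, max_iter=200):
    """U: (m, m) sentence graph, V: (n, n) word graph, W: (m, n) sentence-word graph."""
    m, n = W.shape
    u = np.full(m, 1.0 / m)
    v = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        u_new = alpha * U.T @ u + beta * W @ v       # sentences reinforced by sentences and words
        v_new = alpha * V.T @ v + beta * W.T @ u     # words reinforced by words and sentences
        u_new /= np.linalg.norm(u_new, 1) or 1.0
        v_new /= np.linalg.norm(v_new, 1) or 1.0
        converged = np.linalg.norm(u_new - u, 1) + np.linalg.norm(v_new - v, 1) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v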

98

Table of contents

1. Motivation

2. Definition, genres and types of summaries.

3. Summarization techniques

4. Building a summarization system.

5. Evaluating summaries

6. The future.

Document Understanding Conference (DUC)

Run by NIST (National Institute of Standards and Technology).

Summarization tasks:
• 2001-2002: 100-word, query-independent single-doc. and multi-doc. summarization
• 2003: query-dependent summarization
• 2004: multi-lingual summarization
• 2005-2007: query-focused multi-document summarization

Query-focused Multi-document Summarization
• Task description
• System implementation: feature-driven system design; improvements based on machine learning; post-processing

Task Description

• Combination of QA and summarization.
• Input: a topic query and 25 related documents. Output: a summary of no more than 250 words.
• News documents from the Associated Press, New York Times and Xinhua News Agency, related to recent important events.
• Evaluation using ROUGE and Pyramid. Each topic has four human summaries.

DUC Corpus example(1)

Each topic is composed of a topic query and 25 related documents. Query sample:

<topic>

<num> D0601A </num>

<title> Native American Reservation System - pros and cons </title>

<narr>

Discuss conditions on American Indian reservations or among Native American communities. Include the benefits and drawbacks of the reservation system. Include legal privileges and problems.

</narr>

</topic>

DUC corpus example(2)

<DOC>

<DOCNO> APW20000416.0024 </DOCNO>

<DOCTYPE> NEWS STORY </DOCTYPE>

<DATE_TIME> 2000-04-16 12:56 </DATE_TIME>

<HEADLINE> Clinton To Visit Navajo Community </HEADLINE>

By CHRIS ROBERTS, Associated Press Writer

<TEXT>

<P>ALBUQUERQUE, N.M. (AP) -- At Mesa Elementary School in far northwest New Mexico, Navajo children line up to use the few computers connected to the Internet. Their time online must be short for everyone to get a chance.

</P>

</P>

… …

… …

<P>

Navajo Nation: http://www.navajo.org/nnhomepg.html

</P>

</TEXT>

</DOC>

Document sample :

Summarization process
• Preprocessing: parse the documents and extract the required content.
• Feature design and sentence scoring: extract the features (position, length, ...) for each sentence; score each sentence based on the features.
• Postprocessing: sentence simplification, redundancy removal, reordering.

Preprocessing

Input: documents in XML format.
1) Parse the XML format and extract the text in the documents.
2) Rule-based sentence segmentation.
3) POS tagging, syntactic parsing.
Output: segmented sentences.

Preprocessing Sample

<?xml version='1.0' encoding='UTF-8'?>

<!DOCTYPE DOCSENT SYSTEM "/home/lsjian/mead/dtd/docsent.dtd">

<DOCSENT DID='APW20000416.0024' DOCNO='APW20000416.0024' LANG='ENG' CORR-DOC='FT.c'>

<BODY>

<HEADLINE>

<S PAR='1' RSNT='1' SNO='1'> Clinton To Visit Navajo Community </S>

</HEADLINE>

<TEXT>

<S PAR='2' RSNT='1' SNO='2'> ALBUQUERQUE, N.M. </S>

<S PAR='2' RSNT='2' SNO='3'> At Mesa Elementary School in far northwest New Mexico, Navajo children line up to use the few computers connected to the Internet. </S>

… …

<S PAR='2' RSNT='27' SNO='28'> Clinton's plans Monday including speaking at Shiprock Boys/Girls Club, then joining an evening Webcast at Dine Tribal College in Shiprock that will involve high school student online at Lake Valley Navajo School, about 55 miles away. ------ On the Net: Navajo Nation: http://www.navajo.org/nnhomepg.html </S>

</TEXT></BODY></DOCSENT>

Sentence Scoring
• Feature design: lexical, syntactic, semantic; query-dependent and query-independent.
• Linear combination of features: a linear function of weighted features, S = a1*f1 + a2*f2 + ...
• Weights set by hand, from experience and trial and error.

Features: query-dependent

Describe the overlap between each sentence and the topic query (the QA side). Features:
• Frequency based: word overlap between the sentence and the query.
• Named-entity based: overlap of named entities between the sentence and the query.
• WordNet based: similarity between the sentence and the query computed over WordNet.

Features: query-independent

Describe the importance of each sentence in the documents (the summarization side). Features:
• Centroid: sum of word centroid values
• Tf*isf: sum of word tf*isf values
• Named entities: number of named entities
• Stop words: number of stop words
• Position
• Length: number of words in the sentence

Weights Learning with a regression model
• Defect of hand-set weights: subjectivity.
• Solution: learn the weights. The scoring process can be seen as a mapping between a feature vector and a score; a regression model is used to train the feature weights.

Sentence Scoring based on a regression model

Process:
• A training corpus {(V(s), score(s)) | s ∈ D} is used to train the regression function f0: V(s) → score(s).
• Rank the test sentences by the predicted score: score*(s*) = f0(V(s*)).

Keys:
• Model selection: Support Vector Regression.
• Acquisition of the training corpus: use human summaries.
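A minimal Python sketch of this regression step using scikit-learn's SVR (an assumed implementation consistent with the Support Vector Regression mentioned above; feature extraction and the derivation of gold scores from the human summaries are outside the sketch):

import numpy as np
from sklearn.svm import SVR

def train_scorer(train_features, train_scores):
    """train_features: (num_sentences, num_features) array; train_scores: gold scores
    derived from human summaries (e.g., an overlap score per sentence)."""
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(np.asarray(train_features), np.asarray(train_scores))
    return model

def rank_sentences(model, test_features):
    """Return sentence indices sorted by predicted score, best first."""
    scores = model.predict(np.asarray(test_features))
    return list(np.argsort(-scores))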

Postprocessing

Simple processing: extract the highest-scored sentences until the length reaches the requirement.

Problems:
1: redundancy
2: meaningless words in sentences (handled by rules)
3: coherence

Sentence simplification

Delete meaningless words in sentences: news-specific noisy words; content-irrelevant words.

Rule-based method:
• The beginning of a news story, e.g., “ALBUQUERQUE, N.M. (AP)”;
• Initial words in the sentence, such as “and”, “also”, “besides”, “though”, “in addition”, “somebody said”, “somebody says”;
• “somebody (pronoun)/It is said/reported/noticed/thought that”;
• Parenthesized content in capitalized letters ...

Sentence ordering

Ordering sentences by score: no logic in the content.

Temporal sentence ordering:
• Acquire the time stamp from the original texts.
• Order sentences according to the publication time of the documents; for sentences in the same document, order them by their position of occurrence in the document.

115

Table of contents

1. Motivation.

2. Genres and types of summaries.

3. Approaches and paradigms.

4. Summarization techniques

5. Evaluating summaries.

6. The future.

116

How can You Evaluate a Summary?

When you generate a summary:
1. obtain the gold standard summaries (human),
2. choose a granularity (clause; sentence; paragraph),
3. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match),
4. measure the similarity of each unit in the new summary to the most similar unit(s) in the gold standard,
5. measure Recall and Precision.

e.g., (Kupiec et al., 95).

Human summaries

Two persons A and B extract sentences from documents as summaries. What is their agreement? Measure it with the Kappa value.

One interpretation scale: ≥ 0.75 excellent; 0.40 to 0.75 fair to good; < 0.40 poor.
A finer scale: 0 to 0.20 slight; > 0.20 to 0.40 fair; > 0.40 to 0.60 moderate; > 0.60 to 0.80 substantial; > 0.80 almost perfect.

117

Kappa

P(A): the observed agreement among raters.
P(E): the expected probability of chance agreement.

κ = (P(A) − P(E)) / (1 − P(E))

Contingency table (sentence selections):

           B: Yes   B: No
A: Yes       20       5
A: No        10      15

119

P(A) = (20 + 15) / 50 = 0.70
P(E) = 0.5 × 0.6 + 0.5 × 0.4 = 0.3 + 0.2 = 0.5
κ = (P(A) − P(E)) / (1 − P(E)) = (0.70 − 0.50) / (1 − 0.50) = 0.40
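A minimal Python sketch reproducing this computation (an assumed helper, not part of any evaluation toolkit):

def kappa(yes_yes, yes_no, no_yes, no_no):
    total = yes_yes + yes_no + no_yes + no_no
    p_a = (yes_yes + no_no) / total                      # observed agreement
    a_yes = (yes_yes + yes_no) / total                   # P(A says yes)
    b_yes = (yes_yes + no_yes) / total                   # P(B says yes)
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)      # chance agreement
    return (p_a - p_e) / (1 - p_e)

print(kappa(20, 5, 10, 15))  # 0.4 for the contingency table above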

Evaluation

Manual:
• Linguistic quality (readability): grammaticality, non-redundancy, referential clarity, focus, structure; five-point scale (1 very poor, 5 very good)
• Responsiveness (content)
• Pyramid: SCUs

Automatic:
• ROUGE: ROUGE-2, ROUGE-SU4
• BE (basic elements)

Pyramid (1)

• The pyramid method is designed to address the observation that summaries from different humans always have partly overlapping content.
• It includes a manual annotation method to represent Summary Content Units (SCUs) and to quantify the proportion of model summaries that express this content.
• Each SCU has a weight representing the number of models it occurs in, from 1 to maxn, where maxn is the total number of models. There are very few SCUs expressed in all models (i.e., weight = maxn), and increasingly many SCUs at each lower weight, with the most SCUs at weight = 1.

[Figure: pyramid of SCU tiers, weight 1 at the bottom up to the highest weight at the top]

SCU example

122

Pyramid (2)

The approach involves two phases of manual annotation:
• pyramid construction;
• annotation against the pyramid to determine which SCUs in the pyramid have been expressed in the peer summary.

The total weight of a peer summary is the sum of the weights of the SCUs it expresses; the pyramid score normalizes this by the maximum total weight achievable by a summary with the same number of SCUs.

ROUGE basics

ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
• Recall-oriented, within-sentence word overlap with the model summary (or summaries).
• Models: no theoretical limit to their number; system output was compared to 4 models, manual summaries to 3 models.
• Uses n-grams; correlates reasonably with human coverage judgements.
• Does not address summary discourse characteristics and suffers from lack of text cohesion or coherence.

ROUGE v1.2.1 measures:
• ROUGE-1, 2, 3, 4: n-gram matching where n = 1, 2, 3, 4
• ROUGE-LCS: longest common subsequence

ROUGE: n-gram co-occurrence metrics measuring content overlap:

ROUGE-n = (count of n-gram overlaps between the candidate and the model summaries) / (total n-grams in the model summaries)

ROUGE

ROUGE-n = Σ_{j=1..h} Σ_{i ∈ N_n} min(X_n(i), M_n(i,j)) / Σ_{j=1..h} Σ_{i ∈ N_n} M_n(i,j)

where N_n represents the set of all n-grams and i is one member of N_n; X_n(i) is the number of times the n-gram i occurs in the candidate summary; M_n(i,j) is the number of times the n-gram i occurs in the j-th model (human) reference summary; and there are h human summaries in total.

Example

peer: A B C D E F G A B
H1: A B G A D E C D (7 bigrams)
H2: A C E F G A D (6 bigrams)

How to compute the ROUGE-2 score?
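A minimal Python sketch of the ROUGE-n formula applied to this example (an assumed re-implementation, not the official ROUGE toolkit); it yields 7 matching bigram counts over 13 model bigrams, i.e. 7/13 ≈ 0.54:

from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(peer, models, n=2):
    x = ngrams(peer, n)
    num = den = 0
    for model in models:
        m = ngrams(model, n)
        num += sum(min(x[g], c) for g, c in m.items())  # clipped n-gram matches
        den += sum(m.values())                          # total model n-grams
    return num / den

peer = "A B C D E F G A B".split()
h1 = "A B G A D E C D".split()
h2 = "A C E F G A D".split()
print(rouge_n(peer, [h1, h2], n=2))  # 7/13 ≈ 0.538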

128

129

Table of contents

1. Motivation.

2. Genres and types of summaries.

3. Approaches and paradigms.

4. Summarization methods .

5. Evaluating summaries.

6. The future.

In the last decade there has been a surge of interest in automatic summarizing. … There has been some progress, but there is much to do.

Karen Sparck Jones

Karen Sparck Jones, Automatic summarising: the state of the art, Information Processing & Management, 43(6): 1449-1481, 2007.

131

The Future (1): There's much to do!

Data preparation:
• Collect large sets of texts with abstracts, in all genres.
• Build large corpora of <Text, Abstract, Extract> tuples.
• Investigate relationships between extracts and abstracts (using <Extract, Abstract> tuples).

Topic identification:
• Develop new identification methods (discourse, etc.).
• Develop heuristics for method combination (train heuristics on <Text, Extract> tuples).

132

The Future (2)

Concept Interpretation (Fusion):
• Investigate types of fusion (semantic, evaluative...).
• Create large collections of fusion knowledge/rules.
• Study incorporation of the user's knowledge in interpretation.

Generation:
• Develop sentence generation rules (using <Extract, Abstract> pairs).

Evaluation:
• Develop better automatic evaluation metrics.

Reference

Karen Sparck Jones, Automatic summarising: the state of the art, Information Processing & Management, 43(6): 1449-1481, 2007.

Lin C. Y. , Hovy E., Manual and automatic evaluation of summaries, Proceedings of the Workshop on Automatic Summarization (including DUC 2002), Philadelphia, July 2002, pp. 45-51. Association for Computational Linguistics.

http://haydn.isi.edu/ROUGE/

Passonneau, R.J. et al. (2005), Applying the Pyramid method in DUC 2005, DUC 2005, 25-32.

Jaime Carbonell, Jade Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proceedings of the 21st ACM-SIGIR International Conference on Research and Development in Information Retrieval, Melbourne, Australia, 1998.

Radev, D.R., Jing, H. and Budzikowska, M. (2000) ‘Centroid-based summarisation of multiple documents: sentence extraction, utility-based evaluation, and user studies’, ANLP/NAACL-00, 2000, 21-30.

J.M. Conroy, et. al., Topic-focused multi-document summarization using an approximate oracle score, ACL 2006.

Paice, C.D. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management 26(1): 171–186.

Barzilay, R. and M. Elhadad. 1997. Using Lexical Chains for Text Summarization. In Proceedings of the Workshop on Intelligent Scalable Text Summarization at the ACL/EACL Conference, 10–17. Madrid, Spain.

Edmundson, H.P. 1969. New Methods in Automatic Extracting. Journal of the ACM 16(2), 264-285.

Hovy, E.H. and Lin, C-Y. 1998. Automated Text Summarization in SUMMARIST. In M. Maybury and I. Mani (eds), Intelligent Scalable Text Summarization. Forthcoming.

Kupiec, J., J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. In Proceedings of the Eighteenth Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 68–73. Seattle, WA.

Luhn, H.P. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 159-165.

Lin, C-Y. 1995. Topic Identification by Concept Generalization. In Proceedings of the Thirty-third Conference of the Association of Computational Linguistics (ACL-95), 308–310. Boston, MA.

Mann, W.C. and S.A. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8(3), 243–281. Also available as USC/Information Sciences Institute Research Report RR-87-190

134

Marcu, D. 1998. Improving Summarization Through Rhetorical Parsing Tuning. Proceedings of the Workshop on Very Large Corpora. Montreal, Canada.

McKeown, K.R. and D.R. Radev. 1995. Generating Summaries of Multiple News Articles. In Proceedings of the Eighteenth Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 74–82. Seattle, WA.

Salton, G., J. Allen, C. Buckley, and A. Singhal. 1994. Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. Science 264: 1421–1426.

Schank, R.C. and R.P. Abelson. 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.

DeJong, G. 1978. Fast Skimming of News Stories: The FRUMP System. Ph.D. diss. Yale University.

135

Homework !

Thanks & QA