I256: Applied Natural Language Processing
Marti Hearst
Oct 2, 2006

From lecture notes by Nachum Dershowitz & Dan Cohen
Contents
– Introduction and Applications
– Types of summarization tasks
– Basic paradigms
– Single document summarization
– Evaluation methods
Introduction
The problem – information overload:
– 4 billion URLs indexed by Google
– 200 TB of data on the Web [Lyman and Varian 03]
– Enormous amounts of new information are created every day
One solution – summarization. Abstracts promote current awareness and:
– save reading time
– facilitate selection
– facilitate literature searches
– aid in the preparation of reviews
But what is an abstract?
abstract: brief but accurate representation of the contents of a document
goal: take an information source, extract the most important content from it and present it to the user in a condensed form and in a manner sensitive to the user’s needs.
compression: the ratio of the length of the summary to the length of the source (e.g., a 100-word summary of a 1,000-word article is 10% compression).
History

– The problem has been addressed since the 50s [Luhn 58]
– Numerous methods are currently being suggested
– Most methods still rely on algorithms from the 50s-70s
– The problem is still hard, yet there are some applications: MS Word, www.newsinessence.com by Drago Radev's research group
Applications
– Abstracts for scientific and other articles
– News summarization (mostly multiple-document summarization)
– Classification of articles and other written data
– Web pages for search engines
– Web access from PDAs and cell phones
– Question answering and data gathering
Types of Summaries
Indicative vs. informative
– Informative: a substitute for the entire document
– Indicative: gives an idea of what is there

Background
– Does the reader have the needed prior knowledge?
– Expert reader vs. novice reader

Query-based or general
– Query-based: a form is being filled in; specific questions should be answered
– General: general-purpose summarization
Types of Summaries (input)
– Single document vs. multiple documents
– Domain-specific (chemistry) or general
– Genre-specific (newspaper items) or general
Types of Summaries (output)
Extract vs. abstract
– Extracts: representative paragraphs/sentences/phrases/words – fragments of the original text
– Abstracts: a concise summary of the central subjects in the document
– Research shows that sometimes readers prefer extracts!
– Language chosen for the summary
– Format of the resulting summary (table/paragraph/keywords)
Methods

– Quantitative heuristics, manually scored
– Machine-learning-based statistical scoring methods
– Higher semantic/syntactic structures
– Network (graph) based methods
– Other methods (rhetorical analysis, lexical chains, co-reference chains)
– AI methods
Quantitative Heuristics
General method: score each entity (sentence, word); combine scores; choose the best sentence(s)

Scoring techniques:
– Word frequencies throughout the text (Luhn 58)
– Position in the text (Edmundson 69; Lin & Hovy 97)
– Title method (Edmundson 69)
– Cue phrases in sentences (Edmundson 69)
Using Word Frequencies (Luhn 58)
The very first work in automated summarization. Assumptions:
– Frequent words indicate the topic
– "Frequent" means relative to the corpus frequency
– Clusters of frequent words indicate a summarizing sentence
– Stemming is approximated by grouping words with similar prefix characters
– Very common words and very rare words are ignored
Word frequencies (Luhn 58)

– Find consecutive sequences of high-weight keywords
– Allow a certain number of gaps of low-weight terms
– Sentences with the highest sum of cluster weights are chosen
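A minimal Python sketch of Luhn-style cluster scoring (not code from the lecture); MAX_GAP, TOP_FRACTION, and the tiny STOPWORDS set are illustrative parameters, and tokenization is a simple regex:

```python
from collections import Counter
import re

MAX_GAP = 4          # allowed run of low-weight words inside a cluster (illustrative)
TOP_FRACTION = 0.1   # fraction of distinct words treated as significant (illustrative)
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it"}

def luhn_score(sentence_words, significant):
    """Score a sentence by its best keyword cluster: significant_count^2 / cluster_length."""
    positions = [i for i, w in enumerate(sentence_words) if w in significant]
    if not positions:
        return 0.0
    best, start, count = 0.0, positions[0], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= MAX_GAP:   # still inside the current cluster
            count += 1
        else:                       # close the cluster and start a new one
            best = max(best, count * count / (prev - start + 1))
            start, count = cur, 1
    return max(best, count * count / (positions[-1] - start + 1))

def summarize(text, n_sentences=3):
    """Pick the n highest-scoring sentences, returned in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freqs = Counter(w for ws in tokenized for w in ws if w not in STOPWORDS)
    cutoff = max(1, int(len(freqs) * TOP_FRACTION))
    significant = {w for w, _ in freqs.most_common(cutoff)}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: luhn_score(tokenized[i], significant), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```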
Position in the text (Edmundson 69)

Claim: important sentences occur in specific positions
– "Lead-based" summary
– Inverse of position in the document works well for news
– Important information occurs in specific sections of the document (introduction/conclusion)
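A minimal sketch of inverse-position scoring for lead-based summaries; the 1/(i+1) weighting is one simple illustrative choice, not Edmundson's exact scheme:

```python
def position_scores(sentences):
    """Earlier sentences score higher – a 'lead-based' bias that suits news text."""
    return [1.0 / (i + 1) for i in range(len(sentences))]
```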
Title method (Edmundson 69)

Claim: the title of a document indicates its content
– Unless editors are being cute
– Not usually true for novels
– What about blogs?

Words in the title help find relevant content:
– Create a list of title words, remove stop words
– Use those as keywords to find important sentences (for example, with Luhn's method)
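A minimal sketch of the title method, assuming sentences are already tokenized and reusing a stopword set like the one in the Luhn sketch above:

```python
def title_scores(sentences_words, title_words, stopwords):
    """Score each tokenized sentence by how many distinct non-stop title words it contains."""
    keywords = {w for w in title_words if w not in stopwords}
    return [len(keywords & set(ws)) for ws in sentences_words]
```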
Cue phrases method (Edmundson 69)

Claim: important sentences contain cue words/indicative phrases
– "The main aim of the present paper is to describe…" (IND)
– "The purpose of this article is to review…" (IND)
– "In this report, we outline…" (IND)
– "Our investigation has shown that…" (INF)

Some words are considered bonus, others stigma:
– bonus: comparatives, superlatives, conclusive expressions, etc.
– stigma: negatives, pronouns, etc.
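A minimal sketch of cue scoring; the bonus and stigma word lists are tiny illustrative stand-ins for Edmundson's dictionaries:

```python
BONUS = {"significant", "greatest", "best", "conclude", "shows"}   # illustrative
STIGMA = {"hardly", "impossible", "they", "it", "these"}           # illustrative

def cue_scores(sentences_words):
    """Score each tokenized sentence: +1 per bonus word, -1 per stigma word."""
    return [sum(w in BONUS for w in ws) - sum(w in STIGMA for w in ws)
            for ws in sentences_words]
```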
Feature combination (Edmundson 69)

Linear contribution of 4 features: title, cue, keyword, position. The weights are adjusted using training data with any minimization technique:

Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)

Evaluated on a corpus of 200 chemistry articles:
– Length ranged from 100 to 3,900 words
– Judges were told to extract 25% of the sentences, maximizing coherence and minimizing redundancy

Features:
– Position (sensitive to types of headings for sections)
– Cue
– Title
– Keyword

Best results were obtained with: cue + title + position
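A minimal sketch of the linear combination, reusing the illustrative scorers from the sketches above; the default weights of 1.0 are placeholders that would be tuned on training data:

```python
def combined_scores(sentences_words, title_words, stopwords, significant,
                    w_title=1.0, w_cue=1.0, w_key=1.0, w_pos=1.0):
    """Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S), per sentence."""
    title = title_scores(sentences_words, title_words, stopwords)
    cue = cue_scores(sentences_words)
    keyword = [luhn_score(ws, significant) for ws in sentences_words]
    position = [1.0 / (i + 1) for i in range(len(sentences_words))]
    return [w_title * t + w_cue * c + w_key * k + w_pos * p
            for t, c, k, p in zip(title, cue, keyword, position)]
```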
Bayesian Classifier (Kupiec et al. 95)

Statistical learning method. Feature set:
– Sentence length: |S| > 5
– Fixed phrases: 26 manually chosen
– Paragraph: sentence position in the paragraph
– Thematic words: binary – whether the sentence contains thematic (frequent content) words
– Uppercase words: not common acronyms

Corpus: 188 document + summary pairs from scientific journals
Bayesian Classifier (Kupiec et al. 95)

Uses a Bayesian classifier:

P(s ∈ S | F_1, …, F_k) = P(F_1, …, F_k | s ∈ S) · P(s ∈ S) / P(F_1, …, F_k)

Assuming statistical independence of the features:

P(s ∈ S | F_1, …, F_k) = [∏_{j=1}^{k} P(F_j | s ∈ S)] · P(s ∈ S) / ∏_{j=1}^{k} P(F_j)

where s is a sentence, S is the set of sentences in the summary, and F_1, …, F_k are the feature values.
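A minimal sketch of scoring a sentence with this formula in log space, assuming binary features and probability tables already estimated from a corpus (all argument names are illustrative):

```python
from math import log

def kupiec_log_score(feature_values, p_f_given_in, p_f, p_in_summary):
    """log P(s in S | F_1..F_k) under the independence assumption.

    feature_values: dict feature name -> observed boolean value for this sentence
    p_f_given_in:   dict feature name -> P(F_j = True | s in S), from training data
    p_f:            dict feature name -> P(F_j = True), from training data
    p_in_summary:   P(s in S), the fraction of training sentences in summaries
    """
    score = log(p_in_summary)
    for name, value in feature_values.items():
        p_given = p_f_given_in[name] if value else 1 - p_f_given_in[name]
        p_any = p_f[name] if value else 1 - p_f[name]
        score += log(p_given) - log(p_any)
    return score
```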
Bayesian Classifier (Kupiec et al. 95)

– Each probability is estimated empirically from a corpus
– Higher-probability sentences are chosen for the summary
– Performance: for 25% summaries, 84% precision
Evaluation methods

When a manual summary is available:
1. Choose a granularity (clause; sentence; paragraph)
2. Create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match)
3. Measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary
4. Measure recall and precision

Otherwise:
1. Intrinsic – how good is the summary as a summary?
2. Extrinsic – how well does the summary help the user?
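A minimal sketch of step 4 at the sentence granularity with perfect-match similarity, one of the simplest combinations listed above:

```python
def extract_precision_recall(system_sents, manual_sents):
    """Precision and recall of a system extract against a manual extract (exact match)."""
    system, manual = set(system_sents), set(manual_sents)
    hits = len(system & manual)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall
```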
Intrinsic measures

(Glass-box): how good is the summary as a summary?
– Problem: how do you measure the goodness of a summary?
– Studies: compare to an ideal (Edmundson 69; Kupiec et al. 95; Salton et al. 97; Marcu 97) or supply criteria – fluency, informativeness, coverage, etc. (Brandow et al. 95)

The summary is evaluated on its own or by comparing it with the source:
– Is the text cohesive and coherent?
– Does it contain the main topics of the document?
– Are important topics omitted?
Extrinsic measures

(Black-box): how well does the summary help a user with a task?
– Problem: does summary quality correlate with task performance?
– Studies: GMAT tests (Morris et al. 92); news analysis (Miike et al. 94); IR (Mani and Bloedorn 97); text categorization (SUMMAC 98; Sundheim 98)

Evaluation in a specific task:
– Can the summary be used instead of the document?
– Can the document be classified by reading the summary?
– Can we answer questions by reading the summary?
The Document Understanding Conference (DUC)
This is really the text summarization competition
Started in 2001
Task and evaluation (for 2001-2004):
– Various target sizes were used (10-400 words)
– Both single- and multiple-document summaries were assessed
– Summaries were manually judged for both content and readability
– Each peer (human or automatic) summary was compared against a single model summary, using SEE (http://www.isi.edu/~cyl/SEE/), which estimates the percentage of information in the model that was covered in the peer
– ROUGE (Lin 04) was also used in 2004: Recall-Oriented Understudy for Gisting Evaluation – uses counts of n-gram overlap between the candidate and a gold-standard summary; assumes fixed-length summaries
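A minimal sketch of ROUGE-n recall against a single reference, assuming whitespace tokenization; the real ROUGE toolkit adds options such as stemming and multiple references:

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """ROUGE-n recall: clipped n-gram overlap divided by the reference n-gram count."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```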
The Document Understanding Conference (DUC)
Made a big change in 2005:
– An extrinsic evaluation was proposed but rejected (write a natural-disaster summary)
– Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions posed in a DUC topic
– Also indicated a desired granularity of information
The Document Understanding Conference (DUC)
Evaluation metrics for the new task:
– Grammaticality
– Non-redundancy
– Referential clarity
– Focus
– Structure and coherence
– Responsiveness (content-based evaluation)
This was a difficult task to do well in.
Let’s make a summarizer!
Each person (or pair) writes code for one small part of the problem, using Kupiec et al.'s method. We'll combine the parts in class, along the lines of the skeleton below.
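One possible skeleton for dividing up the work, assuming each person contributes one feature function with a shared signature; all names here are suggestions, not an assigned interface:

```python
def sentence_length_feature(sent_words):
    """Kupiec feature 1: True if the sentence has more than 5 words."""
    return len(sent_words) > 5

def assemble_features(sent_words, feature_fns):
    """Run every group's feature function on one tokenized sentence."""
    return {fn.__name__: fn(sent_words) for fn in feature_fns}

# In class we would plug each group's function into this list, estimate the
# probability tables from a training corpus, and rank sentences with
# kupiec_log_score() from the sketch above.
FEATURE_FNS = [sentence_length_feature]
```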