I256: Applied Natural Language Processing
Marti Hearst
Oct 2, 2006

From lecture notes by Nachum Dershowitz & Dan Cohen
Contents
– Introduction and Applications
– Types of summarization tasks
– Basic paradigms
– Single document summarization
– Evaluation methods
Introduction
The problem – information overload:
– 4 billion URLs indexed by Google
– 200 TB of data on the Web [Lyman and Varian 03]
– Enormous amounts of new information are created every day
One solution – summarization. Abstracts promote current awareness and:
– save reading time
– facilitate selection
– facilitate literature searches
– aid in the preparation of reviews
But what is an abstract?
abstract: brief but accurate representation of the contents of a document
goal: take an information source, extract the most important content from it and present it to the user in a condensed form and in a manner sensitive to the user’s needs.
compression: the ratio of the length of the summary to the length of the source (e.g., a 100-word summary of a 1,000-word article is 10% compression).
History

– The problem has been addressed since the 50s [Luhn 58]
– Numerous methods are currently being suggested
– Most methods still rely on algorithms from the 50s-70s
– The problem is still hard, yet there are some applications: MS Word, www.newsinessence.com by Drago Radev's research group
Applications
– Abstracts for scientific and other articles
– News summarization (mostly multiple-document summarization)
– Classification of articles and other written data
– Web pages for search engines
– Web access from PDAs and cell phones
– Question answering and data gathering
Types of Summaries
Indicative vs. informative
– Informative: a substitute for the entire document
– Indicative: gives an idea of what is there

Background
– Does the reader have the needed prior knowledge?
– Expert reader vs. novice reader

Query-based or general
– Query-based: a form is being filled in; specific questions should be answered
– General: general-purpose summarization
Types of Summaries (input)
– Single document vs. multiple documents
– Domain-specific (chemistry) or general
– Genre-specific (newspaper items) or general
Types of Summaries (output)
Extract vs. abstract
– Extracts: representative paragraphs/sentences/phrases/words – fragments of the original text
– Abstracts: a concise summary of the central subjects in the document
– Research shows that sometimes readers prefer extracts!
– Language chosen for the summary
– Format of the resulting summary (table/paragraph/keywords)
Methods

– Quantitative heuristics, manually scored
– Machine-learning-based statistical scoring methods
– Higher semantic/syntactic structures
– Network (graph) based methods
– Other methods (rhetorical analysis, lexical chains, co-reference chains)
– AI methods
Quantitative Heuristics
General method: score each entity (sentence, word); combine scores; choose the best sentence(s)

Scoring techniques:
– Word frequencies throughout the text (Luhn 58)
– Position in the text (Edmundson 69; Lin & Hovy 97)
– Title method (Edmundson 69)
– Cue phrases in sentences (Edmundson 69)
Using Word Frequencies (Luhn 58)
The very first work in automated summarization. Assumptions:
– Frequent words indicate the topic
– "Frequent" means relative to the corpus frequency
– Clusters of frequent words indicate a summarizing sentence
– Stemming is approximated by grouping words with similar prefix characters
– Very common words and very rare words are ignored
Word frequencies (Luhn 58)

– Find consecutive sequences of high-weight keywords
– Allow a certain number of gaps of low-weight terms
– Sentences with the highest sum of cluster weights are chosen
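A minimal Python sketch of Luhn-style cluster scoring (not code from the lecture); MAX_GAP, TOP_FRACTION, and the tiny STOPWORDS set are illustrative parameters, and tokenization is a simple regex:

```python
from collections import Counter
import re

MAX_GAP = 4          # allowed run of low-weight words inside a cluster (illustrative)
TOP_FRACTION = 0.1   # fraction of distinct words treated as significant (illustrative)
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it"}

def luhn_score(sentence_words, significant):
    """Score a sentence by its best keyword cluster: significant_count^2 / cluster_length."""
    positions = [i for i, w in enumerate(sentence_words) if w in significant]
    if not positions:
        return 0.0
    best, start, count = 0.0, positions[0], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= MAX_GAP:   # still inside the current cluster
            count += 1
        else:                       # close the cluster and start a new one
            best = max(best, count * count / (prev - start + 1))
            start, count = cur, 1
    return max(best, count * count / (positions[-1] - start + 1))

def summarize(text, n_sentences=3):
    """Pick the n highest-scoring sentences, returned in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freqs = Counter(w for ws in tokenized for w in ws if w not in STOPWORDS)
    cutoff = max(1, int(len(freqs) * TOP_FRACTION))
    significant = {w for w, _ in freqs.most_common(cutoff)}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: luhn_score(tokenized[i], significant), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```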
Position in the text (Edmundson 69)

Claim: important sentences occur in specific positions
– "Lead-based" summary
– Inverse of position in the document works well for news
– Important information occurs in specific sections of the document (introduction/conclusion)
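A minimal sketch of inverse-position scoring for lead-based summaries; the 1/(i+1) weighting is one simple illustrative choice, not Edmundson's exact scheme:

```python
def position_scores(sentences):
    """Earlier sentences score higher – a 'lead-based' bias that suits news text."""
    return [1.0 / (i + 1) for i in range(len(sentences))]
```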
Title method (Edmundson 69)

Claim: the title of a document indicates its content
– Unless editors are being cute
– Not usually true for novels
– What about blogs?

Words in the title help find relevant content:
– Create a list of title words, remove stop words
– Use those as keywords to find important sentences (for example, with Luhn's method)
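A minimal sketch of the title method, assuming sentences are already tokenized and reusing a stopword set like the one in the Luhn sketch above:

```python
def title_scores(sentences_words, title_words, stopwords):
    """Score each tokenized sentence by how many distinct non-stop title words it contains."""
    keywords = {w for w in title_words if w not in stopwords}
    return [len(keywords & set(ws)) for ws in sentences_words]
```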
Cue phrases method (Edmundson 69)

Claim: important sentences contain cue words/indicative phrases
– "The main aim of the present paper is to describe…" (IND)
– "The purpose of this article is to review…" (IND)
– "In this report, we outline…" (IND)
– "Our investigation has shown that…" (INF)

Some words are considered bonus, others stigma:
– bonus: comparatives, superlatives, conclusive expressions, etc.
– stigma: negatives, pronouns, etc.
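A minimal sketch of cue scoring; the bonus and stigma word lists are tiny illustrative stand-ins for Edmundson's dictionaries:

```python
BONUS = {"significant", "greatest", "best", "conclude", "shows"}   # illustrative
STIGMA = {"hardly", "impossible", "they", "it", "these"}           # illustrative

def cue_scores(sentences_words):
    """Score each tokenized sentence: +1 per bonus word, -1 per stigma word."""
    return [sum(w in BONUS for w in ws) - sum(w in STIGMA for w in ws)
            for ws in sentences_words]
```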
Feature combination (Edmundson 69)

Linear contribution of 4 features: title, cue, keyword, position. The weights are adjusted using training data with any minimization technique:

Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)

Evaluated on a corpus of 200 chemistry articles:
– Length ranged from 100 to 3,900 words
– Judges were told to extract 25% of the sentences, maximizing coherence and minimizing redundancy

Features:
– Position (sensitive to types of headings for sections)
– Cue
– Title
– Keyword

Best results were obtained with: cue + title + position
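A minimal sketch of the linear combination, reusing the illustrative scorers from the sketches above; the default weights of 1.0 are placeholders that would be tuned on training data:

```python
def combined_scores(sentences_words, title_words, stopwords, significant,
                    w_title=1.0, w_cue=1.0, w_key=1.0, w_pos=1.0):
    """Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S), per sentence."""
    title = title_scores(sentences_words, title_words, stopwords)
    cue = cue_scores(sentences_words)
    keyword = [luhn_score(ws, significant) for ws in sentences_words]
    position = [1.0 / (i + 1) for i in range(len(sentences_words))]
    return [w_title * t + w_cue * c + w_key * k + w_pos * p
            for t, c, k, p in zip(title, cue, keyword, position)]
```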
Bayesian Classifier (Kupiec et al. 95)

Statistical learning method. Feature set:
– Sentence length: |S| > 5
– Fixed phrases: 26 manually chosen
– Paragraph: sentence position in the paragraph
– Thematic words: binary – whether the sentence contains thematic (frequent content) words
– Uppercase words: not common acronyms

Corpus: 188 document + summary pairs from scientific journals
Bayesian Classifier (Kupiec et al. 95)

Uses a Bayesian classifier:

P(s ∈ S | F_1, …, F_k) = P(F_1, …, F_k | s ∈ S) · P(s ∈ S) / P(F_1, …, F_k)

Assuming statistical independence of the features:

P(s ∈ S | F_1, …, F_k) = [∏_{j=1}^{k} P(F_j | s ∈ S)] · P(s ∈ S) / ∏_{j=1}^{k} P(F_j)

where s is a sentence, S is the set of sentences in the summary, and F_1, …, F_k are the feature values.
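A minimal sketch of scoring a sentence with this formula in log space, assuming binary features and probability tables already estimated from a corpus (all argument names are illustrative):

```python
from math import log

def kupiec_log_score(feature_values, p_f_given_in, p_f, p_in_summary):
    """log P(s in S | F_1..F_k) under the independence assumption.

    feature_values: dict feature name -> observed boolean value for this sentence
    p_f_given_in:   dict feature name -> P(F_j = True | s in S), from training data
    p_f:            dict feature name -> P(F_j = True), from training data
    p_in_summary:   P(s in S), the fraction of training sentences in summaries
    """
    score = log(p_in_summary)
    for name, value in feature_values.items():
        p_given = p_f_given_in[name] if value else 1 - p_f_given_in[name]
        p_any = p_f[name] if value else 1 - p_f[name]
        score += log(p_given) - log(p_any)
    return score
```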
Bayesian Classifier (Kupiec et al. 95)

– Each probability is estimated empirically from a corpus
– Higher-probability sentences are chosen for the summary
– Performance: for 25% summaries, 84% precision
Evaluation methods

When a manual summary is available:
1. Choose a granularity (clause; sentence; paragraph)
2. Create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match)
3. Measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary
4. Measure recall and precision

Otherwise:
1. Intrinsic – how good is the summary as a summary?
2. Extrinsic – how well does the summary help the user?
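A minimal sketch of step 4 at the sentence granularity with perfect-match similarity, one of the simplest combinations listed above:

```python
def extract_precision_recall(system_sents, manual_sents):
    """Precision and recall of a system extract against a manual extract (exact match)."""
    system, manual = set(system_sents), set(manual_sents)
    hits = len(system & manual)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall
```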
Intrinsic measures

(Glass-box): how good is the summary as a summary?
– Problem: how do you measure the goodness of a summary?
– Studies: compare to an ideal (Edmundson 69; Kupiec et al. 95; Salton et al. 97; Marcu 97) or supply criteria – fluency, informativeness, coverage, etc. (Brandow et al. 95)

The summary is evaluated on its own or by comparing it with the source:
– Is the text cohesive and coherent?
– Does it contain the main topics of the document?
– Are important topics omitted?
Extrinsic measures

(Black-box): how well does the summary help a user with a task?
– Problem: does summary quality correlate with task performance?
– Studies: GMAT tests (Morris et al. 92); news analysis (Miike et al. 94); IR (Mani and Bloedorn 97); text categorization (SUMMAC 98; Sundheim 98)

Evaluation in a specific task:
– Can the summary be used instead of the document?
– Can the document be classified by reading the summary?
– Can we answer questions by reading the summary?
The Document Understanding Conference (DUC)
This is really the text summarization competition
Started in 2001
Task and evaluation (for 2001-2004):
– Various target sizes were used (10-400 words)
– Both single- and multiple-document summaries were assessed
– Summaries were manually judged for both content and readability
– Each peer (human or automatic) summary was compared against a single model summary, using SEE (http://www.isi.edu/~cyl/SEE/), which estimates the percentage of information in the model that was covered in the peer
– ROUGE (Lin 04) was also used in 2004: Recall-Oriented Understudy for Gisting Evaluation – uses counts of n-gram overlap between the candidate and a gold-standard summary; assumes fixed-length summaries
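A minimal sketch of ROUGE-n recall against a single reference, assuming whitespace tokenization; the real ROUGE toolkit adds options such as stemming and multiple references:

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """ROUGE-n recall: clipped n-gram overlap divided by the reference n-gram count."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```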
The Document Understanding Conference (DUC)
Made a big change in 2005:
– An extrinsic evaluation was proposed but rejected (write a natural-disaster summary)
– Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions posed in a DUC topic
– Also indicated a desired granularity of information
The Document Understanding Conference (DUC)
Evaluation metrics for the new task:
– Grammaticality
– Non-redundancy
– Referential clarity
– Focus
– Structure and coherence
– Responsiveness (content-based evaluation)
This was a difficult task to do well in.
Let’s make a summarizer!
Each person (or pair) writes code for one small part of the problem, using Kupiec et al.'s method. We'll combine the parts in class, along the lines of the skeleton below.
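One possible skeleton for dividing up the work, assuming each person contributes one feature function with a shared signature; all names here are suggestions, not an assigned interface:

```python
def sentence_length_feature(sent_words):
    """Kupiec feature 1: True if the sentence has more than 5 words."""
    return len(sent_words) > 5

def assemble_features(sent_words, feature_fns):
    """Run every group's feature function on one tokenized sentence."""
    return {fn.__name__: fn(sent_words) for fn in feature_fns}

# In class we would plug each group's function into this list, estimate the
# probability tables from a training corpus, and rank sentences with
# kupiec_log_score() from the sketch above.
FEATURE_FNS = [sentence_length_feature]
```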