Dissertation Defense
Multiple Alternative Sentence Compressions as a Tool for
Automatic Summarization Tasks
David Zajic
University of Maryland, College Park
Department of Computer Science
November 28, 2006
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government. The body of the editor, Misael Tamayo Hernández, of the daily El Despertar de la Costa, was found early Friday with his hands tied behind his back in a room at the Venus Motel…
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper editor found dead in Pacific resort city
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Paper ran articles about corruption in government
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Hernández, Zihuatanejo: Newspaper editor found dead in Pacific resort city
Intuition
• A newspaper editor was found dead in a hotel room in this Pacific resort city a day after his paper ran articles about organized crime and corruption in the city government.
• Newspaper Editor Killed in Mexico
  – (A) Newspaper Editor (was) killed in Mexico
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – Single Document and evaluation
    • HMM Hedge, Trimmer, Topiary
  – Extending to Multi-document Summarization
• Review of Evaluations
• Conclusion
• Future Work
Introduction
• Automatic Summarization
  – Distillation of important information from a source into an abridged form
  – Extractive Summarization: select sentences with important content from the document
  – Abstractive Summarization
  – Limitations
    • Sentences contain mixture of relevant, non-relevant information
    • Sentences partially redundant to rest of summary
Contributions
• Two implementations of select-words-in-order
  – Statistical Method: HMM Hedge (Headline Generation)
  – Syntactic Method: Trimmer
Contributions
• Multiple Alternative Sentence Compressions (MASC)
  – Framework for Automatic Text Summarization
  – Sentence Compression: rewriting a sentence in an abridged form
  – Generation of many compressions of source sentences to serve as candidates
  – Select from candidates using weighted features to generate summary
  – Environment for testing hypotheses
Hypotheses
• Extractive summarization systems can create better summaries using larger pool of compressed candidates
• Sentence selectors choose better summary candidates using larger sets of features
• For Headline Generation, combination of fluent text and topics better than either alone
Contributions
• Sentence Compression
  – HMM Hedge
  – Trimmer
  – Topiary
• Sentence Selection
  – Lead Sentence for Headline Generation
  – Maximal Marginal Relevance for Multi-document Summarization
Summarization Tasks
• Single Document Summarization
  – Very short: Headline Generation
  – Single sentence
  – 75 characters
  – DUC 2002, 2003, 2004
• Query-focused Multi-Document Summarization
  – Multiple sentences
  – 100–250 words
  – DUC 2005, 2006
Headline Generation
• Newspaper Headlines
  – Natural example of human summarization
  – Three criteria for a good headline:
    • Summarize a story
    • Make people want to read it
    • Fit in specified space
  – Headlinese: compressed form of English
Introduction
• Headline Types
  – Eye-Catcher: Under God Under Fire
  – Indicative: Pledge of Allegiance
  – Informative: U.S. Court Decides Pledge of Allegiance Unconstitutional
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
General Architecture
[Diagram: Document → Sentence Selection → Compression (HMM Hedge, Trimmer, Topiary) → Candidates → Candidate Selection (Maximal Marginal Relevance) → Summary]
Sentence Selection
• Select sentences to be compressed
• Lead sentence, or first 5 sentences
Sentence Compression
• Selecting words in order from a sentence
  – Or window of words
• Feasibility Studies (my work!)
  – Humans can almost always do this for written news
  – Bias for words from within a single sentence
  – Bias for words early in the document
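The select-words-in-order constraint is easy to check mechanically: a compression qualifies if its words form a subsequence of the source sentence's words. A minimal sketch, assuming whitespace tokenization and case-sensitive matching; `is_in_order_compression` is a hypothetical helper, not part of the described systems.

```python
def is_in_order_compression(compression: str, sentence: str) -> bool:
    """True if the words of `compression` occur in `sentence` in the same order."""
    remaining = iter(sentence.split())
    # `word in remaining` consumes the iterator up to the match, so each
    # later word must be found after the previous one: a subsequence test.
    return all(word in remaining for word in compression.split())


sentence = ("On Tuesday the President signed the controversial "
            "legislation at a private ceremony")
print(is_in_order_compression("President signed legislation", sentence))  # True
print(is_in_order_compression("legislation signed President", sentence))  # False
```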
Sentence Compression
• Human study showed potential for saving space by using sentence compression in multi-document summarization
• A subject was shown 103 sentences relevant to 39 queries and asked to make relevance judgments on 430 compressed versions
• Potential for a 16.7% reduction by word count and a 17.6% reduction by characters, with no loss of relevance
Candidate Selection
• Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998)
  – All candidates given scores: linear combination of static and dynamic features
  – Weights optimized for Rouge-1 recall, using BBN’s Optimizer
  – Add highest-scoring candidate to summary
    • Other compressions of source sentence removed from pool
  – Recalculate dynamic features, rescore candidates
  – Iterate until summary is complete
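The iterative selection loop can be sketched as follows. This is a simplified illustration, not the system's code: the `Candidate` fields, the example scores, and the single dynamic feature (word overlap with the summary so far, standing in for the real redundancy feature) are all hypothetical, and a candidate that no longer fits is simply dropped from the pool.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    source_id: int        # which source sentence this compresses
    static_score: float   # weighted sum of static features, precomputed


def overlap(text: str, summary: list[str]) -> float:
    """Hypothetical dynamic feature: share of the candidate's words already in the summary."""
    words = text.lower().split()
    seen = {w for s in summary for w in s.lower().split()}
    return sum(w in seen for w in words) / max(len(words), 1)


def mmr_select(candidates: list[Candidate], max_words: int,
               overlap_weight: float = -1.0) -> list[str]:
    pool = list(candidates)
    summary: list[str] = []
    length = 0
    while pool:
        # Rescore every remaining candidate with the current dynamic feature.
        best = max(pool, key=lambda c: c.static_score
                   + overlap_weight * overlap(c.text, summary))
        n = len(best.text.split())
        if length + n > max_words:
            pool.remove(best)  # does not fit in the remaining space; try the next best
            continue
        summary.append(best.text)
        length += n
        # Other compressions of the same source sentence leave the pool.
        pool = [c for c in pool if c.source_id != best.source_id]
    return summary


cands = [
    Candidate("newspaper editor found dead in resort city", 0, 2.0),
    Candidate("editor found dead", 0, 1.5),
    Candidate("paper ran articles about corruption", 1, 1.2),
    Candidate("editor found dead in hotel room", 2, 1.4),
]
print(mmr_select(cands, max_words=13))
# ['newspaper editor found dead in resort city', 'paper ran articles about corruption']
```

With `overlap_weight=0.0` the second pick becomes the hotel-room candidate (static score 1.4), even though most of its words repeat the first pick; the redundancy penalty is what steers selection toward new content.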
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
HMM Hedge Architecture: Single Compression
[Diagram: Document → Part of Speech Tagger (Verb Tags) → HMM Hedge (Headline Language Model, General Language Model) → Selection → Summary]
HMM Hedge: Noisy Channel Model
• Underlying method: select words in order
• Sentences are observed data
• Headlines are unobserved data
• Noisy channel adds words to headlines to create sentences
  – Headline: President signed legislation
  – Sentence: On Tuesday the President signed the controversial legislation at a private ceremony
HMM Hedge: Noisy Channel Model
• Probability of Headline estimated with bigram model of Headlinese
• Probability of observed Sentence given unobserved Headline (the channel model) estimated by unigram model of General English
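Under this model a candidate headline H for sentence S can be scored as log P(H) + log P(S | H): the headline words are scored by a Headlinese bigram model, and the words the channel added (those in S but not kept in H) are scored by a general-English unigram model. A toy sketch; the probabilities, the `FLOOR` value for unseen events, and the representation of H as kept word indices are all invented for illustration, and this is not HMM Hedge's actual parameterization.

```python
import math

# Toy models with made-up probabilities, for illustration only.
HEADLINE_BIGRAMS = {("<s>", "president"): 0.1, ("president", "signed"): 0.2,
                    ("signed", "legislation"): 0.3, ("legislation", "</s>"): 0.4}
GENERAL_UNIGRAMS = {"on": 0.02, "tuesday": 0.001, "the": 0.06,
                    "controversial": 0.0005, "at": 0.02, "a": 0.05,
                    "private": 0.0008, "ceremony": 0.0004}
FLOOR = 1e-6  # probability assigned to unseen events


def noisy_channel_score(sentence: list[str], keep: list[int]) -> float:
    """log P(H) + log P(S|H) for the headline made of the words at `keep`."""
    headline = [sentence[i] for i in keep]
    # Source model: bigram probability of the headline in Headlinese.
    tokens = ["<s>"] + headline + ["</s>"]
    log_p_h = sum(math.log(HEADLINE_BIGRAMS.get(bg, FLOOR))
                  for bg in zip(tokens, tokens[1:]))
    # Channel model: each dropped word was "added" by general English.
    kept = set(keep)
    log_p_s_given_h = sum(math.log(GENERAL_UNIGRAMS.get(w, FLOOR))
                          for i, w in enumerate(sentence) if i not in kept)
    return log_p_h + log_p_s_given_h


s = ("on tuesday the president signed the controversial "
     "legislation at a private ceremony").split()
good = noisy_channel_score(s, [3, 4, 7])   # president signed legislation
bad = noisy_channel_score(s, [1, 4, 7])    # tuesday signed legislation
print(good > bad)  # True
```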
HMM Hedge
• Viterbi decoding parameters to mimic headlines
  – Groups of contiguous words: clumpiness
  – Size of gaps between words: gappiness
  – Sentence position of words
  – Require verb
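Given the positions of the selected words, clumpiness and gappiness can be measured directly. A sketch of one plausible way to count them; the `large_gap` threshold separating small from large gaps is a hypothetical parameter, and HMM Hedge's exact definitions may differ.

```python
def selection_shape(positions: list[int], large_gap: int = 5) -> dict[str, int]:
    """Shape features of a sorted list of selected word positions."""
    clumps = 1 if positions else 0
    small_gaps = large_gaps = 0
    for prev, cur in zip(positions, positions[1:]):
        gap = cur - prev - 1            # number of skipped source words
        if gap == 0:
            continue                    # contiguous words stay in the same clump
        clumps += 1                     # any gap starts a new clump
        if gap >= large_gap:
            large_gaps += 1
        else:
            small_gaps += 1
    return {"clumps": clumps, "small_gaps": small_gaps,
            "large_gaps": large_gaps, "position_sum": sum(positions)}


# "President signed legislation" selected from positions 3, 4 and 7 of
# "On Tuesday the President signed the controversial legislation ..."
print(selection_shape([3, 4, 7]))
# {'clumps': 2, 'small_gaps': 1, 'large_gaps': 0, 'position_sum': 14}
```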
HMM Hedge
• Adaptation to multi-candidate compression
• For each document sentence, finds the 5 most likely headlines at each length from 5 to 15 words
Automatic Evaluation
• Recall-Oriented Understudy for Gisting Evaluation (Rouge)
  – Rouge recall: ratio of matching candidate n-gram count to reference n-gram count
  – Rouge precision: ratio of matching candidate n-gram count to candidate n-gram count times the number of references
  – R1 (Rouge-1) preferred for single document summarization
  – R2 (Rouge-2) preferred for multi-document summarization
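The two ratios can be computed as follows for one candidate against several references. A minimal sketch assuming lowercase whitespace tokenization and clipped n-gram matching; the official Rouge toolkit also supports stemming, stopword removal, and jackknifing, none of which is modeled here.

```python
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    toks = text.lower().split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge_n(candidate: str, references: list[str], n: int = 1) -> tuple[float, float]:
    """Return (recall, precision) following the definitions above."""
    cand = ngrams(candidate, n)
    matches = ref_total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # Clipped matches: Counter & Counter keeps the minimum of each count.
        matches += sum((cand & ref_counts).values())
        ref_total += sum(ref_counts.values())
    cand_total = sum(cand.values()) * len(references)
    recall = matches / ref_total if ref_total else 0.0
    precision = matches / cand_total if cand_total else 0.0
    return recall, precision


r, p = rouge_n("newspaper editor found dead",
               ["newspaper editor killed in mexico",
                "editor found dead in pacific resort city"])
print(round(r, 4), p)  # 0.4167 0.625
```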
HMM Hedge
[Chart: Rouge-1 scores (y-axis, 0.10 to 0.28) against N-best at each length (x-axis, 1 to 5); series: R1 recall and R1 precision for summaries of 1, 2, and 3 sentences]
HMM Hedge
• Linear combination scoring function; features shown as (default weight, optimized weight for fold A of 5-fold cross-validation)
  – Word position sum (-0.05, 1.72)
  – Small gaps (-0.01, 1.02)
  – Large gaps (-0.05, 3.70)
  – Clumps (-0.05, -0.17)
  – Sentence position (0, -945)
  – Length in words (1, 42)
  – Length in characters (1, 85)
  – Unigram probability of story words (1, 1.03)
  – Bigram probability of headline words (1, 1.51)
  – Emit probability of headline words (1, 3.60)
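The scoring function is a weighted sum of feature values. A minimal sketch using the fold A optimized weights listed above; the dictionary keys are hypothetical names for the listed features, the example feature values are invented, and sentence position is assumed to be a 0-based sentence index. How each feature is normalized in HMM Hedge is not shown on the slide.

```python
# Fold A optimized weights from the slide; the key names are hypothetical.
FOLD_A_WEIGHTS = {
    "word_position_sum": 1.72, "small_gaps": 1.02, "large_gaps": 3.70,
    "clumps": -0.17, "sentence_position": -945, "length_words": 42,
    "length_chars": 85, "unigram_story": 1.03, "bigram_headline": 1.51,
    "emit_headline": 3.60,
}


def linear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of feature values; features absent from `features` count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Invented values: two 8-word candidates differing only in sentence position.
first = {"sentence_position": 0, "length_words": 8}
second = {"sentence_position": 1, "length_words": 8}
print(linear_score(first, FOLD_A_WEIGHTS), linear_score(second, FOLD_A_WEIGHTS))
# 336.0 -609.0
```

Under this reading, the large negative sentence-position weight makes candidates from later sentences very hard to select, consistent with the lead-sentence bias noted in the feasibility studies.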
HMM Hedge
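| Fold | Default Weights: R1 Recall | Default Weights: R1 Prec. | Optimized Weights: R1 Recall | Optimized Weights: R1 Prec. |
|---|---|---|---|---|
| A | 0.11214 | 0.10726 | 0.24722 | 0.21482 |
| B | 0.11021 | 0.10231 | 0.24307 | 0.21425 |
| C | 0.11781 | 0.10811 | 0.24129 | 0.20795 |
| D | 0.11993 | 0.10660 | 0.16595 | 0.13454 |
| E | 0.11282 | 0.10003 | 0.25341 | 0.21775 |
| Avg | 0.11458 | 0.10486 | 0.23019 | 0.19786 |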
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
HMM Hedge: Multi-Document Summarization
[Diagram: Document → Part of Speech Tagger (Verb Tags) → HMM Hedge (Headline Language Model, General Language Model) → Candidates, HMM Features → URA (URA Index; Query, optional) → Candidates, HMM Features, URA Features → Selection (Feature Weights) → Summary]
Candidate Selection
• Static Features
  – Sentence Position
  – Relevance
  – Centrality
  – Compression-specific features
• Dynamic Features
  – Redundancy
  – Count of summary candidates from source document
Relevance and Centrality
• Universal Retrieval Architecture (URA)
  – Infrastructure for information retrieval tasks
• Four score components
  – Candidate Query Relevance: Matching score between candidate and query
  – Document Query Relevance: Lucene similarity score between document and query
  – Candidate Centrality: Average Lucene similarity of candidate to other sentences in document
  – Document Centrality: Average Lucene similarity of document to other documents in cluster
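Candidate Centrality averages a candidate's similarity to the other sentences of its document. The sketch below swaps Lucene's scoring for plain bag-of-words cosine similarity, purely as an illustrative assumption; Document Centrality would apply the same averaging to whole documents in a cluster.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for Lucene's scoring)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[w] for w, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def centrality(text: str, others: list[str]) -> float:
    """Average similarity of `text` to every item in `others`."""
    return sum(cosine(text, o) for o in others) / len(others) if others else 0.0


sentences = ["editor found dead in hotel room",
             "paper ran articles about corruption",
             "editor of paper found dead"]
print(round(centrality(sentences[0], sentences[1:]), 3))  # 0.274
```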
Redundancy: Intuition
• Consider a summary about earthquakes
• “Generated” by topic: Earthquake, seismic, Richter scale
• “Generated” by general language: Dog, under, during
• Sentences with many words “generated” by the topic are redundant
Redundancy: Formal
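$$\mathrm{redundancy}(S) = \prod_{s \in S} \left[ \lambda P(s \mid D) + (1 - \lambda) P(s \mid C) \right]$$
$$P(w \mid D) = \frac{\mathrm{count}(w, D)}{\mathrm{size}(D)} \qquad P(w \mid C) = \frac{\mathrm{count}(w, C)}{\mathrm{size}(C)}$$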
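The mixture model can be computed from word counts. A sketch assuming the product form over the candidate's words, that D is the text of the summary built so far and C a larger background corpus (both assumptions about the slide's symbols), and a hypothetical λ = 0.3. It returns the logarithm of the product, which ranks candidates identically but avoids floating-point underflow.

```python
import math
from collections import Counter


def redundancy_log(candidate: str, summary: str, corpus: str, lam: float = 0.3) -> float:
    """log of prod_{s in S} [lam * P(s|D) + (1 - lam) * P(s|C)].

    D (summary) and C (corpus) are bags of words; P(w|X) = count(w, X) / size(X).
    Higher (less negative) values mean the candidate's words look more like D.
    """
    d, c = Counter(summary.lower().split()), Counter(corpus.lower().split())
    d_size, c_size = sum(d.values()), sum(c.values())
    total = 0.0
    for w in candidate.lower().split():
        p_d = d[w] / d_size if d_size else 0.0
        p_c = c[w] / c_size if c_size else 0.0
        p = lam * p_d + (1 - lam) * p_c
        total += math.log(p) if p > 0 else float("-inf")  # word unseen in D and C
    return total


corpus = "the quake hit the city the dog slept under the table during the storm"
summary = "a quake hit the city"
r1 = redundancy_log("quake hit city", summary, corpus)     # topic words
r2 = redundancy_log("dog slept during", summary, corpus)   # general words
print(r1 > r2)  # True
```

Because the score is unnormalized, longer candidates accumulate more log terms; a length-normalized variant would be one possible adjustment.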
HMM Hedge Multi-doc
• Placeholder for results of HMM Hedge
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Trimmer
• Underlying method: select words in order
• Parse and Trim
• Rules come from study of Headlinese
  – Different distributions of syntactic structures

| Phenomenon | Headlines | Lead Sentences |
|---|---|---|
| Preposed adjunct | 0% | 2.7% |
| Time expression | 1.5% | 24% |
| Noun Phrase Relative Clause | 0.3% | 3.5% |
Trimmer: Mask operation
Trimmer: Mask Outside
Trimmer Single Document
[Diagram: Document → Parser (Parses) and Entity Tagger (Entity Tags) → Trimmer → Candidates, Trimmer Features → Selection → Summary]
Trimmer: Root S
• Select the lowest leftmost S which has NP and VP children, in that order.
  [S [S [NP Rebels] [VP agree to talks with government]] officials said Tuesday.]
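The Root S rule can be sketched over the bracketed notation used on these slides. This is one reading of "lowest leftmost", not Trimmer's implementation: it picks the deepest S having an NP child immediately followed by a VP child, breaking ties by leftmost position. The tiny `parse` helper and the (label, children) tuple representation are hypothetical choices for this illustration.

```python
import re


def parse(bracketed: str):
    """Parse '[LABEL child ...]' notation into (label, children) tuples; leaves are words."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", bracketed)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]          # token right after '['
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                         # consume ']'
        return (label, children)

    return node()


def leaves(tree) -> list[str]:
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


def root_s(tree):
    """Deepest S with an NP child immediately followed by a VP child; leftmost on ties."""
    found = []  # (depth, visit order, node)

    def visit(node, depth):
        label, children = node
        kids = [c[0] if isinstance(c, tuple) else None for c in children]
        if label == "S" and ("NP", "VP") in zip(kids, kids[1:]):
            found.append((depth, len(found), node))
        for c in children:
            if isinstance(c, tuple):
                visit(c, depth + 1)

    visit(tree, 0)
    if not found:
        return tree
    return min(found, key=lambda t: (-t[0], t[1]))[2]


tree = parse("[S [S [NP Rebels] [VP agree to talks with government]] "
             "officials said Tuesday.]")
print(" ".join(leaves(root_s(tree))))  # Rebels agree to talks with government
```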
Trimmer: Preposed Adjunct
• Remove [YP …] preceding first NP inside chosen S
[S [PP According to a now-finalized blueprint described by U.S. officials and other sources] [NP the Bush administration] [VP plans to take complete, unilateral control of a post-Saddam Hussein Iraq]]
Trimmer: Conjunction
• From [X] [CC] [X], remove either [CC] [X] (keeping the first conjunct) or [X] [CC] (keeping the second)
  [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
  [S A company offering blood cholesterol tests in grocery stores says [S [S medical technology has outpaced state laws,] [CC but] [S the state says the company doesn’t have the proper licenses.]]]
Trimmer
• Adaptation to multi-candidate compression
• Multi-candidate rules
  – Root S
  – Preamble
  – Conjunction
Trimmer
• Multi-candidate Root S
  – [S1 [S2 The latest flood crest, the eighth this summer, passed Chongqing in southwest China], and [S3 waters were rising in Yichang, in central China’s Hubei province, on the middle reaches of the Yangtze], state television reported Sunday.]
• Single-candidate version would choose only S2. Multi-candidate Root S generates all three choices.
Trimmer: Preamble Rule
Trimmer: Conjunction
• [S Illegal fireworks [VP [VP injured hundreds of people] [CC and] [VP started six fires.]]]
• Illegal fireworks injured hundreds of people
• Illegal fireworks started six fires
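The multi-candidate conjunction rule can be sketched over a parse tree written as nested (label, children) tuples, a hypothetical representation chosen for brevity. Every [X] [CC] [X] node yields the original plus one variant per conjunct; nested conjunctions combine, so the number of candidates can grow quickly. This is an illustrative sketch, not Trimmer's implementation.

```python
def leaves(node) -> list[str]:
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def conjunction_variants(node) -> list:
    """All versions of `node` in which each [X] [CC] [X] keeps both conjuncts,
    only the first, or only the second. Trees are (label, children) tuples."""
    if isinstance(node, str):
        return [node]
    label, children = node
    # Combine the variants of each child (cartesian product).
    combos = [[]]
    for child in children:
        combos = [c + [v] for c in combos for v in conjunction_variants(child)]
    out = []
    for kids in combos:
        out.append((label, kids))
        tags = [k[0] if isinstance(k, tuple) else None for k in kids]
        if len(kids) == 3 and tags[1] == "CC" and tags[0] is not None and tags[0] == tags[2]:
            out.append((label, [kids[0]]))   # remove [CC] [X]
            out.append((label, [kids[2]]))   # remove [X] [CC]
    return out


tree = ("S", [("NP", ["Illegal", "fireworks"]),
              ("VP", [("VP", ["injured", "hundreds", "of", "people"]),
                      ("CC", ["and"]),
                      ("VP", ["started", "six", "fires"])])])
for variant in conjunction_variants(tree):
    print(" ".join(leaves(variant)))
# Illegal fireworks injured hundreds of people and started six fires
# Illegal fireworks injured hundreds of people
# Illegal fireworks started six fires
```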
Trimmer: Features
• Selection among Trimmer candidates based on three sets of features
  – L: Length in characters or words
  – R: Counts of rule applications
  – C: Centrality
• Baseline LUL: select longest version under limit
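The LUL baseline is simple to state in code. A sketch where the function name and the switch between character and word limits are assumptions for illustration; the 75-character default matches the headline task described earlier.

```python
from typing import Optional


def longest_under_limit(candidates: list[str], limit: int = 75,
                        by_words: bool = False) -> Optional[str]:
    """LUL baseline: the longest candidate whose length does not exceed `limit`."""
    def length(c: str) -> int:
        return len(c.split()) if by_words else len(c)

    fitting = [c for c in candidates if length(c) <= limit]
    return max(fitting, key=length, default=None)


cands = ["Illegal fireworks injured hundreds of people and started six fires",
         "Illegal fireworks injured hundreds of people",
         "Illegal fireworks started six fires"]
print(longest_under_limit(cands, limit=50))  # Illegal fireworks injured hundreds of people
```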
Trimmer: Benefit of using more candidates
[Chart: Rouge-1 recall (y-axis, 0.15 to 0.27) for Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, and Trimmer +R+S+C (x-axis), with one series per feature set: LUL, L, R, C, LR, LC, RC, LRC]
Trimmer: Benefit of using more features
[Chart: Rouge-1 recall (y-axis, 0.15 to 0.27) for feature sets LUL, L, R, C, LR, LC, RC, LRC (x-axis), with one series per Trimmer configuration: Trimmer, Trimmer +R, Trimmer +P, Trimmer +C, Trimmer +R+P, Trimmer +R+C, Trimmer +S+C, Trimmer +R+S+C]
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Trimmer Multi-Document
[Diagram: Document → Parser (Parses) and Entity Tagger (Entity Tags) → Trimmer → Candidates, Trimmer Features → URA (URA Index; Query, optional) → Candidates, Trimmer Features, URA Features → Selection (Feature Weights) → Summary]
Trimmer
| System | R1 Recall | R1 Prec. | R2 Recall | R2 Prec. |
|---|---|---|---|---|
| Trimmer MultiDoc | 0.38198 | 0.37617 | 0.08051 | 0.07922 |
| HMM MultiDoc | 0.37404 | 0.37405 | 0.07884 | 0.07887 |
Talk Roadmap
• Introduction
• Automatic Summarization under MASC framework
  – HMM Hedge, Trimmer, Topiary
  – Single Document, Multi-document
  – Experimental evidence supporting hypotheses
• Review of Evaluations
• Conclusion
• Future Work
Topiary
• Combines topic terms and fluent text
  – Fluent text comes from Trimmer
  – Topics come from Unsupervised Topic Detection (UTD)
• Single-candidate algorithm
  – Lower Trimmer threshold to make room for highest scoring non-redundant topic term
  – Trim to lower threshold
  – Adjust if topic redundancy changes because of trimming
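The single-candidate steps can be sketched as a small loop. Under stated assumptions: Trimmer's output is given as a list of progressively shorter compressions (longest first), topics are single words ordered best first, a topic is "redundant" if it already appears as a word in the text, and the output format is `topic: text`. All names here are hypothetical; the iteration bound guards against oscillating between topics.

```python
def topiary_single(compressions: list[str], topics: list[str], limit: int = 75) -> str:
    """compressions: Trimmer outputs, longest first; topics: highest score first."""
    def redundant(topic: str, text: str) -> bool:
        return topic.lower() in text.lower().split()

    def trim_to(threshold: int) -> str:
        # Longest compression that fits (inputs are ordered longest first).
        return next((c for c in compressions if len(c) <= threshold), "")

    text, topic = compressions[0], None
    for _ in range(len(topics) + 1):     # bound guards against oscillation
        best = next((t for t in topics if not redundant(t, text)), None)
        if best is None or best == topic:
            break                        # redundancy unchanged by trimming
        topic = best
        text = trim_to(limit - len(topic) - 2)   # leave room for "topic: "
    if topic is None:
        return trim_to(limit)
    return f"{topic}: {text}" if text else topic


compressions = ["Newspaper editor found dead in hotel room in Pacific resort city",
                "Newspaper editor found dead in Pacific resort city",
                "Newspaper editor found dead"]
print(topiary_single(compressions, ["hotel", "Hernández"], limit=60))
# hotel: Newspaper editor found dead in Pacific resort city
```

In this example "hotel" is first skipped as redundant; after trimming to make room for the second topic, "hotel" is no longer in the text, so the loop switches to the higher-scoring topic and re-trims.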
Topiary Single-Candidate
[Diagram: Document → Parser (Parses), Entity Tagger (Entity Tags), and Unsupervised Topic Detection (Topics) → Topiary → Summary]
Topiary, Trimmer, UTD
[Chart: Rouge-1 through Rouge-4 scores (y-axis, 0 to 0.3) for First 75 chars, Topiary, Trimmer 2003, Trimmer 2004, and UTD]
Topiary
• Multi-candidate Algorithm
  – Generate multi-candidate Trimmer candidates
  – Fill space in all Trimmer candidates with all combinations of non-redundant topics
  – Score and select summary
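The candidate-generation step can be sketched as a cross product of Trimmer candidates and topic combinations. A minimal illustration; the comma-and-colon output format, the word-level redundancy test, and the 75-character limit are assumptions, and scoring and selection would follow as for the other candidates.

```python
from itertools import combinations


def topiary_candidates(trimmer_candidates: list[str], topics: list[str],
                       limit: int = 75) -> list[str]:
    """Each Trimmer candidate, prefixed by every combination of topics that are
    not redundant with it, kept only if the result fits within `limit` characters."""
    out = []
    for text in trimmer_candidates:
        words = set(text.lower().split())
        usable = [t for t in topics if t.lower() not in words]
        for k in range(len(usable) + 1):
            for combo in combinations(usable, k):
                cand = f"{', '.join(combo)}: {text}" if combo else text
                if len(cand) <= limit:
                    out.append(cand)
    return out


trimmer = ["Newspaper editor found dead in Pacific resort city",
           "Newspaper editor found dead"]
topics = ["Hernández", "Zihuatanejo", "editor"]
for cand in topiary_candidates(trimmer, topics):
    print(cand)
```

Here "editor" is never used because both Trimmer candidates already contain it, leaving four topic combinations (including none) for each of the two candidates.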
Topiary Multi-Candidate
[Diagram: Document → Parser (Parses), Entity Tagger (Entity Tags), and Unsupervised Topic Detection (Topics) → Topiary → Candidates, Trimmer Features → URA (URA Index; Query, optional) → Candidates, Trimmer Features, URA Features → Selection (Feature Weights) → Summary]
DUC 2004 Task 1 Results (Rouge)
[Chart: ROUGE-1, ROUGE-L, ROUGE-W-1.2, ROUGE-2, ROUGE-3, and ROUGE-4 scores (y-axis 0 to 0.35) for the human references (A to H), Topiary, the baseline, and the other automatic summaries]
Topiary Evaluation
Rouge Metric      Single-Candidate Topiary   Multi-Candidate Topiary
Rouge-1 Recall    0.25027                    0.26490
Rouge-2 Recall    0.06484                    0.08168*
Rouge-3 Recall    0.02130                    0.02805
Rouge-4 Recall    0.00717                    0.01105
Rouge-L           0.20063                    0.22283*
Rouge-W-1.2       0.11951                    0.13234*
* Significantly higher than single-candidate Topiary
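Every score in the table is a Rouge variant. The Python sketch below illustrates two of them, for illustration only:

- **Rouge-N recall:** clipped n-gram matches divided by the total number of reference n-grams, pooled across references.
- **Rouge-L recall:** sentence-level recall based on the longest common subsequence (LCS) with a single reference.

It lowercases and splits on whitespace, and does no stemming or stopword removal. It computes neither the summary-level union LCS nor Rouge-W. It is not expected to reproduce the numbers above. Function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: List[str], n: int) -> float:
    """Clipped n-gram matches over total reference n-grams, pooled over references."""
    cand = ngram_counts(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref.lower().split(), n)
        # A reference n-gram can match at most as many times as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (two-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """Sentence-level LCS recall against a single reference."""
    ref = reference.lower().split()
    return lcs_length(ref, candidate.lower().split()) / len(ref) if ref else 0.0


if __name__ == "__main__":
    ref = "newspaper editor found dead in pacific resort city"
    print(rouge_n_recall("Newspaper editor found dead", [ref], 1))  # prints: 0.5
    print(rouge_l_recall("editor dead Pacific", ref))  # prints: 0.375
```

The Rouge-L example matches "editor dead pacific" as a subsequence of the reference even though the words are not adjacent. Rouge-2 would give this candidate no credit.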
Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
    • Single-candidate, MASC versions
  – Multi-document Summarization
    • HMM Hedge, Trimmer
• Evaluation
• Conclusion
• Future Work
Evaluation: Review
• HMM Hedge, single-document: Rouge-1 recall increases as the number of candidates increases
• HMM Hedge, single-document: Rouge-1 doubles when candidates are scored with optimized feature weights
• Trimmer, single-document: Rouge-1 increases with greater use of multi-candidate rules
• Trimmer, single-document: Rouge-1 increases with a larger set of features
• Topiary, single-document: multi-candidate Topiary scores significantly higher than single-candidate Topiary on some Rouge metrics
• Trimmer scored higher than HMM Hedge for multi-document summarization
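Several of these results depend on how one summary is chosen from a pool of candidates using weighted features. The Python sketch below assumes a linear combination of candidate features. Its feature names and values are hypothetical, not the actual HMM Hedge or Trimmer features, and weight optimization itself is not shown. It illustrates only that changing the weights changes which candidate is selected.

```python
from typing import Dict, List


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def select_candidate(candidates: List[Dict], weights: Dict[str, float]) -> str:
    """Return the text of the highest-scoring candidate (the first one wins ties)."""
    best = max(candidates, key=lambda c: linear_score(c["features"], weights))
    return best["text"]


if __name__ == "__main__":
    candidates = [
        {"text": "Newspaper editor found dead",
         "features": {"relevance": 0.9, "word_count": 4}},
        {"text": "Newspaper editor found dead in Pacific resort city",
         "features": {"relevance": 0.4, "word_count": 8}},
    ]
    print(select_candidate(candidates, {"relevance": 1.0}))
    # prints: Newspaper editor found dead
    print(select_candidate(candidates, {"relevance": 1.0, "word_count": 0.2}))
    # prints: Newspaper editor found dead in Pacific resort city
```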
Evaluation
Human extrinsic evaluation of HMM Hedge, Trimmer, Topiary, and First 75 chars
LDC Agreement: ~20x increase in speed, with some loss of accuracy
Relevance Prediction
Baseline (First 75 chars) is hard to beat
Talk Roadmap
• Introduction
• Automatic Summarization
  – HMM Hedge, Trimmer, Topiary
  – Multiple Alternative Sentence Compressions (MASC)
• Evaluation
• Conclusion
• Future Work
Contributions
• Use of the MASC framework improves performance across summarization tasks and compression sources
• Fluent and informative summaries can be constructed by selecting words in order from sentences; verified by a human study
• Headlines combining fluent text and topic terms score better than either alone
Future Work
• Enhance redundancy score with paraphrase detection
• Anaphora resolution in candidates
• Expand candidates by sentence merging
• Sentence ordering in multi-sentence summaries
End