![Page 1: Text Specificity and Impact on Quality of News Summaries Annie Louis & Ani Nenkova University of Pennsylvania June 24, 2011.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d2d5503460f94a041e6/html5/thumbnails/1.jpg)
Text Specificity and Impact on Quality of News Summaries
Annie Louis & Ani Nenkova
University of Pennsylvania
June 24, 2011
Specificity: amount of detail
Texts are a mix of general and specific sentences
Recently, we have developed a classifier that can distinguish general vs. specific sentences
The notion of specificity could be useful for a number of applications
In this work, we consider automatic summarization
Summaries cannot include all specific content because of the space constraint
Goal: understand the role of general/specific content in summaries and how it impacts quality
Example general and specific sentences
Seismologists said the volcano had plenty of built-up magma and even more severe eruptions could come later. [overview]
The volcano's activity -- measured by seismometers detecting slight earthquakes in its molten rock plumbing system -- is increasing in a way that suggests a large eruption is imminent, Lipman said. [details]
Prior studies of general-specific content in summaries
Humans use generalization and specification of source sentences to create abstract sentences [Jing & McKeown (2000)]
One generation task is to fuse information from a key (general) sentence and a specific sentence on the same topic to create an abstract sentence [Wan et al. (2008)]
Subtitles of news broadcasts are often generalized compared to the original text [Marsi et al. (2010)]
Overview of our study
Quantitative analysis of specificity in inputs and summaries using a general/specific classifier
1. Human abstracts have much more general content than system extracts
2. Amount of specific content is related to content quality of system summaries (more general ~ better)
3. Preliminary study on properties of summary-worthy general sentences
Data: DUC 2002
Generic multidocument summarization task
59 input sets, 5 to 15 news documents each
3 types of summaries, 200 words each
Manually assigned content and linguistic quality scores

| Summary type | Number of summaries |
| --- | --- |
| 1. Human abstracts | 2 assessors * 59 |
| 2. Human extracts | 2 assessors * 59 |
| 3. System extracts | 9 systems * 59 |
General vs. specific sentence classifier: prior work [Louis & Nenkova (2011)]
Sentence level
Features
1. Words
2. Named entities, numbers
3. Likelihood under language model
4. Word specificity
5. Adjectives/adverbs, length of phrases
6. Polar words
7. Sentence length
Training
Binary: general or specific
Logistic regression: can get probability for a class
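A minimal sketch of how a logistic regression model turns sentence features into a probability for the specific class. The feature names, weights and bias below are hypothetical placeholders chosen for illustration; they are not the trained values from Louis & Nenkova (2011).

```python
import math


def specific_probability(features, weights, bias):
    """Return the logistic-regression probability of the 'specific' class."""
    # Linear score: bias plus the weighted sum of the feature values.
    score = bias + sum(weights[name] * value for name, value in features.items())
    # The logistic (sigmoid) function maps the score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights and feature values, for illustration only.
weights = {
    "num_named_entities": 0.8,
    "num_numbers": 0.6,
    "sentence_length": 0.05,
    "num_polar_words": -0.7,
}
features = {
    "num_named_entities": 2,
    "num_numbers": 1,
    "sentence_length": 30,
    "num_polar_words": 0,
}

p = specific_probability(features, weights, bias=-2.0)
label = "specific" if p >= 0.5 else "general"  # 0.5 cut-off is an assumption
```

The probability itself, not only the binary label, is what the later slides use as a confidence score.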
Classification performance
75% accurate
Validated on human annotations
On examples with high annotator agreement: 90%
The probability is indicative of annotator agreement on class
Sentences with high agreement ~ high confidence predictions
Computing specificity for a text
Sentences in a summary are of varying length, so we compute a score on word level: “average specificity of words in the text”
Figure: each sentence (S1, S2, S3) gets the classifier's confidence for being in the specific class (0.68, 0.23 and 0.81 in the example). Every token in a sentence is assigned that sentence's confidence, and the specificity score of the text is the average score over all tokens.
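The word-level averaging can be sketched as follows. This assumes whitespace tokenization and uses three made-up sentences; only the confidences 0.68, 0.23 and 0.81 come from the slide, and their pairing with particular sentences here is arbitrary.

```python
def text_specificity(sentences, confidences):
    """Token-weighted average of per-sentence 'specific' confidences."""
    total_tokens = 0
    weighted_sum = 0.0
    for sentence, confidence in zip(sentences, confidences):
        # Whitespace tokenization is an assumption of this sketch.
        n_tokens = len(sentence.split())
        # Giving every token its sentence's confidence and summing is the
        # same as weighting the confidence by the sentence length.
        weighted_sum += confidence * n_tokens
        total_tokens += n_tokens
    if total_tokens == 0:
        raise ValueError("text has no tokens")
    return weighted_sum / total_tokens


# Placeholder sentences with 4, 2 and 6 tokens.
sentences = ["a b c d", "e f", "g h i j k l"]
confidences = [0.68, 0.23, 0.81]
score = text_specificity(sentences, confidences)
```

Longer sentences contribute more to the score, which is the point of averaging over tokens rather than over sentences.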
Average specificity of different types of summaries

| Text type | Average specificity |
| --- | --- |
| Human abstracts (H. abs) | 0.62 |
| Inputs | 0.65 |
| Human extracts (H. ext) | 0.72 |
| System extracts (S. ext) | 0.74 |

Scale: lower values are more general, higher values are more specific
1. More general content is preferred in abstracts
2. Simply the process of extraction makes summaries more specific
3. System summaries are overly specific
Is the difference related to summary quality?
Analysis of ‘system summaries’: specificity and quality
1. Content quality: importance of content included in the summary (more general ~ better)
2. Linguistic quality: how well-written the summary is perceived to be (more specific ~ better)
3. Quality of general/specific summaries: when a summary is intended to be general or specific
1. Specificity and content quality
Coverage score: manually judged at NIST, similarity to a human summary
Correlation with specificity: -0.169 (p-value 0.0006)
More specific ~ decreased content quality
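For reference, a Pearson correlation between specificity and coverage scores could be computed as sketched below. The two lists are invented toy data, not the DUC 2002 scores, so the resulting value is not the -0.169 reported on the slide.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Toy data (hypothetical): one specificity and one coverage score per summary.
specificity = [0.60, 0.65, 0.70, 0.75, 0.80]
coverage = [0.30, 0.28, 0.31, 0.22, 0.20]
r = pearson(specificity, coverage)  # negative: more specific, lower coverage
```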
But the correlation is not very high
Specificity is related to realization of content, which is different from importance of the content
Content quality = content importance + appropriate specificity level
Content importance: ROUGE scores
N-gram overlap of system summary and human summary
Standard evaluation of automatic summaries
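The n-gram overlap idea behind ROUGE-2 can be illustrated with a simplified bigram recall. This is a sketch only: it assumes a single human summary, lowercasing and whitespace tokenization, and it omits the stemming and multi-reference handling of the official ROUGE toolkit. The example sentences are made up.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent token pairs in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(system_summary, human_summary):
    """Fraction of the human summary's bigrams also found in the system summary."""
    system_bigrams = bigram_counts(system_summary)
    human_bigrams = bigram_counts(human_summary)
    total = sum(human_bigrams.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((system_bigrams & human_bigrams).values())
    return overlap / total


human = "the volcano may erupt soon"
system = "the volcano may erupt later today"
recall = rouge2_recall(system, human)  # 3 of the 4 human bigrams are matched
```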
System summary quality: specificity as one of the predictors
Linear regression: coverage score ~ ROUGE-2 (bigrams) + specificity
Weights for predictors in the regression model

| Predictor | Mean β | Significance (hypothesis β = 0) |
| --- | --- | --- |
| (Intercept) | 0.212 | 2.3e-11 |
| ROUGE-2 | 1.299 | < 2.0e-16 |
| Specificity | -0.166 | 3.1e-05 |

Is the combination a better predictor than ROUGE alone?
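To make the fitted model concrete, the sketch below plugs the reported mean weights into the linear form. It assumes inputs are on the same scales as in the original regression; the ROUGE-2 value 0.10 is a hypothetical input, while 0.62 and 0.74 are the average specificity values reported earlier for human abstracts and system extracts.

```python
# Mean regression weights reported on the slide.
INTERCEPT = 0.212
BETA_ROUGE2 = 1.299
BETA_SPECIFICITY = -0.166


def predicted_coverage(rouge2, specificity):
    """Coverage predicted by: intercept + b1 * ROUGE-2 + b2 * specificity."""
    return INTERCEPT + BETA_ROUGE2 * rouge2 + BETA_SPECIFICITY * specificity


# Same (hypothetical) ROUGE-2 score, two different specificity levels.
more_general = predicted_coverage(rouge2=0.10, specificity=0.62)
more_specific = predicted_coverage(rouge2=0.10, specificity=0.74)
difference = more_general - more_specific  # equals 0.166 * (0.74 - 0.62)
```

With ROUGE-2 held fixed, the negative specificity weight means the more general summary gets the higher predicted coverage score.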
2. Specificity and linguistic quality
Used different data: TAC 2009
DUC 2002 only reported number of errors, which were also specified as a range: 1-5 errors
TAC 2009 linguistic quality score: manually judged on a scale of 1 to 10
Combines different aspects: coherence, referential clarity, grammaticality, redundancy
System summaries: what is the average specificity in different score categories?

| Ling score | No. summaries | Average specificity |
| --- | --- | --- |
| Poor (1, 2) | 202 | 0.71 |
| Mediocre (5) | 400 | 0.72 |
| Best (9, 10) | 79 | 0.77 |

More general ~ lower score!
General content is useful but needs proper context!
If a summary starts as follows:
“We are quite a ways from that, actually.” As ice and snow at the poles melt, …
Specificity = low, linguistic quality = 1
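The grouping behind this comparison can be sketched as follows. The category boundaries follow the slide (1-2, 5, 9-10); the input pairs are invented toy data, and summaries with other scores are simply skipped, which is an assumption of this sketch.

```python
from collections import defaultdict


def score_category(ling_score):
    """Map a 1-10 linguistic quality score to the slide's categories."""
    if ling_score in (1, 2):
        return "Poor"
    if ling_score == 5:
        return "Mediocre"
    if ling_score in (9, 10):
        return "Best"
    return None  # scores outside the three categories are ignored


def average_specificity_by_category(summaries):
    """Average specificity per category for (ling_score, specificity) pairs."""
    buckets = defaultdict(list)
    for ling_score, specificity in summaries:
        category = score_category(ling_score)
        if category is not None:
            buckets[category].append(specificity)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


# Toy data (hypothetical): (linguistic quality score, specificity score).
toy = [(1, 0.70), (2, 0.72), (5, 0.72), (7, 0.90), (9, 0.76), (10, 0.78)]
averages = average_specificity_by_category(toy)
```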
3. Specificity and quality of general/specific summaries
DUC 2005: general-specific summary task
Create general summaries for some inputs, specific summaries for others
How is specificity related to the scores of these summaries?
System summaries: correlation between specificity and content scores

| Summary type | Pearson correlation |
| --- | --- |
| General | -0.03 |
| Specific | 0.18* |

Content scores were measured using the pyramid method
Further hints that specificity alone is not predictive of summary quality
Once a summary is general, level of generality is no longer predictive of quality
Analysis of general sentences in human summaries
1. Generalization operation performed in human abstracts: frequency of operations, amount of deletions
2. How general sentences are used in human extracts: position, type of sentence
Data for analysing generalization operation
Aligned pairs of abstract and source sentences conveying the same content
Traditional data used for compression experiments
Ziff-Davis corpus: 15964 sentence pairs used in Galley & McKeown, 2007
Any number of deletions, up to 7 substitutions
Only 25% of abstract sentences are mapped, but it is beneficial to observe the trends
Generalization operation in human abstracts

| Transition | No. pairs | % pairs |
| --- | --- | --- |
| SS | 6371 | 39.9 |
| SG | 5679 | 35.6 |
| GG | 3562 | 22.3 |
| GS | 352 | 2.2 |

(S = specific, G = general; SG = specific to general)
One-third of all transformations are specific to general
Human abstracts involve a lot of generalization
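A sketch of how such transitions might be tallied from classifier output. It assumes each aligned pair carries the 'specific' probability of the source and of the abstract sentence, and that 0.5 is the decision threshold; both are assumptions, not details stated on the slides. The percentage helper is then checked against the reported pair counts.

```python
from collections import Counter


def label(p_specific, threshold=0.5):
    """'S' for specific, 'G' for general (the threshold is an assumption)."""
    return "S" if p_specific >= threshold else "G"


def transition_counts(pairs, threshold=0.5):
    """Count source-to-abstract transitions such as 'SG' for aligned pairs."""
    return Counter(
        label(source_p, threshold) + label(abstract_p, threshold)
        for source_p, abstract_p in pairs
    )


def as_percentages(counts):
    """Convert transition counts to percentages rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Pair counts as reported on the slide (15964 pairs in total).
reported = Counter({"SS": 6371, "SG": 5679, "GG": 3562, "GS": 352})
percentages = as_percentages(reported)

# Toy probabilities (hypothetical) for three aligned sentence pairs.
toy_counts = transition_counts([(0.9, 0.2), (0.8, 0.7), (0.1, 0.3)])
```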
How do specific sentences get converted to general ones?

| Transition | Orig. length | New/orig length (%) | Avg. deletions (words) |
| --- | --- | --- | --- |
| SG | 33.5 | 40.8 | 21.4 |
| SS | 33.4 | 56.6 | 16.3 |
| GG | 21.5 | 60.8 | 9.3 |
| GS | 22.7 | 66.0 | 8.4 |

Choose long sentences and compress heavily!
A measure of generality would be useful to guide compression
Currently only importance and grammaticality are used
Use of general sentences in human extracts
Details of Maxwell’s death were sketchy.
Folksy was an understatement.
“Long live democracy!”
Instead it sank like the Bismarck.
Example use of a general sentence in a summary:
… With Tower’s qualifications for the job, the nominations should have sailed through with flying colors. [Specific] Instead it sank like the Bismarck. [General] …
Simple categorization
75 top general sentences according to classifier confidence

| Type | Proportion |
| --- | --- |
| First sentence | 6 (8%) |
| Last sentence | 13 (17%) |
| Attributions | 14 (18%) |
| Comparisons | 4 (5%) |

General sentences are used as topic/emphasis sentences
Conclusion
General sentences are useful content for summaries
People use them in summaries for emphasis and topic
They can improve the content quality
Choosing good general sentences or generating them will be an interesting task
But linguistic quality should also be considered
General sentences are difficult to understand out of context
Content planning should consider the order of general content
Thank you
Histogram of specificity scores