Automatic Summarization Team Final report JHU Summer Workshop August 16, 2001.
The summarization problem space
• Single-document summarization
• Cross-lingual summarization
• Multi-document summarization
• Query-based summarization
• Evaluation
The summarization problem space
• Single-document summarization
– 18,146 English-language documents
• Cross-lingual summarization
– Chinese translations of the documents
• Multi-document summarization
– Clusters of related documents
• Query-based summarization
– Queries and their translations
• Evaluation
– agreement between humans
– performance of automatic summaries
Technical objectives
• Develop a summarization toolkit including a modular state-of-the-art summarizer: single-document, multi-document, generic, query-based
• Develop a summarization evaluation toolkit allowing comparisons between extractive and non-extractive summaries
• Produce an annotated corpus for further research in text summarization
Sample scenarios
• Evaluate an existing summarizer
• Build a summarizer from scratch
• Test a summarization feature
• Test a new evaluation metric
• Test a machine translation system
Resources
• manual summaries (extracts and abstracts)
• baseline summaries
• automatic summaries
• manual and automatic relevance judgements
• XREF, lemmatized, tagged versions of the corpus
• manual and automatic query translations
• sentence segmentation
• sentence alignments
• XML DTDs, converters
• subsumption judgements
• guidelines for judges
• guidelines for building summarizers
• evaluation software
• modular, trainable summarizer
Participants
• Non-students
– Dragomir Radev, University of Michigan
– Wai Lam, Chinese University of Hong Kong
– Simone Teufel, Columbia University
– Horacio Saggion, University of Sheffield
• Students
– John Blitzer, Cornell University
– Arda Celebi, Bilkent University
– Elliott Drabek, Johns Hopkins University
– Danyu Liu, Chinese University of Hong Kong
– Hong Qi, University of Michigan
Data annotation
• Linguistic Data Consortium
– Stephanie Strassel, LDC
– Chris Cieri, LDC
– Dave Graff, LDC
• Bilingual Corpus
• Queries
• Judgements of relevance of documents and sentences
• Manual summaries
Information Retrieval Architecture
[Diagram: IR engine with a retrieval module (query in, ranked list of documents out) and an indexing module (documents or summaries, English/Chinese, in; document representation (index) out).]
Cross-lingual retrieval
• Bilingual IR engine
– Based on Smart, an IR engine developed at Cornell University
• Extend Smart to handle both Chinese and English documents (XSmart)
– Handle double-byte Chinese characters
• Query Translation
– Phrasal Translation Module
– Term Disambiguation Module
Phrasal Translation
• Translation process begins for each sentence
• Detecting phrases in a sentence
– Construct a phrase/word English-Chinese lexicon
– Maximal matching algorithm
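A minimal sketch of greedy maximal matching for phrase detection. The slide names the algorithm but gives no details, so the function name `detect_phrases`, the `max_len` limit, and the toy lexicon entries are all assumptions made for illustration.

```python
def detect_phrases(tokens, lexicon, max_len=4):
    """Greedy left-to-right maximal matching against a phrase/word lexicon.

    tokens:  list of lower-cased English words
    lexicon: set of known words and multi-word phrases (hypothetical entries)
    max_len: longest phrase, in words, that is tried (an assumption)
    """
    phrases = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink it word by word.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single word is always accepted so the scan never stalls.
            if n == 1 or candidate in lexicon:
                phrases.append(candidate)
                i += n
                break
    return phrases


# Toy lexicon, not taken from the workshop data.
TOY_LEXICON = {"fire safety", "building management", "fire", "safety", "building"}
query = "fire safety building management concerns".split()
print(detect_phrases(query, TOY_LEXICON))
```

Each detected phrase would then be looked up in the English-Chinese lexicon to obtain its candidate translations.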
Term Disambiguation
• Make use of a cohesion model and corpus statistics
– Co-occurrence of translation terms in the corpus
– Correct translations of terms tend to co-occur
$E_i : C_{i,1}, C_{i,2}, C_{i,3}, \ldots$
$E_{i+1} : C_{i+1,1}, C_{i+1,2}, \ldots$
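One way to read "correct translations tend to co-occur" is to pick, for each source term, the candidate that co-occurs most with the candidates chosen for the other terms. The sketch below does this by exhaustive search over all combinations. The workshop's actual cohesion model is not given on the slide, so the scoring function, the names `disambiguate` and `cooc`, and the toy counts are assumptions.

```python
from itertools import combinations, product


def disambiguate(candidates, cooc):
    """Pick one translation per source term so that the chosen translations
    co-occur as often as possible in the target-language corpus.

    candidates: list of lists, candidates[i] = candidate translations of term i
    cooc:       dict mapping frozenset({a, b}) to a corpus co-occurrence count
    """
    def cohesion(choice):
        # Sum of pairwise co-occurrence counts over the chosen translations.
        return sum(cooc.get(frozenset(pair), 0) for pair in combinations(choice, 2))

    # Exhaustive search: fine for short queries, exponential in general.
    return max(product(*candidates), key=cohesion)


# Placeholder candidate names and counts, for illustration only.
toy_candidates = [["c11", "c12"], ["c21", "c22"]]
toy_cooc = {frozenset({"c12", "c21"}): 5, frozenset({"c11", "c22"}): 2}
print(disambiguate(toy_candidates, toy_cooc))
```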
Sample English Query
<?xml version='1.0'?>
<!DOCTYPE QUERY SYSTEM "../../../dtd/query.dtd" >
<QUERY QID="Q-241-E" QNO="241" TRANSLATED="NO">
<TITLE>Fire safety, building management concerns</TITLE>
</QUERY>
Sample Chinese Query
<?xml version='1.0'?>
<!DOCTYPE QUERY SYSTEM "../../../dtd/query.dtd" >
<QUERY QID="Q-241-C" QNO="241" TRANSLATED="NO">
<TITLE>防火意識,大廈管理</TITLE>
</QUERY>
Sample Retrieval Result for Full-length Documents
<?xml version='1.0'?>
<!DOCTYPE DOC-JUDGE SYSTEM "/export/ws01summ/dtd/docjudge.dtd" >
<DOC-JUDGE QID="Q-241-E" SYSTEM="SMART" LANG="ENG">
 <D DID="D-20000126_008.e" RANK="1" SCORE="135.0000" CORR-DOC="D-20000126_012.c"/>
 <D DID="D-19980625_007.e" RANK="2" SCORE="99.0000" CORR-DOC="D-19980625_006.c"/>
 <D DID="D-19990126_017.e" RANK="3" SCORE="98.0000" CORR-DOC="D-19990126_018.c"/>
 <D DID="D-19981007_018.e" RANK="4" SCORE="91.0000" CORR-DOC="D-19981007_023.c"/>
 <D DID="D-19980121_004.e" RANK="5" SCORE="78.0000" CORR-DOC="D-19980121_009.c"/>
 <D DID="D-19971016_004.e" RANK="6" SCORE="72.0000" CORR-DOC="D-19971016_005.c"/>
Sample Retrieval Result for Lead-Based Summary (5%)
<?xml version='1.0'?>
<!DOCTYPE DOC-JUDGE SYSTEM "/export/ws01summ/dtd/docjudge.dtd" >
<DOC-JUDGE QID="Q-241-E" SYSTEM="SMART" LANG="ENG">
 <D DID="D-20000126_008.e" RANK="1" SCORE="14.0000" CORR-DOC="D-20000126_012.c"/>
 <D DID="D-19991214_002.e" RANK="2" SCORE="11.0000" CORR-DOC="D-19991214_001.c"/>
 <D DID="D-19980810_006.e" RANK="3" SCORE="10.0000" CORR-DOC="D-19980810_003.c"/>
 <D DID="D-19990505_028.e" RANK="4" SCORE="9.0000" CORR-DOC="D-19990505_034.c"/>
 <D DID="D-19980115_009.e" RANK="4" SCORE="9.0000" CORR-DOC="D-19980115_013.c"/>
 ...
Single-document situation
[Diagram. Components: query, SMART, ranked document lists, IR results, correlation; document, LDC judges, summarizer, baselines, extract, summary comparison (1. co-selection, 2. similarity).]
Multi-document situation
[Diagram. Components: document cluster, LDC judges, summarizer, baselines, manual summaries, extracts, summary comparison (1. co-selection, 2. similarity).]
Queries
TRAINING:
Group_125 Narcotics Rehabilitation
Group_241 Fire safety, building management concerns
Group_323 Battle against disc piracy
Group_551 Natural disaster victims aided
Group_112 Autumn and sports carnivals
Group_199 Intellectual Property Rights
Group_398 Flu results in Health Controls
Group_883 Public health concerns cause food-business closings
Group_1014 Traffic Safety Enforcement
Group_1197 Museums: exhibits/hours
DEV-TEST:
Group_447 Housing (Amendment) Bill Brings Assorted Improvements
Group_827 Health education for youngsters
Group_885 Customs combats contraband/dutiable cigarette operations
Group_2 Meetings with foreign leaders
Group_46 Improving Employment Opportunities
Group_54 Illegal immigrants
Group_60 Customs staff doing good job.
Group_61 Permits for charitable fundraising
Group_62 Y2K readiness
Group_1018 Flower shows
TEST: 20 more queries
Summaries produced
• Single-document extracts
– automatic (135 runs on 18,146 documents each): 10 compression rates, Word/Sentence, English/Chinese/Xlingual, 10 summarization methods
– manual (80 runs on 200 documents each): 10 compression rates, Word/Sentence, (3 judges + average)
• Multi-document summaries
– 3 lengths, 3 judges, 14 queries (out of 40)
• Multi-document extracts
– automatic (160 extracts) = 8 compression rates (5-40%, 50-200AW) x 20 clusters
– manual (320 extracts) = 8 compression rates x 10 clusters x (3 judges + average)
List of summarizers
• MEAD, Websumm, Summarist, LexChains, Align
• English, Chinese
• Single-document, Multi-document
WEBSUMM:
Some of them are taking temporary shelter at Lung Hang Estate Community Centre in Sha Tin, and Shek Lei Estate Community Centre and Princess Alexandra Community Centre in Tsuen Wan.
Emergency relief by SWD
The Social Welfare Department has provided relief articles and hot meals to 114 people who were affected by the rainstorm or mudslip throughout the territory. The people, comprising adults and children, come from 30 families. Some of them are taking temporary shelter at Lung Hang Estate Community Centre in Sha Tin, and Shek Lei Estate Community Centre and Princess Alexandra Community Centre in Tsuen Wan. The Regional Social Welfare Officer (New Territories East), Mrs Lily Wong, visited victims at Lung Hang State Community Centre this (Thursday) afternoon to offer any necessary assistance. Six victims have so far requested for Comprehensive Social Security Allowance and the applications are being processed. Social workers also escorted an 88-year old man who was feeling unwell to the Prince of Wales hospital for medical checkup.
MEAD:
The Social Welfare Department has provided relief articles and hot meals to 114 people who were affected by the rainstorm or mudslip throughout the territory. The Regional Social Welfare Officer (New Territories East), Mrs Lily Wong, visited victims at Lung Hang State Community Centre this (Thursday) afternoon to offer any necessary assistance.
RANDOM:
The Social Welfare Department has provided relief articles and hot meals to 114 people who were affected by the rainstorm or mudslip throughout the territory. Some of them are taking temporary shelter at Lung Hang Estate Community Centre in Sha Tin, and Shek Lei Estate Community Centre and Princess Alexandra Community Centre in Tsuen Wan.
LEAD:
The Social Welfare Department has provided relief articles and hot meals to 114 people who were affected by the rainstorm or mudslip throughout the territory. The people, comprising adults and children, come from 30 families.
Runs by summarizer (rows) and compression rate (columns 05 to 90, each split into W = word and S = sentence; FD = full document). Row prefixes: E = English, C = Chinese, X = cross-lingual.
05 05 10 10 20 20 30 30 40 40 50 50 60 60 70 70 80 80 90 90 FD
W S W S W S W S W S W S W S W S W S W S
E-FD - - - - - - - - - - - - - - - - - - - - * 40
E-LD X X X X * * X X X X X X X X X X X X X X - 440
E-RA X X X X * * X X X X X X X X X X X X X X - 440
E-MO * * X * * * X * X * - * - * - * - * - * - 540
E-M2 - - - - - X - - - - - - - - - - - - - - - 20
E-M3 - - - - - X - - - - - - - - - - - - - - - 8
E-S2 - - - - - X - - - - - - - - - - - - - - -
E-WS - X X * * X - X - - - - - - - - - - - 160
E-WQ - - - - - X - - - - - - - - - - - - - - - 10
E-LC - - - - - - - - - - - - - - -
E-CY - X - X - * - X - X - - - - - - - - - - - 120
E-AL X X X X X X X X X X - - - - - - - - - - - 200
E-AR X X X X X X X X X X - - - - - - - - - - - 200
E-AM X X X X X X X X X X - - - - - - - - - - - 200
C-FD - - - - - - - - - - - - - - - - - - - - * 40
C-LD X X X X * * X X X X - - - - - - - - - - - 240
C-RA X X X X * * X X X X - - - - - - - - - - - 240
C-MO X * X * * * X * X * - - - - - - - - - - - 320
C-M2 - - - - - X - - - - - - - - - - - - - - - 20
C-CY - X - X - * - X - X - - - - - - - - - - - 120
C-AL X X X X X X X X X - - - - - - - - - - - 180
C-AR X X X X X X X X X X - - - - - - - - - - - 200
C-AM X X X X X X X X - - - - - - - - - - - 120
X-FD - - - - - - - - - - - - - - - - - - - - * 40
X-LD X X X X * * X X X X - - - - - - - - - - - 240
X-RA X X X X * * X X X X - - - - - - - - - - - 240
X-MO X * X * * * X * X * - - - - - - - - - - - 320
X-M2 - - - - - X - - - - - - - - - - - - - - - 20
X-CY - X - X - * - X - X - - - - - - - - - - - 120
X-AL X X X X X X X - - - - - - - - - - - 140
X-AR X X X X X X X X - - - - - - - - - - - 160
X-AM X X X X X X - - - - - - - - - - - 120
The MEAD Summarizer
• Extractive summarization as a classification task: $f : 2^S \to \{0, 1\}$
– How can we deal with the combinatorial problem?
• Approximate by classifying individual sentences: $f : s \to \{0, 1\}$
• MEAD’s classification function
– Give each sentence a score based on a linear combination of features (Position, Centroid, First sentence similarity)
$\mathrm{Score}(s) = w_P P(s) + w_C C(s) + w_F F(s)$
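The linear score can be sketched in a few lines. The feature values and weights below are made up for illustration; the slide names the three features (Position, Centroid, First sentence similarity) but does not say how they are computed or weighted, so `mead_score` and the sample data are assumptions.

```python
def mead_score(features, weights):
    """Linear combination Score(s) = w_P*P(s) + w_C*C(s) + w_F*F(s)."""
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical per-sentence feature values (P = position, C = centroid,
# F = similarity to the first sentence); not taken from the workshop data.
toy_sentences = [
    {"id": 1, "P": 1.0, "C": 0.5, "F": 1.0},
    {"id": 2, "P": 0.5, "C": 0.75, "F": 0.25},
    {"id": 3, "P": 0.0, "C": 1.0, "F": 0.25},
]
unit_weights = {"P": 1.0, "C": 1.0, "F": 1.0}

# Rank sentences from highest to lowest score.
ranked = sorted(toy_sentences, key=lambda s: mead_score(s, unit_weights), reverse=True)
print([s["id"] for s in ranked])
```

Changing the weights changes the ranking, which is what makes the scorer trainable.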
The Redundancy Problem
• Sentences with duplicate information content
– Solemn ceremony marks handover.
– A solemn historic ceremony has marked the resumption of the exercise of sovereignty over Hong Kong.
• MEAD’s method for combating redundancy
• Do until # of sentences = max num for summary:
– Foreach sentence si in order of score
– if si is too similar to sentences already selected, skip
– else add si
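The loop above can be sketched as a greedy selection. The slide does not say how "too similar" is measured, so the word-overlap measure, the `threshold` value, and the names `word_overlap` and `select_sentences` are assumptions.

```python
def word_overlap(a, b):
    """Share of the shorter sentence's distinct words found in the other one."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, min(len(wa), len(wb)))


def select_sentences(scored, max_sentences, threshold=0.5):
    """scored: list of (sentence, score) pairs. Returns the chosen sentences."""
    chosen = []
    # Visit sentences from highest to lowest score.
    for sentence, _score in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if len(chosen) >= max_sentences:
            break
        # Skip a sentence that is too similar to one already selected.
        if any(word_overlap(sentence, c) > threshold for c in chosen):
            continue
        chosen.append(sentence)
    return chosen


toy_scored = [("a b c d", 3.0), ("a b c e", 2.0), ("x y z", 1.0)]
print(select_sentences(toy_scored, max_sentences=2))
```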
Modular MEAD Architecture
• Feature Extractor
– Sentences become feature vectors
• Classifier
– Feature vectors become sentence scores
• Reranker
– Sentence scores become other sentence scores
Uses of the architecture
• Current
– Query-based features (textual similarity with query)
– SVMs and MaxEnt for classification
• Future
– Advanced features: named entities, semantic similarity with query
– Generalized re-ranking algorithms
– Learned re-ranking preferences: co-reference, subsumption, rhetorical relations
Sentence alignment
• Based on a statistical model of character lengths (Gale & Church)
• Makes use of the fact that longer sentences in one language tend to be translated into longer sentences in the other language
• Adapt to English-Chinese bilingual corpus
– Compute the length ratio between English and Chinese characters
– Mean number of Chinese characters generated by each English character is c = 0.6502, with a standard deviation σ = 0.008
Alignment  Number of Pairs  Percentage
0=>1       357              0.13%
1=>0       751              0.28%
1=>1       215,296          81.65%
1=>2       17,412           6.60%
2=>1       29,056           11.04%
2=>2       801              0.30%
Total number of sentence pairs = 312,544
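A sketch of the length statistic from Gale & Church, with the slide's constants plugged in. Gale & Church define δ = (l2 − l1·c) / sqrt(l1·s²); treating the slide's σ as their s is an assumption, as is the function name `length_delta`. The full aligner, which combines δ with match-type priors in a dynamic program, is not shown.

```python
import math

C = 0.6502     # mean Chinese characters per English character (from the slide)
SIGMA = 0.008  # standard deviation from the slide; its use as s is an assumption


def length_delta(len_en, len_zh, c=C, sigma=SIGMA):
    """Normalised difference between the observed and expected Chinese length.

    len_en, len_zh: character lengths of the candidate English and Chinese
    segments (len_en must be positive). A value near 0 means the pair has
    the length ratio the model expects.
    """
    return (len_zh - c * len_en) / (sigma * math.sqrt(len_en))


print(length_delta(100, 65.02))  # close to 0: lengths match the expected ratio
print(length_delta(100, 70))     # positive: Chinese side longer than expected
```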
Percent Agreement
                J1: Relevant  J1: Irrelevant
J2: Relevant    A             B
J2: Irrelevant  C             D
$\%\,\text{agreement} = \frac{A + D}{A + B + C + D}$
Averaged over 3 judge-judge pairs
[Figure: Humans: Percent Agreement (20-cluster average) and compression; x-axis: compression (5 to 90), y-axis: % agreement (0 to 1)]
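A small sketch of percent agreement for one judge pair, plus the average over all judge-judge pairs. Representing each judge as a list of booleans (True = sentence judged relevant) is an assumption about the data layout, and the toy judgements are invented.

```python
from itertools import combinations


def percent_agreement(j1, j2):
    """(A + D) / (A + B + C + D) for two equal-length lists of booleans."""
    agree = sum(a == b for a, b in zip(j1, j2))
    return agree / len(j1)


def average_agreement(judges):
    """Mean percent agreement over all judge-judge pairs."""
    pairs = list(combinations(judges, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Toy judgements for four sentences by three judges (illustrative only).
toy_judges = [
    [True, True, False, False],
    [True, False, False, False],
    [True, True, False, True],
]
print(percent_agreement(toy_judges[0], toy_judges[1]))
print(average_agreement(toy_judges))
```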
Precision and recall
                J1: Relevant  J1: Irrelevant
J2: Relevant    A             B
J2: Irrelevant  C             D
$\mathrm{Precision}(J_1) = \mathrm{Recall}(J_2) = \frac{A}{A + C}$
$\mathrm{Recall}(J_1) = \mathrm{Precision}(J_2) = \frac{A}{A + B}$
[Figure: Humans: precision/recall (cluster average) and compression; x-axis: compression (5 to 90), y-axis: p/r (0 to 1); series: Random, Humans]
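A sketch of precision and recall when one judge's extract is scored against another's. Each extract is represented as a set of selected sentence numbers, which is an assumption about the data layout; the sets below are invented.

```python
def precision_recall(extract, reference):
    """Precision and recall of `extract` measured against `reference`.

    Both arguments are non-empty sets of selected sentence numbers.
    """
    both = len(extract & reference)  # cell A of the contingency table
    return both / len(extract), both / len(reference)


# Toy extracts by two judges (illustrative only).
j1 = {1, 2, 3}
j2 = {2, 3, 4, 5}

p1, r1 = precision_recall(j1, j2)  # J1 scored against J2
p2, r2 = precision_recall(j2, j1)  # J2 scored against J1
print(p1, r1, p2, r2)
```

Swapping the roles of the two judges swaps precision and recall, which is why the slide pairs Precision(J1) with Recall(J2).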
[Figure: Precision and recall, human agreement; x-axis: cluster no. (2, 46, 54, 60, 61, 62, 112, 125, 199, 323, 398, 447, 551, 827, 883, 885, 1014, 1197, 241, 1018), y-axis: p/r (0 to 1); one series per compression rate, 5% to 90%]
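Kappa
• N: number of items (index i)
• n: number of categories (index j)
• k: number of annotators
• m_ij: number of annotators who assign item i to category j
$K = \frac{P(A) - P(E)}{1 - P(E)}$
$P(A) = \frac{1}{N k (k - 1)} \sum_{i=1}^{N} \sum_{j=1}^{n} m_{ij}^2 - \frac{1}{k - 1}$
$P(E) = \sum_{j=1}^{n} \left( \frac{\sum_{i=1}^{N} m_{ij}}{N k} \right)^2$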
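A sketch of multi-annotator kappa in the usual Fleiss / Siegel and Castellan form, using the slide's symbols N, n, and k. The count matrix `m`, where `m[i][j]` is the number of annotators placing item i in category j, and the toy data are assumptions.

```python
def kappa(m, k):
    """Kappa from an N x n matrix of per-item category counts.

    m[i][j] = number of the k annotators who put item i in category j,
    so every row of m sums to k.
    """
    N = len(m)      # number of items
    n = len(m[0])   # number of categories
    # Observed agreement P(A).
    p_a = sum(x * x for row in m for x in row) / (N * k * (k - 1)) - 1 / (k - 1)
    # Chance agreement P(E) from the overall category proportions.
    p_e = sum((sum(row[j] for row in m) / (N * k)) ** 2 for j in range(n))
    return (p_a - p_e) / (1 - p_e)


# Three annotators, two categories (e.g. selected / not selected).
perfect = [[3, 0], [0, 3]]        # annotators always agree
mixed = [[2, 1], [1, 2], [3, 0]]  # partial agreement
print(kappa(perfect, k=3), kappa(mixed, k=3))
```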
[Figure: Humans: Kappa and compression; x-axis: compression (5 to 90), y-axis: K (0 to 1)]
[Figure: Kappa, human agreement; x-axis: cluster no. (2, 46, 54, 60, 61, 62, 112, 125, 199, 323, 398, 447, 551, 827, 883, 885, 1014, 1197, 241, 1018), y-axis: K; one series per compression rate, 5% to 90%]
[Figure: Kappa, human agreement, 40%; x-axis: cluster no., y-axis: K]
[Figure: Kappa, different systems in annotator pool, 20%; x-axis: cluster no., y-axis: K; series: Random, Websum, MEAD, Humans, LEAD]
Kappa, different systems in annotator pool, avg, 20%
System  K
Random  0.097
Websum  0.146
MEAD    0.168
Humans  0.194
Lead    0.213
[Figure: Multi-document summaries of length 50 words, kappa on 10 clusters; x-axis: cluster no. (112, 125, 199, 241, 323, 398, 551, 883, 1014, 1197), y-axis: K; series: MEAD, Humans]
Content Based Evaluation
• Precision and Recall
F({S1,S2},{S3,S4}) = 0
S1 “The US President visited China”
S4 “The visit of the President of the US to China”
• Vocabulary Overlap
• Text Representation: Bag, Sequence, Vector
• Units: Words, Lemmas, Phrases, Verbs, Nouns, etc.
F(TEXT1, TEXT2) → 0..1
Content Based Evaluation Mono Lingual
[Diagram relating human extracts, multi-doc summaries, automatic extracts (single, multi), and the full document.]
Content Based Evaluation Cross Lingual
[Diagram relating human extracts (CH), automatic extracts (CH; single, multi), the full document (CH), and the human extract (EN).]
Text Representation for Content Based Similarity
<S PAR='4' RSNT='1' SNO='4'>
 <W C='DT' L='a'>A</W>
 <W C='NNP' L='council'>Council</W>
 <W C='NN' L='spokesman'>spokesman</W>
 <W C='VBD' L='say'>said</W>
 <W C='IN' L='that'>that</W>
 <W C='DT' L='the'>the</W>
 <W C='JJ' L='five-storey'>five-storey</W>
 <W C='NN' L='market'>market</W>
 <W C=',' L=','>,</W>
 <W C='RB' L='conveniently'>conveniently</W>
 <W C='VBN' L='locate'>located</W>
 <W C='IN' L='at'>at</W>
 …
a council spokesman said that the five-storey market , conveniently located at 43 - 47 centre street , occupied an area of about 1,200 square metres and accommodated a total of 102 stalls providing a wide range of fresh goods including vegetables , fish , meat and poultry (words)
a council spokesman say that the five-storey market , conveniently locate at 43 - 47 centre street , occupy an area of about 1,200 square metre and accommodate a total of 102 stall provide a wide range of fresh goods include vegetable , fish , meat and poultry (lemmas )
• Words, Lemmas, Nouns, Verbs, …
• Term Frequency (word/lemma)
• Inverse Document Frequency (word/lemma)
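Similarity Measures
• Cosine
$\cos(t_1, t_2) = \frac{\sum_{i=1}^{t} w_{1i} \, w_{2i}}{\sqrt{\sum_{i=1}^{t} (w_{1i})^2 \cdot \sum_{i=1}^{t} (w_{2i})^2}}$
• Overlap
$\mathrm{overlap}(t_1, t_2) = \frac{|t_1 \cap t_2|}{|t_1| + |t_2| - |t_1 \cap t_2|}$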
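Both measures can be sketched over bags of words. Using raw term counts as the weights w (rather than tf*idf) and sets of distinct words for the overlap are simplifying assumptions, as are the function names. The example reuses sentences S1 and S4 from the earlier slide.

```python
import math
from collections import Counter


def cosine(t1, t2):
    """Cosine of two token lists, with raw term counts as weights."""
    w1, w2 = Counter(t1), Counter(t2)
    dot = sum(w1[term] * w2[term] for term in w1)
    norm = math.sqrt(sum(v * v for v in w1.values())) * \
        math.sqrt(sum(v * v for v in w2.values()))
    return dot / norm if norm else 0.0


def overlap(t1, t2):
    """|t1 ∩ t2| / (|t1| + |t2| - |t1 ∩ t2|) over sets of distinct words."""
    s1, s2 = set(t1), set(t2)
    shared = len(s1 & s2)
    denom = len(s1) + len(s2) - shared
    return shared / denom if denom else 0.0


s1 = "the us president visited china".split()
s4 = "the visit of the president of the us to china".split()
print(cosine(s1, s4), overlap(s1, s4))
```

Unlike sentence co-selection, which gives these two sentences no credit, both measures reward their shared vocabulary.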
Similarity Measures
• Longest Common Subsequence
$2 \cdot \mathrm{lcs}(x, y) = l_x + l_y - \mathrm{edit}_{di}(x, y)$
$\mathrm{lcs}_{text}(t_1, t_2) = \frac{\sum_i \max(\mathrm{lcs}_{ij}) + \sum_j \max(\mathrm{lcs}_{ji})}{l_{t_1} + l_{t_2}}$
美國 總統 和 第一 夫人
美國 總統 和 第一 夫人
THE US PRESIDENT AND THE FIRST LADY
THE PRESIDENT OF US AND THE 1ST LADY
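A sketch of word-level longest common subsequence, applied to the two English sentences on the slide. Reading edit_di as the edit distance that allows only deletions and insertions is an assumption; under that reading it follows from the LCS length by the identity on the slide.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            # Extend a match, or carry over the best length seen so far.
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def edit_di(x, y):
    """Deletion/insertion edit distance, from 2*lcs = l_x + l_y - edit_di."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


x = "THE US PRESIDENT AND THE FIRST LADY".split()
y = "THE PRESIDENT OF US AND THE 1ST LADY".split()
print(lcs_length(x, y), edit_di(x, y))
```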
MONO LINGUAL EVALUATION HUMAN EXTRACTS (COSINE, LEMMAS, NOUNS)
       10S   20S   30S   40S   AVERAGE
C_125  0.50  0.74  0.85  0.91  0.75
C_241  0.59  0.75  0.81  0.90  0.76
C_323  0.48  0.57  0.69  0.75  0.62
C_551  0.28  0.75  0.74  0.75  0.63
MONO LINGUAL EVALUATION HUMAN EXTRACTS (LCS, LEMMAS, NOUNS)
       10S   20S   30S   40S   AVERAGE
C_125  0.38  0.59  0.72  0.81  0.62
C_241  0.36  0.54  0.63  0.76  0.57
C_323  0.33  0.40  0.55  0.59  0.47
C_551  0.17  0.55  0.52  0.53  0.44
CROSS LINGUAL EVALUATION HUMAN EXTRACTS (COSINE)
       10S   20S   30S   40S   AVERAGE
C_112  0.29  0.52  0.62  0.74  0.54
C_125  0.57  0.77  0.86  0.92  0.78
C_241  0.45  0.65  0.75  0.86  0.68
C_323  0.43  0.54  0.69  0.76  0.60
C_551  0.30  0.69  0.73  0.75  0.62
CROSS LINGUAL EVALUATION HUMAN EXTRACTS (LCS CHR)
       10S   20S   30S   40S   AVERAGE
C_112  0.31  0.49  0.57  0.70  0.52
C_125  0.39  0.58  0.71  0.80  0.62
C_241  0.33  0.52  0.61  0.73  0.55
C_323  0.36  0.43  0.60  0.67  0.51
C_551  0.18  0.53  0.57  0.59  0.47
MONO LINGUAL (COSINE) C_112
         10S   20S   30S   40S
SUMM     0.01  0.12  0.19  0.24
LEAD     0.27  0.47  0.55  0.72
MEAD     0.37  0.58  0.70  0.81
RANDOM   0.22  0.45  0.60  0.63
WEBSUMM  0.35  0.48  0.62  0.69
MONO LINGUAL EVALUATION (COSINE, LEMMAS, NOUNS) Q112
         10S   20S   30S   40S
LEAD     0.31  0.52  0.60  0.76
MEAD     0.42  0.63  0.74  0.84
RANDOM   0.27  0.50  0.66  0.69
WEBSUMM  0.42  0.53  0.68  0.75
MONO LINGUAL ALL CLUSTERS (COSINE)
          C_112  C_125  C_241  C_323  C_551  AVERAGE
CYL       0.14   0.33   0.41   0.47   0.48   0.366
LEAD      0.71   0.68   0.57   0.65   0.66   0.654
MEAD      0.63   0.61   0.64   0.69   0.62   0.638
RANDOM    0.45   0.5    0.51   0.57   0.5    0.506
LEXCHAIN  0.41   0.52   0.53   0.64   0.42   0.504
WEBSUMM   0.54   0.67   0.75   0.66   0.75   0.674
MONO LINGUAL ALL CLUSTERS (LCS)
          C_112  C_125  C_241  C_323  C_551  AVERAGE
CYL       0.26   0.48   0.37   0.47   0.29   0.37
LEAD      0.44   0.58   0.45   0.63   0.71   0.56
LEXCHAIN  0.47   0.56   0.45   0.62   0.60   0.54
MEAD      0.55   0.58   0.56   0.53   0.49   0.54
RANDOM    0.44   0.43   0.43   0.46   0.48   0.45
WEBSUMM   0.48   0.48   0.55   0.46   0.63   0.52
CROSS LINGUAL EVALUATION (LCS, CHR) Q112
        10S   20S   30S   40S
CYL     0.22  0.47  0.56  0.64
LEAD    0.26  0.38  0.47  0.62
MEAD    0.37  0.53  0.63  0.70
RANDOM  0.26  0.38  0.47  0.47
CROSS LINGUAL ALL CLUSTERS (COSINE)
        C_112  C_125  C_241  C_323  AVERAGE
CYL     0.49   0.79   0.62   0.7    0.65
LEAD    0.46   0.78   0.61   0.69   0.64
MEAD    0.61   0.75   0.69   0.66   0.68
RANDOM  0.45   0.68   0.59   0.55   0.57
CROSS LINGUAL ALL CLUSTERS (LCS)
        C_112  C_125  C_241  C_323  AVERAGE
CYL     0.43   0.56   0.43   0.60   0.51
LEAD    0.40   0.56   0.42   0.57   0.49
MEAD    0.52   0.53   0.50   0.57   0.53
RANDOM  0.36   0.45   0.36   0.44   0.40
MULTI DOCUMENT SUMMARIES AGREEMENT (COSINE)
       50    200   300   AVERAGE
C_125  0.31  0.41  0.58  0.43
C_241  0.43  0.52  0.63  0.53
C_323  0.32  0.41  0.50  0.41
C_551  0.64  0.47  0.73  0.61
Multi-document summary evaluation (4 queries)
Summary vs. automatic extract
     BIG       COS01     COSINE    WRD
50   0.039854  0.278973  0.171477  0.198361
100  0.043703  0.277186  0.219399  0.210311
200  0.067433  0.334611  0.429588  0.227994
Summary vs. manual extract
     BIG       COS01     COSINE    WRD
50   0.063069  0.334818  0.356008  0.233835
100  0.068872  0.322049  0.436582  0.250311
200  0.079958  0.330042  0.500884  0.253613
Relative Utility
Sentence  Ideal  System 1  System 2
S1        +      +         -
S2        +      +         +
S3        -      -         -
S4        -      -         +
S5        -      -         -
S6        -      -         -
S7        -      -         -
S8        -      -         -
S9        -      -         -
S10       -      -         -
RU = % of ideal utility covered by system summary
Sentence  Ideal  System 1  System 2
S1        10(+)  +         -
S2        8(+)   +         +
S3        2      -         -
S4        7      -         +
S5        7      -         -
S6        6      -         -
S7        7      -         -
S8        4      -         -
S9        3      -         -
S10       3      -         -
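A single-judge sketch of relative utility using the utility scores in the table above. The function name, the dictionary layout, and the choice to normalise by the best extract of the same size are assumptions; the slide only gives the one-line definition.

```python
def relative_utility(selected, utility, ideal_size):
    """Utility covered by `selected`, as a fraction of the best achievable
    utility for an extract of `ideal_size` sentences."""
    ideal = sum(sorted(utility.values(), reverse=True)[:ideal_size])
    return sum(utility[s] for s in selected) / ideal


# Judge-assigned utility of each sentence, copied from the table.
utility = {"S1": 10, "S2": 8, "S3": 2, "S4": 7, "S5": 7,
           "S6": 6, "S7": 7, "S8": 4, "S9": 3, "S10": 3}

system1 = ["S1", "S2"]  # the ideal extract
system2 = ["S2", "S4"]  # shares only S2 with the ideal extract

print(relative_utility(system1, utility, ideal_size=2))
print(relative_utility(system2, utility, ideal_size=2))
```

Under plain co-selection System 2 gets credit for one sentence out of two, while relative utility credits it with 15 of the 18 available utility points.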
Relative utility (upper and lower bounds), Q125, 5%
   A      B      C      D      E      F      G      H      I      J
R  0.648  0.65   0.652  0.465  0.626  0.727  0.509  0.497  0.644  0.566
J  0.715  0.666  0.859  0.726  0.876  0.944  0.909  0.776  0.71   0.869
Relative utility (upper and lower bounds), Q125, 20%
   A      B      C      D      E      F      G      H      I      J
R  0.69   0.685  0.679  0.523  0.642  0.741  0.541  0.553  0.699  0.595
J  0.827  0.73   0.866  0.828  0.838  0.913  0.861  0.876  0.736  0.874
Relative utility (upper and lower bounds), Q125, 40%
   A      B      C      D      E      F      G      H      I      J
R  0.74   0.738  0.724  0.653  0.695  0.77   0.647  0.679  0.764  0.664
J  0.836  0.754  0.878  0.954  0.91   0.952  0.919  0.954  0.811  0.904
Relative Utility (RU) per summarizer and compression rate (Single-document)
Summarizer  5      10     20     30     40     50     60     70     80     90
J           0.785  0.79   0.81   0.833  0.853  0.875  0.913  0.94   0.962  0.982
R           0.636  0.65   0.68   0.711  0.738  0.765  0.804  0.84   0.896  0.961
WEBS        0.761  0.765  0.776  0.801  0.828
MEAD        0.748  0.756  0.764  0.782  0.808  0.834  0.863  0.895  0.921  0.968
LEAD        0.733  0.738  0.772  0.797  0.829  0.85   0.877  0.906  0.936  0.973
Relative Utility (RU) per compression rate (Multi-document)
   5       10      20      30
R  0.6116  0.6302  0.6614  0.6894
S  0.6928  0.7246  0.7476  0.766
J  0.6886  0.7296  0.7582  0.7904
Relevance Preservation Value (RPV) as a function of compression rate (RANDOM)
Summary length (%)    5      10     20    30     40     50     60     70     80     90
Query 112             0.5    0.64   0.8   0.86   0.91   0.93   0.95   0.97   0.98   0.99
Query 125             0.44   0.66   0.78  0.87   0.91   0.91   0.96   0.97   0.98   0.99
Query 241             0.68   0.77   0.87  0.91   0.94   0.96   0.97   0.98   0.99   1
Query 323             0.63   0.78   0.85  0.9    0.93   0.95   0.97   0.98   0.99   1
Query 551             0.52   0.69   0.79  0.88   0.92   0.94   0.95   0.97   0.98   0.99
AVERAGE (10 queries)  0.553  0.687  0.8   0.874  0.912  0.932  0.956  0.973  0.984  0.992
Relevance Preservation Value (RPV) for different summarizers (English, 20%)
Query     FD  MEAD   WEBS   LEAD   RAND  SUMM
Q125      1   0.92   0.82   0.8    0.78  0.79
Q551      1   0.9    0.88   0.81   0.79  0.81
AVG(10Q)  1   0.903  0.843  0.802  0.8   0.775
Q112      1   0.91   0.88   0.8    0.8   0.77
Q241      1   0.93   0.89   0.84   0.87  0.85
Q323      1   0.92   0.91   0.85   0.85  0.88
Relevance Preservation Value (RPV) for different summarizers (Chinese, 20%)
Query     FD  MEAD  SUMM   ALGN   LEAD   RAND
Q112      1   0.87  0.76   0.74   0.72   0.71
Q323      1   0.66  0.84   0.59   0.58   0.6
Q551      1   0.91  0.75   0.72   0.75   0.74
AVG(10Q)  1   0.85  0.755  0.738  0.733  0.744
Q125      1   0.87  0.75   0.72   0.71   0.75
Q241      1   0.93  0.85   0.83   0.83   0.85
Relevance Preservation Value (RPV) per compression rate and summarizer (English, 5 queries)
Compression rate  FD  MEAD   WEBS   LEAD   SUMM   RAND
5%                1   0.724  0.73   0.66   0.622  0.554
10%               1   0.834  0.804  0.73   0.71   0.708
20%               1   0.916  0.876  0.82   0.82   0.818
30%               1   0.946  0.912  0.88   0.848  0.884
40%               1   0.962  0.936  0.906  0.862  0.922
Relevance Preservation Value (RPV) with and without cutoff (English, 5%)
Correlation method  SUMM  LEAD  MEAD  RAND  WEBS
with cutoff         0.48  0.55  0.61  0.29  0.6
no cutoff           0.61  0.59  0.74  0.44  0.63
Relevance Preservation Value (RPV) with and without cutoff (English, 10%)
Correlation method  SUMM  LEAD  MEAD  RAND  WEBS
with cutoff         0.65  0.65  0.76  0.56  0.7
no cutoff           0.73  0.71  0.84  0.66  0.72
Relevance Preservation Value (RPV) with and without cutoff (English, 20%)
Correlation method  SUMM  LEAD  MEAD  RAND  WEBS
with cutoff         0.71  0.74  0.88  0.72  0.8
no cutoff           0.79  0.8   0.92  0.78  0.82
Relevance Preservation Value (RPV) per MEAD policy (5 queries)
Query  ASGEMEAD  MEADORIG  MEAD002  MEAD003  MEADS002
Q551   0.88      0.9       0.89     0.89
Q112   0.86      0.91      0.9      0.9      0.9
Q-AVG  0.886     0.916     0.908    0.908    0.9125
Q125   0.87      0.92      0.91     0.91     0.91
Q323   0.89      0.92      0.91     0.91     0.91
Q241   0.93      0.93      0.93     0.93     0.93
RPV, E/C-Mono/C-Xlingual (AVG-5, 20%)
BASE          English  Chinese-Mono  Chinese-X
English       91.6     48.8          45.8
Chinese-Mono  61.4     84.8          34.6
Chinese-X     63.2     36            69.4
Addressing redundancy
Sentence  Ideal  System 1  System 2
S1        10(+)  +         -
S2        8(+)   +         +
S3        2      -         -
S4        7      -         +
S5        7      -         -
S6        6      -         -
S7        7      -         -
S8        4      -         -
S9        3      -         -
S10       3      -         -
RU1 = (8+10)/(8+10)
RU2 = (8+7)/(8+10)
RU1’ = (8+10)/(8+10)
RU2’ = (8+7)/(8+10)
Total number of named entities in MEAD and Random summaries
[Figure: bar chart, MEAD vs. RANDOM; y-axis: 0 to 100,000]
Number of different named entities in MEAD and Random summaries
[Figure: bar chart, MEAD vs. RANDOM; y-axis: 0 to 20,000]
43.5% of named entities are kept in 20% MEAD summaries
22.6% of named entities are kept in 20% Random summaries
59.3% of different named entities are kept in 20% MEAD summaries
40.7% of different named entities are kept in 20% Random summaries
Narcotics rehabilitation translation: 毒品 康復
The study was commissioned to The Chinese University of Hong Kong, and aims to find out the underlying causes for the chronic abuse of illicit drugs and the relapse into drug addiction after detoxification and rehabilitation. Commenting on the research topic, a spokesman of the Narcotics Division said, "It is considered that there is a demonstrated need for such a project, as relapse into drug addiction following detoxification and rehabilitation has been an issue of great concern for policy-makers and workers in the anti-drug field." "It is hoped that the findings of the study will help the Government and ACAN develop effective treatment strategies for chronic or frequently relapsed drug abusers, and look into what modifications should be made to the existing methadone programme. "It is recognized in psychiatric field that, drug abuse cases which are identified early usually stand a better chance of recovering in full, while late ones normally have more severe or perhaps life-long psychiatric damage," the spokesman said.
Run MEAD on English document
Run MEAD on Chinese document
Sentence Alignment
Commenting on the research topic, a spokesman of the Narcotics Division said, "It is considered that there is a demonstrated need for such a project, as relapse into drug addiction following detoxification and rehabilitation has been an issue of great concern for policy-makers and workers in the anti-drug field." "It is hoped that the findings of the study will help the Government and ACAN develop effective treatment strategies for chronic or frequently relapsed drug abusers, and look into what modifications should be made to the existing methadone programme. "This coincide with the observation that relapse is an acute problem for those who have a long drug addiction history, in particular methadone patients," he added. "It is recognized in psychiatric field that, drug abuse cases which are identified early usually stand a better chance of recovering in full, while late ones normally have more severe or perhaps life-long psychiatric damage," the spokesman said.
Cosine similarity with human extracts
[Figure: bar chart for clusters 125, 241, 323, 551, 112; y-axis: 0 to 0.8; series: English MEAD summaries, aligned English summaries from Chinese MEAD summaries]
[Figure: PR graph of monolingual retrieval of English full-length documents; axes: precision and recall (0 to 1); one curve per query: Q-125, Q-241, Q-323, Q-551, Q-112, Q-199, Q-398, Q-883, Q-1014, Q-1197]
[Figure: PR graph of monolingual retrieval of English full-length documents; axes: precision and recall (0 to 1); one curve per query: Q-447, Q-827, Q-885, Q-2, Q-46, Q-54, Q-60, Q-61, Q-62, Q-1018]
[Figure: PR graph of monolingual retrieval of Chinese full-length documents; axes: precision and recall (0 to 1); one curve per query: Q-125, Q-241, Q-323, Q-551, Q-112, Q-199, Q-398, Q-883, Q-1014, Q-1197]
[Figure: PR graph of monolingual retrieval of Chinese full-length documents; axes: precision and recall (0 to 1); one curve per query: Q-447, Q-827, Q-885, Q-2, Q-46, Q-54, Q-60, Q-61, Q-62, Q-1018]
[Figure: PR graph of retrieval of English full-length documents (20 query average); axes: precision and recall; series: Chinese crosslingual, Chinese monolingual, English monolingual]
[Figure: PR graph of retrieval of English summaries (20%); axes: precision and recall; series: RANDOM, Sent-Algn-MEAD, SUMM, WEBSUMM, LEAD, MEAD, full-length]
                Avg. Prec.
Full-length     0.4387
MEAD            0.3892
lead            0.3845
Websumm         0.3536
random          0.2993
Summarist       0.351
Sent-Algn-MEAD  0.3298
[Figure: Precision at the first 30 retrieved documents/summaries (20%); y-axis: precision (0.4 to 0.49)]
Research contributions
• Relevance preservation value: compared to established evaluation metrics
• Relative utility: large-scale evaluation
• Comparison of query-based and generic summarization
• Comparison of manual extracts and manual summaries
• Cross-lingual summarization using alignment
• Evaluation of cross-lingual summaries
Properties of evaluation metrics
Metric families (columns): Kappa, P/R, accuracy; RU; Word overlap, cosine, lcs; Relevance preserv.
Agreement human extracts: X X X
Agreement human extracts – automatic extracts: X X X
Agreement human summaries/extracts: X
Non-binary decisions: X X X
Full documents vs. extracts: X X
Systems with different sentence segm.: X X
Multidocument extracts: X X X
Full corpus coverage: X X
Technical accomplishments
• Develop a summarization toolkit including a modular state-of-the-art summarizer: single-document, multi-document, generic, query-based
• Develop a summarization evaluation toolkit allowing comparisons between extractive and non-extractive summaries
• Produce an annotated corpus for further research in text summarization
Future work
• Evaluate trainable framework
• Explore subsumption
• Fact-based evaluation
• Task-based evaluation
• Determine optimal compression rate
Acknowledgements
• Chin-Yew Lin, ISI
• Inderjeet Mani, MITRE
• Breck Baldwin, Coreference.com
• Regina Barzilay, Columbia
• Greg Silber, Delaware
• Dan Melamed, NYU
• Ralph Weischedel, Sean Boisen, BBN
• Stephanie Strassel, LDC
• Dave Graff, LDC
• Chris Cieri, LDC
• Donna Harman, NIST
• Paul Over, NIST
• Sasha Blair-Goldensohn, Michigan
• Sanjeev Khudanpur, JHU
• Bill Byrne, JHU
• Laura Graham, JHU
• Jacob Laderman, JHU
• Fred Jelinek, JHU
• The US government agencies who sponsored the research