Impact of Citing Papers for (Extractive) Summarisation of Clinical Documents
Diego Molla (1), Christopher Jones (1), Abeed Sarker (2)
(1) Macquarie University, Sydney, Australia; (2) Arizona State University, Tempe, AZ, USA
ALTA 2014, Melbourne, Australia
Background and Related Work Finding the Best Fit to a Citance Extracting the Summary
Citation-based summarisation
Extractive Summarisation
Build a summary by extracting text from the original document.
Citation-based summarisation
Use information from citing texts to build a summary.
- Citance: Text surrounding a reference in a citing paper.
Citing Papers for Summarisation D Molla, C Jones, A Sarker 2/21
Contents
Background and Related Work
Finding the Best Fit to a Citance
Extracting the Summary
Related Work
Citation-based summarisation
- Garfield et al. (1964) performed manual citation analysis to characterise a publication.
- Nakov et al. (2004) used the text from citances to build a summary.
- Qazvinian & Radev (2010) automated the extraction of citances.
- Further work on using citances as a surrogate of, or as an addition to, the document summary (Mohammad et al. 2009; Abu-Jbara & Radev 2011).
TAC2014 BiomedSumm Track
Data set
- Training and development: 20 documents.
  - 4 annotators per document.
- Test set: 30 documents.
- Each document has 10 referring papers.
Tasks
Task 1a: Identify the text spans from the reference paper that most accurately reflect the text from the citance.
Task 1b: Classify what facet of the paper a text span belongs to.
Task 2: Generate a structured summary of the reference paper and all of the community discussion of the paper represented in the citances.
Citances for Extractive Summarisation
Research Question
Can information from citances help to extract sentences from the reference paper?
Method
1. Rank sentences from reference paper according to their fit to citances (BiomedSumm task 1a).
2. Integrate the sentence rankings into extractive summarisation.
3. Compare against versions that do not incorporate the sentence rankings.
General Approach
Approach
1. Build vector models of each sentence (tf.idf or SVD).
2. Compute cosine similarity between sentence and citance.
3. Extract top n = 5 sentences (directly, or after re-ranking with Maximal Marginal Relevance, MMR).
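The difference between taking the top n sentences directly and taking them after MMR can be sketched as follows. This is a minimal illustration of the standard greedy MMR formulation, not necessarily the exact variant used by the authors; the function name `mmr_select`, its parameters and the toy similarity values are all assumptions made for the example. The default λ = 0.97 is the value that appears in the results tables.

```python
def mmr_select(relevance, similarity, n=5, lam=0.97):
    """Greedy Maximal Marginal Relevance selection (hypothetical helper).

    relevance[i]     : similarity of sentence i to the citance
    similarity[i][j] : similarity between sentences i and j
    Returns the indices of the selected sentences, in selection order.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < n:
        def mmr(i):
            # Penalise sentences that resemble something already selected.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is different.
relevance = [0.9, 0.85, 0.3]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.1],
              [0.1, 0.1, 1.0]]
straight = sorted(range(len(relevance)), key=lambda i: -relevance[i])[:2]
diverse = mmr_select(relevance, similarity, n=2, lam=0.5)
```

With λ close to 1 the selection is almost identical to the direct top-n ranking; lowering λ trades relevance for diversity, which is why the toy call with `lam=0.5` skips the near-duplicate sentence.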
tf.idf and SVD
Finding best match to citance 1 in reference paper 1
[Figure: the sentences of reference paper rp1 (rp 1, rp 2, ...) and the citances to rp1 (c 1, c 2, ...) form the rows of a tf.idf (+SVD) matrix with one column per word (w1, w2, ...). The cosine between each reference-paper row and the citance row gives score rp 1, score rp 2, ...]
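The matching step can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: whitespace tokenisation, the smoothed idf formula, and the names `fit_idf`, `tfidf_vector`, `cosine` and `rank_sentences` are all choices made for the example, and the optional SVD step is left out. The hypothetical `extra_sentences` argument shows one way additional text could feed the idf weights.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokens are good enough for a sketch.
    return text.lower().split()


def fit_idf(sentences):
    """Inverse document frequency, treating each sentence as a document."""
    n = len(sentences)
    df = Counter(word for s in sentences for word in set(tokenize(s)))
    return {word: math.log(n / count) + 1.0 for word, count in df.items()}


def tfidf_vector(sentence, idf):
    tf = Counter(tokenize(sentence))
    return {w: tf[w] * idf[w] for w in tf if w in idf}


def cosine(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sentences(rp_sentences, citance, extra_sentences=(), n=5):
    """Return (indices of the n best reference-paper sentences, all scores)."""
    idf = fit_idf(list(rp_sentences) + [citance] + list(extra_sentences))
    c_vec = tfidf_vector(citance, idf)
    scores = [cosine(tfidf_vector(s, idf), c_vec) for s in rp_sentences]
    order = sorted(range(len(rp_sentences)), key=lambda i: -scores[i])
    return order[:n], scores


rp = ["statins reduce cholesterol in adults",
      "the trial enrolled 200 patients",
      "adverse events were rare"]
citance = "smith et al. showed that statins reduce cholesterol"
top, scores = rank_sentences(rp, citance, n=1)
```

In the toy data only the first reference-paper sentence shares words with the citance, so it is the one returned in `top`.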
Adding Data to the tf.idf (SVD) Model
Topics: All sentences from citing papers of reference paper 1.
Documents: All sentences from all BiomedSumm documents.
Abstracts: All sentences from a separate collection of 2,657 PubMed abstracts.
Finding best match to citance 1 in reference paper 1
[Figure: the tf.idf (+SVD) matrix over words w1, w2, ... has one row per reference-paper sentence (rp 1, rp 2, ...) and per citance (c 1, c 2, ...), plus rows for the added data: topic sentences (t 1, t 2, ...), document sentences (d 1, d 2, ...) or abstract sentences (a 1, a 2, ...). The cosine with the citance row gives score rp 1, score rp 2, ...]
Adding Context to the tf.idf (SVD) Model
Extend sentences with a context window of n sentences.
Finding best match to citance 1 in reference paper 1, context window n = 20
[Figure: each row of the tf.idf (+SVD) matrix is now a window of consecutive sentences rather than a single sentence: rp sentences 1..10, 2..11, ...; citances 1..10, 2..11, ...; topic sentences 1..10, 2..11, ... The cosine step gives score rp 1, score rp 2, ...]
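One way to build such context windows is sketched below. The slide does not pin down where the window sits relative to the sentence, so the centred placement (n // 2 sentences on each side, clipped at the document boundaries) is an assumption, as is the helper name `with_context`.

```python
def with_context(sentences, n):
    """Extend each sentence with its neighbours (hypothetical helper).

    Assumption: the window is centred on the sentence, with n // 2
    sentences on each side, clipped at the start and end of the document.
    """
    half = n // 2
    extended = []
    for i in range(len(sentences)):
        start = max(0, i - half)
        end = min(len(sentences), i + half + 1)
        extended.append(" ".join(sentences[start:end]))
    return extended


sentences = ["s1", "s2", "s3", "s4", "s5"]
windows = with_context(sentences, n=2)
```

The extended strings would then replace the single sentences when the vector model is built, so that each row carries the vocabulary of its neighbourhood.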
Adding Information from WordNet and UMLS
- Replace word with WordNet synsets / UMLS ID / UMLS semantic types.
- Use word and WordNet synsets / UMLS ID / UMLS semantic types.
- Linear combination of scores: 0.5 × w + 0.2 × c + 0.3 × s.
Finding best match to citance 1 in reference paper 1, context window n = 20
[Figure: context windows (1..10, 2..11, ...) of the reference paper, the citances and the topics form the rows of a tf.idf (+SVD) matrix over words w1, w2, ... Three variants of the reference-paper rows are shown: synsets/UMLS in place of words, synsets/UMLS plus words, and plain sentences. The cosine step gives per-sentence scores (score 1, score 2, ...; labelled w 1, w 2, ... in the plain-sentence variant), which are added (+) to the scores c 1, c 2, ... and s 1, s 2, ...]
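The linear combination can be written out as a small helper. The slide does not spell out what w, c and s stand for; the sketch assumes they are the per-sentence cosine scores from the word model, the UMLS concept ID model and the UMLS semantic type model respectively. The function name `combine_scores` and the toy scores are made up for illustration.

```python
def combine_scores(w, c, s, weights=(0.5, 0.2, 0.3)):
    """Per-sentence linear combination 0.5*w + 0.2*c + 0.3*s.

    Assumption: w, c and s are parallel lists of cosine scores, one entry
    per reference-paper sentence, produced by three separate models.
    """
    if not (len(w) == len(c) == len(s)):
        raise ValueError("score lists must have the same length")
    weight_w, weight_c, weight_s = weights
    return [weight_w * wi + weight_c * ci + weight_s * si
            for wi, ci, si in zip(w, c, s)]


w = [0.40, 0.10]   # toy word-level scores
c = [0.20, 0.50]   # toy concept-level scores
s = [0.10, 0.60]   # toy semantic-type scores
combined = combine_scores(w, c, s)
```

In the toy data the second sentence has the lower word-level score but ends up with the higher combined score, which is the kind of re-ordering the extra knowledge sources are meant to allow.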
Oracle
Oracle
- Use one annotator’s output as the system output.
- Compare against all other annotators.
- Average results.
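The oracle procedure can be sketched as a leave-one-out loop. Two assumptions are made here: the comparison is pairwise (each annotator against each other annotator), and the metric is a simplified ROUGE-L computed from a single longest common subsequence over whitespace tokens, which is not the same as running the official ROUGE toolkit. The names `lcs_length`, `rouge_l` and `oracle_f1` are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified ROUGE-L: (recall, precision, F1) from one LCS."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return recall, precision, 2 * precision * recall / (precision + recall)


def oracle_f1(annotations):
    """Treat each annotator in turn as the system; average F1 over all pairs."""
    scores = [rouge_l(system, reference)[2]
              for i, system in enumerate(annotations)
              for j, reference in enumerate(annotations)
              if i != j]
    if not scores:
        raise ValueError("need at least two annotators")
    return sum(scores) / len(scores)


# Toy annotations: two annotators agree, the third overlaps by half.
annotations = ["a b c d", "a b c d", "a b x y"]
average = oracle_f1(annotations)
```

The resulting average acts as an upper bound for automatic systems: it measures how well one human annotator agrees with the others under the same metric.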
Results
System                                     R      P      F1     F1 95% CI
Abstracts                                  0.190  0.230  0.193  0.179 - 0.208
tf.idf                                     0.331  0.290  0.290  0.276 - 0.303
MMR λ = 0.97                               0.334  0.293  0.293  0.279 - 0.307
SVD with 500 components                    0.334  0.295  0.295  0.281 - 0.308

Topics                                     0.344  0.311  0.307  0.293 - 0.321
0.2c + 0.3s + 0.5w                         0.364  0.294  0.309  0.297 - 0.320
MMR λ = 0.97 on topics                     0.345  0.314  0.311  0.296 - 0.325
Topics + context 20                        0.333  0.334  0.312  0.297 - 0.326
0.2c + 0.3s + 0.5w on topics + context 20  0.356  0.307  0.312  0.299 - 0.325
Documents + context 20                     0.334  0.336  0.314  0.299 - 0.327
Documents                                  0.347  0.325  0.316  0.303 - 0.330
Documents + abstracts                      0.347  0.327  0.317  0.302 - 0.332
MMR λ = 0.97 on topics + context 20        0.336  0.340  0.317  0.303 - 0.331

Topics + context 50                        0.341  0.336  0.318  0.302 - 0.332

Oracle                                     0.442  0.484  0.413  0.404 - 0.421

Table: ROUGE-L results of TAC task 1a, sorted by F1. The best result is in boldface, and all results within the 95% confidence interval range of the best result are in italics.
Oracle, Baselines
Oracle
- Use the data of one annotator.
- Compare against other annotators.
Baselines
- Score sentence i with the sum of tf.idf/SVD of its vector.
- Use same variants as in task 1a (add texts, context, WordNet, UMLS).
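A baseline of this kind can be sketched as follows. The scoring follows the description above (sum of the tf.idf weights of a sentence's words); the plain log idf, the greedy selection under a word budget, and the choice to output the selected sentences in document order are assumptions made for the example, as are the names `sentence_scores` and `extract_summary`. The 250-word default matches the summary size used in the task 2 experiments.

```python
import math
from collections import Counter


def sentence_scores(sentences):
    """Score each sentence with the sum of the tf.idf weights of its words."""
    tokenised = [s.lower().split() for s in sentences]
    n = len(tokenised)
    df = Counter(word for tokens in tokenised for word in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        scores.append(sum(count * math.log(n / df[word])
                          for word, count in tf.items()))
    return scores


def extract_summary(sentences, scores, max_words=250):
    """Greedily pick high-scoring sentences that fit the word budget."""
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Assumption: the summary keeps the original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = ["the drug works",
             "the drug reduced mortality in elderly patients",
             "the end"]
scores = sentence_scores(sentences)
summary = extract_summary(sentences, scores, max_words=9)
```

Because the score is a sum rather than an average, longer sentences tend to score higher, which is visible in the toy data.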
Using Citing Text
Approach
1. Rank sentences as in task 1a.
2. Combine ranks to produce a final score.
\[ \mathrm{score}(i) = \sum_{c \in \mathrm{citances}} \left( 1 - \frac{\mathrm{rank}(i, c)}{n} \right) \]
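The rank combination can be sketched directly from the formula. The slide does not define n or say where ranks start; the sketch assumes that ranks start at 0 for the best-matching sentence and that n is the number of sentences in the reference paper. The function name `combine_ranks` and the toy rankings are illustrative.

```python
def combine_ranks(rankings, n_sentences):
    """score(i) = sum over citances c of (1 - rank(i, c) / n).

    rankings: one list per citance, giving sentence indices from best to
    worst fit. Assumptions: ranks start at 0 and n is the number of
    sentences; a sentence missing from a ranking contributes nothing.
    """
    scores = [0.0] * n_sentences
    for ranking in rankings:
        for rank, i in enumerate(ranking):
            scores[i] += 1 - rank / n_sentences
    return scores


# Two citances, three sentences: both citances rank sentence 2 first.
rankings = [[2, 0, 1], [2, 1, 0]]
scores = combine_ranks(rankings, 3)
best = max(range(3), key=lambda i: scores[i])
```

A sentence that fits many citances well accumulates a high score, so the final summary favours the parts of the paper that the citing community refers to most.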
Results: BiomedSumm Summaries
System                                              R      P      F1     F1 95% CI
Oracle                                              0.459  0.461  0.458  0.446 - 0.470

tf.idf                                              0.260  0.264  0.260  0.226 - 0.290
SVD with 500 components                             0.264  0.247  0.254  0.236 - 0.272
Topics                                              0.260  0.265  0.261  0.226 - 0.292
Documents                                           0.259  0.265  0.260  0.224 - 0.290
Topics + context 5                                  0.259  0.265  0.261  0.226 - 0.291
Topics + context 20                                 0.252  0.261  0.255  0.220 - 0.285

task1a (tf.idf)                                     0.384  0.375  0.378  0.350 - 0.408
task1a (MMR λ = 0.97 on topics)                     0.398  0.396  0.396  0.372 - 0.421
task1a (MMR λ = 0.97 on topics + context 20)        0.420  0.407  0.412  0.385 - 0.438
task1a (0.2c + 0.3s + 0.5w)                         0.398  0.392  0.394  0.369 - 0.419
task1a (0.2c + 0.3s + 0.5w on topics)               0.405  0.399  0.401  0.378 - 0.423
task1a (0.2c + 0.3s + 0.5w on topics + context 20)  0.417  0.404  0.409  0.387 - 0.431

Table: ROUGE-L results of task 2 using the TAC 2014 data. The summary size was constrained to 250 words. In boldface is the best result. In italics are the results within the 95% confidence intervals of the best result.
Results: Targeting Paper Abstracts
System                                              R      P      F1     F1 95% CI
tf.idf                                              0.293  0.192  0.227  0.190 - 0.261
SVD with 500 components                             0.291  0.181  0.218  0.197 - 0.239
Documents                                           0.289  0.192  0.226  0.188 - 0.260
0.2c + 0.3s + 0.5w                                  0.314  0.210  0.247  0.207 - 0.284

task1a (tf.idf)                                     0.425  0.264  0.320  0.293 - 0.353
task1a (MMR λ = 0.97)                               0.418  0.275  0.324  0.299 - 0.351
task1a (MMR λ = 0.97 on topics)                     0.436  0.272  0.330  0.300 - 0.363
task1a (0.2c + 0.3s + 0.5w)                         0.439  0.276  0.333  0.308 - 0.358
task1a (0.2c + 0.3s + 0.5w on topics)               0.428  0.276  0.330  0.304 - 0.357
task1a (0.2c + 0.3s + 0.5w on topics + context 20)  0.451  0.279  0.338  0.312 - 0.366

Table: ROUGE-L results of task 2 using the document abstracts as the target summaries. The summary size was constrained to 250 words. In boldface is the best result. In italics are the results within the 95% confidence intervals of the best result.
Conclusions
Conclusions
1. Used unsupervised methods to find the best fit to a citance.
2. Results of best fit improved as we added data to the tf.idf/SVD models and as we added context.
3. Information from the citances can help in extractive summarisation.
Further Work
- If there are more training data, try supervised methods.
- Use annotators to produce target summaries.
- Explore new methods to add further data and context.
Questions?