The Computational Linguistics Summarization Pilot task @ TAC 2014
Kokil Jaidka †, Muthu Kumar Chandrasekaran * ‡, Min-Yen Kan * ‡, Ankur Khanna ‡
† Nanyang Technological University
* Dept. of Computer Science, National University of Singapore
‡ Web, IR / NLP Group, National University of Singapore
Scientific Document Summarization
I have an abstract. I am done!
Photo Credits Dennis Jarvis @flickr
TAC Biomedsumm Track - The Computational Linguistics Pilot Task, 21 April 2023
Outline
• Citation based extractive summaries
• Facetted summaries
• Automatic literature review
• CL development corpus
• Annotation
• TAC 2015: CL-Summ track
• Acknowledgements
Scientific Document Summarization
• Abstracts
  – Authors’ own summary.
• Citation summary
  – The scientific community creates summaries of research papers while citing them, but…
• Facetted summaries
  – Capture all aspects of a paper.
Citation summary & facets
Image credits Ken Ammi @flickr
Facetted summaries
Structured abstract: common in the medicine, biomedical, and bioinformatics domains
Facets & Argumentative zones
Scientific Document Summarization
Citation based extractive summaries
Scope of Citation
• Qazvinian, V., & Radev, D. R. "Identifying non-explicit citing sentences for citation-based summarization" (ACL, 2010)
• Abu-Jbara, A., & Radev, D. R. "Reference scope identification in citing sentences" (ACL, 2012)
Coherence
• Abu-Jbara, A., & Radev, D. R. "Coherent citation-based summarization of scientific papers" (ACL, 2011)
Scientific Document Summarization & Automatic Literature Review
Scientific Document Summarization & Automatic Literature Review
Free to access at: http://acl-arc.comp.nus.edu.sg/
SciSumm Corpus
• 10 reference papers, or topics, randomly sampled from the ACL ARC corpus.
• Up to 10 citing papers per reference paper, including those outside ACL ARC.
Annotation pipeline
[Figure: annotation pipeline diagram. Recoverable labels: source papers (titles "AUTOMATIC SUM…", "SCI DOC SUMM…"); OCR & section parse using ParsCit's SectLabel module; XML output (<xml> <abstract> … </abstract> …); Annotation!]
Post-processing to Biomedsumm format:
1. Scripts from U. Colorado (Prabha)
2. Sentence-segmented version from U. Mich (Rahul)
Annotating the SciSumm corpus
• 3 annotators in all.
• Released data has one gold standard annotation per topic or reference paper.
• Discourse facet has a minor change from Biomedsumm’s categories.
Tasks
• Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance.
Reference Paper (RP)
Citing papers. The citing text is called a citance.
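As an illustration of Task 1A, here is a minimal sketch of a lexical-overlap baseline: rank the RP's sentences by bag-of-words cosine similarity to the citance and return the best matches as candidate cited text spans. This is not a method from the slides; the function names, the crude tokenizer, and the `top_k` cutoff are all assumptions made for the example.

```python
import re
from collections import Counter
from math import sqrt


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cited_text_spans(citance, rp_sentences, top_k=2):
    """Return indices of up to top_k RP sentences most similar to the citance.

    Sentences with zero similarity are never returned.
    """
    c = Counter(tokens(citance))
    scored = [(cosine(c, Counter(tokens(s))), i) for i, s in enumerate(rp_sentences)]
    # Highest similarity first; ties broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]
```

Calling `cited_text_spans(citance, rp_sentences)` with the RP split into sentences returns sentence indices, which can then be mapped back to character offsets if the annotation format requires them.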
Tasks
• Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.
Reference Paper (RP)
Mark the cited text in RP and provide its facet.
Citing papers. The citing text is called a citance.
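To make Task 1B concrete, the following is a rough cue-word sketch of facet assignment. The facet labels and cue words below are hypothetical placeholders, since the slides do not list the predefined facet set; a real system would use the task's own categories and most likely a trained classifier.

```python
import re

# Hypothetical facet labels and cue words, for illustration only.
# The task's actual predefined facet set is not given in these slides.
FACET_CUES = {
    "aim": {"aim", "goal", "propose", "present"},
    "method": {"method", "algorithm", "model", "train", "use"},
    "result": {"result", "accuracy", "outperform", "improve", "achieve"},
}


def facet_of(span, default="method"):
    """Pick the facet whose cue words overlap most with the cited text span.

    Ties go to the facet listed first in FACET_CUES; if no cue word
    matches at all, the (assumed) default facet is returned.
    """
    words = set(re.findall(r"[a-z]+", span.lower()))
    counts = {facet: len(words & cues) for facet, cues in FACET_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default
```

The default label is also an assumption; it only keeps the function total when a span contains none of the cue words.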
Evaluation
• Small corpus: 10-fold cross-validated evaluation over the 10 documents.
• Task 1a scored by overlap with citances.
• Task 1b scored by overlap with reference text spans.
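The evaluation setup above can be sketched as follows. With 10 documents and 10 folds, each fold holds out exactly one document. The slides say only that scoring is "by overlap", so the precision/recall/F1 over sets of sentence IDs used here is an assumed metric, and both function names are made up for the example.

```python
def overlap_scores(predicted, gold):
    """Precision, recall and F1 between two collections of sentence IDs."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def leave_one_out_splits(topics):
    """Yield (training_topics, held_out_topic) pairs, one per topic.

    For a 10-topic corpus this is the same as 10-fold cross-validation.
    """
    for i, held_out in enumerate(topics):
        yield topics[:i] + topics[i + 1:], held_out
```

A system would be tuned on the nine training topics of each split, run on the held-out topic, and the per-topic scores averaged over all ten splits.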
Task & evaluation: highlights
• First corpus in CL that incorporates prior research findings on citation-based summaries.
• 10 teams from 5 different countries participated in the evaluation.
Limitations
• No gold standard summaries yet
• OCR errors: We hope to have corrected them manually.
• But mainly, we need more annotated data!
TAC 2015: CL-Summ shared task
• Plans to roll out a full-fledged official shared task for the CL corpus.
• 20 training topics
• 10 test topics
• 3 annotations per summary.
TAC 2015: We need your help!
• We seek support from
  – the summarization community in general and
  – the CL community in particular
  to provide manpower for annotating the corpus.
• Great to have all participating teams contribute!
Acknowledgements
• Hoa Dang, NIST
• Lucy Vanderwende, MSR
• All Biomedsumm track participants.
• This research is partially supported by CSIDM
Questions? Thank you!