1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1...
-
Upload
clementine-nichols -
Category
Documents
-
view
230 -
download
1
Transcript of 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1...
![Page 1: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/1.jpg)
1
Learning Sub-structures of Document Semantic Graphs for Document Summarization
1Jure Leskovec, 1Marko Grobelnik, 2Natasa Milic-Frayling1Jozef Stefan Institute Ljubljana, Slovenia
2Microsoft Research Cambridge, UK
Presenter: Yi-Ting Chen
![Page 2: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/2.jpg)
2
Reference
• Jure Leskovec, Marko Grobelnik, Natasa Milic-Frayling, “Learning Sub-structures of Document Semantic Graphs for Document Summarization”, LinkKDD 2004, August 2004, Seattle, WA, USA.
![Page 3: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/3.jpg)
3
Outline
• Introduction• Semantic Graph Generation• Learning Semantic Sub-Graphs Using Support Vector
Machines & Experimental• Conclusions
![Page 4: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/4.jpg)
4
Introduction (1/2)
• In this paper, identification of textual elements for using in extracting summaries are primarily concerned– The assumption that capturing semantic structure of a document
is essential for summarization– Then, to create semantic representations of the document,
visualized as semantic graphs, and learn the model to extract sub-structures that could be used in document extracts
• The basic elements of our semantic graphs are logical form triples, subject-predicate-object
• Experiments show that characteristics of the semantic graphs are most prominent attributes in the learnt SVM model
![Page 5: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/5.jpg)
5
Introduction (2/2)
• Machine learning algorithms are applied to capture characteristics of human extracted summary sentences
• Elementary syntactic structures from individual sentences in the form of logical form triples are extracted
• Then using semantic properties of nodes in the triples to build semantic graphs for both documents and corresponding summaries
• This paper use logical form triples as basic features and apply Support Vector Machines to learn the summarization model
![Page 6: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/6.jpg)
6
Semantic Graph Generation (1/7)
• To generating semantic representations of documents and summaries involves five phases:– Deep syntactic analysis– Co-reference resolution– Pronominal reference resolution– Semantic normalization– Semantic graph analysis
![Page 7: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/7.jpg)
7
Semantic Graph Generation (2/7)
• An example that outlines the main process and output of different stages involved in generating the semantic graph
To perform deep syntactic analysis to the raw text
The analysis applying several methods: co-reference resolution, pronominal anaphora resolution, and semantic normalization
Resulting triples are then linked based on common concepts into a semantic graph
![Page 8: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/8.jpg)
8
Semantic Graph Generation (3/7)
• Linguistic Analysis– For linguistic analysis of text we use Microsoft’s NLPWin natural l
anguage processing tool– NLPWin first segments the text into individual sentences, convert
s sentence text into a parse tree (Figure2) and then produces a sentence logical form that reflects the meaning (Figure3)
The logical form (triples): “Jure”←”send”→”Marko”
For each node, semantic tags that are assigned by the NLPWin software
![Page 9: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/9.jpg)
9
Semantic Graph Generation (4/7)
• Co-reference Resolution For Named Entities– In document it is common that terms with different surface forms
refer to the same entity– The algorithm, for example, correctly finds that “Hillary Rodham
Clinton”, “Hillary Clinton”, “Hillary Rodham”, and “Mrs. Clinton” all refer to the same entity
• Pronomial Anaphora Resolution– To focus on resolving only five basic types of pronouns including
all different forms: “he” (his, him, himself), “she”, “I”, “they”, and “who”
– For a given pronoun, to search sequentially through sentences, backward and forward and check the type of named entity in order to find suitable candidates
– Each candidate is scored and among the scored candidate entities to chose the one with the best score
![Page 10: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/10.jpg)
10
Semantic Graph Generation (5/7)
• Pronomial Anaphora Resolution– The set contained 1506 pronouns of which 1024 (68%) were fro
m the above selected pronoun types– Average accuracy over the 5 selected pronoun types is 82.1%
![Page 11: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/11.jpg)
11
Semantic Graph Generation (6/7)
• Semantic Normalization– In order to increase the coherence and compactness of the grap
hs, WordNet can be used to establish synonymous relationships among node
– Synonymy information of concepts enables us to find equivalence relations between distinct triples
– For example, Watcher ← follow →moon
spectator← watch →moon
• Construction of a Semantic Graph– To merge the logical form triples on subject and object nodes whi
ch belong to the same normalized semantic class and thus produce semantic graph
– For each node, to calculate in/out degree of a node, PageRank weight, Hubs and Authorities weights and many more
![Page 12: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/12.jpg)
12
Semantic Graph Generation
![Page 13: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/13.jpg)
13
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• DUC Dataset– Document Understanding Conference (DUC) 2002 (300
newspaper on 30 different topics)– For each article have a 100 word human written abstract and for
half of them also human extracted sentences– An average article in the data set contains about 1000 words or
50 sentences– About 7.5 sentences are selected into the summary– After applying linguistic processing, on average 82 logical triples
per document with 15 of them contained in extracted summary sentence
• Feature Set – The resulting vector for logical form triples contain about 466
binary and real-valued attributes– 72 of these components have non-zero values, on average
![Page 14: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/14.jpg)
14
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• Experiment results– Impact of the Training Data Size– Sentence classifiers are trained and tested through 10-fold
cross-validation experiments– Sizes of training data: 10, 20, 50, and 100 documents– A negative impact of small training sets on the generalization of
the model
![Page 15: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/15.jpg)
15
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• Experiment results– Impact of Different Feature Attributes
![Page 16: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/16.jpg)
16
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• Experiment results– Impact of Different Feature Attributes– Semantic graph attributes are consistently ranked high among all
the attributes used in the model– Syntactic and semantic properties of the text are important for
summarization
![Page 17: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/17.jpg)
17
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• Experiment results– Topic Specific Summarization– While the data size for topic specific summarization is small and
thus does not allow generalization
![Page 18: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/18.jpg)
18
Learning Semantic Sub-Graphs Using Support Vector Machines & Experimental
• Sentence Extraction
![Page 19: 1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649f055503460f94c1a8e4/html5/thumbnails/19.jpg)
19
Conclusions
• This paper presented a novel approach of document summarization by generating semantic representation of the document and applying machine learning to extract a sub-graph that correspond to the semantic structure of a human extracted document summary
• Compared to human extracted summaries they achieve on average recall of 75% and precision of 30%