Wikipedia Knowledge Extraction
Pronoun Resolution module
Infobox extraction
SRL parsing
Improved refinement
Clustering
Hadoop compatibility

Pronoun Resolution module
“His mother wanted him to get a good education so she sent him to live with his grandparents in Honolulu, HI” (Barack Obama)
Current solution: replace pronouns with article title (very primitive)
Target solution:
◦ Nobody in the world has solved this yet
◦ Use an existing system that is usually correct?
◦ Simple rules for common patterns?
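
The current solution can be sketched in a few lines. This is only an illustration, not the project's code: the function name `replace_pronouns` and both pronoun lists are assumptions. It also shows why the approach is primitive: in the example sentence "she" refers to the mother, yet it is replaced with the article title like every other pronoun.

```python
import re

# Hypothetical pronoun lists; the slides do not say which pronouns are handled.
# "her" is left out because it is ambiguous between object and possessive use.
PERSONAL = {"he", "him", "she", "it", "they", "them"}
POSSESSIVE = {"his", "its", "their"}


def replace_pronouns(sentence: str, title: str) -> str:
    """Replace every listed pronoun with the article title (naive baseline)."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        lower = word.lower()
        if lower in POSSESSIVE:
            return title + "'s"
        if lower in PERSONAL:
            return title
        return word  # not a pronoun we handle: keep it unchanged

    # Match whole alphabetic words so "it" inside "with" is never touched.
    return re.sub(r"[A-Za-z]+", swap, sentence)


sentence = ("His mother wanted him to get a good education so she sent him "
            "to live with his grandparents in Honolulu, HI")
print(replace_pronouns(sentence, "Barack Obama"))
```

The printed sentence contains "so Barack Obama sent Barack Obama", which is the error that a real pronoun resolution module would have to avoid.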

Infobox extraction
Convert information into simple sentences:
◦ Joe Biden is Barack Obama’s Vice President
◦ Barack Obama is preceded by George W. Bush
Use type of phrase (Noun Phrase, Verb Phrase) to determine sentence to form.
Read papers from Turing Center (University of Washington)
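
A minimal sketch of how the phrase type of an infobox field could select the sentence pattern. Everything here is an assumption for illustration: the function names, the two templates, and especially the phrase-type test, which just checks whether the field label ends in a preposition. A real system would more likely use a chunker or part-of-speech tagger.

```python
# Hypothetical list of prepositions that mark a verb-phrase field
# such as "preceded by" or "born in".
PREPOSITIONS = {"by", "in", "at", "to", "with", "from"}


def phrase_type(field: str) -> str:
    """Return "VP" if the field label ends in a preposition, else "NP"."""
    words = field.lower().split()
    return "VP" if words and words[-1] in PREPOSITIONS else "NP"


def infobox_sentence(title: str, field: str, value: str) -> str:
    """Turn one infobox (field, value) pair into a simple sentence."""
    if phrase_type(field) == "VP":
        # Verb phrase: the article subject comes first.
        return f"{title} is {field} {value}"
    # Noun phrase: the value is described as the subject's <field>.
    return f"{value} is {title}'s {field}"


infobox = [("Vice President", "Joe Biden"), ("preceded by", "George W. Bush")]
for field, value in infobox:
    print(infobox_sentence("Barack Obama", field, value))
```

With these assumed templates the two infobox rows reproduce the two example sentences from the slide.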

SRL parsing
Performs a deep analysis on each sentence.
E.g. “Yoshi has a long tongue which he uses to grab enemies and eat them.”
◦ has (A0: Yoshi, A1: long tongue)
◦ use (A0: Yoshi, A1: long tongue, A2: grab enemies and eat them)
Use SRL (semantic role labeling) parsing to improve quality and representation of knowledge.
Problem: speed and complexity
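
One way to hold SRL output and reduce it to tuples, sketched under assumptions: the `Frame` class and `frame_to_triple` helper are hypothetical names, and the two frames are typed in by hand from the slide rather than produced by an actual SRL parser.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Frame:
    """One predicate with its labeled arguments (A0, A1, A2, ...)."""
    predicate: str
    args: Dict[str, str]


def frame_to_triple(frame: Frame) -> Optional[Tuple[str, str, str]]:
    """Return (A0, predicate, A1), or None if either role is missing."""
    if "A0" in frame.args and "A1" in frame.args:
        return (frame.args["A0"], frame.predicate, frame.args["A1"])
    return None


# Frames copied by hand from the Yoshi example above.
frames = [
    Frame("has", {"A0": "Yoshi", "A1": "long tongue"}),
    Frame("use", {"A0": "Yoshi", "A1": "long tongue",
                  "A2": "grab enemies and eat them"}),
]

for frame in frames:
    print(frame_to_triple(frame), "extra roles:",
          {k: v for k, v in frame.args.items() if k not in ("A0", "A1")})
```

Keeping the full role dictionary next to the triple preserves information such as A2 that a plain three-part tuple would drop.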

Improved refinement
Current system has Subject, Object, Verb tuples
Problem: hard to define what words to incorporate in each phrase
E.g. “'The dog ( Canis lupus familiaris )' 'is' 'a mammal from the family Canidae'”
◦ The dog? dog? The dog ( Canis lupus familiaris )?
◦ a mammal? a mammal from the family Canidae?
Possible solutions:
◦ Different levels of information?
◦ Simple rules based on part of speech tags?
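
The two possible solutions can be combined in a small sketch: simple rules over part-of-speech tags produce the phrase at several levels of detail. The function names are hypothetical, the tokens are tagged by hand with Penn Treebank style tags (no tagger is run), and the two rules shown (drop parentheticals, drop determiners) are only examples of what such rules might look like.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag)


def strip_parenthetical(tagged: Tagged) -> Tagged:
    """Drop everything between -LRB- and -RRB- tags, brackets included."""
    out, depth = [], 0
    for word, tag in tagged:
        if tag == "-LRB-":
            depth += 1
        elif tag == "-RRB-":
            depth = max(depth - 1, 0)
        elif depth == 0:
            out.append((word, tag))
    return out


def strip_determiners(tagged: Tagged) -> Tagged:
    """Drop tokens tagged DT (the, a, an, ...)."""
    return [(w, t) for w, t in tagged if t != "DT"]


def levels(tagged: Tagged) -> List[str]:
    """Return the phrase from most detailed to most reduced."""
    no_paren = strip_parenthetical(tagged)
    core = strip_determiners(no_paren)
    return [" ".join(w for w, _ in seq) for seq in (tagged, no_paren, core)]


# Hand-tagged subject phrase from the example above.
subject = [("The", "DT"), ("dog", "NN"), ("(", "-LRB-"), ("Canis", "NNP"),
           ("lupus", "NNP"), ("familiaris", "NNP"), (")", "-RRB-")]
print(levels(subject))
# ['The dog ( Canis lupus familiaris )', 'The dog', 'dog']
```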

Clustering
Idea: Determine whether two separate mentions point to the same concept
◦ ‘The dog’, ‘a dog’, ‘dogs’
◦ ‘Cats’, ‘C.A.T.S’, ‘CAT Scan’
◦ ‘President Obama’, ‘President Barack Obama’
Possible solutions:
◦ Feature-based classification
◦ Self-organizing map
◦ Terms associated
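
For the feature-based option, a sketch of what the features for one pair of mentions might look like. The feature set, the function names, and the naive normalization (lowercase, drop determiners, strip a trailing "s") are all assumptions; no classifier is trained here, the code only computes the inputs one would feed to it.

```python
import re
from typing import Dict, List, Union

DETERMINERS = {"the", "a", "an"}


def normalize(mention: str) -> List[str]:
    """Lowercase, tokenize, drop determiners, and strip a naive plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", mention.lower())
    tokens = [t for t in tokens if t not in DETERMINERS]
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def pair_features(m1: str, m2: str) -> Dict[str, Union[bool, float]]:
    """Compute simple surface features for a pair of mentions."""
    t1, t2 = normalize(m1), normalize(m2)
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    return {
        # True when both mentions reduce to the same token sequence.
        "same_normalized": t1 == t2,
        # Jaccard overlap of the normalized token sets.
        "token_overlap": len(s1 & s2) / len(union) if union else 0.0,
    }


for a, b in [("The dog", "dogs"),
             ("President Obama", "President Barack Obama"),
             ("Cats", "CAT Scan")]:
    print(a, "|", b, "->", pair_features(a, b))
```

Surface features like these separate ‘The dog’ / ‘dogs’ from ‘Cats’ / ‘CAT Scan’, but acronyms such as ‘C.A.T.S’ would need dedicated features or context terms.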

Hadoop compatibility
Need to ensure the system can scale for the move to regular Wikipedia
Hadoop is an open-source implementation of the MapReduce programming model
MapReduce parallelizes a computation by splitting the input across several machines (map step) and then combining their partial results (reduce step)
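
The map and reduce steps can be illustrated in plain Python. This sketch runs in a single process and does not use Hadoop or any of its APIs; on a cluster the framework would run the map calls on different machines and do the grouping itself. The function names and the task (counting how often each verb appears in extracted tuples) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)


def map_phase(article_tuples: Iterable[Triple]) -> Iterator[Tuple[str, int]]:
    """Map: emit (verb, 1) for every tuple extracted from one article."""
    for _subject, verb, _obj in article_tuples:
        yield verb, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all partial counts for one key."""
    return key, sum(values)


def run(articles: List[List[Triple]]) -> Dict[str, int]:
    # Each inner list stands for one article's tuples; the map calls are
    # independent of each other, which is what makes them parallelizable.
    pairs = [pair for tuples in articles for pair in map_phase(tuples)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


articles = [
    [("Joe Biden", "is", "Barack Obama's Vice President"),
     ("Barack Obama", "is preceded by", "George W. Bush")],
    [("Yoshi", "has", "long tongue"),
     ("The dog", "is", "a mammal from the family Canidae")],
]
print(run(articles))
```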