Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview
![Page 1: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/1.jpg)
Entity-Relationship Extraction from Wikipedia Unstructured Text
Radityo Eko Prasojo (Rido), PhD Student @ KRDB, Free University of Bozen-Bolzano
Supervised by: Mouna Kacimi & Werner Nutt
20.07.16, Bilbao, Spain
![Page 2: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/2.jpg)
Automatically generated / Manually curated
Automated extraction without (yet) a KB as a result
Knowledge Vault [1]
Knowledge Graph
NELL [2]
Infobox completion [3][4]
20/07/16 | RE Prasojo | KRDB@UNIBZ | WebST'16, Bilbao | 2
![Page 3: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/3.jpg)
![Page 4: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/4.jpg)
![Page 5: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/5.jpg)
Where was Obama born?
Who are the children of Obama?
![Page 6: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/6.jpg)
Where was Obama born?
Who are the children of Obama?
Yes we can!
Honolulu, Hawaii
Malia and Sasha Obama
![Page 7: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/7.jpg)
Which are Obama’s favourite sports teams?
Does Obama have pets?
![Page 8: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/8.jpg)
Our goal is to enrich existing knowledge bases by extracting new facts, in the form of machine-readable entity relationships, from unstructured Wikipedia text.
Specific focus: RDF
![Page 9: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/9.jpg)
Why is it difficult?
• The extraction problem
  • Entity extraction & disambiguation
  • Relation extraction
• The representation problem
  • Lack of predefined schema/ontology
  • Topic independence
  • Complex fact representation
![Page 10: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/10.jpg)
Why is it difficult? Example
• “Obama is a supporter of the Chicago White Sox”
  • Straightforward, singleton information
  • Pure syntactic extraction possible
  • Barack_Obama supporterOf Chicago_White_Sox
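To make "pure syntactic extraction" concrete, here is a minimal sketch in Python. It assumes a single hand-written surface pattern and a tiny mention-to-identifier table; both are hypothetical illustrations, not the rules used in the talk.

```python
import re

# Hypothetical surface pattern: "<subject> is a supporter of (the) <object>".
SUPPORTER_PATTERN = re.compile(
    r"^(?P<subj>.+?) is a supporter of (?:the )?(?P<obj>.+?)\.?$"
)

# Hypothetical lookup from surface mentions to KB identifiers.
# A real system would need entity disambiguation at this step.
ENTITY_IDS = {
    "Obama": "Barack_Obama",
    "Chicago White Sox": "Chicago_White_Sox",
}


def extract_supporter_triple(sentence):
    """Return a (subject, predicate, object) triple, or None if nothing matches."""
    match = SUPPORTER_PATTERN.match(sentence.strip())
    if match is None:
        return None
    subj = ENTITY_IDS.get(match.group("subj"))
    obj = ENTITY_IDS.get(match.group("obj"))
    if subj is None or obj is None:
        return None
    return (subj, "supporterOf", obj)


triple = extract_supporter_triple("Obama is a supporter of the Chicago White Sox")
print(" ".join(triple))  # Barack_Obama supporterOf Chicago_White_Sox
```

A pattern like this covers only sentences with exactly this wording, which is why it works for singleton statements and not for the more complex sentences discussed next.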
![Page 11: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/11.jpg)
Why is it difficult? Example
• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
  • Complex, multiple information
  • Semantic understanding necessary
  • … how do we represent this?
![Page 12: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/12.jpg)
Example: representing complex facts
• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
  • Barack_Obama footballFan Chicago_Bears in NFL
    • supporterOf vs footballFan
    • Is it necessary to include NFL in the whole relation?
    • What about the adverb primarily? What information does it imply?
  • Barack_Obama fanOf Pittsburgh_Steelers
    • fanOf vs supporterOf
    • Missing the time information referred to in “in his childhood and adolescence was”
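One possible way to keep the information that a plain triple loses is to attach qualifiers to the core triple. The sketch below is only an illustration of that idea: the `QualifiedFact` class and the qualifier names (`league`, `degree`, `validDuring`) are assumptions made here, not a representation proposed in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedFact:
    """A core triple plus optional (name, value) qualifiers."""
    subject: str
    predicate: str
    obj: str
    qualifiers: tuple = ()  # pairs of (qualifier name, value)

    def as_triple(self):
        """Drop the qualifiers and return the plain triple."""
        return (self.subject, self.predicate, self.obj)


# Hypothetical qualifier names; choosing them is part of the open
# representation problem.
current = QualifiedFact(
    "Barack_Obama", "fanOf", "Chicago_Bears",
    qualifiers=(("league", "NFL"), ("degree", "primarily")),
)
past = QualifiedFact(
    "Barack_Obama", "fanOf", "Pittsburgh_Steelers",
    qualifiers=(("validDuring", "childhood and adolescence"),),
)

for fact in (current, past):
    print(fact.as_triple(), dict(fact.qualifiers))
```

Keeping the core triple separate from its qualifiers means a consumer that only understands plain triples can still use `as_triple()`, at the cost of losing the time and degree information.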
![Page 13: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/13.jpg)
Approach
• Document preprocessing to annotate all entity occurrences.
• Grammatical dependencies to extract (candidate) relations.
• Separation between the extraction problem and the representation problem
  • We first extract all candidate relations and then later apply semantic refinement for better representation.
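The extraction step can be sketched as a rule over a dependency parse in which entity mentions are already annotated. The parse below is written by hand in a Universal Dependencies-like style (not produced by a parser), and the single rule is a hypothetical example rather than one of the rules from the talk.

```python
from collections import namedtuple

# idx: token position, head: index of the governing token (-1 for the root),
# dep: dependency label, is_entity: set by the preprocessing step.
Token = namedtuple("Token", "idx text head dep is_entity")

# Hand-annotated parse of "Obama is a supporter of the Chicago White Sox",
# with entity mentions already replaced by their identifiers.
PARSE = [
    Token(0, "Barack_Obama", 3, "nsubj", True),
    Token(1, "is", 3, "cop", False),
    Token(2, "a", 3, "det", False),
    Token(3, "supporter", -1, "root", False),
    Token(4, "of", 6, "case", False),
    Token(5, "the", 6, "det", False),
    Token(6, "Chicago_White_Sox", 3, "nmod", True),
]


def children(parse, head_idx, dep):
    """Tokens attached to head_idx with the given dependency label."""
    return [t for t in parse if t.head == head_idx and t.dep == dep]


def extract_candidates(parse):
    """Hypothetical rule: entity subject + root + prepositional entity modifier."""
    candidates = []
    for root in (t for t in parse if t.dep == "root"):
        subjects = [t for t in children(parse, root.idx, "nsubj") if t.is_entity]
        objects = [t for t in children(parse, root.idx, "nmod") if t.is_entity]
        for subj in subjects:
            for obj in objects:
                preps = children(parse, obj.idx, "case")
                prep = preps[0].text.capitalize() if preps else ""
                candidates.append((subj.text, root.text + prep, obj.text))
    return candidates


print(extract_candidates(PARSE))
# [('Barack_Obama', 'supporterOf', 'Chicago_White_Sox')]
```

The output is deliberately called a candidate: naming the predicate consistently (supporterOf vs fanOf) is left to the later semantic refinement step.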
![Page 14: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/14.jpg)
Preliminary results
• Ground truth manually curated from 25 Wikipedia articles of famous people.
• Preprocessing
• 4 handcrafted extraction rules leveraging grammatical dependency
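Comparing extracted relations with a manually curated ground truth is commonly done with precision and recall. The sketch below assumes exact matching of triples, and the two small sets are toy data invented for illustration; they are not results from the talk.

```python
def precision_recall(extracted, ground_truth):
    """Exact-match precision and recall over sets of triples."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    correct = extracted & ground_truth
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data for illustration only.
gold = {
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "Pittsburgh_Steelers"),
}
extracted = {
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "NFL"),  # a wrong extraction
}

print(precision_recall(extracted, gold))  # (0.5, 0.5)
```

Exact matching is strict: a triple that uses fanOf where the ground truth uses supporterOf counts as wrong, so the naming questions raised earlier directly affect the scores.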
![Page 15: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/15.jpg)
Ongoing work
• Automated rule mining
• Semantic refinement for knowledge representation
  • Ontology building
    • Naming and taxonomy of entities, classes, and relations
  • Handling complex fact
    • Obama appoints x as y in z
  • Handling modality, adjectives, and sentiment
    • “In the past”, “it is rumoured that”, “it is not true that”
• Future evaluation
  • Bigger ground truth (amount + topic coverage)
  • Evaluate how well we enrich existing KBs
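As a rough illustration of what handling modality could involve, the sketch below strips the three cue phrases quoted above from the start of a sentence and records them as qualifiers. The cue table and the qualifier names (`tense`, `certainty`, `polarity`) are assumptions made for this example, and matching only at the start of the sentence is a simplification.

```python
# Hypothetical mapping from cue phrases to (qualifier name, value).
MODALITY_CUES = {
    "in the past": ("tense", "past"),
    "it is rumoured that": ("certainty", "rumoured"),
    "it is not true that": ("polarity", "negated"),
}


def split_modality(sentence):
    """Strip leading cue phrases; return (remaining clause, qualifiers)."""
    remaining = sentence.strip()
    qualifiers = {}
    found = True
    while found:
        found = False
        lowered = remaining.lower()
        for cue, (name, value) in MODALITY_CUES.items():
            if lowered.startswith(cue):
                qualifiers[name] = value
                # Drop the cue and any separating comma or space.
                remaining = remaining[len(cue):].lstrip(" ,")
                found = True
                break
    return remaining, qualifiers


# Placeholder sentence built from the "Obama appoints x as y in z" template.
print(split_modality("It is rumoured that Obama appoints x as y in z"))
# ('Obama appoints x as y in z', {'certainty': 'rumoured'})
```

The remaining clause can then go through the ordinary extraction step, with the qualifiers attached to whatever relation is found.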
![Page 16: Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview](https://reader033.fdocuments.in/reader033/viewer/2022051710/5a67b7787f8b9a360c8b6ecd/html5/thumbnails/16.jpg)
Future work
• Metadata extraction
  • Data quality, data completeness
• Natural language question answering based on the enriched KB.
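A very small sketch of question answering over a triple store, using the example questions from earlier in the talk. The question templates, the predicate names (`birthPlace`, `child`) and the mention table are hypothetical; a real system would need far more robust question parsing.

```python
import re

# Toy KB holding the facts used as examples in the talk.
KB = {
    ("Barack_Obama", "birthPlace", "Honolulu, Hawaii"),
    ("Barack_Obama", "child", "Malia Obama"),
    ("Barack_Obama", "child", "Sasha Obama"),
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
}

# Hypothetical question templates mapped to predicates.
QUESTION_TEMPLATES = [
    (re.compile(r"^Where was (?P<e>.+) born\?$"), "birthPlace"),
    (re.compile(r"^Who are the children of (?P<e>.+)\?$"), "child"),
]

# Hypothetical mention-to-identifier table.
MENTIONS = {"Obama": "Barack_Obama"}


def answer(question, kb=KB):
    """Return the sorted objects matching the first template that fits."""
    for pattern, predicate in QUESTION_TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            subject = MENTIONS.get(match.group("e"))
            return sorted(o for s, p, o in kb if s == subject and p == predicate)
    return []


print(answer("Where was Obama born?"))           # ['Honolulu, Hawaii']
print(answer("Who are the children of Obama?"))  # ['Malia Obama', 'Sasha Obama']
print(answer("Does Obama have pets?"))           # []
```

The last question returns nothing because neither a template nor a fact covers it, which is the kind of gap that enriching the KB from unstructured text is meant to close.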