Generating Linked Data by Inferring the Semantics of Tables
![Page 1: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/1.jpg)
Generating Linked Data by Inferring the Semantics of Tables
Varish Mulwad, Ph.D. 2015
http://ebiq.org/j/96
![Page 2: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/2.jpg)
Goal: Table => LOD*

| Name | Team | Position | Height |
|---|---|---|---|
| Michael Jordan | Chicago | Shooting guard | 1.98 |
| Allen Iverson | Philadelphia | Point guard | 1.83 |
| Yao Ming | Houston | Center | 2.29 |
| Tim Duncan | San Antonio | Power forward | 2.11 |

http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
http://dbpedia.org/resource/Allen_Iverson
Player height in meters
dbprop:team
*DBpedia
2/49
![Page 3: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/3.jpg)
Goal: Table => LOD*

| Name | Team | Position | Height |
|---|---|---|---|
| Michael Jordan | Chicago | Shooting guard | 1.98 |
| Allen Iverson | Philadelphia | Point guard | 1.83 |
| Yao Ming | Houston | Center | 2.29 |
| Tim Duncan | San Antonio | Power forward | 2.11 |

RDF Linked Data:
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbo:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:MichaelJordan .
dbpedia:MichaelJordan a dbo:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:ChicagoBulls .
dbpedia:ChicagoBulls a yago:NationalBasketballAssociationTeams .

All this in a completely automated way
*DBpedia
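A minimal sketch of how such statements could be serialized once a class has been chosen for each column and an entity for each cell. The function name `table_to_turtle` and its two input mappings are assumptions made for illustration, not part of the presented system. The output uses plain Turtle (`subject rdfs:label "text"@en`), which states the same triples as the `is rdfs:label of` form on the slide.

```python
def table_to_turtle(header_classes, cell_entities):
    """Serialize column and cell annotations as Turtle text.

    header_classes: dict mapping a column header string to a class (prefixed name)
    cell_entities:  dict mapping (header, cell text) to an entity (prefixed name)
    Both structures are hypothetical; literal escaping is deliberately not handled.
    """
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix dbpedia: <http://dbpedia.org/resource/> .",
        "@prefix dbo: <http://dbpedia.org/ontology/> .",
        "@prefix yago: <http://dbpedia.org/class/yago/> .",
    ]
    # One label triple per annotated column header.
    for header, cls in header_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    # One label triple and one type triple per linked cell.
    for (header, text), entity in cell_entities.items():
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {header_classes[header]} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    {"Name": "dbo:BasketballPlayer"},
    {("Name", "Allen Iverson"): "dbpedia:Allen_Iverson"},
)
print(turtle)
```

The type triple for a cell reuses the class predicted for its column, which is one simple way to keep the column and cell annotations consistent.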
![Page 4: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/4.jpg)
Tables are everywhere!! … yet …
The web: 154 million high-quality relational tables
![Page 5: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/5.jpg)
Evidence-based medicine
Evidence-based medicine judges the efficacy of treatments or tests by meta-analyses of clinical trials. Key information is often found in tables in articles.
However, the rate at which meta-analyses are published remains very low … hampers effective healthcare treatment …
[Figure labels: # of clinical trials published in 2008; # of meta-analyses published in 2008]
Figure: Evidence-Based Medicine - the Essential Role of Systematic Reviews, and the Need for Automated Text Mining Tools, IHI 2010
![Page 6: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/6.jpg)
~400,000 datasets; ~ <1% in RDF
![Page 7: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/7.jpg)
2010 Preliminary System
Class prediction for column: 77%
Entity linking for table cells: 66%
Examples of class label prediction results:
• Column: Nationality; Prediction: MilitaryConflict
• Column: Birth Place; Prediction: PopulatedPlace
T2LD Framework:
• Predict class for columns
• Linking the table cells
• Identify and discover relations
![Page 8: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/8.jpg)
Sources of Errors
• The sequential approach let errors percolate from one phase to the next
• The system was biased toward predicting overly general classes over more appropriate specific ones
• Heuristics largely drive the system
• Although we considered multiple sources of evidence, we did not perform joint assignment
![Page 9: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/9.jpg)
A Domain Independent Framework
• Pre-processing modules: Sampling, Acronym detection
• Query and generate initial mappings
• Joint Inference/Assignment
• Generate Linked RDF
• Verify (optional)
• Store in a knowledge base & publish as LOD
![Page 10: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/10.jpg)
Query Mechanism
Table row: Michael Jordan | Chicago Bulls | Shooting Guard | 1.98
Column header: Team
Possible types: {dbo:Place, dbo:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams …}
Possible entities: Chicago Bulls, Chicago, Judy Chicago …
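The query step can be pictured as a lookup that returns every entity whose label overlaps the cell string, and then collects the types of those entities as candidate classes for the column. The sketch below is only an illustration under stated assumptions: `TOY_KB`, `candidate_entities` and `candidate_types` are hypothetical names, the token-overlap match stands in for whatever query the real system issues, and the entity-to-type pairings are made up so that the result reproduces the candidate lists on the slide.

```python
# Toy knowledge base: entity label -> set of types.
# The pairings are illustrative assumptions, not real DBpedia data.
TOY_KB = {
    "Chicago Bulls": {"yago:NationalBasketballAssociationTeams"},
    "Chicago": {"dbo:Place", "dbo:City"},
    "Judy Chicago": {"yago:WomenArtist", "yago:LivingPeople"},
    "Houston": {"dbo:Place", "dbo:City"},
}


def candidate_entities(cell, kb=TOY_KB):
    """Return KB labels that share at least one token with the cell string."""
    tokens = set(cell.lower().split())
    return [label for label in kb if tokens & set(label.lower().split())]


def candidate_types(column_cells, kb=TOY_KB):
    """Union of the types of every candidate entity found for a column's cells."""
    types = set()
    for cell in column_cells:
        for label in candidate_entities(cell, kb):
            types |= kb[label]
    return types


entities = candidate_entities("Chicago Bulls")
types = candidate_types(["Chicago Bulls"])
print(entities)
print(sorted(types))
```

The lookup is deliberately generous: it is meant to produce a wide candidate set, leaving the choice among candidates to the later ranking and joint inference steps.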
![Page 11: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/11.jpg)
Ranking the candidates
String similarity metrics: string in column header vs. class from an ontology
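One way to score a column header against candidate class labels is a normalized edit distance. The slides do not say which string similarity metrics are used, so Levenshtein distance is an assumption here, as are the function names and the example class labels.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means identical, ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_classes(header, class_labels):
    """Order candidate class labels by decreasing similarity to the header."""
    return sorted(class_labels, key=lambda c: similarity(header, c), reverse=True)


ranked = rank_classes("Team", ["City", "BasketballTeam", "Place"])
print(ranked)
```

Header strings are often short or abbreviated, so a score like this is weak evidence on its own and is better treated as one input among several.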
![Page 12: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/12.jpg)
Ranking the candidates
String similarity metrics and popularity metrics: string in table cell vs. entity from the knowledge base (KB)
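For table cells, string similarity can be combined with a popularity signal so that well-known entities win ties between similar labels. The sketch below is an assumption-laden illustration: the popularity counts are invented, the 0.7/0.3 weighting is arbitrary, and `difflib.SequenceMatcher` stands in for whichever similarity metrics the system actually uses.

```python
import math
from difflib import SequenceMatcher

# Hypothetical popularity counts (for example, incoming links); not real data.
POPULARITY = {"Chicago": 50000, "Chicago Bulls": 8000, "Judy Chicago": 300}


def score(cell, entity, weight=0.7):
    """Weighted mix of string similarity and log-scaled popularity, both in [0, 1]."""
    sim = SequenceMatcher(None, cell.lower(), entity.lower()).ratio()
    top = math.log1p(max(POPULARITY.values()))
    pop = math.log1p(POPULARITY.get(entity, 0)) / top
    return weight * sim + (1 - weight) * pop


def rank_entities(cell, entities):
    """Order candidate entities by decreasing combined score."""
    return sorted(entities, key=lambda e: score(cell, e), reverse=True)


print(rank_entities("Chicago", list(POPULARITY)))
print(rank_entities("Chicago Bulls", list(POPULARITY)))
```

With these toy numbers the bare cell string "Chicago" ranks the city first, which is the kind of locally plausible but contextually wrong choice that joint inference over the whole table is meant to correct.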
![Page 13: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/13.jpg)
Joint Inference over evidence in a table
✓ Probabilistic Graphical Models
![Page 14: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/14.jpg)
A graphical model for tables: joint inference over evidence in a table
[Figure: variable nodes for the column headers (C1, C2, C3) and the row values (R11 to R33). Example column: header "Team" (labeled Class) with cells Chicago, Philadelphia, Houston, San Antonio (labeled Instance).]
![Page 15: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/15.jpg)
Parameterized graphical model
[Figure: factor graph over the column-header nodes C1, C2, C3 and the row-value nodes R11 to R33, connected through factor nodes ψ3, ψ4 and ψ5.]
• Variable nodes: column header, row value
• Factor nodes:
  – Function that captures the affinity between the column headers and row values
  – Captures interaction between column headers
  – Captures interaction between row values
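To make the idea of joint assignment concrete, the sketch below scores every complete assignment of a tiny model by the product of its factor values and keeps the best one. It is not the inference procedure of the presented system: exhaustive enumeration is swapped in because it is easy to verify on a toy example, and it would not scale to real tables. The variable names follow the figure, but the candidate lists, the scores, the unary factors and the entity "Houston Rockets" are all hypothetical.

```python
from itertools import product


def map_assignment(variables, factors):
    """Return the assignment maximizing the product of all factor scores.

    variables: dict mapping a variable name to its list of candidate values
    factors:   list of (scope, fn); scope is a tuple of variable names and
               fn maps the values of those variables to a non-negative score
    """
    names = list(variables)
    best, best_score = None, -1.0
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        total = 1.0
        for scope, fn in factors:
            total *= fn(*(assignment[v] for v in scope))
        if total > best_score:
            best, best_score = assignment, total
    return best, best_score


# Hypothetical evidence: the class each candidate entity belongs to.
TYPES = {"Chicago": "City", "Chicago Bulls": "NBATeam",
         "Houston": "City", "Houston Rockets": "NBATeam"}

variables = {
    "C1": ["City", "NBATeam"],               # header "Team"
    "R11": ["Chicago", "Chicago Bulls"],     # cell "Chicago"
    "R12": ["Houston", "Houston Rockets"],   # cell "Houston"
}

# Unary scores from the ranking step: each cell alone prefers the city.
cell_score = {"Chicago": 0.6, "Chicago Bulls": 0.4,
              "Houston": 0.6, "Houston Rockets": 0.4}
header_score = {"City": 0.2, "NBATeam": 0.8}


def header_cell_affinity(cls, entity):
    """High when the entity is an instance of the class chosen for its column."""
    return 0.9 if TYPES[entity] == cls else 0.1


factors = [
    (("C1",), header_score.get),
    (("R11",), cell_score.get),
    (("R12",), cell_score.get),
    (("C1", "R11"), header_cell_affinity),
    (("C1", "R12"), header_cell_affinity),
]

best, best_score = map_assignment(variables, factors)
print(best, round(best_score, 5))
```

Taken one at a time, each cell would be linked to the city. Scored jointly with the header, the team reading wins for the whole column, which is the behavior a sequential pipeline cannot recover.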
![Page 16: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/16.jpg)
Challenge: Interpreting Literals
Many columns have literals, e.g., numbers
Column "Population": 690,000; 345,000; 510,020; 120,000 (Population? Profit in $K?)
Column "Age": 75; 65; 50; 25 (Age in years? Percent?)
• Predict properties based on cell values
• Cyc had hand-coded rules: humans don't live past 120
• We extract value distributions from LOD resources
• Differ for subclasses: age of people vs. political leaders vs. athletes
• Represent as measurements: value + units
• Metric: possibility/probability of values given distribution
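The metric in the last bullet can be illustrated by fitting a simple distribution to known values of each candidate property and asking which one makes the column's values most likely. The sketch assumes a Gaussian fit, which the slides do not specify, and the reference samples are invented; a real system would extract them from LOD resources.

```python
import math
from statistics import mean, stdev


def log_likelihood(values, samples):
    """Average Gaussian log-density of `values` under a normal fit to `samples`."""
    mu, sigma = mean(samples), stdev(samples)
    return mean(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in values
    )


# Hypothetical reference samples per property; not real LOD data.
REFERENCE = {
    "age in years": [23, 31, 45, 52, 60, 68, 74, 81],
    "population": [95_000, 150_000, 320_000, 480_000, 610_000, 900_000],
}


def best_property(values, reference=REFERENCE):
    """Pick the property whose fitted distribution best explains the column."""
    return max(reference, key=lambda p: log_likelihood(values, reference[p]))


print(best_property([75, 65, 50, 25]))
print(best_property([690_000, 345_000, 510_020, 120_000]))
```

Keeping separate reference samples per subclass (people, political leaders, athletes) would follow the same pattern with more keys in the reference dictionary.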
![Page 17: Generating Linked Data by Inferring the Semantics of Tables](https://reader031.fdocuments.in/reader031/viewer/2022021713/620b9783f00e0e24c95a26b4/html5/thumbnails/17.jpg)
Other Challenges
• Using table captions and other text in associated documents to provide context
• Size of some data.gov tables (> 400K rows!) makes using the full graphical model impractical
  – Sample table and run model on the subset
• Achieving acceptable accuracy may require human input
  – 100% accuracy unattainable automatically
  – How best to let humans offer advice and/or correct interpretations?
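Sampling a large table before running the model can be as simple as drawing a fixed number of rows uniformly at random. The slides do not describe the sampling strategy, so uniform sampling, the function name `sample_rows` and the fixed seed are assumptions for illustration.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return up to k rows chosen uniformly at random, in their original order."""
    if len(rows) <= k:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]


big_table = [("row", i) for i in range(400_000)]
subset = sample_rows(big_table, 100)
print(len(subset))
```

Keeping the sampled rows in table order preserves any evidence that depends on neighboring rows, at the cost of one extra sort.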