Instance Matching Benchmarks for Linked Data - ESWC 2016 Tutorial
How well does your Instance Matching system perform? Experimental evaluation with LANCE
Transcript of How well does your Instance Matching system perform? Experimental evaluation with LANCE
![Page 1: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/1.jpg)
How well does your Instance Matching system perform? Experimental evaluation with LANCE
Tzanina Saveta, Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki
Institute of Computer Science – FORTH, Greece
Axel-Cyrille Ngonga Ngomo
IFI/AKSW, University of Leipzig, Germany
10/31/16 ISWC 2016: How well does your Instance Matching system perform? Experimental evaluation with LANCE
![Page 2: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/2.jpg)
Why Instance Matching?
Different sources contain different descriptions of the same real-world entity.
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
![Page 3: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/3.jpg)
Instance Matching for Linked Data
• A set of RDF triples constitutes an RDF graph
• Sparse data
• Rich semantics expressed in terms of ontologies
• Large number of sources to integrate
• Value, structure and semantics heterogeneities
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
![Page 4: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/4.jpg)
Benchmarking
Instance matching has led to the development of a number of matching techniques and tools.
• How to compare those?
• How to assess their performance (efficiency and effectiveness)?
• How to “push” systems into becoming better?
• Benchmark your systems!
![Page 5: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/5.jpg)
Instance Matching Benchmark Components
• Datasets
  – Source and target datasets that will be matched together to find the entities that refer to the same real-world object
• Ground truth / Gold standard / Reference alignment
  – The “correct answer sheet” used to judge the completeness and soundness of the results produced by the system under test (SUT)
• Organized into test cases, each addressing a different kind of instance matching requirement
• Metrics
  – The performance metric(s) that determine the systems’ efficiency and effectiveness
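The soundness and completeness judged against the gold standard are commonly quantified as precision and recall, combined into an F-measure. The following is a minimal sketch, assuming matches are represented as (source URI, target URI) pairs; the function name `evaluate` and the example URIs are hypothetical and do not come from the slides.

```python
def evaluate(found, gold):
    """Return (precision, recall, f1) of a system's matches against a gold standard."""
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    # Precision (soundness): share of reported matches that are correct.
    precision = true_positives / len(found) if found else 0.0
    # Recall (completeness): share of correct matches that were reported.
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold standard and system output.
gold = {("src:a", "tgt:a"), ("src:b", "tgt:b"),
        ("src:c", "tgt:c"), ("src:d", "tgt:d")}
found = {("src:a", "tgt:a"), ("src:b", "tgt:b"), ("src:c", "tgt:x")}

precision, recall, f1 = evaluate(found, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three reported pairs are correct and two of the four gold pairs are found, so precision and recall differ; the F-measure summarizes both in one number.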
![Page 6: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/6.jpg)
LANCE
• A novel instance matching benchmark generator
• Domain-independent
• Highly configurable and scalable
• Standard value-based and structure-based test cases
• Advanced semantics-aware test cases considering OWL 2 expressive constructs
• Rich weighted gold standard
• Additional metrics: similarity score metric
![Page 7: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/7.jpg)
LANCE Architecture
[Figure: LANCE architecture diagram. Labels shown: Source Data, Test Case Generation Parameters, Data Ingestion Module, RDF Repository, SPARQL Queries (Schema Stats), SPARQL Queries (IR), Initialization Module, Resource Generator, Test Case Generator, Resource Transformation Module, Weight Computation Module, RESCAL [NT12], Matcher, Sampler, Matched Instances, Target Data, Weighted Gold Standard.]
![Page 8: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/8.jpg)
Test Cases
Test cases are built using a variety of transformations:
• Value-based test cases
  – Transformations of the values of datatype properties
• Structure-based test cases
  – Transformations of the structure of object and datatype properties
• Semantics-aware test cases
  – Transformations at the instance level considering the schema
• Simple and complex combinations of the first three categories
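To make the first two categories concrete, here is a minimal sketch of one value-based and one structure-based transformation applied to a source dataset to derive a target dataset. The specific transformations (an adjacent-character swap simulating a typo, and deleting a property), the function names, and the `ex:` identifiers are illustrative assumptions; the slides do not list the transformations LANCE actually implements.

```python
import random


def swap_adjacent_chars(literal, rng):
    """Value-based: swap two adjacent characters to simulate a typo."""
    if len(literal) < 2:
        return literal
    i = rng.randrange(len(literal) - 1)
    chars = list(literal)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def drop_property(triples, prop):
    """Structure-based: delete every triple that uses the given property."""
    return [t for t in triples if t[1] != prop]


# Hypothetical source instance as (subject, property, value) triples.
source = [
    ("ex:work1", "ex:title", "Instance Matching"),
    ("ex:work1", "ex:year", "2016"),
]

rng = random.Random(42)  # seeded so the generated test case is reproducible
target_value = [
    (s, p, swap_adjacent_chars(o, rng) if p == "ex:title" else o)
    for s, p, o in source
]
target_structure = drop_property(source, "ex:year")

print(target_value)
print(target_structure)
```

A combined test case would simply chain such functions, for example applying `drop_property` to `target_value`.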
![Page 9: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/9.jpg)
LANCE Performance Metrics
• Average similarity score: average difficulty of the matched instances
  – Benchmark with a high average similarity score: matched instances are easier to find
• Standard deviation: spread of similarity scores for the matched instances
  – Benchmark with a high standard deviation:
    • scores are spread out from the average
    • more heterogeneity among the matched instances
Obtain a more fine-grained understanding of the IM system’s performance by comparing the average similarity score and standard deviation of the system and the benchmark.
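The comparison described above can be sketched as follows, assuming each gold-standard match carries a similarity score in [0, 1] and that the system's profile is computed over the scores of its true positive matches. The score values are made up for illustration, and the use of the population standard deviation (`pstdev`) is an assumption; the slides do not say which variant is used.

```python
from statistics import mean, pstdev


def score_profile(scores):
    """Return (average similarity score, standard deviation) for a list of scores."""
    return mean(scores), pstdev(scores)


# Hypothetical similarity scores of all matches in the weighted gold standard.
benchmark_scores = [0.70, 0.80, 0.90, 1.00]
# Hypothetical scores of the gold-standard matches the system actually found.
system_scores = [0.90, 1.00]

bench_avg, bench_sd = score_profile(benchmark_scores)
sys_avg, sys_sd = score_profile(system_scores)

print(f"benchmark: avg={bench_avg:.3f} sd={bench_sd:.3f}")
print(f"system:    avg={sys_avg:.3f} sd={sys_sd:.3f}")
```

In this made-up example the system's average is higher and its standard deviation lower than the benchmark's, which would suggest it mostly found the easier, more homogeneous matches.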
![Page 10: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/10.jpg)
Experiments
• Efficiency and effectiveness of IM systems using LANCE benchmarks
  – Systems
    • LogMap Version 2.4 [JG11] (MORe Reasoner [RG13])
    • OtO [DP12]
    • LIMES (EAGLE IM algorithm [NL12])
  – Datasets
    • LDBC’s SPIMBENCH Generator (Semantic Publishing Benchmark)
    • UOBM
  – Matching Task
    • All 5 categories introduced previously
    • All instances were transformed
![Page 11: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/11.jpg)
SPIMBENCH: Standard Metrics
• LogMap
  – Responds well in the value-based test cases
  – Reduced performance when semantics-aware test cases were also applied
![Page 12: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/12.jpg)
SPIMBENCH: Standard Metrics
• OtO and EAGLE
  – Give good results for the value-based transformations
  – Reduced performance in the remaining categories
    • EAGLE is non-deterministic and uses unsupervised learning
![Page 13: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/13.jpg)
UOBM: Standard Metrics
• LogMap
  1. Does not perform well in any of the categories
  2. Performance not affected by the dataset size
• OtO
  1. Performs better
  2. Reduced performance when increasing the dataset size
![Page 14: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/14.jpg)
SPIMBENCH: Additional Metrics
[Figure: Distribution of similarity scores for LANCE and the true positive matches from the IM systems (OtO, EAGLE, LogMap) for semantics-aware test cases on the 10K triples dataset. x-axis: similarity scores (0.7 to 1); y-axis: log(# of mappings). A second panel is titled Standard Deviation.]
• LogMap can address difficult test cases
• EAGLE & OtO can address mostly value-based test cases
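A distribution like the one plotted can be derived by bucketing similarity scores into fixed-width bins and counting the mappings per bin. This is a minimal sketch, assuming 0.02-wide bins starting at 0.70 to mirror the plot's x-axis; the function name and the input scores are hypothetical, not the experimental data.

```python
from collections import Counter


def bucket_scores(scores, low=0.70, width=0.02):
    """Count scores per bin; each bin is labelled by its lower bound."""
    # Work in integer hundredths to avoid floating-point drift at bin edges.
    low_h, width_h = round(low * 100), round(width * 100)
    counts = Counter()
    for score in scores:
        index = (round(score * 100) - low_h) // width_h
        counts[(low_h + index * width_h) / 100] += 1
    return dict(sorted(counts.items()))


# Hypothetical similarity scores of a system's true positive matches.
histogram = bucket_scores([0.70, 0.71, 0.75, 0.99, 1.00])
print(histogram)
```

Computing one such histogram for the benchmark's gold standard and one per system makes it possible to see which score ranges a system covers.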
![Page 15: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/15.jpg)
UOBM: Additional Metrics
[Figure: Distribution of similarity scores for LANCE and the true positive matches from the IM systems (OtO, LogMap) for structure-based test cases on the 10K triples dataset. x-axis: similarity (0.6 to 0.9); y-axis: log(# of mappings). A second panel compares OtO, LogMap and LANCE on an axis from 0 to 0.08.]
• LogMap cannot address well the change of URIs in the instances
![Page 16: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/16.jpg)
Lessons Learned
• Different types of transformations affect an IM system’s performance
• The characteristics of the source datasets affect the behavior of IM systems
![Page 17: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/17.jpg)
Questions?
![Page 18: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/18.jpg)
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.
![Page 19: How well does your Instance Matching system perform? Experimental evaluation with LANCE](https://reader031.fdocuments.in/reader031/viewer/2022030315/588089df1a28ab35718b66cf/html5/thumbnails/19.jpg)
References
[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[RG13] A. A. Romero, B. C. Grau, et al. MORe: a Modular OWL Reasoner for Ontology Classification. In ORE, pages 61-67, 2013.
[DP12] E. Daskalaki and D. Plexousakis. OtO Matching System: A Multi-strategy Approach to Instance Matching. In CAiSE, 2012.
[NL12] A.-C. Ngonga Ngomo and K. Lyko. EAGLE: Efficient Active Learning of Link Specifications using Genetic Programming. In ESWC, 2012.