Linkset quality (LWDM 2013)
Assessing Linkset Quality For Complementing Third Party Datasets
Riccardo Albertoni (1,2), Asunción Gómez Pérez (1)
(1) Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
(2) CNR-IMATI, Via De Marini, 6, Torre di Francia, 16149 Genova, Italy
3RD INTERNATIONAL WORKSHOP ON LINKED WEB DATA MANAGEMENT (LWDM 2013)
in conjunction with the 16th International Conference on Extending Database Technology (EDBT 2013)
March 22, 2013 - Genoa, Italy
Motivations
Riccardo Albertoni
LINKED DATA’S PROMISE: Evolving the Web into a Global Data Space. It should help to overcome the data silo effect…
So many bubbles there, THAT’S SO COOL!!
BUT… can I exploit that third-party data for my OWN ANALYSES?
Motivation
What does this arrow mean??
[Figure: an arrow between the SWDF and DBLP bubbles.]
There is NO GROUNDED CONCEPT of what makes a linkset suitable for a target application.
There is well-founded work on quality for datasets, but linksets are not yet directly addressed!
What is Linkset Quality for?
Linked Data publishers can check whether a linkset they have provided
• is good enough or needs to be improved;
• is still good enough after one of its two target datasets is updated.
Linked Data consumers can
• figure out whether they can rely on a linkset;
• get a first idea of the next move they can take to improve the linkset;
• rank alternative linksets.
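Complementing a Dataset X via a Linkset L
[Figure: in dataset Y (DBLP), researchers a and b (e.g., Yolanda Gil) are related via foaf:made to publications Pub1 to Pub4; in dataset X, researchers a', b' and c' are related via foaf:member to affiliations (Affili3, Affili4, Affili5), and X also contains a Journal 1; linkset L contains a owl:sameAs a' and b owl:sameAs b'; X_L denotes X complemented via L.]
Complementation might introduce some “missing data”: researchers that the linkset leaves without a link (like researcher c) get nothing from Y.
The less “missing data” the complementation introduces, the more complete the linkset.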
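To make complementation concrete, here is a minimal Python sketch under stated assumptions: datasets are plain sets of (subject, predicate, object) strings, and each owl:sameAs link is a (y_entity, x_entity) pair read as "y_entity owl:sameAs x_entity", as in the figure. The function `complement` is hypothetical, not the authors' prototype: it copies Y's statements about each linked entity onto its counterpart in X and reports which entities of X received nothing.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def complement(x: Set[Triple], y: Set[Triple],
               links: Set[Tuple[str, str]], x_entities: Set[str]):
    """Complement dataset X with Y's statements through owl:sameAs links.

    Each link is a (y_entity, x_entity) pair. Returns the complemented
    dataset X_L and the entities of X left without a link.
    """
    x_l = set(x)
    for y_ent, x_ent in links:
        # Copy every statement Y makes about y_ent onto its counterpart x_ent.
        for s, p, o in y:
            if s == y_ent:
                x_l.add((x_ent, p, o))
    linked = {x_ent for _, x_ent in links}
    # Entities of X with no counterpart: their Y-side data goes "missing".
    missing = x_entities - linked
    return x_l, missing


# Toy data loosely following the figure (names are illustrative only).
Y = {("a", "foaf:made", "Pub1"), ("a", "foaf:made", "Pub2"),
     ("b", "foaf:made", "Pub3"), ("b", "foaf:made", "Pub4")}
X = {("a'", "foaf:member", "Affili5"), ("a'", "foaf:member", "Affili4"),
     ("b'", "foaf:member", "Affili3")}
L = {("a", "a'"), ("b", "b'")}

x_l, missing = complement(X, Y, L, {"a'", "b'", "c'"})
print(len(x_l))  # 7
print(missing)   # {"c'"}
```

The fewer entities end up in `missing`, the less data the complementation leaves out.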
What is a Linkset? (http://vocab.deri.ie/void)
Every linkset is a special kind of dataset!
Every linkset has two target datasets: the subject dataset and the object dataset.
Every linkset should have only one linking property.
The measures presented here address owl:sameAs linksets.
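The three rules above can be sketched as a small Python data structure. The class `Linkset` and its methods are hypothetical; the field comments point to the VoID properties void:subjectsTarget, void:objectsTarget and void:linkPredicate, but the sketch neither reads nor writes VoID, and the entity names are made up.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Linkset:
    """A linkset: a special kind of dataset whose triples link two target datasets."""
    subjects_target: str                # cf. void:subjectsTarget
    objects_target: str                 # cf. void:objectsTarget
    link_predicate: str = "owl:sameAs"  # cf. void:linkPredicate, one per linkset
    links: Set[Tuple[str, str]] = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Enforce the "only one linking property" rule.
        if predicate != self.link_predicate:
            raise ValueError(
                f"this linkset only accepts {self.link_predicate}, got {predicate}")
        self.links.add((subject, obj))

    def triples(self) -> Set[Tuple[str, str, str]]:
        # Being a dataset itself, the linkset can be exposed as plain triples.
        return {(s, self.link_predicate, o) for s, o in self.links}


l1 = Linkset(subjects_target="DBLP", objects_target="SWDF")
l1.add("dblp:a", "owl:sameAs", "swdf:a")  # hypothetical entity names
print(l1.triples())  # {('dblp:a', 'owl:sameAs', 'swdf:a')}
```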
Defining quality measures
Following the terminology adopted by C. Bizer and R. Cyganiak (Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009), providing a quality measure means defining the following, each paired with what this linkset quality measure provides:
• Quality indicator: an aspect of a data item or dataset that may give the user an indication of the suitability of the data for some intended use. Provided here: entity types; number of entities for a type; …
• Scoring function: a function evaluating quality indicators to measure the suitability of the data for some intended use. Provided here: Linkset Type Coverage, Linkset Type Completeness, Linkset Entity Coverage for Type.
• Aggregate metric: a user-specified assessment metric built upon scoring functions; such aggregations produce new assessment values by applying average, sum, max, min or threshold functions to the set of scoring functions. Provided here: interpretation tables, i.e., interpretations of the scoring functions that help figure out the next action to take.
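The aggregation step can be sketched as follows, assuming each scoring function has already produced a number keyed by name. The helper `aggregate` is hypothetical, and so is its threshold semantics (1.0 if every score reaches the threshold, else 0.0); the slide only lists average, sum, max, min and threshold as possible aggregations. The input scores are illustrative values, not results.

```python
from statistics import mean
from typing import Dict, Optional


def aggregate(scores: Dict[str, float], how: str,
              threshold: Optional[float] = None) -> float:
    """Combine scoring-function values into a single assessment value."""
    values = list(scores.values())
    if how == "average":
        return mean(values)
    if how == "sum":
        return sum(values)
    if how == "max":
        return max(values)
    if how == "min":
        return min(values)
    if how == "threshold":
        # Hypothetical semantics: pass (1.0) only if every score reaches the threshold.
        if threshold is None:
            raise ValueError("threshold aggregation needs a threshold")
        return 1.0 if all(v >= threshold for v in values) else 0.0
    raise ValueError(f"unknown aggregation: {how}")


# Illustrative values only.
scores = {"LTCov": 0.5, "LTCom": 0.5, "LECov": 0.0001}
print(aggregate(scores, "min"))             # 0.0001
print(aggregate(scores, "threshold", 0.4))  # 0.0
```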
INDICATORS: Examples on DBLP & SWDF
[Figure: the types exposed by DBLP and SWDF, Type(DBLP) and Type(SWDF), including foaf:Agent, foaf:Person, foaf:Organization, foaf:Document, swrc:Proceedings, swr:Proceedings, ro:FullPaper, ro:ShortPaper and ro:PosterPaper.]
#E4Type(foaf:Agent,DBLP)=1000000
#E4Type(foaf:Document,DBLP)=1984087
#E4Type(swrc:Proceedings,DBLP)=1108400
[Figure: the same types, with a linkset L2 between DBLP and SWDF and its types, Type(L2).]
#E4Type(foaf:Agent,L2)=100
#E4Type(foaf:Person,L2)=100
Quality indicators: Types
Type(D): for a dataset or linkset D, returns the types of the entities exposed in D, i.e., an element of the power set of the possible user-defined types (e.g., owl:Class, owl:Restriction, skos:Concept, skos:ConceptScheme).
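A minimal sketch of the Types indicator, assuming a dataset is a set of string triples and that types are read from rdf:type statements. The function name `types_of` and the sample data are hypothetical.

```python
from typing import Set, Tuple

RDF_TYPE = "rdf:type"


def types_of(triples: Set[Tuple[str, str, str]]) -> Set[str]:
    """Return the types of the entities exposed in a dataset or linkset."""
    # Every object of an rdf:type statement is a type used in the data.
    return {o for _, p, o in triples if p == RDF_TYPE}


dblp_sample = {
    ("dblp:a", "rdf:type", "foaf:Person"),
    ("dblp:a", "rdf:type", "foaf:Agent"),
    ("dblp:p1", "rdf:type", "swrc:Proceedings"),
    ("dblp:a", "foaf:made", "dblp:p1"),
}
print(sorted(types_of(dblp_sample)))
# ['foaf:Agent', 'foaf:Person', 'swrc:Proceedings']
```

An owl:sameAs linkset carries no rdf:type statements of its own, so for Type(L) one would presumably look up the types of the linked entities in the target datasets; that step is left out here.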
Quality indicators: # of Entity for a Type
#E4Type(T, D): for one of the possible user-defined types T and a dataset or linkset D, returns the number of entities of type T exposed in D (a positive integer).
Blank nodes are left out.
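The number-of-entities indicator can be sketched the same way, again assuming string triples. Following the slide, blank nodes are left out; recognising them by an N-Triples-style `_:` prefix is an assumption of this sketch, and `e4type` is a hypothetical name mirroring the #E4Type notation.

```python
from typing import Set, Tuple

RDF_TYPE = "rdf:type"


def e4type(triples: Set[Tuple[str, str, str]], t: str) -> int:
    """Count the distinct non-blank entities of type t in a dataset or linkset."""
    return len({s for s, p, o in triples
                if p == RDF_TYPE and o == t and not s.startswith("_:")})


sample = {
    ("dblp:a", "rdf:type", "foaf:Agent"),
    ("dblp:b", "rdf:type", "foaf:Agent"),
    ("_:x", "rdf:type", "foaf:Agent"),  # blank node: ignored
    ("dblp:p1", "rdf:type", "swrc:Proceedings"),
}
print(e4type(sample, "foaf:Agent"))  # 2
```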
SCORING FUNCTIONS: Linkset Type Coverage (1)
[Figure: Type(DBLP) and Type(SWDF), among them foaf:Agent, foaf:Person, foaf:Organization and swrc:Proceedings, with a linkset L1 between DBLP and SWDF.]
Complementing DBLP with L1, are we adding new entities to DBLP?
DBLP_L1 (DBLP complemented via L1) “imports” organizations for the researchers (foaf:Agent) involved in the linkset.
SCORING FUNCTIONS: Linkset Type Coverage (2)
[Figure: the same types, plus swr:Proceedings, with a linkset L2 between DBLP and SWDF.]
Complementing SWDF with L2, we don’t add any new type of entity: SWDF_L2 has exactly the same kinds of entities as SWDF.
Definition of Linkset Type Coverage
Linkset Type Coverage relates a linkset to one of its target datasets: considering a dataset X, what percentage of the types of X are also covered by the linkset?
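Read literally, the question above amounts to LTCov(L, X) = |Type(L) ∩ Type(X)| / |Type(X)|. The sketch below assumes that reading and precomputed type sets; `ltcov` is a hypothetical name. The DBLP types used are a subset taken from the slides, not the full Type(DBLP), and the types of L1 are made up.

```python
from typing import Set


def ltcov(types_linkset: Set[str], types_x: Set[str]) -> float:
    """Fraction of the types of X that also occur among the types of the linkset."""
    if not types_x:
        raise ValueError("dataset X exposes no types")
    return len(types_linkset & types_x) / len(types_x)


types_dblp = {"foaf:Agent", "foaf:Person", "foaf:Document", "swrc:Proceedings"}
types_l1 = {"foaf:Agent", "foaf:Person"}  # hypothetical
print(ltcov(types_l1, types_dblp))  # 0.5
```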
SCORING FUNCTION: Ideas behind Type Completeness (1)
[Figure: Type(DBLP) and Type(SWDF), among them foaf:Agent, foaf:Person, foaf:Organization and swrc:Proceedings, with a linkset L1 between DBLP and SWDF.]
L1 is type complete: it does not make sense to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and foaf:Organization!
SCORING FUNCTION: Ideas behind Type Completeness (2)
[Figure: as above, with swr:Proceedings aligned to swrc:Proceedings.]
Given an alignment among classes relating swrc:Proceedings and swr:Proceedings, L1 is type incomplete: we should try to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and swr:Proceedings!
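The practical hint on this slide can be sketched as a small helper, assuming the class alignment is given as a set of (type in X, type in Y) pairs. It lists the aligned pairs whose X-side type the linkset does not yet cover, i.e. candidate inputs for a link discovery tool such as SILK; unaligned pairs such as swrc:Proceedings and foaf:Organization never appear. `next_linking_tasks` is hypothetical and does not call SILK.

```python
from typing import Set, Tuple


def next_linking_tasks(alignment: Set[Tuple[str, str]],
                       types_linkset: Set[str]) -> Set[Tuple[str, str]]:
    """Aligned class pairs (x_type, y_type) whose x_type the linkset does not cover."""
    return {(tx, ty) for tx, ty in alignment if tx not in types_linkset}


alignment = {("foaf:Person", "foaf:Person"),
             ("swrc:Proceedings", "swr:Proceedings")}
types_l1 = {"foaf:Agent", "foaf:Person"}  # hypothetical
print(next_linking_tasks(alignment, types_l1))
# {('swrc:Proceedings', 'swr:Proceedings')}
```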
Formalization of Linkset Type Completeness
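Linkset Type Completeness, LTCom(L, X, Y), takes a linkset L and its two target datasets, X (the subject dataset, target dataset 1) and Y (the object dataset, target dataset 2). Let EqT(X, Y) denote the set of types of X that have an equivalent in Y according to an equivalence relation among classes. LTCom counts how many of these types of the subject dataset are not considered in the linkset:
$$LTCom(L, X, Y) = 1 - \frac{|EqT(X, Y) \setminus Type(L)|}{|EqT(X, Y)|}$$
A linkset is complete with respect to types if LTCom = 1; LTCom < 1 otherwise.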
Example on Type Completeness
[Figure: Type(DBLP) and Type(SWDF), among them foaf:Agent, foaf:Person, foaf:Organization, swrc:Proceedings and swr:Proceedings, with linksets L1 and L2 between DBLP and SWDF.]
LTCom(L1,DBLP, SWDF) = 1- (|{swrc:Proceedings}| / |{swrc:Proceedings,foaf:Person}|)=1/2
LTCom(L2,DBLP, SWDF) = 1- (|{}| / |{swrc:Proceedings,foaf:Person}|)=1
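Both examples follow the formula LTCom(L, X, Y) = 1 - |EqT(X, Y) \ Type(L)| / |EqT(X, Y)|, where EqT(X, Y) is the set of types of X with an equivalent in Y. A short Python check, assuming the class alignment is given as (type in X, type in Y) pairs; the type sets for L1 and L2 are hypothetical, chosen only so that they reproduce the values above. `Fraction` keeps the result exact, matching the 1/2 of the slide.

```python
from fractions import Fraction
from typing import Set, Tuple


def eq_types(alignment: Set[Tuple[str, str]]) -> Set[str]:
    """EqT(X, Y): the types of X that have an equivalent in Y."""
    return {tx for tx, _ in alignment}


def ltcom(types_linkset: Set[str], alignment: Set[Tuple[str, str]]) -> Fraction:
    """Linkset Type Completeness: 1 - |EqT(X,Y) minus Type(L)| / |EqT(X,Y)|."""
    eqt = eq_types(alignment)
    if not eqt:
        raise ValueError("X has no types with an equivalent in Y")
    uncovered = eqt - types_linkset
    return 1 - Fraction(len(uncovered), len(eqt))


# Equivalences between DBLP (X) and SWDF (Y), as in the example.
alignment = {("foaf:Person", "foaf:Person"),
             ("swrc:Proceedings", "swr:Proceedings")}
types_l1 = {"foaf:Agent", "foaf:Person"}                      # hypothetical
types_l2 = {"foaf:Agent", "foaf:Person", "swrc:Proceedings"}  # hypothetical
print(ltcom(types_l1, alignment))  # 1/2
print(ltcom(types_l2, alignment))  # 1
```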
Linkset Entity Coverage for Type
[Figure: Type(DBLP) and Type(SWDF), with linksets L1 and L2 and the classes swrc:Proceedings and swr:Proceedings.]
L1 and L2 are indistinguishable from the point of view of types. Which is the most interesting: L1, L2, or L1 ∪ L2?
How good is a linkset providing 100 owl:sameAs links? Linkset Entity Coverage for Type relates the number of entities of type T in the linkset L to the number of entities of type T in the dataset X:
$$\frac{\#E4Type(T, L)}{\#E4Type(T, X)}$$
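Plugging in the counts from the DBLP/SWDF indicator examples (#E4Type(foaf:Agent, DBLP) = 1000000 and #E4Type(foaf:Agent, L2) = 100) gives a feel for the question above. Treating DBLP as X and L2 as L is an assumption of this sketch, and `entity_coverage` is a hypothetical name.

```python
def entity_coverage(n_type_in_linkset: int, n_type_in_dataset: int) -> float:
    """Linkset Entity Coverage for a type T: #E4Type(T, L) / #E4Type(T, X)."""
    if n_type_in_dataset <= 0:
        raise ValueError("dataset X has no entities of type T")
    return n_type_in_linkset / n_type_in_dataset


cov = entity_coverage(100, 1_000_000)  # foaf:Agent in L2 vs. in DBLP
print(cov)  # 0.0001
```

With these numbers the 100 owl:sameAs links touch about 0.01% of the foaf:Agent entities in DBLP, which is why type-level measures alone cannot tell such linksets apart.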
Aggregate Metrics: interpretation based on the presented scoring functions
The interpretation is summed up as a “decision tree”.
Related work: (extended discussion in the paper)
• WIQA: C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009.
WIQA is an information quality assessment framework. It contributes a policy language, an engine for interpreting such policies, and explanations of whether a piece of information satisfies a policy. Quality criteria are parameters of the system: it does not aim at proposing new quality measures.
• LOD2: P. N. Mendes and C. Bizer. Survey report state of the art in mapping, quality assessment and data fusion. Technical report, LOD2 - Creating Knowledge out of Interlinked Data, Deliverable 4.3.1, 2011, http://static.lod2.eu/Deliverables
Reviews quality dimensions, but provides no indicators or criteria for completeness.
• PlanetData: P. N. Mendes, C. Bizer, J. H. Young, Z. Miklos, J.-P. Calbimonte, and A. Moraru. Conceptual model and best practices for high-quality metadata publishing. Technical report, PlanetData, Deliverable 2.1, 2012, http://planet-data-wiki.sti2.at/web/File:D2.1.pdf.
Intensional completeness: the schema contains all the necessary attributes; extensional completeness: all the required instances are present; LDS completeness: relevant properties have values.
• SIEVE: P. N. Mendes, H. Muhleisen, and C. Bizer. Sieve: linked data quality assessment and fusion. In D. Srivastava and I. Ari, editors, LWDM EDBT/ICDT Workshops, pp. 116-123. ACM, 2012.
Deploys some of the ideas developed in WIQA and LDS completeness.
None of these works explicitly addresses quality for linksets.
• Link-QA: C. Gueret, P. T. Groth, C. Stadler, and J. Lehmann. Assessing linked data mappings using network measures. In E. Simperl, P. Cimiano, A. Polleres, O. Corcho, and V. Presutti, editors, ESWC, volume 7295 of Lecture Notes in Computer Science, pp. 87-102. Springer, 2012.
A different approach: it applies classic network measures such as degree, centrality and clustering coefficient, plus open sameAs chains and description richness, to determine whether a set of links improves the overall dataset quality.
It assesses the quality of interlinking, not of linksets: LINK-QA works on links independently of whether they belong to the same linkset.
LINK-QA addresses correctness and does not deal with completeness.
LINK-QA is for ranking sets of links: it can be used to say that one linkset is better than another, but it does not suggest the next move a consumer should take to improve their linkset.
Conclusions
Contribution: a quality measure on linksets
• The only measure explicitly addressing linkset completeness for dataset complementation;
• Formalization of indicators, scoring functions and aggregation metrics;
• A first proof-of-concept prototype (Java, Jena).
Ongoing and future work
• Validation on the LOD: how many “incomplete” linksets can we detect in the LOD?
• Extension to linksets other than owl:sameAs ones (e.g., skos:exactMatch);
• Dimensions other than completeness (e.g., timeliness, availability, consistency).