Amrapali Zaveri Defense
Transcript of Amrapali Zaveri Defense
17th April, 2015, Leipzig, Germany
Linked Data Quality Assessment and its Application to Societal Progress Measurement
Amrapali Zaveri
Faculty of Mathematics and Computer Science
Supervisors: Prof. Dr. Ing. habil. Klaus-Peter Fähnrich, University of Leipzig
Dr. Jens Lehmann, University of Leipzig
Prof. Dr. Sören Auer, University of Bonn
Outline
Motivation — Linked Data Quality
Linked Data Quality Assessment Methodologies
Use Case Leveraging Data Quality
Contributions
Future Work
Motivation - Linked Data Quality
Data on the Web
Accessible
Re-usable
Understandable
Discoverable
Linked Data Principles
Use URIs as names for things.
Use HTTP URIs, so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, OWL, SPARQL).
Include links to other URIs, so that they can discover more things.
Linked Data
Data Quality
Data Quality is defined as “fitness for use”*
* Juran, J. (1974). The Quality Control Handbook. McGraw-Hill, New York.
Consequences of Poor Quality
Propagation of errors in integrated datasets
Major hindrance in acquiring reliable results
Loss of important information
Loss in productivity, additional costs*#
* http://www.gartner.com/newsroom/id/501733
# http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Data Quality Assessment
How can one assess the quality of data and make this information explicit?
Which criteria should be assessed?
Which measures should be used?
Which methodologies/tools can be utilized?
Main Research Question
How can we exploit Linked Data for a particular use case and ensure good data quality?
Overview
Systematic literature review
Linked Data Quality Assessment Methodologies Evaluation
User-driven, Crowdsourcing, Semi-automated
Use case leveraging quality
Systematic Literature Review
Current State
Lack of unified descriptions for data quality dimensions and metrics for Linked Data
Lack of use-case-driven data quality assessment methodologies for Linked Data
Lack of quality assessment of datasets before utilisation in particular use cases
Research Questions
RQ1 What are the existing approaches to assess the quality of Linked Data employing a conceptual framework integrating prior approaches?
RQ1.1 What are the data quality problems that each approach assesses?
RQ1.2 Which are the data quality dimensions and metrics supported by the proposed approaches?
RQ1.3 Which tools already exist to assess the quality of Linked Data?
Qualitative Analysis
30 core articles
18 dimensions with definitions
69 metrics
12 tools compared using 8 attributes
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Dimensions
[Figure: the quality dimensions grouped under the categories Accessibility, Intrinsic, Contextual and Representation, with lines marking pairs of related dimensions.]
Dimensions shown: Relevancy, Conciseness, Timeliness, Rep.-Conciseness, Interoperability, Consistency, Interpretability, Understandability, Versatility*, Availability, Performance*, Interlinking*, Syntactic Validity, Trustworthiness, Licensing*, Semantic Accuracy, Completeness, Security*
* specific for Linked Data
Metrics
Linked Data Quality Metrics
Dimension | Metric | Description | QN/QL*
Completeness | Schema completeness | No. of classes and properties / total no. of classes and properties | QN
Interlinking | Detection of good quality interlinks | (i) detection of (a) interlinking degree, (b) clustering coefficient, (c) centrality, (d) open sameAs chains and (e) description richness through sameAs by using network measures, (ii) via crowdsourcing | QN
Timeliness | Freshness of datasets | max{0, 1 − currency / volatility} | QN
Trustworthiness | Trustworthiness of information provider | Indicating the level of trust for the publisher on a scale of 1−9 | QL
* QN: quantitative metric; QL: qualitative metric
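The two formula-based metrics in the table can be sketched in a few lines of Python. This is a minimal illustration, not the survey's implementation: the function names are hypothetical, the numerator of schema completeness is assumed to mean the schema terms that actually occur in the data, and the example values are made up.

```python
def schema_completeness(used_terms, schema_terms):
    """Share of schema classes and properties that occur in the data (assumed reading)."""
    schema = set(schema_terms)
    if not schema:
        raise ValueError("schema_terms must not be empty")
    return len(set(used_terms) & schema) / len(schema)


def freshness(currency, volatility):
    """max{0, 1 - currency / volatility}; both values must use the same time unit."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return max(0.0, 1.0 - currency / volatility)


# Hypothetical example values, not taken from the slides.
schema = {"Person", "City", "birthPlace", "population"}
used = {"Person", "birthPlace", "population"}
print(schema_completeness(used, schema))        # 0.75
print(freshness(currency=30, volatility=120))   # 0.75
print(freshness(currency=200, volatility=120))  # 0.0
```

Both functions return a value between 0 and 1, which makes the scores comparable across dimensions.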
Tools
Attribute | Trellis | TrustBOT | tSPARQL | WIQA | ProLOD | Flemming
Availability | - | - | ✔ | - | - | ✔
Licensing | Open-source | - | GPL v3 | Apache v2 | - | -
Automation | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | Yes | No | No | No | No | No
Customizability | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Scalability | - | No | Yes | - | - | No
Usability | 2 | 4 | 4 | 2 | 2 | 3
Maintenance | 2005 | 2003 | 2012 | 2006 | 2010 | 2010

Attribute | LinkQA | Sieve | RDFUnit | DaCura | TripleCheckMate | LiQuate
Availability | ✔ | ✔ | ✔ | - | ✔ | ✔
Licensing | Open-source | Apache | Apache | - | Apache | -
Automation | Automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated | Semi-automated
Collaboration | No | No | No | Yes | Yes | No
Customizability | No | ✔ | ✔ | ✔ | ✔ | No
Scalability | Yes | Yes | Yes | No | Yes | No
Usability | 2 | 4 | 3 | 1 | 5 | 1
Maintenance | 2011 | 2012 | 2014 | 2013 | 2013 | 2013
Problems in Current Approaches
Not catered to the use case
Results difficult to interpret
Do not report the root cause of the quality issues
Require considerable amount of configuration
Do not allow user to choose input dataset
User-Driven Quality Assessment
Research Questions
RQ2 How can we assess the quality of Linked Data using a user-driven methodology?
RQ2.1 How feasible is it to employ Linked Data experts to assess the quality issues of LD?
RQ2.2 How feasible is it to use a combination of user-driven and semi-automated methodology to assess the quality of LD?
Methodology
[Figure: methodology flowchart with the elements Resource Selection ([Per Class], [Manual], [Random]), Resource, Evaluation mode selection ([Manual], [Semi-automatic], [Automatic]), Pre-selection of triples, Resource Evaluation, Triples, List of invalid facts, Data Quality Improvement and Patch Ontology.]
Manual, Semi-automated
Manual — Phase I
Linked Data Quality Problem Taxonomy
Dimension | Category
Accuracy | Triple incorrectly extracted; Datatype problems; Implicit relationships between attributes
Relevancy | Irrelevant information extracted
Representational consistency | Representation of number values
Interlinking | External links; Interlinks with other datasets
Manual — Phase II
Invited Linked Data experts
Triple-based evaluation
Contest-based, 3 weeks
Phase II — TripleCheckMate
https://github.com/AKSW/TripleCheckMate
Choose a resource
Identify erroneous triples
Manual — Results
Total no. of users | 58
Total no. of distinct resources evaluated | 521
Total no. of distinct incorrect triples | 2928
% of triples affected | 11.93%
Resource-based inter-rater agreement (Cohen’s kappa) | 0.34
Total no. of triples evaluated for correctness | 700
% of triples evaluated incorrectly | 19%
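The inter-rater agreement figure above is a Cohen's kappa value. As a reminder of how that statistic is computed for two raters, here is a minimal Python sketch; the labels are hypothetical and the slides do not say how raters were paired, so this only illustrates the formula, not the study's procedure.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(ratings_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six resources (True = resource contains an error).
rater_1 = [True, True, False, False, True, False]
rater_2 = [True, False, False, False, True, True]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.33
```

Values near 0 mean agreement is no better than chance, while 1 means perfect agreement.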
Semi-automated — Step 1
Generate schema axioms for properties via DL-Learner*
Functionality
Inverse functionality
Asymmetry
Irreflexivity
Example:
Domain: Formula One Racer
Range: Grand Prix
Only 1 first win of each Formula One Racer (Functional)
* Lehmann, J. (2009). DL-Learner: learning concepts in description logics. Journal of Machine Learning Research, 10:2639–2642.
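Once a property has been learned to be functional, violations can be found by looking for subjects with more than one distinct value. The sketch below shows that check in plain Python; it does not use DL-Learner, and the `ex:` identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Return subjects that have more than one distinct object for `prop`."""
    objects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            objects[s].add(o)
    return {s: sorted(objs) for s, objs in objects.items() if len(objs) > 1}


# Hypothetical triples; the identifiers are illustrative, not real DBpedia data.
triples = [
    ("ex:RacerA", "ex:firstWin", "ex:GrandPrix1"),
    ("ex:RacerA", "ex:firstWin", "ex:GrandPrix2"),  # a second "first win": violation
    ("ex:RacerB", "ex:firstWin", "ex:GrandPrix3"),
]
print(functional_violations(triples, "ex:firstWin"))
# {'ex:RacerA': ['ex:GrandPrix1', 'ex:GrandPrix2']}
```

Swapping the roles of subject and object in the same loop would give the analogous check for inverse functionality.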
Semi-automated — Step 2
Manual evaluation of generated axioms
100 random axioms per type
Only those axioms where at least one violation can be found
Also taking target context into account
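The sampling step for the manual evaluation could look roughly like this in Python. It is a sketch under assumptions: axioms are represented as strings, `violations` maps each axiom to its violation count, and the function name, data and seed are hypothetical.

```python
import random


def sample_axioms_for_review(axioms_by_type, violations, per_type=100, seed=0):
    """Pick up to `per_type` random axioms of each type that have at least one violation."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for axiom_type, axioms in axioms_by_type.items():
        candidates = [a for a in axioms if violations.get(a, 0) > 0]
        sample[axiom_type] = rng.sample(candidates, min(per_type, len(candidates)))
    return sample


# Hypothetical input: three axioms per type, some without violations.
axioms_by_type = {
    "functionality": ["f1", "f2", "f3"],
    "irreflexivity": ["i1", "i2", "i3"],
}
violations = {"f1": 4, "f2": 0, "f3": 1, "i1": 0, "i2": 2}
picked = sample_axioms_for_review(axioms_by_type, violations, per_type=2)
print({axiom_type: sorted(axioms) for axiom_type, axioms in picked.items()})
```

Axioms without any violation are filtered out before sampling, so reviewers only see axioms that flag at least one triple.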
Semi-automated — Results
[Figure: evaluation results per axiom type: Inverse functionality, Functionality, Asymmetry, Irreflexivity]
Summary
Quality analysis of over 500 resources
About 12% of triples found erroneous
Linked Data experts performed quality analysis but evaluated correct triples as errors
75% functionality violations of property characteristics detected but required manual verification
Crowdsourcing Linked Data Quality Assessment
Research Questions
RQ2.3 Is it possible to detect quality issues in LD datasets via crowdsourcing mechanisms?
RQ2.4 What type of crowd is most suitable for each type of quality issue?
RQ2.5 Which types of assessment errors are made by lay users and experts?
Concepts
AMT - Amazon Mechanical Turk
HITs - Human Intelligence Tasks/microtasks
MTurk Workers - monetary reward for each HIT
Find-Fix-Verify phases
- Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
- Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Fabian Flöck and Jens Lehmann. SWJ (Submitted) 2015.
Methodology
[Figure: crowdsourcing workflow in two stages.]
Find stage (LD experts in contest): resource selection ([Manual], [Any], [Per Class]), evaluation of the resource’s triples, selection of quality issues for incorrect triples, list of incorrect triples classified by quality issue.
Verify stage (workers in paid microtasks): HIT generation, accept HIT, assess triple according to the given quality issue ([Correct], [Incorrect], [Data doesn’t make sense], [I don’t know]), submit HIT, repeat while there are more triples to assess.
Quality Issues Types
Incorrect/incomplete object value
Incorrect datatypes/literals
Incorrect interlink
Example triple: dbpedia:Oreye dbpedia-owl:postalCode “4360”@en
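A literal like the one in the example triple could be pre-flagged automatically before it reaches a human assessor. The Python sketch below assumes, as a heuristic that is not stated on the slide, that a purely numeric value carrying a language tag is suspicious; the function name and pattern are hypothetical and use straight quotes as in serialized RDF.

```python
import re

# Matches a literal such as "4360"@en: quoted lexical form plus a language tag.
LANG_LITERAL = re.compile(r'^"(?P<value>.*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)$')


def numeric_with_language_tag(literal):
    """True if the literal carries a language tag but its value is purely numeric."""
    match = LANG_LITERAL.match(literal)
    return bool(match) and match.group("value").strip().isdigit()


print(numeric_with_language_tag('"4360"@en'))   # True
print(numeric_with_language_tag('"Oreye"@en'))  # False
print(numeric_with_language_tag('"4360"'))      # False
```

A check like this only narrows down candidates; whether a flagged triple is really wrong still needs human judgement.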
Results - Experts vs. Crowd
[Table: row labels were graphical and are not preserved in the transcript]
LD Expert | MTurk Worker
58 | 80
3 weeks | 4 days
1512 | 1073
0.38 | 0.73
Summary - Experts vs. Crowd
Quality issue | LD experts | MTurk workers
Object values | Fair - required validation | Fair - simple comparisons
Datatypes & literals | Fair - required validation | Poor - inexperienced with RDF
Interlinks | Poor - high effort required | Good - high inter-rater agreement
Use Case Leveraging Data Quality
Research Questions
RQ2.6 How can we semi-automatically assess the quality of datasets and provide meaningful results to the user?
RQ3 How can we exploit Linked Data for building a use case and ensure good data quality?
Motivation - User Scenario
Healthcare policy maker
Interested in: Which diseases? Deaths per disease? Where to allocate funds?
Looks at: databases, e.g. WHO, ClinicalTrials.gov
Analysis: data in disparate datasets, in different formats; data quality problems; subset of data; error-prone analysis etc.
Translates to: inadequate allocation of funds
Use Case - Societal Progress Indicators
Evaluate the impact of Research & Development (R&D), i.e. educational performance, on a country's performance in:
Economy
Healthcare
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Datasets & Variables
4 datasets
  World Bank
  LinkedCT
  Scimago
  USPTO
17 variables
  Examples
  GDP (economic)
  Birth rate, death rate (healthcare)
  h-index (educational)
Methodology
Datasets: World Bank, Scimago, LinkedCT, USPTO
Extract with RSPARQL*
Perform quality assessment
*van Hage, W. R., Kauppinen, T., Graeler, B., Davis, C., Hoeksema, J., Ruttenberg, A., and Bahls, D. (2014). SPARQL Package, v1.6. R Foundation for Statistical Computing.
*https://github.com/amrapalijz/R-LOD-SEM/blob/master/RSPARQL
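The extraction step pulls variables out of the datasets via SPARQL. The talk does this in R; purely as an illustration, here is a Python sketch of turning a SELECT result in the W3C SPARQL 1.1 Query Results JSON Format into plain rows. The variable names and the sample payload are hypothetical, and no endpoint is contacted:

```python
import json

# Hypothetical result payload in the SPARQL 1.1 Query Results JSON Format.
SAMPLE = """{"head": {"vars": ["country", "year", "value"]},
 "results": {"bindings": [
   {"country": {"type": "uri", "value": "http://example.org/country/DE"},
    "year": {"type": "literal", "value": "2010"},
    "value": {"type": "literal", "value": "41785.6"}},
   {"country": {"type": "uri", "value": "http://example.org/country/FR"},
    "year": {"type": "literal", "value": "2010"}}
 ]}}"""

def bindings_to_rows(payload: str):
    """Flatten SPARQL JSON bindings into a list of dicts, one per solution."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding; map them to None.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows

rows = bindings_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Mapping unbound variables to None keeps missing observations visible, which matters for the quality assessment that follows extraction.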
Semi-automated Quality Assessment
R2RLint tool*
7 dimensions
13 quality metrics
Use case specific
Dimensions: Availability, Completeness, Interlinking, Syntactic validity, Consistency, Interpretability, Representational conciseness
*https://github.com/AKSW/R2RLint
Quality Assessment Results
Chart (axis: total no. detected): Interlinking completeness, Population incompleteness, Inconsistency
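One of the reported problem areas is population incompleteness. As a hedged sketch, assuming a simple definition that is not necessarily the one R2RLint implements (the share of expected country/year observations that are actually present), such a metric could be computed like this; the sample data is invented:

```python
from itertools import product

def population_completeness(observations, countries, years):
    """Return (ratio, missing) for the expected country/year grid.

    Assumed definition, for illustration: the fraction of expected
    (country, year) pairs for which an observation exists.
    """
    expected = set(product(countries, years))
    if not expected:
        raise ValueError("need at least one expected country and year")
    present = {pair for pair in observations if pair in expected}
    missing = sorted(expected - present)
    return len(present) / len(expected), missing

# Invented observations for two countries and two years.
obs = [("DE", 2010), ("DE", 2011), ("FR", 2010)]
ratio, missing = population_completeness(obs, ["DE", "FR"], [2010, 2011])
print(ratio, missing)
```

Returning the missing pairs alongside the ratio makes it possible to decide per variable whether a gap is tolerable for the use case.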
11/17 Variables
Latent variable: Educational performance
Observed variables:
  Number of articles (h) that have at least h citations (h-index)
  Total no. of documents published per country per year
  High-technology export (HTE)
Latent variable: Healthcare performance
Observed variables:
  Adolescent fertility rate (AFR)
  Birth rate (BR)
  Death rate (DR)
  Health expenditure public (HEP)
  Immunization DPT (IDPT)
  Immunization measles (IM)
  Mortality rate, infant (MR)
Latent variable: Economic performance
Observed variables:
  GDP per capita (current US$)
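The h-index definition above (the largest h such that h articles have at least h citations each) can be sketched directly; the citation counts are made up:

```python
def h_index(citations):
    """Largest h such that at least h items have h or more citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Made-up citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))
```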
Methodology
Datasets: World Bank, Scimago
Apply Structural Equation Modeling
Step I: EFA*-CFA*-EFA-CFA
Step II: Apply SEM to hypothesis variables
*EFA - Exploratory Factor Analysis
*CFA - Confirmatory Factor Analysis
Theoretical Framework
Diagram: Educational performance, Healthcare performance and Economic performance, linked pairwise by correlation
Structural Equation Modeling
https://github.com/amrapalijz/R-LOD-SEM/blob/master/sem_script.R
library(sem)     # specifyModel(), sem(), modIndices()
library(qgraph)  # qgraph()
# Insert covariance matrix
var <- var(semdata)
cov <- cov(datanew)
cor <- cor(datanew)
# Acquire data
data <- with(data, data.frame(hindex, noOfDocs, IDPT, IM, MR,
                              AFR, BR, DR, GDP, HEP, HET))
# Specify the model; the paths below are typed in at the prompt
semmodel <- specifyModel()
# Latent variables
HealthCare -> IDPT, efa14, NA; HealthCare -> IM, efa11, NA; HealthCare -> MR, efa12, NA; HealthCare -> AFR, efa13, NA;
….
# Running SEM model
sem <- sem::sem(semmodel, cor, N = 781)
summary(sem, fit.indices = c("GFI", "AGFI", "RMSEA", "NFI", "NNFI", "CFI", "RNI", "IFI", "SRMR", "AIC", "AICc"))
modIndices(sem)
qgraph(sem, cut = 0.8, gray = TRUE)
Conclusions
Performing robust statistical analysis on Linked Data can lead to important and meaningful insights on publicly available data for societal progress measurement.
Importance of performing use-case driven data quality assessment of datasets before their utilisation.
Contributions
Comprehensive survey
  18 data quality dimensions with definitions; 69 metrics
  12 tools compared according to 8 attributes
Development and evaluation of data quality assessment methodologies
  User-driven - manual and semi-automated
  Crowdsourcing - experts vs. workers
  Semi-automated - application to a use case
Consumption of Linked Data leveraging data quality
Future Work
Standardized quality assessment methodology for Linked Data
Quality assessment tools for Linked Data
Detection as well as improvement of quality issues before utilization in Linked Data use cases
Conference Publications
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon. ISWC 2013.
Crowdsourcing Linked Data quality assessment. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann. ISWC 2013.
User-driven Quality Evaluation of DBpedia. Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer and Jens Lehmann. I-SEMANTICS 2013.
Journal Publications
Quality assessment methodologies for Linked Data: A Survey. Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer. Semantic Web Journal 2015.
Using Linked Data to build an Observatory of Societal Progress Indicators. Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Patrick Westphal, Jose Roberto Nascimento Junior, Luciano de Andrade, Cinzia Daraio and Jens Lehmann. Journal of Web Semantics 2014 (under review).
Publishing and Interlinking the USPTO Patent Data. Amrapali Zaveri, Mofeed M. Hassan, Tariq Yousef, Sören Auer and Jens Lehmann. Semantic Web Journal 2014 (under review).
Publications
No. of publications: 34 (Google Scholar), 16 (DBLP)
Citations: 251
h-index: 9; i10-index: 8 (Google Scholar)