FAME.Q – A Formal Approach to Master Quality in Enterprise Linked Data

Post on 07-Jan-2017

Chemnitz University of Technology Prof. Dr.-Ing. Martin Gaedke & Team 29.11.2016

FAME.Q A Formal Approach to Master Quality in Enterprise Linked Data

André Langer and Martin Gaedke

Semantic Web and XML @ ICWI2016

VSR://IntelligentInformationManagement/LEDS

What we do

Linked Open Data, Corporate Data, Social Networks

Data Lake

Knowledge Graphs

Management of Background Knowledge

Data Quality and Coherence

Knowledge Extraction

Search in Linked Data

E-Commerce Applications

Main Focus

Initial Question

What is Data Quality?

Which common definitions exist?

How can DQ be measured?

Data Quality

• Is a multi-dimensional concept

• data that is „fit for use“ by data consumers (Wang & Strong, 1996; Strong, Lee & Wang, 1997b)

• data that is „free of defects and possesses desired features“ (Redman, 2001)

2.1 Simple Example

How can existing definitions be formalized?

Approach

Data quality characterizes the degree to which data corresponds to specific requirements

Context

metrics

a percentage

Simplified Version
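The definition of data quality used in this section (the degree to which data corresponds to specific requirements, measured by metrics in a given context and expressed as a percentage) can be read as a simple ratio. The following is a minimal sketch of that reading, not the formula from the slides, which did not survive in this transcript. The function name `quality_score`, the requirement predicates, and the sample records are all hypothetical.

```python
from typing import Callable, Iterable, Sequence

# Assumption: a requirement is modelled as a predicate over one data item.
Requirement = Callable[[dict], bool]


def quality_score(items: Iterable[dict], requirements: Sequence[Requirement]) -> float:
    """Return the percentage of (item, requirement) checks that pass."""
    items = list(items)
    total = len(items) * len(requirements)
    if total == 0:
        raise ValueError("need at least one item and one requirement")
    passed = sum(1 for item in items for req in requirements if req(item))
    return 100.0 * passed / total


# Hypothetical context: a data consumer needs a non-empty label
# and a four-digit integer year on every record.
requirements = [
    lambda item: bool(item.get("label")),
    lambda item: isinstance(item.get("year"), int) and 1000 <= item["year"] <= 9999,
]

records = [
    {"label": "FAME.Q", "year": 2016},
    {"label": "", "year": 2016},
    {"label": "LEDS", "year": None},
    {"label": "VSR", "year": 16},
]

score = quality_score(records, requirements)
print(f"{score:.1f}%")  # prints 62.5%
```

Changing the list of requirements changes the score for the same records, which is one way to see why the context matters in this reading of the definition.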

Common quality dimensions and appropriate metrics have already been extensively classified by other authors:

• Wang & Strong, 1996

• Zaveri et al., 2014

Zaveri, A. et al., 2014. Quality Assessment for Linked Open Data: A Survey. Semantic Web Journal, 1, p. 22

FAME.Q Quality Assessment Levels

Data Quality

• Instance Level

• Schema Level

• Service Level
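One way to picture the three assessment levels is as a tag attached to every metric result, so that scores can be reported per level. The sketch below is only an illustration under that assumption. The metric names, the scores, and the choice to average within a level are hypothetical and are not taken from FAME.Q.

```python
from collections import defaultdict
from enum import Enum


class Level(Enum):
    INSTANCE = "instance"
    SCHEMA = "schema"
    SERVICE = "service"


# Hypothetical metric results: (metric name, level, score in percent).
results = [
    ("labels_present", Level.INSTANCE, 80.0),
    ("valid_datatypes", Level.INSTANCE, 60.0),
    ("classes_documented", Level.SCHEMA, 90.0),
    ("endpoint_available", Level.SERVICE, 100.0),
]


def scores_by_level(results):
    """Average the metric scores separately for each assessment level."""
    grouped = defaultdict(list)
    for _name, level, score in results:
        grouped[level].append(score)
    # Assumption: a plain arithmetic mean per level, no weighting.
    return {level: sum(vals) / len(vals) for level, vals in grouped.items()}


per_level = scores_by_level(results)
for level in Level:
    print(level.value, per_level.get(level))
```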

Example calculation 1

Example calculation 2

Summary: What is Data Quality?

A scale ranging from „fatal“ to „perfect“

Conclusion

Data quality can be interpreted as the degree to which data fits current requirements

• Build upon and reuse existing definitions

• Apply it to the field of the Semantic Web

• Set it in a formalized schema

4.1 Future Steps

Several related quality measurement frameworks already exist(ed), with different result output capabilities:

• SWIQA (Fürber & Hepp, 2011a)

• Luzzu (Debattista et al., 2015)

• Roomba OpenData Checker (Assaf et al., 2015)

We output the results of our quality assessment tool by means of the Data Quality Vocabulary (DQV)
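To give an idea of what a DQV-based result could look like, the sketch below serializes a single measurement as Turtle using the W3C DQV terms `dqv:QualityMeasurement`, `dqv:computedOn`, `dqv:isMeasurementOf`, and `dqv:value`. It is not the actual output of the tool described here. The helper `measurement_to_turtle` and every `example.org` IRI are placeholders chosen for illustration.

```python
DQV = "http://www.w3.org/ns/dqv#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def measurement_to_turtle(measurement_iri: str, dataset_iri: str,
                          metric_iri: str, value: float) -> str:
    """Serialize one quality measurement as a Turtle snippet using DQV terms."""
    return (
        f"@prefix dqv: <{DQV}> .\n"
        f"@prefix xsd: <{XSD}> .\n"
        "\n"
        f"<{measurement_iri}> a dqv:QualityMeasurement ;\n"
        f"    dqv:computedOn <{dataset_iri}> ;\n"
        f"    dqv:isMeasurementOf <{metric_iri}> ;\n"
        f'    dqv:value "{value}"^^xsd:double .\n'
    )


# All IRIs below are placeholders, not identifiers used by the FAME.Q tool.
turtle = measurement_to_turtle(
    "http://example.org/measurement/1",
    "http://example.org/dataset/products",
    "http://example.org/metric/completeness",
    62.5,
)
print(turtle)
```

The string is built by hand to keep the example free of dependencies. In practice an RDF library would be the safer choice, since it handles escaping and validation.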

Inspired and Interested?

Andre.Langer@Informatik.TU-Chemnitz.de

VSR.Informatik.TU-Chemnitz.de

@andrelanger @myVSR /myVSR