Linkset quality (LWDM 2013)

Assessing Linkset Quality For Complementing Third Party Datasets

Riccardo Albertoni (1, 2), Asunción Gómez Pérez (1)

(1) Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid

(2) CNR-IMATI, Via De Marini, 6, Torre di Francia, 16149 Genova, Italy

3RD INTERNATIONAL WORKSHOP ON LINKED WEB DATA MANAGEMENT (LWDM 2013)

in conjunction with the 16th International Conference on Extending Database Technology (EDBT 2013)

March 22, 2013 - Genoa, Italy


Motivations

Riccardo Albertoni

LINKED DATA's PROMISE: evolving the Web into a Global Data Space. It should help to overcome the data-silo effect…

So many bubbles there, THAT'S SO COOL!!

BUT… can I exploit that third-party data for my OWN ANALYSES?


Motivation


[Figure: the arrow linking the SWDF and DBLP bubbles]

What does this arrow mean?

There is NO GROUNDED CONCEPT of what makes a linkset suitable for a target application.

There is well-founded work on quality for datasets, but linksets are not yet directly addressed!


What is Linkset Quality for?

Linked Data Publishers can check whether a linkset they have provided
• is good enough or needs to be improved;
• is still good enough after one of its two target datasets is updated.

Linked Data Consumers can
• figure out whether they can rely on a linkset;
• get a first guess of the next move they can take to improve the linkset;
• rank possible linkset alternatives.


Complementing a Dataset X via a Linkset L

[Figure: dataset X (entities a', b', c') complemented with dataset Y = DBLP (entities a, b, with foaf:made publications) via the linkset L = {a owl:sameAs a', b owl:sameAs b'}, yielding the complemented dataset X_L]

Complementation might introduce some "missing data": the less missing data (like researcher c) is introduced, the more complete the linkset is.
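To make complementation concrete, here is a minimal Python sketch. It assumes (our assumption, not the paper's formal definition) that complementing X via L means copying Y's statements about each linked entity onto its owl:sameAs counterpart in X, and that "missing data" are the entities of X of a given type that no link reaches. Triples are plain string tuples; the helper names and the sample data are hypothetical, loosely modeled on the figure.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings
RDF_TYPE = "rdf:type"


def complement(x: Set[Triple], y: Set[Triple], links: Dict[str, str]) -> Set[Triple]:
    """Return X_L: X plus Y's statements about linked entities, moved onto X's entities.

    `links` maps an entity of Y to the entity of X it is owl:sameAs
    (e.g. a owl:sameAs a'  ->  {"a": "a'"}).
    """
    imported = {(links[s], p, o) for (s, p, o) in y if s in links}
    return x | imported


def missing_data(x: Set[Triple], links: Dict[str, str], entity_type: str) -> Set[str]:
    """Entities of X with the given type that no link reaches, so nothing is imported for them."""
    reached = set(links.values())
    typed = {s for (s, p, o) in x if p == RDF_TYPE and o == entity_type}
    return typed - reached


# Hypothetical data modeled on the figure: X describes researchers a', b', c';
# Y (DBLP) describes the publications of a and b only.
x = {(e, RDF_TYPE, "foaf:Person") for e in ("a'", "b'", "c'")}
y = {("a", "foaf:made", "Pub1"), ("a", "foaf:made", "Pub2"),
     ("b", "foaf:made", "Pub3"), ("b", "foaf:made", "Pub4")}
links = {"a": "a'", "b": "b'"}

x_l = complement(x, y, links)
print(sorted(t for t in x_l if t[1] == "foaf:made"))
print(missing_data(x, links, "foaf:Person"))  # {"c'"}
```

Here c' ends up in X_L without any publication: the kind of missing data the slide attributes to researcher c.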


What is a Linkset? (http://vocab.deri.ie/void)


Every linkset is a special kind of dataset!

Every linkset has two target datasets: the subject dataset and the object dataset.

Every linkset should have only one linking property.

This work considers owl:sameAs linksets.
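A linkset can be modeled as a small typed object. The following Python sketch is hypothetical: it mirrors VoID's void:subjectsTarget, void:objectsTarget and void:linkPredicate with plain fields, and enforces the single-linking-property rule when the predicate is read. Dataset names and links are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Linkset:
    """A linkset: a dataset of links between a subject and an object target dataset."""
    subjects_target: str   # cf. void:subjectsTarget
    objects_target: str    # cf. void:objectsTarget
    links: FrozenSet[Triple]

    @property
    def link_predicate(self) -> str:
        """The single linking property (cf. void:linkPredicate); raises if there is not exactly one."""
        predicates = {p for (_, p, _) in self.links}
        if len(predicates) != 1:
            raise ValueError(f"expected one linking property, found {sorted(predicates)}")
        return next(iter(predicates))


l1 = Linkset("DBLP", "SWDF", frozenset({("a", "owl:sameAs", "a'"), ("b", "owl:sameAs", "b'")}))
print(l1.link_predicate)  # owl:sameAs
```

Checking the rule lazily keeps construction cheap; a stricter design could validate in `__post_init__` instead.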


Defining quality measures


We adopt the terminology of C. Bizer and R. Cyganiak, Quality-driven information filtering using the WIQA policy framework, J. Web Sem., 7(1):1-10, 2009.

What has to be defined when providing a quality measure, and what is provided here for linkset quality:

• Quality Indicator: an aspect of a data item or dataset that may give the user an indication of the suitability of the data for some intended use. Provided: Entity Types, Number of Entities for a Type, …
• Scoring Function: a function evaluating quality indicators to measure the suitability of the data for some intended use. Provided: Linkset Type Coverage, Linkset Type Completeness, Linkset Entity Coverage for Type.
• Aggregate Metric: a user-specified assessment metric built upon scoring functions. These aggregations produce new assessment values by applying the average, sum, max, min or threshold functions to the set of scoring functions.
• Interpretation tables: interpretations of the scoring functions that help in figuring out the next action to take.
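The aggregation step can be sketched in Python. The scoring-function values below are hypothetical placeholders, not measured results, and reading "threshold" as "1.0 if every score reaches t, else 0.0" is our assumption; the slide does not fix its exact semantics.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

Aggregator = Callable[[Iterable[float]], float]


def threshold(t: float) -> Aggregator:
    """Assumed threshold semantics: 1.0 if every score is >= t, otherwise 0.0."""
    return lambda scores: 1.0 if all(s >= t for s in scores) else 0.0


# The other aggregations named on the slide.
AGGREGATORS: Dict[str, Aggregator] = {
    "average": lambda s: float(mean(s)),
    "sum": lambda s: float(sum(s)),
    "max": lambda s: float(max(s)),
    "min": lambda s: float(min(s)),
}


def aggregate(scores: Dict[str, float], fn: Aggregator) -> float:
    """Build an aggregate metric by applying `fn` to the values of the scoring functions."""
    return fn(list(scores.values()))


# Hypothetical scores for one linkset (not measured values).
scores = {"type_coverage": 0.75, "type_completeness": 0.5, "entity_coverage": 0.25}
print(aggregate(scores, AGGREGATORS["average"]))  # 0.5
print(aggregate(scores, threshold(0.5)))          # 0.0
```

Passing the aggregation as a function keeps the metric user-specified: a consumer can plug in a weighted average or a different threshold without touching the scoring functions.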


INDICATORS: Examples on DBLP & SWDF

[Figure: Type(DBLP) and Type(SWDF), involving foaf:Agent, foaf:Person, foaf:Organization, foaf:Document, swrc:Proceedings, swr:Proceedings, ro:FullPaper, ro:ShortPaper and ro:PosterPaper]

#E4Type(foaf:Agent,DBLP)=1000000

#E4Type(foaf:Document,DBLP)=1984087

#E4Type(swrc:Proceedings,DBLP)=1108400


INDICATORS: Examples on DBLP & SWDF

[Figure: the types of DBLP and SWDF, with a linkset L2 between them and Type(L2)]

#E4Type(foaf:Agent,L2)=100

#E4Type(foaf:Person,L2)=100


Quality indicators: Types


Type: Dataset/Linkset → power set of the possible user-defined types (e.g., owl:Class, owl:Restriction, skos:Concept, skos:ConceptScheme)

Returns the types of entities exposed in a dataset or a linkset.
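As a minimal Python sketch, assuming the dataset (or, for a linkset, the rdf:type descriptions of the entities it links) is available as a set of string triples, Type can be read off the rdf:type statements. The triple encoding and the sample data are assumptions for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


def types(triples: Set[Triple]) -> Set[str]:
    """Type(D): the classes that entities exposed in D are declared to belong to."""
    return {o for (_, p, o) in triples if p == RDF_TYPE}


# Hypothetical sample in the spirit of the DBLP example.
dblp_sample = {
    ("ex:a", RDF_TYPE, "foaf:Agent"),
    ("ex:a", RDF_TYPE, "foaf:Person"),
    ("ex:p1", RDF_TYPE, "swrc:Proceedings"),
    ("ex:a", "foaf:made", "ex:p1"),
}
print(sorted(types(dblp_sample)))  # ['foaf:Agent', 'foaf:Person', 'swrc:Proceedings']
```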


Quality indicators: # of Entities for a Type


#E4Type: Dataset/Linkset × one of the possible user-defined types → (positive) integers

Returns the number of entities exposed in a dataset/linkset for a given type. Blank nodes are left out.
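A minimal Python sketch of #E4Type over string triples. It assumes blank nodes are written with the N-Triples/Turtle "_:" prefix, so excluding them is a prefix test; the sample data are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def e4type(triples: Set[Triple], t: str) -> int:
    """#E4Type(t, D): number of distinct entities of type t in D, blank nodes left out.

    Blank nodes are assumed to be written with the "_:" prefix.
    """
    return len({s for (s, p, o) in triples
                if p == RDF_TYPE and o == t and not s.startswith("_:")})


sample = {
    ("ex:a", RDF_TYPE, "foaf:Person"),
    ("ex:b", RDF_TYPE, "foaf:Person"),
    ("_:x", RDF_TYPE, "foaf:Person"),  # blank node: not counted
    ("ex:a", "foaf:name", "Alice"),
}
print(e4type(sample, "foaf:Person"))  # 2
```

Counting distinct subjects (a set) rather than triples avoids double counting an entity typed twice with the same class.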


SCORING FUNCTIONS: Linkset Type Coverage (1)

[Figure: DBLP and SWDF linked by L1; types shown: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings]

Complementing DBLP with L1, are we adding new entities to DBLP?

DBLP_L1 (DBLP complemented with L1) "imports" organizations for the researchers (foaf:Agent) involved in the linkset.


SCORING FUNCTIONS: Linkset Type Coverage (2)

[Figure: DBLP and SWDF linked by L2; types shown: foaf:Organization, swrc:Proceedings, swr:Proceedings]

Complementing SWDF with L2, we don't add any new type of entities: SWDF_L2 has exactly the same types of entities as SWDF.


Definition of Linkset Type Coverage

Linkset Type Coverage relates the types of the linkset to those of a target dataset: considering a dataset X, what percentage of the types of X is also covered by the linkset?
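The slide phrases coverage as a question rather than a formula. A literal reading of it is |Type(L) ∩ Type(X)| / |Type(X)|; the Python sketch below implements that reading, which is our assumption about the exact definition. The type sets are hypothetical.

```python
from typing import Set


def linkset_type_coverage(type_l: Set[str], type_x: Set[str]) -> float:
    """Share of the types of X that are also types of the linkset L (assumed reading)."""
    if not type_x:
        raise ValueError("Type(X) is empty: coverage is undefined")
    return len(type_l & type_x) / len(type_x)


type_x = {"foaf:Agent", "foaf:Person", "foaf:Document", "swrc:Proceedings"}
type_l = {"foaf:Agent", "foaf:Person"}
print(linkset_type_coverage(type_l, type_x))  # 0.5
```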


SCORING FUNCTION: Ideas behind Type Completeness (1)

[Figure: DBLP and SWDF linked by L1; types shown: foaf:Organization, swrc:Proceedings]

L1 is type complete.

swrc:Proceedings and foaf:Organization are not equivalent classes, so it does not make sense to run a procedure (e.g., SILK) trying to discover interlinks between their instances!!!


SCORING FUNCTION: Ideas behind Type Completeness (2)

[Figure: DBLP and SWDF linked by L1; an alignment among classes relates swrc:Proceedings and swr:Proceedings]

Given this alignment among classes, L1 is type incomplete: we should try to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and swr:Proceedings!!!


Formalization of Linkset Type Completeness

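Let X (target dataset 1, the subject dataset) and Y (target dataset 2, the object dataset) be the two target datasets of a linkset L, and let EqT(X, Y) denote the set of types of X that have an equivalent in Y according to a relation of equivalence among classes. The types of the subject dataset that are not considered in the linkset are EqT(X, Y) \ Type(L), and

$$\mathit{LTCom}(L, X, Y) = 1 - \frac{|\mathit{EqT}(X, Y) \setminus \mathit{Type}(L)|}{|\mathit{EqT}(X, Y)|}$$

A linkset is complete with respect to types when LTCom = 1; LTCom < 1 otherwise.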


Example on Type Completeness

[Figure: DBLP and SWDF linked by L1 and L2; types shown: foaf:Organization, swrc:Proceedings, swr:Proceedings]

LTCom(L1, DBLP, SWDF) = 1 - |{swrc:Proceedings}| / |{swrc:Proceedings, foaf:Person}| = 1 - 1/2 = 1/2

LTCom(L2, DBLP, SWDF) = 1 - |{}| / |{swrc:Proceedings, foaf:Person}| = 1 - 0/2 = 1
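The type-completeness definition and the example above can be checked with a short Python sketch. The class alignment is passed as a plain dictionary, identical class names count as equivalent, and the type sets are hypothetical choices that reproduce the example; returning 1.0 when X has no type with an equivalent in Y is our own convention, not part of the slide.

```python
from typing import Dict, Set


def eq_types(type_x: Set[str], type_y: Set[str], alignment: Dict[str, Set[str]]) -> Set[str]:
    """Types of X with an equivalent in Y: the same class, or one aligned to it."""
    return {t for t in type_x if t in type_y or alignment.get(t, set()) & type_y}


def ltcom(type_l: Set[str], type_x: Set[str], type_y: Set[str],
          alignment: Dict[str, Set[str]]) -> float:
    """LTCom(L, X, Y) = 1 - |EqT(X, Y) minus Type(L)| / |EqT(X, Y)|."""
    eq = eq_types(type_x, type_y, alignment)
    if not eq:
        return 1.0  # convention assumed here: nothing to link, so vacuously complete
    return 1 - len(eq - type_l) / len(eq)


dblp = {"foaf:Person", "foaf:Document", "swrc:Proceedings"}
swdf = {"foaf:Person", "foaf:Organization", "swr:Proceedings"}
alignment = {"swrc:Proceedings": {"swr:Proceedings"}}
type_l1 = {"foaf:Person"}
type_l2 = {"foaf:Person", "swrc:Proceedings"}

print(ltcom(type_l1, dblp, swdf, alignment))  # 0.5
print(ltcom(type_l2, dblp, swdf, alignment))  # 1.0
print(ltcom(type_l1, dblp, swdf, {}))         # 1.0 (no alignment: L1 is type complete)
```

The last call mirrors the two "Ideas behind Type Completeness" slides: the same L1 is type complete without the class alignment and type incomplete with it.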


[Figure: DBLP and SWDF linked by L1 and L2; types shown: foaf:Organization, swrc:Proceedings, swr:Proceedings]

L1 and L2 are indistinguishable from the point of view of types.

Which is more interesting: L1, L2, or L1 ∪ L2?

Linkset Entity Coverage for Type

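Linkset Entity Coverage for Type compares the number of entities of type T in the linkset L with the number of entities of type T in the dataset X:

$$\frac{\#\mathit{E4Type}(T, L)}{\#\mathit{E4Type}(T, X)}$$

How good is a linkset providing 100 owl:sameAs links?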
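A Python sketch of the ratio, applied per type. It plugs in the counts from the indicator examples (#E4Type on DBLP and on L2), under the assumption that DBLP plays the role of X for L2; types of X absent from the linkset get coverage 0, and the function name is hypothetical.

```python
from typing import Dict


def entity_coverage_by_type(counts_l: Dict[str, int], counts_x: Dict[str, int]) -> Dict[str, float]:
    """For each type T of X: #E4Type(T, L) / #E4Type(T, X); types with no entities in X are skipped."""
    return {t: counts_l.get(t, 0) / n for t, n in counts_x.items() if n > 0}


counts_dblp = {"foaf:Agent": 1000000, "foaf:Document": 1984087, "swrc:Proceedings": 1108400}
counts_l2 = {"foaf:Agent": 100, "foaf:Person": 100}

coverage = entity_coverage_by_type(counts_l2, counts_dblp)
print(coverage["foaf:Agent"])  # 0.0001
```

Under these assumptions, 100 owl:sameAs links cover only 0.01% of DBLP's foaf:Agent entities, a difference that type-level measures alone cannot reveal.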


Aggregate Metrics: Interpretation upon the presented scoring functions

The interpretation is summed up as a decision tree.

[Figure: interpretation decision tree]


Related work: (extended discussion in the paper)

• WIQA, an information quality assessment framework: C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009.
It contributes a policy language, an engine for interpreting such policies, and explanations of whether a piece of information satisfies a policy. Quality criteria are parameters of the system; it does not aim at proposing new quality measures.

• PlanetData: P. N. Mendes, C. Bizer, J. H. Young, Z. Miklos, J.-P. Calbimonte, and A. Moraru. Conceptual model and best practices for high-quality metadata publishing. Technical report, PlanetData, Deliverable 2.1, 2012, http://planet-data-wiki.sti2.at/web/File:D2.1.pdf.

• LOD2: P. N. Mendes and C. Bizer. Survey report state of the art in mapping, quality assessment and data fusion. Technical report, LOD2 - Creating Knowledge out of Interlinked Data, Deliverable 4.3.1, 2011, http://static.lod2.eu/Deliverables

These deliverables review quality dimensions, but they give no indicators or criteria for completeness, which is only characterized as intensional completeness (the schema contains all the necessary attributes), extensional completeness (all required instances are present) and LDS completeness (relevant properties have values).

• SIEVE: P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: linked data quality assessment and fusion. In D. Srivastava and I. Ari, editors, LWDM EDBT/ICDT Workshops, pp. 116-123. ACM, 2012.
SIEVE deploys some of the ideas developed in WIQA and LDS completeness.

None of these works explicitly addresses quality for linksets.


Related work: (extended discussion in the paper)

• Link-QA: C. Guéret, P. T. Groth, C. Stadler, and J. Lehmann. Assessing linked data mappings using network measures. In E. Simperl, P. Cimiano, A. Polleres, O. Corcho, and V. Presutti, editors, ESWC, volume 7295 of Lecture Notes in Computer Science, pp. 87-102. Springer, 2012.

A different approach: Link-QA applies classic network measures such as degree, centrality and clustering coefficient, plus open sameAs chains and description richness, to determine whether a bunch of links improves the overall dataset quality.
• It assesses the quality of interlinking, not of linksets: Link-QA works on links independently of whether they belong to the same linkset.
• Link-QA addresses correctness; it does not deal with completeness.
• Link-QA ranks sets of links: it can say that one linkset is better than another, but it does not suggest the next move a consumer should take to improve a linkset.


Conclusions

Contribution: quality measures on linksets
• The only measure explicitly addressing linkset completeness for dataset complementation;
• Formalization of indicators, scoring functions and aggregation metrics;
• A first proof-of-concept prototype (Java, Jena).

On-going and future work
• Validation on the LOD: how many "incomplete" linksets can we detect in the LOD?
• Extension to linksets with linking properties other than owl:sameAs (e.g., skos:exactMatch);
• Dimensions other than completeness (e.g., timeliness, availability, consistency).


THANKS for your ATTENTION!
