Finding Commonalities: from Description Logics to the Web of Data


description

The talk covers my PhD research work so far and was given as an introductory presentation at the beginning of my visiting period at the Web&Media Group of the Vrije Universiteit, Amsterdam. First, I introduce a system for automated knowledge management, I.M.P.A.K.T., which embeds a module for Core Competence extraction. The module is described as a use case for applying non-standard inference services based on the Least Common Subsumer in Description Logics (DLs) to the problem of finding commonalities in knowledge bases modeled in DLs. I also present the Knowledge Compilation approach adopted to solve subsumption efficiently using only standard SQL queries. Then I focus on my current investigation into extending the Common Subsumer (CS) reasoning service to RDF datasets. Here, the formal definition of CS in RDF is given, together with a sketch of possible applications (e.g., clustering of RDF resources).

Transcript of Finding Commonalities: from Description Logics to the Web of Data

Page 1: Finding Commonalities: from Description Logics to the Web of Data

Finding Commonalities in Linked Open Data

Silvia Giannini

PhD Student (Supervisor: Prof. Eugenio Di Sciascio)

Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI), Politecnico di Bari, Bari, Italy

in collaboration with Prof. Francesco M. Donini, Ph.D. Simona Colucci

Web&Media Group Meeting | 31 March, 2014

Page 2: Finding Commonalities: from Description Logics to the Web of Data

Finding Commonalities: A DLs use case Finding Commonalities: the Web of Data Conclusion

Outline

1 Finding Commonalities: A DLs use case
    The I.M.P.A.K.T. system
    The Core Competence module

2 Finding Commonalities: the Web of Data

3 Conclusion

Silvia Giannini Finding commonalities in Linked Open Data

Page 3: Finding Commonalities: from Description Logics to the Web of Data


The I.M.P.A.K.T. system

What is I.M.P.A.K.T.

Information Management and Processing with the Aid of Knowledge-based Technologies

An integrated system managing three enterprise business services based on knowledge management:

1 Skill Matching [1]

2 Team Composition [2]

3 Core Competence Extraction [3]

[1] E. Tinelli, S. Colucci, S. Giannini, E. Di Sciascio, and F.M. Donini, Large scale skill matching through knowledge compilation. In: Proc. of ISMIS 2012, Springer-Verlag (2012) 192-201.

[2] E. Tinelli, S. Colucci, E. Di Sciascio, and F.M. Donini, Knowledge compilation for automated team composition exploiting standard SQL. In: Proc. of SAC 2012, ACM (2012) 1680-1685.

[3] S. Colucci, E. Tinelli, S. Giannini, E. Di Sciascio, and F.M. Donini, Knowledge Compilation for Core Competence Extraction in Organizations. In: Proc. of Business Information Systems 2013, Springer (2013) 163-174.

Page 5: Finding Commonalities: from Description Logics to the Web of Data

The I.M.P.A.K.T. system

What is I.M.P.A.K.T.

Skill Matching GUI

Page 6: Finding Commonalities: from Description Logics to the Web of Data

The I.M.P.A.K.T. system

Behind I.M.P.A.K.T.

An ontology for the HR domain (nearly 5000 concepts)

T-Box modules:

Employee Profile (M0)
Industry (M1)
ComplementarySkill (M2)
Level (M3)
Knowledge (M4)
Language (M5)
JobTitle (M6)

Main module M0: it models the properties (entry points) needed to import all the sections describing an employee CV.

Page 7: Finding Commonalities: from Description Logics to the Web of Data

The I.M.P.A.K.T. system

Behind I.M.P.A.K.T.

An ontology for the HR domain (nearly 5000 concepts)

T-Box modules: Employee Profile (M0), Industry (M1), ComplementarySkill (M2), Level (M3), Knowledge (M4), Language (M5), JobTitle (M6)

Possible employee skills and technical tools usage ability.

Specified through:

type - experience role (e.g., developer, administrator)
year - experience level
lastdate - last temporal update of work experience

Page 8: Finding Commonalities: from Description Logics to the Web of Data

The I.M.P.A.K.T. system

Behind I.M.P.A.K.T.

A Curriculum Vitae representation

A-Box

A profile P = ⊓ (∃R^0_j.C) is a concept in ALE(D), where R^0_j, 1 ≤ j ≤ 6, is an entry point, and C is a concept in FL0(D) modeled in M_j.

Page 9: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

What is a Core Competence

Core Competence: a Knowledge Management process

"Core competencies are a company collective knowledge about how to coordinate diverse production skills and integrate multiple streams of technologies. Identifying core competencies helps in support competitive advantage, articulate a strategic intent, and allocate resources to build cross-unit technological and production links."

(G. Hamel, and C.K.A. Prahalad, The core competence of the corporation, Harvard Business Review, May-June (1990) 79-90)

Examples:

Apple - design

Netflix - content delivery

Google - expertise in algorithms

. . .

Page 10: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The reasoning service

Objective: Automatically extract Core Competence by identifying common know-how in a significant portion of personnel (k employees, with k set as a threshold value by the people in charge of the strategic analysis).

Tool:

Logic-based approach
Non-standard inference services (LCS, k-CS, BICS)

Method:

Knowledge-compilation process
It solves subsumption only via SQL queries against a proper R-DB schema, without any exponential-time inference engine

Page 11: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

A logic-based approach

Least Common Subsumer (LCS)

Let C1, ..., Cn be a collection of n concepts in a DL L. The Least Common Subsumer (LCS) of C1, ..., Cn is a concept D in L such that D is the most specific concept subsuming all the elements of the collection.

k-Common Subsumer (k-CS)

Let C1, ..., Cn be a collection of n concepts in a DL L and let k < n. A k-Common Subsumer (k-CS) of C1, ..., Cn is a concept D in L such that D is an LCS of k concepts among C1, ..., Cn.

Informative k-Common Subsumer (IkCS)

Given k < n, an Informative k-Common Subsumer (IkCS) of the concepts C1, ..., Cn in a DL L is a concept D such that D is a k-CS strictly subsumed by LCS(C1, ..., Cn) and adding informative content to it.

Best Informative Common Subsumer (BICS)

Given k < n, a Best Informative Common Subsumer (BICS) of the concepts C1, ..., Cn in a DL L is a concept B such that B is an IkCS for C1, ..., Cn, and for every k < j ≤ n every j-CS is not informative.
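The four definitions above can be tried out in a drastically simplified setting. The sketch below assumes that every concept is a plain conjunction of atomic concepts with no T-Box, so a concept is a frozenset of atom names, D subsumes C when D's atoms are a subset of C's, and the LCS reduces to set intersection. The function names and the sample profiles are hypothetical and are not taken from I.M.P.A.K.T.

```python
from itertools import combinations


def lcs(concepts):
    """LCS of conjunctions of atoms: the atoms shared by all the concepts."""
    concepts = list(concepts)
    shared = set(concepts[0])
    for concept in concepts[1:]:
        shared &= concept
    return frozenset(shared)


def k_common_subsumers(concepts, k):
    """All distinct k-CSs: the LCS of every choice of k concepts."""
    return {lcs(group) for group in combinations(concepts, k)}


def informative_k_cs(concepts, k):
    """k-CSs strictly subsumed by the overall LCS (proper supersets of atoms)."""
    overall = lcs(concepts)
    return {d for d in k_common_subsumers(concepts, k) if d > overall}


def bics(concepts):
    """Largest k < n that still admits an informative k-CS, with those k-CSs."""
    for k in range(len(concepts) - 1, 0, -1):
        found = informative_k_cs(concepts, k)
        if found:
            return k, found
    return None, set()


# Hypothetical toy profiles, each already flattened into a set of atoms.
profiles = [
    frozenset({"CS", "Java", "Cpp"}),
    frozenset({"CS", "Java", "Cpp"}),
    frozenset({"CS", "Java"}),
    frozenset({"CS", "DBMS"}),
]
best_k, best = bics(profiles)
```

With these four toy profiles the overall LCS is the single atom `CS`, while three of the four profiles also share `Java`, so the best informative common subsumer is found at k = 3. Enumerating all k-subsets is exponential; the sketch only illustrates the definitions, not an efficient procedure.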

Page 12: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Knowledge Compilation process

Issues:

Computational difficulties of deduction in knowledge bases expressed through a logical formalism;

Combining the representation power of a logical language with the scalability and efficiency of information processing in a DBMS.

Knowledge Compilation:

1 OFF-LINE REASONING
pre-processing of a company intellectual capital, described in a Description Logics (DLs) Knowledge Base (KB), into an appropriate relational database schema.

2 ON-LINE REASONING
querying of the data structure coming out from the first phase through standard SQL queries for efficient Core Competence Extraction.

Page 13: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

CV translation

Page 14: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

OFF-LINE REASONING: Relational schema design rules

T-Box informative content

Table CONCEPT: it stores the CCNF of all the FL0(D) concepts (part (a))

Page 15: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

OFF-LINE REASONING: Relational schema design rules

T-Box informative content

A table is created for each entry point R^0_j, j > 0 (part (b))

Page 16: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

OFF-LINE REASONING: Relational schema design rules

A-Box informative content

Each atom of CCNF(C) of a conjunct ∃R^0_j.C is stored in a different tuple of table R_j with the same groupID (part (b))

Page 17: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

OFF-LINE REASONING: Relational schema design rules

A-Box informative content

Table PROFILE includes profileID and extra-ontological structured information (e.g., personal data, work-related information) (part (b))
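The schema design rules can be illustrated with a small in-memory database. The sketch below is a simplification under stated assumptions, not the actual I.M.P.A.K.T. schema: it uses Python's standard `sqlite3` module, a single entry-point table `hasKnowledge`, a toy concept hierarchy already flattened into atoms (standing in for the CCNF), and it ignores concrete features such as years of experience. All column names other than `profileID` and `groupID`, and all sample data, are hypothetical.

```python
import sqlite3

# Toy stand-in for the CCNF: each concept expanded into the atoms it implies.
CCNF = {
    "ComputerScienceSkill": ["ComputerScienceSkill"],
    "ProgrammingLanguage": ["ProgrammingLanguage", "ComputerScienceSkill"],
    "OOP": ["OOP", "ProgrammingLanguage", "ComputerScienceSkill"],
    "Java": ["Java", "OOP", "ProgrammingLanguage", "ComputerScienceSkill"],
    "DBMS": ["DBMS", "ComputerScienceSkill"],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept TEXT, atom TEXT);
    CREATE TABLE profile (profileID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hasKnowledge (profileID INTEGER, groupID INTEGER, atom TEXT);
""")

# OFF-LINE phase: store the T-Box content once.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?)",
    [(c, a) for c, atoms in CCNF.items() for a in atoms],
)


def add_profile(profile_id, name, skills):
    """Store one tuple per atom; atoms of the same conjunct share a groupID."""
    conn.execute("INSERT INTO profile VALUES (?, ?)", (profile_id, name))
    for group_id, skill in enumerate(skills):
        conn.executemany(
            "INSERT INTO hasKnowledge VALUES (?, ?, ?)",
            [(profile_id, group_id, atom) for atom in CCNF[skill]],
        )


add_profile(1, "p1", ["Java"])
add_profile(2, "p2", ["DBMS"])
add_profile(3, "p3", ["OOP", "DBMS"])


def profiles_subsumed_by(concept):
    """ON-LINE phase: profiles with a conjunct containing every atom of `concept`."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM hasKnowledge AS h
        JOIN profile AS p ON p.profileID = h.profileID
        WHERE h.atom IN (SELECT atom FROM concept WHERE concept = :c)
        GROUP BY h.profileID, h.groupID, p.name
        HAVING COUNT(DISTINCT h.atom) =
               (SELECT COUNT(DISTINCT atom) FROM concept WHERE concept = :c)
        ORDER BY p.name
        """,
        {"c": concept},
    ).fetchall()
    return [name for (name,) in rows]
```

In this sketch the subsumption test becomes a counting query: a profile is subsumed by ∃hasKnowledge.C when one of its groups contains all the atoms stored for C, so no reasoner is invoked at query time.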

Page 18: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

ON-LINE REASONING: The Core Competence Extraction Algorithm

1 Profiles Subsumers Matrix computation

Idea: Extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.

Example

Mario Rossi: Cplusplus (5 years), Java (5 years), Visual Basic (5 years)

Daniela Bianchi: Cplusplus (2 years), Java (6 years), Visual Basic (1 year)

Elena Pomarico: Cplusplus, Java, Visual Basic

Carmelo Piccolo: VBScript, Process Performance Monitoring

Lucio Battista: DBMS (2 years)

Mariangela Porro: DBMS (2 years), Internet Technologies (2 years)

Nicola Marco: DBMS (5 years), Internet Technologies (5 years)

Domenico De Palo: OOprogramming (6 years), Artificial intelligence (4 years), Internet technologies (4 years)

Page 19: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

1 Profiles Subsumers Matrix computation

Profile  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10 D11 ...
1        1   1   1   1   1   0   1   0   1   1   1   ...
2        1   1   1   1   1   0   1   0   1   1   1   ...
3        1   1   0   0   0   1   0   0   0   0   0   ...
4        1   1   0   0   0   1   0   1   0   0   0   ...
5        1   1   0   0   1   1   0   1   0   0   0   ...
6        1   0   1   0   0   0   0   0   0   0   0   ...
7        1   0   1   1   0   0   0   0   1   1   1   ...
8        1   1   1   1   1   0   1   1   0   0   0   ...

Table: Portion of the Profile Subsumers Matrix for the previous Example

Page 20: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

1 Profiles Subsumers Matrix computation

D1   ∃hasKnowledge.ComputerScienceSkill
D2   ∃hasKnowledge.(ComputerScienceSkill ⊓ =2 years)
D3   ∃hasKnowledge.ProgrammingLanguage
D4   ∃hasKnowledge.OOP
D5   ∃hasKnowledge.(ComputerScienceSkill ⊓ =5 years)
D6   ∃hasKnowledge.(DBMS ⊓ =2 years)
D7   ∃hasKnowledge.(OOP ⊓ =5 years)
D8   ∃hasKnowledge.(InternetTechnologies ⊓ =2 years)
D9   ∃hasKnowledge.C++
D10  ∃hasKnowledge.VisualBasic
D11  ∃hasKnowledge.Java
...

Table: Description of D1, ..., D11 reported in the previous Table
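To make the matrix construction concrete, the following sketch computes a Profile Subsumers Matrix for a toy input. It assumes that each profile is a list of conjunct groups and each concept component D_j is a set of atoms, both already flattened so that subsumption is a subset test; concrete features such as years are left out. The function name and the data are hypothetical.

```python
def profile_subsumers_matrix(profiles, components):
    """Entry (i, j) is 1 when component j subsumes some conjunct of profile i."""
    return [
        [int(any(component <= group for group in groups)) for component in components]
        for groups in profiles
    ]


# Hypothetical concept components, as sets of atoms.
components = [
    frozenset({"ComputerScienceSkill"}),
    frozenset({"ProgrammingLanguage", "ComputerScienceSkill"}),
    frozenset({"DBMS", "ComputerScienceSkill"}),
]

# Hypothetical profiles: one list of conjunct groups per employee.
profiles = [
    [frozenset({"Java", "OOP", "ProgrammingLanguage", "ComputerScienceSkill"})],
    [frozenset({"DBMS", "ComputerScienceSkill"})],
]

psm = profile_subsumers_matrix(profiles, components)
```

Each row of `psm` plays the role of one row of the table above: a 1 marks a concept component that the corresponding profile satisfies.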

Page 21: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

1 Profiles Subsumers Matrix computation

Page 22: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

Referring to the PSM of the set P = {P(a1), ..., P(an)}, and to a concept component Dk ∈ {D1, ..., Dm} deriving from P, a Core Competence is the union of the most specific features (i.e., profile concept components Dj) shared by the same group of k employees, where k is a predefined threshold.

Profile  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10 D11 ...
1        1   1   1   1   1   0   1   0   1   1   1   ...
2        1   1   1   1   1   0   1   0   1   1   1   ...
3        1   1   0   0   0   1   0   0   0   0   0   ...
4        1   1   0   0   0   1   0   1   0   0   0   ...
5        1   1   0   0   1   1   0   1   0   0   0   ...
6        1   0   1   0   0   0   0   0   0   0   0   ...
7        1   0   1   1   0   0   0   0   1   1   1   ...
8        1   1   1   1   1   0   1   1   0   0   0   ...

Table: Portion of the Profile Subsumers Matrix for the previous Example

LCS = ∃hasKnowledge.ComputerScienceSkill

Page 23: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

BICS = ∃hasKnowledge.(ComputerScienceSkill ⊓ =5 years)

Page 24: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

ICS3 = ∃hasKnowledge.(DBMS ⊓ =2 years)

Page 25: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

ICS3 = ∃hasKnowledge.(OOP ⊓ =5 years)

Page 26: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

ICS3 = ∃hasKnowledge.(InternetTechnologies ⊓ =2 years)

Page 27: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

The Core Competence Extraction Algorithm

2 Common Subsumers enumeration

ICS3 = ∃hasKnowledge.(C++ ⊓ VisualBasic ⊓ Java)
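The enumeration step can be approximated directly on the matrix portion shown in the slides. The sketch below is one possible reading, not the authors' algorithm: it groups the columns D1, ..., D11 by the exact set of profiles (rows) that share them and keeps the groups with at least k profiles. The function name is hypothetical; the matrix values are copied from the table.

```python
from collections import defaultdict

# Portion of the Profile Subsumers Matrix from the slides (rows 1..8, D1..D11).
PSM = [
    [1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
]


def shared_components(psm, k):
    """Map each set of at least k profiles to the components shared by exactly that set."""
    groups = defaultdict(list)
    for j in range(len(psm[0])):
        rows = frozenset(i + 1 for i, row in enumerate(psm) if row[j])
        if len(rows) >= k:
            groups[rows].append(f"D{j + 1}")
    return dict(groups)


by_group = shared_components(PSM, k=3)
```

On this portion, D1 is the only component shared by all eight profiles, which agrees with the LCS reported above, and D9, D10 and D11 are shared by the same three profiles (1, 2 and 7), which agrees with the conjunction C++ ⊓ VisualBasic ⊓ Java.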

Page 28: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

Core Competence module GUI

Page 32: Finding Commonalities: from Description Logics to the Web of Data

The Core Competence module

Lessons learned

Proposal: Knowledge Compilation approach for Core Competence Extraction.

+ It improves performance, in terms of execution time, with respect to the classical logic-based approach.

+ It adopts standard SQL queries to compute the same informative content as advanced inference services.

+ It makes the computational costs of the process affordable also for large organizations, while retaining the full expressiveness of the logic-based approaches.

Notes on Performance:

The number of profiles is highly relevant in the common subsumers enumeration process.

The most computationally expensive process is the profile subsumers matrix creation, under a threshold of profiles concept components.

Page 33: Finding Commonalities: from Description Logics to the Web of Data

Outline

1 Finding Commonalities: A DLs use case

2 Finding Commonalities: the Web of Data
    Common Subsumer in RDF
    RDF Clustering

3 Conclusion

Page 34: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Motivation

Learning from the Web of Data:

huge amount of interconnected and machine-understandable data

data modeled as RDF resources

dataset addressed as Linked (Open) Data (LOD).

Facts to learn:

identification of subsets of resources related to a common informative content

- Cluster search (approximate matching)
- Disambiguation
- Personalization

Page 35: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Problem Definition

In analogy to the LCS service, proposed in DLs to learn from examples.

Adaptation to the Web of Data:

giving up the subsumption minimality requirement: even rough Common Subsumers are useful for learning in the Web of Data

definition of Common Subsumer of pairs of RDF resources

Definition (Rooted Graph (r-graph))

Let TWr be the set of all triples with subject r in the Web. A Rooted Graph (r-graph) is a pair 〈r, Tr〉, where

1 r is either the URI of an RDF resource, or a blank node

2 Tr = {t | t = <<r p c>>} is a subset of relevant triples in TWr

Page 36: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Example: A Possible Representation for resources a and b

Page 37: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Example: A(nother) Possible Representation for resources a and b

Page 38: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Common Subsumer

Definition (Common Subsumer)

Let 〈a, Ta〉, 〈b, Tb〉 be two r-graphs and x, w, y be blank nodes.

If 〈a, Ta〉 = 〈b, Tb〉, then 〈a, Ta〉 is a Common Subsumer of 〈a, Ta〉, 〈b, Tb〉.

If Ta = ∅ or Tb = ∅, the pair 〈x, ∅〉 is a Common Subsumer of 〈a, Ta〉, 〈b, Tb〉.

Otherwise, a pair 〈x, T〉 is a Common Subsumer of 〈a, Ta〉, 〈b, Tb〉 iff:

    ∃t = <<x w y>> such that (T entails t)
      ⇒
    ∃t1 = <<a p c>>, t2 = <<b q d>> such that (T entails t1) ∧ (T entails t2)        (1)

where Ta ⊆ T, Tb ⊆ T and 〈w, T〉 is a Common Subsumer of 〈p, Tp〉 and 〈q, Tq〉, and 〈y, T〉 is a Common Subsumer of 〈c, Tc〉 and 〈d, Td〉.

Note: We consider only simple entailment
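A recursive construction in the spirit of this definition can be sketched as follows. This is a simplified illustration under several assumptions, not the algorithm presented in the talk: triples are plain tuples of strings, a single list of triples stands in for the relevant triples of every resource, blank nodes are strings starting with `_:`, recursion is cut off by a hypothetical `depth` parameter, and triples whose predicate and object are both blank nodes are dropped. All names and data are hypothetical.

```python
from itertools import count


def common_subsumer(triples, a, b, depth=2):
    """Return (root, T): a resource and triples that hold, in generalized form, for a and b."""
    fresh = count()

    def visit(u, v, d):
        if u == v:
            return u, set()              # identical terms are kept as they are
        node = f"_:cs{next(fresh)}"      # otherwise generalize to a fresh blank node
        found = set()
        if d > 0:
            for s1, p, c in triples:
                if s1 != u:
                    continue
                for s2, q, e in triples:
                    if s2 != v:
                        continue
                    w, tw = visit(p, q, d - 1)   # common subsumer of the predicates
                    y, ty = visit(c, e, d - 1)   # common subsumer of the objects
                    if w.startswith("_:") and y.startswith("_:"):
                        continue                 # blank predicate and object: discard
                    found |= {(node, w, y)} | tw | ty
        return node, found

    if a == b:
        return a, {t for t in triples if t[0] == a}
    return visit(a, b, depth)


# Hypothetical data for two resources ex:a and ex:b.
triples = [
    ("ex:a", "rdf:type", "ex:Deputy"),
    ("ex:a", "foaf:gender", "female"),
    ("ex:a", "ex:memberOf", "ex:groupA"),
    ("ex:b", "rdf:type", "ex:Deputy"),
    ("ex:b", "foaf:gender", "female"),
    ("ex:b", "ex:memberOf", "ex:groupB"),
    ("ex:groupA", "rdf:type", "ex:Group"),
    ("ex:groupB", "rdf:type", "ex:Group"),
]
root, cs_triples = common_subsumer(triples, "ex:a", "ex:b")
```

Here the result states that both resources are deputies, both are female, and both are members of something that is a group, with the two different groups generalized to a blank node. The depth cut-off bounds the work but does not make the sketch an anytime procedure.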

Page 39: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Example: a Common Subsumer of a and b


Note: Triples with a blank node in predicate and object positions are discarded

Page 43: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Example: a(nother) Common Subsumer of a and b


Note: Triples with a blank node in predicate and object positions are discarded

Page 47: Finding Commonalities: from Description Logics to the Web of Data

Common Subsumer in RDF

Solving Algorithm

Main Features:

anytime: if interrupted, it always returns a Common Subsumer of the input pair of RDF resources
modular: it takes as input a function computing the sets of triples relevant for the input RDF resources

Our current criterion for triples selection:

triples within a given graph distance from the input resource
triples having properties belonging to a selected set of significant properties for the dataset/application of interest

Output: A Common Subsumer of two r-graphs 〈a, Ta〉 and 〈b, Tb〉: a pair made up of a resource (anonymous or not) and a set of triples stating facts about such a resource which are "true" for both a and b.

Alternative cases:

〈_:cs, T〉: a blank node _:cs together with a set of triples related to _:cs
〈a, Ta〉, iff 〈a, Ta〉 = 〈b, Tb〉
〈_:cs, ∅〉 if either Ta = ∅ or Tb = ∅
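The selection criterion above can be pictured as a small pluggable function. The sketch below assumes triples are tuples of strings held in a list and combines the two criteria: it follows subject-to-object links up to `max_distance` hops from the input resource and, when a set of `properties` is given, keeps only triples with those predicates. The function name, parameters and data are hypothetical.

```python
def relevant_triples(triples, root, max_distance=1, properties=None):
    """Triples reachable from `root` within `max_distance` hops, optionally filtered by predicate."""
    selected = []
    frontier, seen = {root}, {root}
    for _ in range(max_distance):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier and (properties is None or p in properties):
                selected.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return selected


# Hypothetical data.
triples = [
    ("ex:a", "rdf:type", "ex:Deputy"),
    ("ex:a", "ex:memberOf", "ex:groupA"),
    ("ex:groupA", "rdf:type", "ex:Group"),
    ("ex:b", "rdf:type", "ex:Deputy"),
]
near = relevant_triples(triples, "ex:a", max_distance=1)
far = relevant_triples(triples, "ex:a", max_distance=2)
typed = relevant_triples(triples, "ex:a", max_distance=2, properties={"rdf:type"})
```

Because the function is passed in rather than hard-coded, a different criterion could be swapped in without touching the Common Subsumer computation, which is the modularity the slide refers to.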

Page 48: Finding Commonalities: from Description Logics to the Web of Data

RDF Clustering

Target Semantic Web Task

Clustering of Web resources with a CS

retrieving resources conveying the same information in their different RDF descriptions

CS description → SPARQL queries:
WHERE { Tcs [blank nodes → variables] }
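The translation from a CS description to a query can be sketched in a few lines. The code below assumes the Common Subsumer is rooted in a blank node, that its triples are tuples of strings, that blank nodes start with `_:`, that terms starting with `http://` or `https://` are IRIs, and that every other term is a plain literal. The function name is hypothetical and the sample triples are loosely modeled on the use case that follows.

```python
def cs_to_sparql(root, triples):
    """Turn the triples of a Common Subsumer into a SPARQL SELECT query string."""

    def term(node):
        if node.startswith("_:"):
            return "?" + node[2:]                 # blank node -> variable
        if node.startswith(("http://", "https://")):
            return f"<{node}>"                    # IRI
        escaped = node.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'                     # assumed plain literal

    patterns = "\n".join(
        f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples)
    )
    return f"SELECT DISTINCT {term(root)}\nWHERE {{\n{patterns}\n}}"


# Hypothetical Common Subsumer rooted in the blank node _:x0.
cs = {
    ("_:x0", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dati.camera.it/ocd/deputato"),
    ("_:x0", "http://xmlns.com/foaf/0.1/gender", "female"),
    ("_:x0", "http://dati.camera.it/ocd/rif_mandatoCamera", "_:x1"),
}
query = cs_to_sparql("_:x0", cs)
```

Running the resulting query against an endpoint would return the resources whose descriptions match the shared pattern, that is, the members of the cluster identified by the CS.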

Page 49: Finding Commonalities: from Description Logics to the Web of Data

RDF Clustering

Clustering with a CS: A use case

The Italian Chamber of Deputies LOD

Public SPARQL endpoint (http://dati.camera.it/sparql)

Running example: Find the commonalities between deputies Nilde Iotti and Tina Anselmi in the 10th Legislature

Page 51: Finding Commonalities: from Description Logics to the Web of Data

RDF Clustering

Clustering with a CS: A use case

The Italian Chamber of Deputies LOD

Public SPARQL endpoint (http://dati.camera.it/sparql)

SELECT DISTINCT ?x0
WHERE {
  ?x0 a <http://dati.camera.it/ocd/deputato> .
  ?x0 <http://xmlns.com/foaf/0.1/gender> "female" .
  ?x0 <http://dati.camera.it/ocd/rif_mandatoCamera> ?x1 .
  . . .
}

Page 52: Finding Commonalities: from Description Logics to the Web of Data

RDF Clustering

Clustering with a CS: A use case

1st Legislature clusters

Page 53: Finding Commonalities: from Description Logics to the Web of Data

Outline

1 Finding Commonalities: A DLs use case

2 Finding Commonalities: the Web of Data

3 Conclusion

Page 54: Finding Commonalities: from Description Logics to the Web of Data

Conclusion

Motivation: learning shared informative content in collections of RDF resources

Problem Definition: search for Common Subsumers that are not subsumption-minimal, in order to ensure computability in the Web of Data, which is too large to be explored

Results: An anytime algorithm computing Common Subsumers of pairs of RDF resources:

- allowing partially learned informative content to be used for further processing whenever the search for Common Subsumers is interrupted
- possibly supporting the clustering of collections of RDF resources, by exploiting associativity of Common Subsumers.

Future work:

Extension of the CS definition to other entailment regimes

Investigation of methods for the selection of relevant triples

Automated link traversal techniques for further dataset exploration

Application to data quality problems (e.g., missing values)
