Machine Learning Methods for Analysing and Linking RDF Data

Jens Lehmann (AKSW, Uni Leipzig), September 16, 2014

description

Invited talk at the 8th International Conference on Scalable Uncertainty Management (SUM). The talk outlines applications of supervised structured machine learning and presents a specific refinement-operator-based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.

Transcript of Machine Learning Methods for Analysing and Linking RDF Data

Page 1: Machine Learning Methods for Analysing and Linking RDF Data

Machine Learning Methods for Analysing and Linking RDF Data

Jens Lehmann

September 16, 2014

Jens Lehmann (AKSW, Uni Leipzig) Analysing and Linking RDF Data September 16, 2014 1 / 35

Page 2: Machine Learning Methods for Analysing and Linking RDF Data

Structured Machine Learning

How to analyse structured data?

Page 4: Machine Learning Methods for Analysing and Linking RDF Data

Detecting Crime Patterns: Series Finder

Construct "modus operandi" of criminals - identified 9 new crime patterns in Cambridge, MA, USA

Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.

Page 5: Machine Learning Methods for Analysing and Linking RDF Data

Discovery of Laws of Physics

Background data generated using experiments
Mathematical functions on input variables form hypothesis space

Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.

Page 6: Machine Learning Methods for Analysing and Linking RDF Data

Protein Interaction

Rules learned via Inductive Logic Programming (ProGolem)
understandable by experts and competitive with statistical learners
Possibly better drug design and reduction of side effects

Santos et al. "Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.

Page 7: Machine Learning Methods for Analysing and Linking RDF Data

Background Knowledge

Page 8: Machine Learning Methods for Analysing and Linking RDF Data

RDF and the Linked Data Principles

RDF Triple:

Example:
  Subject:   http://cs.ox.ac.uk/John
  Predicate: http://cs.ox.ac.uk/studies
  Object:    http://cs.ox.ac.uk/CS

The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web.

Linked Data principles (simplified version):
1 Use RDF and URLs as identifiers
2 Include links to other datasets
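
The subject/predicate/object structure of the example triple can be sketched in plain Python. The `Triple` type, the `graph` set and the `objects_of` helper are illustrative assumptions and not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; in this sketch all three positions hold URLs."""
    subject: str
    predicate: str
    obj: str


# The example triple from the slide.
triple = Triple(
    subject="http://cs.ox.ac.uk/John",
    predicate="http://cs.ox.ac.uk/studies",
    obj="http://cs.ox.ac.uk/CS",
)

# A dataset is modelled as a set of triples (hypothetical, for illustration).
graph = {triple}


def objects_of(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}
```

Because every position is a URL, a triple whose object lives in another dataset acts as a link to that dataset, which is what the second principle asks for.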

Page 12: Machine Learning Methods for Analysing and Linking RDF Data

OWL Ontologies

Web Ontology Language (OWL) builds on RDF and Description Logics

Objects
  Specific resources (constants)
  Examples: MARIA, LEIPZIG

Classes
  Sets of objects (unary predicates)
  Examples: Student, Car, Country

Properties
  Connections between objects (binary predicates)
  Examples: hasChild, partOf

Can be combined to complex concepts (OWL Class Expressions), e.g.:
Child ⊓ ∃hasParent.Professor
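
To make the reading of such a class expression concrete, the following sketch evaluates Child ⊓ ∃hasParent.Professor over a tiny hand-made dataset. The individuals and the helper names (`atomic`, `exists`, `intersection`) are hypothetical, and the evaluation is a closed-world set lookup rather than OWL reasoning.

```python
# Toy knowledge base (hypothetical individuals, not taken from the talk).
classes = {
    "Child": {"anna", "ben"},
    "Professor": {"maria"},
}
properties = {
    "hasParent": {("anna", "maria"), ("ben", "tom")},
}


def atomic(name):
    """Instances of a named class (unary predicate)."""
    return set(classes.get(name, set()))


def exists(prop, filler):
    """Instances of ∃prop.filler: objects with at least one prop-successor in filler."""
    return {s for (s, o) in properties.get(prop, set()) if o in filler}


def intersection(left, right):
    """Instances of left ⊓ right."""
    return left & right


# Child ⊓ ∃hasParent.Professor
result = intersection(atomic("Child"), exists("hasParent", atomic("Professor")))
```

With this data only `anna` is both a Child and has a parent who is a Professor.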

Page 15: Machine Learning Methods for Analysing and Linking RDF Data

Learning OWL Class Expressions - Definition

Given:
  Background knowledge (OWL ontologies and RDF datasets)
  Positive and negative examples (objects in datasets)

Goal:
  Find OWL class expression describing positive but not negative examples

Page 16: Machine Learning Methods for Analysing and Linking RDF Data

Application Example: Therapy Response Prediction

≈ 0.5-1% of population affected by Rheumatoid Arthritis
Anti-TNF not effective for several million persons for unknown reasons

Page 17: Machine Learning Methods for Analysing and Linking RDF Data

Learning OWL Class Expressions - Approaches

Least common subsumers
  Cohen et al. "Computing least common subsumers in description logics." AAAI 1992

Terminological decision trees
  Fanizzi et al. "Induction of concepts in web ontologies through terminological decision trees." ECML PKDD 2010

Rule-based
  Fanizzi et al. "DL-FOIL concept learning in description logics." ILP 2008

Genetic Programming
  Lehmann, Jens. "Hybrid learning of ontology classes." MLDM 2007

Refinement operators
  Lehmann et al. "Concept learning in description logics using refinement operators." ML 2010
  Iannone et al. "An algorithm based on counterfactuals for concept learning in the semantic web." AI 2007

Page 18: Machine Learning Methods for Analysing and Linking RDF Data

Refinement Operators - Definitions

Given a DL L, consider the quasi-ordered space ⟨C(L), ⊑_T⟩ over concepts of L
ρ : C(L) → 2^C(L) is a downward L refinement operator if for any C ∈ C(L):

D ∈ ρ(C) implies D ⊑_T C

Notation: Write C ρ D instead of D ∈ ρ(C)
Example refinement chain:

⊤ ρ Person ρ Man ρ Man ⊓ ∃hasChild.⊤
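
A minimal sketch of a downward refinement operator that reproduces the example chain is given here. The class hierarchy, the tuple encoding of concepts and the function names are assumptions made for illustration; the operator only handles ⊤ and named classes.

```python
TOP = ("top",)

# Hypothetical class hierarchy: class name -> direct subclasses.
SUBCLASSES = {
    "Top": ["Person", "Car"],
    "Person": ["Man", "Woman"],
}
PROPERTIES = ["hasChild"]


def rho(concept):
    """Toy downward refinement operator: every result is subsumed by `concept`."""
    if concept == TOP:
        return [("class", name) for name in SUBCLASSES["Top"]]
    if concept[0] == "class":
        name = concept[1]
        # Move down the class hierarchy ...
        narrower = [("class", sub) for sub in SUBCLASSES.get(name, [])]
        # ... or add an existential restriction as a conjunct.
        restricted = [("and", concept, ("exists", p, TOP)) for p in PROPERTIES]
        return narrower + restricted
    return []  # other concept shapes are not refined in this sketch


def is_chain(concepts):
    """Check that each concept is a one-step refinement of its predecessor."""
    return all(b in rho(a) for a, b in zip(concepts, concepts[1:]))


# ⊤ ρ Person ρ Man ρ Man ⊓ ∃hasChild.⊤
chain = [
    TOP,
    ("class", "Person"),
    ("class", "Man"),
    ("and", ("class", "Man"), ("exists", "hasChild", TOP)),
]
```

Each step either picks a subclass or adds a conjunct, so the set of covered objects can only shrink, which is the defining condition D ⊑_T C.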

Page 22: Machine Learning Methods for Analysing and Linking RDF Data

Learning using Refinement Operators

Search tree (node, heuristic score):
  ⊤ (0.45)
    Car (too weak)
    Person (0.73)
      Person ⊓ ∃attends.⊤ (0.78)
        Person ⊓ ∃attends.Talk (0.97)
        . . .
      . . .
    . . .

Start with most general concept (top down)
Heuristic evaluates using pos/neg examples
Operator specialises
Continue until termination criterion met

= Learning Algorithm
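
The four ingredients can be put together in a small best-first search. This is a sketch under stated assumptions, not the algorithm from the talk: the coverage table, the refinement table, the accuracy heuristic and the strict "too weak" test (a concept that misses any positive example is pruned) are all hypothetical simplifications.

```python
import heapq

# Hypothetical data: which individuals each candidate concept covers.
COVERS = {
    "Top": {"a", "b", "c", "d"},
    "Car": {"d"},
    "Person": {"a", "b", "c"},
    "Person and attends some Top": {"a", "b", "c"},
    "Person and attends some Talk": {"a", "b"},
}
# Hypothetical refinement operator, given as a lookup table.
REFINEMENTS = {
    "Top": ["Car", "Person"],
    "Person": ["Person and attends some Top"],
    "Person and attends some Top": ["Person and attends some Talk"],
}


def accuracy(concept, positives, negatives):
    """Heuristic: fraction of examples classified correctly by the concept."""
    covered = COVERS[concept]
    correct = len(positives & covered) + len(negatives - covered)
    return correct / (len(positives) + len(negatives))


def learn(positives, negatives, target=1.0):
    """Best-first top-down search starting from the most general concept."""
    best = "Top"
    best_score = accuracy(best, positives, negatives)
    frontier = [(-best_score, best)]  # max-heap via negated scores
    while frontier:
        neg_score, concept = heapq.heappop(frontier)
        if -neg_score > best_score:
            best, best_score = concept, -neg_score
        if best_score >= target:  # termination criterion
            break
        for child in REFINEMENTS.get(concept, []):
            if not positives <= COVERS[child]:
                continue  # too weak: specialising further cannot regain positives
            heapq.heappush(frontier, (-accuracy(child, positives, negatives), child))
    return best, best_score
```

Calling `learn({"a", "b"}, {"c", "d"})` walks from Top through Person to the concept that covers exactly the positive examples, skipping Car because it misses them.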

Page 23: Machine Learning Methods for Analysing and Linking RDF Data

Properties of Refinement Operators

An L downward refinement operator ρ is called
  Finite iff ρ(C) is finite for any concept C ∈ C(L)
  Redundant iff there exist two different ρ refinement chains from a concept C to a concept D
  Proper iff for C, D ∈ C(L), C ρ D implies C ≢_T D
  Complete iff for C, D ∈ C(L) with D ⊏_T C there is a concept E with E ≡_T D and a refinement chain C ρ ··· ρ E
  Weakly complete iff for any concept C with C ⊏_T ⊤ we can reach a concept E with E ≡_T C from ⊤ by ρ

[Figure: small diagrams illustrating the properties]

Page 27: Machine Learning Methods for Analysing and Linking RDF Data

Properties of Refinement Operators

Properties indicate how suitable a refinement operator is for solving the learning problem:

  Incomplete operators may miss solutions
  Redundant operators may lead to duplicate concepts in the search tree
  Improper operators may produce equivalent concepts (which cover the same examples)
  For infinite operators it may not be possible to compute all refinements of a given concept

Key question: Which properties can be combined?

Page 28: Machine Learning Methods for Analysing and Linking RDF Data

Theorem: Properties of L Refinement Operators

Theorem

Maximum sets of combinable properties of L refinement operators for L ∈ {ALC, ALCN, SHOIN, SROIQ} are:

1 {weakly complete, complete, finite}
2 {weakly complete, complete, proper}
3 {weakly complete, non-redundant, finite}
4 {weakly complete, non-redundant, proper}
5 {non-redundant, finite, proper}

Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler, ILP conference, 2008

Concept Learning in Description Logics Using Refinement Operators, Lehmann, Hitzler, Machine Learning journal, 2010

Page 29: Machine Learning Methods for Analysing and Linking RDF Data

Definition of ρ

ρ(C) =
  {⊥} ∪ ρ_⊤(C)   if C = ⊤
  ρ_⊤(C)         otherwise

ρ_B(C) =
  if C = ⊥:
    ∅
  if C = ⊤:
    {C1 ⊔ ··· ⊔ Cn | Ci ∈ M_B (1 ≤ i ≤ n)}
  if C = A (A ∈ N_C):
    {A′ | A′ ∈ sh↓(A)}
    ∪ {A ⊓ D | D ∈ ρ_B(⊤)}
  if C = ¬A (A ∈ N_C):
    {¬A′ | A′ ∈ sh↑(A)}
    ∪ {¬A ⊓ D | D ∈ ρ_B(⊤)}
  if C = ∃r.D:
    {∃r.E | A = ar(r), E ∈ ρ_A(D)}
    ∪ {∃r.D ⊓ E | E ∈ ρ_B(⊤)}
    ∪ {∃s.D | s ∈ sh↓(r)}
  if C = ∀r.D:
    {∀r.E | A = ar(r), E ∈ ρ_A(D)}
    ∪ {∀r.D ⊓ E | E ∈ ρ_B(⊤)}
    ∪ {∀r.⊥ | D = A ∈ N_C and sh↓(A) = ∅}
    ∪ {∀s.D | s ∈ sh↓(r)}
  if C = C1 ⊓ ··· ⊓ Cn (n ≥ 2):
    {C1 ⊓ ··· ⊓ Ci−1 ⊓ D ⊓ Ci+1 ⊓ ··· ⊓ Cn | D ∈ ρ_B(Ci), 1 ≤ i ≤ n}
  if C = C1 ⊔ ··· ⊔ Cn (n ≥ 2):
    {C1 ⊔ ··· ⊔ Ci−1 ⊔ D ⊔ Ci+1 ⊔ ··· ⊔ Cn | D ∈ ρ_B(Ci), 1 ≤ i ≤ n}
    ∪ {(C1 ⊔ ··· ⊔ Cn) ⊓ D | D ∈ ρ_B(⊤)}

Base Operator (Excerpt)

Page 31: Machine Learning Methods for Analysing and Linking RDF Data

Definition of ρ

if C = ∃r.D:
  {∃r.E | A = ar(r), E ∈ ρ_A(D)}
  ∪ {∃r.D ⊓ E | E ∈ ρ_B(⊤)}
  ∪ {∃s.D | s ∈ sh↓(r)}

Examples: ∃takesPartIn.SocialEvent is refined to
  ∃takesPartIn.Meeting
  Student ⊓ ∃takesPartIn.SocialEvent
  ∃leads.SocialEvent
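
The three parts of the existential rule can be traced with a toy implementation. The hierarchies and the single stand-in refinement of ⊤ are assumptions chosen so that the output matches the slide's examples; the filler is refined by direct subclasses only, which simplifies the ρ_A(D) part of the rule.

```python
# Hypothetical hierarchies, used only for illustration.
SUBCLASSES = {"SocialEvent": ["Meeting"]}      # direct subclasses, sh↓ on classes
SUBPROPERTIES = {"takesPartIn": ["leads"]}     # direct subproperties, sh↓ on properties
TOP_REFINEMENTS = ["Student"]                  # stand-in for ρ_B(⊤)


def refine_exists(prop, filler):
    """Refinements of ∃prop.filler, one group per set in the rule."""
    results = []
    # 1. Refine the filler: {∃r.E | E ∈ ρ_A(D)}, here direct subclasses only.
    results += [f"∃{prop}.{sub}" for sub in SUBCLASSES.get(filler, [])]
    # 2. Add a conjunct: {∃r.D ⊓ E | E ∈ ρ_B(⊤)}, written E ⊓ ∃r.D as on the slide.
    results += [f"{e} ⊓ ∃{prop}.{filler}" for e in TOP_REFINEMENTS]
    # 3. Specialise the property: {∃s.D | s ∈ sh↓(r)}.
    results += [f"∃{sub}.{filler}" for sub in SUBPROPERTIES.get(prop, [])]
    return results
```

`refine_exists("takesPartIn", "SocialEvent")` yields one refinement per part of the rule, in the same order as the examples.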

Page 34: Machine Learning Methods for Analysing and Linking RDF Data

Properties of ρ

ρ↓ is complete
ρ↓ is infinite, e.g. there are infinitely many refinement steps of the form:

⊤ ρ↓ C1 ⊔ C2 ⊔ C3 ⊔ . . .

ρ↓^cl is proper
ρ↓ is redundant, since two different chains lead to the same concept:

∀r1.A1 ⊔ ∀r2.A1  ρ↓  ∀r1.(A1 ⊓ A2) ⊔ ∀r2.A1  ρ↓  ∀r1.(A1 ⊓ A2) ⊔ ∀r2.(A1 ⊓ A2)
∀r1.A1 ⊔ ∀r2.A1  ρ↓  ∀r1.A1 ⊔ ∀r2.(A1 ⊓ A2)  ρ↓  ∀r1.(A1 ⊓ A2) ⊔ ∀r2.(A1 ⊓ A2)

"DL-Learner: Learning Concepts in Description Logics", Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009

Page 35: Machine Learning Methods for Analysing and Linking RDF Data

Learning using Refinement Operators

Search tree (node: heuristic score [expansion value he]):
  ⊤: 0.47 [0], 0.45 [1]
    Car: too weak
    Person: 0.79 [0], 0.78 [1], 0.77 [2], 0.75 [3], 0.74 [4], 0.73 [5]
      Person ⊓ ∃attends.⊤: 0.79 [4], 0.78 [5]
        Person ⊓ ∃attends.Talk: 0.97 [4]
        . . .
      . . .
    . . .

Redundancy elimination technique with polynomial complexity wrt. search tree size
Length of children limited by expansion value
Infinite ρ applicable
he used by heuristic (bias towards short concepts - Occam's Razor)

Page 43: Machine Learning Methods for Analysing and Linking RDF Data

Scalability

Refinement operator should build coherent concepts

Inference:
  Complete & sound vs. approximation
  Open World Assumption (OWA) vs. Closed World Assumption (CWA)

Stochastic coverage computation
  Pick random example → perform instance check → compute confidence interval (e.g. via Wald Method) wrt. objective function (e.g. F-measure)
  Up to 99% fewer instance checks in test examples
  Low influence on accuracy shown for 380 learning tasks using 7 ontologies (0.2% ± 0.4% F-measure difference)

Fragment extraction for application on large knowledge bases

Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011
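
The sampling loop behind stochastic coverage computation can be sketched as follows. This is a simplification under stated assumptions: it estimates a plain coverage fraction rather than an F-measure, and `estimate_coverage`, `max_width` and `min_checks` are hypothetical names and parameters, not taken from the cited paper.

```python
import math
import random


def wald_interval(successes, trials, z=1.96):
    """Wald confidence interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def estimate_coverage(examples, instance_check, max_width=0.1, min_checks=30, seed=0):
    """Estimate the covered fraction, stopping once the interval is narrow enough.

    Returns (estimate, number of instance checks performed).
    """
    order = list(examples)
    if not order:
        raise ValueError("need at least one example")
    random.Random(seed).shuffle(order)  # pick examples in random order
    covered = 0
    for checks, example in enumerate(order, start=1):
        covered += bool(instance_check(example))  # the expensive reasoner call
        if checks >= min_checks:
            low, high = wald_interval(covered, checks)
            if high - low <= max_width:
                return covered / checks, checks
    return covered / len(order), len(order)
```

The Wald interval collapses to zero width when every sampled check agrees, so the `min_checks` floor is what keeps the loop from stopping after a handful of examples in that case.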

Page 47: Machine Learning Methods for Analysing and Linking RDF Data

Carcinogenesis

Goal: predict whether substance causes cancer

Why:
  Each year 1000 new substances developed
  Substances can often only be validated using time-consuming and expensive experiments with mice → prioritise those with high risk

Background knowledge:
  Database of the US National Toxicology Program (NTP)

“Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)

Page 48: Machine Learning Methods for Analysing and Linking RDF Data

Knowledge Base Enrichment

Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International Semantic Web Conference (ISWC) 2013
Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012

Page 49: Machine Learning Methods for Analysing and Linking RDF Data

Protégé Plugin

Support for ontology creation and maintenance

Page 50: Machine Learning Methods for Analysing and Linking RDF Data

Ontology Debugging: ORE

ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International Semantic Web Conference (ISWC) 2010

Page 51: Machine Learning Methods for Analysing and Linking RDF Data

Data Quality Measurement: RDFUnit

Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW), ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali J. Zaveri

Page 52: Machine Learning Methods for Analysing and Linking RDF Data

Robot Scientists Adam & Eve

Abduction to form hypothesis and ≈ 1000 experiments per day
12 new scientific discoveries regarding functions of genes in yeast

King, Ross D et al. "The automation of science." Science 324 (2009): 85-89.

Page 53: Machine Learning Methods for Analysing and Linking RDF Data

Link Discovery - Motivation

Links are the backbone of the traditional WWW and the Data Web
Links are central for data integration, deduplication, cross-ontology question answering, reasoning, federated queries . . .
Central problem for many large IT companies

Automated tools (LIMES, SILK) can create a high number of links between RDF resources by using heuristics

Page 55: Machine Learning Methods for Analysing and Linking RDF Data

Link Discovery - Definition

Definition (Link Discovery)
Given sets S and T of resources and relation R (often owl:sameAs)
Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ}

S: DBpedia
  rdfs:label: "African Elephant"

T: BBC Wildlife
  dc:title: "African Bush Elephant"

dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?

δ = levenshtein(S.rdfs:label, T.dc:title)
δ(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
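
The distance of 5 in the example can be checked with a short Levenshtein implementation, and the set M computed by brute force over S × T. The dictionary layout and the `discover_links` helper are assumptions for illustration; the quadratic comparison of all pairs is only meant for toy inputs.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def discover_links(sources, targets, theta):
    """Return M = {(s, t) : δ(s, t) ≤ θ} for label dictionaries keyed by resource."""
    return {
        (s, t)
        for s, s_label in sources.items()
        for t, t_label in targets.items()
        if levenshtein(s_label, t_label) <= theta
    }


sources = {"dbpedia:AfricanElephant": "African Elephant"}   # rdfs:label values
targets = {"bbc:hfzw82929": "African Bush Elephant"}        # dc:title values
```

The two labels differ by the five inserted characters of "Bush " (including the space), so the pair is linked for θ = 5 and rejected for θ = 4.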

Page 60: Machine Learning Methods for Analysing and Linking RDF Data

Example: Link Specification

f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 0.5)

Page 61: Machine Learning Methods for Analysing and Linking RDF Data

Link Specification Syntax and Semantics

LS            [[LS]]
f(m, θ, M)    {(s, t, r) | (s, t, r) ∈ M ∧ m(s, t) ≥ θ}
LS1 ⊓ LS2     {(s, t, r) | (s, t, r1) ∈ [[LS1]] ∧ (s, t, r2) ∈ [[LS2]] ∧ r = min(r1, r2)}
LS1 ⊔ LS2     {(s, t, r) |
                r = r1 if ∃(s, t, r1) ∈ [[LS1]] ∧ ¬(∃r2 : (s, t, r2) ∈ [[LS2]]),
                r = r2 if ∃(s, t, r2) ∈ [[LS2]] ∧ ¬(∃r1 : (s, t, r1) ∈ [[LS1]]),
                r = max(r1, r2) if (s, t, r1) ∈ [[LS1]] ∧ (s, t, r2) ∈ [[LS2]]}

The syntax and semantics allow an ordering similar to subsumption to be defined (more specific specifications generate fewer links).
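
The three semantic rules can be read as operations on mappings from pairs (s, t) to a score r. The sketch below is one possible reading, not the LIMES implementation: the names `atomic`, `conjunction` and `disjunction`, the dictionary representation, the toy similarity values, and the choice to initialise r with the similarity value are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Mapping = Dict[Tuple[str, str], float]  # (s, t) -> r


def atomic(m: Callable[[str, str], float], theta: float, M: Mapping) -> Mapping:
    """[[f(m, theta, M)]]: keep the triples of M whose measure reaches theta."""
    return {(s, t): r for (s, t), r in M.items() if m(s, t) >= theta}


def conjunction(L1: Mapping, L2: Mapping) -> Mapping:
    """[[LS1 ⊓ LS2]]: pairs present in both mappings, with r = min(r1, r2)."""
    return {pair: min(r1, L2[pair]) for pair, r1 in L1.items() if pair in L2}


def disjunction(L1: Mapping, L2: Mapping) -> Mapping:
    """[[LS1 ⊔ LS2]]: pairs present in either mapping; r = max(r1, r2) if in both."""
    result = dict(L1)
    for pair, r2 in L2.items():
        result[pair] = max(result[pair], r2) if pair in result else r2
    return result


# Toy similarity values, invented for this example.
SIM_ID = {("s1", "t1"): 1.0, ("s2", "t2"): 0.6, ("s3", "t3"): 0.2}
SIM_NAME = {("s1", "t1"): 0.9, ("s2", "t2"): 0.4, ("s3", "t3"): 0.7}


def sim_id(s: str, t: str) -> float:
    return SIM_ID.get((s, t), 0.0)


def sim_name(s: str, t: str) -> float:
    return SIM_NAME.get((s, t), 0.0)


by_id = atomic(sim_id, 0.5, dict(SIM_ID))        # keeps (s1, t1) and (s2, t2)
by_name = atomic(sim_name, 0.5, dict(SIM_NAME))  # keeps (s1, t1) and (s3, t3)
both = conjunction(by_id, by_name)
either = disjunction(by_id, by_name)

print(both)
print(either)
```

In this toy run the conjunction keeps only the pair accepted by both atomic specifications, while the disjunction keeps all three pairs, which matches the ordering remark: the more specific specification generates fewer links.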

Page 62: Machine Learning Methods for Analysing and Linking RDF Data

Link Specification Refinement Operator

ρ↓(LS) =

{f(m1, 1, ∆) ⊓ · · · ⊓ f(mn, 1, ∆) | mi ∈ SM, 1 ≤ i ≤ n, n ≤ 2|SM|}
    if LS = ⊥

f(m, dt(θ), M) ∪ LS ⊔ f(m′, 1, M)   (m ∈ SM, m ≠ m′)
    if LS = f(m, θ, M) (atomic)

LS1 ⊓ · · · ⊓ LSi−1 ⊓ LS′ ⊓ LSi+1 ⊓ · · · ⊓ LSn   with LS′ ∈ ρ↓(LSi)
    if LS = LS1 ⊓ · · · ⊓ LSn (n ≥ 2)

LS1 ⊔ · · · ⊔ LSi−1 ⊔ LS′ ⊔ LSi+1 ⊔ · · · ⊔ LSn   with LS′ ∈ ρ↓(LSi)
∪ LS ⊔ f(m, 1, M)   (m ∈ SM, m not used in LS)
    if LS = LS1 ⊔ · · · ⊔ LSn (n ≥ 2)

Upward refinement operator
Positive: Weakly complete, finite
Negative: Not complete, redundant, not proper
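
The atomic, ⊓ and ⊔ cases of the operator can be sketched as a generator over a small tuple-based syntax tree. This is a simplified reading under several assumptions: the ⊥ case and the mapping argument M are left out, the tuple encoding and the names `refine`, `dt`, `measures`, `SM` and `STEP` are hypothetical, and the threshold decrease dt is assumed to lower θ by a fixed step of 0.5, which the slide does not specify.

```python
from typing import Iterator, Optional

# Encoding (hypothetical): ("atom", measure, threshold) or ("and" | "or", operands)
EDIT = "edit(:socId, :socId)"
TRIGRAMS = "trigrams(:name, :label)"
SM = (EDIT, TRIGRAMS)  # the set of similarity measures
STEP = 0.5             # assumed step size of the threshold decrease


def dt(theta: float) -> Optional[float]:
    """Assumed threshold decrease: lower theta by STEP, or None once it would reach 0."""
    lowered = round(theta - STEP, 2)
    return lowered if lowered > 0 else None


def measures(spec) -> set:
    """All measures used anywhere in a specification."""
    if spec[0] == "atom":
        return {spec[1]}
    return set().union(*(measures(operand) for operand in spec[1]))


def refine(spec) -> Iterator[tuple]:
    """Yield the refinements of spec for the atomic, 'and' and 'or' cases."""
    kind = spec[0]
    if kind == "atom":
        _, m, theta = spec
        lowered = dt(theta)
        if lowered is not None:
            yield ("atom", m, lowered)                  # f(m, dt(theta), M)
        for other in SM:
            if other != m:
                yield ("or", (spec, ("atom", other, 1.0)))  # LS ⊔ f(m', 1, M)
        return
    operands = spec[1]
    for i, operand in enumerate(operands):              # refine one operand in place
        for refined in refine(operand):
            yield (kind, operands[:i] + (refined,) + operands[i + 1:])
    if kind == "or":                                    # add an unused measure
        used = measures(spec)
        for m in SM:
            if m not in used:
                yield ("or", operands + (("atom", m, 1.0),))


start = ("atom", EDIT, 1.0)
step1 = ("atom", EDIT, 0.5)
step2 = ("or", (step1, ("atom", TRIGRAMS, 1.0)))
step3 = ("or", (step1, ("atom", TRIGRAMS, 0.5)))

print(step1 in refine(start), step2 in refine(step1), step3 in refine(step2))
```

Each step either lowers a threshold or adds a disjunct, so the resulting specification accepts at least as many links as the one before it. The sketch does not flatten nested ⊔ nodes or filter duplicate results, which is in line with the operator being listed as redundant.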

Page 63: Machine Learning Methods for Analysing and Linking RDF Data

Refinement Chain Example

f(edit(:socId, :socId), 1.0)

f(edit(:socId, :socId), 0.5)

f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 1.0)

f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 0.5)

Page 67: Machine Learning Methods for Analysing and Linking RDF Data

Projects: DL-Learner and LIMES

DL-Learner
Open-source project: http://dl-learner.org
Extensible platform for concept learning algorithms
Supports all RDF/OWL serialisations and major reasoners
Several thousand downloads

LIMES (http://aksw.org/Projects/LIMES.html)
Highly scalable engine (fastest RDF link discovery tool)
Several machine learning approaches integrated (including the one presented)

“DL-Learner: Learning Concepts in Description Logics”, Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009

Page 68: Machine Learning Methods for Analysing and Linking RDF Data

Summary & Conclusions

Many interesting applications of structured machine learning (therapy response prediction, disease prediction, protein folding, data quality measurement, ontology debugging)
Still few machine learning tools for working with RDF/OWL, although more and more data is available
Refinement operators make it possible to apply supervised machine learning to complex background knowledge
Can be applied to other languages such as link specifications
