Machine Learning Methods for Analysing and Linking RDF Data

Posted on 05-Dec-2014

Invited talk at the 8th International Conference on Scalable Uncertainty Management (SUM). The talk outlines applications of supervised structured machine learning and presents a specific refinement-operator-based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.

Transcript of Machine Learning Methods for Analysing and Linking RDF Data

Machine Learning Methods for Analysing and Linking RDF Data

Jens Lehmann

September 16, 2014

Jens Lehmann (AKSW, Uni Leipzig) Analysing and Linking RDF Data September 16, 2014 1 / 35

Structured Machine Learning

How to analyse structured data?

Detecting Crime Patterns: Series Finder

Construct the "modus operandi" of criminals; identified 9 new crime patterns in Cambridge, MA, USA

Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.

Discovery of Laws of Physics

Background data generated using experiments
Mathematical functions on input variables form the hypothesis space

Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.

Protein Interaction

Rules learned via Inductive Logic Programming (ProGolem)
Understandable by experts and competitive with statistical learners
Possibly better drug design and reduction of side effects

Santos et al. "Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.

Background Knowledge

RDF and the Linked Data Principles

RDF Triple:

Example: http://cs.ox.ac.uk/John (subject), http://cs.ox.ac.uk/studies (predicate), http://cs.ox.ac.uk/CS (object)

The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web.

Linked Data principles (simplified version):
1. Use RDF and URLs as identifiers
2. Include links to other datasets
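
A minimal sketch of the example triple in plain Python, assuming no RDF library: the triple is a subject-predicate-object tuple, and the hypothetical helper to_ntriples writes it in the N-Triples line syntax, where each IRI sits in angle brackets and the statement ends with a full stop. Literals and blank nodes are deliberately left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF triple whose subject, predicate and object are all IRIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise one IRI-only triple as an N-Triples line: <s> <p> <o> ."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


john_studies_cs = Triple(
    "http://cs.ox.ac.uk/John",
    "http://cs.ox.ac.uk/studies",
    "http://cs.ox.ac.uk/CS",
)
print(to_ntriples(john_studies_cs))
# <http://cs.ox.ac.uk/John> <http://cs.ox.ac.uk/studies> <http://cs.ox.ac.uk/CS> .
```

A real application would normally rely on an RDF library, which also takes care of escaping, literals and blank nodes.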

OWL Ontologies

Web Ontology Language (OWL) builds on RDF and Description Logics

Objects: specific resources (constants), e.g. MARIA, LEIPZIG

Classes: sets of objects (unary predicates), e.g. Student, Car, Country

Properties: connections between objects (binary predicates), e.g. hasChild, partOf

Can be combined into complex concepts (OWL class expressions), e.g.: $\mathit{Child} \sqcap \exists \mathit{hasParent}.\mathit{Professor}$
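
To make the three building blocks concrete, here is a minimal sketch that evaluates the example class expression over a small hypothetical dataset (the individuals anna, ben, carl and dora are invented). It reads the data under a closed-world assumption, i.e. whatever is not stated is treated as false; OWL itself has open-world semantics, so this only approximates what a reasoner would infer.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy data (ABox): class memberships and property assertions.
CLASS_ASSERTIONS = {"Child": {"anna", "ben"}, "Professor": {"carl"}}
PROPERTY_ASSERTIONS = {"hasParent": {("anna", "carl"), ("ben", "dora")}}


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"


@dataclass(frozen=True)
class Exists:
    role: str
    filler: "Concept"


Concept = Union[Atomic, And, Exists]


def instances(c: Concept) -> set:
    """Return the individuals that belong to c, reading the toy data as closed-world."""
    if isinstance(c, Atomic):
        return set(CLASS_ASSERTIONS.get(c.name, set()))
    if isinstance(c, And):
        return instances(c.left) & instances(c.right)
    if isinstance(c, Exists):
        fillers = instances(c.filler)
        return {s for s, o in PROPERTY_ASSERTIONS.get(c.role, set()) if o in fillers}
    raise TypeError(f"unsupported concept: {c!r}")


# Child ⊓ ∃hasParent.Professor
child_of_professor = And(Atomic("Child"), Exists("hasParent", Atomic("Professor")))
print(instances(child_of_professor))  # {'anna'}
```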

Learning OWL Class Expressions - Definition

Given:
Background knowledge (OWL ontologies and RDF datasets)
Positive and negative examples (objects in the datasets)

Goal:
Find an OWL class expression describing the positive but not the negative examples
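
Candidate expressions for this learning problem are typically scored against the examples. A minimal sketch, assuming the set of individuals a candidate covers has already been computed (for instance by instance retrieval): it counts covered positives and negatives and derives accuracy and F-measure. The example names p1 to p3 and n1, n2 are hypothetical.

```python
def coverage_scores(covered: set, positives: set, negatives: set) -> dict:
    """Score a candidate class expression from the set of individuals it covers."""
    tp = len(covered & positives)  # positive examples covered
    fp = len(covered & negatives)  # negative examples wrongly covered
    fn = len(positives - covered)  # positive examples missed
    tn = len(negatives - covered)  # negative examples correctly excluded
    accuracy = (tp + tn) / (len(positives) + len(negatives))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_measure": f_measure}


positives = {"p1", "p2", "p3"}
negatives = {"n1", "n2"}
# A hypothetical candidate covering p1, p2 and n1: 3 of 5 examples classified correctly.
print(coverage_scores({"p1", "p2", "n1"}, positives, negatives)["accuracy"])  # 0.6
# A solution to the learning problem covers every positive and no negative example.
print(coverage_scores({"p1", "p2", "p3"}, positives, negatives)["accuracy"])  # 1.0
```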

Application Example: Therapy Response Prediction

≈ 0.5-1% of the population is affected by rheumatoid arthritis
Anti-TNF is not effective for several million persons for unknown reasons

Learning OWL Class Expressions - Approaches

Least common subsumers: Cohen et al. "Computing least common subsumers in description logics." AAAI 1992

Terminological decision trees: Fanizzi et al. "Induction of concepts in web ontologies through terminological decision trees." ECML PKDD 2010

Rule-based: Fanizzi et al. "DL-FOIL concept learning in description logics." ILP 2008

Genetic programming: Lehmann, Jens. "Hybrid learning of ontology classes." MLDM 2007

Refinement operators: Lehmann et al. "Concept learning in description logics using refinement operators." ML 2010; Iannone et al. "An algorithm based on counterfactuals for concept learning in the semantic web." AI 2007

Refinement Operators - Definitions

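Given a DL $\mathcal{L}$, consider the quasi-ordered space $\langle \mathcal{C}(\mathcal{L}), \sqsubseteq_{\mathcal{T}} \rangle$ over concepts of $\mathcal{L}$. $\rho : \mathcal{C}(\mathcal{L}) \to 2^{\mathcal{C}(\mathcal{L})}$ is a downward $\mathcal{L}$ refinement operator if for any $C \in \mathcal{C}(\mathcal{L})$:

$D \in \rho(C)$ implies $D \sqsubseteq_{\mathcal{T}} C$

Notation: write $C \rightsquigarrow_\rho D$ instead of $D \in \rho(C)$. Example refinement chain:

$\top \rightsquigarrow_\rho \mathit{Person} \rightsquigarrow_\rho \mathit{Man} \rightsquigarrow_\rho \mathit{Man} \sqcap \exists \mathit{hasChild}.\top$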
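
A toy version of such an operator reproduces the example chain. This is a minimal sketch over a hypothetical class hierarchy (Thing, Person, Man, Woman, Car) and a single role hasChild; it only refines ⊤ and atomic classes, so it is far from complete and is not the operator ρ defined later in the talk. Every refinement it returns is subsumed by its input in the toy hierarchy.

```python
# Hypothetical TBox: direct subclasses of each named class ("Thing" plays the role of ⊤).
SUBCLASSES = {"Thing": ["Person", "Car"], "Person": ["Man", "Woman"], "Man": [], "Woman": [], "Car": []}
ROLES = ["hasChild"]

# Concepts as tuples: ("top",), ("atom", A), ("and", C, D), ("exists", r, C).
TOP = ("top",)


def rho(c: tuple) -> list:
    """Toy downward refinement operator: each result is subsumed by c in the toy TBox."""
    if c == TOP:
        return [("atom", a) for a in SUBCLASSES["Thing"]]
    if c[0] == "atom":
        subclasses = [("atom", a) for a in SUBCLASSES[c[1]]]
        with_role = [("and", c, ("exists", r, TOP)) for r in ROLES]
        return subclasses + with_role
    return []  # conjunctions and restrictions are not refined further in this sketch


def show(c: tuple) -> str:
    """Render a concept tuple in DL notation."""
    if c == TOP:
        return "⊤"
    if c[0] == "atom":
        return c[1]
    if c[0] == "and":
        return f"{show(c[1])} ⊓ {show(c[2])}"
    return f"∃{c[1]}.{show(c[2])}"


# Follow the example chain one refinement step at a time.
chain = [TOP]
for target in ["Person", "Man", "Man ⊓ ∃hasChild.⊤"]:
    chain.append(next(d for d in rho(chain[-1]) if show(d) == target))
rendered = " ⇝ ".join(show(c) for c in chain)
# rendered == "⊤ ⇝ Person ⇝ Man ⇝ Man ⊓ ∃hasChild.⊤"
```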

Learning using Refinement Operators

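[Figure: top-down search tree: $\top$ (0.45) is refined to Car (too weak) and Person (0.73); Person to $\mathit{Person} \sqcap \exists \mathit{attends}.\top$ (0.78); that to $\mathit{Person} \sqcap \exists \mathit{attends}.\mathit{Talk}$ (0.97); further branches elided]

Start with the most general concept (top down)
Heuristic evaluates candidates using the positive/negative examples
Operator specialises
Continue until a termination criterion is met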

Together, these steps form a learning algorithm.
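
The loop described on this slide can be sketched generically, assuming the refinement operator, the heuristic and a "too weak" test are supplied as functions. This is a plain best-first search for illustration, not DL-Learner's actual algorithm; the scores in the toy tree are the ones shown in the figure, hard-coded rather than computed from examples, and Car is simply marked as too weak.

```python
import heapq
from typing import Callable, Iterable


def best_first_learn(
    start: str,
    refine: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
    too_weak: Callable[[str], bool],
    target_score: float,
    max_expansions: int = 1000,
) -> str:
    """Top-down best-first search that always expands the highest-scoring open node."""
    best = start
    frontier = [(-score(start), start)]  # heapq is a min-heap, so scores are negated
    seen = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, node = heapq.heappop(frontier)
        if score(node) > score(best):
            best = node
        if score(best) >= target_score:  # termination criterion
            break
        expansions += 1
        for child in refine(node):  # the operator specialises the node
            if child not in seen and not too_weak(child):
                seen.add(child)
                heapq.heappush(frontier, (-score(child), child))
    return best


# Toy search space and scores copied from the figure, not computed from examples.
TREE = {
    "⊤": ["Car", "Person"],
    "Person": ["Person ⊓ ∃attends.⊤"],
    "Person ⊓ ∃attends.⊤": ["Person ⊓ ∃attends.Talk"],
}
SCORES = {"⊤": 0.45, "Person": 0.73, "Person ⊓ ∃attends.⊤": 0.78, "Person ⊓ ∃attends.Talk": 0.97}

result = best_first_learn(
    "⊤",
    refine=lambda c: TREE.get(c, []),
    score=SCORES.__getitem__,
    too_weak=lambda c: c == "Car",
    target_score=0.95,
)
# result == "Person ⊓ ∃attends.Talk"
```

Because too-weak children are filtered before they are scored, Car never needs a score of its own.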

Properties of Refinement Operators

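An $\mathcal{L}$ downward refinement operator $\rho$ is called
Finite iff $\rho(C)$ is finite for any concept $C \in \mathcal{C}(\mathcal{L})$.
Redundant iff there exist two different $\rho$ refinement chains from a concept $C$ to a concept $D$.
Proper iff for $C, D \in \mathcal{C}(\mathcal{L})$, $C \rightsquigarrow_\rho D$ implies $C \not\equiv_{\mathcal{T}} D$.
Complete iff for $C, D \in \mathcal{C}(\mathcal{L})$ with $D \sqsubset_{\mathcal{T}} C$ there is a concept $E$ with $E \equiv_{\mathcal{T}} D$ and a refinement chain $C \rightsquigarrow_\rho \cdots \rightsquigarrow_\rho E$.
Weakly complete iff for any concept $C$ with $C \sqsubset_{\mathcal{T}} \top$ we can reach a concept $E$ with $E \equiv_{\mathcal{T}} C$ from $\top$ by $\rho$.

[Figure: sketches of the properties: a concept C with finitely many refinements C1 … Cn; two different chains from C to D; a step from C to an equivalent concept E; a chain from C to some E ≡ D]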

Properties of Refinement Operators

Properties indicate how suitable a refinement operator is for solving the learning problem:

Incomplete operators may miss solutions
Redundant operators may lead to duplicate concepts in the search tree
Improper operators may produce equivalent concepts (which cover the same examples)
For infinite operators it may not be possible to compute all refinements of a given concept

Key question: Which properties can be combined?

Theorem: Properties of L Refinement Operators

Theorem

Maximum sets of combinable properties of $\mathcal{L}$ refinement operators for $\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ}\}$ are:

1. {weakly complete, complete, finite}
2. {weakly complete, complete, proper}
3. {weakly complete, non-redundant, finite}
4. {weakly complete, non-redundant, proper}
5. {non-redundant, finite, proper}

Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler; ILP conference, 2008

Concept Learning in Description Logics Using Refinement Operators; Lehmann, Hitzler; Machine Learning journal, 2010

Definition of ρ

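$$
\rho(C) = \begin{cases}
\{\bot\} \cup \rho_\top(C) & \text{if } C = \top \\
\rho_\top(C) & \text{otherwise}
\end{cases}
$$

$$
\rho_B(C) = \begin{cases}
\emptyset & \text{if } C = \bot \\
\{C_1 \sqcup \dots \sqcup C_n \mid C_i \in M_B\ (1 \le i \le n)\} & \text{if } C = \top \\
\{A' \mid A' \in sh_\downarrow(A)\} \cup \{A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = A\ (A \in N_C) \\
\{\neg A' \mid A' \in sh_\uparrow(A)\} \cup \{\neg A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = \neg A\ (A \in N_C) \\
\{\exists r.E \mid A = ar(r), E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \exists r.D \\
\{\forall r.E \mid A = ar(r), E \in \rho_A(D)\} \cup \{\forall r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\forall r.\bot \mid D = A \in N_C \text{ and } sh_\downarrow(A) = \emptyset\} \cup \{\forall s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \forall r.D \\
\{C_1 \sqcap \dots \sqcap C_{i-1} \sqcap D \sqcap C_{i+1} \sqcap \dots \sqcap C_n \mid D \in \rho_B(C_i), 1 \le i \le n\} & \text{if } C = C_1 \sqcap \dots \sqcap C_n\ (n \ge 2) \\
\{C_1 \sqcup \dots \sqcup C_{i-1} \sqcup D \sqcup C_{i+1} \sqcup \dots \sqcup C_n \mid D \in \rho_B(C_i), 1 \le i \le n\} \cup \{(C_1 \sqcup \dots \sqcup C_n) \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = C_1 \sqcup \dots \sqcup C_n\ (n \ge 2)
\end{cases}
$$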

Base Operator (Excerpt)

Definition of ρ

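$\{\exists r.E \mid A = ar(r), E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\}$ if $C = \exists r.D$

Examples: $\exists \mathit{takesPartIn}.\mathit{SocialEvent}$ can be refined to
$\exists \mathit{takesPartIn}.\mathit{Meeting}$ (refining the filler)
$\mathit{Student} \sqcap \exists \mathit{takesPartIn}.\mathit{SocialEvent}$ (adding a conjunct)
$\exists \mathit{leads}.\mathit{SocialEvent}$ (specialising the role)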
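
The three unions of the $\exists r.D$ case map onto the three example refinements. A minimal sketch, assuming hypothetical hierarchies in which Meeting is a subclass of SocialEvent and leads is a subproperty of takesPartIn, and letting the single class Student stand in for $\rho_B(\top)$. It simplifies the definition: the filler is only replaced by its direct subclasses (instead of the full $\rho_A(D)$) and the range condition $A = ar(r)$ is ignored.

```python
# Hypothetical hierarchies: Meeting ⊑ SocialEvent and leads ⊑ takesPartIn.
SUBCLASSES = {"SocialEvent": ["Meeting"], "Meeting": []}
SUBROLES = {"takesPartIn": ["leads"], "leads": []}
TOP_REFINEMENTS = ["Student"]  # stands in for ρ_B(⊤)


def refine_exists(role: str, filler: str) -> list:
    """Sketch of the three unions in the ∃r.D case, for an atomic filler D."""
    refined_filler = [f"∃{role}.{sub}" for sub in SUBCLASSES[filler]]
    added_conjunct = [f"{e} ⊓ ∃{role}.{filler}" for e in TOP_REFINEMENTS]
    specialised_role = [f"∃{sub}.{filler}" for sub in SUBROLES[role]]
    return refined_filler + added_conjunct + specialised_role


refinements = refine_exists("takesPartIn", "SocialEvent")
# ['∃takesPartIn.Meeting', 'Student ⊓ ∃takesPartIn.SocialEvent', '∃leads.SocialEvent']
```

The conjunct is written first only to match the slide's example; since ⊓ is commutative this is the same concept as $\exists r.D \sqcap E$ in the definition.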

Properties of ρ

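$\rho_\downarrow$ is complete

$\rho_\downarrow$ is infinite, e.g. there are infinitely many refinement steps of the form:

$\top \rightsquigarrow_{\rho_\downarrow} C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots$

$\rho^{cl}_\downarrow$ is proper

$\rho_\downarrow$ is redundant: two different chains lead from $\forall r_1.A_1 \sqcup \forall r_2.A_1$ to the same concept:

$$
\begin{aligned}
\forall r_1.A_1 \sqcup \forall r_2.A_1 &\rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2) \\
\forall r_1.A_1 \sqcup \forall r_2.A_1 &\rightsquigarrow_{\rho_\downarrow} \forall r_1.A_1 \sqcup \forall r_2.(A_1 \sqcap A_2) \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)
\end{aligned}
$$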

“DL-Learner: Learning Concepts in Description Logics”, Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
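
The redundancy example can be checked mechanically. A minimal sketch, restricted to the single kind of step used in the example (conjoining $A_2$ into one filler), counts how many distinct chains reach each concept; a count above one is exactly redundancy as defined earlier. Encoding $\forall r_1.F_1 \sqcup \forall r_2.F_2$ as a pair of frozensets is an assumption of this sketch.

```python
from collections import Counter

# ∀r1.F1 ⊔ ∀r2.F2 is modelled as the pair of fillers (F1, F2), each a set of atomic
# classes read as their conjunction; START is ∀r1.A1 ⊔ ∀r2.A1.
START = (frozenset({"A1"}), frozenset({"A1"}))
TARGET = (frozenset({"A1", "A2"}), frozenset({"A1", "A2"}))  # ∀r1.(A1 ⊓ A2) ⊔ ∀r2.(A1 ⊓ A2)


def refine(concept):
    """One step: conjoin A2 into the filler of one disjunct that does not contain it yet."""
    for i, filler in enumerate(concept):
        if "A2" not in filler:
            parts = list(concept)
            parts[i] = filler | {"A2"}
            yield tuple(parts)


def count_chains(start, depth):
    """Number of distinct refinement chains of exactly `depth` steps ending in each concept."""
    counts = Counter({start: 1})
    for _ in range(depth):
        following = Counter()
        for concept, n in counts.items():
            for child in refine(concept):
                following[child] += n
        counts = following
    return counts


print(count_chains(START, 2)[TARGET])  # 2, i.e. the operator is redundant
```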

Learning using Refinement Operators

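[Figure: search tree annotated with heuristic scores and expansion values in brackets: $\top$ 0.47 [0], 0.45 [1]; Car (too weak); Person 0.79 [0], 0.78 [1], 0.77 [2], 0.75 [3], 0.74 [4], 0.73 [5]; $\mathit{Person} \sqcap \exists \mathit{attends}.\top$ 0.79 [4], 0.78 [5]; $\mathit{Person} \sqcap \exists \mathit{attends}.\mathit{Talk}$ 0.97 [4]; further branches elided]

Redundancy elimination technique with polynomial complexity w.r.t. search tree size
Length of children limited by expansion value
Infinite $\rho$ applicable
Horizontal expansion (he) used by heuristic (bias towards short concepts, Occam's razor)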
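
These bullets explain why an infinite operator is still usable: at any time only refinements up to a length bound are computed, and the bound grows with a node's expansion value. A minimal sketch for the $\top \rightsquigarrow C_1 \sqcup C_2 \sqcup \dots$ case, assuming a hypothetical set of three atoms and letting max_length stand in for the length the expansion value allows. Duplicates are removed via a sorted normal form, a simple stand-in for (not a reproduction of) the redundancy elimination on the slide. Concept length counts atoms and ⊔ symbols.

```python
from itertools import product

ATOMS = ["Car", "Person", "Talk"]  # hypothetical set M of atomic refinements of ⊤


def disjunction_length(n: int) -> int:
    """Length of C1 ⊔ ... ⊔ Cn with atomic Ci: n atoms plus n - 1 ⊔ symbols."""
    return 2 * n - 1


def refinements_of_top(max_length: int) -> list:
    """Finite, duplicate-free slice of the infinite set {C1 ⊔ ... ⊔ Cn | Ci ∈ M}."""
    seen, result = set(), []
    n = 1
    while disjunction_length(n) <= max_length:
        for combo in product(ATOMS, repeat=n):
            normal_form = tuple(sorted(set(combo)))  # ⊔ is commutative and idempotent
            if normal_form not in seen:
                seen.add(normal_form)
                result.append(" ⊔ ".join(normal_form))
        n += 1
    return result


print(len(refinements_of_top(3)))  # 6
```

Raising max_length from 3 to 5 adds only Car ⊔ Person ⊔ Talk, because longer tuples collapse onto normal forms that were already seen.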

Scalability

Refinement operator should build coherent concepts

Inference:
Complete & sound vs. approximation
Open World Assumption (OWA) vs. Closed World Assumption (CWA)

Stochastic coverage computation:
Pick random example → perform instance check → compute confidence interval (e.g. via Wald method) w.r.t. objective function (e.g. F-measure)
Up to 99% fewer instance checks in test examples
Low influence on accuracy shown for 380 learning tasks using 7 ontologies (0.2% ± 0.4% F-measure difference)
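
The sampling idea can be sketched as follows, under simplifying assumptions: the quantity estimated is just the fraction of examples a candidate covers (rather than a full F-measure), instance_check is any caller-supplied function, and tolerance and min_samples are illustrative parameters, not DL-Learner settings.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def wald_half_width(successes: int, n: int, z: float = 1.96) -> float:
    """Half-width of the Wald interval p ± z·sqrt(p(1 - p)/n); z = 1.96 is about 95%."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


def estimate_coverage(
    examples: Sequence[str],
    instance_check: Callable[[str], bool],
    tolerance: float = 0.05,
    min_samples: int = 30,
    seed: int = 0,
) -> Tuple[float, int]:
    """Estimate the covered fraction of `examples` from randomly ordered instance checks.

    Returns (estimate, number of instance checks performed) and stops early once the
    Wald half-width falls below `tolerance`.
    """
    if not examples:
        raise ValueError("need at least one example")
    order = list(examples)
    random.Random(seed).shuffle(order)  # random sampling without replacement
    hits = 0
    for n, example in enumerate(order, start=1):
        hits += instance_check(example)
        if n >= min_samples and wald_half_width(hits, n) < tolerance:
            return hits / n, n
    return hits / len(order), len(order)
```

Because the Wald interval collapses to zero width when every sampled check succeeds (or every one fails), min_samples keeps the procedure from stopping after only a handful of checks. With a tolerance of 0.05 the stopping rule is always met by the 385th check, whatever the data.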

Fragment extraction for application on large knowledge bases

Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011

Carcinogenesis

Goal: predict whether a substance causes cancer

Why:
Each year 1000 new substances are developed
Substances can often only be validated using time-consuming and expensive experiments with mice → prioritise those with high risk

Background knowledge: database of the US National Toxicology Program (NTP)

“Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)

Knowledge Base Enrichment

Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International Semantic Web Conference (ISWC) 2013

Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012

Protégé Plugin

Support for ontology creation and maintenance

Ontology Debugging: ORE

ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International Semantic Web Conference (ISWC) 2010

Data Quality Measurement: RDFUnit

Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW), ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali J. Zaveri

Robot Scientists Adam & Eve

Abduction to form hypotheses and ≈ 1000 experiments per day
12 new scientific discoveries regarding functions of genes in yeast

King, Ross D et al. "The automation of science." Science 324 (2009): 85-89.

Link Discovery - Motivation

Links are the backbone of the traditional WWW and the Data Web
Links are central for data integration, deduplication, cross-ontology question answering, reasoning, federated queries . . .
Central problem for many large IT companies

Automated tools (LIMES, SILK) can create a high number of links between RDF resources by using heuristics

Link Discovery - Definition

Definition (Link Discovery): Given sets $S$ and $T$ of resources and a relation $R$ (often owl:sameAs). Common approach: find $M = \{(s, t) \in S \times T : \delta(s, t) \le \theta\}$

S: DBpedia, rdfs:label: "African Elephant"

T: BBC Wildlife, dc:title: "African Bush Elephant"

dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?

δ = levenshtein(S.rdfs:label, T.dc:title)

δ(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
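
A minimal sketch of this common approach with the example's distance: a textbook dynamic-programming Levenshtein distance and a naive all-pairs comparison of labels. The dictionaries S and T hold only the two labels from the example; real tools such as LIMES and SILK avoid comparing every pair, which this sketch does not attempt.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def discover_links(source: dict, target: dict, theta: int) -> set:
    """Naive M = {(s, t) in S x T : δ(s, t) <= θ}, comparing the labels of all pairs."""
    return {
        (s, t)
        for s, s_label in source.items()
        for t, t_label in target.items()
        if levenshtein(s_label, t_label) <= theta
    }


S = {"dbpedia:AfricanElephant": "African Elephant"}
T = {"bbc:hfzw82929": "African Bush Elephant"}
print(levenshtein("African Elephant", "African Bush Elephant"))  # 5
print(discover_links(S, T, theta=5))  # {('dbpedia:AfricanElephant', 'bbc:hfzw82929')}
```

With θ = 4 the same pair would not be linked, which shows how directly the threshold controls the number of links.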

Example: Link Specification

$f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5) \sqcup f(\mathrm{trigrams}(\text{:name}, \text{:label}), 0.5)$
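
The trigrams measure in this specification compares strings by their character trigrams. The talk does not define it, so this sketch assumes one common variant, the Dice coefficient over sets of lower-cased trigrams without padding; the trigram measure actually used in LIMES may be defined differently. Under this reading, the atomic specification f(trigrams(:name, :label), 0.5) accepts a pair whose similarity is at least 0.5, and the ⊔ accepts a pair if either atomic specification does.

```python
def trigrams(text: str) -> set:
    """Set of overlapping character trigrams of the lower-cased text (no padding)."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over the two trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:  # both strings shorter than three characters
        return float(a.lower() == b.lower())
    return 2 * len(ta & tb) / (len(ta) + len(tb))


print(trigram_similarity("abcd", "abce"))  # 0.5
```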

Link Specification Syntax and Semantics

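Semantics $[\![LS]\!]$ of a link specification $LS$:

$$
\begin{aligned}
[\![f(m, \theta, M)]\!] &= \{(s, t, r) \mid (s, t, r) \in M \wedge m(s, t) \ge \theta\} \\
[\![LS_1 \sqcap LS_2]\!] &= \{(s, t, r) \mid (s, t, r_1) \in [\![LS_1]\!] \wedge (s, t, r_2) \in [\![LS_2]\!] \wedge r = \min(r_1, r_2)\} \\
[\![LS_1 \sqcup LS_2]\!] &= \left\{(s, t, r) \;\middle|\; \begin{array}{ll}
r = r_1 & \text{if } \exists (s, t, r_1) \in [\![LS_1]\!] \wedge \neg(\exists r_2 : (s, t, r_2) \in [\![LS_2]\!]), \\
r = r_2 & \text{if } \exists (s, t, r_2) \in [\![LS_2]\!] \wedge \neg(\exists r_1 : (s, t, r_1) \in [\![LS_1]\!]), \\
r = \max(r_1, r_2) & \text{if } (s, t, r_1) \in [\![LS_1]\!] \wedge (s, t, r_2) \in [\![LS_2]\!]
\end{array}\right\}
\end{aligned}
$$

Syntax and semantics allow defining an ordering similar to subsumption (more specific specs generate fewer links)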
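
The three clauses translate almost directly into code. A minimal sketch, assuming a mapping is stored as a dictionary from (s, t) pairs to their value r, and using invented similarity values for three hypothetical pairs; each input mapping M simply carries those similarities as its r values.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]
Links = Dict[Pair, float]  # (s, t) -> r


def atomic(m: Callable[[str, str], float], theta: float, M: Links) -> Links:
    """[[f(m, θ, M)]]: the links (s, t, r) of M with m(s, t) >= θ."""
    return {(s, t): r for (s, t), r in M.items() if m(s, t) >= theta}


def conj(l1: Links, l2: Links) -> Links:
    """[[LS1 ⊓ LS2]]: pairs found by both specifications, with r = min(r1, r2)."""
    return {pair: min(r, l2[pair]) for pair, r in l1.items() if pair in l2}


def disj(l1: Links, l2: Links) -> Links:
    """[[LS1 ⊔ LS2]]: pairs found by either specification, with r = max(r1, r2) on overlap."""
    out = dict(l1)
    for pair, r in l2.items():
        out[pair] = max(r, out[pair]) if pair in out else r
    return out


# Hypothetical similarity values for three candidate pairs.
sim_a = {("s1", "t1"): 0.9, ("s2", "t2"): 0.4, ("s3", "t3"): 0.6}
sim_b = {("s1", "t1"): 0.7, ("s2", "t2"): 0.8, ("s3", "t3"): 0.2}
links_a = atomic(lambda s, t: sim_a[(s, t)], 0.5, sim_a)  # s1-t1 and s3-t3
links_b = atomic(lambda s, t: sim_b[(s, t)], 0.5, sim_b)  # s1-t1 and s2-t2
both = conj(links_a, links_b)    # {('s1', 't1'): 0.7}
either = disj(links_a, links_b)  # all three pairs, s1-t1 with r = 0.9
```

The conjunction returns a subset of the pairs returned by the disjunction, in line with the ordering in which more specific specifications generate fewer links.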

Link Specification Refinement Operator

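$$
\rho_\downarrow(LS) = \begin{cases}
\{f(m_1, 1, \Delta) \sqcap \dots \sqcap f(m_n, 1, \Delta) \mid m_i \in SM, 1 \le i \le n, n \le 2|SM|\} & \text{if } LS = \bot \\
\{f(m, dt(\theta), M)\} \cup \{LS \sqcup f(m', 1, M) \mid m' \in SM, m' \ne m\} & \text{if } LS = f(m, \theta, M) \text{ (atomic)} \\
\{LS_1 \sqcap \dots \sqcap LS_{i-1} \sqcap LS' \sqcap LS_{i+1} \sqcap \dots \sqcap LS_n \mid LS' \in \rho_\downarrow(LS_i)\} & \text{if } LS = LS_1 \sqcap \dots \sqcap LS_n\ (n \ge 2) \\
\{LS_1 \sqcup \dots \sqcup LS_{i-1} \sqcup LS' \sqcup LS_{i+1} \sqcup \dots \sqcup LS_n \mid LS' \in \rho_\downarrow(LS_i)\} \cup \{LS \sqcup f(m, 1, M) \mid m \in SM, m \text{ not used in } LS\} & \text{if } LS = LS_1 \sqcup \dots \sqcup LS_n\ (n \ge 2)
\end{cases}
$$

Upward refinement operator
Positive: weakly complete, finite
Negative: not complete, redundant, not proper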

Refinement Chain Example

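$f(\mathrm{edit}(\text{:socId}, \text{:socId}), 1.0)$

$\rightsquigarrow f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5)$

$\rightsquigarrow f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5) \sqcup f(\mathrm{trigrams}(\text{:name}, \text{:label}), 1.0)$

$\rightsquigarrow f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5) \sqcup f(\mathrm{trigrams}(\text{:name}, \text{:label}), 0.5)$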
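
The chain follows from the atomic and disjunction cases of the operator. A minimal sketch that checks this, under stated assumptions: dt is a hypothetical threshold decrease of 0.5 (the talk does not specify dt), the mapping argument M is omitted, only flat disjunctions of atomic specifications are handled, and the ⊥ and conjunction cases are left out.

```python
SM = ["edit(:socId, :socId)", "trigrams(:name, :label)"]  # similarity measures from the example


def dt(theta: float) -> float:
    """Hypothetical threshold decrease: lower θ by 0.5, never below 0."""
    return max(0.0, theta - 0.5)


def refine(ls: tuple) -> list:
    """Upward refinements of ("f", m, θ) atoms and flat ("or", (atom, ...)) disjunctions."""
    if ls[0] == "f":
        _, m, theta = ls
        return [("f", m, dt(theta))] + [("or", (ls, ("f", m2, 1.0))) for m2 in SM if m2 != m]
    if ls[0] == "or":
        parts = ls[1]
        out = []
        for i, part in enumerate(parts):
            for new in refine(part):
                if new[0] == "f":  # only threshold changes, so the disjunction stays flat
                    out.append(("or", parts[:i] + (new,) + parts[i + 1:]))
        used = {p[1] for p in parts}
        out += [("or", parts + (("f", m, 1.0),)) for m in SM if m not in used]
        return out
    raise ValueError(f"unsupported specification: {ls!r}")


chain = [
    ("f", SM[0], 1.0),
    ("f", SM[0], 0.5),
    ("or", (("f", SM[0], 0.5), ("f", SM[1], 1.0))),
    ("or", (("f", SM[0], 0.5), ("f", SM[1], 0.5))),
]
# Every step of the example chain is produced by the operator.
assert all(nxt in refine(cur) for cur, nxt in zip(chain, chain[1:]))
```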

Projects: DL-Learner and LIMES

DL-Learner
Open-source project: http://dl-learner.org
Extensible platform for concept learning algorithms
Supports all RDF/OWL serialisations and major reasoners
Several thousand downloads

LIMES (http://aksw.org/Projects/LIMES.html)
Highly scalable engine (fastest RDF link discovery tool)
Several machine learning approaches integrated (including the one presented)

“DL-Learner: Learning Concepts in Description Logics”, Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009

Summary & Conclusions

Many interesting applications of structured machine learning (therapy response prediction, disease prediction, protein folding, data quality measurement, ontology debugging)
Still few machine learning tools for working with RDF/OWL, although more and more data is available
Refinement operators allow applying supervised machine learning on complex background knowledge
Can be applied to other languages like link specifications
