Reasoning on Data

102
Reasoning on Data: The Ontology-Mediated Query Answering Problem Marie-Laure Mugnier University of Montpellier UNILOG School – Vichy 2018

Transcript of Reasoning on Data

Page 1: Reasoning on Data

ReasoningonData:

TheOntology-MediatedQueryAnsweringProblem

Marie-Laure Mugnier

University of Montpellier

UNILOGSchool–Vichy–2018

Page 2: Reasoning on Data

KNOWLEDGEREPRESENTATIONANDREASONING(KR)

•  A field historically at the heart of Artificial Intelligence

•  Study formalisms (or languages) to

•  represent various kinds of human knowledge •  do reasoning on these representations

•  along the tradeoff expressivity / tractability of reasoning

à KR languages based on computational logic In this talk: classical first-order logic (FOL)

Major conferences: IJCAI, AAAI, KR

M.-L.Mugnier–UNILOGSchool– 2018 2

Page 3: Reasoning on Data

Part1:Basics

Knowledge bases, Ontologies Logical view of Queries and Data Main KR formalisms to represent and reason with ontologies Ontology-Mediated Query Answering

Part2:KRformalismsandalgorithmicapproaches

Part3:DecidabilityissuesintheexistenFalruleframework

OVERVIEWOFTHETUTORIAL

M.-L.Mugnier–UNILOGSchool– 2018 3

Page 4: Reasoning on Data

Knowledge Base (KB)

Reasoning Services

•  General knowledge on the application domain « Cats are Mammals »

•  Factual Knowledge Description of specific individuals, situations, ...

Félix is a Cat

Factbase, Database

Fundamental tasks

•  Checking the consistency of the KB

•  Computing answers to a query over the KB

...

KNOWLEDGEBASEDSYSTEMS

Ontology

Knowledge expressed in a KR language

Reasoning algorithms associated with the KR language

M.-L.Mugnier–UNILOGSchool– 2018 4

Page 5: Reasoning on Data

WHATISANONTOLOGY?

In computer science: a formal specification of the knowledge of a particular domain Ø  which allows for machine processing Ø  that relies on the semantics of knowledge

> automated reasoning

Such a specification consists of Ø  a vocabulary in terms of concepts and relations Ø  semantic relationships between these elements

M.-L.Mugnier–UNILOGSchool– 2018 5

Page 6: Reasoning on Data

EXAMPLESOFONTOLOGIES

¢  Medecine and life sciences : hundreds of available ontologies

�  general medical ontologies

SNOMED CT (400 000 terms) GALEN (> 30 000 terms)

�  specialized medical ontologies FMA (anatomy) NCI (cancer), ...

�  biology �  agronomy

¢  Information systems of large organizations and corporations

M.-L.Mugnier–UNILOGSchool– 2018 6

Page 7: Reasoning on Data

ATTHECOREOFONTOLOGIES:CONCEPTS/CLASSES

¢  Concept/class:acategoryofenSSes(objects)thatshareproperSes InFOL:unarypredicate:Cat,Mammal

¢  Instanceofclass:aspecificmemberofthisclass

¢  FundamentalrelaSononclasses:specializaFon(subsumpFon)(«isakindof»,«subclassof»)

SemanScs:everyinstanceofC1isalsoinstanceofC2

C1

C2

C1 subclass of C2

�x (C1(x) à C2(x)) �x (Cat(x) à Mammal(x))

In FOL: a term (variable or constant)

M.-L.Mugnier–UNILOGSchool– 2018 7

Page 8: Reasoning on Data

ATTHEHEARTOFONTOLOGIES:CONCEPTS/CLASSES

Disease

LungDisease

Pneumonia

InfecSousPneumonia

InfecSousDisease

BacterialDisease

ViralDisease

BacterialPneumonia

Legionella

InfecSousAgent

Bacteria Virus

OrganismDisorder

However, an ontology is not just a classification!

Concepts organized by specialization

(SNOMEDCT)

M.-L.Mugnier–UNILOGSchool– 2018 8

Page 9: Reasoning on Data

ONTOLOGIESAREMUCHMORETHANCLASSIFICATIONS

Vocabulary 1.  concepts / classes

2.  relations (between instances)

« An ontology specifies the vocabulary of an application domain and semantic relationships between the terms of the vocabulary»

+ semantic relationships between concepts

+ semantic relationships between relations

+ properties of relations

+ other axioms that more generally express domain knowledge

+ properties of concepts

M.-L.Mugnier–UNILOGSchool– 2018 9

Page 10: Reasoning on Data

RELATIONSBETWEENINSTANCES

Often these are binary relations (also called « roles » or « properties »)

subrelation of

SignatureofarelaSon:assignsamaximumconcepttoeachargument(«domain»and«range»inOWL)

�x�y (hasCA(x,y) à dueTo(x,y))

dueTo

hasCausaFveAgent

�x�y (hasCA(x,y) à Disease(x) � Organism(y))

Argument1:aDisorderArgument2:anOrganismor[...]

Argument1:aDiseaseArgument2:anOrganism

M.-L.Mugnier–UNILOGSchool– 2018 10

Page 11: Reasoning on Data

EXAMPLESOFOTHERFREQUENTTYPESOFAXIOMS

•  Necessaryand/orsufficientproperSesofconcepts (ex: BacterialDisease)

•  Properties of relations

•  NegaSveconstraints(disjointnessbetweenconcepts,relaSons,...)

Bacteria∩Virus=� �x(Bacteria(x)�Virus(x)à )�x(Bacteria(x)à¬Virus(x))

�x (BacterialDisease(x) → �y (Bacteria(y) � hasCausativeAgent(x,y))

A bacterial disease is caused by a bacteria

inverse relations: �x�y (hasPart(x,y) ⟷ isPartOf(y,x))

symmetry, transitivity, ...

functional relation: �x�y�z (isPartOf(x,y) � isPartOf(x,z) → y = z)

M.-L.Mugnier–UNILOGSchool– 2018 11

Page 12: Reasoning on Data

WHATKINDSOFLANGUAGESTOEXPRESSONTOLOGIES?

HierarchiesofclassesHierarchiesofbinaryrelaSons(called«properSes»)SignaturesoftheserelaSons(«domain»and«range»)

àOWLDLfragmentofRDFSchema(SemanScWeb)

DescripFonLogicsRule-basedlanguages Datalog,existenSalrules,

RDFdeducSverules,AnswerSetProgramming...

Very light languages

More expressive fragments of first-order logics

From a logical viewpoint: anontologyiscomposedofafinitesetofpredicates(toexpressconceptsandrelaSons)afinitesetof(closed)formulasoverthesepredicates oftheform�X(condiFon[X]àconclusion[X])

M.-L.Mugnier–UNILOGSchool– 2018 12

Page 13: Reasoning on Data

WHATAREONTOLOGIESGOODFOR?

•  provideacommonvocabulary

àitiseasiertoshareinformaSon(typicallybetweenexpertsofseveraldomains)

•  constrainthemeaningofterms

àforcestoexplicitnot-saidthingsandtoremoveambiguiSeshencelessmisunderstandings

•  todoautomatedreasoning,basisofhigh-levelservices

à findimplicitlinksbetweenpiecesofknowledgeà checktheconsistencyoftheKB,finderrorsinmodelingà enrichdataqueryanswering

M.-L.Mugnier–UNILOGSchool– 2018 13

Page 14: Reasoning on Data

«findallpaSentsaffectedbyalungdiseaseduetoabacteria»

Data

Query (SQL, SPARQL, MongoDB ...)

ONTOLOGY-MEDIATEDQUERYANSWERING(EX:MEDICALRECORDS)

Database(relaSonal,RDF,NoSQL,...)

??

PaSentP:Diagnosis=«legionella»

M.-L.Mugnier–UNILOGSchool– 2018 14

Page 15: Reasoning on Data

DonnéesData

Query

Knowledge Base

AlegionellaisbacterialpneumoniaAbacterialpneumoniaisapneumoniaApneumoniaisalungdiseaseAbacterialpneumoniaiscausedbyabacteriaIfxiscausedbyythenxisduetoyIfthediagnosisofapaSentxcontainsadiseaseythenxisaffectedbyy

ONTOLOGY-MEDIATEDQUERYANSWERING

«findallpaSentsaffectedbyalungdiseaseduetoabacteria»

PaSentP:Diagnosis=«legionella»

Ontology

M.-L.Mugnier–UNILOGSchool– 2018 15

Page 16: Reasoning on Data

Knowledgebase

Query

Ontology

Addinganontologicallayerontopofdata

1-Enrichthevocabularyallowingtoabstractfromaspecificdatastorage

2-Infernewfacts,notexplicitelystored,allowingforincompletedatarepresentaSon

Data

ONTOLOGY-MEDIATEDQUERYANSWERING(OMQA)

M.-L.Mugnier–UNILOGSchool– 2018 16

Page 17: Reasoning on Data

Query

Ontology

Data

3–provideaunifiedviewofmulSplesources

Data Data

ONTOLOGY-MEDIATEDQUERYANSWERING(OMQA)

M.-L.Mugnier–UNILOGSchool– 2018 17

Page 18: Reasoning on Data

OMQAEXAMPLE:ONTOLOGICALKNOWLEDGE

AlegionellaisbacterialpneumoniaAbacterialpneumoniaisapneumoniaApneumoniaisalungdiseaseAbacterialpneumoniaiscausedbyabacteriaIfxiscausedbyythenxisduetoyIfthediagnosisofapaSentxcontainsadiseaseythenxisaffectedbyy

�x(Legionella(x)→BacterialPneumonia(x))

�x(BacterialPneumonia(x)→�y(hasCausaSveAgent(x,y)�Bacteria(y)))

�x�y(hasCausaSveAgent(x,y)→dueTo(x,y))

�x�y((Diagnosis(x,y)�Disease(y))→isAffectedBy(x,y))

M.-L.Mugnier–UNILOGSchool– 2018 18

Page 19: Reasoning on Data

FACTBASE

RelaSonalschema: finitesetRofrelaSonsàpredicates infinitedomainofvaluesàconstants

Factbase:usuallyasetofgroundatoms(ontheontologicalvocabulary)seenastheconjuncSonoftheseatoms

F={PaSent(P),Diagnosis(P,M),Legionella(M)}

ArelaFonaldatabasemaynaturallybeviewedasafactbase

InstanceofarelaSonr: finitesetoftuplesonràatomsonr

rakr1 akr2a1a2a1

a2a3a1

«Thediagnosisforthepa4entPislegionella»

Databaseinstance={instanceforeachrinR} àfactbase

{r(a1,a2),r(a2,a3),r(a1,a1)}

M.-L.Mugnier–UNILOGSchool– 2018 19

Page 20: Reasoning on Data

CONJUNCTIVEQUERIES(CQ)

« find all patients affected by a lung disease due to a bacteria»q(x) = ∃y ∃z (Patient(x) � isAffBy(x,y) � LungDisease(y) � dueTo(y,z) � Bacteria(z)) A CQ is an existentially quantified conjunction of atoms The free variables are the answer variables If closed formula: Boolean CQ DatalognotaSonans(x)ßPaSent(x),isAffBy(x,y),LungDisease(y),dueTo(y,z),Bacteria(z)Select-Join-ProjectqueriesinrelaSonalalgebra(SQL)SELECT...FROM…WHERE<joincondi4ons> SPARQL(semanScwebqueries)SELECT…WHERE<basicgraphpa=ern>

M.-L.Mugnier–UNILOGSchool– 2018 20

Page 21: Reasoning on Data

ANSWERSTOACONJUNCTIVEQUERY

¢  TheanswertoaBooleanCQqinFisyesifF�qyes=()

¢  LettheCQq(x1,...,xk).Atuple(a1,…,ak)ofconstantsisananswertoqwithrespecttoafactbaseFif F�q[a1,...,ak],whereq[a1,...,ak]isobtainedfromq(x1,...,xk)byreplacingeachxibyai

¢  LetFandqbeseenassetsofatoms.AhomomorphismhfromqtoFisa

mappingfromvariables(q)toterms(F)suchthath(q)F

F�q()iffqcanbemappedbyhomomorphismtoF(a1,…,ak)isananswertoq(x1,...,xk)w.r.t.Fiff

thereisahomomorphismfromqtoFthatmapseachxitoai

M.-L.Mugnier–UNILOGSchool– 2018 21

Page 22: Reasoning on Data

KEYNOTION:HOMOMORPHISM

q(x)=∃y(movie(y)∧play(x,y))

HomomorphismhfromqtoF:subsStuSonofvar(q)byterms(F)suchthath(q)F

F

h1:xàayàm1

h2:xàayàm2

h1(q)=movie(m1)∧play(a,m1)

h2(q)=movie(m2)∧play(a,m2)

movie(m1)movie(m2)movie(m3)actor(a)actor(b)actor(c)play(a,m1)play(a,m2)play(c,m3)

Answers:obtainedbyrestricSngthedomainsofhomomorphisms toanswervariables

x = a x = c

h3:xàcyàm3 h3(q)=movie(x0)∧play(c,m3)

movie(y)play(x,y)

M.-L.Mugnier–UNILOGSchool– 2018 22

Page 23: Reasoning on Data

« find all patients affected by a l u n g d i s e a s e d u e t o a bacteria »

q(x) = ∃y∃z(PaSent(x)�isAffectedBy(x,y)�LungDisease(y)�dueTo(y,z)�Bacteria(z))

Factbase = { Patient(P), Diagnosis(P,M), Legionella(M) } « The diagnosis for the patient P is legionella »

Legionellaspecialisa4onofLungDiseaseandBacterialDisease(andDisease)henceLungDisease(M) henceBacterialDisease(M),

Disease(M)

�x(BacterianDisease(x)→�y(hasCausaSveAgent(x,y)�Bacteria(y)))hencehasCausaSveAgent(M,b)andBacteria(b)

�x�y(hasCausaSveAgent(x,y)→dueTo(x,y))hencedueTo(M,b)

�x�y((Diagnosis(x,y)�Disease(y))→isAffectedBy(x,y))henceisAffectedBy(P,M) Answer : x = P

ONTHEOMQAEXAMPLE

M.-L.Mugnier–UNILOGSchool– 2018 23

Page 24: Reasoning on Data

AMOREGENERALSCHEMA

Query

Ontology

Data Data Data

Mappingsfromdatatofacts

{Databasequery⤳Facts}

Factbase

Conceptuallevel

DescripSonoftheapplicaSondomainwithahighabstracSonlevel

Queryusingthevocabularyoftheontology

Factbase(possiblyvirtual)usingthevocabularyoftheontology

Independentandheterogeneousdatasources

Theanswerstothequeryareinferredfromtheknowledgebase

«Ontology-BasedDataAccess»[Poggietal.,JoDS,2008]

M.-L.Mugnier–UNILOGSchool– 2018 24

Page 25: Reasoning on Data

MAPPINGS

q(x):�n�sPaSent_T(x,n,s) ⤳PaSent(x)q’(x):�n�sPaSent_T(x,n,s)�DiagnosSc_T(x,y)�y=«Legionella»

⤳�z(diagnosis(x,z)�legionella(z))

Patient_T [ID_PATIENT, NAME,SSN] Diagnosis_T[ID_PATIENT, DISORDER]

PaSent/1Diagnosis/2Legionella/1

Mapping:databasequery(X)⤳conjuncFonwithfreevariablesX

PaSent(P)Diagnosis(P,M)Legionella(M)

PaSent_T Diagnosis_T

......

id ssn disidname

⤳ P....

..

..

..

..

..

..

«Leg.»....

P....

M.-L.Mugnier–UNILOGSchool– 2018 25

Page 26: Reasoning on Data

Knowledgebase

Query

Ontology

Factbase

ONTOLOGY-MEDIATEDQUERYANSWERING(OMQA)

(Boolean)conjuncSvequeryq

TheoryOinasuitableFOLfragment

Setofgroundatoms(orexistenSallyclosedformula)F

Fundamental decision problem

O, F � q ?

M.-L.Mugnier–UNILOGSchool– 2018 26

Page 27: Reasoning on Data

OVERVIEWOFTHELECTURE

Part1:Basics

Part2:KRformalismsandalgorithmicapproaches

Outline of description logics – Horn DLs Existential Rules Materialization approach (forward chaining) Query rewriting approach (related to backward chaining)

Part3:DecidabilityissuesintheexistenFalruleframework

M.-L.Mugnier–UNILOGSchool– 2018 27

Page 28: Reasoning on Data

DESCRIPTIONLOGICS

¢  AfamilyofKRlanguagesforrepresenSngandreasoningwithontologies

¢  MostlycorrespondtodecidablefragmentsofFOL

(relatedtomodalproposiSonallogic,theguardedfragmentofFOL,...)¢  Variable-freesyntax

¢  Usedtobecalled«conceptlanguages»:fromconceptandrolenames(unaryandbinarypredicates)andasetofconstructorsdefinecomplexconcepts(morerecently:complexroles)

¢  Anontologyisasetofaxiomsthatstateinclusionsbetweenconcepts (andbetweenroles)

M.-L.Mugnier–UNILOGSchool– 2018 28

Page 29: Reasoning on Data

DESCRIPTIONLOGICS:BUILDINGBLOCKS(SYNTAX)Vocabulary

Atomicconcepts: Human,Parent,Student…(unarypredicates)Atomicroles: parentOf,siblingOf,… (binarypredicates)

Complexconceptsandrolescanbebuiltusingasetofconstructors(whichdependsoneachparFcularDL)

conjuncSon(П),disjuncSon(�),negaSon(¬)HumanП¬Parent Female�Male

restrictedformsofexistenSalanduniversalquanSficaSon(�,�)

∃parentOf.(FemaleПStudent) �parentOf.Male inverseofarole(-),composiSonofroles(o) ∃parentOf- parentOfoparentOf

M.-L.Mugnier–UNILOGSchool– 2018 29

Page 30: Reasoning on Data

DESCRIPTIONLOGICS:BUILDINGBLOCKS(SEMANTICS)

ToeachconceptisassignedaFOLsentencewithfreevariablex Human Human(x)

HumanП¬Parent Human(x)�¬Parent(x)∃parentOf.(FemaleПStudent) ∃y(parentOf(x,y)�Female(y)

�Student(y))�parentOf.Female �y(parentOf(x,y)àFemale(y))ToeachroleisassignedaFOLsentencewith2freevariablesxandy parentOfoparentOf ∃z(parentOf(x,z)�parentOf(z,y))

M.-L.Mugnier–UNILOGSchool– 2018 30

Page 31: Reasoning on Data

DESCRIPTIONLOGICS:KNOWLEDGEBASE

KnowledgeBase=TBox(ontology)+ABox(factbase)Tbox:axiomsoftheformC1�C2∀x(fol(C1)àfol(C2))

orr1�r2 ∀x∀y(fol(r1)àfol(r2))

Abox:setofgroundfacts parentOf(A,B),Female(A),…

Human�Male�Female ∀x(Human(x)àMale(x)�Female(x))Adult�¬Child ∀x(Adult(x)∧Child(x)à⊥)Parent��parentOf ∀x(Parent(x)à�yparentOf(x,y))HappyFather��parentOf.Female ∀x(HP(x)à(�y(parentOf(x,y)àFemale(y))Human��parentOf-.Human ∀x(Human(x)à�y(parentOf(y,x)∧Human(y)))parentOfoparentOf�ancestorOf ∀x∀y(�z(parentOf(x,z)∧parentOf(z,y))à

ancestorOf(x,y)

M.-L.Mugnier–UNILOGSchool– 2018 31

Page 32: Reasoning on Data

DESCRIPTIONLOGICS:STANDARDREASONINGTASKSStandardreasoningtasksonaKB(T,A)

¢  ConceptsubsumpSon T�C�D?¢  ConceptsaSsfiability isCsaSsfiablew.r.t.T ?¢  KBsaSsfiability is(T,A)saSsfiable?

¢  Instancechecking (T,A)�C(b),wherebisaconstant?

ConceptsubsumpSon T�C�Diff(T, {C(a),¬D(a)})unsaSsfiableConceptsaSsfiability CsaSsfiablew.r.t.T iff(T, {C(a)})saSsfiable

Instancechecking (T,A)�C(b)iff(T,A�{¬C(b)})unsaSsfiable

AllthesetaskscanbeexpressedintermsofKB(un)saSsfiabilityprovidedthattheconstructorsintheconsideredDLallowforit

Queryansweringbeyondinstancechecking?cannotbereducedtothestandardreasoningtasks

M.-L.Mugnier–UNILOGSchool– 2018 32

Page 33: Reasoning on Data

EVOLUTIONOFDLS

¢  Concepts:

¢  TBoxaxioms:onlyconceptinclusions

SaSsfiabilityandinstancecheckinginALCare:EXPTIME-completeincombinedcomplexitycoNP-completeindatacomplexity

Evenworseifweaddinverseroles:2EXPTIME-completeincombinedcomplexity

StandardexpressiveDL ALC

M.-L.Mugnier–UNILOGSchool– 2018 33

Page 34: Reasoning on Data

TWOCOMPLEXITYMEASURESFORQUERYANSWERINGPROBLEMS

Combined complexity (usual complexity measure) The input is O, F and q

Data complexity

The input is F (O and q supposed to be fixed)

This distinction comes from database theory: the size of the query is negligible compared to the size of the data

Problem: Given a KB = (O, F), with O the ontology and F the factbase, and a query q, is q entailed by the KB?

E.g.,qBooleanCQ,FfactbaseDoesF�q?

NP-complete(combined)PTime(data)

M.-L.Mugnier–UNILOGSchool– 2018 34

Page 35: Reasoning on Data

EVOLUTIONOFDLS

¢  Concepts:

¢  TBoxaxioms:onlyconceptinclusions

SaSsfiabilityandinstancecheckinginALCare:EXPTIME-completeincombinedcomplexitycoNP-completeindatacomplexity

Evenworseifweaddinverseroles:2EXPTIME-completeincombinedcomplexity

TwofactorsledtotheevoluFonofdescripFonlogics:1.  pracScaluse(e.g.SNOMEDCT):peoplemostlyuseconjuncSonandexistenSal

quanSficaSon2.  complexitytoohighforqueryansweringproblems

StandardexpressiveDL ALC

M.-L.Mugnier–UNILOGSchool– 2018 35

Page 36: Reasoning on Data

NEWDLSWITHLOWERCOMPLEXITY

DL-LiteR EL

Commonfeature:nodisjuncSon(no«true»negaSon) ThenasaSsfiableKBhasauniquecanonicalmodelM:

ForanyBooleanCQq,KB�qiffMisamodelofqReasoningtechniquesfortheselighterDLsareverysimilartoforwardorbackwardchaininginrule-basesystems

where

where LargeTBoxesClassificaSon

LargeABoxesQueryanswering

M.-L.Mugnier–UNILOGSchool– 2018 36

Page 37: Reasoning on Data

COMPLEXITYINTRODUCEDBYDISJUNCTIONORNEGATION

q(): �x�y (Blue(x) � on(x,y) � Other(y))

CBA

T: T � Blue � Other A: Blue(A), Other(C), on(A,B), on(B,C)

To answer q, we have to consider two cases: in each model of the KB, either Blue(B) or Other(B) holds

Similarly if we replace T by: ¬Blue � Other (equivalentaxiom)

KB (T,A)

Note that Other � ¬Blue is harmless: it is just a disjointness constraint

M.-L.Mugnier–UNILOGSchool– 2018 37

Page 38: Reasoning on Data

INSUMMARY

DL ontology (TBox) has axioms of the form

∀x (fol(C1) à fol(C2))

∀x∀y (fol(r1) à fol(r2)) wherefol(r)isapathofatomicrolesortheirinverses

With the new DLs: left and right parts of the implication are both existentially quantified conjunctions of atoms

called « Horn description logics »

DLs essentially satisfy the tree model property:

if a KB is satisfiable then it has a « tree-shaped » model

M.-L.Mugnier–UNILOGSchool– 2018 38

Page 39: Reasoning on Data

WHY«HORNDLS»ONANEXAMPLE

ELAxiom

Letusskolemize(uandvresp.replacedbyf1(x)andf2(x)):weobtainasetof3Hornclauses(withskolemterms)

HencethenameHorndescripFonlogics

FOLtransla4on

prenexform

M.-L.Mugnier–UNILOGSchool– 2018 39

Page 40: Reasoning on Data

X,Y,Z:setsofvariables

∀x(actor(x)à∃zplay(x,z))

EXISTENTIALRULES

∀X∀Y(Body[X,Y]à∃ZHead[X,Z])

any positive conjunction (without functional symbols except constants)

Key point: ability to assert the existence of unknown entities

Crucial for representing ontological knowledge in open domains

See « value invention » in databases

∀x ∀y ( siblingOf(x,y) à ∃ z (parentOf(z,x) ∧ parentOf(z,y)) )

we often simplify by omitting universal quantifiers

M.-L.Mugnier–UNILOGSchool– 2018 40

Page 41: Reasoning on Data

DATA/FACTS

m1m2?x

RelaSonaldatabase RDF

rdf:type

am1am2c?x

ex:actor

AbstracFoninfirst-orderlogic(FOL)

∃x(movie(m1)∧movie(m2)∧movie(x)actor(a)∧actor(b)∧actor(c)play(a,m1)∧play(a,m2)∧play(c,x))

Etc.

ex:play Movie PlayActor

abc

ex:b

rdf:type

rdf:type

ex:movie

rdf:type ex:a ex:m1

ex:c

rdf:type

ex:m2

ex:play

WegeneralizeheretheclassicalnoSonofafactbyallowingexistenSalvariablesfact/factbase=existenFallyclosedconjuncFonofatoms

rdf:type

......

...

...

m_id a_id m_id a_id

_:x

M.-L.Mugnier–UNILOGSchool– 2018 41

Page 42: Reasoning on Data

LABELLEDHYPERGRAPH/GRAPHREPRESENTATION¢  Afactorasetoffactscanbeseenasasetofatoms

movie(m1),movie(m2),movie(x),actor(a),actor(b),actor(c),play(a,m1),play(a,m2),play(c,x)àhenceahypergraphoritsassociatedbiparFte(mulF-)graph

pvie

12

34

p(x,y,a,x),r(x,y)

•  one(labelled)nodeperterm•  one(labelled)nodeperatom(~hyperedge)•  totallyorderededges

1

2

a

r

x

y

M.-L.Mugnier–UNILOGSchool– 2018 42

Page 43: Reasoning on Data

movievie

m1 m2

a b c

movievie

actorvie

playvie

movievie

1 1 1

1 11

2 2 2

1 1 1

Ifpredicatesareatmostbinary:atomnodescanbereplacedbylabelsanddirectededges

a

m1 m2

b c

play

actor actoractor

movie movie movie

actor

actor

play

play

movie(m1),movie(m2),movie(x),actor(a),actor(b),actor(c),play(a,m1),play(a,m2),play(c,x)

M.-L.Mugnier–UNILOGSchool– 2018 43

Page 44: Reasoning on Data

GRAPHHOMOMORPHISMS(1)

•  LetG1=(V1,E1)toG2=(V2,E2)beclassicalgraphs.HomomorphismhfromG1toG2: mappingfromV1toV2s.t.

foreveryedge(u,v)inE1,(h(u),h(v))isinE2

mapsto mapsto

M.-L.Mugnier–UNILOGSchool– 2018 44

Page 45: Reasoning on Data

a

m1 m2

b c play

actor actoractor

movie movie movie

GRAPHHOMOMORPHISMS(2)

•  LetG1=(V1,E1)toG2=(V2,E2)beclassicalgraphs.HomomorphismhfromG1toG2: mappingfromV1toV2s.t.

foreveryedge(u,v)inE1,(h(u),h(v))isinE2

•  Iftherearelabels:theyhavetobe``kept’’aswell

play

movie

q F

M.-L.Mugnier–UNILOGSchool– 2018 45

Page 46: Reasoning on Data

GRAPHHOMOMORPHISMS(3)

•  LetG1=(V1,E1)toG2=(V2,E2)beclassicalgraphs.HomomorphismhfromG1toG2: mappingfromV1toV2s.t.

foreveryedge(u,v)inE1,(h(u),h(v))isinE2

•  Iftherearelabels:theyhavetobe``kept’’aswell

play 1

2

1 movie

qF

movievie

m1 m2

a b c

movievie

actorvie

playvie

movievie

1 1 1

1 11

2 2 2

1 1 1

actor

actor

play

play

M.-L.Mugnier–UNILOGSchool– 2018 46

Page 47: Reasoning on Data

∀x ∀y ( siblingOf(x,y) à ∃ z (parentOf(z,x) ∧ parentOf(z,y)) )

graph graph

P

P 2

S

1

2

2

1

1

GRAPHVIEWOFEXISTENTIALRULES

∀X∀Y(Body[X,Y]à∃ZHead[X,Z])

s

p

p

Theruleheadhas2kindsofvariables:-fronFer:sharedwiththebody-existenFal(new``blank’’nodes)

x x

y

y

zz

M.-L.Mugnier–UNILOGSchool– 2018 47

Page 48: Reasoning on Data

F = siblingOf(a,b)

R = ∀x ∀y (siblingOf(x,y) à ∃ z (parentOf(z,x) ∧ parentOf(z,y)))

F�=∃z0(siblingOf(a,b)∧parentOf(z0,a)∧parentOf(z0,b))

s

p

p

a

b

s

RisapplicabletoFifthereisahomomorphismhfrombody(R)toF

xàayàb

ApplyingRtoFw.r.t.hproducesF∪h(head(R))whereanewvariableiscreatedforeachexistenSalvariableinR

a

b

s

p

p

GENERATIONOFFRESH(UNKNOWN)INDIVIDUALS

R F

M.-L.Mugnier–UNILOGSchool– 2018 48

Page 49: Reasoning on Data

EXISTENTIALRULEFRAMEWORK(LOGICAL/GRAPHICAL)

q(x)=∃y(movie(y)∧play(x,y))

Data/Facts

« Pure » existential rules

Equality rules

Negative Constraints

Conjunctive Queries

∀x ( actor(x) à ∃ z (movie(z) ∧ play(x,z)) )

∀x ∀y ∀z ( movie(y) ∧ director(x,y) ∧ director(z,y) à x = z )

∀x ( movie(x) ∧ person(x) à ⊥ )

movie(m1) play(a,m1) play(c, x) ...

M.-L.Mugnier–UNILOGSchool– 2018 49

Page 50: Reasoning on Data

MULTIPLETHEORETICALFOUNDATIONS

Datalog(70-80s)

ConceptualGraphs[Sowa1984]

logical translation

+ « value invention »

��-rules,existenFalRules[Baget+IJCAI2009]

Datalog+/- [Cali+PODS2009]

[CheinMugnier1992,2009]

•  Samelogicalformas«Tuple-GeneraSngDependencies»(TGDs)longstudiedinrelaSonaldatabases

LightweightDescripSonLogics,e.g.OWL2tractableprofilesMoregenerally,HornDLs

+ «unrestricted cycles » on variables + unbounded arity

M.-L.Mugnier–UNILOGSchool– 2018 50

Page 51: Reasoning on Data

u  TheFOLtranslaSonofHornDLsyieldsexistenSalrulesu  ExistenSalrulesarestrictlymoreexpressive:

siblingOf(x,y)à∃z(parentOf(z,x)∧parentOf(z,y))cannotbeexpressedinmostDLsbecauseofthe«cycleonvariables»(needsrolecomposi4on: s�pop)

u  Theunboundedpredicatearityallowsformoreflexibility:àdirecttranslaSonofdatabaserelaSonsàaddingcontextualinformaSoniseasy(provenance,trust,etc.)

EXISTENTIALRULESAREMOREEXPRESSIVETHANHORN-DLS

s p

p

x

y

z

Morecomplexinterac4onsbetweenvariablescannotbeexpressedatallinDLs

Unsurprisingly,thisaddedexpressivityhasacost

M.-L.Mugnier–UNILOGSchool– 2018 51

Page 52: Reasoning on Data

¢  Fundamentaldecisionproblem

Input:K=(F,R)knowledgebase qBooleanconjuncSvequeryQuesSon:isqentailedbyK ?¢  Thisproblemisnotdecidable

f.i.[BeeriVardiICALP1981]onTGDsevenwithasinglerule[Baget&al.KR2010]à  find«decidable»classesofrules

withgoodexpressivity/tractabilitytradeoff

EXISTENTIALRULEFRAMEWORK

Data/Facts

« Pure » existential rules

Equality rules

Negative Constraints

Conjunctive Queries

M.-L.Mugnier–UNILOGSchool– 2018 52

Page 53: Reasoning on Data

atomic body

frontier-1

weakly- guarded

weakly frontier-guarded

datalog

guarded

weakly- acyclic

acyclic Graph of Rule Dependencies

wa-GRD jointly- acyclic

frontier- guarded

jointly-fg

sticky-join

w-sticky-join

sticky

w-sticky

1970s

2003 2004

Since 2008

(PARTIAL)MAPOFDECIDABLECLASSES

DL-LiteR EL

M.-L.Mugnier–UNILOGSchool– 2018 53

Page 54: Reasoning on Data

FUNDAMENTALNOTIONSFORREASONINGINFOL(�,∧)

¢  BacktotheposiFveconjuncFveexistenFalfragmentofFOL:FOL(�,∧)

¢  Allowstoexpressfactsand(Boolean)conjuncFvequeries¢  Suchformulascanbeseenassetsofatoms,labelledgraphs,relaSonal

structures,...¢  HomomorphismisafundamentalnoSoninthisfragment:

�  AninterpretaSonI isamodelofasentencefiffthereisahomomorphism

fromftoI

�  OnecandefinehomomorphismsbetweeninterpretaSons.Then:IfI1mapstoI2then,foranyf,I1modeloff�I2modeloff

�  Toaformulaf,weassignitsisomorphicmodelM(f)(akacanonicalmodel)

M.-L.Mugnier–UNILOGSchool– 2018 54

Page 55: Reasoning on Data

MODELISOMORPHICTOAFOL(�,∧)FORMULA

ToaformulafinFOL(�,∧),weassignitsisomorphicmodelM(f) alsocalledcanonicalmodel

f=�x�y�z(p(x,y)∧p(y,z)∧r(x,z,a))

M(f): D={dx,dy,dz,a} pM(f)={(dx,dy),(dy,dz)} rM(f)={(dx,dz,da)}

ThecanonicalmodelM(f)isuniversal:forallM’modeloff,M(f)mapstoM’

foranyfandginFOL(�,∧),g�fiffM(g)isamodeloffifffmapstog

M.-L.Mugnier–UNILOGSchool– 2018 55

Page 56: Reasoning on Data

ADDINGRANGE-RESTRICTED(=DATALOG)RULESTOFACTS

K = (F, R) whereRisasetofrange-restrictedrules(i.e.,var(head)var(body))

Fisafactbase(ruleswithanemptybody):groundatomsByapplyingrulesfromRstarSngfromF,auniqueresultisobtained:thesaturaFonofF(denotedbyF*)F*isfinitesincenonewvariableiscreatedF*isacore(noredundancies)

F

q

R

F*

TheniceproperSesofFOL(�,∧)arekept:

F*isauniversalmodelofK Hence:foranyCQq,K ⊨ q iff q maps to F*

M.-L.Mugnier–UNILOGSchool– 2018 56

Page 57: Reasoning on Data

KNOWLEDGEBASESWITHEXISTENTIALRULES

K = (F, R) whereRisasetofexistenFalrulesFisafactbase(ruleswithanemptybody):existenSalconjuncSonsofatoms

Mainchange:F*canbeinfinite

R=person(x)à�yhasParent(x,y)∧person(y)

∧person(y0)∧hasParent(a,y0)

∧person(y1)∧hasParent(y0,y1) Etc.

F=person(a)

butitremainsauniversalmodelhence K ⊨ q iff q mapstoF*

M.-L.Mugnier–UNILOGSchool– 2018 57

Page 58: Reasoning on Data

F

q

R

« bottom-up » « chase » (TGDs)

APPROACH1TORULES:FORWARDCHAINING/MATERIALISATION

K= (F, R)

K ⊨ q iff q maps by homomorphism to F*

Pros: materialisation offline, then online query answering is fast Cons: volume of the materialisation

needs writing access rights to the data not feasible if data is distributed among several databases not adapted if data change frequently

F*

M.-L.Mugnier–UNILOGSchool– 2018 58

Page 59: Reasoning on Data

EXAMPLE(MATERIALIZATION)

x = a y = m1 x = a y = m2 x = b y = z0 x = c y = x0

movie(m1)movie(m2)movie(x0)movieActor(a)movieActor(b)play(a,m1)play(a,m2)play(c,x0)

∀x(movieActor(x)à∃z(movie(z)∧play(x,z)))

q(x) = ∃ y (movie(y) ∧ play(x, y)) « find those who play in a movie »

SaturaFon

movie(z0) play(b,z0)

M.-L.Mugnier–UNILOGSchool– 2018 59

Page 60: Reasoning on Data

« top-down » decomposition into 2 steps [DL-Lite]

APPROACH2TORULES:BACKWARDCHAINING/QUERYREWRITING

R

Rewriting into a set of CQs, seen as a union of conjunctive queries (UCQ) and more generally into a « first-order » query (core SQL query)

F

q K= (F, R)

Query rewriting is independant from any factbase. For any F, F,R�qiffF�Q(i.e.,ifQisaUCQ:thereisqi�QwithF�qi)

Q

Pros: independent from the data Cons: rewriting done at query time, easily leads to huge and unusual queries

M.-L.Mugnier–UNILOGSchool– 2018 60

Page 61: Reasoning on Data

EXAMPLE

Rewq(x) = ∃ y (movie(y) ∧ play(x, y)) � movieActor(x) Query rewriting

x = a y = m1 x = a y = m2 x = c y = x0

x = a x = b

movie(m1)movie(m2)movie(x0)movieActor(a)movieActor(b)play(a,m1)play(a,m2)play(c,x0)

∀x(movieActor(x)à∃z(movie(z)∧play(x,z)))

q(x) = ∃ y (movie(y) ∧ play(x, y)) « find those who play in a movie »

M.-L.Mugnier–UNILOGSchool– 2018 61

Page 62: Reasoning on Data

BACKWARD CHAINING SCHEME!

UnificaSonbyaunifieru(ofq’andh’)

Body Headq’

QueryrewriSng

Body

Basicstep:Queryq RuleR

Newquery

h’

Direct rewriting of q with R and u = u(q \ q’) � u(body(R))

M.-L.Mugnier–UNILOGSchool– 2018 62

Page 63: Reasoning on Data

BASICPROPERTIES(1)

LetF2beobtainedfromF1bytheapplicaSonofRuleRLetaqueryQ1thatmapstoF2byahomomorphismthatusesatleastoneatom

broughtbyRThenthereisQ2,adirectrewriSngofQ1withR,suchthatQ2mapstoF1

F1 F2application of R

Q1

h1

Q2

h2

direct rewriting with R

and h1usesF2\F1

The reciprocal property holds

M.-L.Mugnier–UNILOGSchool– 2018 63

Page 64: Reasoning on Data

BASICPROPERTIES(2)

F1 F2application of R

Q1

h1

Q2

h2

direct rewriting with R

LetQ2beadirectrewriSngofQ1withRuleRLetF1beafactbasesuchthatQ2mapstoF2ThenthereisanapplicaSonofRtoF1thatproducesF2suchthatQ2mapstoF1

M.-L.Mugnier–UNILOGSchool– 2018 64

Page 65: Reasoning on Data

ForanyconjuncSvequeryq,foranyfactbaseF,foranysetofrules:thereisahomomorphismfromqtoF’,whereF’isobtainedfromFbyaruleapplicaSonsequenceoflength≤niffthereisahomomorphismfromq’toF,whereq’isobtainedfromqbyarewriSngsequenceoflength≤n

EQUIVALENCEDERIVATION/REWRITINGSEQUENCES

F1 F2

Q1

h1

Q2

h2

F1 F2

Q1

h1

Q2

h2

s.t. h1 usesF2\F1

M.-L.Mugnier–UNILOGSchool– 2018 65

Page 66: Reasoning on Data

TAKINGINTOACCOUNTEXISTENTIALVARIABLESINRULEHEADS(1)

¢  WewantacompletesetofsoundrewriSngs(setofCQs):qis.t.foranyF,ifF⊨qithenF,R⊨q

R=person(x)à�yhasParent(x,y)q=hasParent(v,w),denSst(w)u={x�v,y�w}rew(q,R,u)=qi=person(v),denSst(w)

qiisunsound:F=person(Maria),denSst(Giorgos)F⊨qihowever(F,{R})doesnotentailq

(1)IfwinqisunifiedwithanexistenSalvariableofR,thenallatomsinwhichwoccurmustbepartoftheunificaSon

M.-L.Mugnier–UNILOGSchool– 2018 66

Page 67: Reasoning on Data

TAKINGINTOACCOUNTEXISTENTIALVARIABLESINRULEHEADS(2)

R=p(x)à�z1�z2r(x,z1),r(x,z2),s(z1,z2)q=r(v,w),s(w,w)u={x�v,z1�w,z2�w}rew(q,R,u)=qi=p(v)(2)AnexistenSalvariableofRcannotbeunifiedwithanotherterminhead(R)

qiisunsound:F=p(a)F⊨ qi however(F,{R})doesnotentailq

M.-L.Mugnier–UNILOGSchool– 2018 67

Page 68: Reasoning on Data

PIECE-UNIFIER(FORBOOLEANCQS)

Apiece-unifieruofq’�qandh’head(R)isasubsStuSonofvar(q’+h’)byterms(q’+h’)[ifxisunchanged,wewriteu(x)=x]suchthat:¢  u(q’)=u(h’)

¢  existenSalvariablesofh’areunifiedonlywithvariablesofq’thatdonotoccur in(q\q’)(i.e.,ifxisexistenSalandu(x)=u(t),thentisavariableofq’andnotof(q\q’))

Queryq RuleR

q’ body head

h’

variablessharedbyq’and(q\q’)

ToextendthenoSontogeneralCQs:universalvariablescannotbeunifiedwithanswervariables

M.-L.Mugnier–UNILOGSchool– 2018 68

Page 69: Reasoning on Data

EXAMPLE

R=twin(x,y)à�zmotherOf(z,x)∧motherOf(z,y)

q=motherOf(v,w)∧motherOf(v,t)∧Female(w)∧Male(t)?

R=twin(x,y)à�zmotherOf(z,x)∧motherOf(z,y)

q=motherOf(v,w)∧motherOf(v,t)∧Female(w)∧Male(t)?

piece-unifieru1 = {z � v,x� w, y � w}

rewrite(q,R,u1)=motherOf(v,t) ∧Female(w)∧Male(w) ∧ twin(w,w)

R=twin(x,y)à�zmotherOf(z,x)∧motherOf(z,y)

q=motherOf(v,w)∧motherOf(v,t)∧Female(w)∧Male(t)?

piece-unifieru2 = {z � v,x � w,y � t}

rewrite(q,R,u2)=twin(w,t)∧Female(w)∧Male(t)

Ifwerewriteagainthisquerywecouldremovethefirstatom

M.-L.Mugnier–UNILOGSchool– 2018 69

Page 70: Reasoning on Data

WHATIFWESKOLEMIZEDRULES?

qiisunsound:F=person(Maria),denSst(Giorgos)F⊨qihowever(F,{R})doesnotentailq

R=person(x)à�yhasParent(x,y)q=hasParent(v,w),denSst(w)u = { x � v, y � w } rew(q,R,u)=qi=person(v),denSst(w)

Skolem(R)=person(x)àhasParent(x,f(x))

ClassicalmostgeneralunifierofhasParent(x,f(x))andhasParent(v,w):v�xandw�f(x)rew(q,R,u)=denSst(f(x))�person(x) whichcannotbeunifiedwitharulehead

(wouldnotbekeptintheouputsinceitcontainsaskolemfuncSon

Wecouldskolemizetherulesandrelyonusualm.g.u.thenkeeponlyrewriSngswithoutskolemfuncSon

butthiswouldcreateuselessintermediaterewriSngs

M.-L.Mugnier–UNILOGSchool– 2018 70

Page 71: Reasoning on Data

WHY«PIECES»?

Apieceisaunitofknowledgebroughtbyarule:¢  FronFervariables(andconstants)actascutpointstodecomposeruleheads

intopieces(«minimalnon-emptysubsetsgluedbyexistenSalvariables»)

R=b(x)à�y�zp(x,y)∧p(y,z)∧p(z,x)∧q(x,x) x y

z

¢  Arulewithkpiecescanbedecomposedintokrules,oneforeachpiece,whilekeepingthesamebody

¢  Itcannotbefurtherdecomposed(exceptbyintroducingnewpredicates)

b(x)à�y�z p(x,y)∧p(y,z)∧p(z,x)

b(x)àq(x,x)

M.-L.Mugnier–UNILOGSchool– 2018 71

Page 72: Reasoning on Data

DECOMPOSITIONOFRULESINTOATOMICHEADRULES(1)

R:b(x)à�y�z p(x,y)∧p(y,z)∧p(z,x) rule with single-piece head

Decomposition into rules with atomic head by introducing a fresh predicate

R0:b(x)à�y�z pR(x,y,z)R1:pR(x,y,z)àp(x,y)R2:pR(x,y,z)àp(y,z)R3:pR(x,y,z)àp(z,x)

We lose the structure of the head •  much less efficient query rewriting

•  may even lead to lose the property of having a finite universal model

(if the set of rules has this property)

M.-L.Mugnier–UNILOGSchool– 2018 72

Page 73: Reasoning on Data

DECOMPOSITIONOFRULESINTOATOMICHEADRULES(2)

R:p(x,y)→�zp(y,z),p(z,y)

F:p(a,b)

a b z0 ...F1

F2

z1

z2

F2≡ F1 (F2 mapstoF1 ) henceF*≡ F1

Finiteuniversalmodel

AÅerdecomposiSonintoatomicheadrules:

R0:p(x,y)→�zpR(y,z)R1:pR(y,z)àp(y,z)R2:pR(y,z)àp(z,y)

a b z0 ...F1

F2

z1

z2

F2F1

Nofiniteuniversalmodel

M.-L.Mugnier–UNILOGSchool– 2018 73

Page 74: Reasoning on Data

OVERVIEWOFTHELECTURE

Part1:Basics

Part2:KRformalismsandalgorithmicapproaches

Part3:DecidabilityissuesintheexistenFalruleframework

UndecidabilityofthefundamentalproblemGenericproperSesthatensuredecidability

Main«concrete»decidableclassesofexistenSalrules

M.-L.Mugnier–UNILOGSchool– 2018 74

Page 75: Reasoning on Data

However,here:queryrewriSngwithRisfiniteforanyq

SATURATIONMAYNOTHALT

R=person(x)àhasParent(x,y)∧person(y)

∧person(y0)∧hasParent(a,y0)

∧person(y1)∧hasParent(y0,y1)

F=person(a)

NoredundanciesareaddedTheKBhasnofiniteuniversalmodel

M.-L.Mugnier–UNILOGSchool– 2018 75

Page 76: Reasoning on Data

R=friend(u,v)∧friend(v,w)àfriend(u,w)

q=friend(Giorgos,Maria)

q1=friend(Giorgos,v0)∧friend(v0,Maria)

q2=friend(Giorgos,v1)∧friend(v1,v0)∧friend(v0,Maria)

q2’=friend(Giorgos,v0)∧friend(v0,v1)∧friend(v1,Maria)

q2andq2’areequivalent

Etc.q3=friend(Giorgos,v2)∧friend(v2,v1)∧friend(v1,v0)∧friend(v1,Maria)

QUERYREWRITINGMAYNOTHALT

However,here:saturaSonwithRisfiniteforanyF

There are cases where both processes do not halt (even if the factbase is known)

Thereisaninfinitenumberofnon-redundantrewriSngs

M.-L.Mugnier–UNILOGSchool– 2018 76

Page 77: Reasoning on Data

UNDECIDABILITYOFTHEFUNDAMENTALPROBLEM

FundamentaldecisionproblemInput:K=(F,R)knowledgebase,qBooleanconjuncSvequeryQuesSon:isqentailedbyK?

This problem is undecidable (only semi-decidable)

E.g. proof by reduction from the word problem in a semi-Thue system

There is a one-step derivation from a word w to w’ if there is a rule wi à wj in G, and w = w1wiw2, w' = w1wjw2 w’ is derived from w if

there is a (finite) sequence of one-step derivations from w to w’

Input: a set G of rules of the form wi à wj, 2 words w0 and wf

Question: is it possible to derive (exactly) wf from w0 using the rules in G?

M.-L.Mugnier–UNILOGSchool– 2018 77

Page 78: Reasoning on Data

REDUCTIONFROMTHEWORDPROBLEMFromG,w0andwfwebuildaKB(F, R)andaBooleanCQq

Vocabulary constants:thelekersoccuringinG,w0andwf +twospecialconstantsBandE binarypredicates:succandval

FactbaseF=T(w0,B,E)

SetofrulesRisobtainedbytranslaSngeachrulewiàwjintotheexistenSalrule �x �y (T(wi,x,y)à T(wj,x,y))

Toawordw=a1...anweassignthefollowinggraphT(w,x,y) wheretheziareexistenSalvariablesandx,yarefree

x y

Queryq=T(wf,B,E)

Key:anywordwderivablefromw0withGcorrespondstoapathT(w, B, E) inthesaturaSonofFbyR,andreciprocally

M.-L.Mugnier–UNILOGSchool– 2018 78

Page 79: Reasoning on Data

finite saturation

bounded treewidth saturation

finite UCQ rewriting

atomic body (linear)

frontier-1

weakly- guarded

weakly frontier-guarded

datalog

guarded

weakly- acyclic

acyclic GRD

wa-GRD jointly- acyclic

frontier- guarded

jointly-fg

sticky-join

w-sticky-j

sticky

weakly-sticky

(PARTIAL)MAPOFDECIDABLECASES

DL-Lite

EL

M.-L.Mugnier–UNILOGSchool– 2018 79

Page 80: Reasoning on Data

GENERICPROPERTIESTHATENSUREDECIDABILITY

ThreegenerickindsofproperSesensuringdecidability:-  SaturaSonbyForwardChaininghaltsforanyfactbase

(«finiteexpansionset»,fes)

-  QueryrewriSnghaltsforanyconjuncSvequery(«finiteunificaSonset»,fus,orUCQ-rewritability)

-  SaturaSonbyForwardChainingmaynothaltbutforanyfactbasethegeneratedfactshaveatree-likestructure(«boundedtreewidthset»,bts)

NoneoftheseproperSesisrecognizable[Baget+KR10]buttheseproperSesprovidegenericalgorithmicschemes

M.-L.Mugnier–UNILOGSchool– 2018 80

Page 81: Reasoning on Data

No existential variables

MainClasseswithFiniteSaturaFon(fes)

Datalog

Acyclic position dependency graph

Weak-acyclicity

Acyclic Graph of Rule Dependencies Acyclic

GRD

Joint-acyclicity

Acyclic existential dependency graph

GRD with fes strongly connected components

fes-GRD

[BagetKR�04]

[BagetKR�04]

[Deutsch+ICDT�03][Fagin+ICDT03]

[Krötzsch+IJCAI�11]

Position dependency graph: nodes are positions in predicates edges show how existential variables are propagated

Graph of rule dependencies: nodes are rules edges express that a rule may lead to trigger a rule

M.-L. Mugnier – UNILOG School – 2018 81

Page 82: Reasoning on Data

WEAK-ACYCLICITY

R1:p(x)→�yr(x,y)�q(y)R2:r(x,y)→p(x)

PosiFondependencygraphnodes:posiSons(p,i)inpredicatesedges:foreachfronServariablexinposiSon(p,i)inarulebody -anedgefrom(p,i)toeachposiSon(q,j)ofxintherulehead

-aspecialedgefrom(p,i)toeachposiSonofanexistenSalintheruleheadRisweakly-acyclicifitsposiSongraphcontainsnocircuitwithaspecialedge(*)

weaklyacyclic

notweaklyacyclicspecialedge(p,1)à(r,1)duetoR1edge(r,1)à(p,1)duetoR2

R1:p(x)→�y�zr(x,y)�r(y,z)�r(z,x)R2:r(x,y)�r(y,x)→p(x)

M.-L.Mugnier–UNILOGSchool– 2018 82

Page 83: Reasoning on Data

ACYCLICGRAPHOFRULEDEPENDENCYGraphofRuleDependenciesnodes:therulesedges:anedgefromRitoRjifanapplicaSonofRimayleadtotriggeranew

applicaSonofRj(«RjdependsonRi»)DependencycanbeeffecSvelycomputedbycheckingifthereisapiece-unifierof

body(Rj)andhead(Ri)

Theseexamplesshowthatweak-acyclicityandacyclicGRDareincomparablecriteriaCommongeneralizaSonsofthesetwonoSonshavebeendefined

CyclicGRDsinceR1andR2dependoneachother

R1:p(x)→�yr(x,y)�q(y)R2:r(x,y)→p(x)

R1:p(x)→�y�zr(x,y)�r(y,z)�r(z,x)R2:r(x,y)�r(y,x)→p(x)

M.-L.Mugnier–UNILOGSchool– 2018 83

Page 84: Reasoning on Data

E.g. inclusion dependencies, necessary properties of concepts / relations

MainClasseswithFiniteQueryRewriFng(fus)

Atomic- body Sticky Domain-

restricted

Sticky-join Each head atom contains

all or none of the body variables

E.g. concept product Elephant(x) ∧ Mouse(y) à bigger-than(x,y)

[Cali+PVLDB2010]

[Baget+IJCAI�09][Baget+IJCAI�09]

[Cali+RR�10]

= linear Datalog+ [Cali+PODS2009]

Body restricted to a single atom

Restricts multiple occurrences of body variables that do not occur in all head atoms

Each head atom contains all the body variables

E.g. Human(x) àparentOf(y,x) ∧Human(y) is atomic-body, sticky and domain-restricted

M.-L. Mugnier – UNILOG School – 2018 84

Page 85: Reasoning on Data

WidthofatreedecomposiSon=maxnumberofnodesinabag(minus1)Treewidthofagraph=minwidthoveralldecomposiSontreesofthisgraph

r

a

p(a,b)q(b,z0)r(a,b,t0)p(b,t0)q(t0,z1)r(b,t0,t1)p(t0,t1)

DecomposiFontree1)eachnode(term)appearsinabag2)eachhyperedge(atom)hasallitsnodesinabag3)foreachnodex,thesubgraphinducedbythebagscontainingxisconnected

b

r

t0

z0 ab

node

hyperedge

DecomposiFonTree/Treewidth

p p p

q q

z1

t1bz0 abt0

t0z1 bt0t1

p(a,b)

q(b,z0)

r(a,b,t0) p(b,t0)

r(b,t0,t1) p(t0,t1)

q(t0,z1)

M.-L. Mugnier – UNILOG School – 2018 85

Page 86: Reasoning on Data

The decidability proof does not provide a halting algorithm (relies on the bounded treewidth model property [Courcelle90])

R is bts if the forward chaining with R generates facts with bounded treewidth: i.e., for any factbase F, there is an integer b s.t. any factbase R -derived from F has treewidth bounded by b

F

BoundedTreewidthoftheDerivedFacts(bts)

EssenSally[CaliGoklobKiferKR’08]

fes (finite saturation) is included in bts (bound given by the number of terms in the finite saturation)

M.-L. Mugnier – UNILOG School – 2018 86

Page 87: Reasoning on Data

An atom in the body guards all the body variables

Guard only the frontier

Guard only affected variables (i.e.possibly mapped to new existentials)

Guard only affected variables from the frontier

Frontier: variables shared by the body and the head

SomeRecognizablebts(andnotfes)ClassesofRules

guarded [Cali+KR’08]

weakly guarded

[Cali+KR�08]The frontier has size 1

[Baget+KR�10]

frontier guarded

[Baget+KR�10]

weakly frontier guarded

[Baget+IJCAI�09]

frontier 1

r(x,y) ∧ r(y,z) ∧ s(x,y,z) à �u r(y,u) ∧ r(z,u) r(x,y) ∧ r(y,z) ∧ r(x,z) à �u r(z,u)

r(x,y) ∧ r(y,z) à r(y,u) ∧ r(z,u)

datalog

These classes are moreover « greedy bts » => a halting algorithm [Baget+IJCAI�11]M.-L. Mugnier – UNILOG School – 2018 87

Page 88: Reasoning on Data

Greedybts

R1 = p(x,y) à ∃z p(y,z) R2 = p(x,y) ∧ q(x,z) à ∃t r(x,y,t) ∧ p(y,t)

F = p(a,b) ab

bz0 abt0

t0z1 bt0t1

p(a,b)

q(b,z0)r(a,b,t0)p(b,t0)

r(b,t0,t1)p(t0,t1)q(t0,z1)

R1 R2

R1 R2

Greedy construction of a decomposition tree of derived facts

with bounded width

Etc.

M.-L. Mugnier – UNILOG School – 2018 88

Page 89: Reasoning on Data

The«Greedybts»Property[Baget+ IJCAI�11]

T0 T0 = terms(F) + {constants} F

T0 ∪ var(h(H))

B

H

h

Derived facts Decomposition tree

All bags contain T0 F

h(H)

Foranyfactbase,foreachruleapplicaSon,fronServariablesnotbeingmappedtoiniSaltermsarejointlymappedtovariablesoccurringinatomsaddedbyasinglepreviousruleapplicaSon

M.-L. Mugnier – UNILOG School – 2018 89

Page 90: Reasoning on Data

MainIdeasoftheAlgorithmforgbts(1)

1.  Bagpakern={homomorphismsfrompartofarulebodyto«currentfact»

thatusesometermsofthebag}

•  Aruleisapplicabletothecurrentfactbaseiffabagpakerncontainsitsbody•  FCcanbeperformedonthedecoratedtree

2.  EquivalencerelaSononbags

OnlyonebagperequivalenceclassisdevelopedTheothernodesareblocked

Boundednumberofequivalenceclassesàfinite«fullblockedtree»T*

BuildafinitedecomposiSontreethatencodesapotenSallyinfinitefact

M.-L. Mugnier – UNILOG School – 2018 90

Page 91: Reasoning on Data

MainIdeasoftheAlgorithmforgbts(2)

[Baget+IJCAI2011]qaddedasarule«qàmatch»

qisentailediffmatchoccursinabagpa=erni.e.,qmapsbyhomomorphismtoatoms(T*)

[Thomazo+KR2012]offline/onlineseparaSon

(1)compilaSon:treeT*builtindependentlyfromanyquery

(2)querying:anyqisentailediffitmapsby*-homomorphismtoT* i.e.qmapsbyhomomorphismtoabounded«development»ofT*

QuerythisfinitedecomposiSontree

M.-L. Mugnier – UNILOG School – 2018 91

Page 92: Reasoning on Data

DataComplexityofgbtsClasses

guarded

weakly guarded

frontier guarded

weakly frontier guarded

frontier 1

Datalog (fes)

ExpTime-c

PTime-c

Previousalgorithmisworst-caseopSmalongbtsfordata/combinedcomplexity.

CanbespecializedtobeopSmalonthesegbtssubclassesM.-L. Mugnier – UNILOG School – 2018 92

Page 93: Reasoning on Data

FES GBTS

FUS

linear

frontier-1

weakly- guarded

weakly frontier-guarded

Datalog

guarded weakly- acyclic

aGRD jointly- acyclic frontier-

guarded

jointly-fg

sticky-join

w-sticky-join

sticky

weakly-sticky

DL-Lite

EL

MFA

glut-fg (BTS)

domain- restricted

super-weak- acyclic

M.-L.Mugnier–UNILOGSchool– 2018 93

Page 94: Reasoning on Data

CONCLUSION

•  Reasoningwithontologiesisbecomingcentralinmanydata-centricapplicaSons

•  SolidtheoreScalfoundaSonswitharangeofontologicalformalisms

thatoffervarioustradeoffexpressivity/complexity

•  Ongoingresearch•  Gobeyond(unionsof)conjuncSvequeries,e.g.combinethemwith

navigaSonalquerieslikeregularpathqueries•  NewqueryrewriSngtechniquesthattargetmorepowerful

langages,e.g.Datalog•  NewqueryansweringtechniquesthatcombinematerialisaSonand

queryrewriSng•  StudytheinteracSonoftheontologywithmappings,whichiskey

toefficientqueryansweringoverheterogeneousdata•  RepresenSngandreasoningwithtemporalandspaSaldata

•  Dealingwithdatainconsistencies ....M.-L.Mugnier–UNILOGSchool– 2018 94

Page 95: Reasoning on Data

(Small) Bibliography BienvenuM.,LeclèreM.,Mugnier,M.-L.andRousset,M.-C., Reasoning with Ontologies, chapter6,volume1in«AguidedtourofarSficialintelligenceresearch»,Springer,toappear.IntroducFonstoseveralaspectsofontology-mediatedqueryansweringwithdescripFonlogicsorexistenFalrulesintheReasoningWebsummerschoolbooks:inparScular:Bienvenu,M.andOrSz,M.(2015).Ontology-mediatedqueryansweringwithdatatractabledescripSonlogics.11thInternaSonalReasoningWebSummerSchool,volume9203ofLNCS,pages218–307.Springer.Mugnier,M.andThomazo,M.(2014).AnintroducSontoontology-basedqueryansweringwithexistenSalrules.10thInternaSonalReasoningWebSummerSchool,volume8714ofLNCS,pages245–278.Springer.Goklob,G.,Orsi,G.,Pieris,A.,andSimkus,M.(2012).DataloganditsextensionsforsemanScwebdatabases.10thInternaSonalReasoningWebSummerSchool,volume7487ofLNCS,pages54–77.Springer.

ThesesynthesesprovidefurtherreferencesM.-L.Mugnier–UNILOGSchool– 2018 95

Page 96: Reasoning on Data

APPENDIX:FURTHERDETAILS

¢  FundamentaldefiniSonsandproperSesfortheFOL(�,∧)fragment

¢  Piece-unifiers

M.-L.Mugnier–UNILOGSchool– 2018 96

Page 97: Reasoning on Data

INTERPRETATIONS/MODELS(1)

¢  VocabularyV=(P,C),where P=finitesetofpredicates C=setofconstants

¢  InterpretaFonI=(DI ,.I)ofV,where DI≠ø (domain) forallcinC,cIinDI

forallpinPwitharityk,pIDIk

¢  Furthermore,uniquenameassumpSon:forallcanddinC,cI≠dI

¢  SimplifyingassumpSon(inlinewiththeuniquenameassumpSon):CDIandforallcinC,cI=c

V = ( {p/2, r/3 }, {a, b} ) I: DI = {a, b, d1} pI = { (b, a), (b, d1), (d1, b) }

rI = { (d1, d1, a) }

¢  Iisamodeloff(builtonV)iffistrueinI

M.-L.Mugnier–UNILOGSchool– 2018 97

Page 98: Reasoning on Data

INTERPRETATIONS/MODELS(2)

¢  LetfinFOL(�,∧).Iisamodeloffiff thereisamappingvfromterms(f)toDIsuchthat

forallp(e1,...,ek)inf,(v(e1),...,v(ek))inpI

I: DI = {a, b, d1} pI = {(b, a), (b, d1), (d1, b)} rI = {(d1, d1, a)} f = �x�y�z ( p(x, y) ∧ p(y, z) ∧ r(x, z, a) )

v: x↦ d1 y↦ b z↦ d1

¢  InterpretaSonscanbeseenassetsofatoms

(withelementsfromD\Cseenasvariables)

p(b,a), p(b,x1), p(x1,b), r(x1, x1,a)¢  I isamodeloffiffthereisahomomorphismfromftoI

M.-L.Mugnier–UNILOGSchool– 2018 98

Page 99: Reasoning on Data

HOMOMORPHISMSAGAINANDAGAIN

¢  OnecandefinehomomorphismsbetweeninterpretaFons¢  Wehave:

IfI1mapsI2then,foranyf,I1modeloff�I2modeloff

¢  ToaformulafinFOL(�,∧),weassignitsisomorphicmodelM(f) (alsocalledcanonicalmodel)

f=�x�y�z(p(x,y)∧p(y,z)∧r(x,z,a))

M(f): D={dx,dy,dz,a} pM(f)={(dx,dy),(dy,dz)} rM(f)={(dx,dz,da)}

M.-L.Mugnier–UNILOGSchool– 2018 99

Page 100: Reasoning on Data

NICESEMANTICPROPERTIESOFFOL(�,∧)

¢  ThecanonicalmodelM(f)isuniversal,i.e.,forallM’modeloff,M(f)mapstoM’

Proof:LetM’modeloff.Then,fmapstoM’.SinceM(f)isomorphictof,M(f)mapstoM’

¢  g⊨f(i.e.,everymodelofgisamodeloff)ifffmapsbyhomomorphismtoM(g)ifffmapsbyhomomorphismtogProof:⇒Assumeg⊨f.InparScularM(g)isamodeloff,hencefmapstoM(g)

⇐AssumefmapstoM(g).SinceM(g)isuniversal:foranyM’modelofg,fmapstoM’,i.e.,M’isamodeloff,henceg⊨f

M.-L.Mugnier–UNILOGSchool– 2018 100

Page 101: Reasoning on Data

WHY«PIECES»?(CONT’D)–PIECE-UNIFIERS

¢  UnificaSonmust«map»partsofqaccordingto«pieces»thatcanbeprovidedbyaruleapplicaSon

(otherwiseitisunsoundoruseless) body head

body head

specializa4onofthefron4er

homomorphism

ThetermsofqunifiedwiththefronSer(orwithconstants)cutqinto«pieces»⇒ enSre«pieces»ofqmustbemappedtothepiecesoftherule

R

q

M.-L.Mugnier–UNILOGSchool– 2018 102

Page 102: Reasoning on Data

PIECE-UNIFIER:ALTERNATIVEDEFINITION

Letu1:fronSer(R)àfronSer(R)�constants(u1isaspecializaSonofthefronSerofR)Letu2beahomomorphismfromq’�qtou1(head(R))Cutpoints:termsofq’mappedtou1(fron4er(R))

ortoconstantsThecutpointscutqinto«pieces»u1+u2isapiece-unifierifq’iscomposedofpieces

body head

body head

u1

u2

M.-L.Mugnier–UNILOGSchool– 2018 103