© 2017 A. Alawini, S. Davidson
Datalog
Susan B. Davidson
CIS 700: Advanced Topics in Databases
MW 1:30-3
Towne 309
http://www.cis.upenn.edu/~susan/cis700/homepage.html
Homework for this week
• Sign up to present a paper (the Google doc link was sent on Friday).
• The class schedule is being updated based on this.
University of Pennsylvania
First paper summary due 2/5
• The first summary is on the following paper:
  • "Big Data Analytics with Datalog Queries on Spark", SIGMOD 2016
• What is a summary (print and bring to class)?
  • A short paragraph describing the paper
  • 1-3 "strengths", 1-3 "weaknesses"
  • At least one question you have about the paper
Last time: Datalog
• Facts (EDB) and rules (IDB)
• Safe queries
• Negation can be tricky…
The Bachelor problem
Suppose we have an EDB relation married(x,y) and want to calculate the bachelors.
Not correct (and not safe):
bachelor(y) :- NOT married(x,y)
Also not correct (but safe):
bachelor(y) :- person(x), person(y), NOT married(x,y)
Correct (and safe):
notBachelor(y) :- married(x,y)
notBachelor(x) :- married(x,y)
bachelor(y) :- person(y), NOT notBachelor(y)
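The difference between the safe-but-wrong rule and the correct program can be checked on a small instance. The following is a minimal Python sketch, assuming relations are modelled as sets of tuples; the sample names (ann, bob, cat, dan) are made up for illustration.

```python
# Hypothetical sample data; the names are illustrative only.
person = {"ann", "bob", "cat", "dan"}
married = {("ann", "bob")}

# Unsafe rule: bachelor(y) :- NOT married(x,y).
# Neither x nor y is bound by a positive subgoal, so there is no finite
# domain to range over and the rule cannot be evaluated as written.

# Safe but wrong: bachelor(y) :- person(x), person(y), NOT married(x,y).
# y qualifies as soon as SOME person x is not married to y.
wrong = {y for x in person for y in person if (x, y) not in married}

# Correct: first collect everyone who appears in married, then negate.
not_bachelor = {y for (_, y) in married} | {x for (x, _) in married}
bachelor = {y for y in person if y not in not_bachelor}

print(sorted(wrong))      # contains every person, married or not
print(sorted(bachelor))   # only the people who appear in no marriage
```

On this instance the wrong rule returns all four people, because for every y there is some x (for example y itself) with no married(x,y) fact.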
This time: Datalog+ with Recursion
• A simple recursive program and naïve evaluation
• Evaluating Datalog+ programs
• Negation can still be tricky…
Datalog versus SQL
• Non-recursive Datalog with negation is a cleaned-up core of SQL.
  • Unions of conjunctive queries
  • Forms the core of query optimization: what we know how to reason over easily
• You can translate easily between non-recursive Datalog with negation and SQL.
  • Take the join of the non-negated, relational subgoals and select/delete from there.
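As one possible illustration of the translation, the bachelor program can be written as a single SQL query: the positive subgoal person(y) becomes the FROM clause and the negated subgoal becomes a NOT EXISTS filter. This is a sketch using Python's standard sqlite3 module; the table layout and sample rows are assumptions made for the example, not part of the lecture.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person(name TEXT);
    CREATE TABLE married(x TEXT, y TEXT);
    INSERT INTO person VALUES ('ann'), ('bob'), ('cat');
    INSERT INTO married VALUES ('ann', 'bob');
""")

# notBachelor(y) :- married(x,y)
# notBachelor(x) :- married(x,y)
# bachelor(y)    :- person(y), NOT notBachelor(y)
rows = con.execute("""
    SELECT p.name
    FROM person AS p
    WHERE NOT EXISTS (SELECT 1 FROM married AS m
                      WHERE m.x = p.name OR m.y = p.name)
    ORDER BY p.name
""").fetchall()

print(rows)
```

The two notBachelor rules form a union, which the sketch folds into the OR inside the subquery.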
Why Datalog?
• Recursion
• Rules express things that go on in both FROM and WHERE clauses, and let us state some general principles (e.g., containment of rules) that are almost impossible to state correctly in SQL.
Simple recursive Datalog program
R encodes a graph.
R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)}
T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)
What does T compute?
Simple recursive Datalog program
R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)} encodes a graph.
Alternate ways to compute transitive closure:
Right linear:
T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)
Left linear:
T(x,y) :- R(x,y)
T(x,y) :- T(x,z), R(z,y)
Non-linear:
T(x,y) :- R(x,y)
T(x,y) :- T(x,z), T(z,y)
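That the three formulations agree can be checked directly on the example graph. The sketch below is a minimal Python illustration, assuming relations are sets of pairs; `join` and `closure` are helper names invented for this example.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def join(A, B):
    """Compose two binary relations: {(x, y) | A(x, z) and B(z, y)}."""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

def closure(step):
    """Iterate T := R ∪ step(T), starting from the empty relation,
    until a round adds nothing new."""
    T = set()
    while True:
        new_T = R | step(T)
        if new_T == T:
            return T
        T = new_T

right = closure(lambda T: join(R, T))    # T(x,y) :- R(x,z), T(z,y)
left = closure(lambda T: join(T, R))     # T(x,y) :- T(x,z), R(z,y)
nonlin = closure(lambda T: join(T, T))   # T(x,y) :- T(x,z), T(z,y)

print(right == left == nonlin, len(right))
```

All three runs reach the same set of reachable pairs; they differ only in how many intermediate tuples each round has to join.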
Another Interesting Program
Non 2-colorability: R encodes a graph.
R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)}
ODD(x,y) :- R(x,y)
ODD(x,y) :- R(x,z), EVEN(z,y)
EVEN(x,y) :- R(x,z), ODD(z,y)
Q :- ODD(x,x)
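ODD(x,y) holds when there is a walk of odd length from x to y, so Q is true exactly when some node has an odd-length closed walk. A minimal Python sketch of evaluating the program to a fixpoint follows; the function name `has_odd_closed_walk` and the extra test graphs are assumptions for illustration.

```python
def has_odd_closed_walk(R):
    """Evaluate the ODD/EVEN program to a fixpoint; return the value of Q."""
    odd, even = set(), set()
    while True:
        # ODD(x,y) :- R(x,y)
        # ODD(x,y) :- R(x,z), EVEN(z,y)
        new_odd = R | {(x, y) for (x, z) in R for (z2, y) in even if z == z2}
        # EVEN(x,y) :- R(x,z), ODD(z,y)
        new_even = {(x, y) for (x, z) in R for (z2, y) in odd if z == z2}
        if new_odd == odd and new_even == even:
            break
        odd, even = new_odd, new_even
    # Q :- ODD(x,x)
    return any(x == y for (x, y) in odd)

slide_graph = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}
triangle = {(1, 2), (2, 3), (3, 1)}

print(has_odd_closed_walk(slide_graph))
print(has_odd_closed_walk(triangle))
```

Read as directed edges, the slide's R contains only the cycle 1 → 2 → 1 of length 2, so Q is false there, while the directed triangle makes Q true.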
Evaluating Datalog+ Programs
1. Nonrecursive programs.
2. Naïve evaluation of recursive programs without negation.
3. Semi-naïve evaluation of recursive programs without negation.
  • Eliminates some redundant computation.
Nonrecursive Evaluation
• If (and only if!) a Datalog program is not recursive, then we can order the IDB predicates so that in any rule for p (i.e., p is the head predicate), the only IDB predicates in the body precede p.
Why?
• Consider the dependency graph with:
  • Nodes = IDB predicates.
  • Arc p → q iff there is a rule for p with q in the body.
• A cycle involving node p means p is recursive.
• No cycles: use topological order to evaluate predicates (each predicate after the predicates it depends on).
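The dependency-graph test can be sketched with Python's standard `graphlib` module (available from Python 3.9). The representation of rules as (head, body predicate names) pairs and the function name `evaluation_order` are assumptions made for this example.

```python
from graphlib import TopologicalSorter, CycleError

def evaluation_order(rules, idb):
    """rules: list of (head, [body predicate names]); idb: set of IDB names.
    Returns the IDB predicates ordered so that each comes after every IDB
    predicate its rules mention, or None if the program is recursive."""
    deps = {p: set() for p in idb}
    for head, body in rules:
        deps[head] |= {q for q in body if q in idb}   # arc head -> q
    try:
        # TopologicalSorter expects node -> predecessors, which is exactly
        # "head -> predicates that must be evaluated first".
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

bachelor_rules = [
    ("notBachelor", ["married"]),
    ("notBachelor", ["married"]),
    ("bachelor", ["person", "notBachelor"]),
]
tc_rules = [("T", ["R"]), ("T", ["R", "T"])]

print(evaluation_order(bachelor_rules, {"notBachelor", "bachelor"}))
print(evaluation_order(tc_rules, {"T"}))
```

The bachelor program yields an order with notBachelor before bachelor; the transitive closure program has the arc T → T, so no order exists and the sketch reports it as recursive.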
Applying Rules
To evaluate an IDB predicate p:
1. Apply each rule for p to the current relations corresponding to its subgoals.
  • "Apply" = If an assignment of values to variables makes the body true, insert the tuple that the head becomes into the relation for p (no duplicates).
  • Also think of the "product" of the relations corresponding to the subgoals, with selection/join conditions.
2. Take the union of the results for each p-rule.
Example
P(x,y) :- Q(x,z), R(z,y), y<10
Q = {(1,2), (3,4)}
R = {(2,5), (4,9), (4,10), (6,7)}
• Assignments making the body true: (x,y,z) = (1,5,2), (3,9,4)
• So P = {(1,5), (3,9)}.
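The example can be reproduced with a set comprehension that plays the role of "product plus selection/join conditions". This is a small Python sketch under the assumption that relations are sets of tuples.

```python
Q = {(1, 2), (3, 4)}
R = {(2, 5), (4, 9), (4, 10), (6, 7)}

# P(x,y) :- Q(x,z), R(z,y), y < 10
# Product of Q and R, keeping only assignments that satisfy the join
# condition (shared variable z) and the comparison y < 10.
assignments = {(x, y, z) for (x, z) in Q for (z2, y) in R
               if z == z2 and y < 10}

# Project each satisfying assignment onto the head variables (x, y).
P = {(x, y) for (x, y, z) in assignments}

print(sorted(assignments))
print(sorted(P))
```

The tuple (4,10) in R joins with (3,4) in Q but is rejected by y < 10, and (6,7) never joins because no Q tuple has z = 6.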
Algorithm for Nonrecursive
FOR (each predicate P in topological order) DO
    Apply the rules for P to previously computed relations to compute relation P;
Naïve Evaluation for Recursive
make all IDB relations empty;
WHILE (changes to IDB) DO
    FOR (each IDB predicate P) DO
        Evaluate P using current values of all relations;
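The loop above can be sketched as a small generic driver. In this hypothetical Python version, each IDB predicate is paired with a function that applies all of its rules to the current relations; the names `naive_eval`, `edb`, and `rules` are invented for the example.

```python
def naive_eval(edb, rules):
    """edb: dict name -> set of tuples.
    rules: dict mapping each IDB predicate name to a function that takes the
    dict of all current relations and returns the tuples its rules derive."""
    idb = {p: set() for p in rules}            # make all IDB relations empty
    changed = True
    while changed:                             # WHILE (changes to IDB)
        changed = False
        for p, evaluate in rules.items():      # FOR (each IDB predicate P)
            new_p = evaluate({**edb, **idb})   # use current values
            if new_p != idb[p]:
                idb[p] = new_p
                changed = True
    return idb

edb = {"R": {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}}
rules = {
    # T(x,y) :- R(x,y)
    # T(x,y) :- R(x,z), T(z,y)
    "T": lambda rel: rel["R"] | {(x, y) for (x, z) in rel["R"]
                                 for (z2, y) in rel["T"] if z == z2},
}

T = naive_eval(edb, rules)["T"]
print(len(T))
```

Every round recomputes each relation from scratch, so tuples found in earlier rounds are derived again in every later round.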
Important Points
• As long as there is no negation of IDB subgoals, then each IDB relation "grows," i.e., on each round it contains at least what it used to contain.
  • Monotonicity
• Since relations are finite (only finitely many tuples can be built from the constants in the EDB), the loop must eventually terminate.
• The result is the least fixed point (minimal model) of the rules.
Problem with Naïve Evaluation
• The same facts are discovered over and over again.
• The semi-naïve algorithm tries to reduce the number of facts discovered multiple times.
  • There is a similarity to incremental view maintenance.
Background: Incremental View Maintenance
V(x,y) :- R(x,z), S(z,y)
If R ← R ∪ ΔR then what is ΔV(x,y)?
ΔV(x,y) :- ΔR(x,z), S(z,y)
If R ← R ∪ ΔR and S ← S ∪ ΔS then what is ΔV(x,y)?
ΔV(x,y) :- ΔR(x,z), S(z,y)
ΔV(x,y) :- R(x,z), ΔS(z,y)
ΔV(x,y) :- ΔR(x,z), ΔS(z,y)
Background: Incremental View Maintenance
V(x,y) :- T(x,z), T(z,y)
If T ← T ∪ ΔT then what is ΔV(x,y)?
ΔV(x,y) :- ΔT(x,z), T(z,y)
ΔV(x,y) :- T(x,z), ΔT(z,y)
ΔV(x,y) :- ΔT(x,z), ΔT(z,y)
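The delta rules can be sanity-checked by comparing the incrementally maintained view against a full recomputation. The following is a minimal Python sketch; the sample relations and the helper name `join` are assumptions for illustration.

```python
def join(A, B):
    """{(x, y) | A(x, z) and B(z, y)}"""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

# Two-relation view: V(x,y) :- R(x,z), S(z,y)
R, S = {(1, 2), (2, 3)}, {(2, 7), (3, 8)}
dR, dS = {(4, 5)}, {(5, 9), (2, 6)}

V = join(R, S)
# ΔV :- ΔR, S      ΔV :- R, ΔS      ΔV :- ΔR, ΔS
dV = join(dR, S) | join(R, dS) | join(dR, dS)
recomputed = join(R | dR, S | dS)

# Self-join view: V(x,y) :- T(x,z), T(z,y)
T, dT = {(1, 2)}, {(2, 3)}
V_self = join(T, T)
dV_self = join(dT, T) | join(T, dT) | join(dT, dT)
recomputed_self = join(T | dT, T | dT)

print(V | dV == recomputed, V_self | dV_self == recomputed_self)
```

In both cases the old view together with ΔV equals the view recomputed over the updated relations, which is the property semi-naïve evaluation relies on.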
Semi-naïve Evaluation
• Key idea: to get a new tuple for relation P on one round, the evaluation must use some tuple for some relation of the body that was obtained on the previous round.
• Maintain ΔP = new tuples added to P on the previous round.
• "Differentiate" rule bodies to be the union of bodies with one IDB subgoal made "Δ."
• Separate the Datalog program into the non-recursive part and the recursive part.
• Each Pi is defined by non-recursive-SPJUi and (recursive-)SPJUi, where SPJU stands for a select-project-join-union query.
Semi-naïve Evaluation
P1 = ΔP1 = non-recursive-SPJU1, P2 = ΔP2 = non-recursive-SPJU2, …
Loop
    ΔP1 = ΔSPJU1 - P1; ΔP2 = ΔSPJU2 - P2; …
    if (ΔP1 = ∅ and ΔP2 = ∅ and …) then break
    P1 = P1 ∪ ΔP1; P2 = P2 ∪ ΔP2; …
End loop
Semi-naïve Evaluation
T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)
T(x,y) = ΔT(x,y) = ? (non-recursive rule)
Loop
    ΔT(x,y) = ? (recursive Δ-rule)
    if (ΔT = ∅) then break
    T = T ∪ ΔT
End loop
Semi-naïve Evaluation
T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)
T(x,y) = ΔT(x,y) = R(x,y)
Loop
    ΔT(x,y) = (R(x,z), ΔT(z,y)) - T(x,y)
    if (ΔT = ∅) then break
    T = T ∪ ΔT
End loop
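A minimal Python sketch of the semi-naïve loop for transitive closure follows, assuming relations are sets of pairs; `semi_naive_tc` and `join` are names chosen for this example. The difference is taken against the accumulated T, following the general scheme ΔP = ΔSPJU - P, which is what lets the loop stop on cyclic graphs such as the example R.

```python
def join(A, B):
    """{(x, y) | A(x, z) and B(z, y)}"""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

def semi_naive_tc(R):
    T = set(R)        # T = ΔT = R   (non-recursive rule)
    delta = set(R)
    while True:
        # Recursive Δ-rule: join R only with last round's new tuples,
        # then discard anything already known.
        delta = join(R, delta) - T
        if not delta:
            break
        T |= delta
    return T

R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}
T = semi_naive_tc(R)
print(len(T))
```

Compared with the naïve loop, each round joins R with ΔT rather than with all of T, so tuples from earlier rounds are not fed back into the join.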
Discussion of Semi-Naïve Algorithm
• Avoids recomputing some (but not all) tuples
• Easy to implement, no disadvantage over naïve
• A rule is called linear if its body contains only one recursive IDB subgoal:
  • A linear rule always results in a single incremental rule
  • A non-linear rule may result in multiple incremental rules
Recursion and Negation Don’t Like Each Other
• When rules have negated IDB subgoals, there can be several minimal models.
• Recall: model = set of IDB facts, plus the given EDB facts, that make the rules true for every assignment of values to variables.
  • A rule is true unless its body is true and its head is false.
S(x) :- R(x), not T(x)
T(x) :- R(x), not S(x)
Suppose R(a). What are S and T?
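For an instance this small the models can simply be enumerated. The Python sketch below assumes S and T range over subsets of {a}; the helper names `subsets` and `is_model` are invented for the example.

```python
from itertools import chain, combinations

R = {"a"}

def subsets(universe):
    items = sorted(universe)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def is_model(S, T):
    # A rule is violated only when its body is true and its head is false.
    rule1 = all(x in S for x in R if x not in T)   # S(x) :- R(x), not T(x)
    rule2 = all(x in T for x in R if x not in S)   # T(x) :- R(x), not S(x)
    return rule1 and rule2

models = [(S, T) for S in subsets(R) for T in subsets(R) if is_model(S, T)]

# A model is minimal if no other model is contained in it componentwise.
minimal = [(S, T) for (S, T) in models
           if not any(S2 <= S and T2 <= T and (S2, T2) != (S, T)
                      for (S2, T2) in models)]

print(minimal)
```

The enumeration finds two incomparable minimal models, one with S = {a} and T empty and one with T = {a} and S empty, so the rules alone do not single out one answer.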