The Relational Model - Stanford...
Transcript of The Relational Model - Stanford...
Today’sLecture
1. TheRelationalModel&RelationalAlgebra
2. RelationalAlgebraPt.II[Optional:mayskip]
2
Lecture16
Whatyouwilllearnaboutinthissection
1. TheRelationalModel
2. RelationalAlgebra:BasicOperators
3. Execution
4. ACTIVITY:FromSQLtoRA&Back
4
Lecture16>Section1
Motivation
TheRelationalmodelisprecise,implementable,andwecanoperateonit
(query/update,etc.)
Databasemapsinternallyintothisprocedurallanguage.
Lecture16>Section1>TheRelationalModel
ALittleHistory
• RelationalmodelduetoEdgar“Ted”Codd,amathematicianatIBMin1970• ARelationalModelofDataforLargeSharedDataBanks". CommunicationsoftheACM 13 (6):377–387
• IBMdidn’twanttouserelationalmodel(takemoneyfromIMS)• Apparentlyusedinthemoonlanding…
WonTuringaward1981
Lecture16>Section1>TheRelationalModel
TheRelationalModel:Schemata
• RelationalSchema:
Lecture16>Section1>TheRelationalModel
Students(sid: string, name: string, gpa: float)
AttributesString,float,int,etc.arethedomains oftheattributes
Relationname
8
TheRelationalModel:Data
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
Student
Anattribute (orcolumn)isatypeddataentrypresentineachtupleintherelation
Thenumberofattributesisthearity oftherelation
Lecture16>Section1>TheRelationalModel
9
TheRelationalModel:Data
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
Student
Atuple orrow (orrecord)isasingleentryinthetablehavingtheattributesspecifiedbytheschema
Thenumberoftuplesisthecardinality oftherelation
Lecture16>Section1>TheRelationalModel
10
TheRelationalModel:DataStudent
Arelationalinstance isaset oftuplesallconformingtothesameschema
Recall:InpracticeDBMSsrelaxthesetrequirement,andusemultisets.
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
Lecture16>Section1>TheRelationalModel
• Arelationalschema describesthedatathatiscontainedinarelationalinstance
ToReiterate
LetR(f1:Dom1,…,fm:Domm)bearelationalschema then,aninstanceofRisasubsetofDom1 xDom2 x…xDomn
Lecture16>Section1>TheRelationalModel
Inthisway,arelationalschema Risatotalfunctionfromattributenames totypes
• Arelationalschema describesthedatathatiscontainedinarelationalinstance
OneMoreTime
ArelationRofarity t isafunction:R:Dom1 x…xDomt à {0,1}
Lecture16>Section1>TheRelationalModel
Then,theschemaissimplythesignatureofthefunction
I.e.returnswhetherornotatupleofmatchingtypesisamemberofit
Noteherethatordermatters,attributenamedoesn’t…We’ll(mostly)workwiththeothermodel(lastslide)in
whichattributenamematters,orderdoesn’t!
Arelationaldatabase
• Arelationaldatabaseschema isasetofrelationalschemata,oneforeachrelation
• Arelationaldatabaseinstance isasetofrelationalinstances,oneforeachrelation
Twoconventions:1. Wecallrelationaldatabaseinstancesassimplydatabases2. Weassumeallinstancesarevalid,i.e.,satisfythedomainconstraints
Lecture16>Section1>TheRelationalModel
RemembertheCMS
• RelationDBSchema• Students(sid:string,name:string,gpa:float)• Courses(cid:string,cname:string,credits:int)• Enrolled(sid:string,cid:string,grade:string)
Sid Name Gpa101 Bob 3.2123 Mary 3.8
Students
cid cname credits564 564-2 4308 417 2
Coursessid cid Grade123 564 A
Enrolled
RelationInstances
14
Lecture16>Section1>TheRelationalModel
Notethattheschemasimposeeffectivedomain/typeconstraints,i.e.Gpacan’tbe“Apple”
2nd PartoftheModel:Querying
“FindnamesofallstudentswithGPA>3.5”
Wedon’ttellthesystem howorwhere togetthedata- justwhatwewant,i.e.,Queryingisdeclarative
Actually,Ishowedhowtodothistranslationforamuchricherlanguage!
Lecture16>Section1>TheRelationalModel
SELECT S.nameFROM Students SWHERE S.gpa > 3.5;
Tomakethishappen,weneedtotranslatethedeclarativequeryintoaseriesofoperators…we’llseethisnext!
Virtuesofthemodel
• Physicalindependence(logicaltoo),Declarative
• Simple,elegantclean:Everythingisarelation
• Whydidittakemultipleyears?• Doubteditcouldbedoneefficiently.
Lecture16>Section1>TheRelationalModel
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
Declarativequery(fromuser)
Translatetorelationalalgebraexpresson
Findlogicallyequivalent- butmoreefficient- RAexpression
Executeeachoperatoroftheoptimizedplan!
Lecture16>Section1 >RelationalAlgebra
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
RelationalAlgebraallowsustotranslatedeclarative(SQL)queriesintopreciseandoptimizable expressions!
Lecture16>Section1 >RelationalAlgebra
• Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-
• Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division
RelationalAlgebra(RA)
We’lllookatthesefirst!
Andalsoatoneexampleofaderivedoperator(naturaljoin)andaspecialoperator(renaming)
Lecture16>Section1 >RelationalAlgebra
Keepinmind:RAoperatesonsets!
• RDBMSsusemultisets,howeverinrelationalalgebraformalismwewillconsidersets!
• Also:wewillconsiderthenamedperspective,whereeveryattributemusthaveauniquename• àattributeorderdoesnotmatter…
Lecture16>Section1 >RelationalAlgebra
NowontothebasicRAoperators…
• Returnsalltupleswhichsatisfyacondition• Notation: sc(R)• Examples• sSalary >40000 (Employee)• sname =“Smith” (Employee)
• Theconditionccanbe=,<,£,>,³,<>
1.Selection(𝜎)
SELECT *FROM StudentsWHERE gpa > 3.5;
SQL:
RA:𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
Lecture16>Section1 >RelationalAlgebra
sSalary >40000 (Employee)
SSN Name Salary1234545 John 2000005423341 Smith 6000004352342 Fred 500000
SSN Name Salary5423341 Smith 6000004352342 Fred 500000
Anotherexample:
Lecture16>Section1 >RelationalAlgebra
• Eliminatescolumns,thenremovesduplicates• Notation:P A1,…,An (R)• Example:projectsocial-securitynumberandnames:• P SSN,Name (Employee)• Outputschema:Answer(SSN,Name)
2.Projection(Π)
SELECT DISTINCTsname,gpa
FROM Students;
SQL:
RA:Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
Lecture16>Section1 >RelationalAlgebra
P Name,Salary (Employee)
SSN Name Salary1234545 John 2000005423341 John 6000004352342 John 200000
Name SalaryJohn 200000John 600000
Anotherexample:
Lecture16>Section1 >RelationalAlgebra
NotethatRAOperatorsareCompositional!
SELECT DISTINCTsname,gpa
FROM StudentsWHERE gpa > 3.5;
Students(sid,sname,gpa)
HowdowerepresentthisqueryinRA?
Π45$67,"#$(𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))
𝜎"#$&'.)(Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))
Aretheselogicallyequivalent?
Lecture16>Section1 >RelationalAlgebra
• EachtupleinR1witheachtupleinR2• Notation:R1´ R2• Example:• Employee´ Dependents
• Rareinpractice;mainlyusedtoexpressjoins
3.Cross-Product(×)
SELECT *FROM Students, People;
SQL:
RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,sname,gpa)People(ssn,pname,address)
Lecture16>Section1 >RelationalAlgebra
ssn pname address1234545 John 216 Rosse
5423341 Bob 217 Rosse
sid sname gpa001 John 3.4
002 Bob 1.3
𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒
×
ssn pname address sid sname gpa1234545 John 216 Rosse 001 John 3.4
5423341 Bob 217 Rosse 001 John 3.4
1234545 John 216 Rosse 002 Bob 1.3
5423341 Bob 216 Rosse 002 Bob 1.3
People StudentsAnotherexample:
Lecture16>Section1 >RelationalAlgebra
• Changestheschema,nottheinstance• A‘special’operator- neitherbasicnorderived• Notation:r B1,…,Bn (R)
• Note:thisisshorthandfortheproperform(sincenames,notordermatters!):• r A1àB1,…,AnàBn (R)
Renaming(𝜌)
SELECTsid AS studId,sname AS name,gpa AS gradePtAvg
FROM Students;
SQL:
RA:𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
Wecareaboutthisoperatorbecause weareworkinginanamedperspective
Lecture16>Section1 >RelationalAlgebra
sid sname gpa001 John 3.4
002 Bob 1.3
𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students
studId name gradePtAvg001 John 3.4
002 Bob 1.3
Students
Anotherexample:
Lecture16>Section1 >RelationalAlgebra
• Notation:R1⋈R2
• JoinsR1 andR2 onequalityofallsharedattributes• IfR1 hasattributesetA,andR2 hasattributesetB,andtheyshareattributesA⋂B=C,canalsobewritten:R1⋈ 𝐶R2
• OurfirstexampleofaderivedRA operator:• Meaning:R1⋈ R2 =PAUB(sC=D(𝜌J→L(R1)´ R2))• Where:
• Therename𝜌J→L renamesthesharedattributesinoneoftherelations
• TheselectionsC=Dchecksequalityofthesharedattributes• TheprojectionPAUBeliminatestheduplicate
commonattributes
NaturalJoin(⋈)
SELECT DISTINCTssid, S.name, gpa,ssn, address
FROM Students S,People P
WHERE S.name = P.name;
SQL:
RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,name,gpa)People(ssn,name,address)
Lecture16>Section1 >RelationalAlgebra
ssn P.name address1234545 John 216 Rosse
5423341 Bob 217 Rosse
sid S.name gpa001 John 3.4
002 Bob 1.3
𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒
⋈
sid S.name gpa ssn address001 John 3.4 1234545 216 Rosse
002 Bob 1.3 5423341 216 Rosse
PeoplePStudentsSAnotherexample:
Lecture16>Section1 >RelationalAlgebra
NaturalJoin
• GivenschemasR(A,B,C,D),S(A,C,E),whatistheschemaofR⋈S?
• GivenR(A,B,C),S(D,E),whatisR⋈S?
• GivenR(A,B),S(A,B),whatisR⋈S?
Lecture16>Section1 >RelationalAlgebra
Example:ConvertingSFWQuery->RA
SELECT DISTINCTgpa,address
FROM Students S,People P
WHERE gpa > 3.5 ANDsname = pname;
HowdowerepresentthisqueryinRA?
Π"#$,$AAC744(𝜎"#$&'.)(𝑆 ⋈ 𝑃))
Lecture16>Section1 >RelationalAlgebra
Students(sid,sname,gpa)People(ssn,sname,address)
LogicalEquivalece ofRAPlans
• GivenrelationsR(A,B)andS(B,C):
• Here,projection&selectioncommute:• 𝜎EM)(ΠE(𝑅)) = ΠE(𝜎EM)(𝑅))
• Whatabouthere?• 𝜎EM)(ΠP(𝑅))?= ΠP(𝜎EM)(𝑅))
We’lllookatthisinmoredepthlaterinthelecture…
Lecture16>Section1 >RelationalAlgebra
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
WesawhowwecantransformdeclarativeSQLqueriesintoprecise,compositionalRAplans
Lecture16>Section1 >RelationalAlgebra
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
We’lllookathowtothenoptimizetheseplanslaterinthislecture
Lecture16>Section1 >RelationalAlgebra
RDBMSArchitecture
HowistheRA“plan”executed?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
Wealreadyknowhowtoexecuteallthebasicoperators!
Lecture16>Section1 >RelationalAlgebra
RAPlanExecution
• NaturalJoin/Join:• Wesawhowtousememory&IOcostconsiderationstopickthecorrectalgorithmtoexecuteajoin with(BNLJ,SMJ,HJ…)!
• Selection:• Wesawhowtouseindexestoaidselection• Canalwaysfallbackonscan/binarysearchaswell
• Projection:• Themainoperationhereisfindingdistinctvaluesoftheprojecttuples;webrieflydiscussedhowtodothiswithe.g.hashingorsorting
Wealreadyknowhowtoexecuteallthebasicoperators!
Lecture16>Section1 >RelationalAlgebra
Whatyouwilllearnaboutinthissection
1. SetOperationsinRA
2. FancierRA
3. Extensions&Limitations
42
Lecture16>Section2
• Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-
• Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division
RelationalAlgebra(RA)
We’lllookatthese
Andalsoatsomeofthesederivedoperators
Lecture16>Section2
1.Union(È) and2.Difference(–)
• R1È R2• Example:• ActiveEmployeesÈ RetiredEmployees
• R1– R2• Example:• AllEmployees -- RetiredEmployees
R1 R2
R1 R2
Lecture16>Section2 >SetOperations
WhataboutIntersection(Ç) ?
• Itisaderivedoperator• R1Ç R2=R1– (R1– R2)• Alsoexpressedasajoin!• Example
• UnionizedEmployeesÇ RetiredEmployees
R1 R2
Lecture16>Section2 >SetOperations
ThetaJoin(⋈q)
• Ajointhatinvolvesapredicate• R1⋈q R2=s q (R1´ R2)• Hereq canbeanycondition
SELECT *FROM
Students,PeopleWHERE q;
SQL:
RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈R 𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,sname,gpa)People(ssn,pname,address)
Notethatnaturaljoinisathetajoin+aprojection.
Lecture16>Section2>FancierRA
Equi-join(⋈A=B)
• Athetajoinwhereq isanequality• R1⋈A=B R2=s A=B (R1´ R2)• Example:• Employee⋈SSN=SSN Dependents
SELECT *FROM
Students S,People P
WHERE sname = pname;
SQL:
RA:𝑆 ⋈45$67M#5$67 𝑃
Students(sid,sname,gpa)People(ssn,pname,address)
Mostcommonjoininpractice!
Lecture16>Section2>FancierRA
Semijoin (⋉)
• R⋉ S=P A1,…,An (R⋈ S)• WhereA1,…,An aretheattributesinR• Example:• Employee⋉Dependents
SELECT DISTINCTsid,sname,gpa
FROM Students,People
WHEREsname = pname;
SQL:
RA:
𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋉ 𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,sname,gpa)People(ssn,pname,address)
Lecture16>Section2>FancierRA
SemijoinsinDistributedDatabases• Semijoins areoftenusedtocomputenaturaljoinsindistributeddatabases
SSN Name. . . . . .
SSN Dname Age. . . . . .
Employee
Dependents
network
Employee ⋈ssn=ssn (s age>71 (Dependents))
T = P SSN s age>71 (Dependents)R = Employee ⋉T
Answer = R ⋈Dependents
Sendlessdatatoreducenetworkbandwidth!
Lecture16>Section2>FancierRA
RAExpressionsCanGetComplex!
PersonPurchasePersonProduct
sname=fred sname=gizmo
P pidP ssn
seller-ssn=ssn
pid=pid
buyer-ssn=ssn
P name
Lecture16>Section2>FancierRA
RecallthatSQLusesMultisets
53
Tuple
(1,a)
(1,a)
(1, b)
(2,c)
(2,c)
(2,c)
(1,d)
(1,d)
Tuple 𝝀(𝑿)
(1,a) 2
(1,b) 1
(2,c) 3
(1, d) 2EquivalentRepresentationsofaMultiset
Multiset X
Multiset X
Note:Inasetallcountsare{0,1}.
𝝀 𝑿 =“CountoftupleinX”(Itemsnotlistedhaveimplicitcount0)
Lecture16>Section2>Extensions&Limitations
GeneralizingSetOperationstoMultisetOperations
54
Tuple 𝝀(𝑿)
(1,a) 2
(1,b) 0
(2,c) 3
(1, d) 0
Multiset X
Tuple 𝝀(𝒀)
(1,a) 5
(1,b) 1
(2,c) 2
(1, d) 2
Multiset Y
Tuple 𝝀(𝒁)
(1,a) 2
(1,b) 0
(2,c) 2
(1, d) 0
Multiset Z
∩ =
𝝀 𝒁 = 𝒎𝒊𝒏(𝝀 𝑿 , 𝝀 𝒀 )Forsets,thisisintersection
Lecture16>Section2>Extensions&Limitations
55
Tuple 𝝀(𝑿)
(1,a) 2
(1,b) 0
(2,c) 3
(1, d) 0
Multiset X
Tuple 𝝀(𝒀)
(1,a) 5
(1,b) 1
(2,c) 2
(1, d) 2
Multiset Y
Tuple 𝝀(𝒁)
(1,a) 7
(1,b) 1
(2,c) 5
(1, d) 2
Multiset Z
∪ =
𝝀 𝒁 = 𝝀 𝑿 + 𝝀 𝒀Forsets,
thisisunion
GeneralizingSetOperationstoMultisetOperations
Lecture16>Section2>Extensions&Limitations
OperationsonMultisets
AllRAoperationsneedtobedefinedcarefullyonbags
• sC(R):preservethenumberofoccurrences
• PA(R):noduplicateelimination
• Cross-product,join:noduplicateelimination
Thisisimportant- relationalenginesworkonmultisets,notsets!
Lecture16>Section2>Extensions&Limitations