The Relational Model - Stanford...

57
The Relational Model Lecture 16

Transcript of The Relational Model - Stanford...

TheRelationalModel

Lecture16

Today’sLecture

1. TheRelationalModel&RelationalAlgebra

2. RelationalAlgebraPt.II[Optional:mayskip]

2

Lecture16

1.TheRelationalModel&RelationalAlgebra

3

Lecture16>Section1

Whatyouwilllearnaboutinthissection

1. TheRelationalModel

2. RelationalAlgebra:BasicOperators

3. Execution

4. ACTIVITY:FromSQLtoRA&Back

4

Lecture16>Section1

Motivation

TheRelationalmodelisprecise,implementable,andwecanoperateonit

(query/update,etc.)

Databasemapsinternallyintothisprocedurallanguage.

Lecture16>Section1>TheRelationalModel

ALittleHistory

• RelationalmodelduetoEdgar“Ted”Codd,amathematicianatIBMin1970• ARelationalModelofDataforLargeSharedDataBanks". CommunicationsoftheACM 13 (6):377–387

• IBMdidn’twanttouserelationalmodel(takemoneyfromIMS)• Apparentlyusedinthemoonlanding…

WonTuringaward1981

Lecture16>Section1>TheRelationalModel

TheRelationalModel:Schemata

• RelationalSchema:

Lecture16>Section1>TheRelationalModel

Students(sid: string, name: string, gpa: float)

AttributesString,float,int,etc.arethedomains oftheattributes

Relationname

8

TheRelationalModel:Data

sid name gpa

001 Bob 3.2

002 Joe 2.8

003 Mary 3.8

004 Alice 3.5

Student

Anattribute (orcolumn)isatypeddataentrypresentineachtupleintherelation

Thenumberofattributesisthearity oftherelation

Lecture16>Section1>TheRelationalModel

9

TheRelationalModel:Data

sid name gpa

001 Bob 3.2

002 Joe 2.8

003 Mary 3.8

004 Alice 3.5

Student

Atuple orrow (orrecord)isasingleentryinthetablehavingtheattributesspecifiedbytheschema

Thenumberoftuplesisthecardinality oftherelation

Lecture16>Section1>TheRelationalModel

10

TheRelationalModel:DataStudent

Arelationalinstance isaset oftuplesallconformingtothesameschema

Recall:InpracticeDBMSsrelaxthesetrequirement,andusemultisets.

sid name gpa

001 Bob 3.2

002 Joe 2.8

003 Mary 3.8

004 Alice 3.5

Lecture16>Section1>TheRelationalModel

• Arelationalschema describesthedatathatiscontainedinarelationalinstance

ToReiterate

LetR(f1:Dom1,…,fm:Domm)bearelationalschema then,aninstanceofRisasubsetofDom1 xDom2 x…xDomn

Lecture16>Section1>TheRelationalModel

Inthisway,arelationalschema Risatotalfunctionfromattributenames totypes

• Arelationalschema describesthedatathatiscontainedinarelationalinstance

OneMoreTime

ArelationRofarity t isafunction:R:Dom1 x…xDomt à {0,1}

Lecture16>Section1>TheRelationalModel

Then,theschemaissimplythesignatureofthefunction

I.e.returnswhetherornotatupleofmatchingtypesisamemberofit

Noteherethatordermatters,attributenamedoesn’t…We’ll(mostly)workwiththeothermodel(lastslide)in

whichattributenamematters,orderdoesn’t!

Arelationaldatabase

• Arelationaldatabaseschema isasetofrelationalschemata,oneforeachrelation

• Arelationaldatabaseinstance isasetofrelationalinstances,oneforeachrelation

Twoconventions:1. Wecallrelationaldatabaseinstancesassimplydatabases2. Weassumeallinstancesarevalid,i.e.,satisfythedomainconstraints

Lecture16>Section1>TheRelationalModel

RemembertheCMS

• RelationDBSchema• Students(sid:string,name:string,gpa:float)• Courses(cid:string,cname:string,credits:int)• Enrolled(sid:string,cid:string,grade:string)

Sid Name Gpa101 Bob 3.2123 Mary 3.8

Students

cid cname credits564 564-2 4308 417 2

Coursessid cid Grade123 564 A

Enrolled

RelationInstances

14

Lecture16>Section1>TheRelationalModel

Notethattheschemasimposeeffectivedomain/typeconstraints,i.e.Gpacan’tbe“Apple”

2nd PartoftheModel:Querying

“FindnamesofallstudentswithGPA>3.5”

Wedon’ttellthesystem howorwhere togetthedata- justwhatwewant,i.e.,Queryingisdeclarative

Actually,Ishowedhowtodothistranslationforamuchricherlanguage!

Lecture16>Section1>TheRelationalModel

SELECT S.nameFROM Students SWHERE S.gpa > 3.5;

Tomakethishappen,weneedtotranslatethedeclarativequeryintoaseriesofoperators…we’llseethisnext!

Virtuesofthemodel

• Physicalindependence(logicaltoo),Declarative

• Simple,elegantclean:Everythingisarelation

• Whydidittakemultipleyears?• Doubteditcouldbedoneefficiently.

Lecture16>Section1>TheRelationalModel

RelationalAlgebra

Lecture16>Section1 >RelationalAlgebra

RDBMSArchitecture

HowdoesaSQLenginework?

SQLQuery

RelationalAlgebra(RA)

Plan

OptimizedRAPlan Execution

Declarativequery(fromuser)

Translatetorelationalalgebraexpresson

Findlogicallyequivalent- butmoreefficient- RAexpression

Executeeachoperatoroftheoptimizedplan!

Lecture16>Section1 >RelationalAlgebra

RDBMSArchitecture

HowdoesaSQLenginework?

SQLQuery

RelationalAlgebra(RA)

Plan

OptimizedRAPlan Execution

RelationalAlgebraallowsustotranslatedeclarative(SQL)queriesintopreciseandoptimizable expressions!

Lecture16>Section1 >RelationalAlgebra

• Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-

• Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division

RelationalAlgebra(RA)

We’lllookatthesefirst!

Andalsoatoneexampleofaderivedoperator(naturaljoin)andaspecialoperator(renaming)

Lecture16>Section1 >RelationalAlgebra

Keepinmind:RAoperatesonsets!

• RDBMSsusemultisets,howeverinrelationalalgebraformalismwewillconsidersets!

• Also:wewillconsiderthenamedperspective,whereeveryattributemusthaveauniquename• àattributeorderdoesnotmatter…

Lecture16>Section1 >RelationalAlgebra

NowontothebasicRAoperators…

• Returnsalltupleswhichsatisfyacondition• Notation: sc(R)• Examples• sSalary >40000 (Employee)• sname =“Smith” (Employee)

• Theconditionccanbe=,<,£,>,³,<>

1.Selection(𝜎)

SELECT *FROM StudentsWHERE gpa > 3.5;

SQL:

RA:𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

Lecture16>Section1 >RelationalAlgebra

sSalary >40000 (Employee)

SSN Name Salary1234545 John 2000005423341 Smith 6000004352342 Fred 500000

SSN Name Salary5423341 Smith 6000004352342 Fred 500000

Anotherexample:

Lecture16>Section1 >RelationalAlgebra

• Eliminatescolumns,thenremovesduplicates• Notation:P A1,…,An (R)• Example:projectsocial-securitynumberandnames:• P SSN,Name (Employee)• Outputschema:Answer(SSN,Name)

2.Projection(Π)

SELECT DISTINCTsname,gpa

FROM Students;

SQL:

RA:Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

Lecture16>Section1 >RelationalAlgebra

P Name,Salary (Employee)

SSN Name Salary1234545 John 2000005423341 John 6000004352342 John 200000

Name SalaryJohn 200000John 600000

Anotherexample:

Lecture16>Section1 >RelationalAlgebra

NotethatRAOperatorsareCompositional!

SELECT DISTINCTsname,gpa

FROM StudentsWHERE gpa > 3.5;

Students(sid,sname,gpa)

HowdowerepresentthisqueryinRA?

Π45$67,"#$(𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

𝜎"#$&'.)(Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

Aretheselogicallyequivalent?

Lecture16>Section1 >RelationalAlgebra

• EachtupleinR1witheachtupleinR2• Notation:R1´ R2• Example:• Employee´ Dependents

• Rareinpractice;mainlyusedtoexpressjoins

3.Cross-Product(×)

SELECT *FROM Students, People;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

Lecture16>Section1 >RelationalAlgebra

ssn pname address1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid sname gpa001 John 3.4

002 Bob 1.3

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

×

ssn pname address sid sname gpa1234545 John 216 Rosse 001 John 3.4

5423341 Bob 217 Rosse 001 John 3.4

1234545 John 216 Rosse 002 Bob 1.3

5423341 Bob 216 Rosse 002 Bob 1.3

People StudentsAnotherexample:

Lecture16>Section1 >RelationalAlgebra

• Changestheschema,nottheinstance• A‘special’operator- neitherbasicnorderived• Notation:r B1,…,Bn (R)

• Note:thisisshorthandfortheproperform(sincenames,notordermatters!):• r A1àB1,…,AnàBn (R)

Renaming(𝜌)

SELECTsid AS studId,sname AS name,gpa AS gradePtAvg

FROM Students;

SQL:

RA:𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students(sid,sname,gpa)

Wecareaboutthisoperatorbecause weareworkinginanamedperspective

Lecture16>Section1 >RelationalAlgebra

sid sname gpa001 John 3.4

002 Bob 1.3

𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

Students

studId name gradePtAvg001 John 3.4

002 Bob 1.3

Students

Anotherexample:

Lecture16>Section1 >RelationalAlgebra

• Notation:R1⋈R2

• JoinsR1 andR2 onequalityofallsharedattributes• IfR1 hasattributesetA,andR2 hasattributesetB,andtheyshareattributesA⋂B=C,canalsobewritten:R1⋈ 𝐶R2

• OurfirstexampleofaderivedRA operator:• Meaning:R1⋈ R2 =PAUB(sC=D(𝜌J→L(R1)´ R2))• Where:

• Therename𝜌J→L renamesthesharedattributesinoneoftherelations

• TheselectionsC=Dchecksequalityofthesharedattributes• TheprojectionPAUBeliminatestheduplicate

commonattributes

NaturalJoin(⋈)

SELECT DISTINCTssid, S.name, gpa,ssn, address

FROM Students S,People P

WHERE S.name = P.name;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,name,gpa)People(ssn,name,address)

Lecture16>Section1 >RelationalAlgebra

ssn P.name address1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid S.name gpa001 John 3.4

002 Bob 1.3

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

sid S.name gpa ssn address001 John 3.4 1234545 216 Rosse

002 Bob 1.3 5423341 216 Rosse

PeoplePStudentsSAnotherexample:

Lecture16>Section1 >RelationalAlgebra

NaturalJoin

• GivenschemasR(A,B,C,D),S(A,C,E),whatistheschemaofR⋈S?

• GivenR(A,B,C),S(D,E),whatisR⋈S?

• GivenR(A,B),S(A,B),whatisR⋈S?

Lecture16>Section1 >RelationalAlgebra

Example:ConvertingSFWQuery->RA

SELECT DISTINCTgpa,address

FROM Students S,People P

WHERE gpa > 3.5 ANDsname = pname;

HowdowerepresentthisqueryinRA?

Π"#$,$AAC744(𝜎"#$&'.)(𝑆 ⋈ 𝑃))

Lecture16>Section1 >RelationalAlgebra

Students(sid,sname,gpa)People(ssn,sname,address)

LogicalEquivalece ofRAPlans

• GivenrelationsR(A,B)andS(B,C):

• Here,projection&selectioncommute:• 𝜎EM)(ΠE(𝑅)) = ΠE(𝜎EM)(𝑅))

• Whatabouthere?• 𝜎EM)(ΠP(𝑅))?= ΠP(𝜎EM)(𝑅))

We’lllookatthisinmoredepthlaterinthelecture…

Lecture16>Section1 >RelationalAlgebra

RDBMSArchitecture

HowdoesaSQLenginework?

SQLQuery

RelationalAlgebra(RA)

Plan

OptimizedRAPlan Execution

WesawhowwecantransformdeclarativeSQLqueriesintoprecise,compositionalRAplans

Lecture16>Section1 >RelationalAlgebra

RDBMSArchitecture

HowdoesaSQLenginework?

SQLQuery

RelationalAlgebra(RA)

Plan

OptimizedRAPlan Execution

We’lllookathowtothenoptimizetheseplanslaterinthislecture

Lecture16>Section1 >RelationalAlgebra

RDBMSArchitecture

HowistheRA“plan”executed?

SQLQuery

RelationalAlgebra(RA)

Plan

OptimizedRAPlan Execution

Wealreadyknowhowtoexecuteallthebasicoperators!

Lecture16>Section1 >RelationalAlgebra

RAPlanExecution

• NaturalJoin/Join:• Wesawhowtousememory&IOcostconsiderationstopickthecorrectalgorithmtoexecuteajoin with(BNLJ,SMJ,HJ…)!

• Selection:• Wesawhowtouseindexestoaidselection• Canalwaysfallbackonscan/binarysearchaswell

• Projection:• Themainoperationhereisfindingdistinctvaluesoftheprojecttuples;webrieflydiscussedhowtodothiswithe.g.hashingorsorting

Wealreadyknowhowtoexecuteallthebasicoperators!

Lecture16>Section1 >RelationalAlgebra

Activity-16-1.ipynb

40

Lecture16>Section1>ACTIVITY

2.Adv.RelationalAlgebra

41

Lecture16>Section2

Whatyouwilllearnaboutinthissection

1. SetOperationsinRA

2. FancierRA

3. Extensions&Limitations

42

Lecture16>Section2

• Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-

• Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division

RelationalAlgebra(RA)

We’lllookatthese

Andalsoatsomeofthesederivedoperators

Lecture16>Section2

1.Union(È) and2.Difference(–)

• R1È R2• Example:• ActiveEmployeesÈ RetiredEmployees

• R1– R2• Example:• AllEmployees -- RetiredEmployees

R1 R2

R1 R2

Lecture16>Section2 >SetOperations

WhataboutIntersection(Ç) ?

• Itisaderivedoperator• R1Ç R2=R1– (R1– R2)• Alsoexpressedasajoin!• Example

• UnionizedEmployeesÇ RetiredEmployees

R1 R2

Lecture16>Section2 >SetOperations

FancierRA

Lecture16>Section2>FancierRA

ThetaJoin(⋈q)

• Ajointhatinvolvesapredicate• R1⋈q R2=s q (R1´ R2)• Hereq canbeanycondition

SELECT *FROM

Students,PeopleWHERE q;

SQL:

RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈R 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

Notethatnaturaljoinisathetajoin+aprojection.

Lecture16>Section2>FancierRA

Equi-join(⋈A=B)

• Athetajoinwhereq isanequality• R1⋈A=B R2=s A=B (R1´ R2)• Example:• Employee⋈SSN=SSN Dependents

SELECT *FROM

Students S,People P

WHERE sname = pname;

SQL:

RA:𝑆 ⋈45$67M#5$67 𝑃

Students(sid,sname,gpa)People(ssn,pname,address)

Mostcommonjoininpractice!

Lecture16>Section2>FancierRA

Semijoin (⋉)

• R⋉ S=P A1,…,An (R⋈ S)• WhereA1,…,An aretheattributesinR• Example:• Employee⋉Dependents

SELECT DISTINCTsid,sname,gpa

FROM Students,People

WHEREsname = pname;

SQL:

RA:

𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋉ 𝑃𝑒𝑜𝑝𝑙𝑒

Students(sid,sname,gpa)People(ssn,pname,address)

Lecture16>Section2>FancierRA

SemijoinsinDistributedDatabases• Semijoins areoftenusedtocomputenaturaljoinsindistributeddatabases

SSN Name. . . . . .

SSN Dname Age. . . . . .

Employee

Dependents

network

Employee ⋈ssn=ssn (s age>71 (Dependents))

T = P SSN s age>71 (Dependents)R = Employee ⋉T

Answer = R ⋈Dependents

Sendlessdatatoreducenetworkbandwidth!

Lecture16>Section2>FancierRA

RAExpressionsCanGetComplex!

PersonPurchasePersonProduct

sname=fred sname=gizmo

P pidP ssn

seller-ssn=ssn

pid=pid

buyer-ssn=ssn

P name

Lecture16>Section2>FancierRA

Multisets

Lecture16>Section2>Extensions&Limitations

RecallthatSQLusesMultisets

53

Tuple

(1,a)

(1,a)

(1, b)

(2,c)

(2,c)

(2,c)

(1,d)

(1,d)

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 1

(2,c) 3

(1, d) 2EquivalentRepresentationsofaMultiset

Multiset X

Multiset X

Note:Inasetallcountsare{0,1}.

𝝀 𝑿 =“CountoftupleinX”(Itemsnotlistedhaveimplicitcount0)

Lecture16>Section2>Extensions&Limitations

GeneralizingSetOperationstoMultisetOperations

54

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 0

(2,c) 3

(1, d) 0

Multiset X

Tuple 𝝀(𝒀)

(1,a) 5

(1,b) 1

(2,c) 2

(1, d) 2

Multiset Y

Tuple 𝝀(𝒁)

(1,a) 2

(1,b) 0

(2,c) 2

(1, d) 0

Multiset Z

∩ =

𝝀 𝒁 = 𝒎𝒊𝒏(𝝀 𝑿 , 𝝀 𝒀 )Forsets,thisisintersection

Lecture16>Section2>Extensions&Limitations

55

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 0

(2,c) 3

(1, d) 0

Multiset X

Tuple 𝝀(𝒀)

(1,a) 5

(1,b) 1

(2,c) 2

(1, d) 2

Multiset Y

Tuple 𝝀(𝒁)

(1,a) 7

(1,b) 1

(2,c) 5

(1, d) 2

Multiset Z

∪ =

𝝀 𝒁 = 𝝀 𝑿 + 𝝀 𝒀Forsets,

thisisunion

GeneralizingSetOperationstoMultisetOperations

Lecture16>Section2>Extensions&Limitations

OperationsonMultisets

AllRAoperationsneedtobedefinedcarefullyonbags

• sC(R):preservethenumberofoccurrences

• PA(R):noduplicateelimination

• Cross-product,join:noduplicateelimination

Thisisimportant- relationalenginesworkonmultisets,notsets!

Lecture16>Section2>Extensions&Limitations

RAhasLimitations!

• Cannotcompute“transitiveclosure”

• FindalldirectandindirectrelativesofFred• CannotexpressinRA!!!

• NeedtowriteCprogram,useagraphengine,ormodernSQL…

Name1 Name2 RelationshipFred Mary FatherMary Joe CousinMary Bill SpouseNancy Lou Sister

Lecture16>Section2>Extensions&Limitations