CompSci 516DataIntensiveComputingSystems
Lecture2SQL- 1
Instructor:Sudeepa Roy
1DukeCS,Fall2017 CompSci516:DatabaseSystems
Announcements• Ifyouareenrolledtotheclass,buthavenotreceivedtheemail
fromPiazza,pleasesendmeanemail• TAofficehoursinLSRCD301• HW1postedonsakai
– Dueon09/21(Thurs),11:55pm.Startearly– nolatedays• Markyourcalendarformidtermdays:10/11,11/29
– Midtermswon’tbeheldlaterifyoumissanexam– Missedexampolicy:gradewillbescaledafter20%deductionfromthe
otherexam(deductionwaivedwithwrittenlettersfromDukeauthority)– Bothexamsmissed->incompletegrade
• Yourgradiance accountshouldbeactivebynow– Bringalaptopinclassforoccasionalpopupquizzes
• Lecture“notes”(incomplete)willbeuploadedbeforetheclass,“slides”(completed)aftertheclass
DukeCS,Fall2017 CompSci516:DatabaseSystems 2
Recap:Lecture1
• WhyuseaDBMS• Structureddatamodel:Relationaldatamodel– table,schema,instance,tuples,attributes– bagandsetsemantic
• Logicalandphysicaldataindependence
DukeCS,Fall2017 CompSci516:DatabaseSystems 3
Today’stopic• OverviewofXML• SQLinanutshell– Readingmaterial:[RG]Chapters3and5– Additionalreadingforpractice:[GUW]Chapter6
• TrySQLfromtoday’slectureonatoydatasetorDBLPdatasetonPostGres!
DukeCS,Fall2017 CompSci516:DatabaseSystems 4
Acknowledgement:Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.
XML:anoverview
DukeCS,Fall2017 CompSci516:DatabaseSystems 5
Semi-structuredDataandXML
• XML:ExtensibleMarkupLanguage
• Willnotbecoveredindetailinclass,butmanydatasetsavailabletodownloadareinthisform– YouwilldownloadtheDBLPdatasetinXMLformatandtransformintorelationalform(inHW1)
• Datadoesnothaveafixedschema– “Attributes”arepartofthedata– Thedatais“self-describing”– Tree-structured
DukeCS,Fall2017 CompSci516:DatabaseSystems 6
XML:Example<articlemdate="2011-01-11”key="journals/acta/Saxena96">
<author>SanjeevSaxena</author><title>ParallelIntegerSortingandSimulationAmongstCRCW
Models.</title><pages>607-619</pages><year>1996</year><volume>33</volume><journal>Acta Inf.</journal><number>7</number><url>db/journals/acta/acta33.html#Saxena96</url><ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
DukeCS,Fall2017 CompSci516:DatabaseSystems 7
Attributes
Elements
Attributevs.Elements
• Elementscanberepeatedandnested• Attributesareuniqueandatomic
DukeCS,Fall2017 CompSci516:DatabaseSystems 8
WhyXML?
+ Servesasamodelsuitableforintegrationofdatabasescontainingsimilardatawithdifferentschemas
– e.g.trytointegratetwostudentdatabases:S1(sid,name,gpa)andS2(sid,dept,year)
– Many“nulls”ifdoneinrelationalmodel,veryeasyinXML• NULL=Akeywordtodenotemissingorunknownvalues
+Flexible– easytochangetheschemaanddata
- Makesqueryprocessingmoredifficult
Whichoneiseasier?• XML(semi-structured)torelational(structured)or• relational(structured)toXML(semi-structured)?
DukeCS,Fall2017 CompSci516:DatabaseSystems 9
XMLtoRelationalModel
• Problem1:Repeatedattributes<book>
<author>Ramakrishnan</author><author>Gehrke</author><title>DatabaseManagementSystems</title><publisher>McGrawHill
</book>
Whatisagoodrelationalschema?
DukeCS,Fall2017 CompSci516:DatabaseSystems 10
XMLtoRelationalModel
• Problem1:Repeatedattributes<book>
<author>Ramakrishnan</author><author>Gehrke</author><title>DatabaseManagementSystems</title><pubisher>McGrawHill</publisher>
</book>
DukeCS,Fall2017 CompSci516:DatabaseSystems 11
Title Publisher Author1 Author2
Whatifthepaperhasasingleauthor?
XMLtoRelationalModel
• Problem1:Repeatedattributes<book>
<author>Garcia-Molina</author><author>Ullman</author><author>Widom</author><title>DatabaseSystems– TheCompleteBook</title><pubisher>PrenticeHall</publisher>
</book>
DukeCS,Fall2017 CompSci516:DatabaseSystems 12
Title Publisher Author1 Author2
Doesnotwork
XMLtoRelationalModel
DukeCS,Fall2017 CompSci516:DatabaseSystems 13
BookId Title Publisher
b1 DatabaseManagementSystems
McGrawHill
b2 DatabaseSystems– TheCompleteBook
PrenticeHall
BookBookId Author
b1 Ramakrishnan
b1 Gehrke
b2 Garcia-Molina
b2 Ullman
b2 Widom
BookAuthoredBy
XMLtoRelationalModel
• Problem2:Missingattributes
<book><author>Ramakrishnan</author><author>Gehrke</author><title>DatabaseManagementSystems</title><pubisher>McGrawHill<edition>Third</edition>
</book><book>
<author>Garcia-Molina</author><author>Ullman</author><author>Widom</author><title>DatabaseSystems– TheComplete
Book</title><pubisher>PrenticeHall</publisher>
</book>
DukeCS,Fall2017 CompSci516:DatabaseSystems 14
BookId
Title Publisher Edition
b1 DatabaseManagementSystems
McGrawHill
Third
b2 DatabaseSystems–TheCompleteBook
PrenticeHall
null
Summary:DataModels
• Relationaldatamodelisthemoststandardfordatabasemanagements– andisthemainfocusofthiscourse
• Semi-structuredmodel/XMLisalsousedinpractice– youwillusetheminhwassignments
• Unstructureddata(text/photo/video)isunavoidable,butwon’tbecoveredinthisclass
DukeCS,Fall2017 CompSci516:DatabaseSystems 15
SQL
DukeCS,Fall2017 CompSci516:DatabaseSystems 16
RelationalQueryLanguages
• Amajorstrengthoftherelationalmodel:supportssimple,powerfulquerying ofdata.
• Queriescanbewrittenintuitively,andtheDBMSisresponsibleforanefficientevaluation– Thekey:precisesemanticsforrelationalqueries.– Allowstheoptimizertoextensivelyre-orderoperations,andstillensurethattheanswerdoesnotchange.
DukeCS,Fall2017 CompSci516:DatabaseSystems 17
TheSQLQueryLanguage
• DevelopedbyIBM(systemR)inthe1970s• Needforastandardsinceitisusedbymanyvendors
• Standards:– SQL-86– SQL-89(minorrevision)– SQL-92(majorrevision)– SQL-99(majorextensions,currentstandard)
DukeCS,Fall2017 CompSci516:DatabaseSystems 18
PurposesofSQL
• DataManipulationLanguage(DML)– Querying:SELECT-FROM-WHERE–Modifying:INSERT/DELETE/UPDATE
• DataDefinitionLanguage(DDL)– CREATE/ALTER/DROP
19DukeCS,Fall2017 CompSci516:DatabaseSystems
TheSQLQueryLanguage
• Tofindall18yearoldstudents,wecanwrite:
SELECT *FROM Students SWHERE S.age=18
•To find just names and logins, replace the first line:
SELECT S.name, S.login
sid name login age gpa53666 Jones jones@cs 18 3.453688 Smith smith@ee 18 3.2
DukeCS,Fall2017 CompSci516:DatabaseSystems 20
allattributes
QueryingMultipleRelations• Whatdoesthefollowing
querycompute?SELECT S.name, E.cidFROM Students S, Enrolled EWHERE S.sid=E.sid AND E.grade=“A”
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
Given the following instances of Enrolled and Students:
we get: ??
sid name login age gpa53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
Students
Enrolled
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
QueryingMultipleRelations• Whatdoesthefollowing
querycompute?SELECT S.name, E.cidFROM Students S, Enrolled EWHERE S.sid=E.sid AND E.grade=“A”
S.name E.cid Smith Topology112
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
Given the following instances of Enrolled and Students:
we get:
sid name login age gpa53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
Students
Enrolled
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
CreatingRelationsinSQL• Createsthe“Students”relation
– thetype(domain)ofeachfieldisspecified
– enforcedbytheDBMSwhenevertuplesareaddedormodified
• Asanotherexample,the“Enrolled”tableholdsinformationaboutcoursesthatstudentstake
CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10),age INTEGER,gpa REAL)
CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2))
DukeCS,Fall2017 CompSci516:DatabaseSystems 23
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
sid name login age gpa53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
StudentsEnrolled
DestroyingandAlteringRelations
• DestroystherelationStudents– Theschemainformationand thetuplesaredeleted.
DROP TABLE Students
• The schema of Students is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field.
ALTER TABLE Students ADD COLUMN firstYear: integer
DukeCS,Fall2017 CompSci516:DatabaseSystems 24
AddingandDeletingTuples
• Caninsertasingletupleusing:INSERT INTO Students (sid, name, login, age, gpa)VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)
• Can delete all tuples satisfying some condition (e.g., name = Smith):
DELETEFROM Students SWHERE S.name = ‘Smith’
DukeCS,Fall2017 CompSci516:DatabaseSystems 25
IntegrityConstraints(ICs)
• IC: conditionthatmustbetrueforanyinstanceofthedatabase– e.g.,domainconstraints– ICsarespecifiedwhenschemaisdefined– ICsarecheckedwhenrelationsaremodified
• AlegalinstanceofarelationisonethatsatisfiesallspecifiedICs– DBMSwillnotallowillegalinstances
• IftheDBMSchecksICs,storeddataismorefaithfultoreal-worldmeaning– Avoidsdataentryerrors,too!
DukeCS,Fall2017 CompSci516:DatabaseSystems 26
KeysinaDatabase
• Key/CandidateKey• PrimaryKey• SuperKey• ForeignKey
• Primarykeyattributesareunderlined inaschema– Person(pid,address,name)– Person2(address,name,age,job)
DukeCS,Fall2017 CompSci516:DatabaseSystems 27
PrimaryKeyConstraints
• Asetoffieldsisakey forarelationif:1.Notwodistincttuplescanhavesamevaluesinallkeyfields,and
2.Thisisnottrueforanysubsetofthekey
• Part2false?Asuperkey• Ifthereare>1keysforarelation,oneofthekeysischosen(byDBA=DBadmin)tobetheprimarykey– E.g.,sidisakeyforStudents– Theset{sid,gpa}isasuperkey.
• Isthereanypossiblebenefittorefertoatupleusingprimarykey(thananykey)?
DukeCS,Fall2017 CompSci516:DatabaseSystems 28
PrimaryandCandidateKeysinSQL
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY ???)
• “For a given student and course, there is a single grade.”
DukeCS,Fall2017 CompSci516:DatabaseSystems 29
• Possiblymanycandidatekeys– specifiedusingUNIQUE– oneofwhichischosenastheprimarykey.
PrimaryandCandidateKeysinSQL
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY (sid,cid) )
DukeCS,Fall2017 CompSci516:DatabaseSystems 30
• Possiblymanycandidatekeys– specifiedusingUNIQUE– oneofwhichischosenastheprimarykey.
• “For a given student and course, there is a single grade.”
PrimaryandCandidateKeysinSQL
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY (sid,cid) )
• “For a given student and course, there is a single grade.”
• vs.
• “Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade.”
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY ???,UNIQUE ??? )
DukeCS,Fall2017 CompSci516:DatabaseSystems 31
• Possiblymanycandidatekeys– specifiedusingUNIQUE– oneofwhichischosenastheprimarykey.
PrimaryandCandidateKeysinSQL
DukeCS,Fall2017 CompSci516:DatabaseSystems 32
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY (sid,cid) )
• “For a given student and course, there is a single grade.”
• vs.
• “Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade.”
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY sid,UNIQUE (cid, grade))
• Possiblymanycandidatekeys– specifiedusingUNIQUE– oneofwhichischosenastheprimarykey.
PrimaryandCandidateKeysinSQL
DukeCS,Fall2017 CompSci516:DatabaseSystems 33
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY (sid,cid) )
• “For a given student and course, there is a single grade.”
• vs.
• “Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade.”
• Used carelessly, an IC can prevent the storage of database instances that arise in practice!
CREATE TABLE Enrolled(sid CHAR(20)cid CHAR(20),grade CHAR(2),PRIMARY KEY sid,UNIQUE (cid, grade))
• Possiblymanycandidatekeys– specifiedusingUNIQUE– oneofwhichischosenastheprimarykey.
ForeignKeys,ReferentialIntegrity
• Foreignkey:Setoffieldsinonerelationthatisusedto`refer’toatupleinanotherrelation– Mustcorrespondtoprimarykeyofthesecondrelation– Likea`logicalpointer’
• E.g.sid isaforeignkeyreferringtoStudents:– Enrolled(sid:string,cid:string,grade:string)– Ifallforeignkeyconstraintsareenforced,referentialintegrity isachieved
– i.e.,nodanglingreferences
DukeCS,Fall2017 CompSci516:DatabaseSystems 34
ForeignKeysinSQL• OnlystudentslistedintheStudentsrelationshouldbeallowedtoenrollforcourses
CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2),PRIMARY KEY (sid,cid),FOREIGN KEY (sid) REFERENCES Students )
sid name login age gpa53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B
EnrolledStudents
DukeCS,Fall2017 CompSci516:DatabaseSystems 35
EnforcingReferentialIntegrity• ConsiderStudentsandEnrolled
– sidinEnrolledisaforeignkeythatreferencesStudents.
• WhatshouldbedoneifanEnrolledtuplewithanon-existentstudentidisinserted?– Rejectit!
• WhatshouldbedoneifaStudentstupleisdeleted?– ThreesemanticsallowedbySQL1. AlsodeleteallEnrolledtuplesthatrefertoit(cascadedelete)2. DisallowdeletionofaStudentstuplethatisreferredto3. SetsidinEnrolledtuplesthatrefertoittoadefaultsid4. (inadditioninSQL):SetsidinEnrolledtuplesthatrefertoittoaspecial
valuenull,denoting`unknown’or`inapplicable’
• SimilarifprimarykeyofStudentstupleisupdated
DukeCS,Fall2017 CompSci516:DatabaseSystems 36
ReferentialIntegrityinSQL
• SQL/92andSQL:1999supportall4optionsondeletesandupdates.– DefaultisNOACTION(delete/updateisrejected)
– CASCADE (alsodeletealltuplesthatrefertodeletedtuple)
– SETNULL/ SETDEFAULT (setsforeignkeyvalueofreferencingtuple)
CREATE TABLE Enrolled(sid CHAR(20),cid CHAR(20),grade CHAR(2),PRIMARY KEY (sid,cid),FOREIGN KEY (sid)REFERENCES StudentsON DELETE CASCADEON UPDATE SET DEFAULT )
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
WheredoICsComeFrom?• ICsarebaseduponthesemanticsofthereal-worldenterprise
thatisbeingdescribedinthedatabaserelations
• CanweinferICsfromaninstance?– WecancheckadatabaseinstancetoseeifanICisviolated,butwe
canNEVER inferthatanICistruebylookingataninstance.– AnICisastatementaboutallpossibleinstances!– Fromexample,weknownameisnotakey,buttheassertionthatsidis
akeyisgiventous.
• KeyandforeignkeyICsarethemostcommon;moregeneralICssupportedtoo
DukeCS,Fall2017 CompSci516:DatabaseSystems 38
ExampleInstances
• WewillusetheseinstancesoftheSailorsandReservesrelationsinourexamples
• IfthekeyfortheReservesrelationcontainedonlytheattributessidandbid,howwouldthesemanticsdiffer?
sid sname rating age22 dustin 7 4531 lubber 8 5558 rusty 10 35
sid bid day22 101 10/10/9658 103 11/12/96
Reserves
Sailor
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
Top Related