CS 133: Databases
Fall 2019
Lec 01 – 09/03
Introduction & Relational Model
Prof. Beth Trushkowsky
http://www.puntogeek.com/wp-content/uploads/2012/12/21168.strip_.gif
Data!
• Data is everywhere
  – Banking, airline reservations
  – Social media, clicking anything on the internet
“facebook friend wheel”, https://www.flickr.com/photos/antjeverena/
http://www.bigdataanalyticstoday.com/category/infographics/
Need systems to manage the data
Goals for Today
• What is a database anyway?
• Important DBMS features
  – and challenges!
• Course logistics
• Relational data model
  – Why it’s great
  – What it looks like (intro to SQL)
So, what is a database?
From the textbook:
• Database: a collection of data, typically describing the activities of one or more related organizations
• Database system, DataBase Management System (DBMS): software designed to assist in maintaining and utilizing large collections of data
DBMS desiderata
• Ask questions (queries) about data
• Add and update data
• Persist the data (keep it around)
• E.g., banking application
  – Query: What is Alice’s balance?
  – Update: Alice deposits $100
  – Persist: Alice hopes her money is still there after a power outage…
Sounds easy!
• Store data in text files
  – Accounts separated by newlines
  – Fields separated by commas
• Query: what is Alice’s balance?
Account,Branch,Name,Balance
45,Claremont,Alice,200
67,Claremont,Bob,100000
78,Pasadena,Carl,987654
...
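With only a text file, every query becomes a hand-written scan. A minimal Python sketch of answering “what is Alice’s balance?” this way, assuming the file's contents are already held in a string; ACCOUNTS_TXT and balance_of are hypothetical names:

```python
import csv
import io

# Hypothetical in-memory stand-in for the accounts text file above.
ACCOUNTS_TXT = """Account,Branch,Name,Balance
45,Claremont,Alice,200
67,Claremont,Bob,100000
78,Pasadena,Carl,987654
"""

def balance_of(name, text=ACCOUNTS_TXT):
    """Scan the file line by line until the Name field matches."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:                 # full scan: cost grows with the number of accounts
        if row["Name"] == name:
            return int(row["Balance"])
    return None                        # no such account holder

print(balance_of("Alice"))  # 200
```

Every new kind of question (all Claremont accounts, all balances over $1000) needs another loop like this one, and nothing here copes with two programs updating the file at once or with a crash mid-write.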
Abstracting data management
• Can come up with tricks to optimize a particular query/application
  – End up redoing this work for new apps
Relational DBMS to the rescue
Physical Independence
• Applications need not know how data is physically structured and stored
• Instead, have a logical data model
• Leave the implementation details and optimization to the DBMS
Edgar F. Codd
Turing Award, 1981
“[There should be] a clear boundary between the logical and physical aspects of database management”¹
¹ http://en.wikiquote.org/wiki/E._F._Codd
Relational DBMS to the rescue
• Relational data model: data is stored in relations
• Example: Banking info
  account  branch     name   balance
  45       Claremont  Alice  200
  67       Claremont  Bob    100000
  78       Pasadena   Carl   987654
• A declarative query language
  – Specify what answers a query should return, but not how the query is executed
  – E.g., SQL, Datalog (subset of Prolog)
Query: what is Alice’s balance?
SELECT balance FROM Banking WHERE name = 'Alice';
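The same question asked declaratively, sketched with Python's built-in sqlite3 module. The column types are an assumption, since the slide gives none:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Column types are assumed for illustration.
conn.execute("CREATE TABLE Banking (account INTEGER, branch TEXT, name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO Banking VALUES (?, ?, ?, ?)",
    [(45, "Claremont", "Alice", 200),
     (67, "Claremont", "Bob", 100000),
     (78, "Pasadena", "Carl", 987654)],
)

# Declarative: we state WHAT we want; SQLite decides HOW to find it.
row = conn.execute("SELECT balance FROM Banking WHERE name = 'Alice'").fetchone()
print(row[0])  # 200
```

The program never says whether to scan the table or use an index; that choice is left to the DBMS.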
Relational Model: Levels of Abstraction
(The DBA designs these!)
• Conceptual/Logical schema: describes data in terms of the data model (entities and relationships!)
  Students(sid: string, name: string, login: string, gpa: real)
  Courses(cid: string, cname: string, credits: integer)
  Enrolled(sid: string, cid: string, grade: string)
• Physical schema: specifies storage details
  – Store the relations as unsorted files
  – Create indexes on Students.sid and Courses.cid
• External schema (“views”): allows customized data access
  – View each course’s enrollment
  CourseInfo(cid: string, enrollmnt: integer)
  CREATE VIEW CourseInfo AS SELECT cid, COUNT(*) AS enrollmnt FROM Enrolled GROUP BY cid;
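A sketch of the external schema in action, using Python's sqlite3 module and the Enrolled rows that appear on the foreign-key slide later in this lecture; the TEXT column types and the extra enrollment for student 67 are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("45", "CS133", "A"), ("45", "CS121", "B"), ("78", "CS5", "A")])

# External schema: a view defined by a query over the conceptual schema.
conn.execute("""CREATE VIEW CourseInfo AS
                SELECT cid, COUNT(*) AS enrollmnt FROM Enrolled GROUP BY cid""")
info = dict(conn.execute("SELECT cid, enrollmnt FROM CourseInfo"))

# Hypothetical new enrollment: the view reflects it with no extra work.
conn.execute("INSERT INTO Enrolled VALUES ('67', 'CS133', 'B')")
info_after = dict(conn.execute("SELECT cid, enrollmnt FROM CourseInfo"))
print(info)
print(info_after)
```

Because the view is defined by a query rather than stored separately, its contents track the base relation.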
Data Independence
• Logical data independence
  – Protected from changes in conceptual schema
• Physical data independence
  – Protected from changes in physical schema
[Figure: View 1, View 2, …, View n on top of the conceptual schema, which sits on top of the physical schema]
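Physical data independence can be checked directly: change only the physical schema by adding an index, and the same query text still returns the same answer. A sketch using Python's sqlite3 module; the index name students_sid_idx is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)",
                 [("45", "Alice", "alicious", 3.4),
                  ("67", "Bob", "bobtastic", 3.9),
                  ("78", "Carl", "carl", 2.5)])

QUERY = "SELECT name FROM Students WHERE sid = '67'"
before = conn.execute(QUERY).fetchall()

# Physical schema change only: add an index on Students.sid.
conn.execute("CREATE INDEX students_sid_idx ON Students(sid)")
after = conn.execute(QUERY).fetchall()

# The application's query is untouched; only the access path may change.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
print(before, after)
print(plan)
```

The printed plan shows which access path SQLite picked; its exact wording varies between SQLite versions, so treat it as something to inspect rather than rely on.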
Modern DBMS Features
• Logical data model
  – We focus on relational in this course
    • May touch on others, e.g., XML, Document
  – Data independence!
• Declarative language
  – Queries
  – Updates
• Persistence
But wait, there’s more…
Concurrent Access
• Banking example: ATM withdrawal pseudocode
    get balance;
    if balance > amount
        withdraw amount;
        newBalance = balance - amount;
        write balance = newBalance;
• Alice and Bob share an account.
  – Alice goes to one ATM, withdraws $100
  – Bob goes to another ATM, withdraws $50
• Initial balance = $400
• Final balance?
Example from Jun Yang
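Concurrent Access
Alice withdraws $100:
    get balance;                        ($400)
    if balance > amount
        withdraw amount;
        newBalance = balance - amount;
        write balance = newBalance;     ($300)
Bob withdraws $50:
    get balance;                        ($400)
    if balance > amount
        withdraw amount;
        newBalance = balance - amount;
        write balance = newBalance;     ($350)
Both ATMs read $400 before either writes; Bob writes $350, then Alice writes $300 over it.
Final balance = $300!! (Bob’s withdrawal is lost; running one after the other would leave $250.)
What can we do?
Example from Jun Yang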
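The lost update can be replayed deterministically. In this sketch each withdrawal is a Python generator that pauses between reading and writing the balance, and a schedule lists which ATM takes its next step; withdraw and run are hypothetical helpers modeling the pseudocode, not a DBMS API:

```python
def withdraw(account, amount):
    """The slide's pseudocode; the yield marks where another ATM may run."""
    balance = account["balance"]                 # get balance
    yield                                        # pause between read and write
    if balance > amount:
        account["balance"] = balance - amount    # write balance = newBalance

def run(schedule, amounts, initial=400):
    """Step each named withdrawal in schedule order; return the final balance."""
    account = {"balance": initial}
    txns = {who: withdraw(account, amt) for who, amt in amounts.items()}
    for who in schedule:
        next(txns[who], None)                    # advance to next pause or to completion
    return account["balance"]

amounts = {"Alice": 100, "Bob": 50}
# Interleaved as on the slide: both read $400, Bob writes, then Alice writes.
print(run(["Alice", "Bob", "Bob", "Alice"], amounts))  # 300
# Serial: Alice finishes before Bob starts.
print(run(["Alice", "Alice", "Bob", "Bob"], amounts))  # 250
```

Any schedule in which both reads happen before either write loses one of the updates; concurrency control in a DBMS exists to rule out such schedules.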
System Failures
• Banking example: balance transfer
    decrement account X by $100
    increment account Y by $100
• What if power goes out after the first instruction?
• DBMS buffers and updates some data in memory before writing to disk
  – What if power goes out before the write to disk?
• Keep a log of updates; undo/redo upon recovery
Example from Jun Yang
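A sketch of the all-or-nothing behavior a DBMS provides for the transfer, using a transaction in Python's sqlite3 module. A raised exception stands in for the power failure (an assumption for illustration); recovery from an actual crash relies on the DBMS's log, which this sketch does not exercise. The Accounts table and transfer helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("X", 500), ("Y", 100)])
conn.commit()

def transfer(conn, src, dst, amount, crash_midway=False):
    """Both updates take effect together, or neither does."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if crash_midway:
            raise RuntimeError("simulated failure after first update")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM Accounts"))

try:
    transfer(conn, "X", "Y", 100, crash_midway=True)
except RuntimeError:
    pass                        # the failure aborted the transfer
after_crash = balances(conn)    # decrement of X was rolled back
transfer(conn, "X", "Y", 100)
after_transfer = balances(conn)
print(after_crash, after_transfer)
```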
Modern DBMS Features (cont’d)
• Logical data model
• Declarative language
• Persistence
• Concurrent access
• Fault tolerance
• Performance!
  – Lots of queries
  – Lots of data
Simplified RDBMS Architecture
Application
  ↓ Declarative query    ↑ Query results
Query optimizer
Query executor
Access methods
Buffer management
Disk management
Data records
(Side component: concerned with concurrency control and recovery)
Course Overview
• Design principles behind DBMS!
• “Bottom-up” order of topics to show role of abstraction and algorithms for efficiency/optimization
  – Physical data organization
  – Relational algebra and SQL
  – Query evaluation and optimization
  – Transactions, concurrency control, recovery
  – Database design
Course Objectives
• Provide a solid background in database management system design principles
• Promote understanding of these principles through hands-on exercises implementing the internals of a relational database management system
• Further develop students' ability to reason about algorithm and software design, optimization, and tradeoffs generally applicable in computer science
Labs: SimpleDB
• Labs 1-4: Implement key features of a (simplified) DBMS in Java
  – Files, Storage
  – Relational Operators
  – Query Optimizer
  – Locking with Transactions
• Lab 5: database design
Lab 1: Getting started, “due” next Wednesday
Grade Components (see syllabus)
• Weekly problem sets   14%    70 pts
• (5) Labs              40%   200 pts
• Midterm               20%   100 pts
• Final                 20%   100 pts
• Participation          6%    30 pts
Administrivia
• Course website: https://www.cs.hmc.edu/~beth/courses/cs133/current
  – Syllabus, calendar, lab descriptions
• Textbook: Database Management Systems, 3rd Edition, by Ramakrishnan and Gehrke
• Piazza for questions about labs, problem sets, etc.: piazza.com/hmc/fall2019/cs133/home
• Assignment submission
  – Problem sets → Sakai
  – Lab assignments → Gradescope
• Grutors
  – Ivy Liu
The Relational Model
• Many RDBMS vendors, including open-source
  – Oracle
  – MySQL
  – PostgreSQL
  – SQLite
  – DB2
  – SQL Server
  – …
• We’ll touch on other data models as well
Key Concepts: Relational Model
• Database: collection of relations
• Relation: has a list of attributes (its schema)
• Relations contain sets of tuples
• Schema (metadata)
  – Specification of how data is to be structured logically
  – Contains attribute types
  – Defined at set-up
Courses
CID  Name             Dept
121  Software Dev     CS
70   Data Structures  CS
Students
SID  name   login      gpa
45   Alice  alicious   3.4
67   Bob    bobtastic  3.9
Relational Model: Synonyms
More formal  →  Less formal
Relation     Table
Tuple        Row        Record
Attribute    Column     Field
Domain       Type
Structured Query Language (SQL)
• Data definition language (DDL)
  – Define the schema (create, change, delete relations)
  – Specify constraints, user permissions
• Data manipulation language (DML)
  – Find data that matches criteria
  – Add, remove, update data
  – The DBMS is responsible for efficient evaluation!
• Co-invented by Don Chamberlin (HMC ’66)!
Photo: http://researcher.watson.ibm.com/researcher/view.php?person=us-dchamber
A Relation Instance
• An instance of a relation is its contents at a given time
  – cardinality: # tuples
  – arity: # attributes
Students
SID  Name   Login      SSN          GPA
45   Alice  alicious   000-00-0000  3.4
67   Bob    bobtastic  000-00-0001  3.9
78   Carl   carl       000-00-0010  2.5
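Cardinality and arity of the instance above, computed with Python's sqlite3 module (a sketch; the TEXT/REAL column types are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, SSN TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("45", "Alice", "alicious", "000-00-0000", 3.4),
    ("67", "Bob", "bobtastic", "000-00-0001", 3.9),
    ("78", "Carl", "carl", "000-00-0010", 2.5),
])

cur = conn.execute("SELECT * FROM Students")
arity = len(cur.description)        # one description entry per attribute (column)
cardinality = len(cur.fetchall())   # number of tuples in the current instance
print(arity, cardinality)  # 5 3
```

Inserting or deleting tuples changes the cardinality over time; only a schema change alters the arity.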
SQL: Creating Relations
• Create Students relation:
CREATE TABLE Students (
    sid CHAR(20),
    name CHAR(20),
    login CHAR(100),
    SSN CHAR(12),
    gpa FLOAT);
• Create Enrolled relation:
CREATE TABLE Enrolled (
    sid CHAR(20),
    cid CHAR(20),
    grade CHAR(2));
• Domain info is a type of integrity constraint (IC)
  – IC: a condition on the database schema; restricts the data that can be stored
Adding and Removing Tuples
• Insert a single tuple
INSERT INTO Students (sid, name, login, SSN, gpa) VALUES (45, 'Alice', 'alicious', '000-00-0000', 3.4);
• Delete tuples that satisfy a condition (predicate)
DELETE FROM Students S WHERE S.name = 'Alice';
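The insert and delete above, run through Python's sqlite3 module as a sketch. The values are passed as ? parameters rather than quoted literals, and the DELETE is written in the equivalent form without the S alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students (
    sid CHAR(20), name CHAR(20), login CHAR(100), SSN CHAR(12), gpa FLOAT)""")

# ? placeholders let the driver handle quoting of string values.
conn.execute("INSERT INTO Students (sid, name, login, SSN, gpa) VALUES (?, ?, ?, ?, ?)",
             ("45", "Alice", "alicious", "000-00-0000", 3.4))
after_insert = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Deletes every tuple whose name satisfies the predicate.
conn.execute("DELETE FROM Students WHERE name = 'Alice'")
after_delete = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(after_insert, after_delete)  # 1 0
```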
Integrity Constraints: Keys
• Superkey: a set of field(s) that uniquely identifies a tuple
  – Candidate key: does so minimally
  – Primary key: a chosen candidate key
Students (primary key: SID)
SID  Name   Login      SSN          GPA
45   Alice  alicious   000-00-0000  3.4
67   Bob    bobtastic  000-00-0001  3.9
78   Carl   carl       000-00-0010  2.5
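Keys are declared in the schema, so a particular instance can only refute a proposed key, never prove one. This sketch checks the superkey and candidate-key definitions against the instance above; is_superkey_in, is_candidate_key_in, and the extra “Alice” row are hypothetical:

```python
from itertools import combinations

STUDENTS = [
    {"SID": "45", "Name": "Alice", "Login": "alicious", "SSN": "000-00-0000", "GPA": 3.4},
    {"SID": "67", "Name": "Bob", "Login": "bobtastic", "SSN": "000-00-0001", "GPA": 3.9},
    {"SID": "78", "Name": "Carl", "Login": "carl", "SSN": "000-00-0010", "GPA": 2.5},
]

def is_superkey_in(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key_in(rows, attrs):
    """A superkey here, with no proper (non-empty) subset that is also one."""
    if not is_superkey_in(rows, attrs):
        return False
    return not any(is_superkey_in(rows, sub)
                   for k in range(1, len(attrs))
                   for sub in combinations(attrs, k))

print(is_superkey_in(STUDENTS, ("SID", "Name")))       # True
print(is_candidate_key_in(STUDENTS, ("SID", "Name")))  # False: SID alone suffices

# A hypothetical second Alice shows Name was only unique by accident.
more = STUDENTS + [{"SID": "89", "Name": "Alice", "Login": "alice2",
                    "SSN": "000-00-0011", "GPA": 3.0}]
print(is_superkey_in(more, ("Name",)))  # False
```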
Integrity Constraints: Foreign Keys
• Referential integrity, logical “pointer”
  – Set of fields in one relation refers to the primary key of another
Enrolled (SID is a foreign key referencing Students)
SID  CID    Grade
45   CS133  A
45   CS121  B
78   CS5    A
Students (primary key: SID)
SID  Name   Login      SSN          GPA
45   Alice  alicious   000-00-0000  3.4
67   Bob    bobtastic  000-00-0001  3.9
78   Carl   carl       000-00-0010  2.5
What should happen here, given that no student has SID 43?
INSERT INTO Enrolled (sid, cid, grade) VALUES (43, 'CS133', 'D');
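A sketch of the DBMS rejecting that insert, using Python's sqlite3 module. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; the trimmed Students schema is an assumption to keep the example short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute("""CREATE TABLE Enrolled (
    sid CHAR(20), cid CHAR(20), grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students)""")  # defaults to Students' primary key
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("45", "Alice"), ("67", "Bob"), ("78", "Carl")])
conn.execute("INSERT INTO Enrolled VALUES ('45', 'CS133', 'A')")  # OK: student 45 exists

try:
    conn.execute("INSERT INTO Enrolled VALUES ('43', 'CS133', 'D')")  # no student 43
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```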
Defining Key Constraints
• Specified in schema definition
CREATE TABLE Enrolled (
sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students
);
CREATE TABLE Students ( sid CHAR(20),
name CHAR(20),
login CHAR(10),
SSN CHAR(20),
gpa FLOAT,
PRIMARY KEY(sid),
UNIQUE (SSN) );
• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
Primary and Candidate Keys in SQL
• Keys must be used carefully!
• Example:
“For a given student and course, there is a single grade.”
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid));
vs.
“Students can take only one course, and no two students in a course receive the same grade.”
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade));
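A sketch showing how the two key choices accept different data, using Python's sqlite3 module. The first three rows are the Enrolled rows from the foreign-key slide; the fourth row, ("67", "CS133", "A"), is hypothetical, and accepted_rows is a hypothetical helper:

```python
import sqlite3

SCHEMAS = {
    # "For a given student and course, there is a single grade."
    "pk_sid_cid": """CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2),
                     PRIMARY KEY (sid, cid))""",
    # "Students can take only one course, and no two students in a course
    #  receive the same grade."
    "pk_sid_unique_cid_grade": """CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20),
                     grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))""",
}

ROWS = [("45", "CS133", "A"), ("45", "CS121", "B"), ("78", "CS5", "A"),
        ("67", "CS133", "A")]

def accepted_rows(schema_sql, rows):
    """Insert rows one at a time into a fresh table; return those the keys allow."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    kept = []
    for row in rows:
        try:
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
            kept.append(row)
        except sqlite3.IntegrityError:
            pass  # row violates PRIMARY KEY or UNIQUE
    return kept

for name, sql in SCHEMAS.items():
    print(name, accepted_rows(sql, ROWS))
```

Under the second schema, Alice's second course is rejected by PRIMARY KEY (sid), and the second A in CS133 is rejected by UNIQUE (cid, grade), which is rarely what a registrar wants.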
SQL: Single Relation Queries
Students
SID  name   login      gpa
45   Alice  alicious   3.4
67   Bob    bobtastic  3.9
78   Carl   carl       2.5
SELECT name FROM Students WHERE gpa > 3.7;
name
Bob
SELECT * FROM Students S WHERE S.gpa > 3.7;
sid  name  login      gpa
67   Bob   bobtastic  3.9
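Both queries, run against the instance above with Python's sqlite3 module (a sketch; the column sizes follow the earlier Students definition, without SSN to match this slide's instance):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), gpa FLOAT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", [
    ("45", "Alice", "alicious", 3.4),
    ("67", "Bob", "bobtastic", 3.9),
    ("78", "Carl", "carl", 2.5),
])

names = conn.execute("SELECT name FROM Students WHERE gpa > 3.7").fetchall()
rows = conn.execute("SELECT * FROM Students S WHERE S.gpa > 3.7").fetchall()
print(names)  # [('Bob',)]
print(rows)   # [('67', 'Bob', 'bobtastic', 3.9)]
```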
Query Execution: Teaser
The query optimizer transforms a declarative query into a pipeline of dataflow operators called a query execution plan:
SELECT name FROM Students WHERE gpa > 3.7;
Students → Filter (gpa > 3.7) → Project (name)
Iterators!!
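A minimal sketch of that plan as a pipeline of iterators, using Python generators; each operator pulls tuples from its child one at a time. The operator names scan, filter_op, and project are hypothetical, and the Java interfaces in the SimpleDB labs look different:

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

STUDENTS: List[Row] = [
    {"sid": "45", "name": "Alice", "login": "alicious", "gpa": 3.4},
    {"sid": "67", "name": "Bob", "login": "bobtastic", "gpa": 3.9},
    {"sid": "78", "name": "Carl", "login": "carl", "gpa": 2.5},
]

def scan(table: List[Row]) -> Iterator[Row]:
    """Leaf operator: produce each stored tuple."""
    for row in table:
        yield row

def filter_op(child: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only tuples that satisfy the predicate."""
    for row in child:
        if pred(row):
            yield row

def project(child: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Keep only the named attributes (duplicates kept, as in SQL SELECT)."""
    for row in child:
        yield {a: row[a] for a in attrs}

# Students -> Filter(gpa > 3.7) -> Project(name); nothing runs until list() pulls.
plan = project(filter_op(scan(STUDENTS), lambda r: r["gpa"] > 3.7), ["name"])
result = list(plan)
print(result)  # [{'name': 'Bob'}]
```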
SQLite Demo
Also see: “Resources” on course website and www.sqlite.org
The database is in the file you specify.
The file is created if it doesn’t exist.
SQL statements end with a semicolon.
Capitalization looks nice, but is not required.
These two settings for mode and header make query results easier to read.