--The SQL Query Language DML--1 The SQL Query Language DML The SQL Query Language DML (SELECT)
Principles of Database Systems · Overview - detailed - SQL Fundamental constructs and concepts...
Transcript of Principles of Database Systems · Overview - detailed - SQL Fundamental constructs and concepts...
Principles of Database Systems
V. Megalooikonomou
Relational Model – SQL Part I
(based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos)
General Overview - rel. model
Formal query languagesrel algebra and calculi
Commercial query languagesSQL (combination of rel algebra and calculus)QBE, (QUEL)
Overview - detailed - SQLFundamental constructs and concepts
check users’ guide for particular implementationDML
select, from, where, renamingset operationsorderingaggregate functionsnested subqueries
other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
DML
General formselect a1, a2, … anfrom r1, r2, … rmwhere P[order by ….][group by …][having …]
Reminder: our Mini-U db
STUDENTSsn Name Address
123 smith main str234 jones forbes ave
TAKESSSN c-id grade
123 cis331 A234 cis331 B
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
DML – e.g.:
find the ssn(s) of everybody called “smith”
select ssnfrom studentwhere name=“smith”
DML - observation
General formselect a1, a2, … anfrom r1, r2, … rmwhere P
equivalent rel. algebra query?
DML - observation
General formselect a1, a2, … anfrom r1, r2, … rmwhere P
))...21((,...2,1 rmrrPanaa ×××σπ
DML - observation
General formselect distinct a1, a2, … anfrom r1, r2, … rmwhere P
))...21((,...2,1 rmrrPanaa ×××σπ
select clause
select [distinct | all ] namefrom studentwhere address=“main”
where clause
find ssn(s) of all “smith”s on “main”select ssnfrom studentwhere address=“main” and
name = “smith”
where clause
boolean operators (and, or, not, …)comparison operators (<, >, =, …)and more…
What about strings?
find student ssns who live on “main” (stor str or street – i.e., “main st” or “main str” …)
What about strings?
find student ssns who live on “main” (stor str or street) select ssnfrom studentwhere address like “main%”
%: variable-length don’t care_: single-character don’t care
from clause
find names of people taking cis351
STUDENTSsn Name Address
123 smith main str234 jones forbes ave
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
TAKESSSN c-id grade
123 cis331 A234 cis331 B
from clause
find names of people taking cis351select namefrom student, takeswhere ???
from clause
find names of people taking cis351select namefrom student, takeswhere student.ssn = takes.ssn and
takes.c-id = “cis351”
renaming - tuple variables
find names of people taking cis351select namefrom ourVeryOwnStudent,
studentTakingClasseswhere ourVeryOwnStudent.ssn =
studentTakingClasses.ssnand studentTakingClasses.c-id = “cis351”
renaming - tuple variables
find names of people taking cis351select namefrom ourVeryOwnStudent as S,
studentTakingClasses as Twhere S.ssn =T.ssn
and T.c-id = “cis351”
renaming - self-join
self -joins: find Tom’s grandparent(s)
PCp-id c-idMary TomPeter MaryJohn Tom
PCp-id c-idMary TomPeter MaryJohn Tom
renaming - self-join
find grandparents of “Tom” (PC(p-id, c-id))
select gp.p-idfrom PC as gp, PCwhere gp.c-id= PC.p-id
and PC.c-id = “Tom”
renaming - theta join
find course names with more units than cis351
select c1.c-namefrom class as c1, class as c2where c1.units > c2.units
and c2.c-id = “cis351”
find course names with more units than cis351select c1.namefrom class as c1, class as c2where c1.units > c2.units
and c2.c-id = “cis351”
])}[2][][1][2
351][1(21|{
nameccnamectunitscunitsc
cisidccCLASScCLASSct
−=−∧>
∧=−∈∃∈∃
Overview - detailed - SQL
DMLselect, from, whereset operationsorderingaggregate functionsnested subqueries
other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
set operations
find ssn of people taking both cis351 and cis331
TAKESSSN c-id grade
123 cis331 A234 cis331 B
set operations
find ssn of people taking both cis351 and cis331
select ssnfrom takeswhere c-id=“cis351” and
c-id=“cis331”?
set operations
find ssn of people taking both cis351 and cis331
select ssnfrom takeswhere c-id=“cis351” and
c-id=“cis331”
set operationsfind ssn of people taking both cis351 and cis331
(select ssn from takes where c-id=“cis351” )intersect(select ssn from takes where c-id=“cis331” )
other ops: union , except
Overview - detailed - SQL
DMLselect, from, whereset operationsorderingaggregate functionsnested subqueries
other parts: DDL, embedded SQL, auth etc
Ordering
find student records, sorted in name orderselect *from studentwhere
Ordering
find student records, sorted in name orderselect *from studentorder by name asc
asc is the default
Ordering
find student records, sorted in name order; break ties by reverse ssnselect *from studentorder by name, ssn desc
Overview - detailed - SQL
DMLselect, from, whereset operationsorderingaggregate functionsnested subqueries
other parts: DDL, embedded SQL, auth etc
Aggregate functions
find avg grade, across all studentsselect ??from takes
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions
find avg grade, across all studentsselect avg(grade)from takes
result: a single numberWhich other functions?
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions
A: sum, count, min, max (std)
Aggregate functionsfind total number of enrollments
select count(*)from takes
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions
find total number of students in cis331select count(*)from takeswhere c-id=“cis331”
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions
find total number of students in each courseselect count(*)from takeswhere ???
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions
find total number of students in each courseselect c-id, count(*)from takesgroup by c-id
SSN c-id grade123 cis331 4234 cis331 3
c-id countcis331 2
Aggregate functions
find total number of students in each courseselect c-id, count(*)from takesgroup by c-idorder by c-id
SSN c-id grade123 cis331 4234 cis331 3
c-id countcis331 2
Aggregate functions
find total number of students in each course, and sort by count, decreasingselect c-id, count(*) as popfrom takesgroup by c-idorder by pop desc
SSN c-id grade123 cis331 4234 cis331 3
c-id popcis331 2
Aggregate functions- ‘having’
find students with GPA > 3.0
SSN c-id grade123 cis331 4234 cis331 3
Aggregate functions- ‘having’
find students with GPA > 3.0select ???, avg(grade)from takesgroup by ??? SSN c-id grade
123 cis331 4234 cis331 3
Aggregate functions- ‘having’
find students with GPA > 3.0select ssn, avg(grade)from takesgroup by ssn???
SSN c-id grade123 cis331 4234 cis331 3
SSN avg(grade)123 4234 3
Aggregate functions- ‘having’
find students with GPA > 3.0select ssn, avg(grade)from takesgroup by ssnhaving avg(grade)>3.0
‘having’ <-> ‘where’ for groups
SSN c-id grade123 cis331 4234 cis331 3
SSN avg(grade)123 4234 3
Aggregate functions- ‘having’
find students and GPA,for students with > 5 courses
select ssn, avg(grade)from takesgroup by ssnhaving count(*) > 5
SSN c-id grade123 cis331 4234 cis331 3
SSN avg(grade)123 4234 3
Overview - detailed - SQL
DMLselect, from, where, renamingset operationsorderingaggregate functionsnested subqueries
other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
DML
General formselect a1, a2, … anfrom r1, r2, … rmwhere P[order by ….][group by …][having …]
Reminder: our Mini-U db
STUDENTSsn Name Address
123 smith main str234 jones forbes ave
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
TAKESSSN c-id grade
123 cis331 A234 cis331 B
DML - nested subqueries
find names of students of cis351select namefrom studentwhere ...
“ssn in the set of people that take cis351”
DML - nested subqueries
find names of students of cis351select namefrom studentwhere ………...
select ssnfrom takeswhere c-id =“cis351”
DML - nested subqueries
find names of students of cis351select namefrom studentwhere ssn in (
select ssnfrom takeswhere c-id =“cis351”)
DML - nested subqueries
‘in’ compares a value with a set of values‘in’ can be combined with other boolean opsit is redundant (but user friendly!):select namefrom student …..where c-id = “cis351” ….
DML - nested subqueries
‘in’ compares a value with a set of values‘in’ can be combined with other boolean opsit is redundant (but user friendly!):select namefrom student, takeswhere c-id = “cis351” and
student.ssn=takes.ssn
DML - nested subqueries
find names of students taking cis351 and living on “main str”
select namefrom studentwhere address=“main str” and ssn in
(select ssn from takes where c-id =“cis351”)
DML - nested subqueries
‘in’ compares a value with a set of valuesother operators like ‘in’ ??
DML - nested subqueries
find student record with highest ssnselect *from studentwhere ssn
is greater than every other ssn
DML - nested subqueries
find student record with highest ssnselect *from studentwhere ssn greater than every
select ssn from student
DML - nested subqueries
find student record with highest ssnselect *from studentwhere ssn > all (
select ssn from student)almost correct
DML - nested subqueries
find student record with highest ssnselect *from studentwhere ssn >= all (
select ssn from student)
DML - nested subqueries
find student record with highest ssn -without nested subqueries?
select S1.ssn, S1.name, S1.addressfrom student as S1, student as S2where S1.ssn > S2.ssn
is not the answer (what does it give?)
DML - nested subqueries
STUDENTSsn Name Address
123 smith main str234 jones forbes ave
S1 S2
S1. ssn S2.ssn ….123 123 …234 123 …123 234234 234
S1 x S2
S1.ssn>S2.ssn
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
DML - nested subqueries
select S1.ssn, S1.name, S1.addressfrom student as S1, student as S2where S1.ssn > S2.ssn
gives all but the smallest ssn -aha!
DML - nested subqueries
find student record with highest ssn -without nested subqueries?
select S1.ssn, S1.name, S1.addressfrom student as S1, student as S2where S1.ssn < S2.ssn
gives all but the highest - therefore….
DML - nested subqueriesfind student record with highest ssn - without
nested subqueries?(select * from student) except(select S1.ssn, S1.name, S1.addressfrom student as S1, student as S2where S1.ssn < S2.ssn)
DML - nested subqueries
(select * from student) except(select S1.ssn, S1.name, S1.addressfrom student as S1, student as S2where S1.ssn < S2.ssn)
select *from studentwhere ssn >= all (select ssn from student)
DML - nested subqueries
Drill: Even more readable than select * from studentwhere ssn >= all (select ssn fromstudent)
DML - nested subqueries
Drill: Even more readable than select * from studentwhere ssn >= all (select ssn fromstudent)
select * from studentwhere ssn in(select max(ssn) from student)
DML - nested subqueries
Drill: find the ssn of the student with the highest GPA
STUDENTSsn Name Address
123 smith main str234 jones forbes ave
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
grade
234 cis331 B
TAKESSSN c-id
123 cis331 A
DML - nested subqueries
Drill: find the ssn and GPA of the student with the highest GPA
select ssn, avg(grade) from takeswhere
DML - nested subqueries
Drill: find the ssn and GPA of the student with the highest GPA
select ssn, avg(grade) from takesgroup by ssnhaving avg( grade) …...
greater than every other GPA on file
DML - nested subqueries
Drill: find the ssn and GPA of the student with the highest GPA
select ssn, avg(grade) from takesgroup by ssnhaving avg( grade) >= all
( select avg( grade )from student group by ssn )
} all GPAs
DML - nested subqueries
‘in’ and ‘>= all’ compares a value with a set of valuesother operators like these?
DML - nested subqueries
<all(), <>all() ...‘<>all’ is identical to ‘not in’>some(), >= some () ...‘= some()’ is identical to ‘in’exists
DML - nested subqueries
Drill for ‘exists’: find all courses that nobody enrolled in
select c-id from class ….with no tuples in ‘takes’
CLASSc-id c-name unitscis331 d.b. 2cis321 o.s. 2
TAKESSSN c-id grade
123 cis331 A234 cis331 B
DML - nested subqueries
Drill for ‘exists’: find all courses that nobody enrolled in
select c-id from class where not exists
(select * from takeswhere class.c-id = takes.c-id)
DML - derived relations
find the ssn with the highest GPA
select ssn, avg(grade) from takesgroup by ssnhaving avg( grade) >= all
( select avg( grade )from student group by ssn )
DML - derived relations
find the ssn with the highest GPAQuery would be easier, if we had a table
like: helpfulTable (ssn, gpa):
then what?
HelpfulTableSsn Gpa123 3.5678 3.3
DML - derived relations
select ssn, gpafrom helpfulTablewhere gpa in (select max(gpa)
from helpfulTable)
HelpfulTableSsn Gpa123 3.5678 3.3
DML - derived relations
find the ssn with the highest GPA -Query for helpfulTable (ssn, gpa)?
DML - derived relations
find the ssn with the highest GPAQuery for helpfulTable (ssn, gpa)?
select ssn, avg(grade)from takesgroup by ssn
DML - derived relations
find the ssn with the highest GPA
helpfulTable(ssn,gpa)
select ssn, avg(grade)from takesgroup by ssn
select ssn, gpafrom helpfulTablewhere gpa = (select max(gpa)
from helpfulTable)
DML - derived relations
find the ssn with the highest GPAselect ssn, gpafrom (select ssn, avg(grade)
from takesgroup by ssn)as helpfulTable(ssn, gpa)
where gpa in (select max(gpa) from helpfulTable)
Views
find the ssn with the highest GPA -we can create a permanent, virtual table:
create view helpfulTable(ssn, gpa) asselect ssn, avg(grade)from takesgroup by ssn
Views
views are recorded in the schema, for ever (i.e., until ‘drop view…’)typically, they take little disk space, because they are computed on the fly(but: materialized views…)
Overview of a DBMSDBAcasual
user
DML parser
buffer mgr
trans. mgr
DMLprecomp.
DDL parser
catalog
create view..
Overview - detailed - SQL
DMLselect, from, where, renamingset operationsorderingaggregate functionsnested subqueries
other parts: DDL, embedded SQL, auth etc
Overview - detailed - SQL
DMLother parts:
modificationsjoinsDDLembedded SQLauthorization