1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…
-
Upload
jewel-sharp -
Category
Documents
-
view
221 -
download
0
Transcript of 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…
1.1
CAS CS 460/660CAS CS 460/660Introduction to Database SystemsIntroduction to Database Systems
Relational ModelRelational Modeland more…and more…
1.2
The Structure SpectrumThe Structure Spectrum
Structured (schema-
first)
Relational DatabaseFormatted Messages
Semi-Structured
(schema-later)
DocumentsXML
Tagged Text/Media
Unstructured (schema-
never)
Plain Text
Media
1.3
The Relational ModelThe Relational Model
The Relational Model is UbiquitousMySQL, PostgreSQL, Oracle, DB2, SQLServer, …l Foundational work done at
IBM Santa Teresa Labs (now IBM Almaden in SJ) – “System R” UC Berkeley CS – the “Ingres” System
Note: some Legacy systems use older models e.g., IBM’s IMS
Object-oriented concepts have been merged in Early work: POSTGRES research project at Berkeley Informix, IBM DB2, Oracle 8i
As has support for XML (semi-structured data)
1.4
Relational ModelRelational Model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
1.5
Relational Database: DefinitionsRelational Database: Definitions
Relational database: a set of relations
Relation: made up of 2 parts:
Schema : specifies name of relation, plus name and type of each column
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Instance : the actual data at a given time
#rows = cardinality #fields = degree / arity
1.7
Ex: Instance of Students RelationEx: Instance of Students Relation
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
• Cardinality = 3, arity = 5 , all rows distinct
• Do all values in each column of a relation instance have to be unique?
1.8
SQL - A language for Relational DBsSQL - A language for Relational DBs
Say: “ess-cue-ell” or “sequel” But spelled “SQL”
Data Definition Language (DDL) create, modify, delete relations specify constraints administer users, security, etc.
Data Manipulation Language (DML) Specify queries to find tuples that satisfy criteria add, modify, remove tuples
The DBMS is responsible for efficient evaluation.
1.9
The SQL Query LanguageThe SQL Query Language
The most widely used relational query language.
Originally IBM, then ANSI in 1986
Current standard is SQL-2011 2008 added x-query stuff, new triggers,…
2003 was last major update: XML, window functions, sequences, auto-generated IDs. Not fully supported yet
SQL-1999 Introduced “Object-Relational” concepts. Also not fully supported yet.
SQL92 is a basic subset Most systems support at least this
PostgreSQL has some “unique” aspects (as do most systems).
SQL is not synonymous with Microsoft’s “SQL Server”
1.10
Creating Relations in SQLCreating Relations in SQL
Creates the Students relation. Note: the type (domain) of each field is specified, and enforced by
the DBMS whenever tuples are added or modified.
CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)
1.11
Table Creation (continued)Table Creation (continued) Another example: the Enrolled table holds information about
courses students take.
CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2))
1.12
Adding and Deleting TuplesAdding and Deleting Tuples Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)
• Can delete all tuples satisfying some condition (e.g., name = Smith):
DELETE FROM Students S WHERE S.name = 'Smith'
Powerful variants of these commands are available; more later!
1.13
KeysKeys Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC)
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B
Enrolled Students
PRIMARY KeyFOREIGN Key
1.14
Primary KeysPrimary Keys
A set of fields is a superkey if: No two distinct tuples can have same values in all key fields
A set of fields is a key for a relation if : It is a superkey
No subset of the fields is a superkey
what if >1 key for a relation? One of the keys is chosen (by DBA) to be the primary key.
Other keys are called candidate keys.
E.g. sid is a key for Students.
What about name?
The set {sid, gpa} is a superkey.
1.15
Primary and Candidate Keys in SQLPrimary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE),
one of which is chosen as the primary key.
• Keys must be used carefully!• “For a given student and course, there is a single
grade.”
“Students can take only one course, and no two students in a course receive the same grade.”
CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))
CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))
vs.
1.16
Foreign Keys, Referential IntegrityForeign Keys, Referential Integrity
Foreign key: a “logical pointer” Set of fields in a tuple in one relation
that `refer’ to a tuple in another relation. Reference to primary key of the other relation.
All foreign key constraints enforced? referential integrity! i.e., no dangling references.
1.17
Foreign Keys in SQLForeign Keys in SQL E.g. Only students listed in the Students relation should be allowed
to enroll for courses. sid is a foreign key referring to Students:
CREATE TABLE Enrolled (sid CHAR(20),cid CHAR(20),grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students);
sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B
Enrolled
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8
Students
11111 English102 A
1.18
Next UpNext Up
We’ll talk a bit about the SQL DML
Then we’ll start describing the DBMS from storage on up
1.19
The SQL DMLThe SQL DML
Single-table queries are straightforward.
To find records for all 18 year old students with gpa’s above 2.0, we can write:
SELECT * FROM Students S WHERE S.age=18
AND S.gpa > 2.0
To get just names and logins, replace the first line:SELECT S.name, S.login
1.20
Basic SQL QueriesBasic SQL Queries
SELECT [DISTINCT] target-listFROM relation-listWHERE qualification• relation-list : A list of relation names
– possibly with a range-variable after each name
• target-list : A list of attributes of tables in relation-list
• qualification : Comparisons combined using AND, OR and NOT.– Comparisons are Attr op const or Attr1 op
Attr2, where op is one of =≠<>≤≥• DISTINCT: (optional) indicates that the answer should
have no duplicates.
– In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”)
1.21
Querying Multiple RelationsQuerying Multiple Relations
Can specify a join over two tables as follows:SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'
result = S.name E.cid Jones History105
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
Note: obviously no referential integrity constraints have been used here.
1.22
Basic Query SemanticsBasic Query Semantics
The Semantics of a SQL query are defined in terms of the following conceptual evaluation strategy:
1. do FROM clause: compute cross-product of tables (e.g., Students and Enrolled).
2. do WHERE clause: Check conditions, discard tuples that fail.
3. do SELECT clause: Delete unwanted fields.
4. If DISTINCT specified, eliminate duplicate rows.
Probably the least efficient way to compute a query! • A query optimizer will find more efficient strategies to get the same answer.
1.23
Step 1 – Cross ProductStep 1 – Cross Product
SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B
1.24
Step 2) Discard tuples that fail predicateStep 2) Discard tuples that fail predicate
S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B
SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'
1.25
Step 3) Discard Unwanted ColumnsStep 3) Discard Unwanted Columns
S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B
SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'
1.26
Aggregate OperatorsAggregate OperatorsFor calculation and analytics COUNT (*)
COUNT ([DISTINCT] A)SUM ([DISTINCT] A)AVG (A)MAX (A)MIN (A)
SELECT AVG (S.gpa)FROM Students SWHERE S.age=18;
SELECT COUNT (*)FROM Students;
SELECT COUNT (DISTINCT S.age)FROM Students SWHERE S.name=‘Bob’;
1.27
GROUP BY and HAVINGGROUP BY and HAVING
Sometimes, we want to apply aggs to each of several groups of tuples.
This query computes the average gpa per major (assume students have a “major” attribute)
If you want to exclude “small” majors, use Having:
SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.major ;
SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.majorHAVING COUNT (*) > 10 ;
1.28
(Slightly) Less Basic SQL Queries(Slightly) Less Basic SQL Queries
SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-listHAVING group-qualification
1.29
Conceptual EvaluationConceptual Evaluation
The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.
One answer tuple is generated per qualifying group.
1.30
Conceptual Evaluation (cont.)Conceptual Evaluation (cont.)
The group-qualification is then applied to eliminate some groups.
Expressions in group-qualification must have a single value per group!
That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list.
One answer tuple is generated per qualifying group.
1.31
Okay: Let’s start from the Okay: Let’s start from the bottom up…bottom up…
Query Optimizationand Execution
Relational Operators
Access Methods
Buffer Management
Disk Space Management
Student Records stored on disk
Database app
These layersmust considerconcurrencycontrol andrecovery