CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

37
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009

Transcript of CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Page 1: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

CS 405G: Introduction to Database Systems

Instructor: Jinze Liu

Fall 2009

Page 2: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 2

Review A data model is

a group of concepts for describing data.

What are the two terms used by ER model to describe a miniworld? Entity Relationship

What makes a good database design1010111101

Student (sid: string, name: string, login: string, age: integer,

gpa:real)

204/10/23

Page 3: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 3

Topics Next More case study Conversion of ER models to Schemas

304/10/23

Page 4: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 4

ER Case study

04/10/234

Design a database representing cities, counties, and states For states, record name and capital (city) For counties, record name, area, and location (state) For cities, record name, population, and location

(county and state) Assume the following:

Names of states are unique Names of counties are only unique within a state Names of cities are only unique within a county A city is always located in a single county A county is always located in a single state

Page 5: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 5

Case study : first design County area information is repeated for

every city in the county Redundancy is bad. What else?

State capital should really be a city Should “reference” entities through explicit

relationships

Cities In Statesname

capital

name

population

county_area

county_name

504/10/23

Page 6: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 6

Case study : second design Technically, nothing in this design could

prevent a city in state X from being the capital of another state Y, but oh well…

Cities

IsCapitalOf

name

population

Countiesname

areaname

In

In States

604/10/23

Page 7: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 7

Database Design

704/10/23

Page 8: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

A Relation is a Table

8

name manf Winterbrew Pete’s Bud Lite Anheuser-

Busch Beers

Attributes(columnheaders)

Tuples(rows)

Page 9: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Schemas

9

Relation schema = relation name + attributes, in order (+ types of attributes). Example: Beers(name, manf) or Beers(name:

string, manf: string)

Database = collection of relations. Database schema = set of all relation

schemas in the database.

Page 10: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Why Relations?

10

Very simple model. Often matches how we think about data. Abstract model that underlies SQL, the most

important database language today. But SQL uses bags, while the relational model is a

set-based model.

Page 11: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

From E/R Diagrams to Relations

11

Entity sets become relations with the same set of attributes.

Relationships become relations whose attributes are only: The keys of the connected entity sets. Attributes of the relationship itself.

Page 12: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Entity Set -> Relation

12

Relation: Beers(name, manf)

Beers

name manf

Page 13: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Relationship -> Relation

13

Drinkers BeersLikes

Likes(drinker, beer)Favorite

Favorite(drinker, beer)

Married

husband

wife

Married(husband, wife)

name addr name manf

Buddies

1 2

Buddies(name1, name2)

Page 14: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Combining Relations

14

It is OK to combine the relation for an entity-set E with the relation R for a many-one relationship from E to another entity set.

Example: Drinkers(name, addr) and Favorite(drinker, beer) combine to make Drinker1(name, addr, favBeer).

Page 15: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Risk with Many-Many Relationships

15

Combining Drinkers with Likes would be a mistake. It leads to redundancy, as:

name addr beerSally 123 Maple BudSally 123 Maple Miller

Redundancy

Page 16: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Handling Weak Entity Sets

16

Relation for a weak entity set must include attributes for its complete key (including those belonging to other entity sets), as well as its own, nonkey attributes.

A supporting (double-diamond) relationship is redundant and yields no relation.

Page 17: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Example

17

Logins HostsAt

name name

time

Page 18: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Example

18

Logins HostsAt

name name

Hosts(hostName)Logins(loginName, hostName, time)At(loginName, hostName, hostName2)

Must be the same

time

At becomes part ofLogins

Page 19: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

19

A (Slightly) Formal Definition A database is a collection of relations (or

tables) Each relation is identified by a name and a

list of attributes (or columns) Each attribute has a name and a domain (or

type) Set-valued attributes not allowed

19

Page 20: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 20

Schema versus instance Schema (metadata)

Specification of how data is to be structured logically

Defined at set-up Rarely changes

Instance Content Changes rapidly, but always conforms to the

schema Compare to type and objects of type in a

programming language

2004/10/23

Page 21: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 21

Example Schema

Student (SID integer, name string, age integer, GPA float)

Course (CID string, title string) Enroll (SID integer, CID integer)

Instance { h142, Bart, 10, 2.3i, h123, Milhouse, 10,

3.1i, ...} { hCPS116, Intro. to Database Systemsi, ...} { h142, CPS116i, h142, CPS114i, ...}

2104/10/23

Page 22: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 22

Relational Integrity Constraints Constraints are conditions that must hold on

all valid relation instances. There are four main types of constraints:

1. Domain constraints1. The value of an attribute must come from its

domain

2. Key constraints3. Entity integrity constraints4. Referential integrity constraints

2204/10/23

Page 23: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Primary Key Constraints A set of fields is a candidate key for a relation

if :1. No two distinct tuples can have same values in

all key fields, and2. This is not true for any subset of the key. Part 2 false? A superkey. If there’s >1 key for a relation, one of the keys is

chosen (by DBA) to be the primary key. E.g., given a schema Student(sid: string,

name: string, gpa: float) we have: sid is a key for Students. (What about name?)

The set {sid, gpa} is a superkey.

2304/10/23 Jinze Liu @ University of Kentucky

Page 24: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 24

Key Example CAR (licence_num: string, Engine_serial_num:

string, make: string, model: string, year: integer) What is the candidate key(s) Which one you may use as a primary key What are the super keys

2404/10/23

Page 25: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 25

Entity Integrity Entity Integrity: The primary key attributes PK

of each relation schema R in S cannot have null values in any tuple of r(R). Other attributes of R may be similarly

constrained to disallow null values, even though they are not members of the primary key.

2504/10/23

Page 26: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Foreign Keys, Referential Integrity Foreign key : Set of fields in one relation that

is used to `refer’ to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a `logical pointer’.

E.g. sid is a foreign key referring to Students: Student(sid: string, name: string, gpa: float) Enrolled(sid: string, cid: string, grade: string) If all foreign key constraints are enforced,

referential integrity is achieved, i.e., no dangling references.

Can you name a data model w/o referential integrity? Links in HTML!

2604/10/23 Jinze Liu @ University of Kentucky

Page 27: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

Foreign Keys Only students listed in the Students relation

should be allowed to enroll for courses.

sid name login age gpa

53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

sid cid grade 53666 Carnatic101 C 53666 Reggae203 B 53650 Topology112 A 53666 History105 B

EnrolledStudents

Or, use NULL as the value for the foreign key in the referencing tuple when the referenced tuple does not exist

2704/10/23 Jinze Liu @ University of Kentucky

Page 28: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 28

In-Class Exercise

(Taken from Exercise 5.16)

Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:

STUDENT(SSN, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(SSN, Course#, Quarter, Grade)

BOOK_ADOPTION(Course#, Quarter, Book_ISBN)

TEXT(Book_ISBN, Book_Title, Publisher, Author)

Draw a relational schema diagram specifying the foreign keys for this schema.

2804/10/23

Page 29: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky29

In-Class Exercise(Taken from Exercise 5.16)

Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:

STUDENT(SSN, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(SSN, Course#, Quarter, Grade)

BOOK_ADOPTION(Course#, Quarter, Book_ISBN)

TEXT(Book_ISBN, Book_Title, Publisher, Author)

Draw a relational schema diagram specifying the foreign keys for this schema.

Page 30: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23 30

Other Types of Constraints Semantic Integrity Constraints:

based on application semantics and cannot be expressed by the model per se

e.g., “the max. no. of hours per employee for all projects he or she works on is 56 hrs per week”

A constraint specification language may have to be used to express these

SQL-99 allows triggers and ASSERTIONS to allow for some of these

3004/10/23

Page 31: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky31

Update Operations on Relations

Update operations INSERT a tuple. DELETE a tuple. MODIFY a tuple.

Constraints should not be violated in updates

Page 32: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky32

Example We have the following relational schemas

Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character)

We have the following sequence of database update operations. (assume all tables are empty before we apply any operations)

INSERT<‘1234’, ‘John Smith’, ‘3.5> into Student

sid name gpa

1234 John Smith 3.5

Page 33: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky33

Example (Cont.) INSERT<‘647’,

‘EECS’> into Courses

INSERT<‘1234’, ‘647’, ‘B’> into Enrolled

UPDATE the grade in the Enrolled tuple with sid = 1234 and cid = 647 to ‘A’.

DELETE the Enrolled tuple with sid 1234 and cid 647

sid name gpa

1234 John Smith 3.5

cid department

647 EECS

sid cid grade

1234 647 B

sid cid grade

1234 647 A

sid cid grade

Page 34: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky34

Exercise INSERT<‘108’,

‘MATH’> into Courses

INSERT<‘1234’, ‘108’, ‘B’> into Enrolled

INSERT<‘1123’, ‘Mary Carter’, ‘3.8’> into Student

sid name gpa

1234 John Smith 3.5

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

cid department

647 EECS

sid cid grade

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

Page 35: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky35

Exercise (cont.) A little bit tricky INSERT<‘1125’,

‘Bob Lee’, ‘good’> into Student Fail due to domain

constraint INSERT<‘1123’,

NULL, ‘B’> into Enrolled Fail due to entity

integrity INSERT

<‘1233’,’647’, ‘A’> into Enrolled Failed due to

referential integrity

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

Page 36: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky36

Exercise (cont.) A more tricky one UPDATE the cid in

the tuple from Course where cid = 108 to 109

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

cid department

647 EECS

109 MATH

sid cid grade

1234 109 B

Page 37: CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

04/10/23Jinze Liu @ University of Kentucky37

Update Operations on Relations

In case of integrity violation, several actions can be taken: Cancel the operation that causes the violation

(REJECT option) Perform the operation but inform the user of

the violation Trigger additional updates so the violation is

corrected (CASCADE option, SET NULL option) Execute a user-specified error-correction

routine