Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P....

137
Matthew P. Johnson, OCL1, CISDD CUNY , F2004 1 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004

Transcript of Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P....

Page 1: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

1

ORCL1 Oracle 8i:SQL & PL/SQLSession #2

Matthew P. Johnson

CISDD, CUNY

Fall, 2004

Page 2: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

2

Agenda Last time: E/R models, some design issues This time: More design “carving at the joints”

Redundancy Whether an element should be an attribute or entity

set Replacing a relationships with entity sets

Constraints Identifying & specifying key attributes to an entity set Recognizing other types of single-valued constraints Representing referential integrity constraints Identifying & representing general constraints

Weak entity sets

Page 3: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

3

Design Principles Faithfulness Avoiding redundancy Simplicity Choice of relationships Picking elements

Page 4: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

4

Avoiding redundancy Say everything exactly once

Minimize database storage requirements More important: prevent possible update errors

simplest but not only e.g.: modify data one place but not the other – more later

Example: Spot the redundancy

Studios MoviesOwn

StudioName

Name

Length

Name

Address

Redundancy: Movies “knows” the studio two ways

Phone

Page 5: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

5

Spot more redundancy

Different redundancy: studio info listed for every movie!

Movies

StudioName

Name

Length

SAddress

SPhone

Name Length Studio SAddress SPhonePulp Fiction … Miramax NYC 212-…Sylvia … Miramax NYC 212-…Jay & Sil. Bob … Miramax NYC 212-…

Page 6: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

6

Don’t add relships that are implied

Students Courses

TAs

Enrolls

TA-of

Assist

Suppose each course again has <=1 TA

Q: Is the following good design?

A: If TAs other than the course’s TA can help students, then yes;

if not, then no: we can connect Students and TAs by going through Courses; redundant!

Page 7: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

7

Correct E/R models may contain loops

Person plays multiple roles: employee of company buyer of product

price

address name ssn

Person

buys

makes

employs

CompanyProduct

name category

stockprice

name

Page 8: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

8

More design

Repeating TA names & IDs – redundant TA is not TAing any course now lose TA’s data! TA should get its own ES

Students CoursesEnrolls

Q: What’s wrong with this design?

A:

TA-Name TA-ID

TA-Email

Course-ID CName

Page 9: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

9

Opposite problem: Entity or attribute? Some E/Rs improved by removing entities Can convert Entity E attributes of F

1. R:FE is many-one one-one counts because special case

2. Attributes for E are independent of each other knowing one att val doesn’t tell us another att val

Then remove E add all attributes of E to F

Page 10: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

10

Students CoursesEnrolls

TA-Name AssistsTA

Entity attributeCName

Room

Students CoursesEnrolls

CName

Room

TA-Name

Course-ID

Course-ID

Page 11: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

11

Convert TA entity again?

No! Multiple TAs allowed Violates condition (1) Redundant course data

Students CoursesEnrolls

AssistsTA

CName CID Room TA-NameDBMS 46 123 HowardDBMS 46 123 Wesley

CName

Room

Course-ID

TA-Name

Page 12: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

12

Convert TA entity again?

Students CoursesEnrolls

AssistsTA

CName

Room

Course-ID

TA-ID TA-Favorite-Color

No! TA has dependent fields Violates condition (2)

How can it tell? Redundant TA data

CName TA-Name TA-ID TA-ColorDBMS Ralph 678 GreenA.Soft. Ralph 678 Green

TA-Name

Page 13: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

13

Entity or attributes? Should student address be an entity or an attribute? If student may have multiple addresses, must be entity

campus address, permanent address attributes cannot be set-valued

If we need to examine structure of address, must be entity find all students from NYS but not NYC

If attribute, then it’s probably a simple string no structure! NB: this choice is a microcosm of entire miniworld (much) power of a DB comes from the structure

imposed on the data

Page 14: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

14

Larger example DB design Application: library database. Authors have written

books about various subjects; different libraries in the system may carry these books.

Entities (with attributes in parentheses): Authors (ssn, name, phone, birthdate) Books (ISDN, title) Subjects (sname, sid) Libraries (lname)

Relations [associating entities in square brackets]: Wrote-on [Authors, Subjects] Cover [Libraries, Subjects] On [Books, Subjects]

Page 15: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

15

E/R of DB designName

Author

ssn phone birthdate

wrote-on

SubjectSNameTitle

Carries

LibraryLName

On Book

ISBN

Page 16: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

16

Poor initial design First design is a poor model of this system Some info not captured:

How many copies does a lib. have of a given book? What edition of a book does the library have?

Design problems: no direct relship associating authors and books no direct relship associating libraries and books

Common queries complex and difficult/expensive What libraries carry books by a given author? What books has a given author written? Who is the author of a given book?

Page 17: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

17

Larger example DB design 2 Application: library database as before

Entities (with attributes in parentheses): Authors (ssn, name, phone, birthdate) Books (ISDN, title) Subjects (sname, sid) Libraries (lname)

Relations [associating entities in square brackets] (attributes in parentheses): Wrote [Authors, Books] Carries [Libraries, Books] (quantity, edition) On [Books , Subjects]

Page 18: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

18

E/R of improved DB design

Rule of thumb: often queried together make closely connected

Name

Author

ssn phone birthdate

wrote

BookISBN

TitleCarries

LibraryLName

Edition

Quantity

On Subject

SName

Page 19: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

19

Next topic: Constraints Review: programmer-defined rules stating what

should always be true about consistent databases Restrictions on data:

Keys (e.g. SSNs uniquely identify people) Single value constraints (e.g. everyone has 1 father) Referential Integrity (e.g. person’s record refers to father

father must exist) Domain constraints (e.g. gender in M/F, age in 0..150) General constraints (e.g. no more than 10 customers per

sales rep) Can’t infer constraints from data

may hold “accidentally” they are a part of the schema

Page 20: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

20

E/R keys Uniquely identifies entity in ES Attribute or set of attributes

Two entities cannot agree on all key attributes These attributes determine all others

Every ES should have a key possibly including all attributes

Primary key attributes underlined More than one possible key:

Candidate keys, primary key

Practical tip: create intentional key attribute E.g. SSN, course-id, employee-id, etc. SSN likely shorter than (name,address) Prevents quasi-redundancy

address

name ssn

Person

Page 21: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

21

Single-valued constraints “at most one” value

sharp arrows E.g. attributes: could be null or one Many-one relationships: the “one” part is

single-valued. Can think of key atts as (non-null) single-

valued

TACourse Assists

Page 22: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

22

Referential integrity “Exactly one value” NOT NULL attributes Relationships

Non-null value refers to entity that exists Refer to entity with foreign key HTML analogy: no broken links Programming analogy: no dangling pointers Ways of handling deletion:

Prevent deletion as long as referrer exist Enforce deletion of all referrers

InstructorCourse Taught

Page 23: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

23

Referential integrity – E/R e.g.

Insertion – must refer to existing entity Suppose need to add

course: “Oracle” instructor: MPJ

Q: Which order? Q: What if relship were exactly-exactly?

i.e., referential integrity in both directions? A: Put both inserts in one xact – later

Students CoursesEnrolls

Instructor

Taught

Page 24: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

24

Other kinds of constraints Domain constraints

E.g. date: must be after 1980 Enumerated type: grades A through F, no E No specific E/R notation: mention with attribute or

relationship General constraints:

A class may have no more than 100 students; a student may not have more than 6 courses:

Students CoursesEnroll <=6<=100

Page 25: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

25

Next topic: Weak entity sets (2.4) Definition:

Some or all key attributes belong to another ES Why:

An entity set is part of a hierarchy (not ISA) Connecting entity sets

The key consists of 0, 1 or more of its own attributes Key attributes of entity sets from supporting

relationships

Page 26: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

26

Conditions of Supporting relationships

Supporting relationship R:EF R is many-one (E-F) or one-one R is binary Referential integrity from E to F

i.e. a rounded arrow Those atts supplied to E are the key attributes of F F itself may be weak

Another entity set G, and so on recursively

A1

A2

RE F

Page 27: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

27

If several supporting relationships from E to F Keys of several different entities from F appear as

foreign key of E

Other many-one relationships Not necessarily supporting

Requirements for weak entity sets

From

By

Purchases A1

A2

A3

People

StoresAt-store

Page 28: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

28

Weak entity sets Example: Hierarchy – species & genus Idea: species name unique per genus only

Species

name

Belongs-to Genus

name

Page 29: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

29

Video store connecting entity sets e.g. was a weak entity set

Key: date, MID,SID, CID

Weak entity sets

MID

SID

CID

Rental

StoreOf

MovieOf

BuyerOf

date

Product

Store

Customer

Page 30: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

30

E/R design summary Subject/design choices:

Should a concept be modeled as an ES or an att? Should a concept be modeled as an ES or a

relship? Identifying relationships: binary or multiway?

Constraints in the ER Model: Important in determining the best design. Much data semantics can (and should) be captured Normalization improves further – later

Page 31: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

31

Agenda Last time: finished E/R models per se This time: Intro to relational model Converting ER diagrams to relations Functional dependencies

Keys and superkeys in terms of FDs Finding keys for relations Rules of FDs

Normalization

Page 32: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

32

Next topic: the Relational Data Model

Database Model(E/R, other)

Relational Schema

Physicalstorage

Diagrams (E/R) Tables: column names: attributes rows: tuples

Complex file organizationand index structures.

Page 33: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

33

Relations as tables

Name Price Category Manufacturer

gizmo $19.99 gadgets GizmoWorksPower gizmo $29.99 gadgets GizmoWorksSingleTouch $149.99 photography CanonMultiTouch $203.99 household Hitachi

tuples/rows/records/entities

Attribute names Product

table/relation

Page 34: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

34

Relational terminology Relation is composed of tuples Tuples composed of attribute values

Attribute has atomic types

Relation schema: relation name + attribute names + attribute types

Relation instance: set of tuples order doesn’t matter

Database schema: set of relation schemas Database instance: relation instance for every

relation in the schema

Page 35: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

35

Relations as sets Remember: math relation is a subset of the cross-

product of the attribute value sets R subset-of S x T Product subset-of Name x Price x Cat x Mft

One member of Product relation: (gizmo, $19.99, gadgets, GizmoWorks) in Product

DB Relation instance = math relation

Q: If relations are sets, why call “instances”? A: R is a member of the powerset P(SxT)

powerset = set of all subsets

Page 36: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

36

More on tuples Formally, can also be a mapping

from attribute names to (correctly typed) values: name gizmo price $19.99 category gadgets manufacturer GizmoWorks

NB: ordered tuple is equiv to mapping Both ways supported in SQL

Sometimes we refer to a tuple by itself (note order of attributes) (gizmo, $19.99, gadgets, GizmoWorks) or Product(gizmo, $19.99, gadgets, GizmoWorks).

Page 37: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

37

Updates/modifications The database maintains a current database state Modifications of data:

add a tuple delete a tuple update an attribute value in a tuple

DB Relation instance = math relation Idea: we saw partic. Product DB instance

add, delete rows different DB rel. instances technically, different math relations to DBMS, still the same relation/table

Modifications to the data are frequent Updates to the schema are rare, painful (Why?)

Page 38: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

38

E/R models to relations Recall justification:

design is easier in E/R implementation is easier/faster in R

Parallel to program compilation: design is easier in C/Java/whatever implemen. is easier/faster in machine/byte code

Strategy1. apply semi-mechanical conversion rules

2. improve by combining some relations

3. improve by normalization involves finding functional dependencies

Page 39: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

39

E/R conversion rules Entity set … relation

attributes: attributes of entity set key: key of ES

Relationship relation attributes: keys of entity-sets/roles key: depends on multiplicity

NB: mapping of types is not one-one We’ll see: mapping one tokens is not one-one

Special treatment: Weak entity sets Isa relations & subclasses

Page 40: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

40

Entity Sets Entity set Students

ssn

name

address

Students

John

Howard

Name

South Carolina444-555-6666

Park Avenue111-222-3333

AddressSSN

Rel: Students

Page 41: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

41

Entity Sets

Course

CourseID

CourseName

Page 42: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

42

Binary many-to-many relationships Key: keys of both entities

Why we learned to recognize keys

C30.0046444-555-6666

C20.0056111-222-3333

C20.0046111-222-3333

CourseIDssn

Relation: Enrolls

EnrollsS_addr

S_NameStudents Course

Course-Name

CourseID

ssn

Page 43: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

43

Many-to-one relationships

Key: keys of many entity

Movies Studiosowns

2003SyliaM202

1999Mr. Ripley.M101

YearTitleMovieID

Movies

OrlandoDisneyS73

NYCMiramaxS35

AddressNameStudioID

Studios

S35

S73

StudioID

CN22222

CN11111

CopyrightNo

M202

M101

MovieIDOwns

CopyrightNo

MovieID

Title

Year StudioID

NameAddress

Page 44: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

44

Improving on many-one Note rules applied:

Movies Rel.: all atts from Movies ES Studios Rel: all atts from Studios ES Owns Rel: att key atts from Movies & Studios ESs

But: Owns:MoviesStudios is many-one for each row in Movies, there’s a(/no) row in Owns just add the Owns data to Movies

Page 45: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

45

Many-to-one: a better design

Q: What if a movie’s Owns row were missing?

2003SyliaM202

1999Mr. Ripley.M101

YearTitleMovieID

Movies

S35

S73

StudioID

CN22222

CN11111

CopyrightNo

M202

M101

MovieID

Owns

CN22222

CN11111

CopyrightNo

S35

S73

StudioID

2003

1999

Year

SyliaM202

Talent Mr. Ripley

M101

TitleMovieID

Movies’

Page 46: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

46

Many-to-many relationships again Won’t work for many-many relationships

acts

MovieID Title Year

M101 Mr. Ripley 1999

M202 Sylia 2003

M303 P.D. Love 2002

StarID Name Address

T400 Gwyneth P. Bev.Hills

T401 P.S. Hoffman Hollywood

T402 Jude Law Palm Springs

MovieID StarID

M101 T400

M202 T400

M101 T401

M101 T402

M303 T401

Movies

Stars

Acts

Movies Stars

Page 47: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

47

Many-to-many relationships again

MovieID Title Year StarID

M101 Talented Mr. Ripley 1999 T400

M101 Talented Mr. Ripley 1999 T401

M101 Talented Mr. Ripley 1999 T402

M202 Sylia 2003 T400

M303 Punch Drunk Love 2003 T401

And here’s why:

Page 48: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

48

Multiway relationships & roles

Different roles treated as different entity sets Key: keys of the many entities

Students Courses

TAs

tutors graders

enrolls

TA_SSN Name

SSN CourseID

Name Name

Page 49: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

49

Multiway relationships & roles

Enrolls(S_SSN, Course_ID, Tutor_SSN, Grader_SSN)

SSN Name

111-11-1111 George

222-22-2222 Dick

TA_SSN Name

333-33-3333 Wesley

444-44-4444 Howard

555-55-5555 John

Students TAsCourseID Name

C20.0046 Databases

C20.0056 Software

Courses

S_SSN CourseID Tutor_SSN Grader_SSN

111-11-1111 C20.0046 333-33-3333 444-44-4444

222-22-2222 C20.0046 444-44-4444 555-55-5555

Page 50: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

50

Converting weak ESs – differences

Atts of Crew Rel are: attributes of Crew key attributes of

supporting ESs

Crew Unit-of Studio

StudioNameCrew_ID

address

C2Miramax

C1Disney

C1Miramax

Crew_IDStudioName

Crew

Supporting relships are omitted (why?)

Page 51: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

51

Weak entity sets - relationships

Crew Studio

StudioNameCrew_ID

address

Insurance

IName

Address 1260 7th Av.NYBlueCross

1250 6th Av.NYAetna

AddressIName

InsuranceSubscribes

Unit-of

Page 52: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

52

Weak entity sets - relationships Non-support relationships for weak ESs are

converted keys include entire weak ES key

C21

C22

C21

Crew_ID

Aetna

BlueCross

Aetna

Insurer

Universal

Disney

Universal

StudioName

Subscribes

Page 53: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

53

Conversion example Video store rental example, plus some atts

Q: Conversion to relations?

Rental

VideoStore

Customer

Movie

date

yearMNameaddress

Cname

MID

Page 54: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

54

Conversion example, continued Resulting binary-relationship version

Q: Conversion to relations?

Rental

Customer

Store

Movie

StoreOf

MovieOf

BuyerOf

dateyearMName

address

Cname

MID

Page 55: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

55

Converting inheritance hierarchies No best way Several non-ideal methods:

E/R-style: each ES relation OO-style: each possible “object” relation nulls-style: each rooted hierarchy relation

non-applicable fields filled in with nulls

Pros & cons for each method, exist situations favoring it

Page 56: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

56

Converting inheritance hierarchies

Movies

Cartoons Murder-Mysteries

isa isaVoices

Weapon

stars

length title year

Lion King

Component

Page 57: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

57

Inheritance: E/R-style conversion Each ES relation

Root entity set: Movies(title, year, length)

1301993Lion King

1988

1990

1980

Year

110

115

120

length

Roger Rabbit

Scream

Star Wars

Title

Knife1990R. Rabbit

1988

Year

Knife

murderWeapon

Scream

Title

Subclass: MurderMysteries(title, year, murderWeapon)

Subclass: Cartoons(title, year)

1993Lion King

1990

Year

Roger Rabbit

Title

Page 58: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

58

E/R-style & quasi-redundancy Name and year of Roger Rabbit were listed in three

different rows (in different tables) Suppose title changes (“Roger” “Roget”)

must change all three places Q: Is this redundancy? A: No!

name and year are independent multiple movies may have same name

Real redundancy reqs. dependency two rows agree on SSN must agree on rest

conflicting hair colors in these rows is an error two rows agree on movie title may still disagree

conflicting years may be correct – or may not be Better: introduce “movie-id” key att

Page 59: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

59

Subclasses: object-oriented approach Every possible “subtree” (what’s this?):

1. Movies

2. Movies + Cartoons

3. Movies + Murder-Mysteries

4. Movies + Cartoons + Murder-Mysteries

Title Year length

Star Wars 1980 120

Title Year length Murder-Weapon

Scream 1988 110 Knife

Title Year length

Lion King 1990 115

Title Year length Murder-Weapon

Roger Rabbit 1988 110 Knife

1. 3.

2. 4.

Page 60: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

60

Subclasses: nulls approach One relation for entire hierarchy Any non-applicable fields are NULL

Q: How do we know if a movie is a MM? Q: How do we know if a movie is a cartoon?

Title Year length Murder-Weapon

Star Wars 1980 120 NULL

Lion King 1993 130 NULL

Scream 1988 110 Knife

Roger Rabbit 1990 115 Knife

Page 61: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

61

Agenda Last time: relational model Homework 1 is up, due next Tuesday This time:

1. Functional dependencies Keys and superkeys in terms of FDs Finding keys for relations

2. Rules for combining FDs Next time: anomalies & normalization

Page 62: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

62

Next topic: Functional dependencies (3.4) FDs are constraints

part of the schema can’t tell from particular relation instances FD may hold for some instances “accidentally”

Finding all FDs is part of DB design Used in normalization

Page 63: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

63

Functional dependencies Definition:

Notation: Read: Ai functionally determines Bj

If two tuples agree on the attributes

A1, A2, …, AnA1, A2, …, An

then they must also agree on the attributes

B1, B2, …, BmB1, B2, …, Bm

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

Page 64: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

64

Typical Examples of FDs Product

name price, manufacturer

Person ssn name, age father’s/husband’s-name last-name zipcode state phone state (notwithstanding inter-state area codes)

Company name stockprice, president symbol name name symbol

Page 65: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

65

To check A B, erase all other columns; for each rows t1, t2

i.e., check if remaining relation is many-one no “divergences” i.e., if AB is a well-defined function thus, functional dependency

Functional dependencies

Bm...B1Am...A1

t1

t2

if t 1, t 2 agree here then t 1, t 2 agree here

Page 66: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

66

FDs Example

Product(name, category, color, department, price)

name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Consider these FDs:

What do they say ?

Page 67: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

67

FDs ExampleFDs are constraints:• On some instances they hold• On others they don’t

name category color department price

Gizmo Gadget Green Toys 49

Tweaker Gadget Green Toys 99

Does this instance satisfy all the FDs ?

name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Page 68: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

68

FDs Example

name category color department price

Gizmo Gadget Green Toys 49

Tweaker Gadget Black Toys 99

Gizmo Stationary GreenOffice-supp.

59

What about this one ?

name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Page 69: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

69

Q: Is PositionPhone an FD here?

A: It is for this instance, but no, presumably not in general

Others FDs? EmpID Name, Phone, Position but Phone Position

Recognizing FDs

EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer

Page 70: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

70

Keys of relations {A1A2A3…An} is a key for relation R if

1. A1A2A3…An functionally determine all other attributes

Usual notation: A1A2A3…An B1B2…Bk

rels = sets distinct rows can’t agree on all Ai

2. A1A2A3…An is minimal No proper subset of A1A2A3…An functionally determines

all other attributes of R

Primary key: chosen if there are several possible keys

Page 71: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

71

Keys example Relation: Student(Name, Address, DoB,

Email, Credits) Which (/why) of the following are keys?

SSN Name, Address (on reasonable assumptions) Name, SSN Email, SSN Email

NB: minimal != smallest

Page 72: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

72

Superkeys A set of attributes that contains a key Satisfies first condition:

functionally determines every other attribute in the relation

Might not satisfy the second condition: minimality may be possible to peel away some attributes

from the superkey keys are superkeys

key are special case of superkey superkey set is superset of key set

name;ssn is a superkey but not a key

Page 73: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

73

Discovering keys for relations Relation entity set

Key of relation = (minimized) key of entity set Relation binary relationship

Many-many: union of keys of both entity sets Many(M)-one(O): only key of M (Why?) One-one: key of either entity set (but not both!)

Page 74: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

74

Example – entity sets Key of entity set = (minimized) key of relation

Student(Name, Address, DoB, SSN, Email, Credits)

Student

Name

Address

DoB

SSN

Email

Credits

Page 75: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

75

Example – many-many Many-many key: union of both ES keys

Student Enrolls Course

SSN Credits CourseID Name

Enrolls(SSN,CourseID)

Page 76: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

76

Example – many-one Key of many ES but not of one ES

keys from both would be non-minimal

Course MeetsIn Room

CourseID Name RoomNo Capacity

MeetsIn(CourseID,RoomNo)

Page 77: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

77

Example – one-one Keys of both ESs included in relation Key is key of either ES (but not both!)

Husbands Married Wives

SSN Name SSN Name

Married(HSSN, WSSN) or

Married(HSSN, WSSN)

Page 78: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

78

Discovering keys: multiway Multiway relationships:

Multiple ways – may not be obvious R:F,G,HE is many-one E’s key is included

but not part of key Recall that relship atts are implicitly many-one

Course Enrolls Student

CourseID Name SSN NameSection

RoomNo Capacity

Enrolls(CourseID,SSN,RoomNo)

Page 79: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

79

Combining FDs (3.5)If some FDs are satisfied, thenothers are satisfied too

If all these FDs are true:name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Then this FD also holds: name, category pricename, category price

Why?

Page 80: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

80

Splitting & combining FDsSplitting rule:

Combining rule:

Note: doesn’t apply to the left side

Q: Does it apply to the left side?

A1A2…An B1B2…BmA1A2…An B1B2…Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

Bm...B1Am...A1

t1

t2

Page 81: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

81

Reflexive rule: trivial FDs

FD A1A2…An B1B2…Bk may be Trivial: Bs are a subset of As Nontrivial: >=1 of the Bs is not among the As Completely nontrivial: none of the Bs is among the As

Trivial elimination rule: Eliminate common attributes from Bs, to get an equivalent

completely nontrivial FD

A1, A2, …, An AiA1, A2, …, An Ai with i in 1..n is a trivial FD

A1 … An

t

t’

Page 82: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

82

Transitive ruleIf

and

then

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

B1, B2, …, Bm C1, C2, …, CpB1, B2, …, Bm C1, C2, …, Cp

A1, A2, …, An C1, C2, …, CpA1, A2, …, An C1, C2, …, Cp

A1 … Am B1 … Bm C1 ... Cp

t

t’

Page 83: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

83

Example R(A,B,C) Each of three determines other two Q: What are the FDs?

Closure of singleton sets Closure of doubletons

Q: What are the keys? Q: What are the minimal bases?

Page 84: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

84

Examples of Keys Product(name, price, category, color)

name, category price

category color

Keys are: {name, category}

Enrollment(student, address, course, room, time)student address

room, time course

student, course room, time

Keys are: [in class]

Page 85: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

85

Agenda Last time: FDs Project part 2 up soon This time:

1. Anomalies

2. Normalization Next time: Relational Algebra

Page 86: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

86

Review examples: finding FDs Product(name, price, category, color)

name, category price

category color

Keys are: {name, category}

Enrollment(student, address, course, room, time)student address

room, time course

student, course room, time

Keys are: [in class]

Page 87: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

87

Another review example Relation R(A,B,C) Each of three attributes determines other two Q: What are the FDs?

Closure of singleton sets Closure of doubletons

Q: What are the keys? Q: What are the minimal bases?

Page 88: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

88

Next topic: Anomalies (3.6) Identify anomalies in existing schema How to decompose a relation Boyce-Codd Normal Form (BCNF) Recovering information from a decomposition Third Normal Form

Page 89: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

89

Types of anomalies Redundancy

Repeat info unnecessarily in several tuples Update anomalies:

Change info in one tuple but not in another Deletion anomalies:

Delete some values & lose other values too Insert anomalies:

Inserting row means NULL-ing some fields

Page 90: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

90

Example of anomalies

Redundancy: name, address Update anomaly: Bill moves Delete anom.: Bill doesn’t pay bills, lose phones lose Bill! Underlying cause: SSN-phone is many-many Effect: partial dependency ssn name, address

Name SSN Mailing-address Phone

Michael 123 NY 212-111-1111

Michael 123 NY 917-111-1111

Hilary 456 DC 202-222-2222

Hilary 456 DC 914-222-2222

Bill 789 Chappaqua 914-222-2222

Bill 789 Chappaqua 212-333-3333

SSN Name, Mailing-addressSSN Name, Mailing-address SSN PhoneSSN Phone

Page 91: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

91

Decomposition by projection Soln: replace anomalous R with projections of

R onto two subsets of attributes Projection: an operation in Relational Algebra

projection = SELECT in SQL

Projecting R onto attributes (A1,…,An) means removing all other attributes Result of projection is another relation Yields tuples whose fields are A1,…,An

Resulting duplicates ignored

Page 92: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

92

Projection for decomposition

R1 = projection of R on A1, ..., An, B1, ..., Bm

R2 = projection of R on A1, ..., An, C1, ..., Cp

A1, ..., An B1, ..., Bm C1, ..., Cp = all attributes, usually disjoint setsR1 and R2 may (/not) be reassembled to produce original R.

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Page 93: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

93

Chappaqua789Bill

NY123Hilary

NY123Michael

Mailing-addressSSNName

Decomposition example

The anomalies are gone No more redundant data Easy to for Bill to move Okay for Bill to lose all phones

Break the relation into two:

Name SSN Mailing-address Phone

Michael 123 NY 212-111-1111

Michael 123 NY 917-111-1111

Hilary 456 DC 202-222-2222

Hilary 456 DC 914-222-2222

Bill 789 Chappaqua 914-222-2222

Bill 789 Chappaqua 212-333-3333

212-333-3333789

914-222-2222789

914-222-2222567

202-222-2222456

917-111-1111123

212-111-1111123

PhoneSSN

Page 94: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

94

Thus: high-level strategy

PersonbuysProduct

name

price name ssn

Conceptual Model:

Relational Model:plus FD’s

Normalization:Eliminates anomalies

Page 95: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

95

Using FDs to produce good schemas Start with set of relations Define FDs (and keys) for them based on real

world Transform your relations to “normal form”

(normalize them) Do this using “decomposition”

Intuitively, good design means No anomalies Can reconstruct all original information

Page 96: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

96

Decomposition terminology Projection: eliminating certain attributes from

relation Decomposition: separating a relation into two by

projection Join: (re)assembling two relations

Whenever a row from R1 and a row from R2 have the same value for att A join to form a row of R3

If the original data can be reproduced after a decomposition by joining the relations, then the decomposition was lossless We join on the attributes R1 and R2 have in common (As)

If it can’t, the decomposition was lossy

Page 97: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

97

A decomposition is lossless if we can recover:

R(A,B,C)

R1(B,C) R2(B,A)

R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

Decompose

Recover

Lossless DecompositionsLossless Decompositions

Page 98: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

98

Lossless decomposition Sometimes the data can be reproduced:

(MSOffice, 100) + (MSOffice, WP) (MSOffice, 100, WP) (MSOffice, 100) + (MSOffice, DB) (MSOffice, 100, DB) (Oracle, 1000) + (Oracle, DB) (Oracle, 1000, DB)

Name Price Category

MSOffice 100 WP

Oracle 1000 DB

MSOffice 100 DB

Name Price

MSOffice 100

Oracle 1000

MSOffice 100

Name Category

MSOffice WP

Oracle DB

MSOffice DB

Page 99: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

99

Lossy decomposition Sometimes it’s not:

(MSOffice, WP) + (100, WP) (MSOffice, 100, WP) (Oracle, DB) + (1000, DB) (Oracle, 1000, DB) (Oracle, DB) + (100, DB) (Oracle, 100, DB) (MSOffice, DB) + (1000, DB) (MSOffice, 1000, DB) (MSOffice, DB) + (100, DB) (MSOffice, 100, DB)

Name Price Category

MSOffice 100 WP

Oracle 1000 DB

MSOffice 100 DB

Name Category

MSOffice WP

Oracle DB

MSOffice DB

Price Category

100 WP

1000 DB

100 DB

What’swrong?

Page 100: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

100

Ensuring lossless decomposition

Examples: name price, so first decomposition was lossless name category, so second decomposition was lossy

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

If A1, ..., An B1, ..., Bm

Then the decomposition is lossless

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Note: don’t necessarily need A1, ..., An C1, ..., Cp

Page 101: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

101

Quick lossless/lossy example

At a glance: can we decompose into R1(Y,X), R2(Y,Z)?

At a glance: can we decompose into R1(X,Y), R2(X,Z)?

X Y Z

1 2 3

4 2 5

Page 102: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

102

Next topic: Normal Forms First Normal Form = all attributes are atomic

As opposed to set-valued Assumed all along

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Page 103: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

103

Most important: BCNFA simple condition for removing anomalies from relations:

I.e.: The left side must always contain a keyI.e: If a set of attributes determines other attributes, it must determine all the attributes

A relation R is in BCNF if:

If As Bs is a non-trivial dependency

in R , then As is a superkey for R

A relation R is in BCNF if:

If As Bs is a non-trivial dependency

in R , then As is a superkey for R

Codd: Ted Codd, IBM researcher, inventor of relational model, 1970

Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s

Page 104: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

104

BCNF decomposition algorithmRepeat choose A1, …, Am B1, …, Bn that violates the BNCF condition

split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition

split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

A’s OthersB’s

R1 R2

//Heuristic: choose Bs as large as possible

Page 105: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

105

Boyce-Codd Normal Form Name/phone example is not BCNF:

{ssn,phone} is key FD: ssn name,mailing-address holds

Violates BCNF: ssn is not a superkey

Its decomposition is BCNF Only superkeys anything else

Name SSN Mailing-address Phone

Michael 123 NY 212-111-1111

Michael 123 NY 917-111-1111

Name SSN Mailing-address

Michael 123 NY

SSN PhoneNumber

123 212-111-1111

123 917-111-1111

Page 106: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

106

BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs:

Title Year Studio Studio President President Pres-Address Studio President, Pres-Address (why?)

No many-many this time Problem cause: transitive FDs:

Title,year studio president

Page 107: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

107

BCNF Decomposition Illegal: As Bs, where As don’t include key Decompose: Studio President, Pres-Address

As = {studio} Bs = {president, pres-address} Cs = {title, year}

Result:1. Studios(studio, president, pres-address)2. Movies(studio, title, year)

Is (2) in BCNF? Is in (1) BCNF? Key: Studio FD: President Pres-Address Q: Does president studio? If so, president is a key But if not, it violates BCNF

Page 108: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

108

BCNF Decomposition Studios(studio, president, pres-address) Illegal: As Bs, where As don’t include key Decompose: President Pres-Address

As = {president} Bs = {pres-address} Cs = {studio}

{Studio, President, Pres-Address} becomes {President, Pres-Address} {Studio, President}

Page 109: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

109

BCNF and two-att relations Must a two-attribute relation be in BCNF?

Case 1: there are no non-trivial FDs Case 2: A B but not B A Case 3: B A but not A B Case 4: Both A B and B A

Note that relations may have multiple keys BCNF requires a key on the left, not all keys

Page 110: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

110

Agenda Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19th (Thurs.) This time:

1. Finish BCNF

2. 3NF

3. 4NF

Page 111: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

111

BCNF Review Q: What’s required for BCNF?

Q: What’s the slogan for BCNF?

Q: Who are B & C?

Q: What are the two types of violations?

Page 112: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

112

BCNF Review Q: How do we fix a non-BCNF relation?

Q: If AsBs violates BCNF, what do we do? Q: In this case, could the decomposition be lossy?

Q: Under what circumstances could a decomposition be lossy?

Q: How do we combine two relations?

Page 113: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

113

Decomposition algorithm example R(N,O,R,P) F = {N O, O R, R N}

Key: N,P Violations of BCNF: N O, OR, N OR

which kinds of violations are these? Pick N OR (on board) Can we rejoin? (on board) What happens if we pick N O instead? Can we rejoin? (on board)

Name Office Residence Phone

George Pres. WH 202-…

George Pres. WH 486-…

Dick VP NO 202-…

Dick VP NO 307-…

Page 114: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

114

Lossless BCNF decomposition Consider simple relation: R(A,B,C) Only FD: A B (assume C!A) Key: A,C

Diff vars from text! Also goes through if assumption is false BCNF violation (which kind?): no key on the left Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C) Could this be lossy? We will join R1 and R2 on A to find out

Q: If C A, then what kind do we have?

Q: Since C ! A, what kind of bad FD do we have?

Page 115: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

115

Lossless BCNF decomposition Suppose R contains (b,a,c) and (b’,a,c’) In projection onto (B,A):

(b,a,c) (b,a), (b’,a,c’) (b’,a) In projection onto (A,C):

(b,a,c) (a,c), (b’,a,c’) (a,c’) In joining, (b’,a), (a,c) (b’,a,c) Q: Is/must/can this be correct? A: Yes! A B, so b = b’ So this was lossless We assumed C!A, but argument also goes

through when CA Moral: BCNF decomp alg is always lossless

Page 116: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

116

BCNF summary BCNF decomposition is lossless

Can reproduce original by joining Saw last time: Every 2-attribute relation is in

BCNF Final set of decomposed relations might be

different depending on Order of bad FDs chosen

Saw last time: But all results will be in BCNF

Page 117: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

117

A problem with BCNF Relation: R(Title, Theater, Neighboorhood) FDs:

Title,N’hood Theater Assume movie can’t play twice in same neighborhood

Theater N’hood Keys:

{Title, N’hood} {Theater, Title}

Title Theater N’hood

City of God Angelica Village

Fog of War Angelica Village

Page 118: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

118

A problem with BCNF BCNF violation: Theater N’hood Decompose:

{Theater, N’Hood} {Theater, Title}

Resulting relations:

VillageAngelica

N’hoodTheater

R1

Fog of WarAngelica

City of GodAngelica

TitleTheater

R2

Page 119: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

119

Problem - continued Suppose we add new rows to R1 and R2:

Their join:

City of GodVillageFilm Forum

Village

Village

N’hood

Fog of War

City of God

Title

Angelica

Angelica

Theater

(R’)

Theater N’hood

Angelica Village

Film Forum Village

Theater Title

Angelica City of God

Angelica Fog of War

Film Forum City of God

R1 R2

A and B could not enforce FD Title,N’hood Theater

Page 120: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

120

Third normal form: motivation There are some situations in which

BCNF is not dependency-preserving, and Efficient checking for FD violation on updates is

important

In these cases BCNF is too severe a req. Solution: define a weaker normal form, called

Third Normal Form in which FDs can be checked on individual relations

without performing a join (no inter-relational FDs) to which relations can be converted, preserving both

data and FDs

Page 121: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

121

Third Normal Form BCNF decomposition is not dependency-preserving! We now define the (weaker) Third Normal Form

Turns out: this example was already in 3NF

A relation R is in 3rd normal form if :

For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime

A relation R is in 3rd normal form if :

For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime

Tradeoff:BCNF = no FD anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies

Page 122: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

122

BCNF: vices and virtues Be clear on the problem just described v. the

arg. that BCNF decomp is lossless BCNF decomp does not lose data

Resulting relations can be rejoined to obtain the original

But: it can can lose dependencies After decomp, possible to add rows whose

corresponding rows would be illegal in (rejoined) original

Page 123: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

123

Recap: goals of normalization When we decompose a relation R with FDs F into

R1..Rn we want:

1. lossless-join decomposition – no data lost

2. no/little redundancy: the relations Ri should be in either BCNF or at least 3NF

3. Dependency preservation: if Fi be the set of dependencies in F+ that include only attributes in Ri:

F is the “sum” of the FDs of the new relations (F1 F2 F3 … Fn)+ = F+

Otherwise checking updates for violation of FDs may require computing joins, which is expensive

Page 124: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

124

Dependency preservation Saw that last req. didn’t hold in move-theater

example Did it hold in R(N,O,R,P) example?

(on board)

Page 125: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

125

Testing for 3NF For each dependency X Y, use attribute closure

to check if X is a superkey If X is not a superkey, verify that each attribute in Y

is prime This test is rather more expensive, since it involves finding

candidate keys Testing for 3NF is NP-complete (in what?) Interestingly, decomposition into 3NF can be done in

polynomial time Testing for 3NF is harder than decomposing into 3NF!

Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)

Page 126: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

126

3NF Example R = (J, K, L) F = (JK L, L K) Two candidate keys: JK and JL R is in 3NF

JK L JK is a superkey L K K is prime

BCNF decomposition yields R1 = (L,K), R2 = (L,J)

testing for JK L requires a join There is some redundancy in R

Page 127: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

127

BCNF and 3NF Comparison Example of problems due to redundancy in 3NF

R = (J, K, L) F = (JK L, L K)

A schema that is in 3NF but not BCNF has the problems of: redundancy (e.g., the relationship between l1 and k1) need to use null values (if allowed!), e.g. to represent the

relationship between l2 and k2 when there is no corresponding value for attribute J

J K L

j1 k1 l1

j2 k1 l1

j3 k1 l1

NULL k2 l2

Page 128: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

128

Comparison of BCNF and 3NF It is always possible to decompose a relation

into relations in 3NF such that: the decomposition is lossless the dependencies are preserved

It is always possible to decompose a relation into relations in BCNF such that: the decomposition is lossless but it may not be possible to preserve

dependencies But may eliminate more redundancy

Page 129: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

129

The Normal Forms (so far) 1NF: every attribute has an atomic value 2NF: 1NF and no partial dependencies 3NF: for each FD X Y either

it is trivial, or X is a superkey, or Y is a part of some key

BCNF: 3NF and third 3NF option disallowed I.e, 2NF and no transitive dependencies

Page 130: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

130

Distinguishing examples 1NF but not 2NF: R(Name, SSN ,Mailing-

address,Phone) Key: SSN,Phone Partial: ssn name, address

2NF but not 3NF: R(Title,Year,Studio,Pres,Pres-Addr) Key: Title,Year Transitive: studio president

3NF but not BCNF: R(Title, Theater, N’hood) Title,N’hood Theater Prime-on-right: Theater N’hood

Page 131: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

131

Design Goals Goal for a relational database design is:

No redundancy Lossless Join Dependency Preservation

If we cannot achieve this, we accept one of dependency loss use of more expensive inter-relational methods to preserve

dependencies data redundancy due to use of 3NF

Interesting: SQL does not provide a direct way of specifying FDs other than superkeys can specify FDs using assertions, but they are expensive to test

Page 132: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

132

3NF 3NF means we may have anomalies Example: TEACH(student, teacher, subject)

student, subject teacher (students not allowed in the same subject with two teachers)

teacher subject (each teacher teaches one subject) Subject is prime, so this is 3NF

But we have anomalies: Insertion: cannot insert a teacher until we have a

student taking his subject If we convert to BCNF, we lost student,

subject teacher

Page 133: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

133

BCNF and over-normalization What is the problem? Schema overload – trying to capture two meanings:

1) subject X can be taught by teacher Y 2) student Z takes subject W from teacher V

What to do? 3NF has anomalies, normalizing to BCNF loses FDs One soln: keep the 3NF TEACH and another

(BCNF) relation SUBJECT-TAUGHT (teacher, subject)

Still (more!) redundancy, but no more insert and delete anomalies

Page 134: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

134

Normalization Review Q: What’s required for BCNF?

Q: What are the two types of violations?

Q: What’s the loophole for 3NF?

Q: How do we fix a non-BCNF relation?

Page 135: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

135

Normalization Review Q: If AsBs violates BCNF, what do we do?

Q: In this case, could the decomposition be lossy?

Q: How do we combine two relations?

Q: Can BCNF decomp. lose FDs?

Q: Can 3NF decomp. lose FDs?

Page 136: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

136

Redundancy in BCNF

Lots of redundancy! Key? All fields

None determined by others! Non-trivial FDs? None! In BCNF? Yes!

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer

Hilary 444 Embassy Row Washington Lawyer

Now what? New concept, leading

to another normal form: Multivalued

dependencies

Page 137: Matthew P. Johnson, OCL1, CISDD CUNY, F20041 ORCL1 Oracle 8i: SQL & PL/SQL Session #2 Matthew P. Johnson CISDD, CUNY Fall, 2004.

Matthew P. Johnson, OCL1, CISDD CUNY, F2004

137

E/R to relational model

courses Depts

Computer-allocation

room

number

givenBy

name chair