CS 222 Database Management System Spring 2010-11 Lecture 4 Database Design Theory

CS 222 Database Management System Spring 2010-11 Lecture 4 Database Design Theory Korra Sathya Babu Department of Computer Science NIT Rourkela


Page 1: CS 222  Database Management System Spring 2010-11   Lecture 4  Database Design Theory

CS 222 Database Management System

Spring 2010-11

Lecture 4 Database Design Theory

Korra Sathya Babu

Department of Computer Science

NIT Rourkela

Page 2: Unit Overview

• Database Design Problem
    • Redundancy and Anomaly
• Functional Dependency
    • Axioms, Logical Implications of FDs, Redundant FDs, Closure, Equivalence of FDs, Extraneous Attributes, Covers
• Decomposition
    • Rules of Decomposition, Test for Lossless Join and Dependency Preservation
• Normalization
    • Normal Forms, Multivalued Dependency, Join Dependency, Denormalization

Page 3: Modeling

[Figure: E/R diagram. Entity sets: Movies (title, year, length, filmType), Stars (name, address), Studios (name, address). Relationships: Stars-In, Owns.]

Page 4: Relational Database Design

• Automatic mappings from E/R to relations may not produce the best possible relational design
• Suggested design strategy
    • Real world → E/R model → Relational schema → Better relational schema → Relational DBMS
• Database designers sometimes go directly from the real world to a relational schema, in which case the relational design could be really bad
• Many problems may arise if the design is not done carefully

Page 5: Problems using bad design

• Redundancy
• Insertion Anomaly
• Update Anomaly
• Deletion Anomaly

Page 6: Functional Dependency

• Definition
    • A functional dependency (FD) has the form X → Y, where X and Y are sets of attributes in a relation R. Formally, X → Y means that whenever two tuples in R agree on all the attributes of X, they must also agree on all the attributes of Y.
• Movies (title, year, length, filmType, studioName, starName)
    • FDs we can reasonably assert are: title, year → length; title, year → filmType; title, year → studioName
• Trivial Dependency
    • Trivial: an FD A1 A2 … An → B is said to be trivial if B is one of the A's. Ex.: title, year → title
    • Nontrivial: at least one of the B's is not among the A's. Ex.: title, year → year, length
    • Completely nontrivial: none of the B's is among the A's. Ex.: title, year → length

Page 7: Trivial Dependency

• An FD A1 A2 … An → B1 B2 … Bm is trivial if the B's are a subset of the A's, i.e., {B1, B2, …, Bm} ⊆ {A1, A2, …, An}
• It is nontrivial if at least one B is not among the A's, i.e., {B1, B2, …, Bm} − {A1, A2, …, An} ≠ Ø
• It is completely nontrivial if none of the B's are among the A's, i.e., {B1, B2, …, Bm} ∩ {A1, A2, …, An} = Ø
• Trivial dependency rule: the FD A1 A2 … An → B1 B2 … Bm is equivalent to the FD A1 A2 … An → C1 C2 … Ck, where the C's are those B's that are not A's, i.e., {C1, C2, …, Ck} = {B1, B2, …, Bm} − {A1, A2, …, An}
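The three cases and the trivial dependency rule can be sketched in a few lines of Python. This is only an illustration: the function names `classify_fd` and `strip_trivial_part` are made up here, and attribute sets are represented as plain Python sets of strings.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs as trivial, nontrivial or completely nontrivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:            # every B is among the A's
        return "trivial"
    if rhs & lhs:             # some, but not all, B's are among the A's
        return "nontrivial"
    return "completely nontrivial"   # no B is among the A's


def strip_trivial_part(lhs, rhs):
    """Trivial dependency rule: drop from the right side every attribute
    that already appears on the left side."""
    return set(lhs), set(rhs) - set(lhs)


print(classify_fd({"title", "year"}, {"title"}))
print(classify_fd({"title", "year"}, {"year", "length"}))
print(classify_fd({"title", "year"}, {"length"}))
```

Applying `strip_trivial_part` to title, year → year, length leaves title, year → length, which is the completely nontrivial form of the same dependency.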

Page 8: Armstrong Axioms

• Reflexivity
    • (If Y ⊆ X, then X → Y)
• Augmentation
    • (If X → Y, then XZ → YZ for any Z)
• Transitivity
    • (If X → Y and Y → Z, then X → Z)
• Armstrong axioms are sound and complete
    • Sound: they generate only FDs in F+ when applied to a set F of FDs
    • Complete: when applied repeatedly, these rules generate all FDs in F+

Page 9: More Inference Axioms

• Finding the closure of an FD set may be tedious, so more rules may be derived from the Armstrong axioms
• The Union Rule, Pseudotransitive Rule and Decomposition Rule are sound but not complete
• Union Rule (If X → Y and X → Z, then X → YZ)

Proof of the Union Rule:
Given X → Y and X → Z.
Augment X → Y with X: XX → XY, i.e., X → XY.
Augment X → Z with Y: XY → YZ.
X → YZ [using the transitivity axiom on X → XY and XY → YZ]

Page 10: More Inference Axioms

• Pseudotransitive Rule (If X → Y and YW → Z, then XW → Z)
• Decomposition Rule (If X → YZ, then X → Y and X → Z)

Proof of the Pseudotransitive Rule:
Given X → Y and YW → Z.
Augment X → Y with W: XW → YW.
XW → Z [using the transitivity axiom on XW → YW and YW → Z]

Proof of the Decomposition Rule:
Given X → YZ.
By reflexivity, YZ → Y and YZ → Z.
X → Y and X → Z [using the transitivity axiom on X → YZ with YZ → Y and with YZ → Z]

Page 11: Logical Implications of FDs

• Let F be the following set of functional dependencies: {AB → CD, B → DE, C → F, E → G, A → B}. Use Armstrong's axioms to show that A → FG is logically implied by F
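One possible derivation is sketched below. It uses the three axioms together with the union and decomposition rules, which are themselves derived from the axioms; other step orders work equally well.

```latex
\begin{align*}
1.\;& A \to B   && \text{given} \\
2.\;& A \to AB  && \text{augmentation of (1) with } A \\
3.\;& AB \to CD && \text{given} \\
4.\;& A \to CD  && \text{transitivity on (2), (3)} \\
5.\;& A \to C   && \text{decomposition of (4)} \\
6.\;& C \to F   && \text{given} \\
7.\;& A \to F   && \text{transitivity on (5), (6)} \\
8.\;& B \to DE  && \text{given} \\
9.\;& A \to DE  && \text{transitivity on (1), (8)} \\
10.\;& A \to E  && \text{decomposition of (9)} \\
11.\;& E \to G  && \text{given} \\
12.\;& A \to G  && \text{transitivity on (10), (11)} \\
13.\;& A \to FG && \text{union of (7), (12)}
\end{align*}
```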

Page 12: The Satisfies Algorithm

• Used to determine whether a relation R satisfies a given FD A → B
• Input: relation R and an FD A → B
• Output: TRUE if R satisfies A → B, otherwise FALSE

The Satisfies Algorithm:
Step 1: Sort the tuples of the relation R on the attribute(s) A (the determinant) so that tuples with equal values under A are next to each other
Step 2: Check that tuples with equal values under A also have equal values under attribute(s) B
Step 3: If any two tuples of R agree on A (Step 1) but differ on B (Step 2), the output of the algorithm is FALSE. Otherwise, the relation satisfies the functional dependency and the output of the algorithm is TRUE

In short, the satisfies algorithm can be stated as: the relation R satisfies the FD A → B if, for every pair of tuples t1 and t2 in R, t1.A = t2.A implies t1.B = t2.B
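The three steps can be sketched in Python as follows. This is a minimal illustration, assuming each tuple is a dict from attribute name to value and that the values under A can be sorted; the function name `satisfies` and the sample rows are made up for the example.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation `rows` satisfies the FD lhs -> rhs."""
    def key(t):
        return tuple(t[a] for a in lhs)   # values under A

    def val(t):
        return tuple(t[b] for b in rhs)   # values under B

    # Step 1: sort on A so tuples with equal A-values become adjacent.
    ordered = sorted(rows, key=key)
    # Steps 2 and 3: neighbours that agree on A must also agree on B.
    for prev, cur in zip(ordered, ordered[1:]):
        if key(prev) == key(cur) and val(prev) != val(cur):
            return False
    return True


# Illustrative rows only, not data from the lecture.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "starName": "Dana Carvey"},
]
print(satisfies(movies, ["title", "year"], ["length"]))
print(satisfies(movies, ["title", "year"], ["starName"]))
```

Comparing only adjacent tuples is enough because, after sorting, all tuples with the same A-values form one contiguous run. Note that a TRUE result only says this particular instance does not violate the FD; it does not prove the FD holds for the schema in general.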

Page 13: Redundant FDs

• Given a set F of FDs, an FD A → B in F is said to be redundant with respect to F if and only if A → B can be derived from the set of FDs F − {A → B}
• Eliminating redundant FDs allows us to minimize the set of FDs
• The Membership Algorithm helps to determine the redundant FDs
    • Input: a set F of FDs and a particular FD of F that is being tested
    • Output: whether the FD is redundant or not

Page 14: Redundant FDs

• Assume F is a set of FDs with A → B ∈ F

The Membership Algorithm:
Step 1: Temporarily remove A → B from F and initialize the set of FDs G, i.e., set G = F − {A → B}. If G ≠ Ø, proceed to Step 2; otherwise stop executing the algorithm, since A → B is nonredundant
Step 2: Initialize the set of attributes Ti (with i = 1) with the set of attribute(s) A (the determinant of the FD under consideration), i.e., set Ti = T1 = {A}. The set T1 is the current Ti
Step 3: In the set G, search for an FD X → Y such that all the attributes of the determinant X are elements of the current set Ti. There are two possible outcomes
Step 3a: If such an FD is found, add the attributes of Y (the right-hand side of the FD) to the set Ti and form a new set Ti+1 = Ti ∪ Y. The set Ti+1 is the current Ti. Check whether all the attributes of B (the right-hand side of the FD under consideration) are members of Ti+1. If this is the case, stop executing the algorithm, because the FD A → B is redundant. If not all attributes of B are members of Ti+1, remove X → Y from G and repeat Step 3
Step 3b: If G = Ø or there are no FDs in G that have all the attributes of their determinant in the current Ti, then A → B is not redundant
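A compact Python sketch of the membership test is given below. It assumes FDs are written as `(lhs, rhs)` pairs of strings with single-character attribute names, and it grows the set Ti with a simple fixpoint loop rather than deleting used FDs from G one at a time; the outcome is the same. The names `closure` and `is_redundant` are illustrative.

```python
def closure(attrs, fds):
    """Grow the attribute set T until no FD in `fds` adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_redundant(fds, target):
    """Membership test: is `target` derivable from the other FDs?"""
    rest = [f for f in fds if f != target]      # Step 1: G = F - {A -> B}
    lhs, rhs = target
    return set(rhs) <= closure(lhs, rest)       # Steps 2-3: is B inside T?


F = [("X", "YW"), ("XW", "Z"), ("Z", "Y"), ("XY", "Z")]
print(is_redundant(F, ("XY", "Z")))
print(is_redundant(F, ("X", "YW")))
```

For the first call, starting from T = {X, Y}, the FD X → YW adds W and then XW → Z adds Z, so XY → Z is redundant. For the second call nothing in the remaining set fires from {X} alone, so X → YW is not redundant.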

Page 15: Redundant FDs

• Given the set F = {X → YW, XW → Z, Z → Y, XY → Z}, determine whether the FD XY → Z is redundant in F
• Eliminate redundant FDs from F = {X → Y, Y → X, Y → Z, Z → Y, X → Z, Z → X} using the Membership Algorithm
• Find the redundant FDs in the set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}

Page 16: Closure of FD Set

• Definition
    • The set of all FDs implied by a given set F of FDs is called the closure of F, denoted F+
    • Armstrong axioms can be applied repeatedly to infer all FDs implied by a set F of FDs

Given R = ABCD and F = {A → B, A → C, CD → A}. Compute F+.

Attribute closures with respect to F:
A+ = {ABC}
B+ = {B}
…
AB+ = {ABC}
AC+ = {ABC}
…
ABC+ = …

Given R = XYZ and F = {XY → Z}. Compute F+.

F+ = {X → X, Y → Y, Z → Z, XY → X, XY → Y, XY → Z, XY → XY, XY → XZ, XY → YZ, XY → XYZ, XZ → X, XZ → Z, XZ → XZ, YZ → Y, YZ → Z, YZ → YZ, XYZ → X, XYZ → Y, XYZ → Z, XYZ → XY, XYZ → XZ, XYZ → YZ, XYZ → XYZ}

Consider a relation with schema R(A, B, C, D) and FDs F = {AB → C, C → D, D → A}. Compute F+.
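For small schemas F+ can be enumerated mechanically: for every nonempty attribute set X, every nonempty subset of the attribute closure of X is a valid right-hand side. The sketch below follows that idea. It assumes single-character attribute names, FDs written as `(lhs, rhs)` string pairs, and it leaves out FDs with an empty side; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonempty_subsets(attrs):
    """Yield every nonempty subset of `attrs` as a sorted string."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield "".join(combo)


def fd_closure(schema, fds):
    """Enumerate F+ as a set of (lhs, rhs) string pairs."""
    return {
        (lhs, rhs)
        for lhs in nonempty_subsets(schema)
        for rhs in nonempty_subsets(closure(lhs, fds))
    }


f_plus = fd_closure("XYZ", [("XY", "Z")])
print(len(f_plus))
print(sorted(f_plus))
```

The enumeration is exponential in the number of attributes, which is why the later slides work with attribute closures and covers instead of materializing F+.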

Page 17: Attribute Closure

• The closure of an attribute set A is the set of all attributes of the relation that A determines, found by using the inference axioms and the given FD set. It is denoted by {A}+
• Given the FD set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}, find the closure of each single attribute
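A minimal sketch of the attribute closure computation, applied to the exercise set above. It assumes single-character attribute names and FDs written as `(lhs, rhs)` string pairs; the name `closure` is illustrative.

```python
def closure(attrs, fds):
    """Return {attrs}+ : every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:                     # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # An FD fires once its whole left side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("X", "YZ"), ("ZW", "P"), ("P", "Z"), ("W", "XPQ"), ("XYQ", "YW"), ("WQ", "YZ")]
for attr in "XYZWPQ":
    print(attr, "".join(sorted(closure(attr, F))))
```

Under these FDs only W reaches every attribute, so W is the only single attribute that can serve as a key on its own.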

Page 18: Candidate Key

• A candidate key is a minimal set of attribute(s) that uniquely determines all the other attributes in a relation
• The two properties of a key are uniqueness and minimality
• A superkey is a set of attributes that has the uniqueness property but is not necessarily minimal
• If a relation has multiple keys, specify one to be the primary key
• Convention: in a relational schema, underline the attributes of the primary key
• If a key has only one attribute A, we say that A rather than {A} is a key

Page 19: Candidate Key

• Given a relation R(ABCD) and FD set F = {AB → C, B → D, D → B}, find the candidate keys of the relation
• Given a relation R(XYZWP) and FD set F = {Y → Z, Z → Y, Z → W, Y → P}, find the number of candidate keys
• Consider a schema R = {S, T, V, C, P, D} and F = {S → T, V → SC, SD → P}. Find the keys for R
• Given a relation R(XYZWP) and FD set F = {X → Z, YZ → W, Z → Y}, find the number of candidate keys
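Problems of this kind can be checked with a brute-force search: try attribute sets in order of increasing size, keep those whose closure covers the whole schema, and skip any set that contains an already found key. The sketch below assumes single-character attribute names and `(lhs, rhs)` string pairs; the first call assumes the schema A, B, C, D, since D appears in the FDs of the first problem. The function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all candidate keys of `schema` as sorted strings."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue                      # contains a key: not minimal
            if closure(cand, fds) >= set(attrs):
                keys.append(cand)             # uniqueness holds, and it is minimal
    return sorted("".join(sorted(k)) for k in keys)


print(candidate_keys("ABCD", [("AB", "C"), ("B", "D"), ("D", "B")]))
print(candidate_keys("XYZWP", [("Y", "Z"), ("Z", "Y"), ("Z", "W"), ("Y", "P")]))
print(candidate_keys("STVCPD", [("S", "T"), ("V", "SC"), ("SD", "P")]))
print(candidate_keys("XYZWP", [("X", "Z"), ("YZ", "W"), ("Z", "Y")]))
```

Because smaller sets are examined first, any set that survives the superset check and passes the closure test is guaranteed to be minimal. The search is exponential in the number of attributes, so it is only practical for classroom-sized schemas.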

Page 20: Equivalence of set of FDs

• Consider two sets of FDs defined over the same relational schema
• A set of FDs S "follows" from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies all the FDs in S
    • A → C follows from T = {A → B, B → C}
• Two sets of FDs S and T are "equivalent" if and only if S follows from T, and T follows from S
    • S = {A → B, B → C, A → C} and T = {A → B, B → C} are equivalent
• These notions are useful in deriving new FDs from a given set of FDs

Page 21: Equivalence of set of FDs

• Two sets of FDs F and G defined over the same relation schema are equivalent if
    • every FD in F can be inferred from G, and
    • every FD in G can be inferred from F
• F covers G if every FD in G can be inferred from F (i.e., if G+ is a subset of F+)
• F and G are equivalent if F covers G and G covers F
• If G covers F and no proper subset H of G exists such that H+ = G+, we say G is a non-redundant cover of F
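Covering and equivalence can be tested without computing F+ or G+ in full: F covers G exactly when, for every X → Y in G, Y lies inside the closure of X taken under F. A small sketch, assuming single-character attribute names and `(lhs, rhs)` string pairs, with illustrative function names:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f covers g and g covers f."""
    return covers(f, g) and covers(g, f)


S = [("A", "B"), ("B", "C"), ("A", "C")]
T = [("A", "B"), ("B", "C")]
print(equivalent(S, T))
print(covers(T, [("C", "A")]))
```

Here S and T come out equivalent because A → C is already implied by A → B and B → C, while C → A is not implied by T.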

Page 22: The non-redundant cover algorithm

The Non-Redundant Cover Algorithm:
Step 1: Initialize G to F, i.e., set G = F
Step 2: Test every FD of G for redundancy using the Membership Algorithm, removing each FD found to be redundant, until there are no more FDs of G to be tested
Step 3: The set G is a non-redundant cover of F

Note: Given a set F, there may be more than one non-redundant cover, since the result can depend on the order in which the FDs are considered

Problem: Find the non-redundant cover G for the set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}
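The algorithm can be sketched as a single pass over the FDs, dropping each one that is implied by the FDs still remaining. The code assumes single-character attribute names and `(lhs, rhs)` string pairs, and tests the FDs in the order they are listed; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonredundant_cover(fds):
    """Remove redundant FDs one at a time, in the listed order."""
    g = list(fds)                                  # Step 1: G = F
    for fd in list(fds):                           # Step 2: test every FD
        rest = [f for f in g if f != fd]
        lhs, rhs = fd
        if set(rhs) <= closure(lhs, rest):         # membership test
            g = rest                               # drop it for good
    return g                                       # Step 3


F = [("X", "YZ"), ("ZW", "P"), ("P", "Z"), ("W", "XPQ"), ("XYQ", "YW"), ("WQ", "YZ")]
print(nonredundant_cover(F))
```

On the problem set, ZW → P is dropped because W → XPQ already yields P, and WQ → YZ is dropped because W → XPQ followed by X → YZ yields Y and Z. Each removal changes G for the later tests, which is why a different testing order may in general lead to a different cover.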

Page 23: Extraneous Attribute

• Definition: an attribute A is extraneous in X → Y if A can be removed from the left side or the right side of X → Y without changing the closure of F. Ex.: let G = {A → BC, B → C, AB → D}. Attribute C is extraneous in the right side of A → BC, and attribute B is extraneous in the left side of AB → D
• The set G = {A → BC, B → C, AB → D} is neither left-reduced nor right-reduced. G1 = {A → BC, B → C, A → D} is left-reduced but not right-reduced, while G2 = {A → B, B → C, AB → D} is right-reduced but not left-reduced. The set G3 = {A → B, B → C, A → D} is both left-reduced and right-reduced, hence reduced
• Extraneous attributes need to be eliminated

Page 24: Left Reduced Algorithm

The Left-Reduced Algorithm:
Step 1: Initialize a set of FDs G to F, i.e., set G = F
Step 2: For every FD A1, A2, …, Ai, …, An → Y in G, do Step 3 until there are no more FDs in G to which this step can be applied. The algorithm stops when all FDs of G have gone through Step 3
Step 3: For each attribute Ai in the determinant of the FD selected in the previous step, do Step 4 until all attributes have been tested. After testing all attributes of a particular FD, repeat Step 2
Step 4: Test whether all attributes of Y (the RHS of the FD under consideration) are elements of the closure of A1, …, Ai−1, Ai+1, …, An (notice that attribute Ai has been removed from the determinant of the FD) with respect to the FDs of G. If this is the case, remove attribute Ai from the determinant of the FD being tested, because Ai is an extraneous left attribute. If not all attributes of Y are elements of that closure, then attribute Ai is not an extraneous left attribute and should remain in the determinant of the FD under consideration

When the algorithm finishes, the set G contains a left-reduced cover of F
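A sketch of the left-reduction steps in Python. It assumes single-character attribute names and `(lhs, rhs)` string pairs, tries the attributes of each determinant in alphabetical order, and never reduces a determinant to the empty set; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def left_reduce(fds):
    """Remove extraneous left attributes from every FD."""
    g = [(set(lhs), set(rhs)) for lhs, rhs in fds]     # Step 1: G = F
    for i, (lhs, rhs) in enumerate(g):                 # Step 2: each FD
        for attr in sorted(lhs):                       # Step 3: each Ai
            reduced = lhs - {attr}
            # Step 4: is Y still determined without Ai (closure w.r.t. G)?
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced                          # Ai is extraneous
                g[i] = (lhs, rhs)
    return [("".join(sorted(l)), "".join(sorted(r))) for l, r in g]


G = [("A", "BC"), ("B", "C"), ("AB", "D")]
print(left_reduce(G))
```

For AB → D, removing A fails because {B}+ = {B, C} does not contain D, while removing B succeeds because {A}+ under G contains D. The result is the left-reduced set G1 from the Extraneous Attribute slide.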

Page 25: Extraneous Attribute

• Remove any extraneous left attributes from F = {A → BC, E → C, D → AEF, ABF → BD}
• Reduce the set F = {X → Z, XY → WP, XY → ZWQ, XZ → R} by removing extraneous left attributes
• Reduce the set F = {X → WY, XW → Z, Z → Y, XY → Z} by removing extraneous left attributes

Tip: There is no need to consider FDs whose determinant consists of a single attribute

Page 26: Canonical Cover

• A set of FDs F is canonical if every FD in F is of the form X → A (a single attribute on the right side) and F is left-reduced and non-redundant
• Since a canonical set of FDs is non-redundant and every FD has a single attribute on the right side, it is right-reduced. Since it is also left-reduced, it is reduced
• Example: the set F = {A → B, A → C, A → D, A → E, BI → J} is a canonical cover for G = {A → BCE, AB → DE, BI → J}
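One way to obtain a canonical cover is to combine the earlier procedures: split every right side into single attributes, left-reduce, then remove redundant FDs. The sketch below follows that order. It assumes single-character attribute names and `(lhs, rhs)` string pairs, and the function names are illustrative; a different processing order may produce a different, equally valid canonical cover.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Return a canonical cover as a set of (lhs, single_attribute) pairs."""
    # 1. Give every FD a single attribute on the right side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(set(rhs))]
    # 2. Left-reduce: drop each extraneous attribute of each determinant.
    for i in range(len(g)):
        lhs, a = g[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)
    # 3. Drop duplicates, then FDs implied by the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return {("".join(sorted(lhs)), a) for lhs, a in g}


G = [("A", "BCE"), ("AB", "DE"), ("BI", "J")]
print(sorted(canonical_cover(G)))
```

On the example above, AB → D and AB → E lose the extraneous B, the duplicate A → E is dropped, and nothing else is redundant, which reproduces the set F from the slide.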

Page 27: Minimal Cover

• A set of FDs F is minimal if it satisfies the following conditions
    • every dependency in F has a single attribute for its RHS
    • we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F
    • we cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X (Y ⊂ X), and still have a set of dependencies that is equivalent to F
• Every set of FDs has an equivalent minimal set
• There can be several equivalent minimal sets
• There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs
• To synthesize a set of relations, we assume that we start with a set of dependencies that is a minimal set

Page 28: Optimal Cover

• We have been measuring our covers in terms of the number of FDs they contain. We can also measure them by the number of attribute symbols required to express them. Example: {AB → C, CD → E, AC → IJ} has size 10 under this measure
• Definition: a set of FDs F is optimal if there is no equivalent set of FDs with fewer attribute symbols than F
• The set F = {EC → D, AB → E, E → AB} is an optimal cover for G = {ABC → D, AB → E, E → AB}. Notice that G is reduced and minimum, but not optimal
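The size measure is easy to compute: count every attribute symbol on both sides of every FD, repeats included. A tiny sketch, assuming single-character attribute names and `(lhs, rhs)` string pairs; the function name `cover_size` is illustrative.

```python
def cover_size(fds):
    """Number of attribute symbols needed to write down the FD set."""
    return sum(len(lhs) + len(rhs) for lhs, rhs in fds)


example = [("AB", "C"), ("CD", "E"), ("AC", "IJ")]
F = [("EC", "D"), ("AB", "E"), ("E", "AB")]
G = [("ABC", "D"), ("AB", "E"), ("E", "AB")]
print(cover_size(example), cover_size(F), cover_size(G))
```

The counts show F using one symbol fewer than G, which is the gap the slide points to. The function only measures a given set; it does not search for an optimal cover.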

Page 29: Problems

• Find the canonical cover of
    1. F = {X → Z, XY → WP, XY → ZWQ, XZ → R}
    2. F = {X → YW, XW → Z, Z → Y, XY → Z}
    3. F = {A → BC, E → C, D → AEF, ABF → BD}
    4. G = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
• Find the minimal cover of the following FD sets
    1. F = {A → BC, AB → C, B → C, A → B}
    2. F = {A → B, B → A, B → C, A → C, C → A}
    3. F = {ABC, A → B, B → A}
    4. G = {ABCD → E, E → D, A → B, AC → D}
    5. F = {A → BC, B → AC, C → AB}
    6. G = {ABDC, C → BE, AD → BF, B → F}

Page 30: Seminar change?

• Compound Functional Dependency (CFD) and Annular Covers

Page 31: Summary

• Good design needs strategy
• Armstrong axioms are sound and complete
• FDs are constraints
• There may be a number of equivalent FD sets
• FD sets may be minimized by checking the coverage