6 - Database Design Theory-1


  • CS 222 Database Management System

    Spring 2010-11

    Lecture 4 Database Design Theory

    Korra Sathya Babu

    Department of Computer Science

    NIT Rourkela

    1

  • Database Design Problem: Redundancy and Anomaly

    Functional Dependency: Axioms, Logical Implications of FDs, Redundant FDs, Closure, Equivalence of FDs, Extraneous Attributes, Covers

    Decomposition: Rules of Decomposition, Test for Lossless Join and Dependency Preservation

    Normalization: Normal Forms, Multivalued Dependency, Join Dependency, Denormalization

    Unit Overview

    2

  • Modeling

    [E/R diagram: entity sets Movies (title, year, length, filmType), Stars (name, address) and Studios (name, address), connected by the relationships Stars-In and Owns]

    3

  • Automatic mappings from E/R to relations may not produce the best relational design possible

    Suggested design strategy: Real world -> E/R model -> Relational schema -> Better relational schema -> Relational DBMS

    Database designers sometimes go directly from Real-world to Relational schema, in which case the relational design could be really bad.

    Many problems may arise if the design is not careful

    Relational Database Design

    4

  • Redundancy

    Insertion Anomaly

    Update Anomaly

    Deletion Anomaly

    Problems using bad design

    5

  • Definition: A functional dependency (FD) has the form X -> Y, where X and Y are sets of attributes in a relation R. Formally, X -> Y means that whenever two tuples in R agree on all the attributes of X, they must also agree on all the attributes of Y.

    Movies (title, year, length, filmType, studioName, starName)

    FDs we can reasonably assert are: title, year -> length; title, year -> filmType; title, year -> studioName

    Trivial Dependency

    Trivial: An FD A1 A2 ... An -> B is said to be trivial if B is one of the As, e.g. title, year -> title

    Nontrivial: at least one of the Bs is not among the As, e.g. title, year -> year, length

    Completely nontrivial: none of the Bs is among the As, e.g. title, year -> length

    Functional Dependency

    7

  • An FD A1 A2 ... An -> B1 B2 ... Bm is trivial if the Bs are a subset of the As, i.e., {B1, B2, ..., Bm} is a subset of {A1, A2, ..., An}

    It is non-trivial if at least one B is not among the As, i.e., {B1, B2, ..., Bm} is not a subset of {A1, A2, ..., An}

    It is completely non-trivial if none of the Bs is among the As, i.e., {B1, B2, ..., Bm} intersect {A1, A2, ..., An} = {} (the empty set)

    Trivial dependency rule: The FD A1 A2 ... An -> B1 B2 ... Bm is equivalent to the FD A1 A2 ... An -> C1 C2 ... Ck, where the Cs are those Bs that are not As, i.e., {C1, C2, ..., Ck} = {B1, B2, ..., Bm} - {A1, A2, ..., An}
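
    The three cases and the trivial dependency rule can be sketched in Python as below. This is only an illustration: the function names classify_fd and strip_trivial_part are made up here, and attribute sets are modelled as plain Python sets.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs as trivial, nontrivial or completely nontrivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:            # every B is among the As
        return "trivial"
    if rhs & lhs:             # some, but not all, Bs are among the As
        return "nontrivial"
    return "completely nontrivial"


def strip_trivial_part(lhs, rhs):
    """Trivial dependency rule: drop from the right side every attribute already on the left."""
    return set(lhs), set(rhs) - set(lhs)


print(classify_fd({"title", "year"}, {"title"}))           # trivial
print(classify_fd({"title", "year"}, {"year", "length"}))  # nontrivial
print(classify_fd({"title", "year"}, {"length"}))          # completely nontrivial
print(strip_trivial_part({"title", "year"}, {"year", "length"}))
```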

    Trivial Dependency

    8

  • Reflexivity (If Y is a subset of X, then X -> Y)

    Augmentation (If X -> Y, then XZ -> YZ for any Z)

    Transitivity (If X -> Y and Y -> Z, then X -> Z)

    Armstrong Axioms

    Armstrong axioms are sound and complete

    Sound: They generate only FDs in F+ when applied to a set F of FDs

    Complete: When applied repeatedly, these rules generate all FDs in F+

    9

  • Finding the closure of an FD set may be tedious, so more rules may be derived from the Armstrong Axioms

    The Union Rule, Pseudotransitive Rule and Decomposition Rule are sound but, on their own, not complete

    Union Rule (If X -> Y and X -> Z, then X -> YZ)

    More Inference Axioms

    Proof: Given X -> Y and X -> Z. Augment X -> Y with X, and X -> Z with Y:

    XX -> XY, i.e. X -> XY; XY -> YZ

    X -> YZ [using the transitivity axiom on X -> XY and XY -> YZ]

    10

  • Pseudotransitive Rule (If X -> Y and YW -> Z, then XW -> Z)

    Decomposition Rule (If X -> YZ, then X -> Y and X -> Z)

    More Inference Axioms

    Proof of the Pseudotransitive Rule: Given X -> Y and YW -> Z. Augment X -> Y with W:

    XW -> YW, so XW -> Z [using the transitivity axiom on XW -> YW and YW -> Z]

    Proof of the Decomposition Rule: Given X -> YZ. By reflexivity, YZ -> Y and YZ -> Z

    X -> Y and X -> Z [using the transitivity axiom on X -> YZ with YZ -> Y, and on X -> YZ with YZ -> Z]

    11

  • Let F be the following set of functional dependencies: {ABCD, BDE, C -> F, E -> G, A -> B}. Use Armstrong's axioms to show that {AFG} is logically implied by F

    Logical Implications of FDs

    12

  • Used to determine whether a relation R satisfies a given FD A -> B

    Input: Relation R and an FD A -> B

    Output: TRUE if R satisfies A -> B, otherwise FALSE

    The Satisfies Algorithm

    The Satisfies Algorithm:

    Step 1: Sort the tuples of the relation R on the attribute(s) A (the determinant) so that tuples with equal values under A are next to each other

    Step 2: Check that tuples with equal values under A also have equal values under attribute(s) B

    Step 3: If any two tuples of R have equal values under A (as grouped in Step 1) but fail the check in Step 2, the output of the algorithm is FALSE. Otherwise, the relation satisfies the functional dependency and the output of the algorithm is TRUE

    In short, the Satisfies Algorithm can be stated as: the relation R satisfies the FD A -> B if, for every pair of tuples t1 and t2 in R, t1.A = t2.A implies t1.B = t2.B
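
    A minimal Python sketch of this check follows. It groups tuples by their A-values in a dictionary rather than sorting them, which gives the same answer as the sort-based steps. The function name satisfies and the sample rows are made up for illustration.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation `rows` (a list of dicts) satisfies the FD lhs -> rhs."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores `value` for a new key, otherwise returns the stored one
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical sample data, not taken from the lecture.
movies = [
    {"title": "T1", "year": 1990, "length": 100, "starName": "S1"},
    {"title": "T1", "year": 1990, "length": 100, "starName": "S2"},
    {"title": "T2", "year": 1995, "length": 90, "starName": "S1"},
]

print(satisfies(movies, ["title", "year"], ["length"]))    # True
print(satisfies(movies, ["title", "year"], ["starName"]))  # False
```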

    13

  • Given a set F of FDs, an FD A -> B of F is said to be redundant w.r.t. the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}

    Eliminating redundant FDs allows us to minimize the set of FDs

    The Membership Algorithm helps to determine the redundant FDs

    Input: A set F of FDs and a particular FD of F that is being tested

    Output: Whether the FD is redundant or not

    Redundant FDs

    14

  • Assume F is a set of FDs with A -> B in F

    Redundant FDs

    The Membership Algorithm:

    Step 1: Temporarily remove A -> B from F and initialize the set of FDs G, i.e., set G = F - {A -> B}. If G is not empty, proceed to Step 2; otherwise stop executing the algorithm, since A -> B is non-redundant

    Step 2: Initialize the set of attributes Ti (with i = 1) to the attribute(s) A (the determinant of the FD under consideration), i.e., set Ti = T1 = {A}. The set T1 is the current Ti

    Step 3: In the set G, search for an FD X -> Y such that all the attributes of the determinant X are elements of the current set Ti. There are two possible outcomes

    Step 3a: If such an FD is found, add the attributes of Y (the right-hand side of the FD) to the set Ti and form a new set Ti+1 = Ti U Y. The set Ti+1 is the current Ti. Check whether all the attributes of B (the right-hand side of the FD under consideration) are members of Ti+1. If so, stop executing the algorithm because the FD A -> B is redundant. If not all attributes of B are members of Ti+1, remove X -> Y from G and repeat Step 3

    Step 3b: If G is empty or there are no FDs in G that have all the attributes of their determinant in the current Ti, then A -> B is not redundant
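
    The test can be sketched in Python as follows. This sketch computes the whole closure of the determinant under G and then checks B, instead of stopping early and deleting used FDs as the steps do; the verdict is the same. FDs are written as (lhs, rhs) pairs of strings with one letter per attribute, and the names closure and is_redundant are chosen here for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_redundant(fd, fds):
    """True if `fd` can be derived from the other FDs in `fds`."""
    g = [f for f in fds if f != fd]    # G = F - {A -> B}
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, g)  # is B inside the closure of A under G?


F = [("A", "B"), ("B", "C"), ("A", "C")]
print(is_redundant(("A", "C"), F))  # True: A -> C follows from A -> B and B -> C
print(is_redundant(("A", "B"), F))  # False
```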

    15

  • Given the set F = {X -> YW, XW -> Z, Z -> Y, XY -> Z}, determine if the FD XY -> Z is redundant in F

    Eliminate redundant FDs from F = {X -> Y, Y -> X, Y -> Z, Z -> Y, X -> Z, Z -> X} using the Membership Algorithm

    Find the redundant FDs in the set F = {X -> YZ, ZW -> P, P -> Z, W -> XPQ, XYQ -> YW, WQ -> YZ}

    Redundant FDs

    16

  • Definition

    The set of all FDs implied by a given set F of FDs is called the closure of F, and denoted as F+

    Armstrong Axioms can be applied repeatedly to infer all FDs implied by a set F of FDs

    Closure of FD Set

    Given R = ABCD and F = {A -> B, A -> C, CD -> A}. Compute F+.

    A+ = {ABC}

    B+ = {B}

    (AB)+ = {ABC}

    (AC)+ = {ABC}

    (ABC)+ = {ABC}

    Given R = XYZ and F = {XY -> Z}. Compute F+.

    F+ = {X -> X, Y -> Y, Z -> Z,

    XY -> X, XY -> Y, XY -> XY, XY -> Z, XY -> XZ, XY -> YZ, XY -> XYZ, XZ -> X, XZ -> Z, XZ -> XZ, YZ -> Y, YZ -> Z, YZ -> YZ, XYZ -> X, XYZ -> Y, XYZ -> Z, XYZ -> XY, XYZ -> XZ, XYZ -> YZ, XYZ -> XYZ}
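
    For a schema this small, F+ can be enumerated by brute force: an FD X -> Y belongs to F+ exactly when Y is contained in the attribute closure of X. The sketch below assumes one letter per attribute and ignores FDs with an empty side; the helper names are made up for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonempty_subsets(attrs):
    """Yield every nonempty subset of `attrs` as a sorted string, e.g. 'XZ'."""
    attrs = sorted(attrs)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield "".join(combo)


def fd_closure(schema, fds):
    """All FDs lhs -> rhs (both sides nonempty) implied by `fds` over `schema`."""
    return [(lhs, rhs)
            for lhs in nonempty_subsets(schema)
            for rhs in nonempty_subsets(closure(lhs, fds))]


f_plus = fd_closure("XYZ", [("XY", "Z")])
print(len(f_plus))  # 23
for lhs, rhs in f_plus:
    print(lhs, "->", rhs)
```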

    Consider a relation with schema R(A,B,C,D) and FDs F = {AB -> C, C -> D, D -> A}. Compute F+.

    17

  • The closure of an attribute A is the set of all attributes in the relation that A determines. It is found using the inference axioms and the given FD set, and is denoted by {A}+

    Given the FD set F = {X -> YZ, ZW -> P, P -> Z, W -> XPQ, XYQ -> YW, WQ -> YZ}, find the closure of each single attribute
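
    Attribute closure is usually computed with a simple fixpoint loop: keep adding the right side of any FD whose left side is already covered, until nothing changes. The sketch below assumes one letter per attribute and reuses the earlier example R = ABCD with F = {A -> B, A -> C, CD -> A}; the function name is chosen here for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # the FD applies once its whole left side is inside the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CD", "A")]
for attrs in ["A", "B", "AB", "CD"]:
    print(attrs, "+ =", "".join(sorted(closure(attrs, F))))
```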

    Attribute Closure

    18

  • A candidate key is a minimal set of attribute(s) that functionally determines all the other attributes in a relation

    The two properties of a key are uniqueness and minimality

    A superkey is a set of attributes that has the uniqueness property but is not necessarily minimal

    If a relation has multiple keys, specify one to be the primary key

    Convention: in a relational schema, underline the attributes of the primary key.

    If a key has only one attribute A, we say that A rather than {A} is a key.
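
    One brute-force way to find candidate keys is to try attribute sets from smallest to largest and keep those whose closure is the whole schema, skipping any set that already contains a key. The sketch below is exponential in the number of attributes, so it only suits small schemas; the names are illustrative and the example reuses R = ABCD with F = {A -> B, A -> C, CD -> A}.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):      # smallest sets first
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue                        # a superkey, but not minimal
            if closure(attrs, fds) == schema:   # uniqueness property
                keys.append(attrs)
    return keys


F = [("A", "B"), ("A", "C"), ("CD", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCD", F)])  # ['AD', 'CD']
```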

    Candidate Key

    19

  • Given a relation R(ABC) and FD set F = {ABC, B -> D, D -> B}. Find the candidate keys of the relation

    Given a relation R(XYZWP) and FD set F = {Y -> Z, Z -> Y, Z -> W, Y -> P}. Find the number of candidate keys

    Consider a schema R = {S,T,V,C,P,D} and F = {S -> T, V -> SC, SD -> P}. Find the keys for R

    Given a relation R(XYZWP) and FD set F = {X -> Z, YZ -> W, Z -> Y}. Find the number of candidate keys

    Candidate Key

    20

  • Consider two sets S and T of FDs defined over the same relational schema

    A set of FDs S follows from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies all the FDs in S

    A -> C follows from T = {A -> B, B -> C}

    Two sets of FDs S and T are equivalent if and only if S follows from T, and T follows from S

    S = {A -> B, B -> C, A -> C} and T = {A -> B, B -> C} are equivalent

    These notions are useful in deriving new FDs from a given set of FDs

    Equivalence of set of FDs

    21

  • Two sets of FDs F and G defined over the same relation schema are equivalent if

    every FD in F can be inferred from G and

    every FD in G can be inferred from F

    F covers G if every FD in G can be inferred from F (i.e., if G+ is a subset of F+)

    F and G are equivalent if F covers G and G covers F

    If G covers F and no proper subset H of G exists such that H+ = G+, we say G is a non-redundant cover of F
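
    Covering can be checked without computing F+ or G+ in full: F covers G when, for each FD X -> Y in G, Y lies inside the closure of X under F. A minimal sketch, assuming one letter per attribute and illustrative function names:

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f covers g and g covers f."""
    return covers(f, g) and covers(g, f)


S = [("A", "B"), ("B", "C"), ("A", "C")]
T = [("A", "B"), ("B", "C")]
print(equivalent(S, T))          # True
print(covers([("A", "B")], T))   # False: B -> C cannot be inferred from A -> B
```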

    Equivalence of set of FDs

    22

  • The non-redundant cover algorithm

    The Non-Redundant Cover Algorithm:

    Step 1: Initialize G to F, i.e., set G = F

    Step 2: Test every FD of G for redundancy using the Membership Algorithm, removing each FD that is found to be redundant, until there are no more FDs of G to be tested

    Step 3: The set G is a non-redundant cover of F

    Note: Given a set F, there may be more than one non-redundant cover, since the result depends on the order in which the FDs are considered
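
    The sketch below removes redundant FDs one at a time and shows that the order of testing changes the result. It uses the set F = {X -> Y, Y -> X, Y -> Z, Z -> Y, X -> Z, Z -> X} from the earlier redundancy exercise, with one letter per attribute; the function names are chosen here for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonredundant_cover(fds):
    """Drop each FD that is derivable from the FDs still remaining."""
    g = list(fds)
    for fd in list(g):                       # test the FDs in the given order
        rest = [f for f in g if f != fd]
        if set(fd[1]) <= closure(fd[0], rest):
            g = rest                         # fd is redundant, remove it for good
    return g


F = [("X", "Y"), ("Y", "X"), ("Y", "Z"), ("Z", "Y"), ("X", "Z"), ("Z", "X")]
print(nonredundant_cover(F))
print(nonredundant_cover(list(reversed(F))))  # a different, equally valid cover
```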

    Problem: Find the non-redundant cover G for the set F = {X -> YZ, ZW -> P, P -> Z, W -> XPQ, XYQ -> YW, WQ -> YZ}

    23

  • Definition: An attribute A is extraneous in X -> Y if A can be removed from the left side or right side of X -> Y without changing the closure of F, e.g. let G = {A -> BC, B -> C, AB -> D}

    Attribute C is extraneous in the right side of A -> BC and attribute B is extraneous in the left side of AB -> D

    The set G = {A -> BC, B -> C, AB -> D} is neither left-reduced nor right-reduced. G1 = {A -> BC, B -> C, A -> D} is left-reduced but not right-reduced, while G2 = {A -> B, B -> C, AB -> D} is right-reduced but not left-reduced. The set G3 = {A -> B, B -> C, A -> D} is both left- and right-reduced, hence reduced

    Extraneous attributes need to be eliminated

    Extraneous Attribute

    24

  • Left Reduced Algorithm

    The Left-Reduced Algorithm:

    Step 1: Initialize a set of FDs G to F, i.e., set G = F

    Step 2: For every FD A1, A2, ..., Ai, ..., An -> Y in G, do Step 3 until there are no more FDs in G to which this step can be applied. The algorithm stops when Step 3 has been executed for all FDs of G

    Step 3: For each attribute Ai in the determinant of the FD selected in the previous step, do Step 4 until all attributes have been tested. After testing all attributes of a particular FD, repeat Step 2

    Step 4: Test whether all attributes of Y (the RHS of the FD under consideration) are elements of the closure of A1, A2, ..., An with Ai removed from the determinant, taken with respect to the FDs of G. If so, remove attribute Ai from the determinant of the FD undergoing testing, because Ai is an extraneous left attribute. If not all attributes of Y are elements of this closure, then Ai is not an extraneous left attribute and should remain in the determinant of the FD under consideration

    When the algorithm finishes, the set G contains a left-reduced cover of F

    25
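
    A minimal Python sketch of these steps, assuming one letter per attribute. It is checked against the example G = {A -> BC, B -> C, AB -> D} from the extraneous-attribute definition, whose left-reduced form is {A -> BC, B -> C, A -> D}. The function names are illustrative.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def left_reduce(fds):
    """Remove extraneous attributes from the left side of every FD."""
    g = [(set(lhs), set(rhs)) for lhs, rhs in fds]        # Step 1: G = F
    for i, (lhs, rhs) in enumerate(g):                    # Step 2: each FD
        for attr in sorted(lhs):                          # Step 3: each Ai
            reduced = g[i][0] - {attr}
            # Step 4: Ai is extraneous if Y is still determined without it.
            # An empty determinant is never tried (single-attribute left sides stay).
            if reduced and rhs <= closure(reduced, g):
                g[i] = (reduced, rhs)
    return g


G = [("A", "BC"), ("B", "C"), ("AB", "D")]
for lhs, rhs in left_reduce(G):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```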

  • Remove any extraneous left attributes from F={ABC, E -> C, DAEF, ABFBD}

    Reduce the set F={X -> Z, XYWP, XYZWQ, XZR} by removing extraneous left attributes

    Reduce the set F={XWY, XWZ, Z -> Y, XYZ} by removing extraneous left attributes

    Extraneous Attribute

    Tip: There is no need to consider FDs whose determinant consists of a single attribute

    26

  • A set of FDs F is canonical if every FD in F is of the form X -> A (a single attribute A on the right side) and F is left-reduced and non-redundant

    Since a canonical set of FDs is non-redundant and every FD has a single attribute on the right side, it is right-reduced. Since it is also left-reduced, it is reduced

    Example: The set F = {A -> B, A -> C, A -> D, A -> E, BI -> J} is a canonical cover for G = {A -> BCE, AB -> DE, BI -> J}
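
    One common way to obtain a canonical cover is to split every right side into single attributes, left-reduce each FD, and then drop duplicate and redundant FDs. The sketch below follows that recipe under the assumption of one letter per attribute, and reproduces the example above; it is an illustration, not necessarily the procedure intended in the lecture.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # 1. Split right sides so that every FD has the form X -> A.
    g = [(frozenset(lhs), b) for lhs, rhs in fds for b in sorted(rhs)]
    # 2. Left-reduce: drop each left attribute that is not needed.
    for i, (lhs, b) in enumerate(g):
        for attr in sorted(lhs):
            reduced = g[i][0] - {attr}
            if reduced and b in closure(reduced, g):
                g[i] = (reduced, b)
    # 3. Remove duplicates, then FDs derivable from the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


G = [("A", "BCE"), ("AB", "DE"), ("BI", "J")]
for lhs, b in canonical_cover(G):
    print("".join(sorted(lhs)), "->", b)
```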

    Canonical Cover

    27

  • A set of FDs F is minimal if it satisfies the following conditions:

    every dependency in F has a single attribute for its RHS

    we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F

    we cannot replace any dependency X -> A in F with a dependency Y -> A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F

    Every set of FDs has an equivalent minimal set

    There can be several equivalent minimal sets

    There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs

    To synthesize a set of relations, we assume that we start with a set of dependencies that is a minimal set

    Minimal Cover

    28

  • We have been measuring our covers in terms of the number of FDs they contain. We can also measure them by the number of attribute symbols required to express them, e.g. {AB -> C, CD -> E, AC -> IJ} has size 10 under this measure

    Definition: A set of FDs F is optimal if there is no equivalent set of FDs with fewer attribute symbols than F

    The set F = {EC -> D, AB -> E, E -> AB} is an optimal cover for G = {ABC -> D, AB -> E, E -> AB}. Notice that G is reduced and minimum, but not optimal.
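
    The size measure is easy to compute: count every attribute symbol on both sides of every FD. A small sketch, assuming one letter per attribute and an illustrative function name, applied to the three sets mentioned above:

```python
def cover_size(fds):
    """Number of attribute symbols needed to write down the FDs."""
    return sum(len(lhs) + len(rhs) for lhs, rhs in fds)


example = [("AB", "C"), ("CD", "E"), ("AC", "IJ")]
F = [("EC", "D"), ("AB", "E"), ("E", "AB")]
G = [("ABC", "D"), ("AB", "E"), ("E", "AB")]

print(cover_size(example))  # 10
print(cover_size(F))        # 9
print(cover_size(G))        # 10
```

    F needs one symbol fewer than G, which is why G is not optimal even though it has the same number of FDs.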

    Optimal Cover

    29

  • Find canonical cover of

    1. F={X -> Z, XYWP, XYZWQ, XZR}

    2. F={XYW, XWZ, Z -> Y, XYZ}

    3. F={ABC, E -> C, DAEF, ABFBD}

    4. G = {A -> C, AB -> C, C -> DI, CD -> I, EC -> AB, EICC}

    Find minimal cover of the following FD sets

    1. F={ABC, ABC, B -> C, A -> B}

    2. F={A -> B, B -> A, B -> C, A -> C, C -> A}

    3. F={ABC, A -> B, B -> A}

    4. G = {ABCD -> E, E -> D, A -> B, AC -> D}

    5. F={ABC, BAC, C -> AB}

    6. G={ABDC, CBE, AD -> BF, B -> F}

    Problems

    30

  • Compound Functional Dependency (CFD) and Annular Covers

    Seminar change?

    31

  • Good Design needs strategy

    Armstrong Axioms are sound and complete

    FDs are constraints

    There may be a number of equivalent FD sets

    FD sets may be minimized by checking the coverage

    Summary

    32