Theory of dependencies in relational database

Overview

• Introduction
• Characteristics of “Bad” Schema
• What Is Functional Dependency?
• Armstrong’s Inference Rules
• Equivalence & Minimal Cover
• Normalization
• Normalization Types and Details
• BCNF
• Higher Normal Forms
• De-normalization
• Multi-valued Dependencies (MVD)
• Join Dependencies
• Inclusion Dependencies
• Conclusion
• References

INTRODUCTION

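• The main aim of database design is to come up with a “good” schema.
• Problems:
1. How do we characterize the “goodness” of a schema?
2. If two or more alternative schemas are available, how do we compare them?
3. What are the problems with a “bad” schema?

• An example: a single relation that stores each student’s details together with the student’s department information (HOD and office phone).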

Characteristics of “BAD” schema

• Redundant storage of data: the office phone and HOD information of a department are stored redundantly, which wastes disk space.

• A program that updates the office phone of a department must change it in several places, which takes more running time and is error prone.

ANOMALIES-

a. Insertion anomaly: there is no way to insert information about a new department unless we also enter details of a (dummy) student in that department.

b. Deletion anomaly: if all students of a certain department leave and we delete their tuples, the information about the department itself is lost.

c. Update anomaly: to update the office phone of a department, (1) the value must be changed in several tuples, and (2) if a tuple is missed, the data becomes inconsistent.

What is functional dependency?

• Functional dependencies (FDs) are used to specify formal measures of the “goodness” of relational designs.
• FDs and keys are used to define normal forms for relations.

NORMAL FORMS-
1. Each NF specifies certain conditions.
2. If the conditions are satisfied by the schema, certain kinds of problems are avoided.

Consider the schema Student(s.name, rollno., gender, dept, h.name, roomno.).

Since rollno. is a key, rollno. → {s.name, gender, dept, h.name, roomno.}

If each hostel room is given to exactly one student, then {h.name, roomno.} → rollno.
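An FD X → Y holds on a relation state if any two tuples that agree on the attributes X also agree on the attributes Y. Below is a minimal Python sketch of that check, assuming a relation state is represented as a list of dicts keyed by attribute name; the function name `fd_holds` and the sample rows are hypothetical. Note that one state can only refute an FD: the FD itself is a constraint on every legal state of the schema.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # setdefault stores the first Y-value seen for this X-value and returns it.
        if seen.setdefault(x_value, y_value) != y_value:
            return False  # same X-value, different Y-value: the FD is violated
    return True


# Hypothetical state of Student(s.name, rollno., gender, dept, h.name, roomno.).
students = [
    {"s.name": "Asha", "rollno.": 1, "gender": "F", "dept": "CS", "h.name": "H1", "roomno.": 101},
    {"s.name": "Ravi", "rollno.": 2, "gender": "M", "dept": "EE", "h.name": "H1", "roomno.": 102},
    {"s.name": "Mira", "rollno.": 3, "gender": "F", "dept": "CS", "h.name": "H2", "roomno.": 101},
]

print(fd_holds(students, ["rollno."], ["s.name", "gender", "dept", "h.name", "roomno."]))  # True
print(fd_holds(students, ["h.name", "roomno."], ["rollno."]))  # True
print(fd_holds(students, ["roomno."], ["rollno."]))  # False: room 101 exists in two hostels
```

The last call shows why the hostel name belongs on the left-hand side: a room number alone does not identify the student.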

More about functional dependency…

Armstrong’s inference rules

1. Reflexivity: if Y ⊆ X, then X → Y.
2. Augmentation: if X → Y, then XZ → YZ.
3. Transitivity: if X → Y and Y → Z, then X → Z.

Sound & complete inference rules

• Armstrong showed that rules 1, 2 and 3 are sound and complete.
• These are called Armstrong’s axioms (AA).

SOUNDNESS-

• Every new FD X → Y derived from a given set of FDs F using AA is logically implied by F, i.e. F ╞ {X → Y}.

Sound & complete inference rules(2)

COMPLETENESS-

• Any FD X → Y logically implied by F (i.e. F ╞ {X → Y}) can be derived from F using AA.
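As an illustration of deriving a new FD from Armstrong’s axioms alone, the union rule (if X → Y and X → Z, then X → YZ) is not itself an axiom, but it follows using only augmentation (if X → Y, then XZ → YZ) and transitivity. The step labels are ours, added for readability.

```latex
\begin{align*}
&(1)\; X \to Y    && \text{given} \\
&(2)\; X \to XY   && \text{augmentation of (1) with } X \text{ (since } XX = X\text{)} \\
&(3)\; X \to Z    && \text{given} \\
&(4)\; XY \to YZ  && \text{augmentation of (3) with } Y \\
&(5)\; X \to YZ   && \text{transitivity on (2) and (4)}
\end{align*}
```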

CLOSURE OF A SET OF FDs-

• The closure of a set of FDs F is the set F+ of all the FDs that can be inferred from F.
• The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.

Ex- Let P = {a, b, c, d, e, f}, with the following set of FDs F on it:
F = {a → d, b → {e, f}, {a, b} → c}
Then:
a+ = {a, d}
b+ = {b, e, f}
{a, b}+ = {a, b, c, d, e, f}
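The attribute closure X+ can be computed by repeatedly applying every FD whose left-hand side is already contained in the result, until nothing changes; X → Y is then in F+ exactly when Y ⊆ X+. The Python sketch below reproduces the example above. It assumes single-letter attribute names, and the helper names `fd` and `closure` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD is a (lhs, rhs) pair of attribute sets; attributes are single letters here.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of one-letter attribute names, e.g. fd("ab", "c")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Compute X+: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [fd("a", "d"), fd("b", "ef"), fd("ab", "c")]
print(sorted(closure("a", F)))   # ['a', 'd']
print(sorted(closure("b", F)))   # ['b', 'e', 'f']
print(sorted(closure("ab", F)))  # ['a', 'b', 'c', 'd', 'e', 'f']
```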

Equivalence & minimal cover

• EQUIVALENCE of sets of FDs: two sets of FDs F and G are equivalent if F+ = G+, i.e. every FD in F can be inferred from G and every FD in G can be inferred from F.

• EXTRANEOUS ATTRIBUTE: an attribute of an FD is extraneous if it can be removed without changing F+. Ex- Given F = {A → C, AB → C}, B is extraneous in AB → C, because A → C logically implies AB → C.

• MINIMAL COVER: a minimal cover of a set of FDs G is a minimal set of dependencies F that is equivalent to G. That is, F+ = G+, the RHS of each FD in F is a single attribute, and if we modify F by deleting an FD or by deleting an attribute from an FD in F, the closure changes. Ex- {A → B, ABCD → E, EF → GH, ACDF → EG} has the following minimal cover: {A → B, ACD → E, EF → G, EF → H}.
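The three bullets above can be checked mechanically. The following Python sketch computes a minimal cover in three steps (split right-hand sides, remove extraneous left-hand attributes, remove redundant FDs) and tests equivalence through attribute closures, reproducing the example above. It assumes single-letter attribute names; the function names are hypothetical. A set of FDs can have more than one minimal cover, and which one this sketch returns may depend on the order in which attributes and FDs are examined.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD is a (lhs, rhs) pair of attribute sets; attributes are single letters here.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of one-letter attribute names, e.g. fd("EF", "GH")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already contained in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], dep: FD) -> bool:
    """fds logically imply X -> Y exactly when Y is contained in X+."""
    return dep[1] <= closure(dep[0], fds)


def equivalent(f: List[FD], g: List[FD]) -> bool:
    """F and G are equivalent (F+ = G+) if each implies every FD of the other."""
    return all(implies(g, d) for d in f) and all(implies(f, d) for d in g)


def minimal_cover(fds: List[FD]) -> List[FD]:
    # Step 1: make every right-hand side a single attribute.
    cover = [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: B is extraneous in X -> A if A is already in (X - {B})+.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # drop duplicates, keep first occurrence
    # Step 3: remove every FD that the remaining FDs already imply.
    for dep in list(cover):
        rest = [d for d in cover if d != dep]
        if implies(rest, dep):
            cover = rest
    return cover


G = [fd("A", "B"), fd("ABCD", "E"), fd("EF", "GH"), fd("ACDF", "EG")]
F = minimal_cover(G)
print(sorted("".join(sorted(x)) + " -> " + "".join(sorted(y)) for x, y in F))
# ['A -> B', 'ACD -> E', 'EF -> G', 'EF -> H']
print(equivalent(F, G))  # True
```

On the extraneous-attribute example, `minimal_cover([fd("A", "C"), fd("AB", "C")])` reduces AB → C to A → C and then keeps a single copy.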

Normalization

• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only.

• Second Normal Form: full functional dependency of nonkey attributes on the primary key.

• Third Normal Form: no transitive dependency between nonkey attributes.

• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency.

Normalization (2)

• Un-normalized relations: the first step in normalization is to convert the data into a two-dimensional table. In an un-normalized table, data can be repeated within a column (repeating groups).

• First Normal Form (1NF): only atomic values at each row-column intersection.

• Second Normal Form (2NF): a relation is in 2NF when every non-key attribute is fully functionally dependent on the primary key.

2NF is relevant only when the key is composite: with a composite key there may exist a partial FD, which 2NF forbids. To reach 2NF, we decompose the relation into smaller relation schemas.

After decomposition, we must verify whether the decomposition is lossless.

Normalization (3) – 2 NF

• Full Functional Dependency:

An FD X → Y is a full FD if, after removal of any attribute from X, the FD no longer holds.

• Partial Functional Dependency:

An FD X → Y is a partial FD if (X − {A}) → Y also holds for some attribute A ∈ X.

• Decomposition:

Let R = (A, B, C, D) be a relation schema, and let P, Q, S, T be relation schemas such that R = P ∪ Q ∪ S ∪ T.

Replacing R by P, Q, S, T is the process of decomposing R.
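Whether an FD X → Y is full or partial can be decided with attribute closures: it is partial as soon as (X − {A})+ already contains Y for some A ∈ X. A minimal Python sketch, assuming single-letter attribute names and hypothetical FDs AB → C and A → D on R(A, B, C, D); the helper names are illustrative only.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of one-letter attribute names, e.g. fd("AB", "C")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs: str, rhs: str, fds: List[FD]) -> bool:
    """True if X -> Y holds but (X - {A}) -> Y fails for every attribute A in X."""
    x, y = frozenset(lhs), frozenset(rhs)
    if not y <= closure(x, fds):
        raise ValueError("X -> Y is not implied by the given FDs")
    # If removing any single attribute still determines Y, the FD is partial.
    return not any(y <= closure(x - {a}, fds) for a in x)


# Hypothetical FDs on R(A, B, C, D) with composite key {A, B}.
F = [fd("AB", "C"), fd("A", "D")]
print(is_full_fd("AB", "C", F))  # True: neither A nor B alone determines C
print(is_full_fd("AB", "D", F))  # False: partial, because A -> D
```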

Normalization(4)-2 NF

DESIRABLE PROPERTIES OF DECOMPOSITION:

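• Not all decompositions of a schema are useful.
• We require two properties to be satisfied:

Lossless join property: the information in an instance r of R must be preserved in the instances of the decomposed relations, i.e. the natural join of their projections must give back exactly r.

* If R is decomposed into P and Q, the decomposition is lossless if (P ∩ Q) → P or (P ∩ Q) → Q is in F+, i.e. the common attributes form a superkey of P or of Q.

Dependency preserving property: if a set F of dependencies holds on R, it should be possible to enforce F by enforcing appropriate dependencies on each decomposed relation individually.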
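Both properties can be tested with attribute closures. The Python sketch below assumes single-letter attribute names and hypothetical FDs A → B and B → C on R(A, B, C). `lossless_binary` applies the superkey test for a decomposition into two relations; `preserves_dependencies` follows a commonly used closure-based test that checks each FD of F without computing F+. All function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of one-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(p: str, q: str, fds: List[FD]) -> bool:
    """Decomposing R into P and Q is lossless iff (P ∩ Q) -> P or (P ∩ Q) -> Q is in F+."""
    p_set, q_set = set(p), set(q)
    common_plus = closure(p_set & q_set, fds)
    return p_set <= common_plus or q_set <= common_plus


def preserves_dependencies(parts: List[str], fds: List[FD]) -> bool:
    """Check every X -> Y in F using only closures restricted to each part."""
    part_sets = [set(p) for p in parts]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in part_sets:
                # What the attributes we already have inside this part determine, kept within the part.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # X -> Y cannot be enforced from the parts alone
    return True


F = [fd("A", "B"), fd("B", "C")]  # hypothetical FDs on R(A, B, C)
print(lossless_binary("AB", "BC", F))  # True: B -> BC
print(lossless_binary("AC", "BC", F))  # False: P ∩ Q = {C} is non-empty, yet the join is lossy
print(preserves_dependencies(["AB", "BC"], F))  # True
print(preserves_dependencies(["AB", "AC"], F))  # False: B -> C is lost
```

The decomposition into (A, B) and (A, C) is lossless but not dependency preserving, which is why the two properties are checked separately.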

2 NF - Example

• EID → Name, Address, Birthdate
• {EID, PName} → StartDate
• The candidate key is {EID, PName}.
• The nonprime attributes are Name, Address, Birthdate and StartDate.
• The nonprime attributes Name, Address and Birthdate violate 2NF, because they are functionally dependent on EID alone, which is only part of the candidate key.
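The 2NF check in this example can be automated: a nonprime attribute violates 2NF if it already lies in the closure of a proper subset of the candidate key. A minimal Python sketch, assuming the key and the nonprime attributes are supplied by hand; `partial_dependencies` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: Iterable[str], nonprime: Iterable[str],
                         fds: List[FD]) -> Dict[str, FrozenSet[str]]:
    """Map each nonprime attribute that violates 2NF to a proper subset of the key that determines it."""
    key_set = frozenset(key)
    found: Dict[str, FrozenSet[str]] = {}
    for attr in sorted(nonprime):
        # Closure is monotone, so trying the key minus one attribute at a time is enough.
        for k in sorted(key_set):
            subset = key_set - {k}
            if attr in closure(subset, fds):
                found[attr] = subset
                break
    return found


F = [
    (frozenset({"EID"}), frozenset({"Name", "Address", "Birthdate"})),
    (frozenset({"EID", "PName"}), frozenset({"StartDate"})),
]
violations = partial_dependencies(["EID", "PName"], ["Name", "Address", "Birthdate", "StartDate"], F)
for attr, subset in violations.items():
    print(attr, "depends only on", sorted(subset))
# Address depends only on ['EID']
# Birthdate depends only on ['EID']
# Name depends only on ['EID']
```

The usual fix is to move EID → Name, Address, Birthdate into its own relation keyed by EID.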

Normalization(5)-3 NF

• 3NF = 2NF plus no transitive functional dependencies.
• Given three attributes A, B, C in a relation, if A → B and B → C (where B is a nonkey attribute), then A → C is a transitive functional dependency.
• Avoid transitive dependencies for 3NF. Ex-

Here, Customer_ID → Salesperson and Salesperson → Region cause a transitive dependency.

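Solution: decompose the relation to remove the transitive dependency. Store Salesperson → Region in a separate relation (Salesperson, Region), and keep Customer_ID → Salesperson in the customer relation, with Salesperson as a foreign key.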
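A relation violates 3NF when some nontrivial FD X → A has a left side X that is not a superkey and a right side A that is not prime (not part of any candidate key). The Python sketch below finds candidate keys by brute force, which is only practical for small schemas, and checks the FDs it is given. It is applied to the Customer_ID / Salesperson / Region example, before and after the decomposition; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD from iterables of attribute names, e.g. fd(["Salesperson"], ["Region"])."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: List[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Brute-force search for minimal attribute sets whose closure covers the whole schema."""
    everything = set(attrs)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = frozenset(combo)
            if any(k <= c for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if everything <= closure(c, fds):
                keys.append(c)
    return keys


def violations_3nf(attrs: List[str], fds: List[FD]) -> List[Tuple[FrozenSet[str], str]]:
    """Nontrivial X -> A violates 3NF when X is not a superkey and A is not prime."""
    everything = set(attrs)
    prime: Set[str] = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not everything <= closure(lhs, fds) and a not in prime:
                bad.append((lhs, a))
    return bad


R = ["Customer_ID", "Salesperson", "Region"]
F = [fd(["Customer_ID"], ["Salesperson"]), fd(["Salesperson"], ["Region"])]
print(violations_3nf(R, F))  # [(frozenset({'Salesperson'}), 'Region')]

# After decomposing, each part keeps the FD that applies to it.
print(violations_3nf(["Customer_ID", "Salesperson"], [fd(["Customer_ID"], ["Salesperson"])]))  # []
print(violations_3nf(["Salesperson", "Region"], [fd(["Salesperson"], ["Region"])]))  # []
```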

Boyce-Codd normal form

• Most 3NF relations are also BCNF relations.
• A 3NF relation can fail to be in BCNF only if all of the following hold:

The candidate keys in the relation are composite keys (they are not single attributes).

There is more than one candidate key in the relation, and the keys are not disjoint, that is, some attributes in the keys are common.

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester,
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY
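A relation is in BCNF when, for every nontrivial FD X → Y, X is a superkey (every determinant is a candidate key). The Python sketch below checks this on a hypothetical relation R(Student, Course, Instructor) with {Student, Course} → Instructor and Instructor → Course. Its two candidate keys are composite and overlap on Student, and Instructor → Course breaks BCNF, although the relation is still in 3NF because Course is prime. The relation, FDs and function names are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD from iterables of attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """X+ under fds: keep firing FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: List[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Brute-force search for minimal attribute sets whose closure covers the whole schema."""
    everything = set(attrs)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = frozenset(combo)
            if not any(k <= c for k in keys) and everything <= closure(c, fds):
                keys.append(c)
    return keys


def bcnf_violations(attrs: List[str], fds: List[FD]) -> List[FD]:
    """Nontrivial X -> Y violates BCNF when X is not a superkey of the relation."""
    everything = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not everything <= closure(lhs, fds)]


R = ["Student", "Course", "Instructor"]
F = [fd(["Student", "Course"], ["Instructor"]), fd(["Instructor"], ["Course"])]
print(sorted(sorted(k) for k in candidate_keys(R, F)))
# [['Course', 'Student'], ['Instructor', 'Student']]
print(bcnf_violations(R, F))  # [(frozenset({'Instructor'}), frozenset({'Course'}))]
```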

Multi-valued dependencies (MVD)
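An MVD X →→ Y holds on R (with Z = R − X − Y) if, whenever two tuples agree on X, the tuple that takes its Y-values from one and its Z-values from the other is also present. Equivalently, for each X-value the (Y, Z) combinations form a full cross product. A minimal Python sketch on a hypothetical state in which a course’s teachers and its books are independent; the data and the name `mvd_holds` are assumptions.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def mvd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """X ->-> Y holds iff, for every X-value, the (Y, Z) pairs form a full cross product."""
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]  # Z = R - X - Y
    groups: Dict[Tuple[object, ...], Set[Tuple[Tuple[object, ...], Tuple[object, ...]]]] = {}
    for row in rows:
        x_value = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(x_value, set()).add(pair)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every Y-value must appear together with every Z-value for the same X-value.
        if set(product(ys, zs)) != pairs:
            return False
    return True


# Hypothetical state: a course's teachers and its books are independent of each other.
offerings = [
    {"course": "DB", "teacher": "T1", "book": "B1"},
    {"course": "DB", "teacher": "T1", "book": "B2"},
    {"course": "DB", "teacher": "T2", "book": "B1"},
    {"course": "DB", "teacher": "T2", "book": "B2"},
]
print(mvd_holds(offerings, ["course"], ["teacher"]))      # True
print(mvd_holds(offerings[:3], ["course"], ["teacher"]))  # False: (T2, B2) is missing
```

Note that course →→ teacher holds here even though course → teacher does not, since DB has two teachers.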

Higher normal forms

Fourth Normal Form (4NF)
• A relation is in Fourth Normal Form if it is in BCNF and, for every nontrivial multi-valued dependency X →→ Y, X is a superkey.
• Eliminate non-trivial multi-valued dependencies by projecting into simpler tables.

JOIN DEPENDENCIES
• A join dependency, denoted JD(R1, R2, …, Rn), specified on a relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, …, Rn.
NOTE: an MVD is a special case of a JD where n = 2, i.e. a JD denoted JD(R1, R2) implies the MVD (R1 ∩ R2) →→ (R1 − R2).

Fifth Normal Form
• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation.
• This implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.
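A join dependency can be tested on a given state by projecting r onto each Ri, natural-joining the projections, and comparing the result with r. The join always contains r, so equality means no spurious tuples were added (the join is non-additive). A minimal Python sketch, assuming tuples are stored as frozensets of (attribute, value) pairs; the data and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# A tuple is stored as a frozenset of (attribute, value) pairs so that it is hashable.
Row = FrozenSet[Tuple[str, object]]


def to_row(d: Dict[str, object]) -> Row:
    return frozenset(d.items())


def project(rel: Set[Row], attrs: Sequence[str]) -> Set[Row]:
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r: Set[Row], s: Set[Row]) -> Set[Row]:
    out: Set[Row] = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            # Combine only tuples that agree on every shared attribute.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def jd_holds(rel: Set[Row], components: List[Sequence[str]]) -> bool:
    """JD(R1, ..., Rn) holds in this state iff joining the projections gives back exactly rel."""
    parts = [project(rel, c) for c in components]
    joined = parts[0]
    for p in parts[1:]:
        joined = natural_join(joined, p)
    return joined == rel


data = [("T1", "B1"), ("T1", "B2"), ("T2", "B1"), ("T2", "B2")]
rel = {to_row({"course": "DB", "teacher": t, "book": b}) for t, b in data}
partial = {to_row({"course": "DB", "teacher": t, "book": b}) for t, b in data[:3]}

jd = [["course", "teacher"], ["course", "book"]]
print(jd_holds(rel, jd))      # True
print(jd_holds(partial, jd))  # False: the join adds the spurious tuple (DB, T2, B2)
```

With two components this is the JD(R1, R2) case from the NOTE, so it agrees with the MVD course →→ teacher.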

De-normalization

• De-normalization is the process of modifying a perfectly normalized database design for performance reasons.

• It is a natural and necessary part of database design, but it must follow proper normalization.

• It always makes the system potentially less efficient and less flexible, so de-normalize as needed, but not frivolously.

De-normalization

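Before:
Customer (Customer ID, Address, Name, Telephone)
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID)

After:
Customer (Customer ID, Address, Name, Telephone)
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)

Cust Name is copied redundantly into Order, so that listing orders with their customer names no longer needs a join with Customer.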

Inclusion dependency

• The foreign key (or referential integrity) constraint cannot be specified as a functional or multi-valued dependency, because it relates attributes across relations.

• An inclusion dependency (ID) R.X < S.Y between two sets of attributes (X of relation schema R and Y of relation schema S) specifies the constraint that, at any specific time when r is a relation state of R and s a relation state of S, we have

π_Y(s(S)) ⊇ π_X(r(R))

• X of R and Y of S must have the same number of attributes.
• The domains of each pair of corresponding attributes should be compatible.

So far, no normal forms have been developed based on IDs.
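The condition π_Y(s(S)) ⊇ π_X(r(R)) translates directly into a set comparison. A minimal Python sketch, assuming relation states are lists of dicts; the relations, attribute names (`dno`, `dnumber`) and the function name `inclusion_holds` are hypothetical, in the style of a foreign key from employees to departments.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def inclusion_holds(r: List[Row], x: Sequence[str], s: List[Row], y: Sequence[str]) -> bool:
    """R.X < S.Y holds iff every X-value occurring in r also occurs as a Y-value in s."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    left = {tuple(t[a] for a in x) for t in r}   # pi_X(r)
    right = {tuple(t[b] for b in y) for t in s}  # pi_Y(s)
    return left <= right


# Hypothetical states: employee.dno should reference department.dnumber.
employees = [{"ename": "A", "dno": 1}, {"ename": "B", "dno": 2}]
departments = [{"dnumber": 1, "dname": "Research"}, {"dnumber": 2, "dname": "Admin"}]

print(inclusion_holds(employees, ["dno"], departments, ["dnumber"]))  # True
print(inclusion_holds(employees + [{"ename": "C", "dno": 3}], ["dno"], departments, ["dnumber"]))  # False
```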

Conclusion

• After we have the ER diagrams, each relation in the schema must be independently reviewed and normalized when needed.

• Functional dependencies are the building blocks that enable the analysis of data redundancy and, through the process of normalization, the elimination of the anomalies that data redundancy causes.

• Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from the perspective of data redundancy.

• This process gives us the final opportunity to correct errors and establish a robust design before implementing the database system.

References

• Elmasri, R., & Navathe, S. B. Fundamentals of Database Systems (5th ed.).

• Silberschatz, A., Korth, H. F., & Sudarshan, S. Database System Concepts.

• Date, C. J. An Introduction to Database Systems.

• Lotito, J. (2001). Concepts of Database Design and Management. Retrieved September 2007 from http://www.sitepoint.com/article/database-design-management

• Scamell, R. W., & Umanath, N. S. (2007). Data Modeling and Database Design. Boston, MA: Thomson.
