Source: users.cms.caltech.edu/~donnie/dbcourse/intro0607/lectures/Lecture...
Functional Dependency Theory Winter 2006-2007 Lecture 20


Last Lecture
• Normal forms specify “good schema” patterns
• First normal form (1NF):
  – All attributes must be atomic
  – Easy in relational model, harder in SQL
• Boyce-Codd normal form (BCNF):
  – Eliminates redundancy using functional dependencies
  – Given a relation schema R and a set of dependencies F
  – For all functional dependencies α → β in F+, where α ⊆ R and β ⊆ R, at least one must hold:
    • α → β is a trivial dependency
    • α is a superkey for R

Last Lecture (2)
• Can convert a schema into BCNF
• If R is a schema not in BCNF:
  – There is at least one nontrivial functional dependency α → β such that α is not a superkey for R
• Replace R with two schemas:
    (α ∪ β)
    (R – (β – α))
• May need to repeat decomposition process until all schemas are in BCNF

Database Constraints
• Enforcing database constraints can easily become very expensive
  – Especially CHECK constraints!
• Best to define database schema such that constraint enforcement is efficient
• Ideally, enforcing a functional dependency should involve only one relation
  – Can specify a key constraint instead of a CHECK constraint!

Example: Personal Bankers
• Give each customer a personal banker
• Want to constrain customers to one personal banker per branch
  – Multiple personal bankers overall
• Schema:
    works_in(emp_id, branch_name)
    cust_banker(cust_id, emp_id, type)
• Problem with design?
  – Doesn’t implicitly enforce this constraint!
• How to fix this problem?

[E-R diagram: entity sets customer, employee, branch; relationship sets works_in, cust_banker; attributes cust_id, cust_name, emp_id, emp_name, branch_name, branch_city, assets, type]

Personal Bankers (2)
• One solution: a ternary relationship
  – Many-to-one mapping from (customer, employee) to branch
  – Easy to enforce “one personal banker per customer per branch”
• Schema:
    cust_banker_branch(cust_id, emp_id, branch_name, type)
• Problems with this design?
  – Hint: employees can only work at one branch…
  – Branch names get repeated unnecessarily

[E-R diagram: entity sets customer, employee, branch; ternary relationship set cust_banker_branch; attributes cust_id, cust_name, emp_id, emp_name, branch_name, branch_city, assets, type]

Personal Bankers (3)
• Employees can only work at one branch
    emp_id → branch_name
• Schema: (cust_id, emp_id, branch_name, type)
  – Not in BCNF: emp_id isn’t a superkey
• Can decompose:
    α = emp_id, β = branch_name
    (α ∪ β) = (emp_id, branch_name)
    (R – (β – α)) = (cust_id, emp_id, type)
  – Same as works_in and cust_banker schemas!
  – Can’t easily enforce “one personal banker per customer per branch” anymore
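
The decomposition step above can be sketched in a few lines of Python. This is only an illustration: the function name bcnf_decompose_step and the set-based representation of schemas are assumptions, not part of the lecture.

```python
def bcnf_decompose_step(schema, alpha, beta):
    """Split `schema` on the violating dependency alpha -> beta.

    Returns the two schemas (alpha ∪ beta) and (R - (beta - alpha)).
    """
    schema, alpha, beta = frozenset(schema), frozenset(alpha), frozenset(beta)
    r1 = alpha | beta
    r2 = schema - (beta - alpha)
    return r1, r2


# The ternary schema from the slides, split on emp_id -> branch_name.
R = {"cust_id", "emp_id", "branch_name", "type"}
r1, r2 = bcnf_decompose_step(R, {"emp_id"}, {"branch_name"})
print(sorted(r1))  # ['branch_name', 'emp_id']
print(sorted(r2))  # ['cust_id', 'emp_id', 'type']
```

The two results match the works_in and cust_banker schemas. A full BCNF conversion would repeat this step on each result until no violating dependency remains.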

Preserving Dependencies
• Another functional dependency:
    cust_id, branch_name → emp_id
  – (cust_id, emp_id, branch_name, type) actually implies this constraint, but still isn’t BCNF-friendly
• BCNF version of design makes it hard to enforce this constraint
  – BCNF form: (emp_id, branch_name) and (cust_id, emp_id, type)
  – Can’t enforce constraint on a single relation
  – Design is not dependency-preserving
• Solution: Third Normal Form

Third Normal Form
• Slightly weaker than Boyce-Codd normal form
  – Preserves more functional dependencies
  – Allows more repeated information
• Given:
  – Relation schema R
  – Set of functional dependencies F
• R is in 3NF with respect to F if:
  – For all functional dependencies α → β in F+, where α ⊆ R and β ⊆ R, at least one of the following holds:
    • α → β is a trivial dependency
    • α is a superkey for R
    • Each attribute A in β – α is contained in a candidate key for R

Third Normal Form (2)
• New condition:
  – Each attribute A in β – α is contained in a candidate key for R
• A general constraint:
  – Doesn’t require a single candidate key to contain all attributes in β – α
  – Just requires that each attribute in β – α appears in some candidate key in R
  – …possibly even different candidate keys!

Personal Banker Example
• Schema:
    cust_banker_branch(cust_id, emp_id, branch_name, type)
• Functional dependencies:
    emp_id → branch_name
  – branch_name must appear in a candidate key, for the schema to be in 3NF with respect to this dependency
    cust_id, branch_name → emp_id
  – Implies that (cust_id, branch_name) is also a candidate key of this schema
  – Each employee can only work at one branch, and (cust_id, emp_id) is a candidate key
• cust_banker_branch schema is in 3NF
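
The 3NF claim above can be checked mechanically. The sketch below is a brute-force illustration and not the lecture's own method. It rests on two assumptions: it adds the dependency cust_id, emp_id → type (the slides only state that (cust_id, emp_id) is a candidate key), and it tests only the dependencies listed in F rather than all of F+, which is a standard shortcut when testing the original schema. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal superkeys, smallest subsets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def violations(schema, fds, allow_3nf_exception):
    """Return the dependencies in `fds` that break BCNF (or 3NF)."""
    key_attrs = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        in_keys = allow_3nf_exception and (rhs - lhs) <= key_attrs
        if not (trivial or superkey or in_keys):
            bad.append((lhs, rhs))
    return bad


R = {"cust_id", "emp_id", "branch_name", "type"}
F = [
    ({"emp_id"}, {"branch_name"}),
    ({"cust_id", "branch_name"}, {"emp_id"}),
    # Assumption: not listed as a dependency in the slides, added so that
    # (cust_id, emp_id) is a candidate key as the slides state.
    ({"cust_id", "emp_id"}, {"type"}),
]
keys = candidate_keys(R, F)
bcnf_bad = violations(R, F, allow_3nf_exception=False)
nf3_bad = violations(R, F, allow_3nf_exception=True)
print(keys)      # the two candidate keys
print(bcnf_bad)  # emp_id -> branch_name violates BCNF
print(nf3_bad)   # no 3NF violations
```

Under these assumptions the candidate keys are (cust_id, branch_name) and (cust_id, emp_id), so branch_name appears in a candidate key and the only BCNF violation is permitted by 3NF.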

Multivalued Attributes and BCNF
• Functional dependencies aren’t always sufficient to eliminate all redundancy
  – Redundancy due to multivalued attributes
  – Functional dependencies can’t describe multivalued attributes, so they can’t be used to eliminate this redundancy
• Example: employee details
• Generates schema:
    (emp_id, dependent)
    (emp_id, phone_num)
• Could also create this schema:
    (emp_id, dependent, phone_num)

[E-R diagram: entity set employee with attributes emp_id, emp_name, dependent, phone_num]

Multivalued Attributes, BCNF (2)
• Combined schema: (emp_id, dependent, phone_num)
  – Storing multiple dependents and multiple phone numbers generates redundant values
  – Example:
      (125623, Jeff, 555-8888)
      (125623, Bob, 555-8888)
      (125623, Jeff, 555-2222)
      (125623, Bob, 555-2222)
• Without functional dependencies, can’t use BCNF or 3NF to reason about multivalued attributes
• Need to define other kinds of dependencies to handle these situations
  – Other normal forms take them into account
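
A small Python sketch of why the combined schema is redundant, using the example values from the slide. The helper row_counts and the counts passed to it at the end are hypothetical, added only to show how the two designs grow.

```python
from itertools import product

emp_id = 125623
dependents = ["Jeff", "Bob"]
phone_nums = ["555-8888", "555-2222"]

# Combined schema (emp_id, dependent, phone_num): every dependent must be
# paired with every phone number, so each value is stored more than once.
combined = [(emp_id, d, p) for d, p in product(dependents, phone_nums)]

# Separate schemas (emp_id, dependent) and (emp_id, phone_num): each value
# is stored exactly once.
emp_dependents = [(emp_id, d) for d in dependents]
emp_phones = [(emp_id, p) for p in phone_nums]


def row_counts(num_dependents, num_phones):
    """Rows needed as (separate schemas, combined schema)."""
    return num_dependents + num_phones, num_dependents * num_phones


print(combined)
print(sum(1 for row in combined if row[1] == "Jeff"))  # 2
print(row_counts(3, 4))  # (7, 12)
```

The combined schema needs one row per (dependent, phone number) pair, while the separate schemas need one row per value, so the gap widens as either list grows.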

Functional Dependency Theory
• Important to be able to reason about functional dependencies!
• Main question:
  – What functional dependencies are implied by a set F of functional dependencies?
• Other useful questions:
  – Which attributes are functionally determined by a particular attribute-set?
  – What set of functional dependencies must actually be enforced in a database?
  – Is a particular schema decomposition lossless?
  – Does a decomposition preserve dependencies?

Rules of Inference
• Given a set F of functional dependencies
  – Actual dependencies in F may be insufficient
  – Must consider all dependencies logically implied by F
• For a relation schema R
  – A functional dependency f on R is logically implied by F on R if every relation instance r(R) that satisfies F also satisfies f
• Example:
  – Relation schema R(A, B, C, G, H, I)
  – Dependencies:
      A → B, A → C, CG → H, CG → I, B → H
  – Logically implies: A → H, CG → HI, AG → I

Rules of Inference (2)
• Axioms are rules of inference for dependencies
• This group is called Armstrong’s axioms
• Greek letters α, β, γ, … represent attribute sets
• Reflexivity rule:
    If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule:
    If α → β holds, and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule:
    If α → β holds, and β → γ holds, then α → γ holds.

Computing Closure of F
• Can use Armstrong’s axioms to compute F+ from F
• F is a set of functional dependencies

    F+ = F
    repeat
        for each functional dependency f in F+
            apply reflexivity, augmentation rules to f
            add resulting functional dependencies to F+
        for each pair of functional dependencies f1, f2 in F+
            if f1 and f2 can be combined using transitivity
                add resulting functional dependency to F+
    until F+ stops changing
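
A runnable sketch of this procedure in Python, practical only for very small schemas. It is one possible reading of the pseudocode, not the lecture's implementation: reflexivity is applied once up front for every subset of R, empty attribute sets are allowed, and the schema R(A, B, C) with F = {A → B, B → C} is a hypothetical example chosen to keep F+ small.

```python
from itertools import combinations


def subsets(attrs):
    """All subsets of `attrs`, including the empty set, as frozensets."""
    items = sorted(attrs)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def closure_of_fds(schema, fds):
    """Naive fixed-point computation of F+ using Armstrong's axioms."""
    all_sets = subsets(schema)
    f_plus = {(frozenset(l), frozenset(r)) for l, r in fds}
    # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
    for alpha in all_sets:
        for beta in subsets(alpha):
            f_plus.add((alpha, beta))
    while True:
        new = set()
        for lhs, rhs in f_plus:
            # Augmentation: gamma alpha -> gamma beta for every gamma.
            for gamma in all_sets:
                new.add((gamma | lhs, gamma | rhs))
        for lhs1, rhs1 in f_plus:
            for lhs2, rhs2 in f_plus:
                # Transitivity: alpha -> beta and beta -> gamma.
                if rhs1 == lhs2:
                    new.add((lhs1, rhs2))
        if new <= f_plus:  # F+ stopped changing
            return f_plus
        f_plus |= new


R = {"A", "B", "C"}
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
f_plus = closure_of_fds(R, F)
print((frozenset("A"), frozenset("C")) in f_plus)  # True
print((frozenset("C"), frozenset("A")) in f_plus)  # False
print(len(f_plus))  # 43, counting trivial dependencies and empty sets
```

Even for three attributes F+ has 43 members under these conventions, which is why F+ is rarely materialized in practice.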

Armstrong’s Axioms
• Axioms are sound
  – They don’t generate any incorrect functional dependencies
• Axioms are complete
  – Given a set of functional dependencies F, repeated application generates all F+
• F+ could be very large
  – LHS and RHS of a dependency are subsets of R
  – A set of size n has 2^n subsets
  – 2^n × 2^n = 2^(2n) possible functional dependencies in R!

More Rules of Inference
• Additional rules can be proven from Armstrong’s axioms
  – These make it easier to generate F+
• Union rule:
    If α → β holds, and α → γ holds, then α → βγ holds.
• Decomposition rule:
    If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule:
    If α → β holds, and γβ → δ holds, then αγ → δ holds.
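
As a worked illustration, the three dependencies listed as logically implied in the earlier example on R(A, B, C, G, H, I) each follow from one rule, and the union rule itself follows from Armstrong's axioms in three steps. The layout below is a sketch, not taken from the slides.

```latex
% Implied dependencies from F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
\begin{align*}
A \to B,\; B \to H &\;\Rightarrow\; A \to H && \text{(transitivity)} \\
CG \to H,\; CG \to I &\;\Rightarrow\; CG \to HI && \text{(union)} \\
A \to C,\; CG \to I &\;\Rightarrow\; AG \to I && \text{(pseudotransitivity)}
\end{align*}

% Union rule derived from Armstrong's axioms
\begin{align*}
\alpha \to \beta &\;\Rightarrow\; \alpha \to \alpha\beta && \text{(augmentation with } \alpha\text{)} \\
\alpha \to \gamma &\;\Rightarrow\; \alpha\beta \to \beta\gamma && \text{(augmentation with } \beta\text{)} \\
\alpha \to \alpha\beta,\; \alpha\beta \to \beta\gamma &\;\Rightarrow\; \alpha \to \beta\gamma && \text{(transitivity)}
\end{align*}
```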

Attribute-Set Closure
• How to tell if an attribute-set α is a superkey?
  – If α → R then α is a superkey.
  – What attributes are functionally determined by attribute-set α?
• Given:
  – Attribute-set α
  – Set of functional dependencies F
  – The set of all attributes functionally determined by α under F is called the closure of α under F
  – Written as α+

Attribute-Set Closure (2)
• Easy to compute closure of attribute-set α
  – Algorithm is simple but slow
• Inputs:
  – attribute-set α
  – set of functional dependencies F

    α+ = α
    repeat
        for each functional dependency β → γ in F
            if β ⊆ α+ then α+ = α+ ∪ γ
    until α+ stops changing
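
The algorithm above translates almost line for line into Python. This is a minimal sketch: the function name attribute_closure and the representation of each dependency as a pair of sets are assumptions. The dependencies are the ones from the Rules of Inference example on R(A, B, C, G, H, I).

```python
def attribute_closure(alpha, fds):
    """Closure of attribute-set `alpha` under `fds` (pairs of (lhs, rhs) sets)."""
    closure = set(alpha)
    changed = True
    while changed:  # "repeat ... until alpha+ stops changing"
        changed = False
        for beta, gamma in fds:
            if beta <= closure and not gamma <= closure:
                closure |= gamma
                changed = True
    return closure


F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C', 'H']
print(sorted(attribute_closure("CG", F)))  # ['C', 'G', 'H', 'I']
```

Each pass scans every dependency, and a pass that adds nothing ends the loop, which is the "simple but slow" behavior the slide mentions.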

Attribute-Set Closure (3)
• Can easily test if α is a superkey
  – Compute α+
  – If α+ ⊇ R then α is a superkey of R
• Can also use with functional dependencies
  – α → β holds if β ⊆ α+
  – Can compute F+ too:
    • For each γ ⊆ R, find closure γ+ under F
    • For each S ⊆ γ+, add functional dependency γ → S

Attribute-Set Closure Example
• Relation schema R(A, B, C, G, H, I)
  – Dependencies:
      A → B, A → C, CG → H, CG → I, B → H
• Is AG a superkey of R?
• Compute (AG)+
  – Start with α+ = AG
  – A → B, A → C cause α+ = ABCG
  – CG → H, CG → I cause α+ = ABCGHI
• AG is a superkey of R!
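
The three uses of attribute-set closure (superkey test, checking whether a dependency holds, and enumerating F+) can be sketched together in Python on this example. The helper names is_superkey, holds, and fd_closure are hypothetical.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """Closure of `alpha` under `fds`, a list of (lhs, rhs) attribute sets."""
    closure = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if beta <= closure and not gamma <= closure:
                closure |= gamma
                changed = True
    return closure


def is_superkey(alpha, schema, fds):
    """alpha is a superkey of R if alpha+ contains every attribute of R."""
    return attribute_closure(alpha, fds) >= set(schema)


def holds(alpha, beta, fds):
    """alpha -> beta holds under F if beta is a subset of alpha+."""
    return set(beta) <= attribute_closure(alpha, fds)


def fd_closure(schema, fds):
    """Yield every dependency gamma -> S in F+ (can be very large)."""
    attrs = sorted(schema)
    for k in range(len(attrs) + 1):
        for gamma in combinations(attrs, k):
            gamma_plus = sorted(attribute_closure(gamma, fds))
            for j in range(len(gamma_plus) + 1):
                for s in combinations(gamma_plus, j):
                    yield frozenset(gamma), frozenset(s)


R = set("ABCGHI")
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]
print(is_superkey("AG", R, F))  # True
print(is_superkey("A", R, F))   # False
print(holds("A", "H", F))       # True
print(holds("CG", "A", F))      # False
print((frozenset("CG"), frozenset("HI")) in set(fd_closure(R, F)))  # True
```

Because fd_closure is a generator, dependencies can be inspected one at a time without building all of F+ in memory.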

Review
• BCNF has a few weaknesses
  – Doesn’t preserve all functional dependencies
  – Sometimes requires non-dependency-preserving decompositions
• 3NF is a weakened version of BCNF
  – Allows non-trivial dependencies α → β if attributes in β – α also appear in candidate keys
  – Allows more redundant data than BCNF
• Began discussing functional dependency theory
  – Rules of inference for functional dependencies
  – How to compute closures, and how to use them