Relational Database Design - Lecture 4 - Introduction to Databases (1007156ANR)
2 December 2005
Introduction to Databases
Relational Database Design
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
Beat Signer - Department of Computer Science - [email protected]
2 March 7, 2014
Relational Database Design
There are two major relational database design approaches
Top-down design
- develop a conceptual model (e.g. ER model)
- reduction (mapping) of the conceptual model to relation schemas
- use normalisation as a validation technique to check the quality of the resulting relation schemas
  - a relational database schema resulting from the mapping of a good ER model (with the correct entity sets) normally requires no further normalisation
Bottom-up design
- design by decomposition
- use normalisation to iteratively create (decompose) a set of relations starting with a single relation
Relational Database Design ...
A relation schema might contain undesirable dependencies, in which case it should be decomposed (normalised) into multiple smaller relation schemas
- this normalisation process is based on functional dependencies and multivalued dependencies
Sometimes multiple relations resulting from an ER to relation schema reduction might be merged to save some join query operations
- we have to ensure that the resulting larger relation schema does not introduce new undesirable dependencies
Reduction
A conceptual ER model can be reduced to a set of
relation schemas (relational database schema)
The quality of the resulting set of relation schemas
depends on the quality of the original ER design
In the following we discuss the reduction of the different
ER model concepts introduced earlier
Strong Entity Sets
A strong entity set E with only simple attributes a1,..., an is mapped to a relation R with attributes a1,..., an
- the primary key of the entity set E becomes the primary key of the relation R
[ER diagram: entity set Employees with the attributes id and name]
Relation schema: Employee (id, name)
employee = (Employee)
id    name
1234  Beat Signer
1576  Lode Hoste
3212  Sandra Trullemans
...   ...
Composite Attributes
For each component of a composite attribute, we create an attribute ai in the relation R
- no special attribute is created for the composite attribute itself
[ER diagram: entity set Employees with the attributes id and name and the composite attribute address (street, city)]
Employee (id, name, street, city)
Multivalued Attributes
Multivalued attributes are treated separately since a relation should only contain attributes with atomic values
- for each multivalued attribute ai of an entity set E, we create a new relation S containing the attribute ai as well as the primary key attributes of the relation R that is created for the entity set E
  - define a foreign key constraint to the original relation R
[ER diagram: entity set Employees with the attributes id and name and the multivalued attribute phone]
Phones (id, phone)
phones = (Phones)
id    phone
1234  032 2 612 1337
1234  032 2 612 3123
1576  032 2 623 8765
...   ...
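The mapping of the multivalued phone attribute can be tried out with a small script. The following is a minimal sketch using Python's built-in sqlite3 module; the column types, the NOT NULL constraints and the choice of (id, phone) as primary key of Phones are assumptions made for the illustration and are not given on the slide. The helper names phones_of and insert_phone are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign key constraints when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per phone number; id refers back to the original relation.
    CREATE TABLE Phones (
        id    INTEGER NOT NULL REFERENCES Employee(id),
        phone TEXT NOT NULL,
        PRIMARY KEY (id, phone)  -- assumption: not stated on the slide
    );
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1234, "Beat Signer"), (1576, "Lode Hoste")])
conn.executemany("INSERT INTO Phones VALUES (?, ?)",
                 [(1234, "032 2 612 1337"),
                  (1234, "032 2 612 3123"),
                  (1576, "032 2 623 8765")])


def phones_of(employee_id):
    """Return all phone numbers stored for one employee."""
    rows = conn.execute(
        "SELECT phone FROM Phones WHERE id = ? ORDER BY phone", (employee_id,))
    return [phone for (phone,) in rows]


def insert_phone(employee_id, phone):
    """Try to add a phone number; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Phones VALUES (?, ?)", (employee_id, phone))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, a phone number for an id that does not exist in Employee is rejected by the foreign key constraint, while an employee can have any number of phone rows.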
Weak Entity Sets
A weak entity set E with attributes a1,..., an is mapped to a relation R with attributes a1,..., an combined with the primary key attributes b1,..., bm of the identifying entity set F
- the primary key of R is defined by the primary key attributes of the identifying entity set F combined with the discriminator of E
- a foreign key constraint is defined from the attributes b1,..., bm to the primary key of the relation that is created for the identifying entity set F
Weak Entity Sets ...
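[ER diagram: identifying entity set Cinemas (id, name) connected via the Offers relationship set to the weak entity set Seats (number, colour)]
Seat (id, number, colour)
seat = (Seat)
id   number  colour
1    1       red
1    20      black
4    1       black
...  ...     ...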
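The key of the Seat relation combines the key of the identifying entity set with the discriminator. A minimal sketch with Python's sqlite3 module, assuming that number is the discriminator of Seats and that the relation for Cinemas is called Cinema; the cinema names, the column types and the helper try_insert_seat are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for foreign key enforcement
conn.executescript("""
    CREATE TABLE Cinema (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE Seat (
        id     INTEGER NOT NULL REFERENCES Cinema(id),  -- key of identifying set
        number INTEGER NOT NULL,                        -- discriminator (assumed)
        colour TEXT,
        PRIMARY KEY (id, number)
    );
""")
# Cinema names are placeholders, they do not appear on the slide.
conn.executemany("INSERT INTO Cinema VALUES (?, ?)",
                 [(1, "Cinema A"), (4, "Cinema B")])
conn.executemany("INSERT INTO Seat VALUES (?, ?, ?)",
                 [(1, 1, "red"), (1, 20, "black"), (4, 1, "black")])


def try_insert_seat(cinema_id, number, colour):
    """Insert a seat; return False if a key or foreign key constraint fails."""
    try:
        conn.execute("INSERT INTO Seat VALUES (?, ?, ?)",
                     (cinema_id, number, colour))
        return True
    except sqlite3.IntegrityError:
        return False
```

Seat number 1 may exist in cinema 1 and in cinema 4, but not twice in the same cinema, and a seat cannot be stored for a cinema that does not exist.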
Relationship Sets
A relationship set over the entity sets E1,..., En with the optional descriptive attributes b1,..., bm is mapped to a relation R with the primary key attributes of E1,..., En combined with b1,..., bm
The primary key of relation R is defined as follows
- binary many-to-many relationship
  - union of all primary key attributes of E1 and E2
- binary one-to-one relationship
  - choose the primary key of E1 or E2
- binary one-to-many or many-to-one relationship
  - choose the primary key of the entity set on the "many" side
Relationship Sets ...
The primary key of relation R is defined as follows ...
- n-ary relationship without cardinality constraints
  - union of all primary key attributes of E1,..., En
- n-ary relationship with one 0..1 or 1..1 cardinality constraint over the entity set Ej
  - union of all primary key attributes of E1,..., En, except the primary key of Ej
  - note that we allow only one such 0..1 or 1..1 cardinality constraint for n-ary relationships
A foreign key constraint is defined for each set of primary key attributes (provided by the entity set Ei) to the primary key of the corresponding relation that is defined for Ei
Relationship Sets ...
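[ER diagram: entity set Employees (id, name) connected via the LocatedAt relationship set (descriptive attribute duration) to the entity set Offices (name, address, size)]
LocatedAt (id, name, address, duration)
locatedAt = (LocatedAt)
id    name    address      duration
1234  10F721  Pleinlaan 2  1
1576  10F733  Pleinlaan 2  1
...   ...     ...          ...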
Relationship Sets ...
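[ER diagram: entity set Employees (id, name) connected via the LocatedAt relationship set (descriptive attribute duration) to the entity set Offices (name, address, size), now with the cardinality constraints 1..1 and 0..*]
LocatedAt (id, name, address, duration)
locatedAt = (LocatedAt)
id    name    address      duration
1234  10F721  Pleinlaan 2  1
1576  10F733  Pleinlaan 2  1
...   ...     ...          ...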
Weak Entity Existence Relationship
The special relationship set from a weak entity set to its defining entity set is always a many-to-one relationship
- the special weak entity existence relationship does not have to be mapped to a separate relation since it is already covered by the relation that is created for the weak entity set
  - e.g. potential Offers relation schema already covered by Seat relation schema
[ER diagram: identifying entity set Cinemas (id, name) connected via the Offers relationship set to the weak entity set Seats (number, colour)]
Seat (id, number, colour)
Combination of Schemas
Relations resulting from the mapping of a relationship set with a total participation constraint can be integrated with the relation over which the constraint is defined
- key of the relation with the constraint (1..1) used as primary key
- also works for partial relationships (have to use null values)
[ER diagram: entity set Employees (id, name) connected via the LocatedAt relationship set (descriptive attribute duration) to the entity set Offices (name, address, size), with the cardinality constraints 1..1 and 0..*]
Employee (id, employeeName, duration, name, address)
Office (name, address, size)
Specialisation and Generalisation
Create a new relation R for each entity subset
- combine the attributes of the entity set with the primary key attributes of the superclass
[ER diagram: entity set Persons (id, name) with an ISA specialisation into Students (studentID) and Teachers (teachingHours)]
Person (id, name)
Student (id, studentID)
Teacher (id, teachingHours)
Specialisation and Generalisation ...
For a disjoint and total ISA constraint we might omit the separate superclass relation
- saves some join operations but it is no longer possible to define a foreign key constraint on the id attribute (now at two places)
[ER diagram: entity set Persons (id, name) with a disjoint ISA specialisation into Students (studentID) and Teachers (teachingHours)]
Student (id, name, studentID)
Teacher (id, name, teachingHours)
Aggregations
Like the regular relationship set mapping
- note that the name attribute is the one from the Companies entity set
[ER diagram: aggregation over the WorksFor relationship set between Employees (id, name), Companies (name, address) and Durations (from, to), connected via the Manages relationship set to Managers (mId, name)]
Manages (id, from, to, name, address, mId)
Relational Database Design
The goal of relational database design is to create a set of relation schemas that
- can be used to store information without unnecessary redundancy
- allow us to easily retrieve information
The quality of the set of schemas resulting from a reduction (top-down design) depends on how good the original ER design was
In a design by decomposition approach (bottom-up design) we need a way to reduce any redundancy via a decomposition process
- split large relations into multiple smaller relations
Update Anomalies
Insertion anomaly
- redundant information has to be kept consistent
  - e.g. insertion of a new order for an already existing CD
- information about a CD can only be inserted if there is an order or we have to populate the customer information (i.e. name and street) with null values
Order (id, name, street, cdName, price)
order = (Order)
id  name             street            cdName              price
1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50
Update Anomalies ...
Modification anomaly
- if we want to modify information about a particular CD, we have to ensure that the information is updated in all redundant entries
  - e.g. modification of the price of the CD named "Falling into Place"
Deletion anomaly
- if we delete a customer who is the only buyer of a specific CD, we also lose the information about that specific CD
  - e.g. deletion of the customer "Albert Einstein"
id  name             street            cdName              price
1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50
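The modification and deletion anomalies can be reproduced on the Order relation shown above. The sketch below keeps the rows as plain Python dictionaries; the helper names and the new price 18.90 are hypothetical and only serve the illustration.

```python
# The Order relation from the slide, one dictionary per tuple.
orders = [
    {"id": 1, "name": "Max Frisch", "street": "Bahnhofstrasse 7",
     "cdName": "Falling into Place", "price": "17.90"},
    {"id": 2, "name": "Eddy Merckx", "street": "Pleinlaan 25",
     "cdName": "Falling into Place", "price": "17.90"},
    {"id": 53, "name": "Albert Einstein", "street": "Bergstrasse 18",
     "cdName": "Chromatic", "price": "16.50"},
    {"id": 5, "name": "Max Frisch", "street": "Bahnhofstrasse 7",
     "cdName": "Carcassonne", "price": "15.50"},
]


def prices_of(rows, cd_name):
    """All prices stored for a CD; more than one value means inconsistency."""
    return {row["price"] for row in rows if row["cdName"] == cd_name}


def update_price_of_order(rows, order_id, new_price):
    """Change the price in a single row only (a careless update)."""
    return [dict(row, price=new_price) if row["id"] == order_id else dict(row)
            for row in rows]


def delete_customer(rows, name):
    """Remove all rows of one customer."""
    return [dict(row) for row in rows if row["name"] != name]


def known_cds(rows):
    """The CDs the relation still has information about."""
    return {row["cdName"] for row in rows}


# Modification anomaly: the price is changed in one of two redundant entries.
after_update = update_price_of_order(orders, 1, "18.90")
# Deletion anomaly: removing the only buyer also removes the CD itself.
after_delete = delete_customer(orders, "Albert Einstein")
```

After the careless update the CD "Falling into Place" has two different prices, and after the deletion nothing is known about "Chromatic" anymore.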
Normalisation
Normalisation is a formal method to analyse relation schemas based on their keys, functional dependencies (FD) as well as multivalued dependencies (MVD)
- remove redundancy
- prevent certain update anomalies
  - insertion, modification and deletion
There exists a set of rules to check if a relation is in a specific normal form
Normal forms (from the strongest to the weakest)
- Fifth Normal Form (5NF)
- Fourth Normal Form (4NF)
- Boyce-Codd Normal Form (BCNF)
- Third Normal Form (3NF)
- Second Normal Form (2NF)
- First Normal Form (1NF)
1NF, 2NF and 3NF are the original normal forms described by Codd
Normalisation ...
A relation that does not conform to a certain degree of normalisation can be decomposed (lossless-join decomposition) into multiple relations that are in the desired normal form
- can be done automatically
Normalisation is often done in a stepwise manner
- a higher normal form means a more restricted format and fewer problems with update anomalies
- note that only the first normal form (1NF) is mandatory for the relational model and all the other normal forms are optional
First Normal Form (1NF)
As we have seen earlier, the ER model supports complex attributes
- composite attributes
- multivalued attributes
In the reduction process, we remove this substructure from attributes to create a relational model with atomic attribute values only
A relation schema R is in first normal form (1NF) if the domains D1,..., Dn of all attributes a1,..., an of R are atomic
- no composite attributes or attributes with a set of values
- the intersection of each row and column contains one and only one value
Functional Dependencies
TeacherDept (teacherID, teacher, salary, department, building, budget)
In this example, there are various sets of attributes that uniquely identify a set of other attributes
- teacherID → teacher
- teacherID → salary
- teacherID → {teacher, salary}
- {teacherID, teacher} → {salary}
- department → {building, budget}
- ...
We say that there is a functional dependency (→) between these two sets of attributes
- a functional dependency should always hold on a relation schema and not just on a particular relation instance
Functional Dependencies ...
A functional dependency can be used to express constraints (generalisation of keys) over a set of attributes (determinant) that uniquely identify a set of other attributes (dependent attributes)
For a relation schema R with α ⊆ R and β ⊆ R the functional dependency α → β holds on R, if for any r(R)
- ∀ t1,t2 ∈ r(R) with t1[α] = t2[α] ⇒ t1[β] = t2[β]
Note that any K ⊆ R is a superkey if K → R
- we can use functional dependencies to check whether K is a superkey
Functional Dependencies ...
The relation r(R) satisfies the following set F of functional dependencies
- A → B
- C → E
- ...
A functional dependency α → β is trivial if β ⊆ α
- trivial dependencies are satisfied by all relations
A full functional dependency has a minimal determinant
- if the determinant is not minimal, we talk about a partial functional dependency (e.g. AD → B in the example)
For a relation r(R) with α → β and β → γ we say that γ is transitively dependent on α via β
r(R)
A   B   C   D   E
a1  b1  c1  d1  e1
a2  b2  c2  d1  e2
a2  b2  c3  d1  e3
a3  b2  c4  d3  e3
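Whether a functional dependency is satisfied by a concrete relation instance can be checked mechanically. The following is a minimal sketch that applies the definition to the example relation r(R); the function name holds is hypothetical. Note that such a check can only refute a dependency for the schema, since a dependency has to hold on all instances and not just on this one.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds on rows.

    Two tuples that agree on the attributes in lhs must also agree on the
    attributes in rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value per lhs value, compare later ones.
        if seen.setdefault(key, value) != value:
            return False
    return True


attributes = ("A", "B", "C", "D", "E")
r = [dict(zip(attributes, values)) for values in [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d1", "e2"),
    ("a2", "b2", "c3", "d1", "e3"),
    ("a3", "b2", "c4", "d3", "e3"),
]]
```

For example, holds(r, ["A"], ["B"]) is satisfied, while holds(r, ["B"], ["A"]) is not, because b2 occurs together with both a2 and a3.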
Closure of Attributes
For a given relation schema R, a number of functional dependencies and a set of attributes α ⊆ R, the closure α+ is defined by all attributes Bi such that α → Bi
Computing the closure
- Initialise the set s with the attributes of α
- Repeat until the set s does not grow anymore { if there is a functional dependency β → γ and β is in s, then add γ to the set s }
If the closure α+ contains all attributes of the relation schema R, then the attributes α form a superkey of R
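The closure computation described above can be sketched in a few lines of Python. The representation of a functional dependency as a pair of attribute sets is an assumption of this sketch, and the example only uses the dependencies that are explicitly listed for the TeacherDept schema earlier in the lecture (the elided ones are not guessed).

```python
def closure(attributes, fds):
    """Compute the closure of a set of attributes.

    fds is a list of (determinant, dependents) pairs of attribute sets.
    """
    result = set(attributes)          # initialise s with the given attributes
    changed = True
    while changed:                    # repeat until s does not grow anymore
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = {"teacherID", "teacher", "salary", "department", "building", "budget"}
F = [
    ({"teacherID"}, {"teacher", "salary"}),
    ({"department"}, {"building", "budget"}),
]
```

With these two dependencies, closure({"teacherID"}, F) does not cover R, whereas closure({"teacherID", "department"}, F) does, so the latter set is a superkey under the stated assumptions.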
Computation of Candidate Keys
We can test whether α is a candidate key for a given relation schema R by checking whether the closure α+ contains all attributes of R and no proper subset of α has a closure that contains all attributes of R
We can further use this approach to find all the candidate keys for a relation schema R and a given set of functional dependencies
- check for each set α ⊆ R of attributes whether the closure α+ contains all attributes
- the search process can be slightly optimised by starting with the smallest possible subsets
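The search for candidate keys can be sketched as a brute-force enumeration that starts with the smallest subsets and skips every superset of a key that has already been found. The function names are hypothetical. The example uses the Prize schema that appears later in the lecture; the dependency {award, year} → winner is an assumption derived from the statement that (award, year) is the key.

```python
from itertools import combinations


def closure(attributes, fds):
    """Closure of a set of attributes under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure covers the schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):        # smallest subsets first
        for subset in combinations(sorted(schema), size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue                          # superset of a key: not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


prize = {"award", "year", "winner", "birthdate"}
F = [
    ({"award", "year"}, {"winner"}),   # assumption, see lead-in
    ({"winner"}, {"birthdate"}),
]
```

The enumeration is exponential in the number of attributes, which is acceptable for schemas of the size discussed here.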
Functional Dependency Inference
For a given set F of functional dependencies we can derive new functional dependencies based on a set of axioms to compute the closure F+ of F
- the closure F+ includes all functional dependencies that are logically implied by F
Three rules (Armstrong's axioms) can be used to compute F+
- reflexivity
  - for a given set of attributes α and β ⊆ α, α → β holds (see trivial dependency)
- augmentation
  - for a given set of attributes γ; if α → β then γα → γβ holds
- transitivity
  - if α → β and β → γ, then α → γ holds
Functional Dependency Inference ...
Armstrong's axioms are sound (produce only elements of F+) and complete (produce all elements in F+)
- since it may take a lot of time to compute F+ with Armstrong's axioms only, there exist some additional rules
Decomposition
- if α → βγ, then α → β and α → γ hold
Union
- if α → β and α → γ, then α → βγ holds
Trivial dependency rules
- if α → β, then α → α ∪ β holds
- if α → β, then α → α ∩ β holds
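The additional rules do not add expressive power, since each of them can be derived from Armstrong's axioms. As a worked example, the following sketch derives the union rule, using the fact that a set of attributes united with itself is unchanged (so augmenting with α turns α on the left-hand side into α again).

```latex
\begin{align*}
&\alpha \to \beta             && \text{given} \\
&\alpha \to \alpha\beta       && \text{augmentation of } \alpha \to \beta \text{ with } \alpha \\
&\alpha \to \gamma            && \text{given} \\
&\alpha\beta \to \beta\gamma  && \text{augmentation of } \alpha \to \gamma \text{ with } \beta \\
&\alpha \to \beta\gamma       && \text{transitivity of } \alpha \to \alpha\beta \text{ and } \alpha\beta \to \beta\gamma
\end{align*}
```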
Second Normal Form (2NF)
A relation schema R is in second normal form (2NF) if it is in 1NF and if there exists no non-prime attribute that is functionally dependent on a proper subset of a candidate key
- every non-prime attribute has to be fully functionally dependent on a candidate key
- a non-prime attribute is an attribute that is not part of any candidate key
- the Lecturer relation schema shown in the example is not in 2NF since the office attribute functionally depends on the teacher attribute, which is only a part of the candidate key (teacher, course)
Lecturer (teacher, course, office)
lecturer = (Lecturer)
teacher            course     office
Beat Signer        Databases  10G731d
Beat Signer        WIS        10G731d
Lode Hoste         Databases  10F716
Lode Hoste         ATIS       10F716
Sandra Trullemans  WIS        10G731e
Second Normal Form (2NF) ...
2NF normalisation process
- remove any partially dependent attributes from the relation and put them in a new relation together with their determinant
The original Lecturer relation can be losslessly decomposed into two relations which are both in 2NF
- relations with single attribute keys are automatically in 2NF
Lecturer (teacher, office)
lecturer = (Lecturer)
teacher            office
Beat Signer        10G731d
Lode Hoste         10F716
Sandra Trullemans  10G731e
Course (teacher, course)
course = (Course)
teacher            course
Beat Signer        Databases
Beat Signer        WIS
Lode Hoste         Databases
Lode Hoste         ATIS
Sandra Trullemans  WIS
Lossless Decomposition
Given a relation schema R and the two decompositions R1 and R2 of R, we say that R1 and R2 form a lossless decomposition if πR1(r) ⋈ πR2(r) = r
Let F be a set of functional dependencies on R
- R1 and R2 form a lossless decomposition of R if either R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+
  - this means that R1 ∩ R2 is a superkey of R1 or R2
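The lossless decomposition condition can be checked on the Lecturer relation from the 2NF example by projecting and joining again. In the sketch below a tuple is represented as a frozenset of (attribute, value) pairs; this representation and the helper names relation, project and natural_join are assumptions of the sketch. The second decomposition into (course, office) and (teacher, course) is a hypothetical counterexample that is not part of the lecture.

```python
def relation(attributes, rows):
    """Build a relation as a set of tuples (frozensets of attribute/value pairs)."""
    return {frozenset(zip(attributes, row)) for row in rows}


def project(rel, attributes):
    """Projection: keep the given attributes and drop duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attributes) for t in rel}


def natural_join(left, right):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for l in left:
        for r in right:
            merged = dict(l)
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                result.add(frozenset(merged.items()))
    return result


lecturer = relation(("teacher", "course", "office"), [
    ("Beat Signer", "Databases", "10G731d"),
    ("Beat Signer", "WIS", "10G731d"),
    ("Lode Hoste", "Databases", "10F716"),
    ("Lode Hoste", "ATIS", "10F716"),
    ("Sandra Trullemans", "WIS", "10G731e"),
])

# Decomposition from the lecture: the common attribute teacher determines office.
lossless = natural_join(project(lecturer, {"teacher", "office"}),
                        project(lecturer, {"teacher", "course"}))
# Hypothetical counterexample: the common attribute course determines nothing.
lossy = natural_join(project(lecturer, {"course", "office"}),
                     project(lecturer, {"teacher", "course"}))
```

The first join reproduces the original relation, while the second one contains additional tuples that were never part of it, so information about who sits in which office is lost.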
Third Normal Form (3NF)
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key, i.e. for all functional dependencies α → β in F+ one of the following has to hold
- α → β is a trivial functional dependency (i.e. β ⊆ α)
- α is a superkey of R
- each attribute Ai in β - α is contained in a candidate key of R
  - note that each Ai can be in different candidate keys
Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" [Bill Kent]
Third Normal Form (3NF) ...
The Prize relation example schema is in 2NF
The Prize relation schema is not in 3NF since birthdate is functionally dependent on winner and none of the three conditions holds for this functional dependency
- birthdate is transitively dependent on the key (award, year)
Prize (award, year, winner, birthdate)
prize = (Prize)
award              year  winner         birthdate
ACM Turing Award   1981  Edgar F. Codd  23.08.1923
Nobel Peace Prize  1979  Mother Teresa  26.08.1910
ACM Turing Award   1984  Niklaus Wirth  15.02.1934
Nobel Peace Prize  1984  Desmond Tutu   07.10.1931
Third Normal Form (3NF) ...
3NF normalisation process
- remove any transitively dependent attributes from the relation and place them in a new relation together with their determinant
Decomposition of the Prize relation schema into two 3NF relation schemas
Prize (award, year, winner)
prize = (Prize)
award              year  winner
ACM Turing Award   1981  Edgar F. Codd
Nobel Peace Prize  1979  Mother Teresa
ACM Turing Award   1984  Niklaus Wirth
Nobel Peace Prize  1984  Desmond Tutu
Birthdate (winner, birthdate)
bdate = (Birthdate)
winner         birthdate
Edgar F. Codd  23.08.1923
Mother Teresa  26.08.1910
Niklaus Wirth  15.02.1934
Desmond Tutu   07.10.1931
Boyce-Codd Normal Form (BCNF)
The Boyce-Codd normal form is a stronger form of 3NF
A relation schema R is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and if every determinant is a candidate key, i.e. for all functional dependencies α → β in F+ one of the following holds
- α → β is a trivial functional dependency (i.e. β ⊆ α)
- α is a superkey of R
Any relation that is in BCNF is also in 3NF since the BCNF conditions are equivalent to the first two 3NF conditions
BCNF Decomposition
If a relation R is not in BCNF, then there exists at least one nontrivial functional dependency α → β where α is not a superkey of R
- the relation R can then be decomposed into the two relation schemas R1 (α ∪ β) and R2 (R - (β - α))
We can for example apply the BCNF decomposition to the previous Prize relation schema example with the functional dependency winner → birthdate
- α ∪ β = (winner, birthdate)
- (R - (β - α)) = (award, year, winner)
Further details about the algorithms for BCNF and 3NF decomposition can be found in the course book
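A single BCNF decomposition step for the Prize example can be sketched as follows. This is not the complete algorithm from the course book: the sketch only inspects the dependencies given in F (rather than F+) to find a violation on the original schema and then applies one split. The function names are hypothetical and the dependency {award, year} → winner is an assumption derived from the stated key.

```python
def closure(attributes, fds):
    """Closure of a set of attributes under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_bcnf_violation(schema, fds):
    """Return a nontrivial dependency whose determinant is no superkey, or None."""
    schema = set(schema)
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        nontrivial = not rhs <= lhs
        if nontrivial and not schema <= closure(lhs, fds):
            return lhs, rhs
    return None


def bcnf_step(schema, alpha, beta):
    """Split schema into R1 = alpha ∪ beta and R2 = R - (beta - alpha)."""
    alpha, beta = set(alpha), set(beta)
    return alpha | beta, set(schema) - (beta - alpha)


prize = {"award", "year", "winner", "birthdate"}
F = [
    ({"award", "year"}, {"winner"}),   # assumption, see lead-in
    ({"winner"}, {"birthdate"}),
]
violation = find_bcnf_violation(prize, F)
r1, r2 = bcnf_step(prize, *violation)
```

A full decomposition would repeat this step on the resulting schemas until no violation remains, which requires projecting the dependencies onto each new schema.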
Multivalued Dependencies
Some relation schemas that are in BCNF may still contain redundant information
The fourth normal form (4NF) deals with some of these problems based on multivalued dependencies
- for a given relation schema R with α ⊆ R and β ⊆ R the multivalued dependency α ↠ β holds if for all pairs of tuples t1 and t2 in r(R) (with t1[α] = t2[α]) there exist tuples t3 and t4 in r(R) such that
  - t1[α] = t2[α] = t3[α] = t4[α]
  - t3[β] = t1[β]
  - t3[R - β] = t2[R - β]
  - t4[β] = t2[β]
  - t4[R - β] = t1[R - β]
    α        β          R - α - β
t1  a1...ai  ai+1...aj  aj+1...an
t2  a1...ai  bi+1...bj  bj+1...bn
t3  a1...ai  ai+1...aj  bj+1...bn
t4  a1...ai  bi+1...bj  aj+1...an
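The definition of a multivalued dependency can be tested on a relation instance by constructing the required tuple t3 for every ordered pair (t1, t2); the tuple t4 is then covered by the pair (t2, t1). The following is a minimal sketch in which the function name mvd_holds and the course/teacher/book example data are hypothetical and not taken from the lecture.

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check whether the multivalued dependency alpha ->> beta holds on rows."""
    alpha, beta = set(alpha), set(beta)
    tuples = {frozenset(row.items()) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and all other values from t2.
                t3 = {a: (t1[a] if a in beta else t2[a]) for a in schema}
                if frozenset(t3.items()) not in tuples:
                    return False
    return True


schema = ("course", "teacher", "book")
# Hypothetical data: every teacher of a course uses every book of that course.
rows = [
    {"course": "Databases", "teacher": "Teacher A", "book": "Book X"},
    {"course": "Databases", "teacher": "Teacher A", "book": "Book Y"},
    {"course": "Databases", "teacher": "Teacher B", "book": "Book X"},
    {"course": "Databases", "teacher": "Teacher B", "book": "Book Y"},
]
```

On the full instance the dependency course ↠ teacher holds; as soon as one of the four rows is removed, the required tuple t3 is missing for some pair and the check fails.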
Multivalued Dependencies ...
Every functional dependency is also a multivalued dependency, i.e. if α → β then α ↠ β
Fourth Normal Form (4NF)
A relation schema R is in fourth normal form (4NF) if it is in BCNF and if any non-trivial multivalued dependency is a dependency on a candidate key, i.e. for all multivalued dependencies α ↠ β in D+ (the closure of the set D of functional and multivalued dependencies) one of the following has to hold
- α ↠ β is a trivial multivalued dependency (i.e. β ⊆ α or β ∪ α = R)
- α is a superkey of R
Note that the fourth normal form is very similar to BCNF except that we use multivalued dependencies
4NF normalisation process
- remove any multivalued attributes from the relation and place them in a new relation together with their determinant
Fifth Normal Form (5NF)
There are some forms of constraints called join dependencies that generalise multivalued dependencies
- leads to the project-join normal form or fifth normal form (5NF)
- not discussed in detail in this course
Normalisation Summary
Relations in higher normal forms are less vulnerable to update anomalies
- generally it is recommended that relations are at least in 3NF
Normal forms (from the strongest to the weakest) and the step that leads to each of them
- Fifth Normal Form (5NF): remove join dependencies
- Fourth Normal Form (4NF): remove multivalued dependencies
- Boyce-Codd Normal Form (BCNF): every determinant has to be a candidate key
- Third Normal Form (3NF): remove transitive dependencies
- Second Normal Form (2NF): remove partial dependencies
- First Normal Form (1NF): remove repeating groups
- Unnormalised (UN)
Denormalisation
Sometimes a database designer decides to store information in a redundant way to save join operations and improve the performance
- may result in additional work for insert, update and delete operations
An alternative is to keep the normalised schema and
introduce additional materialised views
Homework
Study the following chapters of the Database System Concepts book
- chapter 7
  - sections 7.6 and 7.8.6
  - Reduction to Relation Schemas
- chapter 8
  - sections 8.1-8.9
  - Relational Database Design
Exercise 4
Relational algebra
Relational database design
- ER to relational model reduction
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010
Next Lecture Structured Query Language (SQL)