Logical and Physical Design
Transcript of Logical and Physical Design
-
7/28/2019 Logical and Physical Design
1/103
Logical and Physical Database Design
KRISNA ADIYARTA
PASCA SARJANA (MAGISTER KOMPUTER)
UNIVERSITAS BUDI LUHUR
JAKARTA
1. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. Modern Database Management. 4th Edition. Upper Saddle River, New Jersey: Prentice Hall (Pearson Educational, Inc), 2005.
2. A. Silberschatz, H.F. Korth, S. Sudarshan. Database System Concepts. 4th Edition. McGraw-Hill, 2002.
-
Objectives
  Definition of terms
  List five properties of relations
  State two properties of candidate keys
  Define first, second, and third normal form
  Describe problems from merging relations
  Transform E-R and EER diagrams to relations
  Create tables with entity and relational integrity constraints
  Use normalization to convert anomalous tables to well-structured relations
-
Objectives
  Definition of terms
  Describe the physical database design process
  Choose storage formats for attributes
  Select appropriate file organizations
  Describe three types of file organization
  Describe indexes and their appropriate use
  Translate a database model into efficient structures
  Know when and how to use denormalization
-
Relation
Definition: A relation is a named, two-dimensional table of data
A table consists of rows (records) and columns (attributes or fields)
Requirements for a table to qualify as a relation:
  It must have a unique name
  Every attribute value must be atomic (not multivalued, not composite)
  Every row must be unique (can't have two rows with exactly the same values for all their fields)
  Attributes (columns) in tables must have unique names
  The order of the columns must be irrelevant
  The order of the rows must be irrelevant
NOTE: all relations are in first normal form (1NF)
-
Correspondence with E-R Model
  Relations (tables) correspond with entity types and with many-to-many relationship types
  Rows correspond with entity instances and with many-to-many relationship instances
  Columns correspond with attributes
NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)
-
Key Fields
Keys are special fields that serve two main purposes:
  Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
  Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
Keys can be simple (a single field) or composite (more than one field)
Keys usually are used as indexes to speed up the response to user queries
-
Schema for four relations (Pine Valley Furniture Company)
Primary key
Foreign key (implements 1:N relationship between customer and order)
Combined, these are a composite primary key (uniquely identifies the order line); individually they are foreign keys (implement M:N relationship between order and product)
-
Integrity Constraints
  Domain Constraints: Allowable values for an attribute.
  Entity Integrity: No primary key attribute may be null. All primary key fields MUST have data
-
Domain definitions enforce domain integrity constraints
-
Integrity Constraints
Referential Integrity: the rule states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side. (Or the foreign key can be null)
For example, delete rules:
  Restrict: don't allow delete of parent side if related rows exist in dependent side
  Cascade: automatically delete dependent side rows that correspond with the parent side row to be deleted
  Set-to-Null: set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
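The three delete rules can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the customer and orders tables and the helper name dependents_after_parent_delete are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def dependents_after_parent_delete(rule):
    """Delete a parent row under the given ON DELETE rule.

    Returns ("deleted" or "blocked", remaining dependent rows).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer(customer_id) ON DELETE " + rule + ")"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Hypothetical Customer')")
    conn.execute("INSERT INTO orders VALUES (100, 1)")
    try:
        conn.execute("DELETE FROM customer WHERE customer_id = 1")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "blocked"  # Restrict: the dependent row still references the parent
    rows = conn.execute("SELECT order_id, customer_id FROM orders").fetchall()
    conn.close()
    return outcome, rows

restrict_result = dependents_after_parent_delete("RESTRICT")
cascade_result = dependents_after_parent_delete("CASCADE")
set_null_result = dependents_after_parent_delete("SET NULL")
```

With RESTRICT the delete is refused and the order row is untouched, with CASCADE the order row disappears along with its customer, and with SET NULL the order row survives with a null foreign key.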
-
Referential integrity constraints
Referential integrity constraints are drawn via arrows from dependent to parent table
-
Referential integrity constraints are implemented with foreign key to primary key references
-
Transforming EER Diagrams into Relations
Mapping Regular Entities to Relations
1. Simple attributes: E-R attributes map directly onto the relation
2. Composite attributes: Use only their simple, component attributes
3. Multivalued attribute: Becomes a separate relation with a foreign key taken from the superior entity
-
Mapping a regular entity
(a) CUSTOMER entity type with simple attributes
(b) CUSTOMER relation
-
Mapping a composite attribute
(a) CUSTOMER entity type with composite attribute
(b) CUSTOMER relation with address detail
-
Mapping an entity with a multivalued attribute
One-to-many relationship between original entity and new relation
(a)
Multivalued attribute becomes a separate relation with foreign key
(b)
-
Transforming EER Diagrams into Relations (cont.)
Mapping Weak Entities
  Becomes a separate relation with a foreign key taken from the superior entity
  Primary key composed of:
    Partial identifier of weak entity
    Primary key of identifying relation (strong entity)
-
a) Weak entity DEPENDENT
Example of mapping a weak entity
-
Example of mapping a weak entity (cont.)
b) Relations resulting from weak entity
Foreign key
Composite primary key
NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity
-
Transforming EER Diagrams into Relations (cont.)
Mapping Binary Relationships
  One-to-Many: Primary key on the one side becomes a foreign key on the many side
  Many-to-Many: Create a new relation with the primary keys of the two entities as its primary key
  One-to-One: Primary key on the mandatory side becomes a foreign key on the optional side
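The one-to-many and many-to-many rules above can be sketched as table definitions using Python's built-in sqlite3 module. The table and column names (customer, orders, product, order_line) and the sample rows are assumptions chosen for illustration, not the course's actual schema.

```python
import sqlite3

def build_schema():
    """Create the hypothetical tables in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product  (product_id INTEGER PRIMARY KEY, description TEXT);
        -- 1:N: the primary key of the one side becomes a foreign key on the many side
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        -- M:N: a new relation whose primary key combines the keys of both entities
        CREATE TABLE order_line (
            order_id   INTEGER NOT NULL REFERENCES orders(order_id),
            product_id INTEGER NOT NULL REFERENCES product(product_id),
            quantity   INTEGER,
            PRIMARY KEY (order_id, product_id)
        );
    """)
    return conn

conn = build_schema()
conn.execute("INSERT INTO customer VALUES (1, 'Customer A')")
conn.execute("INSERT INTO product VALUES (10, 'Product X')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO order_line VALUES (100, 10, 2)")
```

Declaring the foreign key in orders as NOT NULL reflects a mandatory one side; inserting a second order_line row for the same order and product would violate the composite primary key.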
-
Example of mapping a 1:M relationship
a) Relationship between customers and orders
Note the mandatory one
b) Mapping the relationship
Again, no null value in the foreign key; this is because of the mandatory minimum cardinality
Foreign key
-
Example of mapping an M:N relationship
a) Completes relationship (M:N)
The Completes relationship will need to become a separate relation
-
Example of mapping an M:N relationship (cont.)
b) Three resulting relations
New intersection relation
Foreign key
Foreign key
Composite primary key
-
Example of mapping a binary 1:1 relationship
a) In_charge relationship (1:1)
Often in 1:1 relationships, one direction is optional.
-
Example of mapping a binary 1:1 relationship (cont.)
b) Resulting relations
Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side
-
Transforming EER Diagrams into Relations (cont.)
Mapping Associative Entities
  Identifier Not Assigned
    Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
  Identifier Assigned
    It is natural and familiar to end-users
    Default identifier may not be unique
-
Example of mapping an associative entity
a) An associative entity
-
Example of mapping an associative entity (cont.)
b) Three resulting relations
Composite primary key formed from the two foreign keys
-
Example of mapping an associative entity with an identifier
a) SHIPMENT associative entity
-
Example of mapping an associative entity with an identifier (cont.)
b) Three resulting relations
Primary key differs from foreign keys
-
Transforming EER Diagrams into Relations (cont.)
Mapping Unary Relationships
  One-to-Many: Recursive foreign key in the same relation
  Many-to-Many: Two relations:
    One for the entity type
    One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
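A recursive foreign key for the unary one-to-many case can be sketched with Python's built-in sqlite3 module. The employee table, its manager_id column, and the sample rows are hypothetical names and data used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- recursive foreign key: refers back to the same relation, NULL for the top manager
        manager_id  INTEGER REFERENCES employee(employee_id)
    );
    INSERT INTO employee VALUES (1, 'Manager A', NULL);
    INSERT INTO employee VALUES (2, 'Employee B', 1);
    INSERT INTO employee VALUES (3, 'Employee C', 1);
""")

# A self-join pairs each employee with the row that manager_id points to
reports = conn.execute(
    "SELECT e.name, m.name FROM employee e"
    " JOIN employee m ON m.employee_id = e.manager_id"
    " ORDER BY e.employee_id"
).fetchall()
```

The self-join returns one row per managed employee; the top manager has a null manager_id and so does not appear on the left side of any pair.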
-
Mapping a unary 1:N relationship
(a) EMPLOYEE entity with unary relationship
(b) EMPLOYEE relation with recursive foreign key
-
Mapping a unary M:N relationship
(a) Bill-of-materials relationships (M:N)
(b) ITEM and COMPONENT relations
-
Transforming EER Diagrams into Relations (cont.)
Mapping Ternary (and n-ary) Relationships
  One relation for each entity and one for the associative entity
  Associative entity has foreign keys to each entity in the relationship
-
Mapping a ternary relationship
a) PATIENT TREATMENT ternary relationship with associative entity
-
Mapping a ternary relationship (cont.)
b) Mapping the ternary relationship PATIENT TREATMENT
Remember that the primary key MUST be unique
This is why treatment date and time are included in the composite primary key
But this makes a very cumbersome key
It would be better to create a surrogate key like Treatment#
-
Transforming EER Diagrams into Relations (cont.)
Mapping Supertype/Subtype Relationships
  One relation for supertype and for each subtype
  Supertype attributes (including identifier and subtype discriminator) go into supertype relation
  Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation
  1:1 relationship established between supertype and each subtype, with supertype as primary table
-
Supertype/subtype relationships
-
Mapping Supertype/subtype relationships to relations
These are implemented as one-to-one relationships
-
Data Normalization
  Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
  The process of decomposing relations with anomalies to produce smaller, well-structured relations
-
Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
Goal is to avoid anomalies
  Insertion Anomaly: adding new rows forces user to create duplicate data
  Deletion Anomaly: deleting rows may cause a loss of data that would be needed for other future rows
  Modification Anomaly: changing data in a row forces changes to other rows because of duplication
General rule of thumb: A table should not pertain to more than one entity type
-
Example
Question: Is this a relation?
Answer: Yes. Unique rows and no multivalued attributes
Question: What's the primary key?
Answer: Composite: Emp_ID, Course_Title
-
Anomalies in this Table
  Insertion: can't enter a new employee without having the employee take a class
  Deletion: if we remove employee 140, we lose information about the existence of a Tax Acc class
  Modification: giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist?
  Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
-
Functional Dependencies and Keys
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute
Candidate Key:
  A unique identifier. One of the candidate keys will become the primary key
    E.g. perhaps there is both credit card number and SS# in a table; in this case both are candidate keys
  Each non-key field is functionally dependent on every candidate key
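Whether a functional dependency is at least consistent with a set of rows can be checked mechanically. The sketch below is a hedged illustration: the function name holds and the sample rows are hypothetical, and passing the check on sample data does not prove the dependency is a business rule, it only shows the data does not contradict it.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on determinant but differ on dependent."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this determinant and returns it afterwards
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical sample rows; not copied from the slide's figure
ROWS = [
    {"Emp_ID": 100, "Name": "Employee A", "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Name": "Employee A", "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Name": "Employee B", "Course_Title": "Tax Acc"},
]

name_depends_on_id = holds(ROWS, ["Emp_ID"], ["Name"])            # True for these rows
course_depends_on_id = holds(ROWS, ["Emp_ID"], ["Course_Title"])  # False: employee 100 has two courses
```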
-
Normalization
Relations can fall into one or more categories (or classes) called Normal Forms
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
  First normal form (1NF)
  Second normal form (2NF)
  Third normal form (3NF)
  Boyce-Codd normal form (BCNF)
  Fourth normal form (4NF)
  Fifth normal form (5NF)
These forms are cumulative. A relation in third normal form is also in 2NF and 1NF.
-
Steps in normalization
-
First Normal Form (1NF)
A relation is in first normal form if it meets the definition of a relation:
1. Each column (attribute) value must be a single value only.
2. All values for a given column (attribute) must be of the same type.
3. Each column (attribute) name must be unique.
4. The order of columns is insignificant.
5. No two rows (tuples) in a relation can be identical.
6. The order of the rows (tuples) is insignificant.
If you have a key defined for the relation, then you can meet the unique row requirement.
Example relation in 1NF:
STOCKS (Company, Symbol, Date, Close_Price)
Company   Symbol  Date      Close Price
IBM       IBM     01/05/94  101.00
IBM       IBM     01/06/94  100.50
IBM       IBM     01/07/94  102.00
Netscape  NETS    01/05/94  33.00
Netscape  NETS    01/06/94  112.00
-
Second Normal Form (2NF)
A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.
Relations that have a single attribute for a key are automatically in 2NF. This is one reason why we often use artificial identifiers as keys.
In the example below, Close Price is dependent on (Company, Date) and on (Symbol, Date).
The following example relation is not in 2NF:
STOCKS (Company, Symbol, Headquarters, Date, Close_Price)
Company   Symbol  Headquarters   Date      Close Price
IBM       IBM     Armonk, NY     01/05/94  101.00
IBM       IBM     Armonk, NY     01/06/94  100.50
IBM       IBM     Armonk, NY     01/07/94  102.00
Netscape  NETS    Sunnyvale, CA  01/05/94  33.00
Netscape  NETS    Sunnyvale, CA  01/06/94  112.00
Company, Date -> Close Price
Symbol, Date -> Close Price
Company -> Symbol, Headquarters
Symbol -> Company, Headquarters
-
Second Normal Form (2NF)
Consider that Company, Date -> Close Price.
So we might use Company, Date as our key.
However: Company -> Headquarters
This violates the rule for 2NF. Also, consider the insertion and deletion anomalies.
One solution: Split this up into two relations:
COMPANY (Company, Symbol, Headquarters)
STOCKS (Symbol, Date, Close_Price)

COMPANY
Company   Symbol  Headquarters
IBM       IBM     Armonk, NY
Netscape  NETS    Sunnyvale, CA
Company -> Symbol, Headquarters
Symbol -> Company, Headquarters

STOCKS
Symbol  Date      Close Price
IBM     01/05/94  101.00
IBM     01/06/94  100.50
IBM     01/07/94  102.00
NETS    01/05/94  33.00
NETS    01/06/94  112.00
Symbol, Date -> Close Price
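The 2NF test can be expressed as a small check over the listed functional dependencies. This is a sketch under stated assumptions: the function name partial_dependencies is hypothetical, candidate keys are supplied by hand, and only the dependencies passed in are examined (implied dependencies are not derived).

```python
def partial_dependencies(candidate_keys, fds):
    """List FDs where a proper part of a candidate key determines a non-key attribute.

    candidate_keys: list of tuples of attribute names
    fds: list of (determinant, dependent) pairs, each a tuple of attribute names
    """
    prime = {attr for key in candidate_keys for attr in key}  # attributes that belong to some key
    found = []
    for key in candidate_keys:
        for lhs, rhs in fds:
            non_key = tuple(a for a in rhs if a not in prime)
            if set(lhs) < set(key) and non_key and (lhs, non_key) not in found:
                found.append((lhs, non_key))
    return found

# The wide STOCKS relation from the slides
WIDE_KEYS = [("Company", "Date"), ("Symbol", "Date")]
WIDE_FDS = [
    (("Company", "Date"), ("Close_Price",)),
    (("Symbol", "Date"), ("Close_Price",)),
    (("Company",), ("Symbol", "Headquarters")),
    (("Symbol",), ("Company", "Headquarters")),
]
violations = partial_dependencies(WIDE_KEYS, WIDE_FDS)

# After the split, neither relation has a partial dependency
stocks_ok = partial_dependencies([("Symbol", "Date")], [(("Symbol", "Date"), ("Close_Price",))])
company_ok = partial_dependencies([("Company",), ("Symbol",)], WIDE_FDS[2:])
```

For the wide relation the check reports that Headquarters depends on Company alone and on Symbol alone, which is the violation the split removes.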
-
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.
Consider relation R containing attributes A, B and C.
If A -> B and B -> C then A -> C
Transitive Dependency: Three attributes with the above dependencies.
Example: At CUNY:
Course_Code -> Course_Num, Section
Course_Num, Section -> Classroom, Professor
Example: At Rutgers:
Course_Index_Num -> Course_Num, Section
Course_Num, Section -> Classroom, Professor
-
Third Normal Form (3NF)
Example:
Company  County  Tax Rate
IBM      Putnam  28%
AT&T     Bergen  26%
Company -> County
and
County -> Tax Rate
thus
Company -> Tax Rate
What happens if we remove AT&T? We lose information about 2 different themes.
Split this up into two relations:

Company  County
IBM      Putnam
AT&T     Bergen
Company -> County

County  Tax Rate
Putnam  28%
Bergen  26%
County -> Tax Rate
-
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key.
Recall that not all determinants are keys.
Those determinants that are keys we initially call candidate keys.
Eventually, we select a single candidate key to be the primary key for the relation.
Consider the following example:
  Funds consist of one or more Investment Types.
  Funds are managed by one or more Managers.
  Investment Types can have one or more Managers.
  Managers only manage one type of investment.
FundID  InvestmentType   Manager
99      Common Stock     Smith
99      Municipal Bonds  Jones
33      Common Stock     Green
22      Growth Stocks    Brown
11      Common Stock     Smith
FundID, Manager -> InvestmentType
FundID, InvestmentType -> Manager
Manager -> InvestmentType
-
Boyce-Codd Normal Form (BCNF)
The combination FundID and InvestmentType forms a candidate key because we can use FundID, InvestmentType to uniquely identify a tuple in the relation.
Similarly, the combination FundID and Manager also forms a candidate key because we can use FundID, Manager to uniquely identify a tuple.
Manager by itself is not a candidate key because we cannot use Manager alone to uniquely identify a tuple in the relation.
Is this relation R(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF?
Given we pick FundID, InvestmentType as the primary key:
  1NF for sure.
  2NF because the only non-key attribute (Manager) is dependent on all of the key.
  3NF because there are no transitive dependencies.
Consider what happens if we delete the tuple with FundID 22. We lose the fact that Brown manages the InvestmentType "Growth Stocks."
FundID  InvestmentType   Manager
99      Common Stock     Smith
99      Municipal Bonds  Jones
33      Common Stock     Green
22      Growth Stocks    Brown
11      Common Stock     Smith
FundID, Manager -> InvestmentType
FundID, InvestmentType -> Manager
Manager -> InvestmentType
-
Boyce-Codd Normal Form (BCNF)
The following are steps to normalize a relation into BCNF:
1. List all of the determinants.
2. See if each determinant can act as a key (candidate keys).
3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.
For our example:
Rorig(FundID, InvestmentType, Manager)
1. The determinants are:
   FundID, InvestmentType
   FundID, Manager
   Manager
2. Which determinants can act as keys?
   FundID, InvestmentType  YES
   FundID, Manager  YES
   Manager  NO
3. Create a new relation from the functional dependency:
   Rnew(Manager, InvestmentType)
   Rorig(FundID, Manager)
In this last step, we have retained the determinant "Manager" in the original relation Rorig.
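Steps 1 and 2 above can be sketched with an attribute-closure computation. The function names closure and non_key_determinants are hypothetical. Note one simplification: the check asks whether each determinant functionally determines every attribute of the relation (that is, whether it is a superkey), which is the usual formal statement of the BCNF condition; it does not test that the key is minimal.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (determinant, dependent) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already implied, the right side is implied too
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def non_key_determinants(attributes, fds):
    """Determinants whose closure does not cover every attribute of the relation."""
    return [lhs for lhs, _ in fds if closure(lhs, fds) != set(attributes)]

R_ORIG = ("FundID", "InvestmentType", "Manager")
FDS = [
    (("FundID", "InvestmentType"), ("Manager",)),
    (("FundID", "Manager"), ("InvestmentType",)),
    (("Manager",), ("InvestmentType",)),
]
violations = non_key_determinants(R_ORIG, FDS)  # only ("Manager",) fails the test

# After step 3: Rnew(Manager, InvestmentType) keeps the one dependency, Rorig(FundID, Manager) has none
r_new_violations = non_key_determinants(("Manager", "InvestmentType"), [FDS[2]])
r_orig_violations = non_key_determinants(("FundID", "Manager"), [])
```

The result matches the YES/YES/NO answers in step 2, and both relations produced by step 3 pass the same check.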
-
Fourth Normal Form (4NF)
A relation is in fourth normal form if it is in BCNF and it contains no multivalued dependencies.
Multivalued Dependency: A type of dependency where the determinant can determine more than one value.
More formally, there are 3 criteria:
1. There must be at least 3 attributes in the relation. Call them A, B, and C, for example.
2. Given A, one can determine multiple values of B.
   Given A, one can determine multiple values of C.
3. B and C are independent of one another.
Example:
  Student has one or more majors.
  Student participates in one or more activities.
-
Fourth Normal Form (4NF)
StudentID  Major       Activities
100        CIS         Baseball
100        CIS         Volleyball
100        Accounting  Baseball
100        Accounting  Volleyball
200        Marketing   Swimming
StudentID ->> Major
StudentID ->> Activities

Portfolio ID  Stock Fund           Bond Fund
999           Janus Fund           Municipal Bonds
999           Janus Fund           Dreyfus Short-Intermediate Municipal Bond Fund
999           Scudder Global Fund  Municipal Bonds
999           Scudder Global Fund  Dreyfus Short-Intermediate Municipal Bond Fund
888           Kaufmann Fund        T. Rowe Price Emerging Markets Bond Fund
-
Fourth Normal Form (4NF)
A few characteristics:
1. No regular functional dependencies
2. All three attributes taken together form the key.
3. Latter two attributes are independent of one another.
4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL value). Must always maintain the combinations to preserve the meaning.
Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
PortfolioID ->> Stock Fund
PortfolioID ->> Bond Fund
-
Fourth Normal Form (4NF)
Resolution: Split into two tables with the common key:

Portfolio ID  Stock Fund
999           Janus Fund
999           Scudder Global Fund
888           Kaufmann Fund

Portfolio ID  Bond Fund
999           Municipal Bonds
999           Dreyfus Short-Intermediate Municipal Bond Fund
888           T. Rowe Price Emerging Markets Bond Fund

Original relation:
Portfolio ID  Stock Fund           Bond Fund
999           Janus Fund           Municipal Bonds
999           Janus Fund           Dreyfus Short-Intermediate Municipal Bond Fund
999           Scudder Global Fund  Municipal Bonds
999           Scudder Global Fund  Dreyfus Short-Intermediate Municipal Bond Fund
888           Kaufmann Fund        T. Rowe Price Emerging Markets Bond Fund
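The split is safe because the original relation holds every stock fund/bond fund combination for each portfolio, so joining the two smaller tables on Portfolio ID reproduces it exactly. A minimal sketch of that check follows; the function names split and rejoin are hypothetical, and the data is the portfolio table from the slide.

```python
from itertools import product

PORTFOLIO = [
    (999, "Janus Fund", "Municipal Bonds"),
    (999, "Janus Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (999, "Scudder Global Fund", "Municipal Bonds"),
    (999, "Scudder Global Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (888, "Kaufmann Fund", "T. Rowe Price Emerging Markets Bond Fund"),
]

def split(rows):
    """Project the three-column relation onto (id, stock fund) and (id, bond fund)."""
    stock = {(pid, s) for pid, s, _ in rows}
    bond = {(pid, b) for pid, _, b in rows}
    return stock, bond

def rejoin(stock, bond):
    """Natural join of the two projections on the portfolio id."""
    return {(pid, s, b) for (pid, s), (pid2, b) in product(stock, bond) if pid == pid2}

stock_funds, bond_funds = split(PORTFOLIO)
lossless = rejoin(stock_funds, bond_funds) == set(PORTFOLIO)

# If one combination were missing, the multivalued dependencies would not hold
# and the join would produce a row that was never in the relation
incomplete = PORTFOLIO[:2] + PORTFOLIO[3:]
still_lossless = rejoin(*split(incomplete)) == set(incomplete)
```

For the slide's data the rejoin reproduces the five original rows from two three-row tables; for the incomplete variant it does not, which is how a missing combination would show up.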
-
Fifth Normal Form (5NF)
There are certain conditions under which, after decomposing a relation, it cannot be reassembled back into its original form.
-
De-Normalization
Consider the following relation:
CUSTOMER (CustomerID, Name, Address, City, State, Zip)
This relation is not in DK/NF because it contains a functional dependency not implied by the key.
Zip -> City, State
We can normalize this into DK/NF by splitting the CUSTOMER relation into two:
CUSTOMER (CustomerID, Name, Address, Zip)
CODES (Zip, City, State)
We may pay a performance penalty: each customer address lookup requires we look in two relations (tables).
In such cases, we may de-normalize the relations to achieve a performance improvement.
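The trade-off can be made concrete with Python's built-in sqlite3 module. In this sketch the table and column names are lowercased versions of the relations above, customer_denorm is a hypothetical name for the de-normalized alternative, and the sample row is borrowed from the CUSTOMER example later in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: city and state are stored once per zip code
    CREATE TABLE codes (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY, name TEXT, address TEXT,
        zip TEXT REFERENCES codes(zip)
    );
    -- De-normalized design: city and state are repeated in every customer row
    CREATE TABLE customer_denorm (
        customer_id TEXT PRIMARY KEY, name TEXT, address TEXT,
        city TEXT, state TEXT, zip TEXT
    );
    INSERT INTO codes VALUES ('07101', 'New Brunswick', 'NJ');
    INSERT INTO customer VALUES ('C101', 'Bill Smith', '123 First St.', '07101');
    INSERT INTO customer_denorm VALUES
        ('C101', 'Bill Smith', '123 First St.', 'New Brunswick', 'NJ', '07101');
""")

# Normalized lookup: two relations must be joined to assemble the full address
normalized = conn.execute(
    "SELECT c.name, c.address, z.city, z.state, c.zip"
    " FROM customer c JOIN codes z ON z.zip = c.zip"
    " WHERE c.customer_id = 'C101'"
).fetchone()

# De-normalized lookup: a single-table read returns the same answer
denormalized = conn.execute(
    "SELECT name, address, city, state, zip FROM customer_denorm"
    " WHERE customer_id = 'C101'"
).fetchone()
```

Both queries return the same address; the de-normalized table avoids the join at the cost of repeating city and state for every customer in the same zip code.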
-
All-in-One Example
Many of you asked for a "complete" example that would run through all of the normal forms from beginning to end using the same tables. This is tough to do, but here is an attempt:
Example relation:
EMPLOYEE (Name, Project, Task, Office, Floor, Phone)
Note: Keys are underlined.
Example Data:
Name  Project  Task  Office  Floor  Phone
Bill  100X     T1    400     4      1400
Bill  100X     T2    400     4      1400
Bill  200Y     T1    400     4      1400
Bill  200Y     T2    400     4      1400
Sue   100X     T33   442     4      1442
Sue   200Y     T33   442     4      1442
Sue   300Z     T33   442     4      1442
Ed    100X     T2    588     5      1588
-
All-in-One Example
Name is the employee's name
Project is the project they are working on. Bill is working on two different projects, Sue is working on 3.
Task is the current task being worked on. Bill is now working on Tasks T1 and T2. Note that Tasks are independent of the project. Examples of a task might be faxing a memo or holding a meeting.
Office is the office number for the employee. Bill works in office number 400.
Floor is the floor on which the office is located.
Phone is the phone extension. Note this is associated with the phone in the given office.
First Normal Form
Question: Assume the key is Name, Project, Task. Is EMPLOYEE in 1NF?
-
All-in-One Example
Second Normal Form
List all of the functional dependencies for EMPLOYEE.
Are all of the non-key attributes dependent on all of the key?
Split into two relations EMPLOYEE_PROJECT_TASK and EMPLOYEE_OFFICE_PHONE.

EMPLOYEE (Name, Project, Task, Office, Floor, Phone)
Name  Project  Task  Office  Floor  Phone
Bill  100X     T1    400     4      1400
Bill  100X     T2    400     4      1400
Bill  200Y     T1    400     4      1400
Bill  200Y     T2    400     4      1400
Sue   100X     T33   442     4      1442
Sue   200Y     T33   442     4      1442
Sue   300Z     T33   442     4      1442
Ed    100X     T2    588     5      1588

EMPLOYEE_PROJECT_TASK (Name, Project, Task)
Name  Project  Task
Bill  100X     T1
Bill  100X     T2
Bill  200Y     T1
Bill  200Y     T2
Sue   100X     T33
Sue   200Y     T33
Sue   300Z     T33
Ed    100X     T2

EMPLOYEE_OFFICE_PHONE (Name, Office, Floor, Phone)
Name  Office  Floor  Phone
Bill  400     4      1400
Sue   442     4      1442
Ed    588     5      1588
-
All-in-One Example
Third Normal Form
Assume each office has exactly one phone number.
Are there any transitive dependencies?
Where are the modification anomalies in EMPLOYEE_OFFICE_PHONE?
Split EMPLOYEE_OFFICE_PHONE.

EMPLOYEE_PROJECT_TASK (Name, Project, Task)
Name  Project  Task
Bill  100X     T1
Bill  100X     T2
Bill  200Y     T1
Bill  200Y     T2
Sue   100X     T33
Sue   200Y     T33
Sue   300Z     T33
Ed    100X     T2

EMPLOYEE_OFFICE (Name, Office, Floor)
Name  Office  Floor
Bill  400     4
Sue   442     4
Ed    588     5

EMPLOYEE_PHONE (Office, Phone)
Office  Phone
400     1400
442     1442
588     1588
-
All-in-One Example
Boyce-Codd Normal Form
List all of the functional dependencies for EMPLOYEE_PROJECT_TASK, EMPLOYEE_OFFICE and EMPLOYEE_PHONE. Look at the determinants.
Are all determinants candidate keys?
-
All-in-One Example
Fourth Normal Form
Are there any multivalued dependencies?
What are the modification anomalies?
Split EMPLOYEE_PROJECT_TASK.

EMPLOYEE_PROJECT_TASK (Name, Project, Task)
Name  Project  Task
Bill  100X     T1
Bill  100X     T2
Bill  200Y     T1
Bill  200Y     T2
Sue   100X     T33
Sue   200Y     T33
Sue   300Z     T33
Ed    100X     T2

EMPLOYEE_PROJECT (Name, Project)
Name  Project
Bill  100X
Bill  200Y
Sue   100X
Sue   200Y
Sue   300Z
Ed    100X

EMPLOYEE_TASK (Name, Task)
Name  Task
Bill  T1
Bill  T2
Sue   T33
Ed    T2
-
All-in-One Example
EMPLOYEE_OFFICE (Name, Office, Floor)
Name  Office  Floor
Bill  400     4
Sue   442     4
Ed    588     5

EMPLOYEE_PHONE (Office, Phone)
Office  Phone
400     1400
442     1442
588     1588
-
All-in-One Example
At each step of the process, we did the following:
1. Write out the relation
2. (optionally) Write out some example data.
3. Write out all of the functional dependencies
4. Starting with 1NF, go through each normal form and state why the relation is in the given normal form.
-
All-in-One Example
Another short example
Consider the following example of normalization for a CUSTOMER relation.
Relation Name
CUSTOMER (CustomerID, Name, Street, City, State, Zip, Phone)
Example Data
CustomerID  Name        Street         City           State  Zip    Phone
C101        Bill Smith  123 First St.  New Brunswick  NJ     07101  732-555-1212
C102        Mary Green  11 Birch St.   Old Bridge     NJ     07066  908-555-1212
Functional Dependencies
CustomerID -> Name, Street, City, State, Zip, Phone
Zip -> City, State
-
All-in-One Example
Normalization
1NF: Meets the definition of a relation.
2NF: All non-key attributes are dependent on all of the key.
3NF: Not satisfied. CustomerID -> Zip and Zip -> City, State form a transitive dependency.
BCNF: Not satisfied either, because the determinant Zip cannot act as a key for the entire relation.
Solution: Split CUSTOMER into two relations:
CUSTOMER (CustomerID, Name, Street, Zip, Phone)
ZIPCODES (Zip, City, State)
Check both CUSTOMER and ZIPCODES to ensure they are both in 1NF up to BCNF.
4NF: There are no multivalued dependencies in either CUSTOMER or ZIPCODES.
As a final step, consider de-normalization.
-
Merging Relations
View Integration: Combining entities from multiple ER models into common relations
Issues to watch out for when merging entities from different ER models:
  Synonyms: two or more attributes with different names but same meaning
  Homonyms: attributes with same name but different meanings
  Transitive dependencies: even if relations are in 3NF prior to merging, they may not be after merging
  Supertype/subtype relationships: may be hidden prior to merging
-
Enterprise Keys
  Primary keys that are unique in the whole database, not just within a single relation
  Corresponds with the concept of an object ID in object-oriented systems
-
Enterprise keys
a) Relations with enterprise key
b) Sample data with enterprise key
-
Physical Database Design
Purpose: translate the logical description of data into the technical specifications for storing and retrieving data
Goal: create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
-
Physical Design Process
Inputs
  Normalized relations
  Volume estimates
  Attribute definitions
  Response time expectations
  Data security needs
  Backup/recovery needs
  Integrity expectations
  DBMS technology used
Leads to
Decisions
  Attribute data types
  Physical record descriptions (doesn't always match logical design)
  File organizations
  Indexes and database architectures
  Query optimization
-
Composite usage map
-
Composite usage map (cont.)
Data volumes
-
Composite usage map (cont.)
Access Frequencies (per hour)
-
Composite usage map (cont.)
Usage analysis:
  140 purchased parts accessed per hour
  80 quotations accessed from these 140 purchased part accesses
  70 suppliers accessed from these 80 quotation accesses
-
Composite usage map (cont.)
Usage analysis:
  75 suppliers accessed per hour
  40 quotations accessed from these 75 supplier accesses
  40 purchased parts accessed from these 40 quotation accesses
-
Designing Fields
Field: smallest unit of data in database
Field design
  Choosing data type
  Coding, compression, encryption
  Controlling data integrity
-
Choosing Data Types
  CHAR: fixed-length character
  VARCHAR2: variable-length character (memo)
  LONG: large variable-length character data (up to 2 GB in Oracle)
  NUMBER: positive/negative number
  INTEGER: positive/negative whole number
  DATE: actual date
  BLOB: binary large object (good for graphics, sound clips, etc.)
-
Example code look-up table
Code saves space, but costs an additional lookup to obtain actual value
-
Field Data Integrity
  Default value: assumed value if no explicit value
  Range control: allowable value limitations (constraints or validation rules)
  Null value control: allowing or prohibiting empty fields
  Referential integrity: range control (and null value allowances) for foreign-key to primary-key match-ups
Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity
-
Handling Missing Data
  Substitute an estimate of the missing value (e.g., using a formula)
  Construct a report listing missing values
  In programs, ignore missing data unless the value is significant (sensitivity testing)
Triggers can be used to perform these operations
-
Physical Records
Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit
Page: The amount of data read or written in one I/O operation
Blocking Factor: The number of physical records per page
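The blocking factor follows directly from the page size and the physical record size. The sketch below assumes fixed-length records that never span pages; the function names and the 4096-byte page and 150-byte record are hypothetical figures chosen for illustration.

```python
def blocking_factor(page_size_bytes, record_size_bytes):
    """Whole physical records that fit in one page (records do not span pages)."""
    if page_size_bytes <= 0 or record_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    return page_size_bytes // record_size_bytes

def pages_needed(record_count, page_size_bytes, record_size_bytes):
    """Pages required to store record_count records at the resulting blocking factor."""
    factor = blocking_factor(page_size_bytes, record_size_bytes)
    if factor == 0:
        raise ValueError("a record does not fit in one page")
    return -(-record_count // factor)  # ceiling division

factor = blocking_factor(4096, 150)      # 27 records per page, 46 bytes unused
pages = pages_needed(10_000, 4096, 150)  # 371 pages
```

A larger blocking factor means fewer I/O operations to read the same number of records, which is why record size is a physical design decision.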
-
Denormalization
Transforming normalized relations into unnormalized physical record specifications
Benefits:
  Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessary join queries)
Costs (due to data duplication):
  Wasted storage space
  Data integrity/consistency threats
Common denormalization opportunities:
  One-to-one relationship
  Many-to-many relationship with attributes
  Reference data (1:N relationship where 1-side has data not used in any other relationship)
-
A possible denormalization situation: two entities with one-to-one relationship
-
A possible denormalization situation: a many-to-many relationship with nonkey attributes
Extra table access required
Null description possible
-
A possible denormalization situation: reference data
Extra table access required
Data duplication
-
Partitioning
Horizontal Partitioning: Distributing the rows of a table into several separate files
Useful for situations where different users need access to different rows
Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning
Vertical Partitioning: Distributing the columns of a table into several separate relations
Useful for situations where different users need access to different columns
The primary key must be repeated in each file
Combinations of Horizontal and Vertical
Partitions often correspond with User Schemas (user views)
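The partitioning schemes can be illustrated with plain Python lists standing in for files. The employee rows, the range boundaries, and the function names are all hypothetical; a real DBMS performs this placement internally from a partitioning clause.

```python
from bisect import bisect_right

# Hypothetical rows; "id" plays the role of the primary key.
rows = [
    {"id": 51,  "name": "Ann",  "salary": 4000},
    {"id": 120, "name": "Budi", "salary": 5200},
    {"id": 200, "name": "Cita", "salary": 6100},
    {"id": 275, "name": "Dewi", "salary": 4800},
]

def key_range_partition(rows, key, boundaries):
    """Horizontal: partition i holds keys below boundaries[i]; the last holds the rest."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[key])].append(row)
    return parts

def hash_partition(rows, key, partition_count):
    """Horizontal: place each row by the remainder of its integer key."""
    parts = [[] for _ in range(partition_count)]
    for row in rows:
        parts[row[key] % partition_count].append(row)
    return parts

def vertical_partition(rows, key, column_groups):
    """Vertical: one relation per column group, each repeating the primary key."""
    return [[{col: row[col] for col in [key, *cols]} for row in rows]
            for cols in column_groups]

by_range = key_range_partition(rows, "id", [100, 200])
by_hash = hash_partition(rows, "id", 2)
by_column = vertical_partition(rows, "id", [["name"], ["salary"]])

print([[r["id"] for r in part] for part in by_range])
print([[r["id"] for r in part] for part in by_hash])
print(by_column[1][0])
```

Key range partitioning keeps neighbouring keys together, while hash partitioning spreads them out; the vertical split shows why the primary key has to be repeated in each file.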
-
Partitioning (cont.)
Advantages of Partitioning:
Efficiency: Records used together are grouped together
Local optimization: Each partition can be optimized for performance
Security, recovery
Load balancing: Partitions stored on different disks, reduces contention
Take advantage of parallel processing capability
Disadvantages of Partitioning:
Inconsistent access speed: Slow retrievals across partitions
Complexity: Non-transparent partitioning
Extra space or update time: Duplicate data; access from multiple partitions
-
Data Replication
Purposely storing the same data in multiple locations of the database
Improves performance by allowing multiple users to access the same data at the same time with minimum contention
Sacrifices data integrity due to data duplication
Best for data that is not updated often
-
Designing Physical Files
Physical File: A named portion of secondary memory allocated for the purpose of storing physical records
Tablespace: named set of disk storage elements in which physical files for database tables can be stored
Extent: contiguous section of disk space
Constructs to link two pieces of data:
Sequential storage
Pointers: field of data that can be used to locate related fields or records
-
Physical file terminology in an Oracle environment
-
File Organizations
Technique for physically arranging records of a file on secondary storage
Factors for selecting file organization:
Fast data retrieval and throughput
Efficient storage space utilization
Protection from failure and data loss
Minimizing need for reorganization
Accommodating growth
Security from unauthorized use
Types of file organizations:
Sequential
Indexed
Hashed
-
Sequential file organization
Records of the file (1, 2, ..., n) are stored in sequence by the primary key field values
If not sorted: average time to find desired record = n/2
If sorted: every insert or delete requires resort
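The n/2 figure can be checked with a small simulation, assuming every record is equally likely to be the one requested and the file is scanned from the start. The file of 1000 integer records is a made-up example.

```python
def records_examined(file_records, target):
    """Scan from the start of the file; return how many records were read."""
    for position, record in enumerate(file_records, start=1):
        if record == target:
            return position
    return len(file_records)  # target absent: the whole file was read

n = 1000
file_records = list(range(n))  # hypothetical file of n records

# Average over one search for each record in the file.
average = sum(records_examined(file_records, t) for t in file_records) / n
print(average)
```

The exact average is (n + 1) / 2, which is approximately n/2 for large files.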
-
Indexed File Organizations
Index: a separate table that contains organization of records for quick retrieval
Primary keys are automatically indexed
Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types
Indexing approaches:
B-tree index
Bitmap index
Hash Index
Join Index
-
B-tree index
uses a tree searchAverage time to find desiredrecord =depth of the tree
Leaves of the treeare all at samelevel
consistent accesstime
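To give a feel for how small the depth stays, the sketch below estimates the number of levels needed for a given number of records. It assumes a simplified tree in which every node is full and has the same fanout; real B-trees keep nodes only partly full, so actual depths can be somewhat larger. The fanout of 100 is an illustrative value.

```python
def btree_depth(record_count: int, fanout: int) -> int:
    """Smallest number of levels d such that fanout**d >= record_count.

    Simplifying assumption: every node is full and has exactly `fanout` children.
    """
    if record_count < 1 or fanout < 2:
        raise ValueError("need at least one record and a fanout of 2 or more")
    depth, capacity = 1, fanout
    while capacity < record_count:
        capacity *= fanout
        depth += 1
    return depth

depth_million = btree_depth(1_000_000, 100)
print(depth_million)
```

Under these assumptions a million records are reachable in 3 levels, compared with an average of about 500,000 records read in an unsorted sequential scan.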
-
Hashed file or index organization
Hash algorithm: Usually uses division-remainder to determine record position. Records with same position are grouped in lists
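A minimal sketch of division-remainder hashing with colliding records grouped in lists. The HashedFile class, the bucket count of 7, and the sample keys are hypothetical and assume integer keys.

```python
class HashedFile:
    """Toy hashed file: position = key mod bucket_count, collisions kept in a list."""

    def __init__(self, bucket_count: int):
        self.buckets = [[] for _ in range(bucket_count)]

    def position(self, key: int) -> int:
        # Division-remainder method: the remainder is the record position.
        return key % len(self.buckets)

    def insert(self, key: int, record: str) -> None:
        self.buckets[self.position(key)].append((key, record))

    def find(self, key: int):
        # Only the one list at the computed position is searched.
        for stored_key, record in self.buckets[self.position(key)]:
            if stored_key == key:
                return record
        return None

hashed = HashedFile(bucket_count=7)
hashed.insert(10, "record ten")
hashed.insert(17, "record seventeen")   # 17 % 7 == 3, same position as key 10
hashed.insert(22, "record twenty-two")
print(hashed.position(10), hashed.position(17), hashed.find(17))
```

Keys 10 and 17 land in the same position, so a lookup for either one walks a two-element list instead of the whole file.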
-
Bitmap index organization
Bitmap saves on space requirements
Rows: possible values of the attribute
Columns: table rows
Bit indicates whether the attribute of a row has the value
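The layout described above can be sketched with one list of bits per attribute value. The colour column and the function name are invented for illustration; production systems store the bits in compressed form rather than Python lists.

```python
def build_bitmap_index(column_values):
    """Map each distinct attribute value to a bit list with one bit per table row."""
    index = {}
    for row_number, value in enumerate(column_values):
        bits = index.setdefault(value, [0] * len(column_values))
        bits[row_number] = 1
    return index

# Hypothetical attribute values for table rows 0..3.
colours = ["red", "blue", "red", "green"]
bitmap = build_bitmap_index(colours)

# Queries combine bitmaps with bitwise operations, e.g. colour = red OR blue.
red_or_blue = [a | b for a, b in zip(bitmap["red"], bitmap["blue"])]
print(bitmap["red"], red_or_blue)
```

This works best for attributes with few distinct values, since each distinct value costs one bit per table row.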
-
Join Indexes: speed up join operations
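One way to picture a join index is as a precomputed list of row-number pairs that match on the join column, so the join itself becomes a simple lookup. The stores and customers tables, their contents, and the function name below are hypothetical.

```python
# Hypothetical tables as lists of tuples: (id, city).
stores = [("S1", "Jakarta"), ("S2", "Bandung")]
customers = [("C1", "Bandung"), ("C2", "Jakarta"), ("C3", "Jakarta")]

def build_join_index(left, right, left_col, right_col):
    """Precompute (left_row_number, right_row_number) pairs with equal join values."""
    rows_by_value = {}
    for right_row, row in enumerate(right):
        rows_by_value.setdefault(row[right_col], []).append(right_row)
    return [(left_row, right_row)
            for left_row, row in enumerate(left)
            for right_row in rows_by_value.get(row[left_col], [])]

# Built once, ahead of time, on the shared city column.
join_index = build_join_index(stores, customers, 1, 1)

# Answering the join later only follows the stored row numbers.
joined = [(stores[l][0], customers[r][0]) for l, r in join_index]
print(join_index, joined)
```

The cost is that the index has to be maintained whenever either table changes, which is the usual trade-off for any index.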