Database Systems: Design, Implementation, and Management Ninth ...
Database Systems: Design, Implementation, and Management
Transcript of Database Systems: Design, Implementation, and Management
Database Systems:
Design, Implementation, and
Management
Chapter 5
Advanced Data Modeling
Objectives
• In this chapter, students will learn:
– About the extended entity relationship (EER)
model
– How entity clusters are used to represent
multiple entities and relationships
– The characteristics of good primary keys and
how to select them
– How to use flexible solutions for special data
modeling cases
Database Systems 2
The Extended Entity
Relationship Model
• Result of adding more semantic constructs to
original entity relationship (ER) model
• Diagram using this model is called an EER
diagram (EERD)
Database Systems 3
Entity Supertypes and Subtypes
• Entity supertype
– Generic entity type related to one or more entity
subtypes
– Contains common characteristics
• Entity subtype
– Contains unique characteristics of each entity
subtype
Database Systems 4
Database Systems 5
Specialization Hierarchy
• Depicts arrangement of higher-level entity
supertypes and lower-level entity subtypes
• Relationships described in terms of “IS-A”
relationships
• Subtype exists only within context of supertype
• Every subtype has only one supertype to which
it is directly related
• Can have many levels of supertype/subtype
relationships
Database Systems 6
Database Systems 7
Inheritance
• Enables entity subtype to inherit attributes and
relationships of supertype
• All entity subtypes inherit their primary key
attribute from their supertype
• At implementation level, supertype and its
subtype(s) maintain a 1:1 relationship
• Entity subtypes inherit all relationships in which
supertype entity participates
• Lower-level subtypes inherit all attributes and
relationships from all upper-level supertypes Database Systems 8
Database Systems 9
Subtype Discriminator
• Attribute in supertype entity
– Determines to which entity subtype each
supertype occurrence is related
• Default comparison condition for subtype
discriminator attribute is equality comparison
• Subtype discriminator may be based on other
comparison condition
Database Systems 10
Disjoint and Overlapping Constraints
• Disjoint subtypes
– Also called nonoverlapping subtypes
– Subtypes that contain unique subset of
supertype entity set
• Overlapping subtypes
– Subtypes that contain nonunique subsets of
supertype entity set
Database Systems 11
Database Systems 12
Database Systems 13
Completeness Constraint
• Specifies whether entity supertype occurrence
must be a member of at least one subtype
• Partial completeness
– Symbolized by a circle over a single line
– Some supertype occurrences are not members
of any subtype
• Total completeness
– Symbolized by a circle over a double line
– Every supertype occurrence must be member of
at least one subtype Database Systems 14
Database Systems 15
Specialization and Generalization
• Specialization
– Identifies more specific entity subtypes from
higher-level entity supertype
– Top-down process
– Based on grouping unique characteristics and
relationships of the subtypes
Database Systems 16
Specialization and Generalization
(cont’d.)
• Generalization
– Identifies more generic entity supertype from
lower-level entity subtypes
– Bottom-up process
– Based on grouping common characteristics and
relationships of the subtypes
Database Systems 17
Entity Integrity:
Selecting Primary Keys
• Primary key is the most important characteristic
of an entity
– Single attribute or some combination of
attributes
• Primary key’s function is to guarantee entity
integrity
• Primary keys and foreign keys work together to
implement relationships
• Properly selecting primary key has direct
bearing on efficiency and effectiveness Database Systems 18
Natural Keys and Primary Keys
• Natural key is a real-world identifier used to
uniquely identify real-world objects
– Familiar to end users and forms part of their day-
to-day business vocabulary
• Generally, data modeler uses natural identifier
as primary key of entity being modeled
• May instead use composite primary key or
surrogate key
Database Systems 19
Primary Key Guidelines
• Attribute that uniquely identifies entity instances
in an entity set
– Could also be combination of attributes
• Main function is to uniquely identify an entity
instance or row within a table
• Guarantee entity integrity, not to “describe” the
entity
• Primary keys and foreign keys implement
relationships among entities
– Behind the scenes, hidden from user Database Systems 20
Database Systems 21
When to Use Composite Primary Keys
• Composite primary keys useful in two cases:
– As identifiers of composite entities
• In which each primary key combination is allowed
once in M:N relationship
– As identifiers of weak entities
• In which weak entity has a strong identifying
relationship with the parent entity
• Automatically provides benefit of ensuring that
there cannot be duplicate values
Database Systems 22
Database Systems 23
When to Use Composite Primary Keys
(cont’d.)
• When used as identifiers of weak entities
normally used to represent:
– Real-world object that is existent-dependent on
another real-world object
– Real-world object that is represented in data
model as two separate entities in strong
identifying relationship
• Dependent entity exists only when it is related
to parent entity
Database Systems 24
When To Use Surrogate Primary Keys
• Especially helpful when there is:
– No natural key
– Selected candidate key has embedded semantic
contents
– Selected candidate key is too long or
cumbersome
Database Systems 25
When To Use Surrogate Primary Keys
(cont’d.)
• If you use surrogate key:
– Ensure that candidate key of entity in question
performs properly
– Use “unique index” and “not null” constraints
Database Systems 26
Design Cases:
Learning Flexible Database Design
• Data modeling and design requires skills
acquired through experience
• Experience acquired through practice
• Four special design cases that highlight:
– Importance of flexible design
– Proper identification of primary keys
– Placement of foreign keys
Database Systems 27
Design Case 1: Implementing 1:1
Relationships
• Foreign keys work with primary keys to properly
implement relationships in relational model
• Put primary key of the “one” side on the “many”
side as foreign key
– Primary key: parent entity
– Foreign key: dependent entity
Database Systems 28
Design Case 1: Implementing 1:1
Relationships (cont’d.)
• In 1:1 relationship, there are two options:
– Place a foreign key in both entities (not
recommended)
– Place a foreign key in one of the entities
• Primary key of one of the two entities appears as
foreign key of other
Database Systems 29
Database Systems 30
Database Systems 31
Design Case 2: Maintaining History of
Time-Variant Data • Normally, existing attribute values are replaced
with new value without regard to previous value
• Time-variant data:
– Values change over time
– Must keep a history of data changes
• Keeping history of time-variant data equivalent
to having a multivalued attribute in your entity
• Must create new entity in 1:M relationships with
original entity
• New entity contains new value, date of change
Database Systems 32
Database Systems 33
Database Systems 34
Design Case 3: Fan Traps
• Design trap occurs when relationship is
improperly or incompletely identified
– Represented in a way not consistent with the
real world
• Most common design trap is known as fan trap
• Fan trap occurs when one entity is in two 1:M
relationships to other entities
– Produces an association among other entities
not expressed in the model
Database Systems 35
Database Systems 36
Database Systems 37
Design Case 4:
Redundant Relationships
• Redundancy is seldom a good thing in
database environment
• Occurs when there are multiple relationship
paths between related entities
• Main concern is that redundant relationships
remain consistent across model
• Some designs use redundant relationships to
simplify the design
Database Systems 38
Database Systems 39
Summary
• Extended entity relationship (EER) model adds
semantics to ER model
– Adds semantics via entity supertypes, subtypes,
and clusters
– Entity supertype is a generic entity type related
to one or more entity subtypes
• Specialization hierarchy
– Depicts arrangement and relationships between
entity supertypes and entity subtypes
• Inheritance means an entity subtype inherits
attributes and relationships of supertype Database Systems 40
Summary (cont’d.)
• Subtype discriminator determines which entity
subtype the supertype occurrence is related to:
– Partial or total completeness
– Specialization vs. generalization
• Entity cluster is “virtual” entity type
– Represents multiple entities and relationships in
ERD
– Formed by combining multiple interrelated
entities and relationships into a single object
Database Systems 41
Summary (cont’d.)
• Natural keys are identifiers that exist in real
world
– Sometimes make good primary keys
• Characteristics of primary keys:
– Must have unique values
– Should be nonintelligent
– Must not change over time
– Preferably numeric or composed of single
attribute
Database Systems 42
Summary (cont’d.)
• Composite keys are useful to represent
– M:N relationships
– Weak (strong-identifying) entities
• Surrogate primary keys are useful when no
suitable natural key makes primary key
• In a 1:1 relationship, place the PK of mandatory
entity:
– As FK in optional entity
– As FK in entity that causes least number of nulls
– As FK where the role is played Database Systems 43
Summary (cont’d.)
• Time-variant data
– Data whose values change over time
– Requires keeping a history of changes
• To maintain history of time-variant data:
– Create entity containing the new value, date of
change, other time-relevant data
– Entity maintains 1:M relationship with entity for
which history maintained
Database Systems 44
Summary (cont’d.)
• Fan trap:
– One entity in two 1:M relationships to other
entities
– Association among the other entities not
expressed in model
• Redundant relationships occur when multiple
relationship paths between related entities
– Main concern is that they remain consistent
across the model
• Data modeling checklist provides way to check
that the ERD meets minimum requirements Database Systems 45
Database Systems:
Design, Implementation, and
Management
Chapter 6
Normalization of Database Tables
Database Systems 47
Objectives
• In this chapter, students will learn:
– What normalization is and what role it plays in the database design process
– About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
– How normal forms can be transformed from lower normal forms to higher normal forms
– That normalization and ER modeling are used concurrently to produce a good database design
– That some situations require denormalization to generate information efficiently
Database Tables and Normalization
• Normalization
– Process for evaluating and correcting table
structures to minimize data redundancies
• Reduces data anomalies
– Series of stages called normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
Database Systems 48
Database Tables and Normalization
(cont’d.)
• Normalization (continued)
– 2NF is better than 1NF; 3NF is better than 2NF
– For most business database design purposes,
3NF is as high as needed in normalization
– Highest level of normalization is not always most
desirable
• Denormalization produces a lower normal form
– Increased performance but greater data
redundancy
Database Systems 49
The Need for Normalization
• Example: company that manages building
projects
– Charges its clients by billing hours spent on
each contract
– Hourly billing rate is dependent on employee’s
position
– Periodically, report is generated that contains
information such as displayed in Table 6.1
Database Systems 50
Database Systems 51
The Need for Normalization (cont’d.)
• Structure of data set in Figure 6.1 does not
handle data very well
• Table structure appears to work; report is
generated with ease
• Report may yield different results depending on
what data anomaly has occurred
• Relational database environment is suited to
help designer avoid data integrity problems
Database Systems 52
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in
more than one table
• All nonprime attributes in a table are dependent
on the primary key
• Each table is void of insertion, update, and
deletion anomalies
Database Systems 53
Database Systems 54
Database Systems 55
The Normalization Process (cont’d.)
• Objective of normalization is to ensure that all
tables are in at least 3NF
• Higher forms are not likely to be encountered in
business environment
• Normalization works one relation at a time
• Progressively breaks table into new set of
relations based on identified dependencies
Database Systems 56
The Normalization Process (cont’d.)
• Partial dependency
– Exists when there is a functional dependence in
which the determinant is only part of the primary
key
• Transitive dependency
– Exists when there are functional dependencies
such that X → Y, Y → Z, and X is the primary
key
Database Systems 57
Conversion to First Normal Form
• Repeating group
– Group of multiple entries of same type can exist
for any single key attribute occurrence
• Relational table must not contain repeating
groups
• Normalizing table structure will reduce data
redundancies
• Normalization is three-step procedure
Database Systems 58
Conversion to First Normal Form
(cont’d.)
• Step 1: Eliminate the Repeating Groups
– Eliminate nulls: each repeating group attribute
contains an appropriate data value
• Step 2: Identify the Primary Key
– Must uniquely identify attribute value
– New key must be composed
• Step 3: Identify All Dependencies
– Dependencies are depicted with a diagram
Database Systems 59
Database Systems 60
Conversion to First Normal Form
(cont’d.)
• Dependency diagram:
– Depicts all dependencies found within given
table structure
– Helpful in getting bird’s-eye view of all
relationships among table’s attributes
– Makes it less likely that you will overlook an
important dependency
Database Systems 61
Database Systems 62
Conversion to First Normal Form
(cont’d.)
• First normal form describes tabular format:
– All key attributes are defined
– No repeating groups in the table
– All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
– Dependencies are based on part of the primary
key
– Should be used with caution
Database Systems 63
Conversion to Second Normal Form
• Step 1: Make New Tables to Eliminate Partial
Dependencies
– Write each key component on separate line,
then write original (composite) key on last line
– Each component will become key in new table
• Step 2: Reassign Corresponding Dependent
Attributes
– Determine attributes that are dependent on
other attributes
– At this point, most anomalies have been
eliminated Database Systems 64
Database Systems 65
Conversion to Second Normal Form
(cont’d.)
• Table is in second normal form (2NF) when:
– It is in 1NF and
– It includes no partial dependencies:
• No attribute is dependent on only portion of
primary key
Database Systems 66
Conversion to Third Normal Form
• Step 1: Make New Tables to Eliminate
Transitive Dependencies
– For every transitive dependency, write its
determinant as PK for new table
– Determinant: any attribute whose value
determines other values within a row
Database Systems 67
Conversion to Third Normal Form
(cont’d.)
• Step 2: Reassign Corresponding Dependent
Attributes
– Identify attributes dependent on each
determinant identified in Step 1
• Identify dependency
– Name table to reflect its contents and function
Database Systems 68
Database Systems 69
Conversion to Third Normal Form
(cont’d.)
• A table is in third normal form (3NF) when both
of the following are true:
– It is in 2NF
– It contains no transitive dependencies
Database Systems 70
Improving the Design
• Table structures should be cleaned up to
eliminate initial partial and transitive
dependencies
• Normalization cannot, by itself, be relied on to
make good designs
• Valuable because it helps eliminate data
redundancies
Database Systems 71
Improving the Design (cont’d.)
• Issues to address, in order, to produce a good
normalized set of tables:
– Evaluate PK Assignments
– Evaluate Naming Conventions
– Refine Attribute Atomicity
– Identify New Attributes
Database Systems 72
Improving the Design (cont’d.)
– Identify New Relationships
– Refine Primary Keys as Required for Data
Granularity
– Maintain Historical Accuracy
– Evaluate Using Derived Attributes
Database Systems 73
Database Systems 74
Database Systems 75
Surrogate Key Considerations
• When primary key is considered to be
unsuitable, designers use surrogate keys
• Data entries in Table 6.4 are inappropriate
because they duplicate existing records
– No violation of entity or referential integrity
Database Systems 76
Higher-Level Normal Forms
• Tables in 3NF perform suitably in business
transactional databases
• Higher-order normal forms are useful on
occasion
• Two special cases of 3NF:
– Boyce-Codd normal form (BCNF)
– Fourth normal form (4NF)
Database Systems 77
The Boyce-Codd Normal Form
• Every determinant in table is a candidate key
– Has same characteristics as primary key, but for
some reason, not chosen to be primary key
• When table contains only one candidate key,
the 3NF and the BCNF are equivalent
• BCNF can be violated only when table contains
more than one candidate key
Database Systems 78
The Boyce-Codd Normal Form
(cont’d.)
• Most designers consider the BCNF as a special
case of 3NF
• Table is in 3NF when it is in 2NF and there are
no transitive dependencies
• Table can be in 3NF and fail to meet BCNF
– No partial dependencies, nor does it contain
transitive dependencies
– A nonkey attribute is the determinant of a key
attribute
Database Systems 79
Fourth Normal Form (4NF)
• Table is in fourth normal form (4NF) when both
of the following are true:
– It is in 3NF
– No multiple sets of multivalued dependencies
• 4NF is largely academic if tables conform to
following two rules:
– All attributes dependent on primary key,
independent of each other
– No row contains two or more multivalued facts
about an entity Database Systems 80
Normalization and Database Design
• Normalization should be part of the design
process
• Make sure that proposed entities meet required
normal form before table structures are created
• Many real-world databases have been
improperly designed or burdened with
anomalies
• You may be asked to redesign and modify
existing databases
Database Systems 81
Normalization and Database Design
(cont’d.)
• ER diagram
– Identify relevant entities, their attributes, and
their relationships
– Identify additional entities and attributes
• Normalization procedures
– Focus on characteristics of specific entities
– Micro view of entities within ER diagram
• Difficult to separate normalization process from
ER modeling process
Database Systems 82
Database Systems 83
Database Systems 84
Database Systems 85
Database Systems 86
Database Systems 87
Denormalization
• Creation of normalized relations is important
database design goal
• Processing requirements should also be a goal
• If tables are decomposed to conform to
normalization requirements:
– Number of database tables expands
Database Systems 88
Denormalization (cont’d.)
• Joining the larger number of tables reduces
system speed
• Conflicts are often resolved through
compromises that may include denormalization
• Defects of unnormalized tables:
– Data updates are less efficient because tables
are larger
– Indexing is more cumbersome
– No simple strategies for creating virtual tables
known as views
Database Systems 89
Summary
• Normalization minimizes data redundancies
• First three normal forms (1NF, 2NF, and 3NF)
are most commonly encountered
• Table is in 1NF when:
– All key attributes are defined
– All remaining attributes are dependent on
primary key
Database Systems 90
Summary (cont’d.)
• Table is in 2NF when it is in 1NF and contains
no partial dependencies
• Table is in 3NF when it is in 2NF and contains
no transitive dependencies
• Table that is not in 3NF may be split into new
tables until all of the tables meet 3NF
requirements
• Normalization is important part—but only part—
of the design process
Database Systems 91
Summary (cont’d.)
• Table in 3NF may contain multivalued
dependencies
– Numerous null values or redundant data
• Convert 3NF table to 4NF by:
– Splitting table to remove multivalued
dependencies
• Tables are sometimes denormalized to yield
less I/O, which increases processing speed
Database Systems 92