Database Systems: Design, Implementation, and Management

92
Database Systems: Design, Implementation, and Management Chapter 5 Advanced Data Modeling

Transcript of Database Systems: Design, Implementation, and Management

Page 1: Database Systems: Design, Implementation, and Management

Database Systems:

Design, Implementation, and

Management

Chapter 5

Advanced Data Modeling

Page 2: Database Systems: Design, Implementation, and Management

Objectives

• In this chapter, students will learn:

– About the extended entity relationship (EER)

model

– How entity clusters are used to represent

multiple entities and relationships

– The characteristics of good primary keys and

how to select them

– How to use flexible solutions for special data

modeling cases

Database Systems 2

Page 3: Database Systems: Design, Implementation, and Management

The Extended Entity

Relationship Model

• Result of adding more semantic constructs to

original entity relationship (ER) model

• Diagram using this model is called an EER

diagram (EERD)

Database Systems 3

Page 4: Database Systems: Design, Implementation, and Management

Entity Supertypes and Subtypes

• Entity supertype

– Generic entity type related to one or more entity

subtypes

– Contains common characteristics

• Entity subtype

– Contains unique characteristics of each entity

subtype

Database Systems 4

Page 5: Database Systems: Design, Implementation, and Management

Database Systems 5

Page 6: Database Systems: Design, Implementation, and Management

Specialization Hierarchy

• Depicts arrangement of higher-level entity

supertypes and lower-level entity subtypes

• Relationships described in terms of “IS-A”

relationships

• Subtype exists only within context of supertype

• Every subtype has only one supertype to which

it is directly related

• Can have many levels of supertype/subtype

relationships

Database Systems 6

Page 7: Database Systems: Design, Implementation, and Management

Database Systems 7

Page 8: Database Systems: Design, Implementation, and Management

Inheritance

• Enables entity subtype to inherit attributes and

relationships of supertype

• All entity subtypes inherit their primary key

attribute from their supertype

• At implementation level, supertype and its

subtype(s) maintain a 1:1 relationship

• Entity subtypes inherit all relationships in which

supertype entity participates

• Lower-level subtypes inherit all attributes and

relationships from all upper-level supertypes Database Systems 8

Page 9: Database Systems: Design, Implementation, and Management

Database Systems 9

Page 10: Database Systems: Design, Implementation, and Management

Subtype Discriminator

• Attribute in supertype entity

– Determines to which entity subtype each

supertype occurrence is related

• Default comparison condition for subtype

discriminator attribute is equality comparison

• Subtype discriminator may be based on other

comparison condition

Database Systems 10

Page 11: Database Systems: Design, Implementation, and Management

Disjoint and Overlapping Constraints

• Disjoint subtypes

– Also called nonoverlapping subtypes

– Subtypes that contain unique subset of

supertype entity set

• Overlapping subtypes

– Subtypes that contain nonunique subsets of

supertype entity set

Database Systems 11

Page 12: Database Systems: Design, Implementation, and Management

Database Systems 12

Page 13: Database Systems: Design, Implementation, and Management

Database Systems 13

Page 14: Database Systems: Design, Implementation, and Management

Completeness Constraint

• Specifies whether entity supertype occurrence

must be a member of at least one subtype

• Partial completeness

– Symbolized by a circle over a single line

– Some supertype occurrences are not members

of any subtype

• Total completeness

– Symbolized by a circle over a double line

– Every supertype occurrence must be member of

at least one subtype Database Systems 14

Page 15: Database Systems: Design, Implementation, and Management

Database Systems 15

Page 16: Database Systems: Design, Implementation, and Management

Specialization and Generalization

• Specialization

– Identifies more specific entity subtypes from

higher-level entity supertype

– Top-down process

– Based on grouping unique characteristics and

relationships of the subtypes

Database Systems 16

Page 17: Database Systems: Design, Implementation, and Management

Specialization and Generalization

(cont’d.)

• Generalization

– Identifies more generic entity supertype from

lower-level entity subtypes

– Bottom-up process

– Based on grouping common characteristics and

relationships of the subtypes

Database Systems 17

Page 18: Database Systems: Design, Implementation, and Management

Entity Integrity:

Selecting Primary Keys

• Primary key is the most important characteristic

of an entity

– Single attribute or some combination of

attributes

• Primary key’s function is to guarantee entity

integrity

• Primary keys and foreign keys work together to

implement relationships

• Properly selecting primary key has direct

bearing on efficiency and effectiveness Database Systems 18

Page 19: Database Systems: Design, Implementation, and Management

Natural Keys and Primary Keys

• Natural key is a real-world identifier used to

uniquely identify real-world objects

– Familiar to end users and forms part of their day-

to-day business vocabulary

• Generally, data modeler uses natural identifier

as primary key of entity being modeled

• May instead use composite primary key or

surrogate key

Database Systems 19

Page 20: Database Systems: Design, Implementation, and Management

Primary Key Guidelines

• Attribute that uniquely identifies entity instances

in an entity set

– Could also be combination of attributes

• Main function is to uniquely identify an entity

instance or row within a table

• Guarantee entity integrity, not to “describe” the

entity

• Primary keys and foreign keys implement

relationships among entities

– Behind the scenes, hidden from user Database Systems 20

Page 21: Database Systems: Design, Implementation, and Management

Database Systems 21

Page 22: Database Systems: Design, Implementation, and Management

When to Use Composite Primary Keys

• Composite primary keys useful in two cases:

– As identifiers of composite entities

• In which each primary key combination is allowed

once in M:N relationship

– As identifiers of weak entities

• In which weak entity has a strong identifying

relationship with the parent entity

• Automatically provides benefit of ensuring that

there cannot be duplicate values

Database Systems 22

Page 23: Database Systems: Design, Implementation, and Management

Database Systems 23

Page 24: Database Systems: Design, Implementation, and Management

When to Use Composite Primary Keys

(cont’d.)

• When used as identifiers of weak entities

normally used to represent:

– Real-world object that is existent-dependent on

another real-world object

– Real-world object that is represented in data

model as two separate entities in strong

identifying relationship

• Dependent entity exists only when it is related

to parent entity

Database Systems 24

Page 25: Database Systems: Design, Implementation, and Management

When To Use Surrogate Primary Keys

• Especially helpful when there is:

– No natural key

– Selected candidate key has embedded semantic

contents

– Selected candidate key is too long or

cumbersome

Database Systems 25

Page 26: Database Systems: Design, Implementation, and Management

When To Use Surrogate Primary Keys

(cont’d.)

• If you use surrogate key:

– Ensure that candidate key of entity in question

performs properly

– Use “unique index” and “not null” constraints

Database Systems 26

Page 27: Database Systems: Design, Implementation, and Management

Design Cases:

Learning Flexible Database Design

• Data modeling and design requires skills

acquired through experience

• Experience acquired through practice

• Four special design cases that highlight:

– Importance of flexible design

– Proper identification of primary keys

– Placement of foreign keys

Database Systems 27

Page 28: Database Systems: Design, Implementation, and Management

Design Case 1: Implementing 1:1

Relationships

• Foreign keys work with primary keys to properly

implement relationships in relational model

• Put primary key of the “one” side on the “many”

side as foreign key

– Primary key: parent entity

– Foreign key: dependent entity

Database Systems 28

Page 29: Database Systems: Design, Implementation, and Management

Design Case 1: Implementing 1:1

Relationships (cont’d.)

• In 1:1 relationship, there are two options:

– Place a foreign key in both entities (not

recommended)

– Place a foreign key in one of the entities

• Primary key of one of the two entities appears as

foreign key of other

Database Systems 29

Page 30: Database Systems: Design, Implementation, and Management

Database Systems 30

Page 31: Database Systems: Design, Implementation, and Management

Database Systems 31

Page 32: Database Systems: Design, Implementation, and Management

Design Case 2: Maintaining History of

Time-Variant Data • Normally, existing attribute values are replaced

with new value without regard to previous value

• Time-variant data:

– Values change over time

– Must keep a history of data changes

• Keeping history of time-variant data equivalent

to having a multivalued attribute in your entity

• Must create new entity in 1:M relationships with

original entity

• New entity contains new value, date of change

Database Systems 32

Page 33: Database Systems: Design, Implementation, and Management

Database Systems 33

Page 34: Database Systems: Design, Implementation, and Management

Database Systems 34

Page 35: Database Systems: Design, Implementation, and Management

Design Case 3: Fan Traps

• Design trap occurs when relationship is

improperly or incompletely identified

– Represented in a way not consistent with the

real world

• Most common design trap is known as fan trap

• Fan trap occurs when one entity is in two 1:M

relationships to other entities

– Produces an association among other entities

not expressed in the model

Database Systems 35

Page 36: Database Systems: Design, Implementation, and Management

Database Systems 36

Page 37: Database Systems: Design, Implementation, and Management

Database Systems 37

Page 38: Database Systems: Design, Implementation, and Management

Design Case 4:

Redundant Relationships

• Redundancy is seldom a good thing in

database environment

• Occurs when there are multiple relationship

paths between related entities

• Main concern is that redundant relationships

remain consistent across model

• Some designs use redundant relationships to

simplify the design

Database Systems 38

Page 39: Database Systems: Design, Implementation, and Management

Database Systems 39

Page 40: Database Systems: Design, Implementation, and Management

Summary

• Extended entity relationship (EER) model adds

semantics to ER model

– Adds semantics via entity supertypes, subtypes,

and clusters

– Entity supertype is a generic entity type related

to one or more entity subtypes

• Specialization hierarchy

– Depicts arrangement and relationships between

entity supertypes and entity subtypes

• Inheritance means an entity subtype inherits

attributes and relationships of supertype Database Systems 40

Page 41: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Subtype discriminator determines which entity

subtype the supertype occurrence is related to:

– Partial or total completeness

– Specialization vs. generalization

• Entity cluster is “virtual” entity type

– Represents multiple entities and relationships in

ERD

– Formed by combining multiple interrelated

entities and relationships into a single object

Database Systems 41

Page 42: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Natural keys are identifiers that exist in real

world

– Sometimes make good primary keys

• Characteristics of primary keys:

– Must have unique values

– Should be nonintelligent

– Must not change over time

– Preferably numeric or composed of single

attribute

Database Systems 42

Page 43: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Composite keys are useful to represent

– M:N relationships

– Weak (strong-identifying) entities

• Surrogate primary keys are useful when no

suitable natural key makes primary key

• In a 1:1 relationship, place the PK of mandatory

entity:

– As FK in optional entity

– As FK in entity that causes least number of nulls

– As FK where the role is played Database Systems 43

Page 44: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Time-variant data

– Data whose values change over time

– Requires keeping a history of changes

• To maintain history of time-variant data:

– Create entity containing the new value, date of

change, other time-relevant data

– Entity maintains 1:M relationship with entity for

which history maintained

Database Systems 44

Page 45: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Fan trap:

– One entity in two 1:M relationships to other

entities

– Association among the other entities not

expressed in model

• Redundant relationships occur when multiple

relationship paths between related entities

– Main concern is that they remain consistent

across the model

• Data modeling checklist provides way to check

that the ERD meets minimum requirements Database Systems 45

Page 46: Database Systems: Design, Implementation, and Management

Database Systems:

Design, Implementation, and

Management

Chapter 6

Normalization of Database Tables

Page 47: Database Systems: Design, Implementation, and Management

Database Systems 47

Objectives

• In this chapter, students will learn:

– What normalization is and what role it plays in the database design process

– About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF

– How normal forms can be transformed from lower normal forms to higher normal forms

– That normalization and ER modeling are used concurrently to produce a good database design

– That some situations require denormalization to generate information efficiently

Page 48: Database Systems: Design, Implementation, and Management

Database Tables and Normalization

• Normalization

– Process for evaluating and correcting table

structures to minimize data redundancies

• Reduces data anomalies

– Series of stages called normal forms:

• First normal form (1NF)

• Second normal form (2NF)

• Third normal form (3NF)

Database Systems 48

Page 49: Database Systems: Design, Implementation, and Management

Database Tables and Normalization

(cont’d.)

• Normalization (continued)

– 2NF is better than 1NF; 3NF is better than 2NF

– For most business database design purposes,

3NF is as high as needed in normalization

– Highest level of normalization is not always most

desirable

• Denormalization produces a lower normal form

– Increased performance but greater data

redundancy

Database Systems 49

Page 50: Database Systems: Design, Implementation, and Management

The Need for Normalization

• Example: company that manages building

projects

– Charges its clients by billing hours spent on

each contract

– Hourly billing rate is dependent on employee’s

position

– Periodically, report is generated that contains

information such as displayed in Table 6.1

Database Systems 50

Page 51: Database Systems: Design, Implementation, and Management

Database Systems 51

Page 52: Database Systems: Design, Implementation, and Management

The Need for Normalization (cont’d.)

• Structure of data set in Figure 6.1 does not

handle data very well

• Table structure appears to work; report is

generated with ease

• Report may yield different results depending on

what data anomaly has occurred

• Relational database environment is suited to

help designer avoid data integrity problems

Database Systems 52

Page 53: Database Systems: Design, Implementation, and Management

The Normalization Process

• Each table represents a single subject

• No data item will be unnecessarily stored in

more than one table

• All nonprime attributes in a table are dependent

on the primary key

• Each table is void of insertion, update, and

deletion anomalies

Database Systems 53

Page 54: Database Systems: Design, Implementation, and Management

Database Systems 54

Page 55: Database Systems: Design, Implementation, and Management

Database Systems 55

The Normalization Process (cont’d.)

• Objective of normalization is to ensure that all

tables are in at least 3NF

• Higher forms are not likely to be encountered in

business environment

• Normalization works one relation at a time

• Progressively breaks table into new set of

relations based on identified dependencies

Page 56: Database Systems: Design, Implementation, and Management

Database Systems 56

Page 57: Database Systems: Design, Implementation, and Management

The Normalization Process (cont’d.)

• Partial dependency

– Exists when there is a functional dependence in

which the determinant is only part of the primary

key

• Transitive dependency

– Exists when there are functional dependencies

such that X → Y, Y → Z, and X is the primary

key

Database Systems 57

Page 58: Database Systems: Design, Implementation, and Management

Conversion to First Normal Form

• Repeating group

– Group of multiple entries of same type can exist

for any single key attribute occurrence

• Relational table must not contain repeating

groups

• Normalizing table structure will reduce data

redundancies

• Normalization is three-step procedure

Database Systems 58

Page 59: Database Systems: Design, Implementation, and Management

Conversion to First Normal Form

(cont’d.)

• Step 1: Eliminate the Repeating Groups

– Eliminate nulls: each repeating group attribute

contains an appropriate data value

• Step 2: Identify the Primary Key

– Must uniquely identify attribute value

– New key must be composed

• Step 3: Identify All Dependencies

– Dependencies are depicted with a diagram

Database Systems 59

Page 60: Database Systems: Design, Implementation, and Management

Database Systems 60

Page 61: Database Systems: Design, Implementation, and Management

Conversion to First Normal Form

(cont’d.)

• Dependency diagram:

– Depicts all dependencies found within given

table structure

– Helpful in getting bird’s-eye view of all

relationships among table’s attributes

– Makes it less likely that you will overlook an

important dependency

Database Systems 61

Page 62: Database Systems: Design, Implementation, and Management

Database Systems 62

Page 63: Database Systems: Design, Implementation, and Management

Conversion to First Normal Form

(cont’d.)

• First normal form describes tabular format:

– All key attributes are defined

– No repeating groups in the table

– All attributes are dependent on primary key

• All relational tables satisfy 1NF requirements

• Some tables contain partial dependencies

– Dependencies are based on part of the primary

key

– Should be used with caution

Database Systems 63

Page 64: Database Systems: Design, Implementation, and Management

Conversion to Second Normal Form

• Step 1: Make New Tables to Eliminate Partial

Dependencies

– Write each key component on separate line,

then write original (composite) key on last line

– Each component will become key in new table

• Step 2: Reassign Corresponding Dependent

Attributes

– Determine attributes that are dependent on

other attributes

– At this point, most anomalies have been

eliminated Database Systems 64

Page 65: Database Systems: Design, Implementation, and Management

Database Systems 65

Page 66: Database Systems: Design, Implementation, and Management

Conversion to Second Normal Form

(cont’d.)

• Table is in second normal form (2NF) when:

– It is in 1NF and

– It includes no partial dependencies:

• No attribute is dependent on only portion of

primary key

Database Systems 66

Page 67: Database Systems: Design, Implementation, and Management

Conversion to Third Normal Form

• Step 1: Make New Tables to Eliminate

Transitive Dependencies

– For every transitive dependency, write its

determinant as PK for new table

– Determinant: any attribute whose value

determines other values within a row

Database Systems 67

Page 68: Database Systems: Design, Implementation, and Management

Conversion to Third Normal Form

(cont’d.)

• Step 2: Reassign Corresponding Dependent

Attributes

– Identify attributes dependent on each

determinant identified in Step 1

• Identify dependency

– Name table to reflect its contents and function

Database Systems 68

Page 69: Database Systems: Design, Implementation, and Management

Database Systems 69

Page 70: Database Systems: Design, Implementation, and Management

Conversion to Third Normal Form

(cont’d.)

• A table is in third normal form (3NF) when both

of the following are true:

– It is in 2NF

– It contains no transitive dependencies

Database Systems 70

Page 71: Database Systems: Design, Implementation, and Management

Improving the Design

• Table structures should be cleaned up to

eliminate initial partial and transitive

dependencies

• Normalization cannot, by itself, be relied on to

make good designs

• Valuable because it helps eliminate data

redundancies

Database Systems 71

Page 72: Database Systems: Design, Implementation, and Management

Improving the Design (cont’d.)

• Issues to address, in order, to produce a good

normalized set of tables:

– Evaluate PK Assignments

– Evaluate Naming Conventions

– Refine Attribute Atomicity

– Identify New Attributes

Database Systems 72

Page 73: Database Systems: Design, Implementation, and Management

Improving the Design (cont’d.)

– Identify New Relationships

– Refine Primary Keys as Required for Data

Granularity

– Maintain Historical Accuracy

– Evaluate Using Derived Attributes

Database Systems 73

Page 74: Database Systems: Design, Implementation, and Management

Database Systems 74

Page 75: Database Systems: Design, Implementation, and Management

Database Systems 75

Page 76: Database Systems: Design, Implementation, and Management

Surrogate Key Considerations

• When primary key is considered to be

unsuitable, designers use surrogate keys

• Data entries in Table 6.4 are inappropriate

because they duplicate existing records

– No violation of entity or referential integrity

Database Systems 76

Page 77: Database Systems: Design, Implementation, and Management

Higher-Level Normal Forms

• Tables in 3NF perform suitably in business

transactional databases

• Higher-order normal forms are useful on

occasion

• Two special cases of 3NF:

– Boyce-Codd normal form (BCNF)

– Fourth normal form (4NF)

Database Systems 77

Page 78: Database Systems: Design, Implementation, and Management

The Boyce-Codd Normal Form

• Every determinant in table is a candidate key

– Has same characteristics as primary key, but for

some reason, not chosen to be primary key

• When table contains only one candidate key,

the 3NF and the BCNF are equivalent

• BCNF can be violated only when table contains

more than one candidate key

Database Systems 78

Page 79: Database Systems: Design, Implementation, and Management

The Boyce-Codd Normal Form

(cont’d.)

• Most designers consider the BCNF as a special

case of 3NF

• Table is in 3NF when it is in 2NF and there are

no transitive dependencies

• Table can be in 3NF and fail to meet BCNF

– No partial dependencies, nor does it contain

transitive dependencies

– A nonkey attribute is the determinant of a key

attribute

Database Systems 79

Page 80: Database Systems: Design, Implementation, and Management

Fourth Normal Form (4NF)

• Table is in fourth normal form (4NF) when both

of the following are true:

– It is in 3NF

– No multiple sets of multivalued dependencies

• 4NF is largely academic if tables conform to

following two rules:

– All attributes dependent on primary key,

independent of each other

– No row contains two or more multivalued facts

about an entity Database Systems 80

Page 81: Database Systems: Design, Implementation, and Management

Normalization and Database Design

• Normalization should be part of the design

process

• Make sure that proposed entities meet required

normal form before table structures are created

• Many real-world databases have been

improperly designed or burdened with

anomalies

• You may be asked to redesign and modify

existing databases

Database Systems 81

Page 82: Database Systems: Design, Implementation, and Management

Normalization and Database Design

(cont’d.)

• ER diagram

– Identify relevant entities, their attributes, and

their relationships

– Identify additional entities and attributes

• Normalization procedures

– Focus on characteristics of specific entities

– Micro view of entities within ER diagram

• Difficult to separate normalization process from

ER modeling process

Database Systems 82

Page 83: Database Systems: Design, Implementation, and Management

Database Systems 83

Page 84: Database Systems: Design, Implementation, and Management

Database Systems 84

Page 85: Database Systems: Design, Implementation, and Management

Database Systems 85

Page 86: Database Systems: Design, Implementation, and Management

Database Systems 86

Page 87: Database Systems: Design, Implementation, and Management

Database Systems 87

Page 88: Database Systems: Design, Implementation, and Management

Denormalization

• Creation of normalized relations is important

database design goal

• Processing requirements should also be a goal

• If tables are decomposed to conform to

normalization requirements:

– Number of database tables expands

Database Systems 88

Page 89: Database Systems: Design, Implementation, and Management

Denormalization (cont’d.)

• Joining the larger number of tables reduces

system speed

• Conflicts are often resolved through

compromises that may include denormalization

• Defects of unnormalized tables:

– Data updates are less efficient because tables

are larger

– Indexing is more cumbersome

– No simple strategies for creating virtual tables

known as views

Database Systems 89

Page 90: Database Systems: Design, Implementation, and Management

Summary

• Normalization minimizes data redundancies

• First three normal forms (1NF, 2NF, and 3NF)

are most commonly encountered

• Table is in 1NF when:

– All key attributes are defined

– All remaining attributes are dependent on

primary key

Database Systems 90

Page 91: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Table is in 2NF when it is in 1NF and contains

no partial dependencies

• Table is in 3NF when it is in 2NF and contains

no transitive dependencies

• Table that is not in 3NF may be split into new

tables until all of the tables meet 3NF

requirements

• Normalization is important part—but only part—

of the design process

Database Systems 91

Page 92: Database Systems: Design, Implementation, and Management

Summary (cont’d.)

• Table in 3NF may contain multivalued

dependencies

– Numerous null values or redundant data

• Convert 3NF table to 4NF by:

– Splitting table to remove multivalued

dependencies

• Tables are sometimes denormalized to yield

less I/O, which increases processing speed

Database Systems 92