4. Databases Design


Transcript of 4. Databases Design

Page 1: 4. Databases Design

Connolly, Thomas and Begg, Carolyn. 2010. Database Systems: A Practical Approach to Design, Implementation, and Management. 5th Ed. Pearson Education.

4. Databases Design

ITM-661 Database Systems

Page 2: 4. Databases Design

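“...The mistakes and failures of individuals or of undertakings mostly arise from one major cause: deceiving oneself and deceiving one another. When work is done without relying on the truth as its basis, both carrying it out and correcting it go wrong, and neither the work nor the person can achieve good results.

Those who work for success and progress must therefore accept the truth and hold firmly to it, and remain steadfastly sincere with themselves and with one another at all times. Only then can each person act and work with ease and confidence, correctly and accurately, in line with the goal, and in a way that fits their position, duty, and opportunity, so that the creation of goodness and progress achieves the desired result...”

Excerpt from the royal address at the graduation ceremony

Chulalongkorn University

15 July B.E. 2526 (1983)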

2

Page 3: 4. Databases Design

3

Design Methodology

A structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design.

Page 4: 4. Databases Design

4

Database Design Methodology

Three main phases:
  Conceptual database design
  Logical database design
  Physical database design

Page 5: 4. Databases Design

5

Conceptual Database Design

The process of constructing a model of the data used in an enterprise, independent of all physical considerations.

Page 6: 4. Databases Design

6

Logical Database Design

The process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.

Page 7: 4. Databases Design

7

Physical Database Design

The process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and indexes design used to achieve efficient access to the data, and any associated integrity constraints and security measures.

Page 8: 4. Databases Design

8

Critical Success Factors in Database Design

Work interactively with the users as much as possible.
Follow a structured methodology throughout the data modeling process.
Employ a data-driven approach.
Incorporate structural and integrity considerations into the data models.
Combine conceptualization, normalization, and transaction validation techniques into the data modeling methodology.

Page 9: 4. Databases Design

9

Critical Success Factors in Database Design

Use diagrams to represent as much of the data models as possible.

Use a Database Design Language (DBDL) to represent additional data semantics.

Build a data dictionary to supplement the data model diagrams.

Be willing to repeat steps.

Page 10: 4. Databases Design

10

Overview Database Design Methodology

Conceptual database design
Step 1 Build conceptual data model
  Step 1.1 Identify entity types
  Step 1.2 Identify relationship types
  Step 1.3 Identify and associate attributes with entity or relationship types
  Step 1.4 Determine attribute domains
  Step 1.5 Determine candidate, primary, and alternate key attributes

Page 11: 4. Databases Design

11

Overview Database Design Methodology

  Step 1.6 Consider use of enhanced modeling concepts (optional step)
  Step 1.7 Check model for redundancy
  Step 1.8 Validate conceptual model against user transactions
  Step 1.9 Review conceptual data model with user

Page 12: 4. Databases Design

12

Overview Database Design Methodology

Logical database design for the relational model
Step 2 Build and validate logical data model
  Step 2.1 Derive relations for logical data model
  Step 2.2 Validate relations using normalization
  Step 2.3 Validate relations against user transactions
  Step 2.4 Define integrity constraints
  Step 2.5 Review logical data model with user
  Step 2.6 Merge logical data models into global model (optional step)
  Step 2.7 Check for future growth

Page 13: 4. Databases Design

13

Overview Database Design Methodology

Physical database design for relational database
Step 3 Translate logical data model for target DBMS
  Step 3.1 Design base relations
  Step 3.2 Design representation of derived data
  Step 3.3 Design general constraints

Page 14: 4. Databases Design

14

Overview Database Design Methodology

Step 4 Design file organizations and indexes
  Step 4.1 Analyze transactions
  Step 4.2 Choose file organization
  Step 4.3 Choose indexes
  Step 4.4 Estimate disk space requirements
Step 5 Design user views
Step 6 Design security mechanisms
Step 7 Consider the introduction of controlled redundancy
Step 8 Monitor and tune the operational system

Page 15: 4. Databases Design

Conceptual database design 15

Page 16: 4. Databases Design

16

Step 1 Build Conceptual Data Model

To build a conceptual data model of the data requirements of the enterprise.
The model comprises entity types, relationship types, attributes and attribute domains, primary and alternate keys, and integrity constraints.

Page 17: 4. Databases Design

17

1.1 Identify entity types
To identify the required entity types.

1.2 Identify relationship types
To identify the important relationships that exist between the entity types.

Page 18: 4. Databases Design

Extract from data dictionary for Staff user views of DreamHome showing description of entities

Page 19: 4. Databases Design

19

First-cut ERD for Staff user views of DreamHome

Page 20: 4. Databases Design

20

Extract from data dictionary for Staff user views of DreamHome showing description of relationships

Page 21: 4. Databases Design

21

1.3 Identify and associate attributes with entity or relationship types
To associate attributes with the appropriate entity or relationship types and document the details of each attribute.

Page 22: 4. Databases Design

22

1.4 Determine attribute domains
To determine domains for the attributes in the data model and document the details of each domain.

1.5 Determine candidate, primary, and alternate key attributes
To identify the candidate key(s) for each entity and, if there is more than one candidate key, to choose one to be the primary key and the others as alternate keys.

Page 23: 4. Databases Design

23

Extract from data dictionary for Staff user views of DreamHome showing description of attributes

Page 24: 4. Databases Design

24

ER diagram for Staff user views of DreamHome with primary keys added

Page 25: 4. Databases Design

25

1.6 Consider use of enhanced modeling concepts (optional step)
To consider the use of enhanced modeling concepts, such as specialization / generalization, aggregation, and composition.

Page 26: 4. Databases Design

26

Revised ER diagram for Staff user views of DreamHome with specialization / generalization

Page 27: 4. Databases Design

27

1.7 Check model for redundancy
To check for the presence of any redundancy in the model and to remove any that does exist.

Page 28: 4. Databases Design

28

Removing a redundant relationship called Rents

Page 29: 4. Databases Design

29

Example of a non-redundant relationship, FatherOf

Page 30: 4. Databases Design

30

1.8 Validate conceptual model against user transactions
To ensure that the conceptual model supports the required transactions.

Page 31: 4. Databases Design

31

Example transaction requirements

Data entry
  Enter the details of a new property and the owner
  Enter the details of a new client

Data update / deletion
  Update/delete the details of a property
  Update/delete the details of a property owner

Page 32: 4. Databases Design

32

Data queries
  List the details of staff supervised by a named supervisor at the branch
  List the details of all Assistants, alphabetically by name, at the branch
  List the details of property available for rent at the branch, along with the owner’s details
  List the details of properties managed by a named member of staff at the branch
  List the clients registering at the branch and the names of the staff who registered the clients

Page 33: 4. Databases Design

33

Using pathways to check that the conceptual model supports the user transactions

(a) List the details of staff supervised by a named Supervisor at the branch

(b) List the details of all Assistants, alphabetically by name, at the branch

(c) List the details of property available for rent at the branch, along with the owner’s details

(d) List the details of properties managed by a named member of staff at the branch

(e) List the clients registering at the branch and the names of the staff who registered the clients

Page 34: 4. Databases Design

34

1.9 Review conceptual data model with user
To review the conceptual data model with the user to ensure that the model is a ‘true’ representation of the data requirements of the enterprise.

Page 35: 4. Databases Design

35

Conceptual data model for Staff user view showing all attributes

Page 36: 4. Databases Design

Logical Database Design 36

Page 37: 4. Databases Design

37

"...Those who have wisdom and good knowledge, because they have had the opportunity to study more than others, naturally have a special duty and responsibility to conduct themselves and to work in ways that benefit the country and the people. To achieve such benefit, each person must first understand deeply what true benefit is. True benefit is of two kinds: personal benefit, which everyone has the right to seek and receive, but only by honest and fair means; and the common benefit, which is the benefit of the nation in which each person has a share.

All work must yield true benefit both personally and collectively. Only then is the benefit complete, stable, and lasting, and truly good for the nation..."

Royal address at the graduation ceremony

Mahidol University, 5 July B.E. 2539 (1996)

Page 38: 4. Databases Design

38

Objectives

How to derive a set of relations from a conceptual data model.

How to validate these relations using the technique of normalization.

How to validate a logical data model to ensure it supports the required transactions.

Page 39: 4. Databases Design

39

Objectives

How to merge local logical data models based on one or more user views into a global logical data model that represents all user views.

How to ensure that the final logical data model is a true and accurate representation of the data requirements of the enterprise.

Page 40: 4. Databases Design

40

Step 2 Build and Validate Logical Data Model

To translate the conceptual data model into a logical data model and then to validate this model to check that it is structurally correct using normalization and supports the required transactions.

Page 41: 4. Databases Design

41

Step 2 Build and Validate Logical Data Model

Step 2.1 Derive relations for logical data model
To create relations for the logical data model to represent the entities, relationships, and attributes that have been identified.

Page 42: 4. Databases Design

42

Conceptual data model for Staff view showing all attributes

Page 43: 4. Databases Design

43

Step 2.1 Derive relations for logical data model

(1) Strong entity types
For each strong entity in the data model, create a relation that includes all the simple attributes of that entity.
For composite attributes, include only the constituent simple attributes.

Client (clientno, fname, lname, telno)

Page 44: 4. Databases Design

44

Step 2.1 Derive relations for logical data model

(2) Weak entity types
For each weak entity in the data model, create a relation that includes all the simple attributes of that entity.
The primary key of a weak entity is partially or fully derived from each owner entity, so the primary key of a weak entity cannot be identified until after all the relationships with the owner entities have been mapped.

Page 45: 4. Databases Design

45

Example

Patient
  id {pk}
  name
  surname
  sex

SymptomHistory
  date
  BT
  BP
  symptoms

Patient has SymptomHistory (multiplicities 1..1 and 1..*)

SymptomHistory (id, date, BT, BP, symptoms)
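The weak-entity mapping in this example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the English table and column names are translations of the slide's labels, the data types and sample rows are invented, and BT and BP are assumed to be body temperature and blood pressure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE Patient (
    id      TEXT PRIMARY KEY NOT NULL,
    name    TEXT,
    surname TEXT,
    sex     TEXT
);
-- Weak entity: its key borrows the owner's key (id) and adds its own partial key (date).
CREATE TABLE SymptomHistory (
    id       TEXT NOT NULL REFERENCES Patient(id),
    date     TEXT NOT NULL,
    bt       REAL,   -- assumed: body temperature
    bp       TEXT,   -- assumed: blood pressure
    symptoms TEXT,
    PRIMARY KEY (id, date)
);
""")

# Invented sample rows.
conn.execute("INSERT INTO Patient VALUES ('P1', 'Ann', 'Lee', 'F')")
conn.execute("INSERT INTO SymptomHistory VALUES ('P1', '2024-01-05', 37.2, '120/80', 'cough')")
conn.execute("INSERT INTO SymptomHistory VALUES ('P1', '2024-02-01', 38.0, '118/79', 'fever')")


def insert_rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Same patient and date twice violates the composite primary key.
duplicate_rejected = insert_rejected(
    "INSERT INTO SymptomHistory VALUES ('P1', '2024-01-05', 36.9, '121/80', 'none')")
# A history row for a patient that does not exist violates the foreign key.
orphan_rejected = insert_rejected(
    "INSERT INTO SymptomHistory VALUES ('P9', '2024-01-05', 36.9, '121/80', 'none')")
```

The composite key (id, date) assumes at most one history row per patient per date; a design that allows several visits per day would need a finer-grained partial key.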

Page 46: 4. Databases Design

46

Step 2.1 Derive relations for logical data model

(3) One-to-many (1:*) binary relationship types
The entity on the ‘one side’ of the relationship is designated as the parent entity.
The entity on the ‘many side’ is designated as the child entity.
To represent this relationship, post a copy of the primary key attribute(s) of the parent entity into the relation representing the child entity, to act as a foreign key.

Page 47: 4. Databases Design

47

Owner (ownerno, address, telno)  [parent]

Propertyforrent (propertyno, street, city, postcode, type, rooms, rent, ownerno)  [child]
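As a rough sketch of the 1:* rule, the two relations above can be declared with Python's built-in sqlite3 module. The data types, the NOT NULL on ownerno (which assumes every property must have an owner), and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent: the 'one' side of the relationship.
CREATE TABLE Owner (
    ownerno TEXT PRIMARY KEY NOT NULL,
    address TEXT,
    telno   TEXT
);
-- Child: the 'many' side carries a copy of the parent's primary key as a foreign key.
CREATE TABLE Propertyforrent (
    propertyno TEXT PRIMARY KEY NOT NULL,
    street     TEXT,
    city       TEXT,
    postcode   TEXT,
    type       TEXT,
    rooms      INTEGER,
    rent       REAL,
    ownerno    TEXT NOT NULL REFERENCES Owner(ownerno)
);
""")

# Invented sample rows: one owner with two properties.
conn.execute("INSERT INTO Owner VALUES ('O1', '2 Example Drive', '555-0100')")
conn.executemany(
    "INSERT INTO Propertyforrent VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("P1", "16 First St", "Townville", "T1 1AA", "House", 6, 650, "O1"),
        ("P2", "6 Second St", "Townville", "T1 2BB", "Flat", 3, 350, "O1"),
    ],
)

# Following the foreign key from child to parent answers "which properties does O1 own?".
properties_of_owner = [
    row[0]
    for row in conn.execute(
        "SELECT propertyno FROM Propertyforrent WHERE ownerno = ? ORDER BY propertyno",
        ("O1",),
    )
]
```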

Page 48: 4. Databases Design

48

Step 2.1 Derive relations for logical data model

(4) One-to-one (1:1) binary relationship types
Creating relations to represent a 1:1 relationship is more complex, as the cardinality cannot be used to identify the parent and child entities in a relationship.

Page 49: 4. Databases Design

49

Instead, the participation constraints are used to decide whether it is best to represent the relationship:
  by combining the entities involved into one relation, or
  by creating two relations and posting a copy of the primary key from one relation to the other.

Page 50: 4. Databases Design

50

Step 2.1 Derive relations for logical data model

Consider the following:
(a) mandatory participation on both sides of 1:1 relationship;
(b) mandatory participation on one side of 1:1 relationship;
(c) optional participation on both sides of 1:1 relationship.

Page 51: 4. Databases Design

51

Step 2.1 Derive relations for logical data model

(a) Mandatory participation on both sides of 1:1 relationship
Combine the entities involved into one relation and choose one of the primary keys of the original entities to be the primary key of the new relation, while the other (if one exists) is used as an alternate key.

Page 52: 4. Databases Design

52

Client (clientno, fname, lname, telno, no, name)

[Figure: ER diagram of Client and Project (no {pk}, name) joined by the 1:1 relationship offer]

Page 53: 4. Databases Design

53

Step 2.1 Derive relations for logical data model

(b) Mandatory participation on one side of a 1:1 relationship
Identify parent and child entities using participation constraints.
The entity with optional participation in the relationship is designated as the parent entity, and the entity with mandatory participation is designated as the child entity.

Page 54: 4. Databases Design

54

A copy of primary key of the parent entity is placed in the relation representing the child entity.

If the relationship has one or more attributes, these attributes should follow the posting of the primary key to the child relation.

Page 55: 4. Databases Design

55

Client (clientno, fname, lname, telno, no)
Project (no, name)

[Figure: ER diagram of Client and Project (no {pk}, name) joined by the relationship offer, with multiplicity 0..1 on one side]
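One way to sketch case (b) in SQL, run here through Python's built-in sqlite3 module: the child relation Client carries the parent's key, NOT NULL expresses the mandatory participation, and UNIQUE keeps the relationship 1:1. The data types and sample rows are assumptions; the column no is quoted because NO is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent: optional participation in the relationship.
CREATE TABLE Project (
    "no" TEXT PRIMARY KEY NOT NULL,
    name TEXT
);
-- Child: mandatory participation, so the posted key is NOT NULL;
-- UNIQUE stops two clients from pointing at the same project (1:1, not 1:*).
CREATE TABLE Client (
    clientno TEXT PRIMARY KEY NOT NULL,
    fname    TEXT,
    lname    TEXT,
    telno    TEXT,
    "no"     TEXT NOT NULL UNIQUE REFERENCES Project("no")
);
""")

# Invented sample rows.
conn.execute("INSERT INTO Project VALUES ('PR1', 'Alpha')")
conn.execute("INSERT INTO Project VALUES ('PR2', 'Beta')")  # a project with no client is allowed
conn.execute("INSERT INTO Client VALUES ('C1', 'Ann', 'Lee', '555-0101', 'PR1')")


def insert_rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second client for the same project would turn the 1:1 into a 1:*.
second_client_rejected = insert_rejected(
    "INSERT INTO Client VALUES ('C2', 'Bob', 'Ray', '555-0102', 'PR1')")
# A client without a project violates the mandatory participation.
client_without_project_rejected = insert_rejected(
    "INSERT INTO Client VALUES ('C3', 'Cy', 'Ng', '555-0103', NULL)")
```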

Page 56: 4. Databases Design

56

Step 2.1 Derive relations for logical data model

(c) Optional participation on both sides of a 1:1 relationship
In this case, the designation of the parent and child entities is arbitrary unless we can find out more about the relationship that can help a decision to be made one way or the other.

Page 57: 4. Databases Design

57

Client (clientno, fname, lname, telno)
Project (no, name, clientno)

or

Client (clientno, fname, lname, telno, no)
Project (no, name)

[Figure: ER diagram of Client and Project (no {pk}, name) joined by the relationship offer, with multiplicity 0..1 on both sides]

Page 58: 4. Databases Design

58

Step 2.1 Derive relations for logical data model

(5) One-to-one (1:1) recursive relationships
For a 1:1 recursive relationship, follow the rules for participation as described above for a 1:1 relationship.
Mandatory participation on both sides: represent the recursive relationship as a single relation with two copies of the primary key.

Page 59: 4. Databases Design

59

Step 2.1 Derive relations for logical data model

Mandatory participation on only one side: either create a single relation with two copies of the primary key, or create a new relation to represent the relationship. The new relation would have only two attributes, both copies of the primary key.

Page 60: 4. Databases Design

60

Optional participation on both sides: again create a new relation as described above.

Page 61: 4. Databases Design

61

Step 2.1 Derive relations for logical data model

(6) Superclass/subclass relationship types
Identify the superclass entity as the parent entity and the subclass entity as the child entity.
There are various options on how to represent such a relationship as one or more relations.

Page 62: 4. Databases Design

62

The selection of the most appropriate option is dependent on a number of factors such as the disjointness and participation constraints on the superclass/subclass relationship, whether the subclasses are involved in distinct relationships, and the number of participants in the superclass/subclass relationship.

Step 2.1 Derive relations for logical data model

Page 63: 4. Databases Design

Guidelines for representation of superclass / subclass relationship

Page 64: 4. Databases Design

64

{mandatory, and}

Page 65: 4. Databases Design

65

{optional, and}

Page 66: 4. Databases Design

66

Privateowner (ownerno, address, telno, fname, lname)
Businessowner (ownerno, address, bname, btype, contactname)

Page 67: 4. Databases Design

67

{optional, or}

Page 68: 4. Databases Design

68

Representation of superclass / subclass relationship based on participation and disjointness

Page 69: 4. Databases Design

69

Step 2.1 Derive relations for logical data model

(7) Many-to-many (*:*) binary relationship types
Create a relation to represent the relationship and include any attributes that are part of the relationship.
Post a copy of the primary key attribute(s) of the entities that participate in the relationship into the new relation, to act as foreign keys.
These foreign keys will also form the primary key of the new relation, possibly in combination with some of the attributes of the relationship.

Page 70: 4. Databases Design

70

propertyforrent (propertyno, street, city, postcode, type, rooms, rent)
client (clientno, fname, lname, telno)
viewing (propertyno, clientno, viewdate, comment)
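The many-to-many mapping above can be sketched with Python's built-in sqlite3 module. The data types and sample rows are assumptions, and the two foreign keys alone are taken as the primary key of viewing, which assumes a client views a given property at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE propertyforrent (
    propertyno TEXT PRIMARY KEY NOT NULL,
    street TEXT, city TEXT, postcode TEXT, type TEXT, rooms INTEGER, rent REAL
);
CREATE TABLE client (
    clientno TEXT PRIMARY KEY NOT NULL,
    fname TEXT, lname TEXT, telno TEXT
);
-- The relationship becomes its own relation; the two posted keys are
-- foreign keys and together form the primary key.
CREATE TABLE viewing (
    propertyno TEXT NOT NULL REFERENCES propertyforrent(propertyno),
    clientno   TEXT NOT NULL REFERENCES client(clientno),
    viewdate   TEXT,
    comment    TEXT,
    PRIMARY KEY (propertyno, clientno)
);
""")

# Invented sample rows.
conn.executemany(
    "INSERT INTO propertyforrent VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("P1", "16 First St", "Townville", "T1 1AA", "House", 6, 650),
     ("P2", "6 Second St", "Townville", "T1 2BB", "Flat", 3, 350)],
)
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?, ?)",
    [("C1", "Ann", "Lee", "555-0101"), ("C2", "Bob", "Ray", "555-0102")],
)
conn.executemany(
    "INSERT INTO viewing VALUES (?, ?, ?, ?)",
    [("P1", "C1", "2024-03-01", "too small"),
     ("P1", "C2", "2024-03-02", None),
     ("P2", "C1", "2024-03-05", "no dining room")],
)

# One property viewed by many clients, one client viewing many properties.
clients_who_viewed_p1 = [r[0] for r in conn.execute(
    "SELECT clientno FROM viewing WHERE propertyno = 'P1' ORDER BY clientno")]
properties_viewed_by_c1 = [r[0] for r in conn.execute(
    "SELECT propertyno FROM viewing WHERE clientno = 'C1' ORDER BY propertyno")]
```

If repeat viewings of the same property by the same client had to be recorded, viewdate would need to join the primary key, which is the "in combination with some of the attributes of the relationship" case mentioned on the slide.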

Page 71: 4. Databases Design

71

Step 2.1 Derive relations for logical data model

(8) Complex relationship types
Create a relation to represent the relationship and include any attributes that are part of the relationship.
Post a copy of the primary key attribute(s) of the entities that participate in the complex relationship into the new relation, to act as foreign keys.
Any foreign keys that represent a ‘many’ relationship (for example, 1..*, 0..*) generally will also form the primary key of this new relation, possibly in combination with some of the attributes of the relationship.

Page 72: 4. Databases Design

72

Maps to 4 tables:
A (aid)
B (bid)
C (cid)
D (aid, bid, cid)

[Figure: relationship D connecting A (aid {pk}), B (bid {pk}), and C (cid {pk}), with multiplicities n, n, n]

Page 73: 4. Databases Design

73

Maps to 3 tables:
A (aid)
B (bid, aid, cid)
C (cid)

[Figure: relationship D connecting A (aid {pk}), B (bid {pk}), and C (cid {pk}), with multiplicities 1, 1, n]

Page 74: 4. Databases Design

74

Maps to 4 tables:
A (aid)
B (bid)
C (cid)
D (aid, bid, cid)

[Figure: relationship D connecting A (aid {pk}), B (bid {pk}), and C (cid {pk}), with multiplicities n, 1, n]

Page 75: 4. Databases Design

75

Step 2.1 Derive relations for logical data model

(9) Multi-valued attributes
Create a new relation to represent the multi-valued attribute and include the primary key of the entity in the new relation, to act as a foreign key.
Unless the multi-valued attribute is itself an alternate key of the entity, the primary key of the new relation is the combination of the multi-valued attribute and the primary key of the entity.
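A small sketch of the multi-valued attribute rule using Python's built-in sqlite3 module. The Staff entity with a multi-valued skill attribute is a hypothetical example, not one taken from the slides; the data types and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL,
    fName   TEXT
);
-- The multi-valued attribute moves to its own relation together with the
-- owner's primary key; the pair forms the primary key of the new relation.
CREATE TABLE StaffSkill (
    staffNo TEXT NOT NULL REFERENCES Staff(staffNo),
    skill   TEXT NOT NULL,
    PRIMARY KEY (staffNo, skill)
);
""")

# Invented sample rows: two staff can share the same skill value.
conn.execute("INSERT INTO Staff VALUES ('S1', 'Ann')")
conn.execute("INSERT INTO Staff VALUES ('S2', 'Bob')")
conn.executemany(
    "INSERT INTO StaffSkill VALUES (?, ?)",
    [("S1", "valuation"), ("S1", "negotiation"), ("S2", "valuation")],
)

skills_of_s1 = [r[0] for r in conn.execute(
    "SELECT skill FROM StaffSkill WHERE staffNo = 'S1' ORDER BY skill")]

# Recording the same skill twice for one member of staff is rejected.
try:
    conn.execute("INSERT INTO StaffSkill VALUES ('S1', 'valuation')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Had each value been unique across all staff (an alternate key of the entity), the value alone could serve as the primary key of the new relation.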

Page 76: 4. Databases Design

76

Summary of how to map entities and relationships to relations

Page 77: 4. Databases Design

77

Relations for the Staff user views of DreamHome

Page 78: 4. Databases Design

78

Page 79: 4. Databases Design

79

Step 2.2 Validate relations using normalization

To validate the relations in the logical data model using normalization.

Page 80: 4. Databases Design

80

Step 2.3 Validate relations against user transactions

To ensure that the relations in the logical data model support the required transactions.

Page 81: 4. Databases Design

81

Step 2.4 Check integrity constraints

To check that integrity constraints are represented in the logical data model. This includes identifying:
  Required data
  Attribute domain constraints
  Multiplicity
  Entity integrity
  Referential integrity
  General constraints

Page 82: 4. Databases Design

82

Required data
Some attributes must always contain a valid value.
They are not allowed to hold nulls.

Page 83: 4. Databases Design

83

Attribute domain constraints
Identify the set of values that are legal.
Example: the sex of a member of staff is either ‘M’ or ‘F’.
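Required data and attribute domain constraints map naturally onto NOT NULL and CHECK clauses. The sketch below uses Python's built-in sqlite3 module; the Staff columns other than staffNo and sex, the data types, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL,          -- entity integrity: key may not be null
    fName   TEXT NOT NULL,                      -- required data
    sex     TEXT CHECK (sex IN ('M', 'F'))      -- attribute domain constraint
);
""")


def insert_rejected(row):
    """Return True if the DBMS refuses the row on integrity grounds."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


valid_rejected = insert_rejected(("S1", "Ann", "F"))        # legal row
bad_domain_rejected = insert_rejected(("S2", "Bob", "X"))   # 'X' is outside the domain
missing_name_rejected = insert_rejected(("S3", None, "M"))  # required data is missing
```

Note that a CHECK constraint is satisfied when its condition evaluates to null, so a null sex would still be accepted here unless NOT NULL is added as well.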

Page 84: 4. Databases Design

84

Multiplicity
Constraints placed on relationships.
Example: the requirement that a branch has many staff and a member of staff works at a single branch.

Page 85: 4. Databases Design

85

Entity integrity
The PK of an entity cannot hold nulls.
Example: each tuple of the Staff relation must have a value for the primary key attribute, staffNo.

Page 86: 4. Databases Design

86

Referential integrity
If the FK contains a value, that value must refer to an existing tuple in the parent relation.
Issues:
  whether nulls are allowed for the FK
  how to ensure referential integrity

Page 87: 4. Databases Design

87

Whether nulls are allowed for the FK
Can we store the details of a property for rent without having a member of staff specified to manage it?
  If participation is mandatory, then nulls are not allowed.
  If participation is optional, then nulls are allowed.

Page 88: 4. Databases Design

88

how to ensure referential integrity

Specify existence constraints that define conditions under which a candidate key or FK must be inserted, updated or deleted.

Page 89: 4. Databases Design

89

[Figure: a 1:* relationship between a parent relation and a child relation]

Page 90: 4. Databases Design

90

Case 1: insert tuple into child relation (PropertyForRent)
Check that the FK, staffNo, of the new PropertyForRent tuple is set to null or to a value of an existing Staff tuple.

Page 91: 4. Databases Design

91

Case 2: delete tuple from child relation (PropertyForRent)
If a tuple of a child relation is deleted, referential integrity is unaffected.

Page 92: 4. Databases Design

92

Case 3: update FK of child tuple (PropertyForRent)
Similar to case 1.
Check that the staffNo of the updated PropertyForRent tuple is set to null or to a value of an existing Staff tuple.

Page 93: 4. Databases Design

93

Case 4: insert tuple into parent relation (Staff)
Does not affect referential integrity.

Page 94: 4. Databases Design

94

Case 5: delete tuple from parent relation (Staff)
If a tuple of a parent relation is deleted, referential integrity is lost if there exists a child tuple referencing the deleted parent tuple.

Page 95: 4. Databases Design

95

NO ACTION
Prevent a deletion from the parent relation if there are any referencing child tuples.
Example: a member of staff cannot be deleted if he or she currently manages any properties.

Page 96: 4. Databases Design

96

CASCADE
Automatically delete any referencing child tuples.
Example: deleting a member of staff automatically deletes all properties he or she manages.
In this situation the strategy would not be wise.

Page 97: 4. Databases Design

97

SET NULL
When a parent tuple is deleted, the FK values in all corresponding child tuples are automatically set to null.

Page 98: 4. Databases Design

98

SET DEFAULT
When a parent tuple is deleted, the FK values in all corresponding child tuples are automatically set to their default values.

Page 99: 4. Databases Design

99

NO CHECK
When a parent tuple is deleted, do nothing to ensure that referential integrity is maintained.

Page 100: 4. Databases Design

100

Case 6: update PK of parent tuple (Staff)
If the PK value of a parent relation tuple is updated, referential integrity is lost if there exists a child tuple referencing the old PK value.
The usual choice here is the CASCADE strategy.
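The six cases can be tried out with Python's built-in sqlite3 module. This sketch assumes ON UPDATE CASCADE and ON DELETE NO ACTION for the staffNo foreign key and uses invented rows; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL
);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY NOT NULL,
    staffNo    TEXT REFERENCES Staff(staffNo)   -- nullable: managing staff is optional here
                    ON UPDATE CASCADE
                    ON DELETE NO ACTION
);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO Staff VALUES ('S1')")                     # case 4: always fine
conn.execute("INSERT INTO PropertyForRent VALUES ('P1', 'S1')")     # case 1: existing parent
conn.execute("INSERT INTO PropertyForRent VALUES ('P2', NULL)")     # case 1: null FK allowed
unknown_staff_rejected = rejected(
    "INSERT INTO PropertyForRent VALUES ('P3', 'S9')")              # case 1: no such parent

# Case 6 with CASCADE: the new key value is propagated to the child tuples.
conn.execute("UPDATE Staff SET staffNo = 'S2' WHERE staffNo = 'S1'")
fk_after_update = conn.execute(
    "SELECT staffNo FROM PropertyForRent WHERE propertyNo = 'P1'").fetchone()[0]

# Case 5 with NO ACTION: the parent cannot go while a child still references it.
delete_blocked = rejected("DELETE FROM Staff WHERE staffNo = 'S2'")

# Case 2: removing the child is always safe, after which the parent can be deleted.
conn.execute("DELETE FROM PropertyForRent WHERE propertyNo = 'P1'")
delete_blocked_after_child_removed = rejected("DELETE FROM Staff WHERE staffNo = 'S2'")
```

Swapping ON DELETE NO ACTION for SET NULL or CASCADE in the table definition is enough to experiment with the other strategies listed on the slides.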

Page 101: 4. Databases Design

101

Referential integrity constraints for relations in Staff user views of DreamHome

Page 102: 4. Databases Design

102

PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo)

Primary key propertyNo
Foreign key ownerNo references PrivateOwner(ownerNo) and BusinessOwner(ownerNo)
  ON UPDATE CASCADE ON DELETE NO ACTION

Page 103: 4. Databases Design

103

Referential integrity constraints for relations in Staff user views of DreamHome

Page 104: 4. Databases Design

104

Step 2.5 Review logical data model with user

To review the logical data model with the users to ensure that they consider the model to be a true representation of the data requirements of the enterprise.

Page 105: 4. Databases Design

105

Step 2.6 Merge logical data models into global Model (optional step)

To merge logical data models into a single global logical data model that represents all user views of a database.

Page 106: 4. Databases Design

106

Step 2.6.1 Merge local logical data models into global model

To merge local logical data models into a single global logical data model.

The activities in Step 2.6 include:
  Step 2.6.1 Merge local logical data models into global model
  Step 2.6.2 Validate global logical data model
  Step 2.6.3 Review global logical data model with users

Page 107: 4. Databases Design

107

Step 2.6.1 Merge local logical data models into global model

(1) Review the names and contents of entities/relations and their candidate keys.
(2) Review the names and contents of relationships/foreign keys.
(3) Merge entities/relations from the local data models.
(4) Include (without merging) entities/relations unique to each local data model.
(5) Merge relationships/foreign keys from the local data models.

Page 108: 4. Databases Design

108

Step 2.6.1 Merge local logical data models into global model

(6) Include (without merging) relationships/foreign keys unique to each local data model.
(7) Check for missing entities/relations and relationships/foreign keys.
(8) Check foreign keys.
(9) Check integrity constraints.
(10) Draw the global ER/relation diagram.
(11) Update the documentation.

Page 109: 4. Databases Design

109

Step 2.6.2 Validate global logical data model

To validate the relations created from the global logical data model using the technique of normalization and to ensure they support the required transactions, if necessary.

Page 110: 4. Databases Design

110

Step 2.6.3 Review global logical data model with users

To review the global logical data model with the users to ensure that they consider the model to be a true representation of the data requirements of an enterprise.

Page 111: 4. Databases Design

111

Relations for the Branch user views of DreamHome

Page 112: 4. Databases Design

112

Relations that represent the global logical data model for DreamHome

Page 113: 4. Databases Design

113

Global relation diagram for DreamHome

Page 114: 4. Databases Design

Physical Databases Design 114

Page 115: 4. Databases Design

115

"...For any work, large or small, it is highly advisable to set the goal, scope, and principles firmly, because this makes it possible to work directly toward success, correctly and in due measure, and it prevents and eliminates delay, waste, and loss altogether. And while working toward that goal, educated people must not abandon the principles of their discipline, nor abandon careful thought guided by reason and by what is right and just..."

Royal address of His Majesty the King

Excerpt from the royal address at the graduation ceremony

Chulalongkorn University, Friday 17 July B.E. 2530 (1987)

Page 116: 4. Databases Design

116

Objectives

Purpose of physical database design.

How to map the logical database design to a physical database design.

How to design base relations for target DBMS.

How to design general constraints for target DBMS.

Page 117: 4. Databases Design

117

Objectives

How to select appropriate file organizations based on analysis of transactions.

When to use secondary indexes to improve performance.

How to estimate the size of the database.

How to design user views.

How to design security mechanisms to satisfy user requirements.

Page 118: 4. Databases Design

118

Logical v. Physical Database Design

Sources of information for the physical design process include the logical data model and the documentation that describes the model.

Logical database design is concerned with the what, physical database design is concerned with the how.

Page 119: 4. Databases Design

119

Physical Database Design

Process of producing a description of the implementation of the database on secondary storage.

It describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures.

Page 120: 4. Databases Design

120

Overview of Physical Database Design Methodology

Step 3 Translate logical data model for target DBMS
  Step 3.1 Design base relations
  Step 3.2 Design representation of derived data
  Step 3.3 Design general constraints
Step 4 Design file organizations and indexes
  Step 4.1 Analyze transactions
  Step 4.2 Choose file organizations
  Step 4.3 Choose indexes
  Step 4.4 Estimate disk space requirements

Page 121: 4. Databases Design

121

Overview of Physical Database Design Methodology

Step 5 Design user views
Step 6 Design security mechanisms
Step 7 Consider the introduction of controlled redundancy (lecture 8)
Step 8 Monitor and tune operational system (lecture 8)

Page 122: 4. Databases Design

122

Step 3 Translate Logical Data Model for Target DBMS

To produce a relational database schema from the logical data model that can be implemented in the target DBMS.

Page 123: 4. Databases Design

123

Need to know the functionality of the target DBMS, such as how to create base relations and whether the system supports the definition of:
  PKs, FKs, and AKs
  required data, i.e. whether the system supports NOT NULL
  domains
  relational integrity constraints
  general constraints

Page 124: 4. Databases Design

124

Step 3.1 Design base relations

To decide how to represent base relations identified in logical model in target DBMS.

For each relation, need to define:
  the name of the relation;
  a list of simple attributes in brackets;
  the PK and, where appropriate, AKs and FKs;
  referential integrity constraints for any FKs identified.

Page 125: 4. Databases Design

125

Step 3.1 Design base relations

From the data dictionary, we have for each attribute:
  its domain, consisting of a data type, length, and any constraints on the domain;
  an optional default value for the attribute;
  whether it can hold nulls;
  whether it is derived, and if so, how it should be computed.

Page 126: 4. Databases Design

126

Tasks to carry out

Implement base relations (next slide)
Document design of base relations

Page 127: 4. Databases Design

127

DBDL for the PropertyForRent Relation

Page 128: 4. Databases Design

128

Step 3.2 Design representation of derived data

To decide how to represent any derived data present in logical data model in target DBMS.

Examine the logical data model and data dictionary, and produce a list of all derived attributes.
A derived attribute can be stored in the database or calculated every time it is needed.

Page 129: 4. Databases Design

129

For example, the following are derived attributes:
  Number of staff who work in a particular branch
  Total monthly salaries of all staff
  Number of properties that a member of staff handles

Often, derived attributes do not appear in the logical data model but do appear in the data dictionary.

Page 130: 4. Databases Design

130

Steps
  Examine the logical data model and the data dictionary.
  Produce a list of all derived attributes.
  From a physical database design perspective, decide whether each derived attribute is:
    stored in the database, OR
    calculated every time it is needed.

Page 131: 4. Databases Design

131

Step 3.2 Design representation of derived data

The option selected is based on:
  the additional cost to store the derived data and keep it consistent with the operational data from which it is derived;
  the cost to calculate it each time it is required.

The less expensive option is chosen, subject to performance constraints.

Page 132: 4. Databases Design

132

From the previous example:
Store an additional attribute in the Staff relation (noOfProperties).

Page 133: 4. Databases Design

The additional storage overhead for this new derived attribute would not be particularly significant.

The attribute would need to be updated every time a member of staff is assigned to or deassigned from a property.

In each case, noOfProperties is increased or decreased by 1.
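One way to keep a stored noOfProperties consistent is to let the DBMS adjust it with triggers. The sketch below assumes SQLite (via Python) as the target DBMS; the trigger names and the cut-down table definitions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    staffNo        TEXT PRIMARY KEY,
    noOfProperties INTEGER NOT NULL DEFAULT 0   -- stored derived attribute
);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    staffNo    TEXT REFERENCES Staff(staffNo)
);

-- Hypothetical triggers; each keeps the stored count in step with the base data.
CREATE TRIGGER property_assigned AFTER INSERT ON PropertyForRent
WHEN NEW.staffNo IS NOT NULL
BEGIN
    UPDATE Staff SET noOfProperties = noOfProperties + 1 WHERE staffNo = NEW.staffNo;
END;

CREATE TRIGGER property_removed AFTER DELETE ON PropertyForRent
WHEN OLD.staffNo IS NOT NULL
BEGIN
    UPDATE Staff SET noOfProperties = noOfProperties - 1 WHERE staffNo = OLD.staffNo;
END;

CREATE TRIGGER property_reassigned AFTER UPDATE OF staffNo ON PropertyForRent
BEGIN
    UPDATE Staff SET noOfProperties = noOfProperties - 1 WHERE staffNo = OLD.staffNo;
    UPDATE Staff SET noOfProperties = noOfProperties + 1 WHERE staffNo = NEW.staffNo;
END;
""")

conn.executemany("INSERT INTO Staff (staffNo) VALUES (?)", [("SG14",), ("SG37",)])
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [("PG4", "SG14"), ("PG16", "SG14"), ("PG36", "SG37")],
)
# Reassign one property, then remove another.
conn.execute("UPDATE PropertyForRent SET staffNo = 'SG37' WHERE propertyNo = 'PG16'")
conn.execute("DELETE FROM PropertyForRent WHERE propertyNo = 'PG4'")

counts = dict(conn.execute("SELECT staffNo, noOfProperties FROM Staff"))
```

Every insert, delete, and reassignment now pays for an extra update on Staff, which is the maintenance cost the slide weighs against cheaper reads.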

Page 134: 4. Databases Design

PropertyForRent Relation and Staff Relation with Derived Attribute noOfProperties

Page 135: 4. Databases Design

On the other hand, if the attribute is not stored directly in the Staff relation, it must be calculated each time it is required.

This involves a join of the Staff and PropertyForRent relations. If this type of query is frequent, it is more appropriate to store the derived attribute rather than calculate it each time.
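For comparison, calculating the attribute on demand could look like the following sketch, again using SQLite through Python as an assumed DBMS with a cut-down, hypothetical schema and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY);
CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
INSERT INTO Staff VALUES ('SG14'), ('SG37'), ('SL41');
INSERT INTO PropertyForRent VALUES ('PG4', 'SG14'), ('PG16', 'SG14'), ('PG36', 'SG37');
""")

# The join is repeated on every request.
# LEFT JOIN keeps staff who handle no properties (count 0).
NO_OF_PROPERTIES = """
SELECT s.staffNo, COUNT(p.propertyNo) AS noOfProperties
FROM Staff s LEFT JOIN PropertyForRent p ON p.staffNo = s.staffNo
GROUP BY s.staffNo
ORDER BY s.staffNo
"""
result = conn.execute(NO_OF_PROPERTIES).fetchall()
```

No extra storage or update logic is needed, but the cost of the join and grouping is paid on each query.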

Page 136: 4. Databases Design

It may also be more appropriate to store derived attributes whenever the DBMS’s query language cannot easily cope with the algorithm to calculate the derived attribute.

For example, SQL has a limited set of aggregate functions.

Page 137: 4. Databases Design

Step 3.3 Design general constraints

To design the general constraints for the target DBMS.

Some DBMSs provide more facilities than others for defining enterprise constraints.

Example:

CONSTRAINT StaffNotHandlingTooMuch
    CHECK (NOT EXISTS (SELECT staffNo
                       FROM PropertyForRent
                       GROUP BY staffNo
                       HAVING COUNT(*) > 100))
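Not every DBMS accepts a subquery inside CHECK; SQLite, for instance, does not. Where that is the case, a trigger is one alternative. The sketch below assumes SQLite through Python and a cut-down PropertyForRent table; it covers inserts only, so a matching trigger on UPDATE of staffNo would also be needed in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);

-- Trigger stand-in for the StaffNotHandlingTooMuch constraint.
CREATE TRIGGER StaffNotHandlingTooMuch
BEFORE INSERT ON PropertyForRent
WHEN (SELECT COUNT(*) FROM PropertyForRent WHERE staffNo = NEW.staffNo) >= 100
BEGIN
    SELECT RAISE(ABORT, 'staff member already handles 100 properties');
END;
""")

# The first 100 properties for one member of staff are accepted.
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?)",
    [(f"P{i}", "SG14") for i in range(100)],
)

# The 101st is refused by the trigger.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('P100', 'SG14')")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

total = conn.execute("SELECT COUNT(*) FROM PropertyForRent").fetchone()[0]
```

Whichever mechanism is chosen, the reason for choosing it belongs in the design documentation.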

Page 138: 4. Databases Design

The design of general constraints should be fully documented. In particular, document the reasons for selecting one approach where many alternatives exist.

Page 139: 4. Databases Design

Step 4 Design File Organizations and Indexes

To determine the optimal file organizations to store the base relations and the indexes that are required to achieve acceptable performance; that is, the way in which relations and tuples will be held on secondary storage.

Must understand the typical workload that the database must support.

Activities in this step:
  Analyze transactions
  Choose file organizations
  Choose indexes
  Estimate disk space requirements

Page 140: 4. Databases Design

Step 4.1 Analyze transactions

To understand the functionality of the transactions that will run on the database and to analyze the important transactions.

Attempt to identify performance criteria, such as:
  transactions that run frequently and will have a significant impact on performance;
  transactions that are critical to the business;
  times during the day/week when there will be a high demand made on the database (called the peak load).

Page 141: 4. Databases Design

Use this information to identify the parts of the database that may cause performance problems.

Also need to know the high-level functionality of the transactions, such as:
  attributes that are updated;
  search criteria used in a query.

Page 142: 4. Databases Design

It is often not possible to analyze all transactions, so investigate the most ‘important’ ones.

To help identify these, can use:
  a transaction/relation cross-reference matrix, showing the relations that each transaction accesses, and/or
  a transaction usage map, indicating which relations are potentially heavily used.

Page 143: 4. Databases Design

To focus on areas that may be problematic:

(1) Map all transaction paths to relations.
(2) Determine which relations are most frequently accessed by transactions.
(3) Analyze the data usage of selected transactions that involve these relations.

Page 144: 4. Databases Design

(1) Map all transaction paths to relations.

Transaction/relation cross-reference matrix

Page 145: 4. Databases Design

Example

StaffClient view
  A – Enter the details for a new property and the owner (such as details of property number PG4 in Glasgow owned by Tina Murphy)
  B – Update/delete the details of a property
  C – Identify the total number of staff in each position at branches in Glasgow
Branch view
  D – List the property number, address, type, and rent of all properties in Glasgow, ordered by rent
  E – List the details of properties for rent managed by a named member of staff
….

Page 146: 4. Databases Design

Cross-referencing transactions and relations
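The matrix itself is not reproduced in this transcript. As a rough sketch of the idea, the cross-reference can be held as a mapping from transaction to the relations it touches and then tallied per relation. The entries below are illustrative guesses based on the descriptions of transactions A to E, not the slide's actual matrix.

```python
from collections import Counter

# I = insert, R = read, U = update, D = delete.
# Illustrative entries only; not copied from the original figure.
cross_reference = {
    "A": {"PropertyForRent": "I", "PrivateOwner": "I"},
    "B": {"PropertyForRent": "UD"},
    "C": {"Staff": "R", "Branch": "R"},
    "D": {"PropertyForRent": "R"},
    "E": {"PropertyForRent": "R", "Staff": "R"},
}

# Count how many transactions touch each relation.
usage = Counter(rel for accesses in cross_reference.values() for rel in accesses)
most_used = usage.most_common(1)[0]
```

Relations with the highest counts are the first candidates for closer analysis in the next two steps.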

Page 147: 4. Databases Design

(2) Determine which relations are most frequently accessed by transactions.

Example Transaction Usage Map

Page 148: 4. Databases Design

(3) Analyze the data usage of selected transactions that involve these relations.

Determine:
  the relations and attributes accessed by the transaction and the type of access, e.g. attributes that are updated may be candidates for avoiding secondary indexes;

Page 149: 4. Databases Design

  attributes used in predicates (conditions) in SQL, which may be candidates for access structures:
    Pattern matching: name LIKE '%smith%'
    Range searches: salary BETWEEN 10000 AND 20000
    Exact match: salary = 25000

Page 150: 4. Databases Design

  for a query, the attributes involved in the join of two or more relations, which may be candidates for access structures;
  the expected (average) frequency at which the transaction will run;
  the performance goals for the transaction, e.g. a transaction that must complete within 1 second should have higher priority for access structures.
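How much an access structure helps depends on the kind of predicate. The sketch below asks SQLite (used through Python as an assumed DBMS) for its query plan for the three predicate types listed above; the table layout and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary NUMERIC);
CREATE INDEX idx_staff_name   ON Staff(name);
CREATE INDEX idx_staff_salary ON Staff(salary);
""")

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

exact   = plan("SELECT * FROM Staff WHERE salary = 25000")
ranged  = plan("SELECT * FROM Staff WHERE salary BETWEEN 10000 AND 20000")
pattern = plan("SELECT * FROM Staff WHERE name LIKE '%smith%'")
```

In SQLite the exact-match and range queries search the salary index, while the pattern with a leading wildcard falls back to scanning the table even though name is indexed. Other DBMSs may behave differently, so the plan should be checked on the target system.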

Page 151: 4. Databases Design

Example Transaction Analysis Form

Page 152: 4. Databases Design

The form shows:
  any predicates that will be used;
  any attributes that will be required to join relations together (query);
  attributes used to order results (query);
  attributes used to group data together (query);
  any built-in functions that may be used;
  any attributes that will be updated.

Page 153: 4. Databases Design

Step 4.2 Choose file organizations

To determine an efficient file organization for each base relation.

For example, if we want to retrieve staff tuples in alphabetical order of name, sorting the file by staff name is a good file organization.

To choose an optimal file organization for each relation:
  File organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters.
  Some DBMSs may not allow selection of file organizations.

Page 154: 4. Databases Design

Step 4.3 Choose indexes

To determine whether adding indexes will improve the performance of the system.

One approach is to keep the tuples unordered and create as many secondary indexes as necessary.

Secondary index = an index that provides an alternate method of accessing records or portions of records in a database

Page 155: 4. Databases Design

Another approach is to order the tuples in the relation by specifying a primary or clustering index.

Primary index = an index that holds the values of the primary key, in sequence

Essentially, a clustered index defines the physical ordering of the data in the table, so the data in the table can be ordered in only one way.

Page 156: 4. Databases Design

In this case, choose the attribute for ordering or clustering the tuples as:
  the attribute that is used most often for join operations, as this makes the join operation more efficient, or
  the attribute that is used most often to access the tuples in a relation in order of that attribute.

Page 157: 4. Databases Design

If the ordering attribute chosen is a key of the relation, the index will be a primary index; otherwise, the index will be a clustering index.

Each relation can have EITHER a primary index OR a clustering index, not both.

Page 158: 4. Databases Design

Choosing secondary indexes

Secondary indexes provide a mechanism for specifying an additional key for a base relation that can be used to retrieve data more efficiently.

For example, the PropertyForRent relation may be hashed on the property number, the primary index. If there is frequent access to this relation based on the rent attribute, we may decide to add rent as a secondary index.

Have to balance the overhead involved in the maintenance and use of secondary indexes against the performance improvement gained when retrieving data.

Page 159: 4. Databases Design

This includes:
  adding an index record to every secondary index whenever a tuple is inserted;
  updating a secondary index when the corresponding tuple is updated;
  the increase in disk space needed to store the secondary index;
  possible performance degradation during query optimization to consider all secondary indexes.
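The rent example can be sketched by comparing the query plan before and after the secondary index is created. This assumes SQLite through Python as the DBMS; SQLite keeps the primary key in a B-tree rather than a hash file, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# propertyNo is the primary access path; rent has no access structure yet.
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, rent NUMERIC)")

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

QUERY = "SELECT propertyNo, rent FROM PropertyForRent WHERE rent = 450"

before = plan(QUERY)  # no usable index: the whole relation is scanned
conn.execute("CREATE INDEX idx_property_rent ON PropertyForRent(rent)")  # hypothetical name
after = plan(QUERY)   # the secondary index is now used for the search
```

The retrieval gets cheaper, but every later insert or rent update must also maintain idx_property_rent, which is the overhead listed above.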

Page 160: 4. Databases Design

File organization
  Keep the tuples unordered and create as many secondary indexes as necessary
  Order the tuples by specifying a primary or clustering index

Tasks to carry out
  Specify indexes
  Choose secondary indexes
  Update the database statistics
  Document the choice of indexes

Page 161: 4. Databases Design

Step 4.3 Guidelines for choosing a ‘wish-list’ of indexes

1. Do not index small relations.
2. Index the PK of a relation if it is not a key of the file organization.
3. Add a secondary index to a FK if it is frequently accessed. For example, given a frequent join between the PropertyForRent and PrivateOwner/BusinessOwner tables, we can index ownerNo, the FK of the PropertyForRent table.
4. Add a secondary index to any attribute heavily used as a secondary key.

Page 162: 4. Databases Design

5. Add a secondary index on attributes involved in:
  selection or join criteria;
  ORDER BY;
  GROUP BY;
  other operations involving sorting (such as UNION or DISTINCT).

Page 163: 4. Databases Design

6. Add a secondary index on attributes involved in built-in functions.

For example:
SELECT branchNo, AVG(salary) FROM staff GROUP BY branchNo;
This query uses salary and branchNo.

Page 164: 4. Databases Design

7. Add a secondary index on attributes that could result in an index-only plan (that is, the query can be answered using data from the index alone).
8. Avoid indexing an attribute or relation that is frequently updated.
9. Avoid indexing an attribute if the query will retrieve a significant proportion of the relation (for example, 25%).
10. Avoid indexing attributes that consist of long character strings.
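Guidelines 6 and 7 can be illustrated together with the AVG(salary) query from guideline 6. In this sketch, which assumes SQLite through Python as the DBMS, a composite index on (branchNo, salary) holds every attribute the query needs; the index name and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT, salary NUMERIC);
INSERT INTO Staff VALUES
    ('SL21', 'B005', 30000), ('SL41', 'B005', 9000),
    ('SG37', 'B003', 12000), ('SG14', 'B003', 18000), ('SG5', 'B003', 24000);
-- Hypothetical composite index holding every attribute the query needs.
CREATE INDEX idx_staff_branch_salary ON Staff(branchNo, salary);
""")

QUERY = "SELECT branchNo, AVG(salary) FROM Staff GROUP BY branchNo"

# Join the plan rows into one string; the last column holds the plan text.
plan = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
averages = dict(conn.execute(QUERY))
```

SQLite reports a covering index in the plan, meaning the base table is never read for this query. The trade-off is that the index must be maintained whenever branchNo or salary changes (guideline 8).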

Page 165: 4. Databases Design

If the search criteria involve more than one predicate, and one of the terms contains an OR clause, and the term has no index/sort order, then adding indexes for the other attributes is not going to help improve the speed of the query, because a linear search of the relation is still required.

SELECT * FROM PropertyForRent WHERE type = 'FLAT' OR rent > 500 OR room > 5
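A quick way to check this on a given DBMS is to index only some of the attributes in the OR and inspect the plan. The sketch below assumes SQLite through Python, with indexes on type and rent but none on room; the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, type TEXT, rent NUMERIC, room INTEGER);
CREATE INDEX idx_property_type ON PropertyForRent(type);
CREATE INDEX idx_property_rent ON PropertyForRent(rent);
-- Deliberately no index on room.
""")

QUERY = "SELECT * FROM PropertyForRent WHERE type = 'FLAT' OR rent > 500 OR room > 5"

# Join the plan rows into one string; the last column holds the plan text.
plan = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
```

Because any row may qualify through room > 5 alone, SQLite scans the whole relation and ignores both indexes.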

Page 166: 4. Databases Design

Guidelines for indexes in the DreamHome case

The primary key is indexed.

Consider indexing a field when:
  the data type is text, number, currency, or date/time;
  the user anticipates searching for values stored in the field;
  the user anticipates sorting values in the field;
  the user anticipates storing many different values in the field.

Page 167: 4. Databases Design

Interaction between tables

Table            Transaction  Field                              Freq per day
Staff            a, d         predicate: fname, lname            20
                 a            join: staff on supervisorstaffno   20
                 b            ordering: fname, lname             20
                 b            predicate: position                20
Client           e            join: staff on staffno             1000-2000
                 j            predicate: fname, lname            1000
Propertyforrent  c            predicate: rentfinish              5000-10000

Page 168: 4. Databases Design

Additional indexes to be created based on the query transactions for the staff view

Table            Index
Staff            fname, lname
                 position
Client           fname, lname
PropertyForRent  rentFinish
                 city
                 rent

Page 169: 4. Databases Design

Step 4.4 Estimate disk space requirements

To estimate the amount of disk space that will be required by the database.

The estimate is based on:
  tuple size
  number of tuples
  growth factor
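As a back-of-the-envelope sketch of how these three inputs combine, the function below multiplies tuple size by a projected number of tuples. Compound yearly growth is an assumption made here for illustration, as are all the figures; index, page, and DBMS-specific overheads are ignored.

```python
def estimate_bytes(tuple_size, tuple_count, annual_growth, years):
    """Estimate relation size after compound growth in the number of tuples.

    All parameters are illustrative assumptions: tuple_size in bytes,
    annual_growth as a fraction (0.5 means 50% more tuples per year).
    Index, page, and DBMS overheads are ignored.
    """
    projected_tuples = tuple_count * (1 + annual_growth) ** years
    return round(tuple_size * projected_tuples)

# Hypothetical figures: 200-byte tuples, 10,000 tuples today, 50% growth per year, 2 years.
estimate = estimate_bytes(tuple_size=200, tuple_count=10_000, annual_growth=0.5, years=2)
```

With these figures the projection is 22,500 tuples, or about 4.5 MB, before any overhead is added. The estimate is repeated for each base relation and summed.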

Page 170: 4. Databases Design

Step 5 Design User Views

To design the user views that were identified during the Requirements Collection and Analysis stage of the database system development lifecycle.

  Director
  Manager
  Supervisor
  Assistant
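In a relational DBMS, a user view is commonly implemented with CREATE VIEW. The sketch below assumes SQLite through Python and a hypothetical requirement that assistants should not see salaries; the view name, columns, and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
                    position TEXT, salary NUMERIC, branchNo TEXT);
INSERT INTO Staff VALUES
    ('SL21', 'John', 'White', 'Manager',   30000, 'B005'),
    ('SG37', 'Ann',  'Beech', 'Assistant', 12000, 'B003');

-- Hypothetical view for assistants: the salary column is left out.
CREATE VIEW StaffForAssistant AS
    SELECT staffNo, fName, lName, position, branchNo FROM Staff;
""")

cursor = conn.execute("SELECT * FROM StaffForAssistant ORDER BY staffNo")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Each user group (Director, Manager, Supervisor, Assistant) would get views exposing only the attributes and tuples it needs.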

Page 171: 4. Databases Design

Step 6 Design Security Measures

To design the security measures for the database as specified by the users.

System security
  Username and password

Data security
  Access to and use of database objects (relations and other objects)