Designing a Database Week 10: Database Schema...

12
1 Database Design — 1 CSC343 – Introduction to Databases Week 10: Database Design Database Design Database Design From an ER Schema to a Relational From an ER Schema to a Relational One One Restructuring an ER schema Restructuring an ER schema Performance Analysis Performance Analysis Analysis of Redundancies, Removing Analysis of Redundancies, Removing Generalizations Generalizations Translation into a Relational Schema Translation into a Relational Schema Database Design — 2 CSC343 – Introduction to Databases Supplier Part supplies Customer orders (1,N) (1,N) Part name Supplier Part Customer Part(Name,Description,Part# ) Supplier(Name , Addr) Customer(Name , Addr) Supplies(Name ,Part#, Date ) Orders(Name ,Part# ) Hierarchical Network Relational Supplier Customer Date (1,N) (1,1) Designing a Database Designing a Database Schema Schema Database Design — 3 CSC343 – Introduction to Databases (Relational) Database Design Given a conceptual schema (ER, but could also be a UML), generate a logical (relational) schema. This is not just a simple translation from one model to another for two main reasons: o not all the constructs of the Entity-Relationship model can be translated naturally into the relational model; o the schema must be restructured in such a way as to make the execution of the projected operations as efficient as possible. The topic is covered in Section 3.5 of the textbook. Database Design — 4 CSC343 – Introduction to Databases Database Design Process

Transcript of Designing a Database Week 10: Database Schema...

Page 1: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

1

Database Design — 1CSC343 – Introduction to Databases

Week 10: DatabaseDesign

Database DesignDatabase DesignFrom an ER Schema to a RelationalFrom an ER Schema to a Relational

OneOneRestructuring an ER schemaRestructuring an ER schema

Performance AnalysisPerformance AnalysisAnalysis of Redundancies, RemovingAnalysis of Redundancies, Removing

GeneralizationsGeneralizationsTranslation into a Relational SchemaTranslation into a Relational Schema

Database Design — 2CSC343 – Introduction to Databases

Supplier

Part

supplies

Customerorders(1,N)(1,N)

Partname

Supplier

Part Customer

Part(Name,Description,Part#)Supplier(Name, Addr)Customer(Name, Addr)Supplies(Name,Part#, Date)Orders(Name,Part#)

HierarchicalNetwork

Relational

Supplier Customer

Date

(1,N)

(1,1)

Designing a DatabaseDesigning a DatabaseSchemaSchema

Database Design — 3CSC343 – Introduction to Databases

(Relational) Database Design Given a conceptual schema (ER, but could also be

a UML), generate a logical (relational) schema. This is not just a simple translation from one

model to another for two main reasons:o not all the constructs of the Entity-Relationship

model can be translated naturally into therelational model;

o the schema must be restructured in such a wayas to make the execution of the projectedoperations as efficient as possible.

The topic is covered in Section 3.5 of the textbook.

Database Design — 4CSC343 – Introduction to Databases

DatabaseDesign

Process

Page 2: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

2

Database Design — 5CSC343 – Introduction to Databases

Logical Design Stepso It is helpful to divide the design into

two steps:o Restructuring of the Entity-

Relationship schema, based oncriteria for the optimization of theschema and the simplification of thefollowing step;

o Translation into the logicalmodel, based on the features of thelogical model (in our case, therelational model).

Database Design — 6CSC343 – Introduction to Databases

Performance Analysis An ER schema is restructured to optimize:

Cost of an operation (evaluated in terms of thenumber of occurrences of entities and relationshipsthat are visited during the execution of anoperation);

Storage requirements (evaluated in terms ofnumber of bytes necessary to store the datadescribed by the schema).

In order to study these parameters, we need to know:Projected volume of data;Projected operation characteristics.

Database Design — 7CSC343 – Introduction to Databases

Cost Model The cost of an operation is measured in terms of

the number of disk accesses required. A diskaccess is, generally, orders of magnitude moreexpensive than in-memory accesses, or CPUoperations.

For a coarse estimate of cost, we assume thata Read operation (for one entity or

relationship) requires 1 disk access;A Write operation (for one entity or

relationship) requires 2 disk accesses (readfrom disk, change, write back to disk).

There are many other cost models depending onuse and type of DBWarehouse (OLAP - On-Line Analysis

Processing)Operational DB (OLTP - On-Line Transaction

Processing)Database Design — 8CSC343 – Introduction to Databases

Employee-Department Example

Page 3: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

3

Database Design — 9CSC343 – Introduction to Databases

Typical Operations

Operation 1: Assign an employee to a project. Operation 2: Find an employee record, including her

department, and the projects she works for. Operation 3: Find records of employees for a

department. Operation 4: For each branch, retrieve its

departments, and for each department, retrieve thelast names of their managers, and the list of theiremployees.

Need operations and their volume/frequency

Database Design — 10CSC343 – Introduction to Databases

Workload Design During initial design and requirements

analysisEstimate operations and frequencyGross estimates at best

After database is operationalTools record actual workload

characteristics

Database Design — 11CSC343 – Introduction to Databases

AnalysisSteps

Database Design — 12CSC343 – Introduction to Databases

Analysis of RedundanciesA redundancy in a conceptual

schema corresponds to a piece ofinformation that can be derived(that is, obtained through a seriesof retrieval operations) from otherdata in the database.

An Entity-Relationship schemamay contain various forms ofredundancy.

Page 4: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

4

Database Design — 13CSC343 – Introduction to Databases

Examples of Redundancies

Database Design — 14CSC343 – Introduction to Databases

Deciding About Redundancies The presence of a redundancy in a database

may bean advantage: a reduction in the number of

accesses necessary to obtain the derivedinformation;

a disadvantage: because of larger storagerequirements, (but, usually at negligible cost) andthe necessity to carry out additional operationsin order to keep the derived data consistent.

The decision to maintain or eliminate aredundancy is made by comparing the cost ofoperations that involve the redundantinformation and the storage needed, in thecase of presence or absence of redundancy.

Database Design — 15CSC343 – Introduction to Databases

Cost Comparison: An Example

In this schema the attributeNumberOfInhabitants isredundant.

Database Design — 16CSC343 – Introduction to Databases

Removing Generalizations The relational model does not allow direct

representation of generalizations that maybe present in an E-R diagram.

For example, here is an ER schema withgeneralizations:

Page 5: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

5

Database Design — 17CSC343 – Introduction to Databases

Option 1

Option 2

PossibleRestructuring

s

Note!

Database Design — 18CSC343 – Introduction to Databases

...TwoMore...

Option 4 (combination)

Option 3

Database Design — 19CSC343 – Introduction to Databases

General Rules For RemovingGeneralization

Option 1 is convenient when the operations involvethe occurrences and the attributes of E0, E1 and E2more or less in the same way.

Option 2 is possible only if the generalizationsatisfies the coverage constraint (i.e., everyinstance of E0 is either an instance of E1 or E2) andis useful when there are operations that apply onlyto occurrences of E1 or E2.

Option 3 is useful when the generalization is notcoverage-compliant and the operations refer toeither occurrences and attributes of E1 (E2) or ofE0, and therefore make distinctions between childand parent entities.

Available options can be combined (see option 4)

Database Design — 20CSC343 – Introduction to Databases

Partitioning and Merging ofEntities and Relationships

Entities and relationships of an E-R schema can bepartitioned or merged to improve the efficiency ofoperations, using the following principle:

The same criteria with those discussed forredundancies are valid in making a decision aboutthis type of restructuring.

Accesses are reduced by separatingattributes of the same concept that areaccessed by different operations andby merging attributes of differentconcepts that are accessed by thesame operations.

Page 6: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

6

Database Design — 21CSC343 – Introduction to Databases

Example of Partitioning

Recall this is a weak entity

Database Design — 22CSC343 – Introduction to Databases

Deletion of Multi-Valued Attribute

Database Design — 23CSC343 – Introduction to Databases

Merging Entities

Two candidate keys

Database Design — 24CSC343 – Introduction to Databases

Partitioning of a Relationship

Suppose thatcompositionrepresentscurrent and

pastcompositions

of a team

Are these equivalent?

Page 7: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

7

Database Design — 25CSC343 – Introduction to Databases

Selecting a Primary Key Every relation must have a unique primary

key. The criteria for this decision are as follows:

Attributes with null values cannot form primarykeys;

One/few attributes is preferable to manyattributes;

Internal key preferable to external ones (weakentity);

A key that is used by many operations toaccess the instances of an entity is preferableto others.

At this stage, if none of the candidate keyssatisfies the above requirements, it may bebest to introduce a new attribute (e.g., socialinsurance #, student #,…)

Database Design — 26CSC343 – Introduction to Databases

Translation into a Logical Schema

The second step of logical design consists of atranslation between different data models.

Starting from an E-R schema, an equivalentrelational schema is constructed. By“equivalent”, we mean a schema capable ofrepresenting the same information.

We will deal with the translation problemsystematically, beginning with thefundamental case, that of entities linked bymany-to-many relationships.

Database Design — 27CSC343 – Introduction to Databases

Many-to-Many Relationships

Employee(Number, Surname, Salary)Project(Code, Name, Budget)

Participation(Number, Code, StartDate)

Database Design — 28CSC343 – Introduction to Databases

Many-to-Many RecursiveRelationships

Product(Code, Name, Cost)Composition(Part, SubPart, Quantity)

Page 8: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

8

Database Design — 29CSC343 – Introduction to Databases

Ternary Relationships

Supplier(SupplierID, SupplierName)Product(Code, Type)

Department(Name, Telephone)Supply(Supplier, Product, Department, Quantity)

Database Design — 30CSC343 – Introduction to Databases

One-to-Many Relationships

Player(Surname, DateOfBirth, Position)Team(Name, Town, TeamColours)

Contract(PlayerSurname, PlayerDateOfBirth,Team, Salary)

OR

Player(Surname, DateOfBirth, Position,TeamName, Salary)

Team(Name, Town, TeamColours)

Database Design — 31CSC343 – Introduction to Databases

Weak Entities

Student(RegistrationNumber, University,Surname, EnrolmentYear)

University(Name, Town, Address)

Database Design — 32CSC343 – Introduction to Databases

One-to-One Relationships

Head(Number, Name, Salary, Department,StartDate)

Department(Name, Telephone, Branch)OR

Head(Number, Name, Salary, StartDate)Department(Name, Telephone,

HeadNumber, Branch)

Page 9: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

9

Database Design — 33CSC343 – Introduction to Databases

Optional One-to-OneRelationships

Employee(Number, Name, Salary)Department(Name, Telephone, Branch, Head,

StartDate)Or, if both entities are optional

Employee(Number, Name, Salary)Department(Name, Telephone, Branch)

Management(Head, Department, StartDate)

Database Design — 34CSC343 – Introduction to Databases

A Sample ER Schema

Database Design — 35CSC343 – Introduction to Databases

Entities with Internal Identifiers

E3(A31, A32)E4(A41, A42)E5(A51, A52)

E6(A61, A62, A63)E3

E4

E5 E6

Database Design — 36CSC343 – Introduction to Databases

1-1 and Optional 1-1 Relationships

E5 E6

E3

E4

R3

R4

R5

E5(A51, A52, A61R3, A62R3, AR3, A61R4,A62R4, A61R5, A62R5, AR5)

1-1 or optional 1-1relationships can

lead to messytransformations

Page 10: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

10

Database Design — 37CSC343 – Introduction to Databases

Weak Entities

E5 E6

E3

E4

R3

R4

R5

R1

R6

E1

E2E1(A11, A51, A12)

E2(A21, A11, A51, A22)

Database Design — 38CSC343 – Introduction to Databases

Many-to-Many Relationships

E5E1

E2

E3

E4

E6

R3

R4

R5

R1

R6

R2

R2(A21, A11, A51, A31, A41, AR21, AR22)

Database Design — 39CSC343 – Introduction to Databases

Homework - Fill in a Translation

E1(A11, A51, A12)E2(A21, A11, A51, A22)

E3(A31, A32)E4(A41,A42)

E5(A51, A52, A61R3, A62R3, AR3, A61R4,A62R4, A61R5, A62R5, AR5)

E6(A61, A62, A63)R2(A21, A11, A51, A31, A41, AR21, AR22)

Database Design — 40CSC343 – Introduction to Databases

Summary of Transformation Rules

Page 11: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

11

Database Design — 41CSC343 – Introduction to Databases

...More Rules...

Database Design — 42CSC343 – Introduction to Databases

…Even More Rules...

Database Design — 43CSC343 – Introduction to Databases

…and the Last One...

Database Design — 44CSC343 – Introduction to Databases

The Training Company Revisited

Page 12: Designing a Database Week 10: Database Schema Designfaye/343/w08/lectures/wk10/10_DBDesignStG-4u… · Two candidate keys CSC343 ... Some tools allow direct connection with a DBMS

12

Database Design — 45CSC343 – Introduction to Databases

Logical Design Using CASE Tools The logical design phase is partially supported by

database design tools: the translation to the relational model is carried out

by such tools semi-automatically; the restructuring step is difficult to automate and

CASE tools provide little or no support for it. Most commercial CASE tools will generate automatically

SQL code for the creation of the database. Some tools allow direct connection with a DBMS and

can construct the corresponding databaseautomatically.

[CASE = Computer-Aided Software Engineering]

Database Design — 46CSC343 – Introduction to Databases

LogicalDesign with aCASETool