Designing a Database Week 10: Database Schema...
CSC343 – Introduction to Databases
Week 10: Database Design
Database Design
From an ER Schema to a Relational One
Restructuring an ER schema
Performance Analysis
Analysis of Redundancies, Removing Generalizations
Translation into a Relational Schema
Designing a Database Schema
[Figure: the same data shown in the hierarchical, network, and relational models. In the ER diagram, Supplier supplies Part (with attribute Date) and Customer orders Part; the cardinalities shown include (1,N) and (1,1).]
Relational:
Part(Name, Description, Part#)
Supplier(Name, Addr)
Customer(Name, Addr)
Supplies(Name, Part#, Date)
Orders(Name, Part#)
(Relational) Database Design
o Given a conceptual schema (ER, but it could also be UML), generate a logical (relational) schema.
o This is not just a simple translation from one model to another, for two main reasons:
  o not all the constructs of the Entity-Relationship model can be translated naturally into the relational model;
  o the schema must be restructured in such a way as to make the execution of the projected operations as efficient as possible.
o The topic is covered in Section 3.5 of the textbook.
Database Design Process
Logical Design Steps
o It is helpful to divide the design into two steps:
  o Restructuring of the Entity-Relationship schema, based on criteria for the optimization of the schema and the simplification of the following step;
  o Translation into the logical model, based on the features of the logical model (in our case, the relational model).
Performance Analysis
o An ER schema is restructured to optimize:
  o Cost of an operation (evaluated in terms of the number of occurrences of entities and relationships that are visited during the execution of an operation);
  o Storage requirements (evaluated in terms of the number of bytes necessary to store the data described by the schema).
o In order to study these parameters, we need to know:
  o Projected volume of data;
  o Projected operation characteristics.
Cost Model
o The cost of an operation is measured in terms of the number of disk accesses required. A disk access is, generally, orders of magnitude more expensive than an in-memory access or a CPU operation.
o For a coarse estimate of cost, we assume that
  o a Read operation (for one entity or relationship) requires 1 disk access;
  o a Write operation (for one entity or relationship) requires 2 disk accesses (read from disk, change, write back to disk).
o There are many other cost models, depending on the use and type of DB:
  o Warehouse (OLAP - On-Line Analytical Processing)
  o Operational DB (OLTP - On-Line Transaction Processing)
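The coarse cost model can be written down as a small calculation. The sketch below only encodes the two assumptions stated on the slide (1 access per read, 2 per write); the function names and the sample operation (3 reads, 1 write, 50 executions per day) are hypothetical and chosen purely for illustration.

```python
# Coarse cost model from the slide: a read costs 1 disk access,
# a write costs 2 (read from disk, change, write back).
READ_COST = 1
WRITE_COST = 2


def operation_cost(reads: int, writes: int) -> int:
    """Disk accesses for one execution of an operation."""
    return reads * READ_COST + writes * WRITE_COST


def daily_cost(reads: int, writes: int, frequency_per_day: int) -> int:
    """Disk accesses per day for an operation executed frequency_per_day times."""
    return operation_cost(reads, writes) * frequency_per_day


# Hypothetical operation: visits 3 occurrences for reading and 1 for writing,
# and runs 50 times a day.
example = daily_cost(reads=3, writes=1, frequency_per_day=50)
print(example)  # (3*1 + 1*2) * 50 = 250
```

Comparing such daily totals for two candidate schemas is one simple way to decide between them.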
Employee-Department Example
Typical Operations
o Operation 1: Assign an employee to a project.
o Operation 2: Find an employee record, including her department and the projects she works for.
o Operation 3: Find the records of the employees of a department.
o Operation 4: For each branch, retrieve its departments, and for each department, retrieve the last names of its managers and the list of its employees.
o We need the operations and their volume/frequency.
Workload Design
o During initial design and requirements analysis
  o Estimate operations and frequency
  o Gross estimates at best
o After database is operational
  o Tools record actual workload characteristics
Analysis Steps
Analysis of Redundancies
o A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained through a series of retrieval operations) from other data in the database.
o An Entity-Relationship schema may contain various forms of redundancy.
Examples of Redundancies
Deciding About Redundancies
o The presence of a redundancy in a database may be
  o an advantage: a reduction in the number of accesses necessary to obtain the derived information;
  o a disadvantage: larger storage requirements (though usually at negligible cost) and the need to carry out additional operations in order to keep the derived data consistent.
o The decision to maintain or eliminate a redundancy is made by comparing the cost of the operations that involve the redundant information, and the storage needed, with and without the redundancy.
Cost Comparison: An Example
In this schema the attribute NumberOfInhabitants is redundant.
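The schema for this example is a figure that is not part of the transcript, so the following is only a sketch under assumptions: a Town entity carrying NumberOfInhabitants and a Person entity linked to the town of residence. The table and column names other than NumberOfInhabitants are hypothetical. It shows the trade-off in miniature: the stored count answers a query with one read, but every insertion of a person needs an extra update to keep it consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Town (
        Name TEXT PRIMARY KEY,
        NumberOfInhabitants INTEGER NOT NULL DEFAULT 0  -- redundant: derivable from Person
    );
    CREATE TABLE Person (
        Id   INTEGER PRIMARY KEY,
        Town TEXT NOT NULL REFERENCES Town(Name)
    );
    INSERT INTO Town (Name) VALUES ('Toronto');
""")


def add_person(person_id: int, town: str) -> None:
    """Insert a person and maintain the redundant counter (the extra write)."""
    conn.execute("INSERT INTO Person VALUES (?, ?)", (person_id, town))
    conn.execute(
        "UPDATE Town SET NumberOfInhabitants = NumberOfInhabitants + 1 WHERE Name = ?",
        (town,),
    )


for pid in range(1, 4):
    add_person(pid, "Toronto")

# With the redundancy: read one Town row.
stored = conn.execute(
    "SELECT NumberOfInhabitants FROM Town WHERE Name = 'Toronto'").fetchone()[0]
# Without it: visit every Person row of the town.
derived = conn.execute(
    "SELECT COUNT(*) FROM Person WHERE Town = 'Toronto'").fetchone()[0]
print(stored, derived)
```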
Removing Generalizations
o The relational model does not allow direct representation of generalizations that may be present in an E-R diagram.
o For example, here is an ER schema with generalizations:
Possible Restructurings
[Figure: restructured schemas for Option 1 and Option 2]
...Two More...
[Figure: restructured schemas for Option 3 and Option 4 (combination)]
General Rules For Removing Generalization
o Option 1 is convenient when the operations involve the occurrences and the attributes of E0, E1 and E2 more or less in the same way.
o Option 2 is possible only if the generalization satisfies the coverage constraint (i.e., every instance of E0 is an instance of either E1 or E2) and is useful when there are operations that apply only to occurrences of E1 or E2.
o Option 3 is useful when the generalization is not coverage-compliant and the operations refer to either occurrences and attributes of E1 (E2) or of E0, and therefore make distinctions between child and parent entities.
o Available options can be combined (see option 4)
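The option figures are not part of the transcript, so the sketch below rests on an assumption about what the three options are: Option 1 collapses the children E1 and E2 into the parent E0, Option 2 collapses the parent into the children, and Option 3 keeps all three entities and links them by relationships. This reading fits the rules above (Option 2 needs total coverage because E0 disappears), but it is not confirmed by the text. The attributes K, A0, A1, A2 and the Type column are hypothetical.

```python
import sqlite3

# Hypothetical attributes: E0 has key K and attribute A0; E1 adds A1; E2 adds A2.
OPTION_1 = """
    -- Children collapsed into the parent; Type records which child a row was.
    CREATE TABLE E0 (K INTEGER PRIMARY KEY, A0 TEXT, A1 TEXT, A2 TEXT,
                     Type TEXT CHECK (Type IN ('E1', 'E2')));
"""
OPTION_2 = """
    -- Parent collapsed into the children; each child repeats the parent attributes.
    CREATE TABLE E1 (K INTEGER PRIMARY KEY, A0 TEXT, A1 TEXT);
    CREATE TABLE E2 (K INTEGER PRIMARY KEY, A0 TEXT, A2 TEXT);
"""
OPTION_3 = """
    -- Generalization replaced by relationships from each child to the parent.
    CREATE TABLE E0 (K INTEGER PRIMARY KEY, A0 TEXT);
    CREATE TABLE E1 (K INTEGER PRIMARY KEY REFERENCES E0(K), A1 TEXT);
    CREATE TABLE E2 (K INTEGER PRIMARY KEY REFERENCES E0(K), A2 TEXT);
"""


def tables_for(schema: str) -> list:
    """Create the schema in a fresh in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
    conn.close()
    return [r[0] for r in rows]


for label, schema in [("Option 1", OPTION_1), ("Option 2", OPTION_2), ("Option 3", OPTION_3)]:
    print(label, tables_for(schema))
```

Under this reading, Option 1 trades null values for fewer tables, while Option 3 avoids nulls at the price of a join whenever parent and child attributes are needed together.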
Partitioning and Merging of Entities and Relationships
o Entities and relationships of an E-R schema can be partitioned or merged to improve the efficiency of operations, using the following principle:
  Accesses are reduced by separating attributes of the same concept that are accessed by different operations and by merging attributes of different concepts that are accessed by the same operations.
o The same criteria as those discussed for redundancies are valid in making a decision about this type of restructuring.
Example of Partitioning
Recall this is a weak entity
Deletion of Multi-Valued Attribute
Merging Entities
Two candidate keys
Partitioning of a Relationship
Suppose that composition represents current and past compositions of a team
Are these equivalent?
Selecting a Primary Key
o Every relation must have a unique primary key.
o The criteria for this decision are as follows:
  o Attributes with null values cannot form primary keys;
  o One or a few attributes are preferable to many attributes;
  o An internal key is preferable to an external one (weak entity);
  o A key that is used by many operations to access the instances of an entity is preferable to others.
o At this stage, if none of the candidate keys satisfies the above requirements, it may be best to introduce a new attribute (e.g., social insurance #, student #,…)
Translation into a Logical Schema
o The second step of logical design consists of a translation between different data models.
o Starting from an E-R schema, an equivalent relational schema is constructed. By “equivalent”, we mean a schema capable of representing the same information.
o We will deal with the translation problem systematically, beginning with the fundamental case, that of entities linked by many-to-many relationships.
Many-to-Many Relationships
Employee(Number, Surname, Salary)
Project(Code, Name, Budget)
Participation(Number, Code, StartDate)
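As a sketch of how these three relations might be declared, the following uses SQLite through Python. The column types, the sample rows, and the choice of (Number, Code) as the key of Participation are assumptions; the transcript does not preserve the key underlining from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Employee (Number INTEGER PRIMARY KEY, Surname TEXT, Salary REAL);
    CREATE TABLE Project  (Code TEXT PRIMARY KEY, Name TEXT, Budget REAL);
    -- The relationship becomes its own table, keyed by both entity identifiers.
    CREATE TABLE Participation (
        Number    INTEGER NOT NULL REFERENCES Employee(Number),
        Code      TEXT    NOT NULL REFERENCES Project(Code),
        StartDate TEXT,
        PRIMARY KEY (Number, Code)
    );
    INSERT INTO Employee VALUES (1, 'Rossi', 50000), (2, 'Chen', 60000);
    INSERT INTO Project  VALUES ('P1', 'Alpha', 100000), ('P2', 'Beta', 200000);
    INSERT INTO Participation VALUES
        (1, 'P1', '2024-01-01'), (1, 'P2', '2024-02-01'), (2, 'P1', '2024-03-01');
""")

# One employee on many projects, and one project with many employees.
projects_of_1 = [r[0] for r in conn.execute(
    "SELECT Code FROM Participation WHERE Number = 1 ORDER BY Code")]
staff_of_p1 = [r[0] for r in conn.execute(
    "SELECT Number FROM Participation WHERE Code = 'P1' ORDER BY Number")]
print(projects_of_1, staff_of_p1)
```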
Many-to-Many Recursive Relationships
Product(Code, Name, Cost)
Composition(Part, SubPart, Quantity)
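A recursive relationship maps to a table whose two key columns both reference the same entity, once per role. The sketch below assumes (Part, SubPart) is the key and uses made-up products; the recursive query at the end is an illustration of what the structure makes possible, not something the slide prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (Code TEXT PRIMARY KEY, Name TEXT, Cost REAL);
    -- Both key columns reference Product, in the roles of part and sub-part.
    CREATE TABLE Composition (
        Part     TEXT    NOT NULL REFERENCES Product(Code),
        SubPart  TEXT    NOT NULL REFERENCES Product(Code),
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (Part, SubPart)
    );
    INSERT INTO Product VALUES
        ('BIKE', 'Bicycle', 300), ('WHEEL', 'Wheel', 40), ('SPOKE', 'Spoke', 1);
    INSERT INTO Composition VALUES ('BIKE', 'WHEEL', 2), ('WHEEL', 'SPOKE', 32);
""")

# Every direct and indirect sub-part of a bicycle,
# with quantities multiplied along the path.
bom = conn.execute("""
    WITH RECURSIVE Bom(SubPart, Qty) AS (
        SELECT SubPart, Quantity FROM Composition WHERE Part = 'BIKE'
        UNION ALL
        SELECT c.SubPart, b.Qty * c.Quantity
        FROM Composition AS c JOIN Bom AS b ON c.Part = b.SubPart
    )
    SELECT SubPart, SUM(Qty) FROM Bom GROUP BY SubPart ORDER BY SubPart
""").fetchall()
print(bom)
```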
Ternary Relationships
Supplier(SupplierID, SupplierName)
Product(Code, Type)
Department(Name, Telephone)
Supply(Supplier, Product, Department, Quantity)
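For a ternary relationship the relationship table references all three entities. In this sketch the key of Supply is assumed to be the full triple (Supplier, Product, Department), and the sample data is invented; the point is that the same supplier and product may serve several departments, while a repeated triple is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier   (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT);
    CREATE TABLE Product    (Code TEXT PRIMARY KEY, Type TEXT);
    CREATE TABLE Department (Name TEXT PRIMARY KEY, Telephone TEXT);
    CREATE TABLE Supply (
        Supplier   INTEGER NOT NULL REFERENCES Supplier(SupplierID),
        Product    TEXT    NOT NULL REFERENCES Product(Code),
        Department TEXT    NOT NULL REFERENCES Department(Name),
        Quantity   INTEGER,
        PRIMARY KEY (Supplier, Product, Department)
    );
    INSERT INTO Supplier VALUES (1, 'Acme');
    INSERT INTO Product VALUES ('X1', 'Bolt');
    INSERT INTO Department VALUES ('Assembly', '555-0100'), ('Repair', '555-0101');
    INSERT INTO Supply VALUES (1, 'X1', 'Assembly', 100), (1, 'X1', 'Repair', 20);
""")

# A repeated (supplier, product, department) triple violates the primary key.
try:
    conn.execute("INSERT INTO Supply VALUES (1, 'X1', 'Assembly', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

supply_rows = conn.execute("SELECT COUNT(*) FROM Supply").fetchone()[0]
print(duplicate_rejected, supply_rows)
```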
One-to-Many Relationships
Player(Surname, DateOfBirth, Position)
Team(Name, Town, TeamColours)
Contract(PlayerSurname, PlayerDateOfBirth, Team, Salary)
OR
Player(Surname, DateOfBirth, Position, TeamName, Salary)
Team(Name, Town, TeamColours)
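The second alternative folds the relationship into the entity on the "one" side. A minimal sketch follows, assuming (Surname, DateOfBirth) identifies a player and Name identifies a team; the types and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (Name TEXT PRIMARY KEY, Town TEXT, TeamColours TEXT);
    -- The contract is stored with the player: TeamName and Salary
    -- are the relationship's contribution to the Player table.
    CREATE TABLE Player (
        Surname     TEXT,
        DateOfBirth TEXT,
        Position    TEXT,
        TeamName    TEXT REFERENCES Team(Name),
        Salary      REAL,
        PRIMARY KEY (Surname, DateOfBirth)
    );
    INSERT INTO Team VALUES ('Lions', 'Toronto', 'blue');
    INSERT INTO Player VALUES
        ('Smith', '1995-04-01', 'goalkeeper', 'Lions', 90000),
        ('Jones', '1998-09-12', 'forward', 'Lions', 120000);
""")

# Each player row holds at most one TeamName, while a team
# can appear in many player rows.
roster = [r[0] for r in conn.execute(
    "SELECT Surname FROM Player WHERE TeamName = 'Lions' ORDER BY Surname")]
print(roster)
```

Compared with a separate Contract table, this saves a join when a player is read together with the team, at the cost of null values if some players have no contract.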
Weak Entities
Student(RegistrationNumber, University, Surname, EnrolmentYear)
University(Name, Town, Address)
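A weak entity borrows part of its key from its owner. The sketch below assumes the key of Student is (RegistrationNumber, University), with University referencing University(Name); the universities and students are invented. The same registration number can then occur at two universities, but a student cannot refer to a university that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check REFERENCES
conn.executescript("""
    CREATE TABLE University (Name TEXT PRIMARY KEY, Town TEXT, Address TEXT);
    CREATE TABLE Student (
        RegistrationNumber INTEGER NOT NULL,
        University         TEXT    NOT NULL REFERENCES University(Name),
        Surname            TEXT,
        EnrolmentYear      INTEGER,
        PRIMARY KEY (RegistrationNumber, University)
    );
    INSERT INTO University VALUES
        ('Alpha University', 'Springfield', '1 Main St'),
        ('Beta University', 'Shelbyville', '2 High St');
    INSERT INTO Student VALUES
        (1001, 'Alpha University', 'Lee', 2023),
        (1001, 'Beta University', 'Khan', 2024);
""")

same_number = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE RegistrationNumber = 1001").fetchone()[0]

# A student of an unknown university violates the foreign key.
try:
    conn.execute("INSERT INTO Student VALUES (1002, 'Gamma University', 'Diaz', 2024)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(same_number, orphan_rejected)
```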
One-to-One Relationships
Head(Number, Name, Salary, Department, StartDate)
Department(Name, Telephone, Branch)
OR
Head(Number, Name, Salary, StartDate)
Department(Name, Telephone, HeadNumber, Branch)
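Either alternative works because the relationship can be folded into whichever side is convenient. What makes it one-to-one rather than one-to-many is that the foreign key must also be unique. The sketch below shows the first alternative, with assumed types and sample rows; the UNIQUE constraint is an assumption about how the cardinality would be enforced, since the slide gives only the attribute lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Name TEXT PRIMARY KEY, Telephone TEXT, Branch TEXT);
    -- The relationship is folded into Head.
    -- UNIQUE keeps a department from having more than one head.
    CREATE TABLE Head (
        Number     INTEGER PRIMARY KEY,
        Name       TEXT,
        Salary     REAL,
        Department TEXT NOT NULL UNIQUE REFERENCES Department(Name),
        StartDate  TEXT
    );
    INSERT INTO Department VALUES ('Sales', '555-0100', 'North');
    INSERT INTO Head VALUES (1, 'Ada', 90000, 'Sales', '2024-01-01');
""")

# A second head for the same department is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Head VALUES (2, 'Bob', 85000, 'Sales', '2024-06-01')")
    second_head_rejected = False
except sqlite3.IntegrityError:
    second_head_rejected = True
print(second_head_rejected)
```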
Optional One-to-One Relationships
Employee(Number, Name, Salary)
Department(Name, Telephone, Branch, Head, StartDate)
Or, if both entities are optional
Employee(Number, Name, Salary)
Department(Name, Telephone, Branch)
Management(Head, Department, StartDate)
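When participation is optional on both sides, a separate Management table avoids null values in either entity table. The following sketch assumes Head is the key of Management and Department is unique within it; types and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee   (Number INTEGER PRIMARY KEY, Name TEXT, Salary REAL);
    CREATE TABLE Department (Name TEXT PRIMARY KEY, Telephone TEXT, Branch TEXT);
    -- One row per management assignment; employees who manage nothing and
    -- departments without a head simply have no row here.
    CREATE TABLE Management (
        Head       INTEGER PRIMARY KEY REFERENCES Employee(Number),
        Department TEXT NOT NULL UNIQUE REFERENCES Department(Name),
        StartDate  TEXT
    );
    INSERT INTO Employee VALUES (1, 'Ada', 90000), (2, 'Bob', 60000);
    INSERT INTO Department VALUES
        ('Sales', '555-0100', 'North'), ('Research', '555-0101', 'South');
    INSERT INTO Management VALUES (1, 'Sales', '2024-01-01');
""")

headless = [r[0] for r in conn.execute("""
    SELECT d.Name FROM Department AS d
    LEFT JOIN Management AS m ON m.Department = d.Name
    WHERE m.Head IS NULL ORDER BY d.Name""")]
non_managers = [r[0] for r in conn.execute("""
    SELECT e.Name FROM Employee AS e
    LEFT JOIN Management AS m ON m.Head = e.Number
    WHERE m.Head IS NULL ORDER BY e.Name""")]
print(headless, non_managers)
```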
A Sample ER Schema
Entities with Internal Identifiers
[Figure: sample ER schema with entities E3, E4, E5, E6]
E3(A31, A32)
E4(A41, A42)
E5(A51, A52)
E6(A61, A62, A63)
1-1 and Optional 1-1 Relationships
[Figure: sample ER schema with entities E3, E4, E5, E6 and relationships R3, R4, R5]
E5(A51, A52, A61R3, A62R3, AR3, A61R4, A62R4, A61R5, A62R5, AR5)
1-1 or optional 1-1 relationships can lead to messy transformations
Weak Entities
[Figure: sample ER schema with entities E1 to E6 and relationships R1, R3, R4, R5, R6]
E1(A11, A51, A12)
E2(A21, A11, A51, A22)
Many-to-Many Relationships
[Figure: sample ER schema with entities E1 to E6 and relationships R1 to R6]
R2(A21, A11, A51, A31, A41, AR21, AR22)
Homework - Fill in a Translation
E1(A11, A51, A12)
E2(A21, A11, A51, A22)
E3(A31, A32)
E4(A41, A42)
E5(A51, A52, A61R3, A62R3, AR3, A61R4, A62R4, A61R5, A62R5, AR5)
E6(A61, A62, A63)
R2(A21, A11, A51, A31, A41, AR21, AR22)
Summary of Transformation Rules
...More Rules...
…Even More Rules...
…and the Last One...
The Training Company Revisited
Logical Design Using CASE Tools
o The logical design phase is partially supported by database design tools:
  o the translation to the relational model is carried out by such tools semi-automatically;
  o the restructuring step is difficult to automate and CASE tools provide little or no support for it.
o Most commercial CASE tools will automatically generate SQL code for the creation of the database.
o Some tools allow direct connection with a DBMS and can construct the corresponding database automatically.
[CASE = Computer-Aided Software Engineering]
Logical Design with a CASE Tool