CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 1
Database Systems I
The Entity-Relationship Model
Overview of Database Development
Requirements Analysis
• What data are to be stored in the enterprise?
• What are the required applications?
• What are the most important operations?
High-level database design
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints or business rules that hold?
ER model or UML to represent high-level design
Overview of Database Development
Conceptual database design
• What data model to implement the DBS? E.g., relational data model
• Map the high-level design (e.g., ER diagram) to a (conceptual) database schema of the chosen data model.
Physical database design
• What DBMS to use?
• What are the typical workloads of the DBS?
• Build indexes to support efficient query processing.
• What redesign of the conceptual database schema is necessary from the point of view of efficient implementation?
Overview of Database Development
Requirements Analysis / Ideas
High-Level Database Design
Conceptual Database Design / Relational Database Schema
Physical Database Design / Relational DBMS
Similar to software development
Entity-Relationship Model
Short: ER model.
A lot of similarities with other modeling languages such as UML.
Concepts
• Entities / Entity sets,
• Attributes,
• Relationships / Relationship sets, and
• Constraints.
Offers more modeling concepts than the relational data model (which only offers relations).
Closer to the way in which people think.
Entity-Relationship Diagrams
An Entity-Relationship diagram (ER diagram) is a graph with nodes representing entity sets, attributes and relationship sets.
• Entity sets denoted by rectangles.
• Attributes denoted by ovals.
• Relationship sets denoted by diamonds.
• Edges (lines) connect entity sets to their attributes and relationship sets to their entity sets.
Entities and Entity Sets
Entity: Real-world object distinguishable from other objects, e.g. employee Miller. An entity can be a physical or an abstract object.
An entity is associated with attributes describing its properties.
Attribute values are atomic, e.g. strings, integers or real numbers.
Some variations of the ER model support structured attributes.
Entity set: A collection of similar entities, e.g. all employees.
Entities and Entity Sets
All entities in an entity set have the same set of attributes. (At least, for the moment!)
Each entity set has a key, i.e. a minimal set of attributes to uniquely identify an entity of this set. Key attributes are underlined.
Each attribute has a domain, i.e. a set of all possible attribute values.
[Figure: entity set Employees with attributes ssn, name, age]
Entities and Entity Sets
A key must be unique across all possible (not just the current) entities of its set.
A key can consist of more than one attribute.
There can be more than one key for a given entity set, but we choose one (primary key) for the ER diagram.
[Figure: entity set Employees with attributes firstname, lastname, birthdate, salary]
Relationships and Relationship Sets
Relationship: Association among two or more entities. E.g., Miller works in Pharmacy department.
Relationship set: Collection of similar relationships among two or more entity sets.
[Figure: relationship set Works_In between Employees (ssn, name, age) and Departments (did, dname, budget)]
Relationships and Relationship Sets
An n-ary relationship set R relates n entity sets E1, ..., En.
Each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
Binary relationship sets are most common.
The same entity set can participate in different relationship sets, or in different “roles” in the same set.
[Figure: relationship set Reports_To on Employees (ssn, name, age), which participates in the two roles subordinate and supervisor]
Relationships and Relationship Sets
Relationship sets can also have attributes.
Useful for properties that cannot reasonably be associated with one of the participating entity sets.
[Figure: relationship set Works_In with attribute since, between Employees (ssn, name, age) and Departments (did, dname, budget)]
Instances of an ER Diagram
Entity set contains a set of entities.
Each entity has one value for each of its attributes.
No duplicate instances.

Employees
ssn        name            age
12345678   “John Miller”   30
14789632   “Paul Li”       25
. . .      . . .           . . .
Instances of an ER Diagram
Relationship set contains a set (no duplicates!) of relationships, each relating a set of entities, one from each of the participating entity sets.
Components are entities, not attribute values.

Works_In
Employee (ssn)   Department (did)
12345678         1
14789632         1
56756322         2
. . .            . . .
Relationships and Relationship Sets
Multiway relationship sets (n > 2) are used whenever binary relationships cannot capture the application semantics.
[Figure: ternary relationship set Works_For among Employees (ssn, name, age), Projects (pid, pbudget) and Tasks (tid, description)]
Relationships and Relationship Sets
[Figure: ternary relationship set Works_For among Employees (ssn, name, age), Projects (pid, pbudget) and Tasks (tid, description)]

Works_For
Employee (ssn)   Tasks (tid)   Project (pid)
12345678         1000          101
12345678         1500          106
56756322         1500          106
. . .            . . .         . . .
Multiplicity of Relationships
An employee can work in many departments; a dept can have many employees.
Each dept has at most one manager, who may manage several (many) departments.
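[Figure: relationship set Works_In with attribute since, between Employees (ssn, name, age) and Departments (did, dname, budget)]
[Figure: relationship set Manages with attribute since, between Employees (ssn, name, age) and Departments (did, dname, budget)]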
Multiplicity of Relationships
The different types of (binary) relationships from a multiplicity point of view:
• One to one
• One to many
• Many to one
• Many to many
[Figure: four panels illustrating one-to-one, one-to-many, many-to-one and many-to-many relationship sets]
Key Constraints
A key constraint on a relationship set specifies that the marked entity set participates in at most one relationship of this relationship set.
Entity set is marked with an arrow.
[Figure: relationship set Manages with attribute since, between Employees (ssn, name, age) and Departments (did, dname, budget); an arrow marks the key constraint]
Participation Constraints
A participation constraint on a relationship set specifies that the marked entity set participates in at least one relationship of this relationship set.
Entity set is marked with a bold line.
[Figure: relationship sets Works_In (since) and Manages (since) between Employees (ssn, name, age) and Departments (did, dname, budget); a bold line marks the participation constraint]
Weak Entities
A weak entity exists only in the context of another (owner) entity.
The weak entity can be identified uniquely only by considering the primary key of the owner and its own partial key.
• Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• Weak entity set must have total participation in this supporting relationship set.
[Figure: relationship set Policy with attribute cost, between owner entity set Employees (ssn, name, age) and weak entity set Dependents (name, age)]
Subclasses
Sometimes an entity set contains entities that share many, but not all, properties with the rest of the entity set. In this case, we want to define class (entity set) hierarchies.
A ISA B: every A entity is also considered to be a B entity. A specializes B, B generalizes A.
A is called subclass, B is called superclass.
A subclass inherits the attributes of a superclass, and may define additional attributes.
Subclasses
[Figure: Employees (ssn, name, age) with ISA hierarchy to subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
Hourly_Emps and Contract_Emps inherit the ssn (key!), name and age attributes from Employees.
They define additional attributes hourly_wages, hours_worked and contractid, resp.
Subclasses
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity?
(Hourly_Emps OVERLAPS Contract_Emps)
Covering constraints: Does every Employees entity have to be either an Hourly_Emps or a Contract_Emps entity?
Hourly_Emps AND Contract_Emps COVER Employees
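The two constraints can be read as simple set conditions on the instances of the subclasses. The following is a minimal sketch in Python, assuming each entity set is represented by the set of its key values; the ssn values and the helper names `is_subclass`, `overlaps` and `covers` are made up for illustration.

```python
# Hypothetical instances: each set holds the ssn values of its entities.
employees = {"111", "222", "333"}
hourly_emps = {"111", "222"}
contract_emps = {"222", "333"}


def is_subclass(sub, sup):
    """A ISA B: every A entity is also a B entity."""
    return sub <= sup


def overlaps(a, b):
    """Overlap: at least one entity belongs to both subclasses."""
    return bool(a & b)


def covers(subclasses, sup):
    """Covering: every superclass entity belongs to some subclass."""
    return sup <= set().union(*subclasses)


isa_ok = is_subclass(hourly_emps, employees) and is_subclass(contract_emps, employees)
has_overlap = overlaps(hourly_emps, contract_emps)        # employee 222 is in both
is_covered = covers([hourly_emps, contract_emps], employees)
print(isa_ok, has_overlap, is_covered)
```

With these sample instances the overlap is allowed to occur and the subclasses cover Employees; removing "333" from contract_emps would break the covering condition.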
Subclasses
There are several good reasons for using ISA relationships and subclasses:
• Do not have to redefine all the attributes.
• Can add descriptive attributes specific to a subclass.
• To identify entity sets that participate in a relationship set as precisely as possible.
ISA relationships form a tree structure (taxonomy) with one entity set serving as root.
Design Principles
Faithfulness
• Design must be faithful to the specification / reality.
• Relevant aspects of reality must be represented in the model.
Avoiding redundancy
• Redundant representation blows up ER diagram and makes it harder to understand.
• Redundant representation wastes storage.
• Redundancy may lead to inconsistencies in the database.
Design Principles
Keep it simple
• The simpler, the easier to understand for some (external) reader of the ER diagrams.
• Avoid introducing more elements than necessary.
• If possible, prefer attributes over entity sets and relationship sets.
Formulate constraints as far as possible
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.
High-Level Design With ER Model
Major design choices
• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• What relationships to use: binary or ternary?
Entity vs. Attribute
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
• If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
• If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).
Entity vs. Attribute
Works_In2 does not allow an employee to work in the same department for two or more periods (why?).
We want to record several values of the descriptive attributes for each instance of this relationship.
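[Figure: relationship set Works_In2 with attributes from and to, between Employees (ssn, name, lot) and Departments (did, dname, budget)]
[Figure: ternary relationship set Works_In3 among Employees (ssn, name, lot), Departments (did, dname, budget) and Duration (from, to)]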
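The "why?" can be checked with a small experiment. The sketch below uses Python's sqlite3 module as a stand-in DBMS. The table layouts are an assumed translation of the two diagrams, and the column names from_date and to_date are hypothetical replacements for from and to (which are reserved words in SQL). In Works_In2 a relationship is identified by the pair (ssn, did) alone; in Works_In3 the Duration entity is part of the relationship, so its key joins the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Works_In2: from/to are attributes of the relationship, key is (ssn, did).
    CREATE TABLE Works_In2 (
        ssn CHAR(11), did INTEGER, from_date DATE, to_date DATE,
        PRIMARY KEY (ssn, did));
    -- Works_In3: Duration participates as an entity set, so it is part of the key.
    CREATE TABLE Works_In3 (
        ssn CHAR(11), did INTEGER, from_date DATE, to_date DATE,
        PRIMARY KEY (ssn, did, from_date, to_date));
""")


def add_period(table, ssn, did, start, end):
    """Try to record one work period; return True if the DBMS accepts it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", (ssn, did, start, end))
        return True
    except sqlite3.IntegrityError:
        return False


# The same employee works in the same department during two separate periods.
results = {
    table: [
        add_period(table, "12345678", 1, "2001-01-01", "2003-12-31"),
        add_period(table, "12345678", 1, "2006-01-01", "2007-12-31"),
    ]
    for table in ("Works_In2", "Works_In3")
}
print(results)
```

Works_In2 rejects the second period because the pair (employee, department) is already present; Works_In3 accepts both.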
Entity vs. Relationship
This ER diagram is o.k. if a manager gets a separate discretionary budget for each dept.
But what if a manager gets a discretionary budget that covers all managed depts?
• Redundancy of dbudget, which is stored for each dept managed by the manager.
• Misleading: suggests dbudget tied to managed dept.
[Figure: relationship set Manages2 with attributes since and dbudget, between Employees (ssn, name, lot) and Departments (did, dname, budget)]
Entity vs. Relationship
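What about this diagram?
[Figure: relationship set Manages2 with attributes since and dbudget, between Employees (ssn, name, lot) and Departments (did, dname, budget)]
The following ER diagram is more appropriate and avoids the above problems!
[Figure: relationship set Manages3 with attribute since, relating Employees (ssn, name, lot), Departments (did, dname, budget) and Mgr_Appts (apptnum, dbudget)]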
Binary vs. Ternary Relationships
If each policy is owned by just one employee:
• Key constraint on Policies would mean policy can only cover 1 dependent!
• Bad design!
[Figure: ternary relationship set Covers among Employees (ssn, name, lot), Policies (policyid, cost) and Dependents (pname, age)]
Binary vs. Ternary Relationships
This diagram is a better design.
What are the additional constraints in this diagram?
[Figure: relationship set Purchaser between Employees (ssn, name, lot) and Policies (policyid, cost); relationship set Beneficiary between Policies and Dependents (pname, age)]
Binary vs. Ternary Relationships
Previous example illustrated a case when two binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute:
• S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S.
• How do we record qty?
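A small counterexample makes the first point concrete. This is a sketch in Python with made-up identifiers (S1, D1, P1, ...) and quantities; relationship sets are modeled as sets of tuples, and Contracts as a dictionary from (part, department, supplier) to qty.

```python
# Hypothetical binary relationship instances (pairs of entity identifiers).
can_supply = {("S1", "P1"), ("S2", "P1")}                 # supplier S can supply part P
needs = {("D1", "P1"), ("D2", "P1")}                      # department D needs part P
deals_with = {("D1", "S1"), ("D1", "S2"), ("D2", "S2")}   # department D deals with supplier S

# The ternary relationship set records the actual agreements, each with its qty.
contracts = {("P1", "D1", "S1"): 500, ("P1", "D2", "S2"): 200}

# Every (part, department, supplier) triple consistent with the three binary sets.
implied = {
    (p, d, s)
    for (s, p) in can_supply
    for (d, p2) in needs
    if p2 == p and (d, s) in deals_with
}

# Triples the binary relationships would suggest although no contract exists.
spurious = implied - set(contracts)
print(sorted(spurious))
```

Here D1 needs P1, S2 can supply P1, and D1 deals with S2, yet there is no contract for that triple. The binary sets also offer no place to store qty, which belongs to the triple as a whole.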
Conceptual Design: ER to Relational
How to represent
• Entity sets,
• Relationship sets,
• Attributes,
• Key and participation constraints,
• Subclasses,
• Weak entity sets . . . ?
Entity Sets
Entity sets are translated to tables.
CREATE TABLE Employees (
    ssn CHAR(11),
    name CHAR(20),
    lot INTEGER,
    PRIMARY KEY (ssn));
[Figure: entity set Employees with attributes ssn, name, lot]
Relationship Sets
Relationship sets are also translated to tables.
• Keys for each participating entity set (as foreign keys). The combination of these keys forms a superkey for the table.
• All descriptive attributes of the relationship set.
CREATE TABLE Works_In (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments);
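To see what the primary key and the foreign keys of Works_In enforce, the statement can be run against a small test database. The sketch below uses Python's sqlite3 module as a stand-in DBMS; the Departments table layout, the sample rows and the helper `accepts` are assumptions made for illustration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
    CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
    CREATE TABLE Works_In (
        ssn CHAR(11), did INTEGER, since DATE,
        PRIMARY KEY (ssn, did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments);
    INSERT INTO Employees VALUES ('12345678', 'John Miller', 30);
    INSERT INTO Departments VALUES (1, 'Pharmacy', 100000.0);
""")


def accepts(sql, params):
    """Return True if the DBMS accepts the statement, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert = "INSERT INTO Works_In VALUES (?, ?, ?)"
ok = accepts(insert, ("12345678", 1, "2008-09-01"))         # valid relationship
duplicate = accepts(insert, ("12345678", 1, "2009-01-01"))  # same (ssn, did) pair again
dangling = accepts(insert, ("99999999", 1, "2008-09-01"))   # no such employee
print(ok, duplicate, dangling)
```

The first insert is accepted; the duplicate pair violates the primary key and the unknown ssn violates the foreign key, which matches the rule that the components of a relationship are existing entities.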
Key Constraints
Each dept has at most one manager, according to the key constraint on Manages.
Translation to relational model?
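[Figure: four panels illustrating one-to-one, one-to-many, many-to-one and many-to-many relationship sets]
[Figure: relationship set Manages with attribute since, between Employees (ssn, name, lot) and Departments (did, dname, budget)]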
Key Constraints
Map relationship set to a table:
• Separate tables for Employees and Departments.
• Note that did is the key now!
CREATE TABLE Manages (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments);
Since each department has a unique manager, we could instead combine Manages and Departments.
CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    manager CHAR(11),
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (manager) REFERENCES Employees);
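The effect of making did the key of Manages can be tried out directly. This is a minimal sketch using Python's sqlite3 module as a stand-in DBMS; the foreign keys are left out to keep the focus on the key constraint, and the sample values and the helper `assign_manager` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign keys are omitted here; only the key constraint is under test.
conn.execute("""
    CREATE TABLE Manages (
        ssn CHAR(11), did INTEGER, since DATE,
        PRIMARY KEY (did))
""")


def assign_manager(ssn, did, since):
    """Return True if the DBMS accepts the Manages relationship."""
    try:
        conn.execute("INSERT INTO Manages VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


first = assign_manager("12345678", 1, "2008-01-01")   # department 1 gets a manager
second = assign_manager("14789632", 1, "2008-06-01")  # a second manager for department 1
other = assign_manager("12345678", 2, "2008-06-01")   # same manager, another department
print(first, second, other)
```

A second manager for the same department is rejected, while one employee managing several departments is accepted, which is the one-to-many multiplicity described for Manages.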
Participation Constraints
We can capture participation constraints involving one entity set in a binary relationship, using NOT NULL.
In other cases, we need CHECK constraints.
CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    manager CHAR(11) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (manager) REFERENCES Employees
        ON DELETE NO ACTION);
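The following sketch exercises the two parts of this definition: NOT NULL forces every department to have a manager, and ON DELETE NO ACTION prevents deleting an employee who still manages a department. It uses Python's sqlite3 module as a stand-in DBMS; the sample rows (including the department name 'Toys') and the helper `accepts` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
    CREATE TABLE Dept_Mgr (
        did INTEGER, dname CHAR(20), budget REAL,
        manager CHAR(11) NOT NULL, since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (manager) REFERENCES Employees ON DELETE NO ACTION);
    INSERT INTO Employees VALUES ('12345678', 'John Miller', 30);
    INSERT INTO Dept_Mgr VALUES (1, 'Pharmacy', 100000.0, '12345678', '2008-09-01');
""")


def accepts(sql):
    """Return True if the DBMS accepts the statement, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A department without a manager violates the participation constraint.
no_manager = accepts("INSERT INTO Dept_Mgr VALUES (2, 'Toys', 50000.0, NULL, NULL)")
# Deleting the manager would leave department 1 without a valid manager.
delete_mgr = accepts("DELETE FROM Employees WHERE ssn = '12345678'")
print(no_manager, delete_mgr)
```

Both statements are rejected, so every department in the table keeps exactly one existing manager.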
Weak Entity Sets
A weak entity set can be identified uniquely only by considering the primary key of another (owner) entity set.
• Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• Weak entity set must have total participation in this identifying relationship set.
[Figure: relationship set Policy with attribute cost, between owner entity set Employees (ssn, name, lot) and weak entity set Dependents (pname, age)]
Weak Entity Sets
Weak entity set and identifying relationship set are translated into a single table.
• When the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
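A short experiment shows both properties of the weak entity table: the partial key pname is unique only together with the owner's ssn, and deleting an owner removes its dependents. The sketch uses Python's sqlite3 module as a stand-in DBMS; the dependent names, ages and costs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
    CREATE TABLE Dep_Policy (
        pname CHAR(20), age INTEGER, cost REAL, ssn CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
    INSERT INTO Employees VALUES ('12345678', 'John Miller', 30),
                                 ('14789632', 'Paul Li', 25);
    -- The same pname may appear under different owners: the key is (pname, ssn).
    INSERT INTO Dep_Policy VALUES ('Anna', 5, 120.0, '12345678'),
                                  ('Anna', 7, 150.0, '14789632');
""")

before = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

# Deleting the owner entity also deletes the weak entities it owns.
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(before, remaining)
```

After the delete only the dependent owned by the remaining employee is left in Dep_Policy.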
Subclasses
[Figure: Employees (ssn, name, lot) with ISA hierarchy to subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
If we declare A ISA B, every A entity is also considered to be a B entity. Attributes of B are inherited by A.
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
Covering constraints: Does every Employees entity have to be either an Hourly_Emps or a Contract_Emps entity? (Yes/no)
Subclasses
ER style translation
• One table for each of the entity sets (superclass and subclasses).
• ISA relationship does not require additional table.
• All tables have the same key, i.e. the key of the superclass.
• E.g.: One table each for Employees, Hourly_Emps and Contract_Emps.
General employee attributes are recorded in Employees. For hourly emps and contract emps, extra info recorded in the respective relations.
Subclasses
Queries involving all employees easy, those involving just Hourly_Emps require a join to get their special attributes.
CREATE TABLE Employees (
    ssn CHAR(11),
    name CHAR(20),
    lot INTEGER,
    PRIMARY KEY (ssn));
CREATE TABLE Hourly_Emps (
    ssn CHAR(11),
    hourly_wages REAL,
    hours_worked INTEGER,
    PRIMARY KEY (ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
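The trade-off of the ER style translation shows up in the queries. The sketch below uses Python's sqlite3 module as a stand-in DBMS with made-up sample rows: listing all employees touches one table, while the full record of an hourly employee needs a join on the shared key ssn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
    CREATE TABLE Hourly_Emps (
        ssn CHAR(11), hourly_wages REAL, hours_worked INTEGER,
        PRIMARY KEY (ssn),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
    INSERT INTO Employees VALUES ('12345678', 'John Miller', 30),
                                 ('14789632', 'Paul Li', 25);
    INSERT INTO Hourly_Emps VALUES ('12345678', 20.0, 40);
""")

# Query over all employees: a single table is enough.
all_names = conn.execute("SELECT name FROM Employees ORDER BY ssn").fetchall()

# Query over hourly employees: inherited and special attributes need a join.
hourly = conn.execute("""
    SELECT E.name, H.hourly_wages, H.hours_worked
    FROM Employees E JOIN Hourly_Emps H ON E.ssn = H.ssn
""").fetchall()
print(all_names, hourly)
```

Only John Miller appears in the join result, since Paul Li has no row in Hourly_Emps.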
Subclasses
Alternative translation
• Create tables for the subclasses only. These tables have all attributes of the superclass(es) and the subclass.
• This approach is applicable only if the subclasses cover the superclass.
• E.g.:
  Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked.
  Contract_Emps: ssn, name, lot, contractid.
Queries involving all employees difficult, those on Hourly_Emps and Contract_Emps alone are easy.
Only applicable if Hourly_Emps AND Contract_Emps COVER Employees.
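Why queries over all employees become harder can be seen in a short sketch. It uses Python's sqlite3 module as a stand-in DBMS; the column types (in particular contractid INTEGER) and the sample rows are assumptions. With no Employees table, the set of all employees has to be assembled with a UNION over the subclass tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hourly_Emps (
        ssn CHAR(11), name CHAR(20), lot INTEGER,
        hourly_wages REAL, hours_worked INTEGER,
        PRIMARY KEY (ssn));
    CREATE TABLE Contract_Emps (
        ssn CHAR(11), name CHAR(20), lot INTEGER,
        contractid INTEGER,
        PRIMARY KEY (ssn));
    INSERT INTO Hourly_Emps VALUES ('12345678', 'John Miller', 30, 20.0, 40);
    INSERT INTO Contract_Emps VALUES ('14789632', 'Paul Li', 25, 7);
""")

# Query on one subclass: a single table, no join needed.
hourly = conn.execute("SELECT name, hourly_wages FROM Hourly_Emps").fetchall()

# Query over all employees: the common attributes must be collected from every subclass.
all_employees = conn.execute("""
    SELECT ssn, name, lot FROM Hourly_Emps
    UNION
    SELECT ssn, name, lot FROM Contract_Emps
    ORDER BY ssn
""").fetchall()
print(hourly, all_employees)
```

An employee who is neither hourly nor contract has no table to live in, which is why the approach requires the covering constraint.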
Binary vs. Ternary Relationships
If each policy is owned by just one employee:
• Key constraint on Policies would mean policy can only cover one dependent!
Bad design
[Figure: ternary relationship set Covers among Employees (ssn, name, lot), Policies (policyid, cost) and Dependents (pname, age)]
Better design
[Figure: relationship set Purchaser between Employees (ssn, name, lot) and Policies (policyid, cost); relationship set Beneficiary between Policies and Dependents (pname, age)]
Binary vs. Ternary Relationships
The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents.
Participation constraints lead to NOT NULL constraints.
CREATE TABLE Policies (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
CREATE TABLE Dependents (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER NOT NULL,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policies
        ON DELETE CASCADE);
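The two tables can be tried out together. The sketch below uses Python's sqlite3 module as a stand-in DBMS with made-up sample rows. It shows that one policy can now cover several dependents, and that deleting the purchasing employee cascades through Policies to Dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
    CREATE TABLE Policies (
        policyid INTEGER, cost REAL, ssn CHAR(11) NOT NULL,
        PRIMARY KEY (policyid),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
    CREATE TABLE Dependents (
        pname CHAR(20), age INTEGER, policyid INTEGER NOT NULL,
        PRIMARY KEY (pname, policyid),
        FOREIGN KEY (policyid) REFERENCES Policies ON DELETE CASCADE);
    INSERT INTO Employees VALUES ('12345678', 'John Miller', 30);
    INSERT INTO Policies VALUES (1, 120.0, '12345678');
    -- One policy covers two dependents.
    INSERT INTO Dependents VALUES ('Anna', 5, 1), ('Ben', 8, 1);
""")


def counts():
    """Return the current number of rows in (Policies, Dependents)."""
    policies = conn.execute("SELECT COUNT(*) FROM Policies").fetchone()[0]
    dependents = conn.execute("SELECT COUNT(*) FROM Dependents").fetchone()[0]
    return policies, dependents


before = counts()
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
after = counts()
print(before, after)
```

Before the delete there is one policy with two dependents; afterwards both tables are empty.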
Summary
High-level design follows requirements analysis and yields a high-level description of data to be stored.
ER model popular for high-level design.
• Constructs are expressive, close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, subclasses, and constraints.
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise.
Summary
There are guidelines to translate ER diagrams to a relational database schema. However, there are often alternatives that need to be carefully considered.
Entity sets and relationship sets are all represented by relations.
Some constructs of the ER model cannot be easily translated, e.g. multiple participation constraints.