CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 1

Database Systems I

The Entity-Relationship Model

Overview of Database Development

Requirements Analysis
• What data are to be stored in the enterprise?
• What are the required applications?
• What are the most important operations?

High-level database design
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints or business rules that hold?

The ER model or UML is used to represent the high-level design.

Overview of Database Development

Conceptual database design
• What data model to use to implement the DBS? E.g., the relational data model.
• Map the high-level design (e.g., ER diagram) to a (conceptual) database schema of the chosen data model.

Physical database design
• What DBMS to use?
• What are the typical workloads of the DBS?
• Build indexes to support efficient query processing.
• What redesign of the conceptual database schema is necessary from the point of view of efficient implementation?

Overview of Database Development

Requirements Analysis / Ideas

High-Level Database Design

Conceptual Database Design / Relational Database Schema

Physical Database Design / Relational DBMS

Similar to software development

Entity-Relationship Model

Short: ER model.
It has a lot of similarities with other modeling languages such as UML.
Concepts:
• Entities / entity sets,
• Attributes,
• Relationships / relationship sets, and
• Constraints.

Offers more modeling concepts than the relational data model (which only offers relations).
Closer to the way in which people think.

Entity-Relationship Diagrams

An Entity-Relationship diagram (ER diagram) is a graph with nodes representing entity sets, attributes and relationship sets.
• Entity sets are denoted by rectangles.
• Attributes are denoted by ovals.
• Relationship sets are denoted by diamonds.
• Edges (lines) connect entity sets to their attributes and relationship sets to their entity sets.

Entities and Entity Sets

Entity: a real-world object distinguishable from other objects, e.g. employee Miller. An entity can be a physical or an abstract object.
An entity is associated with attributes describing its properties. Attribute values are atomic, e.g. strings, integers or real numbers.
Some variations of the ER model support structured attributes.
Entity set: a collection of similar entities, e.g., all employees.

Entities and Entity Sets

All entities in an entity set have the same set of attributes. (At least, for the moment!)
Each entity set has a key, i.e. a minimal set of attributes that uniquely identifies an entity of this set. Key attributes are underlined.
Each attribute has a domain, i.e. the set of all possible attribute values.

[Figure: entity set Employees with attributes ssn (key), name, age.]

Entities and Entity Sets

A key must be unique across all possible (not just the current) entities of its set.
A key can consist of more than one attribute.
There can be more than one key for a given entity set, but we choose one (primary key) for the ER diagram.

[Figure: entity set Employees with attributes firstname, lastname, birthdate, salary.]

Relationships and Relationship Sets

Relationship: an association among two or more entities. E.g., Miller works in the Pharmacy department.
Relationship set: a collection of similar relationships among two or more entity sets.

[Figure: relationship set Works_In connecting Employees (ssn, name, age) and Departments (did, dname, budget).]

Relationships and Relationship Sets

An n-ary relationship set R relates n entity sets E1, ..., En.
Each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
Binary relationship sets are the most common.
The same entity set can participate in different relationship sets, or in different “roles” in the same set.

[Figure: relationship set Reports_To connecting Employees (ssn, name, age) to itself in the roles subordinate and supervisor.]

Relationships and Relationship Sets

Relationship sets can also have attributes.
These are useful for properties that cannot reasonably be associated with one of the participating entity sets.

[Figure: relationship set Works_In with attribute since, connecting Employees (ssn, name, age) and Departments (did, dname, budget).]

Instances of an ER Diagram

An entity set contains a set of entities. Each entity has one value for each of its attributes. No duplicate instances.

Employees
ssn        name            age
12345678   “John Miller”   30
14789632   “Paul Li”       25
. . .      . . .           . . .

Instances of an ER Diagram

A relationship set contains a set (no duplicates!) of relationships, each relating a set of entities, one from each of the participating entity sets.
Components are entities, not attribute values.

Works_In
Employee (ssn)   Department (did)
12345678         1
14789632         1
56756322         2
. . .            . . .
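The set semantics of a relationship set can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and each entity is stood in for by its key value (ssn or did), since a program has to name the entity somehow.

```python
# Hypothetical in-memory model of the Works_In instance from the slide.
# Each relationship is a tuple with one entity (named by its key) per entity set.
works_in = {
    ("12345678", 1),
    ("14789632", 1),
    ("56756322", 2),
}

# Adding a relationship that is already present leaves the set unchanged.
size_before = len(works_in)
works_in.add(("12345678", 1))
size_after = len(works_in)

# Departments in which employee 12345678 works.
depts_of_employee = sorted(did for ssn, did in works_in if ssn == "12345678")
```

Because the container is a set, the "no duplicates" property holds by construction rather than by an explicit check.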

Relationships and Relationship Sets

Multiway relationship sets (n > 2) are used whenever binary relationships cannot capture the application semantics.

[Figure: ternary relationship set Works_For connecting Employees (ssn, name, age), Projects (pid, pbudget) and Tasks (tid, description).]

Relationships and Relationship Sets

[Figure: ternary relationship set Works_For connecting Employees (ssn, name, age), Projects (pid, pbudget) and Tasks (tid, description).]

Works_For
Employee (ssn)   Tasks (tid)   Project (pid)
12345678         1000          101
12345678         1500          106
56756322         1500          106
. . .            . . .         . . .

Multiplicity of Relationships

An employee can work in many departments; a dept can have many employees.

Each dept has at most one manager, who may manage several (many) departments.

[Figure: relationship set Manages with attribute since, connecting Employees (ssn, name, age) and Departments (did, dname, budget).]

[Figure: relationship set Works_In with attribute since, connecting Employees (ssn, name, age) and Departments (did, dname, budget).]

Multiplicity of Relationships

The different types of (binary) relationships from a multiplicity point of view:
• One to one
• One to many
• Many to one
• Many to many

[Figure: four panels illustrating one-to-one, one-to-many, many-to-one and many-to-many relationships.]

Key Constraints

A key constraint on a relationship set specifies that the marked entity set participates in at most one relationship of this relationship set.
The entity set is marked with an arrow.

[Figure: relationship set Manages with attribute since, connecting Employees (ssn, name, age) and Departments (did, dname, budget); an arrow is labeled "Key constraint".]

Participation Constraints

A participation constraint on a relationship set specifies that the marked entity set participates in at least one relationship of this relationship set.
The entity set is marked with a bold line.

[Figure: relationship sets Manages and Works_In, each with attribute since, connecting Employees (ssn, name, age) and Departments (did, dname, budget); a bold line is labeled "Participation constraint".]

Weak Entities

A weak entity exists only in the context of another (owner) entity.
The weak entity can be identified uniquely only by considering the primary key of the owner and its own partial key.
• Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• Weak entity set must have total participation in this supporting relationship set.

[Figure: weak entity set Dependents (name, age) connected to owner entity set Employees (ssn, name, age) via relationship set Policy with attribute cost.]

Subclasses

Sometimes, an entity set contains some entities that share many, but not all, properties with the entity set. In this case, we want to define class (entity set) hierarchies.

A ISA B: every A entity is also considered to be a B entity. A specializes B, B generalizes A.

A is called the subclass, B is called the superclass.

A subclass inherits the attributes of a superclass, and may define additional attributes.

Subclasses

[Figure: ISA hierarchy with superclass Employees (ssn, name, age) and subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).]

Hourly_Emps and Contract_Emps inherit the ssn (key!), name and age attributes from Employees.

They define additional attributes hourly_wages, hours_worked and contractid, resp.

Subclasses

Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity?

(Hourly_Emps OVERLAPS Contract_Emps)

Covering constraints: Does every Employees entity have to be either an Hourly_Emps or a Contract_Emps entity?

Hourly_Emps AND Contract_Emps COVER Employees

Subclasses

There are several good reasons for using ISA relationships and subclasses:
• Do not have to redefine all the attributes.
• Can add descriptive attributes specific to a subclass.
• To identify entity sets that participate in a relationship set as precisely as possible.

ISA relationships form a tree structure (taxonomy) with one entity set serving as root.

Design Principles

Faithfulness
• Design must be faithful to the specification / reality.
• Relevant aspects of reality must be represented in the model.

Avoiding redundancy
• Redundant representation blows up the ER diagram and makes it harder to understand.
• Redundant representation wastes storage.
• Redundancy may lead to inconsistencies in the database.

Design Principles

Keep it simple
• The simpler, the easier to understand for some (external) reader of the ER diagrams.
• Avoid introducing more elements than necessary.
• If possible, prefer attributes over entity sets and relationship sets.

Formulate constraints as far as possible
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.

High-Level Design With ER Model

Major design choices
• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• What relationships to use: binary or ternary?

Entity vs. Attribute

Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
• If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
• If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).

Entity vs. Attribute

Works_In2 does not allow an employee to work in the same department for two or more periods (why?).

We want to record several values of the descriptive attributes for each instance of this relationship.

[Figure: relationship set Works_In2 with attributes from and to, connecting Employees (ssn, name, lot) and Departments (did, dname, budget).]

[Figure: relationship set Works_In3 connecting Employees (ssn, name, lot), Departments (did, dname, budget) and Duration (from, to).]
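One way to see the difference between Works_In2 and Works_In3 is to compare hypothetical table translations of the two diagrams. The sketch below is an assumption, not part of the slides: it uses Python's sqlite3 module, renames from/to to from_date/to_date (both are SQL keywords), treats (from, to) as the key of Duration, and omits foreign keys for brevity. A relationship is identified by its participating entities, so Works_In2 is keyed on (ssn, did) alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Works_In2: from/to are descriptive attributes, the key is (ssn, did).
CREATE TABLE Works_In2 (
    ssn CHAR(11), did INTEGER, from_date DATE, to_date DATE,
    PRIMARY KEY (ssn, did));
-- Works_In3: Duration is a third participating entity set, so its key
-- (from_date, to_date) becomes part of the relationship's key.
CREATE TABLE Works_In3 (
    ssn CHAR(11), did INTEGER, from_date DATE, to_date DATE,
    PRIMARY KEY (ssn, did, from_date, to_date));
""")

def try_insert(table, row):
    """Insert row into table; return False if a key constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# The same employee works in the same department during two periods (made-up dates).
first = ("12345678", 1, "2005-01-01", "2006-12-31")
second = ("12345678", 1, "2008-01-01", "2008-12-31")

in2 = [try_insert("Works_In2", first), try_insert("Works_In2", second)]
in3 = [try_insert("Works_In3", first), try_insert("Works_In3", second)]
```

Under these assumptions the second period is rejected by Works_In2 and accepted by Works_In3.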

Entity vs. Relationship

This ER diagram is o.k. if a manager gets a separate discretionary budget for each dept.
But what if a manager gets a discretionary budget that covers all managed depts?
• Redundancy of dbudget, which is stored for each dept managed by the manager.
• Misleading: suggests dbudget is tied to the managed dept.

[Figure: relationship set Manages2 with attributes since and dbudget, connecting Employees (ssn, name, lot) and Departments (did, dname, budget).]

Entity vs. Relationship

What about this diagram?

The following ER diagram is more appropriate and avoids the above problems!

[Figure: relationship set Manages3 connecting Employees (ssn, name, lot), Departments (did, dname, budget) and Mgr_Appts; further attributes shown: since, apptnum, dbudget.]

[Figure: relationship set Manages2 connecting Employees (ssn, name, lot) and Departments (did, dname, budget); further attributes shown: dbudget, since.]

Binary vs. Ternary Relationships

If each policy is owned by just one employee:
• Key constraint on Policies would mean a policy can only cover 1 dependent!
• Bad design!

[Figure: ternary relationship set Covers connecting Employees (ssn, name, lot), Policies (policyid, cost) and Dependents (pname, age).]

Binary vs. Ternary Relationships

This diagram is a better design.

What are the additional constraints in this diagram?

[Figure: relationship set Purchaser connecting Employees (ssn, name, lot) and Policies (policyid, cost); relationship set Beneficiary connecting Policies and Dependents (pname, age).]

Binary vs. Ternary Relationships

The previous example illustrated a case where two binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has the descriptive attribute qty. No combination of binary relationships is an adequate substitute:
• S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S.
• How do we record qty?
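The first bullet can be checked on a small made-up instance. The sketch below, using Python's sqlite3 module, stores three hypothetical contracts, derives the three binary relationship sets from them, and joins those back together. All identifiers and values (pid, did, sid, the table names Can_Supply, Needs, Deals_With, the quantities) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contracts (
    pid INTEGER, did INTEGER, sid INTEGER, qty INTEGER,
    PRIMARY KEY (pid, did, sid));
INSERT INTO Contracts VALUES (1, 1, 1, 10), (2, 1, 2, 5), (2, 2, 1, 7);
-- Binary relationship sets implied by the contracts above.
CREATE TABLE Can_Supply AS SELECT DISTINCT sid, pid FROM Contracts;
CREATE TABLE Needs      AS SELECT DISTINCT did, pid FROM Contracts;
CREATE TABLE Deals_With AS SELECT DISTINCT did, sid FROM Contracts;
""")

# All (part, dept, supplier) combinations consistent with the binary facts.
joined = set(conn.execute("""
    SELECT c.pid, n.did, c.sid
    FROM Can_Supply c
    JOIN Needs n      ON n.pid = c.pid
    JOIN Deals_With d ON d.did = n.did AND d.sid = c.sid
"""))
actual = set(conn.execute("SELECT pid, did, sid FROM Contracts"))

# Combinations the binary tables allow although no such contract exists.
spurious = joined - actual
```

On this instance the binary tables admit one combination (part 2, department 1, supplier 1) that is not a contract, and none of the binary tables has a natural place for qty.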

Conceptual Design: ER to Relational

How to represent
• Entity sets,
• Relationship sets,
• Attributes,
• Key and participation constraints,
• Subclasses,
• Weak entity sets . . . ?

Entity Sets

Entity sets are translated to tables.

CREATE TABLE Employees (
    ssn CHAR(11),
    name CHAR(20),
    lot INTEGER,
    PRIMARY KEY (ssn));

[Figure: entity set Employees with attributes ssn (key), name, lot.]
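The Employees table can be tried out in an in-memory SQLite database. This is a minimal sketch using Python's sqlite3 module; the lot values are made up, and SQLite is only one possible DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        ssn CHAR(11),
        name CHAR(20),
        lot INTEGER,
        PRIMARY KEY (ssn))
""")
conn.execute("INSERT INTO Employees VALUES ('12345678', 'John Miller', 12)")

# A second entity with the same key value violates the primary key.
try:
    conn.execute("INSERT INTO Employees VALUES ('12345678', 'Paul Li', 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT ssn, name, lot FROM Employees").fetchall()
```

Each row corresponds to one entity, and the PRIMARY KEY clause carries over the key of the entity set.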

Relationship Sets

Relationship sets are also translated to tables.
• Keys for each participating entity set (as foreign keys). The combination of these keys forms a superkey for the table.
• All descriptive attributes of the relationship set.

CREATE TABLE Works_In (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments);
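The effect of the composite key and the foreign keys can be sketched with Python's sqlite3 module. Assumptions in this sketch: SQLite as the DBMS (it enforces foreign keys only after PRAGMA foreign_keys = ON), referenced columns written out explicitly, and made-up sample rows (the department "Toys", the lot and budget values, the dates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
CREATE TABLE Works_In (
    ssn CHAR(11), did INTEGER, since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees (ssn),
    FOREIGN KEY (did) REFERENCES Departments (did));
INSERT INTO Employees VALUES ('12345678', 'John Miller', 12);
INSERT INTO Departments VALUES (1, 'Pharmacy', 1000.0), (2, 'Toys', 500.0);
""")

def accepted(row):
    """Try to insert a Works_In row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Works_In VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted(("12345678", 1, "2005-01-01")),  # new relationship
    accepted(("12345678", 2, "2006-01-01")),  # same employee, other department
    accepted(("12345678", 1, "2007-01-01")),  # duplicate (ssn, did) pair
    accepted(("99999999", 1, "2007-01-01")),  # ssn that is not in Employees
]
```

The first two inserts succeed because Works_In is many-to-many; the third fails on the primary key and the fourth on the foreign key.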

Key Constraints

Each dept has at most one manager, according to the key constraint on Manages.

Translation to relational model?

[Figure: four panels illustrating one-to-one, one-to-many, many-to-one and many-to-many relationships.]

[Figure: relationship set Manages with attribute since, connecting Employees (ssn, name, lot) and Departments (did, dname, budget).]

Key Constraints

Map relationship set to a table:
• Separate tables for Employees and Departments.
• Note that did is the key now!

CREATE TABLE Manages (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments);

Since each department has a unique manager, we could instead combine Manages and Departments.

CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    manager CHAR(11),
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (manager) REFERENCES Employees);
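How the primary key on did enforces the key constraint can be sketched with Python's sqlite3 module. The foreign keys are left out here to keep the example short, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Manages (
        ssn CHAR(11), did INTEGER, since DATE,
        PRIMARY KEY (did))
""")

def accepted(row):
    """Try to insert a Manages row; return False if the key rejects it."""
    try:
        conn.execute("INSERT INTO Manages VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted(("12345678", 1, "2005-01-01")),  # department 1 gets a manager
    accepted(("12345678", 2, "2006-01-01")),  # same manager, second department
    accepted(("14789632", 1, "2007-01-01")),  # second manager for department 1
]
```

One employee may manage several departments, but a department cannot appear in two Manages rows.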

Participation Constraints

We can capture participation constraints involving one entity set in a binary relationship, using NOT NULL.
In other cases, we need CHECK constraints.

CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    manager CHAR(11) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (manager) REFERENCES Employees
        ON DELETE NO ACTION);
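A short sketch of what NOT NULL and ON DELETE NO ACTION achieve for Dept_Mgr, using Python's sqlite3 module. SQLite and the sample rows are assumptions of this sketch; foreign key enforcement has to be switched on explicitly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Dept_Mgr (
    did INTEGER, dname CHAR(20), budget REAL,
    manager CHAR(11) NOT NULL, since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (manager) REFERENCES Employees (ssn) ON DELETE NO ACTION);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 12);
""")

def accepted(sql, params=()):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert = "INSERT INTO Dept_Mgr VALUES (?, ?, ?, ?, ?)"
results = [
    accepted(insert, (1, "Pharmacy", 1000.0, None, None)),                # no manager
    accepted(insert, (1, "Pharmacy", 1000.0, "12345678", "2005-01-01")),  # with manager
]

# NO ACTION: an employee who still manages a department cannot be deleted.
delete_blocked = not accepted("DELETE FROM Employees WHERE ssn = ?", ("12345678",))
```

Every Dept_Mgr row therefore carries a manager, which is the total participation of Departments in Manages.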

Weak Entity Sets

A weak entity set can be identified uniquely only by considering the primary key of another (owner) entity set.
• Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• Weak entity set must have total participation in this identifying relationship set.

[Figure: weak entity set Dependents (pname, age) connected to owner entity set Employees (ssn, name, lot) via identifying relationship set Policy with attribute cost.]

Weak Entity Sets

Weak entity set and identifying relationship set are translated into a single table.
• When the owner entity is deleted, all owned weak entities must also be deleted.

CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
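The cascading delete can be observed in a small experiment with Python's sqlite3 module. The dependents' names, ages and costs below are made up, and SQLite (with foreign keys switched on) is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Dep_Policy (
    pname CHAR(20), age INTEGER, cost REAL, ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE);
INSERT INTO Employees VALUES
    ('12345678', 'John Miller', 12), ('14789632', 'Paul Li', 7);
-- The partial key pname may repeat across owners ('Anna' appears twice).
INSERT INTO Dep_Policy VALUES
    ('Anna', 5, 100.0, '12345678'),
    ('Ben',  3,  90.0, '12345678'),
    ('Anna', 8, 120.0, '14789632');
""")

before = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

# Deleting the owner removes all weak entities it owns.
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
```

Only the dependent of the remaining employee survives, and the key (pname, ssn) is what allows two owners to each have a dependent named Anna.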

Subclasses

[Figure: ISA hierarchy with superclass Employees (ssn, name, lot) and subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).]

If we declare A ISA B, every A entity is also considered to be a B entity. Attributes of B are inherited by A.
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)

Covering constraints: Does every Employees entity either have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)

Subclasses

ER style translation
• One table for each of the entity sets (superclass and subclasses).
• ISA relationship does not require additional table.
• All tables have the same key, i.e. the key of the superclass.
• E.g.: One table each for Employees, Hourly_Emps and Contract_Emps. General employee attributes are recorded in Employees. For hourly emps and contract emps, extra info is recorded in the respective relations.

Subclasses

Queries involving all employees are easy; those involving just Hourly_Emps require a join with Employees to combine their special attributes with the general ones.

CREATE TABLE Employees (
    ssn CHAR(11),
    name CHAR(20),
    lot INTEGER,
    PRIMARY KEY (ssn));

CREATE TABLE Hourly_Emps (
    ssn CHAR(11),
    hourly_wages REAL,
    hours_worked INTEGER,
    PRIMARY KEY (ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
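The two kinds of queries can be sketched with Python's sqlite3 module on made-up rows (the wages, hours and lot values are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Hourly_Emps (
    ssn CHAR(11), hourly_wages REAL, hours_worked INTEGER,
    PRIMARY KEY (ssn),
    FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE);
INSERT INTO Employees VALUES
    ('12345678', 'John Miller', 12), ('14789632', 'Paul Li', 7);
INSERT INTO Hourly_Emps VALUES ('14789632', 20.0, 35);
""")

# All employees: a scan of the superclass table is enough.
all_names = [row[0] for row in
             conn.execute("SELECT name FROM Employees ORDER BY ssn")]

# Hourly employees with name and wage: needs a join on the shared key.
hourly = conn.execute("""
    SELECT e.name, h.hourly_wages, h.hours_worked
    FROM Hourly_Emps h
    JOIN Employees e ON e.ssn = h.ssn
""").fetchall()
```

The join is on ssn because every subclass table reuses the key of the superclass.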

Subclasses

Alternative translation
• Create tables for the subclasses only. These tables have all attributes of the superclass(es) and the subclass.
• This approach is applicable only if the subclasses cover the superclass.
• E.g.:
  Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked.
  Contract_Emps: ssn, name, lot, contractid.

Queries involving all employees are difficult; those on Hourly_Emps and Contract_Emps alone are easy.
Only applicable if Hourly_Emps AND Contract_Emps COVER Employees.
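With subclass tables only, a query over all employees has to combine the tables, for example with UNION. A minimal sketch using Python's sqlite3 module follows; the column types (in particular contractid INTEGER) and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hourly_Emps (
    ssn CHAR(11), name CHAR(20), lot INTEGER,
    hourly_wages REAL, hours_worked INTEGER,
    PRIMARY KEY (ssn));
CREATE TABLE Contract_Emps (
    ssn CHAR(11), name CHAR(20), lot INTEGER,
    contractid INTEGER,
    PRIMARY KEY (ssn));
INSERT INTO Hourly_Emps VALUES ('14789632', 'Paul Li', 7, 20.0, 35);
INSERT INTO Contract_Emps VALUES ('12345678', 'John Miller', 12, 501);
""")

# There is no Employees table, so "all employees" is the union of the
# general attributes stored in each subclass table.
all_employees = conn.execute("""
    SELECT ssn, name, lot FROM Hourly_Emps
    UNION
    SELECT ssn, name, lot FROM Contract_Emps
    ORDER BY ssn
""").fetchall()
```

An employee who is neither hourly nor contract would have no table to live in, which is why this translation presupposes the covering constraint.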

Binary vs. Ternary Relationships

If each policy is owned by just one employee:
• Key constraint on Policies would mean a policy can only cover one dependent!

Bad design
[Figure: ternary relationship set Covers connecting Employees (ssn, name, lot), Policies (policyid, cost) and Dependents (pname, age).]

Better design
[Figure: relationship set Purchaser connecting Employees (ssn, name, lot) and Policies (policyid, cost); relationship set Beneficiary connecting Policies and Dependents (pname, age).]

Binary vs. Ternary Relationships

The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents.
Participation constraints lead to NOT NULL constraints.

CREATE TABLE Policies (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);

CREATE TABLE Dependents (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER NOT NULL,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policies
        ON DELETE CASCADE);
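A small experiment with Python's sqlite3 module shows the two properties of this design: one policy can cover several dependents, and deleting the purchasing employee removes the policy together with its dependents. The sample rows are made up, and SQLite with foreign keys switched on is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Policies (
    policyid INTEGER, cost REAL, ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE);
CREATE TABLE Dependents (
    pname CHAR(20), age INTEGER, policyid INTEGER NOT NULL,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policies (policyid) ON DELETE CASCADE);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 12);
INSERT INTO Policies VALUES (1, 100.0, '12345678');
INSERT INTO Dependents VALUES ('Anna', 5, 1), ('Ben', 3, 1);
""")

# One policy covers two dependents.
covered = [row[0] for row in conn.execute(
    "SELECT pname FROM Dependents WHERE policyid = 1 ORDER BY pname")]

# The delete cascades from Employees to Policies and on to Dependents.
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
left = (
    conn.execute("SELECT COUNT(*) FROM Policies").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM Dependents").fetchone()[0],
)
```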

Summary

High-level design follows requirements analysis and yields a high-level description of the data to be stored.
The ER model is popular for high-level design.
• Constructs are expressive, close to the way people think about their applications.

Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, subclasses, and constraints.
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise.

Summary

There are guidelines to translate ER diagrams to a relational database schema. However, there are often alternatives that need to be carefully considered.
Entity sets and relationship sets are all represented by relations.
Some constructs of the ER model cannot be easily translated, e.g. multiple participation constraints.