Database Management Systems (CS644) -...
Transcript of Database Management Systems (CS644) -...
Chapter Four
Conceptual Database Design
(ER modeling)
Agenda (Chapter four)
Overview-database design
Conceptual Design (E-R Modeling)
Structural Constraints
EER- Generalization and Specialization
Reducing E-R Model to Table (logical Design)
Logical Database Design Process of Normalization
Objective
Learn skills and methods of Modeling organizational data
using Entity relationship diagram
Implementing the rules of relational data model through the
process of normalization
Overview-database design Database design is the process of coming up with different
kinds of specification for the data to be stored in the database
The database design part is one of the middle phases we have
in information systems development where the system uses a
database approach.
Cont…
Design is the part on which we would be engaged to
describe
how the data should be perceived at different
levels and
finally how it is going to be stored in a
computer system
Cont…
Information System development with Database
application consists of several tasks which include:
Planning of Information systems Design
Requirements Analysis,
Database Design (Conceptual, Logical and Physical
Design)
Interface, program etc designs are also there
Implementation
Operation and Support
Cont… From these different phases, the prime interest of this
courses will be the Database Design part which is again sub divided into other three sub-phases.
These sub-phases are:
Conceptual Design
Logical Design, and
Physical Design
Cont… In general, one has to go back and forth between these
tasks to refine a database design, and decisions in one task can influence the choices in another task.
In developing a good design, one should answer such questions as:
What are the relevant Entities for the Organization What are the important features of each Entity What are the important Relationships What are the important queries from the user What are the other requirements of the Organization and the
Users
Conceptual Database Design Conceptual design revolves around discovering and analyzing
organizational and user data requirements
The important activities are to identify Entities
Attributes
Relationships
Constraints
And based on these components develop the ER model using ER diagram components
Cont…
Designing conceptual model for the database is not a one
linear process but an iterative activity where the design is
refined again and again.
Important steps then could be
1. Identification of componentsi. Entities
ii. Attributes
iii. Relationship
iv. Constraints
2. Use of notations to model
3. Draw, Review, Refine and Validate
Cont…
Before working on the conceptual design of the
database, one has to know and answer the
following basic questions
What are the entities and relationships in the enterprise?
What information about these entities and relationships should we
store in the database?
What are the integrity constraints that hold?
Constraints on each data with respect to update, retrieval and store
Represent this information pictorially in ER diagrams, then map
ER diagram into a relational schema.
Modeling tools
Be professional- use the right tool symbols are languages
There are many ER diagramming tools
Rational Rose
Microsoft Visio
Oracle Designer
Power Designer
E-R Model Constructs (recall)
Entities:
Entity instance–person, place, object, event, concept (often corresponds to a row in a table)
Entity Type–collection of entities (often corresponds to a table)
Relationships:
Relationship instance–link between entities (corresponds to primary key-foreign key equivalencies in related tables)
Relationship type–category of relationship…link between entity types
Attribute–property or characteristic of an entity or relationship type (often corresponds to a field in a table)
How to Find Components of ER… To identify the entities, attributes, relationships, and
constraints on the data, there are different set of methods used during the analysis phase.
These include information gathered through Interviewing end users individually and in a group
Questionnaire survey
Direct observation
Examining different documents
Generally understand the business rules
Entity; Attribute; Relationship
Constructs Identification-ERD
Detailed Steps in Creating an ERD Here are the steps you may follow to create an entity-
relationship diagram.
1. Identify Entities: These are typically the nouns and noun-phrases in the descriptive data
produced in your analysis. Do not include entities that are irrelevant to your domain.
2. Find Relationships: Discover the semantic relationships between entities. These are usually the verbs that connect the nouns. Not all relationships are this blatant, you may have to discover some
on your own. The easiest way to see all possible relationships is to build a table with
the entities across the columns and down the rows, and fill in those cells where a relationship exists between entities.
Cont…3. Draw Rough ERD
Draw the entities and relationships that you have discovered.
4. Fill in Cardinality Determine the cardinality of the relationships. You may want to
decide on cardinality when you are creating a relationship table.
5. Define Primary Keys Identify attribute(s) that uniquely identify each occurrence of
that entity.
6. Draw Key-Based ERD Now add them (the primary key attributes) to your ERD. Revise your diagram to eliminate many-to-many relationships,
and tag all foreign keys .
Cont…7. Identify Attributes
Identify all entity characteristics relevant to the domain being analyzed.
8. Map Attributes Determine to which entity each characteristic belongs.
Do not duplicate attributes across entities.
9. Draw fully attributed ERD Now add these attributes.
The diagram may need to be modified to accommodate necessary new entities.
10. Check Results Is the diagram a consistent and complete representation of the
domain.
Cont…
In this all tasks
Keep in mind –Business rule
Business functions are expressed in terms of
business rules
Example
business function-to-data matching matrix
Sample E-R Diagram- what it looks like
What Should an Entity Be?
SHOULD BE:
An object that will have many instances in the database
An object that will be composed of multiple attributes
An object that we are trying to model
SHOULD NOT BE:
A user of the database system (the one who only uses the system
and his/her information is not important for the process)
Sometimes user data might also be important to capture. For instance, in
a financial system, it is important to register the employee who prepared
the payment or approved a transaction……issue of accountability)
An output of the database system (e.g., a report, display)
Inappropriate
entities
System
user
System
output
Example of inappropriate entities
Appropriate
entities
Attributes
Attribute–property or characteristic of an entity or relationship type
Classifications of attributes:
1. Required versus Optional Attributes
2. Simple versus Composite Attribute
3. Single-Valued versus Multivalued Attribute
4. Stored versus Derived Attributes
5. Entity Identifier Attributes
Identifiers (Keys)
Identifier (Key)–An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
Simple versus Composite Identifier
Candidate Key– an attribute that could be a key…satisfies the requirements for being an identifier
Will not be null
A composite attribute
An attribute
broken into
component parts
Entity with multivalued attribute (Skill)
and derived attribute (Years_Employed)
Multivaluedan employee can have more than one skill
Derivedfrom date employed and current date
Notation, Notation, Notation, Notation, Notation, Notation
Simple and composite identifier attributes
The identifier is boldfaced and underlined
More on Relationships
Relationship Types vs. Relationship Instances The relationship type is modeled as lines between
entity types…the instance is between specific entity instances
Relationships can have attributes These describe features pertaining to the association
between the entities in the relationship
Associative Entity –combination of relationship and entity
Relationship types and instances
a) Relationship type
b) Relationship
instances
Degree of relationships
Entities of
two different
types related
to each other Entities of three
different types
related to each
other
One entity
related to
another of
the same
entity type
Examples of relationships of different degrees
a) Unary relationships
Examples of relationships of different degrees (cont.)
b) Binary relationships
Examples of relationships of different degrees (cont.)
c) Ternary relationship
Note: a relationship can have attributes of its own
Structural Constraints
(Cardinality Constraints)
Cardinality Constraints - the number of instances of one entity that can or must be associated with each instance of another entityMinimum Cardinality
If zero, then optional
If one or more, then mandatory
Maximum Cardinality The maximum number of tuples taking part in the
relationship
Cont…
One-to-One
Each entity in the relationship will have exactly one related entity
E.g.: Relationship Manages between EMPLOYEE and BRANCHThe multiplicity of the relationship is:➢One branch can have one and only one manager ➢One employee could manage either one or no branches
1..1 0..1Employee BranchManages
Peter Chen’s Notation
Cont…
One-to-Many
An entity on one side of the relationship can have many related entities, but
an entity on the other side will have a maximum of one related entity
E.g.: Relationship Leads between EMPLOYEE and PROJECTThe multiplicity of the relationship➢One employee may Lead no, one or more project(s)➢One project is Lead by one staff
1..1 0..*Employee ProjectLeads
Peter Chen’s Notation
Cont…
Many-to-Many
Entities on both sides of the relationship can have many related
entities on the other side
E.g.: Relationship Teaches between INSTRUCTOR and COURSEThe multiplicity of the relationship➢One Instructor Teaches one or more Course(s)➢One Course Thought by no, one or more Instructor(s)
* ..0 1..*Instructor CourseTeaches
Peter Chen’s Notation
Examples of cardinality constraints
a) Mandatory cardinalities
A patient must have recorded
at least one history, and can
have many
A patient history is
recorded for one and only
one patient
Crow’s Foot Notation
Examples of cardinality constraints (cont.)
b) One optional, one mandatory
An employee can be assigned
to any number of projects, or
may not be assigned to any at
all
A project must be
assigned to at least one
employee, and may be
assigned to many
Peter Chen’s Notation
Examples of cardinality constraints (cont.)
a) Optional cardinalities
A person is married
to at most one other
person, or may not
be married at all
Peter Chen’s Notation
Entities can be related to one another in more than one way
Examples of multiple relationshipsEmployees and departments
Peter Chen’s Notation
Multivalued attributes can be represented as
relationships and entity
simple
composite
Peter Chen’s Notation
Strong entity Weak entity
Identifying relationship
Peter Chen’s Notation
Associative Entities
An Entity–which has attributes
A relationship–links entities together
When should a relationship with attributes instead be an associative entity?
All relationships for the associative entity should have many cardinality
The associative entity could have meaning independent of the other entities
The associative entity preferably has a unique identifier, and should also have other attributes
The associative entity may participate in other relationships other than the entities of the associated relationship
Ternary relationships should be converted to associative entities
Example
What if we want to keep the date the employee completed
the course
A binary relationship with an attribute
Here, the date completed attribute pertains specifically to the
employee’s completion of a course…it is an attribute of the
relationship
An associative entity (CERTIFICATE)
Associative entity is like a relationship with an attribute, but it is
also considered to be an entity in its own right.
Note that the many-to-many cardinality between entities will be
replaced by two one-to-many relationships with the associative
entity.
An associative entity – bill of materials structure
This could just be a relationship with
attributes…it’s a judgment call
Ternary relationship as an associative entity
ER Diagramming
Notation Standards
Cont…
Related diagramming convention techniques: Peter Chen notation
Crow’s Foot Notation
Bachman notation
EXPRESS
IDEF1X[4]
Martin notation
UML class diagrams
Relationship
degrees specify
number of
entity types
involved
Entity
symbols
A special entity
that is also a
relationship
Relationship
symbols
Relationship
cardinalities
specify how
many of each
entity type is
allowed
Attribute
symbols
Basic E-R notation
Symbols☼ ☼
ER Diagramming Example-1
Cont….
The University of Toronto has several departments. Each department is managed by a chair, and has at least one professor working for it. Professors must be assigned to one, but possibly more departments. At least one professor teaches each course, but a professor may be on sabbatical and not teach any course. Each course may be taught more than once by different professors. We know of the department name, the professor name, the professor employee id, the course names, the course schedule, the term/year that the course is taught, the departments the professor is assigned to, the department that offers the course.
Cont….
The University of Toronto has several departments. Each department is managed by a chair, and has at least one professor working for it. Professors must be assigned to one, but possibly more departments. At least one professor teaches each course, but a professor may be on sabbatical and not teach any course. Each course may be taught more than once by different professors. We know of the department name, the professor name, the professor employee id, the course names, the course schedule, the term/year that the course is taught, the departments the professor is assigned to, the department that offers the course.
Solution
Identify Entities
These are typically the nouns and noun-phrases in the
descriptive data produced in your analysis.
Do not include entities that are irrelevant to your domain.
The entity candidates are departments, chair, professor,
course.
Since there is only one instance of the “University of Toronto”,
which is the owner of the system, we exclude it from our
consideration.
Finding Relationships
Department Chair Professor Course
Department Managed_By Is_Assigned Offers
Chair Manages Is_a
Professor Assigned_To Is_a Teaches
Course Offered_By Taught_By
Finding Relationships
Department Chair Professor Course
Department Managed_By Is_Assigned Offers
Chair Manages Is_a
Professor Assigned_To Is_a Teaches
Course Offered_By Taught_By
Draw Rough ERDDraw the entities and relationships that you have discovered.
Department Chair
ProfessorCourse
PK PK
PK PK
Assigned
Is aOffers
Teaches
Managed By
Fill in Cardinality: Determine the cardinality of the relationships. You may
want to decide on cardinality when you are creating a relationship table.
Department Chair
ProfessorCourse
PK PK
PK PK
Assigned
Is aOffers
Teaches
Managed By
Cont…
Define Primary Keys
Identify attribute(s) that uniquely identify each occurrence of
that entity.
Possible PK
Department DeptName OR DeptID
Chair EmpID
Professor EmpID
Course CourseName OR CourseNo
Draw Key-Based ERD
Now add them (the primary key attributes) to your ERD. Revise your
diagram to
eliminate many-to-many relationships, and tag all foreign keys
Department Chair
ProfessorCourse
DeptIDPK EmpIDPK
EmpIDPKCourseNoPK
Assigned
Is aOffers
Teaches
Managed By
Cont… Identify Attributes
Identify all entity characteristics relevant to the domain being analyzed.
Excluding those keys already identified:
Schedule, Term, Professor name, Department Chair (which is an employee ID, a foreign key to Professor)
Map Attributes
Determine to which entity each characteristic belongs.
Do not duplicate attributes across entities.
If necessary, contain them in a new, related, entity. Schedule -- Prof-Course, Term -- Prof-Course, Chair – Department
Draw fully attributed ERD
Now add these attributes
The diagram may need to be modified to accommodate
necessary new entities.
Department
Chair
Professor
Course
DeptIDPK
EmpIDPK
EmpIDPK
CourseNoPK
Assigned
Is a
Offers
Teaches
Managed By
DeptName
EstablishmentYear
Location
CourseTitle
CreditHr
attribute name
attribute name
FullName
Specialization
HireDate
Finally: Review, Modify and Validate
Example 2 A company has several departments. Each department
has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.
Draw an ER Diagram
Conceptual Database Design
(EER modeling)
Cont… Enhanced E-R (EER) Models
Object-oriented extensions to E-R model
EER is important when: Some tuples have special attribute to be described with
a relationship between two entities is partial
EER Concepts
Generalization…….Superclass/Supertype
Specialization………Subclass/Subtype
Attribute Inheritance Subtype entities inherit values of all attributes of the super-type
An instance of a subtype is also an instance of the super-type
Constraints on specialization and generalization73
Supertypes and Subtypes Subclass/Subtype
An entity type whose tuples have attributes that distinguish its members from tuples of the generalized or Superclass entities.
Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
A single subclass may inherit attributes from two distinct superclasses.
A mutually exclusive category/subclass is when an entity instance can be in only one of the subclasses. E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER but not both.
An overlapping category/subclass is when an entity instance may be in two or more subclasses. E.g.: A PERSON who works for a university can be both EMPLOYEE and a
STUDENT at the same time.74
Supertypes and Subtypes…
Superclass /Supertype
An entity type whose tuples share common attributes.
Attributes that are shared by all entity occurrences (including
the identifier) are associated with the supertype.
Is the generalized entity
75
An instance ca not only be a member of a subclass. i.e. Every instance of a subclass is also an instance in the Superclass.
A member of a subclass is represented as a distinct database object, a distinct record that is related via foreign key attribute to its super-class.
The relationship between a subclass and a Superclass is an “IS A” or
“IS PART OF” type.
Subclass IS PART OF Superclass
Manager IS AN Employee
76
Relationship of Superclass and Subclass
Notation
• All subclasses or specializedentity sets should beconnected with thesuperclass using a line to acircle where there is a subsetsymbol indicating thedirection ofsubclass/superclassrelationship.
77
Basic notation for supertype/subtype notation
a) EER notation
78
Cont…
Attribute InheritanceAn entity that is a member of a subclass inherits
all the attributes of the entity as a member of the superclass.
The entity also inherits all the relationships in which the superclass participates.
An entity may have more than one subclass categories.
All subclasses of a superclass share a common unique identifier attribute (primary key). i.e.
79
Consider the EMPLOYEE supertype
entity shown above.
This entity can have several different
subtype entities (for example: HOURLY
and SALARIED), each with distinct
properties not shared by other subtypes.
But whether the employee is HOURLY
or SALARIED, same attributes
(EmployeeId, Name, and DateHired) are
shared.
The Supertype EMPLOYEE stores all
properties that subclasses have in
common.
And HOURLY employees have the
unique attribute Wage (hourly wage rate),
while SALARIED employees have two
unique attributes, Pension and Salary. 80
Hours
Generalization and Specialization
Generalization:The process of defining a more general
entity type from a set of more specialized entity types.
BOTTOM-UP
Specialization:The process of defining one or more
subtypes of the supertype and forming supertype/subtype
relationships.
TOP-DOWN
81
Example of generalization
a) Three entity types: CAR, TRUCK, and MOTORCYCLE
All these types of vehicles have common attributes
82
Example of generalization (cont.)
So we put
the shared
attributes in
a supertype
Note: no subtype for motorcycle, since it has no unique attributes
b) Generalization to VEHICLE supertype
83
Example of specialization
a) Entity type PART
Only applies to manufactured parts
Applies only to purchased parts
84
b) Specialization to MANUFACTURED PART and PURCHASED PART
Created 2
subtypes
Example of specialization (cont.)
85
SupplierID
Unite_Price
b) Specialization to MANUFACTURED PART and PURCHASED PART
Note: multivalued attribute was replaced by an
associative entity relationship to another entity
Created 2
subtypes
Example of specialization (cont.)
86
Relationship without Enhanced ER Model
Example of specialization (cont.)
87
88
Specialization of EMPLOYEE with Job Type
Constraints in Supertype
Completeness Constraints: Whether an
instance of a supertype must also be a member of
at least one subtype
Total Specialization Rule: Yes (double line)
Partial Specialization Rule: No (single line)
89
Constraints in Supertype The Total Specialization Rule specifies that an entity occurrence
should at least be a member of one of the subclasses.
E.g.: If we have EXTENTION and REGULAR as subclasses
of a superclass STUDENT, then it is mandatory that each
student to be either EXTENTION or REGULAR student.
Thus the participation of instances of STUDENT in
EXTENTION and REGULAR subclasses will be total.
90
Examples of completeness constraints
a) Total specialization rule
A patient must be either
an outpatient or a
resident patient
91
Constraints in Supertype The Partial Specialization Rule specifies that it is not
necessary for all entity occurrences in the superclass to be a member of one of the subclasses.
Here we have an optional participation on the specialization. E.g.: If we have MANAGER and SECRETARY as subclasses of a
superclass EMPLOYEE, then it is not the case that all employees are either manager or secretary.
Thus the participation of instances of employee in MANAGER and SECRETARY subclasses will be partial.
92
b) Partial specialization rule
A vehicle
could be a
car, a truck,
or neither
Examples of completeness constraints (cont.)
93
Constraints in Supertype
Disjointness Constraints: Whether an instance of a supertype
may simultaneously be a member of two (or more) subtypes
Disjoint Rule: An instance of the supertype can be only ONE of the
subtypes
Eg: an EMPLOYEE can either be SALARIED or PART-TIMER, but not the both at the
same time.
Overlap Rule: An instance of the supertype could be more than one of
the subtypes
Eg: EMPLOYEE working at the university can be both a STUDENT and an
EMPLOYEE at the same time.
This is diagrammed by placing either the letter "d" for disjoint
or "o" for overlapping inside the circle on the Generalization
Hierarchy portion of the E-R diagram. 94
a) Disjoint rule
Examples of disjointness constraints
A patient can either be outpatient
or resident, but not both
95
b) Overlap rule
A part may be both
purchased and
manufactured
Examples of disjointness constraints (cont.)
96
Constraints in Supertype The two types of constraints on generalization and specialization
(Disjointness and Completeness constraints) are not dependent on one another.
That is, being disjoint will not favour whether the tuples in the superclass should have Total or Partial participation for that specific specialization.
From the two types of constraints we can have four possible constraints
Disjoint ANDTotal
Disjoint AND Partial
Overlapping ANDTotal
Overlapping AND Partial
97
Constraints in Supertype/
Subtype Discriminators
Subtype Discriminator: An attribute of the supertype
whose values determine the target subtype(s)
Disjoint – a simple attribute with alternative values to indicate the
possible subtypes
Overlapping – a composite attribute whose subparts pertain to
different subtypes. Each subpart contains a boolean value to indicate
whether or not the instance belongs to the associated subtype
98
Introducing a subtype discriminator (disjoint rule)
A simple attribute with
different possible values
indicating the subtype
99
Subtype discriminator (overlap rule)
A composite
attribute with
sub-attributes
indicating “yes”
or “no” to
determine
whether it is of
each subtype
100
Example of supertype/subtype hierarchy
101
ERD-to- Relation
Starting point for Logical
Database Design
Cont…
Why ER
ER is very high level representation and is closer
to end users understanding
ER can not directly be implemented
Why Mapping
Relational Data model requires everything
be represented in a form of a table
Cont…
How
The first step is converting the conceptual design
to a form suitable for relational logical model,
which is in a form of tables- mapping
The next is Normalization (applying the rules
in relational data model)-
ER Diagrams into RelationsMapping Regular Entities to Relations/Tables
Every Entity will be a Tables
1. Simple attributes: E-R attributes map directly onto the
relation
2. Composite attributes: Use only their simple, component
attributes
3. Multivalued Attribute–Becomes a separate relation with a
foreign key taken from the superior entity
(a) CUSTOMER
entity type with
simple
attributes
Mapping a regular entity
(b) CUSTOMER relation
(a) CUSTOMER
entity type with
composite
attribute
Mapping a composite attribute
(b) CUSTOMER relation with address detail
Mapping an entity with a multivalued attribute
One–to–many relationship between original entity and new relation
(a)
Multivalued attribute becomes a separate relation with foreign key
(b)
Mapping Weak Entities
Becomes a separate relation with a foreign key
taken from the superior/Strong entity
Primary key composed of:
Partial identifier of weak entity
and/or
Primary key of identifying relation (strong entity)
ER Diagrams into Relations
Example of mapping a weak entity
a) Weak entity DEPENDENT
NOTE: the domain constraint
for the foreign key should
NOT allow null value if
DEPENDENT is a weak
entity
Foreign key
Composite primary key
Example of mapping a weak entity (cont.)
b) Relations resulting from weak entity
The cardinality will be the basis for mapping relationship
One-to-One:
Primary key of one of the side can be taken into the other as a
foreign key.
Recommendation: Primary key on the optional side becomes a
foreign key on the mandatory side
One-to-Many:
Primary key on the one side becomes a foreign key on the
many side
Many-to-Many:
Create a new relation with the primary keys of the two entities
becomes the foreign key in the new table. The foreign keys will
as well be part of the primary key in the new table.
Mapping Relationships
Example of mapping a 1:M relationship
a) Relationship between customers and orders
Note the mandatory one
b) Mapping the relationship
Again, no null value in the
foreign key…this is because
of the mandatory minimum
cardinalityForeign key
Example of mapping an M:N relationship
a) Many to Many relationship (M:N)
The Completes relationship will need to become a separate relation
Foreign key
Foreign key
Composite primary key
Example of mapping an M:N relationship (cont.)
a) Three resulting relations
New intersection
relation
Example of mapping an M:N relationship
b) Many to Many relationship (M:N) with attribute
✓ What if the relationship has additional attribute called
“Date Completed”
✓ The Completes relationship will need to become a
separate relation with the newly added attribute
New
intersection
relation
Foreign key
Foreign key
Composite primary key
Example of mapping an M:N relationship (cont.)
b) Three resulting relations
Example of mapping a binary 1:1 relationship
a) One to one relationship (1:1)
Often in 1:1 relationships, one direction is optional.
b) Resulting relations
Example of mapping a binary 1:1 relationship (cont.)
Foreign key goes in the relation on the optional side,
Matching the primary key on the mandatory side
Example of mapping an associative entity
a) An associative entity
Example of mapping an associative entity (cont.)
b) Three resulting relations
Composite primary key formed from the two foreign keys
Example of mapping an associative entity with
an identifier
a) SHIPMENT associative entity
Example of mapping an associative entity with
an identifier (cont.)
b) Three resulting relations
Primary key differs from foreign keys
ER Diagrams into RelationsMapping Unary Relationships
One-to-One:
Recursive foreign key in the same relation
One-to-Many:
Recursive foreign key in the same relation
Many-to-Many:
Create a new relation and the primary key of the entity will be
taken as a foreign key twice playing both roles in the recursive
relationship. (N.B: make sure that the two foreign key names are
different from one another even though they refer to same primary
key value in the main table)
Mapping a unary 1:N relationship
➢EMPLOYEE
entity with
unary
relationship
➢ EMPLOYEE
relation with
recursive
foreign key
Mapping a unary M:N relationship
➢ Bill-of-materials
relationships (M:N)
➢ ITEM and
COMPONENT
relations
Mapping Ternary (and n-ary) Relationships
Create a new relation an post the primary key of all
participating entities into the new table.
Include if there are any attributes that would be
meaningful only if the relationship takes place.
ER Diagrams into Relations
Mapping a ternary relationship
a) PATIENT TREATMENT Ternary relationship with
associative entity
b) Mapping the ternary relationship PATIENT TREATMENT
Remember
that the
primary key
MUST be
unique
Mapping a ternary relationship (cont.)
This is why
treatment date
and time are
included in the
composite
primary key
But this makes a
very
cumbersome
key…
It would be
better to create a
surrogate key
like Treatment#
EER Diagrams into RelationsMapping Supertype/Subtype Relationships
A separate relation for supertype and for each subtype
Supertype attributes (including the subtype discriminator) go into supertype relation
Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation
Supertype/subtype relationships
Mapping Supertype/subtype relationships to relations
These are implemented as one-to-one
relationships
Example
Example
DepartmentDepartment
EmployeeEmployee
ProjectProject
Works_forWorks_for
assignedassigned
Run_ByRun_By
Solution
Convert the ERD into Relation
The simple process is
1. Create a table/relation for each entity from the ERD.
2. The entity attributes become the attributes/columns
for the table/relation.
3. Potentially, include some foreign keys to represent
relationships shown in the ERD.
Cont…
Create a table for each ERD entity
There are 6 entities in the above Figure so there are six tables
with the same names:
Department,
Supervisor,
EmployeeDepartment,
Employee,
EmployeeProject,
Project.
Cont…
Entity attributes become table attributes
DEPARTMENT( DepartmentName )
SUPERVISOR( SupervisorNumber, SupervisorName )
EMPLOYEEDEPARTMENT( DepartmentName, EmployeeNumber )
EMPLOYEE( EmployeeNumber, EmployeeName )
EMPLOYEEPROJECT( EmployeeNumber, ProjectNumber )
PROJECT( ProjectNumber )
Cont… Include some foreign keys to represent relationships
Let's list the relationships and take each in turn
Run By
Can be represented by adding SupervisorNumber to the Department table
DEPARTMENT( DepartmentName, SupervisorNumber )
Is assigned (employee-department)
Is already represented in the EmployeeDepartment table
Cont…
Involves
Also represented in the EmployeeDepartment table
Works on
Represented in the EmployeeProject table
Is assigned (employ-project)
Also represented by the EmployeeProject table.
Cont… This means no other foreign keys are really needed.
This leaves the Project table with only a primary key.
But
The primary key is meant to uniquely identify instances of an entity.
The only data being stored in this table is the primary key, it has nothing to identify.
The choices are Remove the table. It's not needed if it only has the primary key.
Add more attributes to the table.
So the Project table becomes
PROJECT(ProjectNo, ProjectName, Budget, EndDate )
Cont…
Add more required attributes to other table EMPLOYEEDEPARTMENT
( DepartmentName, EmployeeNumber, StartDate )
EMPLOYEEPROJECT
( EmployeeNumber, ProjectNumber, StartDate )
Normally, this sort of process would not be required.
CLIENT
ClientID
ClientName
PostalAddress
(Street, Suburb,
State, PostCode)
Phone {(PhoneType,
PhoneNbr)}
ClientType
Capricorn Computing Consultants (c3)
Entity relationship model
Author: E.E. Tansley
Last modified: 30/6/09
PROJECT
ProjectID
Description
EstimatedPrice
EstimatedStartDate
EstimatedFinishDate
ActualStartDate
ActualFinishDate
CurrentStatus
[ActualDuration]
TASK
TaskID
Description
EstimatedHours
ActualStartDate
ActualFinishDate
CurrentStatus
[ActualDuration]
STAFF
StaffID
StaffName
Mobile
JobTitle
managed
byrequests
TIMESHEET
TimesheetID
WorkDate
WorkDescription
HoursWorked
Billable
involves
submitsincurs
pre-req
Exercise
Next on:
Process of Normalization
Objectives
144
Recalling Relational concepts Understand different anomalies and functional
dependency concepts Use normalization to convert anomalous tables to
well-structured relations Why normalize What is normalization Identify three problems solved by normalization Example of how to normalize
Relation
145
Definition:
A relation is a data in a form of a named, two-dimensional table
Major requirements for a table to qualify as a relation: A Relation must have a unique name Every attribute value must be atomic
not multi-valued, not composite Every row must be unique (can’t have two rows with exactly the
same values for all its fields) Attributes (columns) in tables must have unique names The order of the columns and rows must be irrelevant
NOTE: all relations are in 1st Normal form
Why normalize
146
Data design aims to identify data stored in a system
Almost certainly stored in a relational database
Normalization intended to
Eliminate redundancy Reduce the potential for anomalies
Organize data efficiently
What is normalization
147
Normalization is the process to decompose a relation
into a set of smaller relations
A relation is in a specific normal form (NF) if it
Satisfies requirements of all previous NFs
Satisfies requirements of the current NF
The focus is on the first three NFs (1NF, 2NF & 3NF)
Data/Database normalization is a series of stepsfollowed to obtain a database design that allowsfor consistent storage and efficient access ofdata in a relational database.
Cont…
148
Formal definitionNORMALIZATION is the process of identifying the
logical associations between data items and designing a database that will represent such associations but without suffering the update anomalies which are;
Insertion Anomalies Deletion Anomalies Modification Anomalies
Reading Assignment: Read and Understand the three kinds of anomalies
Functional Dependencies and Keys
153
The mechanism to avoid the update anomalies and do normalization is to analyze relationship between data values/attributes of a relation
Such analysis is done using the concept of Functional Dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute
Candidate Key: A unique identifier. One of the candidate keys will become the
primary key
Each non-key field is functionally dependent on the candidate key/primary key
Data Dependency
154
FDs are derived from the real-world constraints on the attributes
The essence of this idea is that if the existence of
something, call it A, implies that B must exist and
have a certain value, then we say:
"B is functionally dependent on A."
We also often express this idea by saying that
"A determines B," or
"B is a function of A," or
"A functionally governs B."
Data Dependency
155
Often, the notions of functionality and functional dependency are expressed briefly by the statement, "If A, then B."
It is important to note that the value B must be uniquefor a given value of A,
i.e., any given value of A must imply just one and only one value of B, in order for the relationship to qualify for the name "function."
However, this does not necessarily prevent different values of A from implying the same value of B.
Data Dependency
156
X →Y holds whenever two tuples have the same value for X, they must have the same value for Y
Notation:
A→B which is read as; B is functionally dependent on A
or A functionally determines B
In general, functional dependency is a relationship among attributes.
Who will tell us this FD? How do we know?
Most of the time we need sample data
Data Dependency…
157
Partial Dependency
If an attribute which is not a member of the primary key is
dependent on some part of the primary key (if we have
composite primary key) then that attribute is partially
functionally dependent on the primary key.
Let {A,B} be the Primary Key and C is non key attribute.
Then if {A,B}→C and B→C or A→ C
Then C is partially functionally dependent on {A,B}
Data Dependency…
158
Full Dependency
If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key
(if we have composite primary key) then that attribute is fully
functionally dependent on the primary key.
Let {A,B} is the Primary Key and C is non key attribute
Then if {A,B}→C and B→C and A→C does not hold
Then C Fully functionally dependent on {A,B}
Data Dependency…
159
Transitive Dependency In mathematics and logic, a transitive relationship is a relationship
of the following form: "If A implies B, and if also B implies C, then A implies C."
Example: If Mr X is a Human, and if every Human is an Animal, then Mr X must be
an Animal.
Generalized description of transitive dependency is that: If A functionally governs B, AND If B functionally governs C
THEN A functionally governs C Provided that neither C nor B determines A
i.e. (B A and C A)
In the normal notation: {(A→B) AND (B→C)} ==> A→C provided that B A and C A
Steps of Normalization
160
We have various levels or steps in normalization called Normal Forms.
The level of complexity, strength of the rule and decomposition increases as we move from one lower level Normal Form to the higher.
A table in a relational database is said to be in a certain normal form if it satisfies certain constraints.
Normal form below(next) represents a stronger condition than the previous one
161
Steps in normalization- Pictorial representation
First Normal Form (1NF)
162
Requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not).
Solution
Moving this repeating groups to a new row by repeating the common attributes. If so then Find the key with which you can find all data
Thus No multivalued attributes Every attribute value is atomic For example Fig. 5-25 is not in 1st Normal Form (multivalued
attributes) ➔ it is not a relation While Fig. 5-26 is in 1st Normal form
Remark All relations are in 1st Normal Form
Cont…
163
Formal Definition: a table (relation) is in 1NF
If
There are no duplicated rows in the table. Unique identifier
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF )
164
165
FIRST NORMAL FORM (1NF)Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation so that is can be said is a relation in relational database.
166
Example 2: Table with multivalued attributes, not in 1st
normal form
Note: this is NOT a relation
167
Table with no multivalued attributes and unique rows, in 1st
normal form
Note: this is relation, but not a well-structured one
Second Normal form 2NF
168
No partial dependency of a non key attribute on part of the
primary key.
This will result in a set of relations with a level of Second
Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a
non-composite) key is automatically also in 2NF.
Cont…
169
Formal Definition: a table (relation) is in 2NF
If
It is in 1NF and
If all non-key attributes are dependent on the entire
primary key. i.e. no partial dependency.
Example 1: for 2NF
170
EmpID EmpName ProjNo ProjName ProjLoc ProjFund PrrojMangID Incentive
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
EMP_PROJ rearranged
EMP_PROJ
Business rule: Whenever an employee participates in a project, he/she will be entitled for an incentive.
This schema is in its 1NF since we don’t have any repeating groups or attributes with multi-valued property.
Cont…
171
To convert it to a 2NF we need to remove all partial
dependencies of non key attributes on part of the primary
key.
{EmpID, ProjNo}→ EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID, Incentive
But in addition to this we have the following dependencies
FD1: {EmpID}→EmpNameFD2: {ProjNo}→ProjName, ProjLoc, ProjFund, ProjMangIDFD3: {EmpID, ProjNo}→ Incentive
Cont…
172
As we can see, some non key attributes are partially
dependent on some part of the primary key.
This can be witnessed by analyzing the first two
functional dependencies (FD1 and FD2).
Thus, each Functional Dependencies, with their
dependent attributes should be moved to a new relation
where the Determinant will be the Primary Key for
each.
Cont…
173
174
Order_ID ➔ Order_Date, Customer_ID, Customer_Name, Customer_Address
Therefore, NOT in 2nd Normal Form
Customer_ID ➔ Customer_Name, Customer_Address
Product_ID ➔ Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID ➔ Order_Quantity
Example 2: Functional dependency diagram for INVOICE
175
Partial dependencies are removed, but there
are still transitive dependencies
Getting it into Second Normal Form
Removing partial dependencies
Third Normal Form (3NF )
176
Eliminate Columns Dependent on another non-Primary Key - If attributes do not contribute to a description of the key, remove them to a separate table. This level avoids update and delete anomalies.
Formal Definition: a Table (Relation) is in 3NF
If It is in 2NF and There are no transitive dependencies between a
primary key and non-primary key attributes.
Cont…
177
2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes) Note: This is called transitive, because the primary key is a determinant
for another attribute, which in turn is a determinant for a third
Solution: Non-key determinant with transitive dependencies go into a new table; non-key determinant becomes primary key in the new table and stays as
foreign key in the old table
178
Transitive dependencies are removed
Removing transitive dependencies
Getting it into Third Normal Form
2nd Example for (3NF)
Assumption: Students of same batch (same year) live in one building or dormitory
Student
StudID Stud_F_Name Stud_L_Name Dept Year Dormitary
125/97 Abebe Mekuria Info Sc 1 401
654/95 Lemma Alemu Geog 3 403
842/95 Chane Kebede CompSc 3 403
165/97 Alem Kebede InfoSc 1 401
985/95 Almaz Belay Geog 3 403
This schema is in its 2NF since the primary key is a single attribute.
179
Cont…
180
StudID→Year ANDYear→Dormitary
And
Year can not determine StudID and Dormitary can not
determine StudID
Then transitively StudID→Dormitary
To convert it to a 3NF we need to remove all transitive dependencies of non key attributes on another non-key attribute.
The non-primary key attributes, dependent on each other will be moved to another table and linked with the main table using Candidate Key- Foreign Key relationship.
Cont…
181
Cont…
182
Generally, eventhough there are other four additional levels
of Normalization, a table is said to be normalized if it reaches
3NF.
A database with all tables in the 3NF is said to be Normalized
Database.
Reading Assignment Boyce-Codd Normal Form (BCNF)
Forth Normal form (4NF)
Fifth Normal Form (5NF)
Domain-Key Normal Form (DKNF)
Summary of the Process of
Normalization
183
To understand normalisation you need to know these
problems
1NF: Repeating groups
2NF: Partial Dependency
3NF: Transitive Dependency and Derived Attributes
Normalisation is the process of decomposing relations
Cont…
184
Some important indicators
No Repeating or Redundancy: no repeating fields.
The Fields Depend Upon the Key: the table should solely
depend on the key.
The Whole Key: no partial key dependency.
And Nothing But The Key: no inter data dependency.
Cont…
185
Pitfalls (Limitations) of Normalization
Requires data to see the problems
May reduce performance of the system
Is time consuming,
Difficult to design and apply and
Prone to human error
Physical Database Design
186
Physical Objectives
Understand Purpose of physical database design
Describe the physical database design process
Choose storage formats for attributes
Describe indexes and their appropriate use
187
Logical vs Physical Database Design
Logical database design is independent of
implementation details, such as functionality of target
DBMS.
Logical database design concerned with the what,
physical database design is concerned with the how.
Physical Design is description of the
implementation of the database on secondary
storage using the target DBMS
188
Physical Database Design
Process of producing a description ofimplementation of the database on secondarystorage.
The main tasks including: field specification,
choosing indexes and file organization, and to
refine the conceptual and external schemas (if
necessary) to meet performance goals.
189
Cont…
The primary goal of physical database design is
data processing efficiency (Time and Storage
impact)
We will concentrate on choices often available to
optimize performance of database services
Many physical database design decisions are
implicit in the technology adopted
190
Cont…
Design decision about Storage Format
Choosing the storage format of each field (attribute).
The DBMS provides some set of data types that can
be used for the physical storage of fields in the
database
Data Type (format) is chosen to optimize storage
space and maximize data integrity
191
192
Cont…
It is also called Designing Fields Field
Smallest unit of named application data recognized by system software
Attributes from relations will be represented as fields In addition to Choosing data type you may also
consider Coding (optional) Controlling data integrity
Data Type A coding scheme recognized by system software for
representing organizational data
192
Cont…
Choosing Data Types
CHAR–fixed-length character
VARCHAR–variable-length character (memo)
LONG–large number
NUMBER–positive/negative number
INEGER–positive/negative whole number
DATE–actual date
BLOB–binary large object (good for graphics, sound
clips, etc.)
193
Cont…
Objectives of choosing data types1. Minimize storage space
2. Represent all possible values of the field
3. Improve data integrity of the field
4. Support all data manipulations desired on
the field
194
195
Example storage format (field design) for
the “Branch” table
195
196
Cont…
Choose indexes Index in a database is similar to an index in a book Primary indexes is automatic by the primary key Clustering index- by non key attribute
Guidelines for Choosing Indexes Specify a unique index for the primary key of each table. Specify an index for foreign keys. Specify an index for nonkey fields that are referenced in
qualification, sorting and grouping commands for the purpose of retrieving data.
196
Rules for Using Indexes
1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
4. Fields in SQL ORDER BY and GROUP BY commands
5. When there are >100 values but not when there are <30 values
197
Rules for Using Indexes (cont.)
6. Avoid use of indexes for fields with long values; perhaps compress values first
Name is costly to be indexed
7. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s)
8. Null values will not be referenced from an index
9. Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases
Why? Because modifications (e.g. inserts, deletes) require updates to occur in index files
198
End of Chapter Four
199