8/13/2019 Database Lecture Note
1/106
AAU
2006
Database
Instructor: Seble M.T
WWW.AAU.EDU.ET
Fundamentals of Database
Chapter One: Introduction to Database Systems
AAU, Computer Science Department, 2013 Page 1
CHAPTER ONE
Database System
Database systems are designed to manage large data sets in an organization. Data
management involves both the definition and the manipulation of the data, ranging from
simple representation of the data to considerations of structures for the storage of information.
Data management also considers the provision of mechanisms for the manipulation of
information.
Today, databases are essential to every business. They are used to maintain internal records,
to present data to customers and clients on the World-Wide-Web, and to support many other
commercial processes. Databases are likewise found at the core of many modern
organizations.
The power of databases comes from a body of knowledge and technology that has developed
over several decades and is embodied in specialized software called a database management
system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of
data efficiently and allowing it to persist over long periods of time, safely. These systems are
among the most complex types of software available.
Thus, to our question: what is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years. In
common usage, the term database refers to a collection of data that is managed by a DBMS.
Thus the DB course is about:
How to organize data
Supporting multiple users
Efficient and effective data retrieval
Secured and reliable storage of data
Maintaining consistent data
Making information useful for decision making
Data management has passed through different levels of development along with the
development of technology and services. These are best described as three levels of
development. Even though each new level brought an advantage and overcame a problem of
the previous one, all methods of data handling are still in use to some extent. The three major
levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follow the primitive and traditional way
of information handling, where cards and paper are used for the purpose.
Files for as many event and objects as the organization has are used to store
information.
Each of the files containing various kinds of information is labeled and stored in one or
more cabinets.
The cabinets could be kept in safe places for security purpose based on the sensitivity
of the information contained in it.
Insertion and retrieval are done by searching first for the right cabinet, then for the right
file, then for the information.
One could have an indexing system to facilitate access to the data
Limitations of the Manual approach
Prone to error
Difficult to update, retrieve, integrate
You have the data but it is difficult to compile the information
Limited to small size information
Cross referencing is difficult
An alternative approach to data handling is a computerized way of dealing with the
information. The computerized approach could be either decentralized or centralized,
based on where the data resides in the system.
2. Traditional File Based Approach
After the introduction of computers for data processing to the business community, the need
to use the device for data storage and processing increased. There were, and still are, several
computer applications with file based processing used for the purpose of data handling. Even
though the approach evolved over time, the basic structure is still similar if not identical.
File based systems were an early attempt to computerize the manual filing system.
This approach is the decentralized computerized data handling method.
A collection of application programs performs services for the end-users. In such
systems, every application program that provides service to end-users defines and
manages its own data.
Such systems have a number of programs for each of the different applications in the
organization.
Since every application defines and manages its own data, the system is subject to
serious data duplication problems.
A file, in the traditional file-based approach, is a collection of records containing
logically related data.
Limitations of the Traditional File Based approach
As business applications became more complex, demanding more flexible and reliable data
handling methods, the shortcomings of the file-based system became evident. These
shortcomings include, but are not limited to:
Separation or Isolation of Data: information available in one application may not be
known to others.
Limited data sharing
Lengthy development and maintenance time
Duplication or redundancy of data
Data dependency on the application
Incompatible file formats between different applications and programs creating
inconsistency.
Fixed query processing which is defined during application development
The limitations of the traditional file-based data handling approach arise from two basic
reasons.
1. Definition of the data is embedded in the application program which makes it
difficult to modify the database definition easily.
2. No control over the access and manipulation of the data beyond that imposed by
the application programs.
The most significant problem experienced by the traditional file based approach of data
handling is the update anomalies. We have three types of update anomalies;
1. Modification Anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when a record set is deleted from one
application but remains untouched in other application programs.
3. Insertion Anomalies: a problem encountered when one cannot decide whether the data
to be inserted is valid and consistent with other similar data sets.
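These anomalies can be made concrete with a small sketch. The file names and data below are hypothetical, and CSV files stand in for the application programs' data files:

```python
import csv
import os
import tempfile

# Illustrative sketch: two application programs each manage their own
# copy of the same customer data, the duplication problem of the
# file-based approach.
dirpath = tempfile.mkdtemp()

def write_rows(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_rows(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))

billing = os.path.join(dirpath, "billing_customers.csv")
shipping = os.path.join(dirpath, "shipping_customers.csv")
customer = [["id", "name", "address"], ["1", "Abebe", "Addis Ababa"]]
write_rows(billing, customer)    # billing application's own file
write_rows(shipping, customer)   # shipping application's duplicate copy

# Modification anomaly: the address is updated in one application's
# file but not in the other's, so the same data set now disagrees.
rows = read_rows(billing)
rows[1][2] = "Bahir Dar"
write_rows(billing, rows)

billing_addr = read_rows(billing)[1][2]
shipping_addr = read_rows(shipping)[1][2]
print(billing_addr, shipping_addr)
```

Because neither program knows about the other's file, nothing in the system detects that the two copies have diverged.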
3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems changed
significantly. Codd proposed that database systems should present the user with a view of
data organized as tables called relations. Behind the scenes, there might be a complex data
structure that allowed rapid response to a variety of queries. But, unlike the user of earlier
database systems, the user of a relational system would not be concerned with the storage
structure. Queries could be expressed in a very high-level language, which greatly increased
the efficiency of database programmers. The database approach emphasizes the integration
and sharing of data throughout the organization.
Thus in Database Approach:
Database is just a computerized record keeping system or a kind of electronic filing
cabinet.
Database is a repository for collection of computerized data files.
Database is a shared collection of logically related data designed to meet the information
needs of an organization. Since it is a shared corporate resource, the database is
integrated with minimum amount of or no duplication.
Database is a collection of logically related data where these logically related data
comprises entities, attributes, relationships, and business rules of an organization's
information.
In addition to containing the data required by an organization, a database also contains a
description of the data, which is called Metadata, a Data Dictionary, a Systems
Catalogue, or Data about Data.
Since a database contains information about the data (metadata), it is called a self-
describing collection of integrated records.
The purpose of a database is to store information and to allow users to retrieve and
update that information on demand.
Database is designed once and used simultaneously by many users.
Unlike the traditional file-based approach, in the database approach there is program-data
independence: the separation of the data definition from the application. Thus the
application is not affected by changes made in the data structure and file organization.
Each database application will perform the combination of: Creating database, Reading,
Updating and Deleting data.
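The self-describing (metadata) property can be illustrated with a minimal sketch using Python's built-in sqlite3 module; the table and columns here are illustrative, not from the text:

```python
import sqlite3

# A database stores a description of the data (metadata) alongside the
# data itself: here, an in-memory SQLite database and its catalogue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Abebe')")

# The system catalogue (sqlite_master in SQLite) holds data about data:
# the name and stored definition of every schema object.
row = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(row[0])   # table name recorded in the catalogue
print(row[1])   # the stored definition (metadata) of the table
```

Applications query the catalogue rather than hard-coding the data definition, which is what makes program-data independence possible.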
Benefits of the database approach
Data can be shared: two or more users can access and use same data instead of storing
data in redundant manner for each user.
Improved accessibility of data: by using structured query languages, the users can easily
access data without programming experience.
Redundancy can be reduced: isolated data is integrated in database to decrease the
redundant data stored at different applications.
Quality data can be maintained: the different integrity constraints in the database
approach will maintain the quality leading to better decision making
Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the
data in the database to some extent.
Transaction support can be provided: the basic demands of any transaction support system
are implemented in a full-scale DBMS.
Integrity can be maintained: data at different applications will be integrated together with
additional constraints to facilitate shared data resource.
Security measures can be enforced: the shared data can be secured by having different levels
of clearance and other data security mechanisms.
Improved decision support: the database will provide information useful for decision
making.
Standards can be enforced: the different ways of using and dealing with data by different
units of an organization can be balanced and standardized by using the database
approach.
Compactness: since it is an electronic data handling method, the data is stored compactly
(no voluminous papers).
Speed: data storage and retrieval is fast as it will be using the modern fast computer
systems.
Less labour: unlike the other data handling methods, data maintenance will not demand
much resource.
Centralized information control: since relevant data in the organization will be stored at
one repository, it can be controlled and managed at the central level.
Limitations and risk of Database Approach
Introduction of new professional and specialized personnel
Complexity in designing and managing data
The cost and risk during conversion from the old to the new system
High cost incurred to develop and maintain
Complex backup and recovery services from the users' perspective
Reduced performance due to centralization
High impact on the system when failures occur
Database Management System (DBMS)
A Database Management System (DBMS) is a software package used for providing EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many people/programs accessing the same database, or even the
same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT (data outlives
the programs that operate on it) data. A DBMS also provides a systematic method for creating,
updating, storing, and retrieving data in a database, along with services for controlling
data access, enforcing data integrity, managing concurrency control, and recovery. Having
this in mind, a full-scale DBMS should provide at least the following services
to the user.
1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE transaction, which minimize data
inconsistency.
4. Concurrency Control Services: access and update on the database by different
users simultaneously should be implemented correctly.
5. Recovery Services: a mechanism for recovering the database after a failure must
be available.
6. Authorization Services (Security): must support the implementation of access
and authorization services for database administrators and users.
7. Support for Data Communication: should provide the facility to integrate with
data transfer software or data communication managers.
8. Integrity Services: rules about the data and the changes made to it, the
correctness and consistency of stored data, and the quality of data based on
business constraints.
9. Services to promote data independency between the data and the application
10. Utility services: sets of utility service facilities like
Importing data
Statistical analysis support
Index reorganization
Garbage collection
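The ALL-or-NONE transaction service and the recovery service can be sketched with Python's built-in sqlite3 module; the account table and the simulated failure are hypothetical:

```python
import sqlite3

# Sketch of an ALL-or-NONE transaction: either every update in the
# transaction is applied, or none of them is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

try:
    # Debit one account, then crash before the matching credit runs.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.rollback()   # recovery: undo the partial transaction

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

After the rollback both balances are unchanged, which is exactly the data-inconsistency the transaction service exists to prevent.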
DBMS and Components of DBMS Environment
A DBMS is a software package used to design, manage, and maintain databases. It provides the
following facilities:
Data Definition Language (DDL):
o Language used to define each data element required by the organization.
o Commands for setting up schema of database
Data Manipulation Language (DML):
o Language used by end-users and programmers to store, retrieve, and access the
data e.g. SQL
o Also called "query language"
Data Dictionary: tool used to store and organize information about the data
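As a small sketch of DDL and DML side by side, using Python's built-in sqlite3 module (the employee table and its columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: commands for setting up the schema of the database.
conn.execute("""CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary REAL)""")

# DML: the query language used to store, retrieve, and access the data.
conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Sara', 5000)")
names = [r[0] for r in
         conn.execute("SELECT name FROM employee WHERE salary >= 5000")]
print(names)
```

Note the division of labour: the CREATE TABLE statement defines the data elements once, while INSERT and SELECT manipulate the data without restating that definition.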
The DBMS is software that helps to design, handle, and use data using the database approach.
Taking a DBMS as a system, one can describe it with respect to its environment or the other
systems interacting with it. The DBMS environment has five components.
1. Hardware: Components that are comprised of personal computers, mainframe or
any server computers, network infrastructure, etc.
2. Software: those components like the DBMS software, application programs,
operating systems, network software, and other relevant software.
3. Data: This is the most important component to the user of the database. There are two
types of data in a database approach: Operational data and Metadata.
The structure of the data in the database is called the schema, which is composed of the
Entities, Properties of entities, and relationship between entities.
4. Procedure: these are the rules and regulations on how to design and use a database. It
includes procedures like how to log on to the DBMS, how to use facilities, how to start
and stop transaction, how to make backup, how to treat hardware and software failure,
how to change the structure of the database.
5. People: the people in the organization responsible for designing, implementing, managing,
administering, and using the database.
Database Development Life Cycle
As the database is one component in most information system development tasks, there are several steps
in designing a database system. Here, more emphasis is given to the design phases of the
system development life cycle. The major steps in database design are:
1. Planning: that is identifying information gap in an organization and propose a
database solution to solve the problem.
2. Analysis: that concentrates more on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination and structuring, and
selection of best design method are also performed at this phase.
3. Design: in database designing more emphasis is given to this phase. The phase is
further divided into three sub-phases.
a. Conceptual Design: concise description of the data, data type, relationship
between data and constraints on the data.
There is no implementation or physical detail consideration.
Used to elicit and structure all information requirements
b. Logical Design: maps the conceptual design onto a selected specific data
model used to implement the data structure.
It is independent of any particular DBMS, with no other physical
considerations.
c. Physical Design: physical implementation of the upper level design of the
database with respect to internal storage and file structure of the database for
the selected DBMS.
To develop all technology and organizational specification.
4. Implementation: the testing and deployment of the designed database for use.
5. Operation and Support: administering and maintaining the operation of the
database system and providing support to users.
Roles in Database Design and Use
As people are one of the components in the DBMS environment, there is a group of roles played
by different stakeholders in the design and operation of a database system.
1. DataBase Administrator (DBA)
Responsible for overseeing, controlling, and managing the database resources (the database itself,
the DBMS, and other related software)
Authorizing access to the database
Coordinating and monitoring the use of the database
Responsible for determining and acquiring hardware and software resources
Accountable for problems like poor security, poor performance of the system
Involved in all steps of database development
We can have further classifications of this role in big organizations having huge amounts of
data and many user requirements.
1. Data Administrator (DA): is responsible for the management of data resources.
Involves in database planning, development, maintenance of standards policies
and procedures at the conceptual and logical design phases.
2. DataBase Administrator (DBA): is more technically oriented role. Responsible
for the physical realization of the database. Involves in physical design,
implementation, security and integrity control of the database.
2. DataBase Designer (DBD)
Identifies the data to be stored and chooses the appropriate structures to represent and
store the data.
Should understand the user requirements and should choose how the user views the
database.
Involved in the design phase, before the implementation of the database system.
We have two distinct kinds of database designers: one involved in the logical and conceptual
design and another involved in the physical design.
1. Logical and Conceptual DBD
Identifies data (entity, attributes and relationship) relevant to the
organization
Identifies constraints on each data
Understand data and business rules in the organization
Sees the database independent of any data model at the conceptual level and
considers one specific data model at the logical design phase.
2. Physical DBD
Takes the logical design specification as input and decides how it should be
physically realized.
Maps the logical data model onto the specified DBMS with respect to tables and
integrity constraints. (DBMS-dependent designing)
Selects specific storage structures and access paths to the database
Designs security measures required on the database
3. Application Programmer and Systems Analyst
System analyst determines the user requirement and how the user wants to view
the database.
The application programmer implements these specifications as programs; code,
test, debug, document and maintain the application program.
Determines the interface on how to retrieve, insert, update and delete data in the
database.
The application could use any high level programming language according to the
availability, the facility and the required service.
4. End Users
Workers whose job requires accessing the database frequently for various purposes. There are
different groups of users in this category.
1. Naïve Users:
Sizable proportion of users
Unaware of the DBMS
Only access the database based on their access level and demand
Use standard and pre-specified types of queries.
2. Sophisticated Users
Are users familiar with the structure of the Database and facilities of the
DBMS.
Have complex requirements
Have higher level queries
Are most of the time engineers, scientists, business analysts, etc
3. Casual Users
Users who access the database occasionally.
Need different information from the database each time.
Use sophisticated database queries to satisfy their needs.
Are most of the time middle to high level managers.
These users can be again classified as Actors on the Scene and Workers Behind the Scene.
Actors On the Scene:
Data Administrator
Database Administrator
Database Designer
End Users
Workers Behind the Scene
DBMS designers and implementers: who design and implement different DBMS
software.
Tool Developers: experts who develop software packages that facilitate database
system design and use. Prototype, simulation, and code generator developers could be
examples. Independent software vendors could also be categorized in this group.
Operators and Maintenance Personnel: system administrators who are responsible
for actually running and maintaining the hardware and software of the database
system and the information technology facilities.
Fundamental of Database Systems: Data Models and Database System Architecture
Page 1
CHAPTER TWO
DATABASE SYSTEM ARCHITECTURE
Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some level of data
abstraction by hiding details of data storage that are not needed by most database users. A data
model is a collection of concepts that can be used to describe the structure of a database, and it
also provides the necessary means to achieve this abstraction. By the structure of a database, we mean
the data types, relationships, and constraints that should hold for the data. Most data models also
include a set of basic operations for specifying retrievals and updates on the database.
In addition to the basic operations provided by the data model, it is becoming more common to
include concepts in the data model to specify the dynamic aspect or behavior of a database
application. This allows the database designer to specify a set of valid user defined operations
that are allowed on the database objects. An example of a user-defined operation could be
COMPUTE_GPA, which can be applied to a STUDENT object. On the other hand, generic
operations to insert, delete, modify, or retrieve any kind of object are often included in the basic
data model operations.
Categories of Data Models
Many data models have been proposed, which we can categorize according to the types of
concepts they use to describe the database structure. High-level or conceptual data models
provide concepts that are close to the way many users perceive data, whereas low-level or
physical data models provide concepts that describe the details of how data is stored in the
computer. Concepts provided by low-level data models are generally meant for computer
specialists, not for typical end users. Between these two extremes is a class of representational
(or implementation) data models, which provide concepts that may be understood by end users
but that are not too far removed from the way data is organized within the computer.
Representational data models hide some details of data storage but can be implemented on a
computer system in a direct way.
Conceptual data models use concepts such as entities, attributes, and relationships. An entity
represents a real-world object or concept, such as an employee or a project that is described in
the database. An attribute represents some property of interest that further describes an entity,
such as the employee's name or salary. A relationship among two or more entities represents an
association among two or more entities, for example, a works-on relationship between an
employee and a project.
Representational or implementation data models are the models used most frequently in
traditional commercial DBMSs. These include the widely used relational data model, as well as
the so-called legacy data models-the network and hierarchical models-that have been widely
used in the past. Representational data models represent data by using record structures and
hence are sometimes called record-based data models. We can regard object data models as a
new family of higher-level implementation data models that are closer to conceptual data
models.
FIGURE 1.2 A database that stores student and course information.
Hierarchical Data Model
In the hierarchical data model, information is organized as a collection of inverted trees of
records. The inverted trees may be of arbitrary depth. The record at the root of a tree has zero or
more child records; the child records, in turn, serve as parent records for their immediate
descendants. This parent-child relationship recursively continues down the tree.
The records consist of fields, where each field may contain simple data values (e.g. integer, real,
text), or a pointer to a record. The pointer graph is not allowed to contain cycles. Some
combinations of fields may form the key for a record relative to its parent.
Applications navigate a hierarchical database by starting at a root and successively navigating
downward from parent to children until the desired record is found. Applications can interleave
parent-child navigation with traversal of pointers. Searching down a hierarchical tree is very fast,
since the storage layer for hierarchical databases uses contiguous storage for hierarchical
structures. All other types of queries require sequential search techniques.
The hierarchical data model is impoverished for expressing complex information models. Often
a natural hierarchy does not exist and it is awkward to impose a parent-child relationship.
Pointers partially compensate for this weakness, but it is still difficult to specify suitable
hierarchical schemas for large models.
Hierarchical Data Model Summary
The simplest data model
Record type is referred to as node or segment
The top node is the root node
Nodes are arranged in a hierarchical structure, a sort of upside-down tree
A parent node can have more than one child node
A child node can only have one parent node
The relationship between parent and child is one-to-many
Relationships are established by creating physical links between stored records (each is stored with a
predefined access path to other records)
Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in
manufacturing, personnel organization in companies
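The top-down, parent-to-child navigation described above can be sketched in Python; the segment names are hypothetical, and nested dictionaries stand in for stored records:

```python
# Sketch of a hierarchical (inverted-tree) database: every node has one
# parent and zero or more children, and applications navigate downward
# from the root until the desired record is found.
dept = {
    "name": "Computer Science",   # root node (segment)
    "children": [
        {"name": "Databases", "children": [
            {"name": "Lecture Note", "children": []}]},
        {"name": "Networks", "children": []},
    ],
}

def find(node, target):
    """Search top-down, parent to children, for a named segment."""
    if node["name"] == target:
        return node
    for child in node["children"]:
        hit = find(child, target)
        if hit is not None:
            return hit
    return None

found = find(dept, "Lecture Note")
print(found is not None)
```

Note that a record is reachable only through its single parent chain; asking, say, for all segments with a given property forces the sequential search the text mentions.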
Fig. Example of Hierarchical data model
Network Data Model
In the network data model, information is organized as a collection of graphs of records that are
related with pointers. Network data models represent data in a symmetric manner, unlike the
hierarchical data model (with its distinction between a parent and a child). A network data model is
more flexible than a hierarchical data model and still permits efficient navigation.
The records consist of lists of fields (fixed or variable length, with a maximum length), where
each field contains a simple value (fixed or variable size). The network data model also
introduces the notion of indexes of fields and records, sets of pointers, and physical placement of
records.
Network Data Model Summary
Allows record types to have more than one parent, unlike the hierarchical model
A network data model sees records as set members
Each set has an owner and one or more members
Allows many-to-many relationships between entities
Like the hierarchical model, the network model is a collection of physically linked records.
Allows member records to have more than one owner
The database contains a complex array of pointers that thread through a set of records.
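A minimal sketch of owner-member sets, with plain Python dictionaries standing in for pointer-linked records (the record names are hypothetical):

```python
# Sketch of the network model: records are nodes in a graph, sets have
# an owner and members, and a member may have several owners (unlike a
# hierarchical child, which has exactly one parent).
records = {
    "proj_db":  {"type": "project"},
    "proj_net": {"type": "project"},
    "emp_sara": {"type": "employee"},
}

# "works_on" sets: each owner (a project) points to its member records.
works_on = {
    "proj_db":  ["emp_sara"],
    "proj_net": ["emp_sara"],   # emp_sara has more than one owner
}

# Navigating the pointers in the other direction finds all owners of a
# member, the many-to-many relationship the hierarchical model forbids.
owners_of_sara = [owner for owner, members in works_on.items()
                  if "emp_sara" in members]
print(owners_of_sara)
```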
Fig. Many-to-many relationship defined using Network data type
Relational Data Model
In the relational data model, information is organized in relations (two-dimensional tables). Each
relation contains a set of tuples (records). Each tuple contains a number of fields. A field may
contain a simple value (fixed or variable size) from some domain (e.g. integer, real, text, etc.).
The relational data model is based on a mathematical foundation, called relational algebra. This
mathematical foundation is the cornerstone of some of the very attractive properties of relational
databases: it first of all offers data independence, and it offers a mathematical framework for
many of the optimizations possible in relational databases (e.g. query optimization).
Relational Data Model Summary
Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model of Data
for Large Shared Data Banks')
Viewed as a collection of tables called Relations, equivalent to a collection of record
types
Stores information or data in the form of tables (rows and columns)
A row of a table is called a tuple, equivalent to a record
A column of a table is called an attribute, equivalent to a field
The tables seem to be independent but are related somehow.
Conducts searches by using data in specified columns of one table to find additional data
in another table
In conducting searches, a relational database matches information from a field in one
table with information in a corresponding field of another table to produce a third table
that combines requested data from both tables
Can define more flexible and complex relationship
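The search behaviour described above, matching a field in one table with the corresponding field of another to produce a third table, can be sketched with Python's sqlite3 module (the tables and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT)")
conn.execute("CREATE TABLE grade_report (sid INTEGER, course TEXT, grade TEXT)")
conn.execute("INSERT INTO student VALUES (17, 'Smith')")
conn.execute("INSERT INTO grade_report VALUES (17, 'CS1310', 'A')")

# Match the sid column of one relation with the corresponding column of
# another to produce a third table combining data from both.
result = conn.execute("""
    SELECT s.name, g.course, g.grade
    FROM student s JOIN grade_report g ON s.sid = g.sid
""").fetchall()
print(result)
```

The two base tables stay independent; the relationship exists only through the matching sid values, not through physical links as in the hierarchical and network models.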
Schemas, Instances, and Database State
In any data model, it is important to distinguish between the descriptions of the database and the
database itself. The description of a database is called the database schema, which is specified
during database design and is not expected to change frequently. Most data models have certain
conventions for displaying schemas as diagrams. A displayed schema is called a schema
diagram. Figure 2.1 shows a schema diagram for the database shown in Figure 1.2; the diagram
displays the structure of each record type but not the actual instances of records. We call each
object in the schema, such as STUDENT or COURSE, a schema construct.
A schema diagram displays only some aspects of a schema, such as the names of record types
and data items, and some types of constraints. Other aspects are not specified in the schema
diagram; for example, Figure 2.1 shows neither the data type of each data item nor the
relationships among the various files. Many types of constraints are not represented in schema
diagrams. A constraint such as "students majoring in computer science must take CS1310 before
the end of their sophomore year" is quite difficult to represent.
The actual data in a database may change quite frequently. For example, the database shown in
Figure 1.2 changes every time we add a student or enter a new grade for a student. The data in
the database at a particular moment in time is called a database state or snapshot. It is also
called the current set of occurrences or instances in the database. In a given database state, each
schema construct has its own current set of instances; for example, the STUDENT construct will
contain the set of individual student entities (records) as its instances. Many database states can
be constructed to correspond to a particular database schema. Every time we insert or delete a
record or change the value of a data item in a record, we change one state of the database into
another state.
The distinction between database schema and database state is very important. When we define a
new database, we specify its database schema only to the DBMS. At this point, the
corresponding database state is the empty state with no data. We get the initial state of the
database when the database is first populated or loaded with the initial data. From then on, every
time an update operation is applied to the database, we get another database state. At any point in
time, the database has a current state. The DBMS is partly responsible for ensuring that every
state of the database is a valid state, that is, a state that satisfies the structure and constraints
specified in the schema. Hence, specifying a correct schema to the DBMS is extremely
important, and the schema must be designed with the utmost care. The DBMS stores the
descriptions of the schema constructs and constraints, also called the meta-data, in the DBMS
catalog so that DBMS software can refer to the schema whenever it needs to. The schema is
sometimes called the intension, and a database state an extension of the schema.
Although, as mentioned earlier, the schema is not supposed to change frequently, it is not
uncommon that changes need to be occasionally applied to the schema as the application
requirements change. For example, we may decide that another data item needs to be stored for
each record in a file, such as adding the DateOfBirth to the STUDENT schema in Figure 2.1.
This is known as schema evolution. Most modern DBMSs include some operations for schema
evolution that can be applied while the database is operational.
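The schema/state distinction can be made concrete with a small sketch. The attribute names follow Figure 2.1; the `Schema` class itself is a hypothetical illustration, not a DBMS API.

```python
class Schema:
    """The intension: described once at design time, rarely changed."""
    def __init__(self, name, attributes):
        self.name = name              # a schema construct, e.g. STUDENT
        self.attributes = attributes

    def insert(self, state, record):
        # Every update produces a new database state (extension); the DBMS
        # must check that the new state is valid, i.e. satisfies the schema.
        if set(record) != set(self.attributes):
            raise ValueError("record does not satisfy the schema")
        return state + [record]

student = Schema("STUDENT", ["Name", "StudentNumber", "Class", "Major"])
state0 = []   # empty state: schema defined, database not yet loaded
state1 = student.insert(state0, {"Name": "Abebe", "StudentNumber": 17,
                                 "Class": 2, "Major": "CS"})
print(len(state0), len(state1))  # 0 1
```

Note that `student` (the schema) is unchanged by the insert; only the state moves from `state0` to `state1`.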
STUDENT: Name, StudentNumber, Class, Major
COURSE: CourseName, CourseNumber, CreditHours, Department
SECTION: SectionIdentifier, CourseNumber, Semester, Year, Instructor
FIGURE 2.1 Schema diagram for the database in Figure 1.2.
THREE-SCHEMA ARCHITECTURE AND DATA INDEPENDENCE
Three of the four important characteristics of the database approach are (1) insulation of programs and data (program-data and program-operation independence), (2) support of multiple user views, and (3) use of a catalog to store the database description (schema). In this section we specify an architecture for database systems, called the three-schema architecture, also known as the ANSI/SPARC architecture, which was proposed to help achieve and visualize these characteristics. We then further discuss the concept of data independence.
The Three-Schema Architecture
The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the
following three levels:
Internal Level
The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.
Conceptual Level
The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations, and
constraints. Usually, a representational data model is used to describe the conceptual schema
when a database system is implemented. This implementation conceptual schema is often based
on a conceptual schema design in a high-level data model.
External Level
The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides
the rest of the database from that user group. As in the previous case, each external schema is
typically implemented using a representational data model, possibly based on an external schema design in a high-level data model.
The three-schema architecture is a convenient tool with which the user can visualize the schema
levels in a database system. Most DBMSs do not separate the three levels completely, but
support the three-schema architecture to some extent. Some DBMSs may include physical-level
details in the conceptual schema. In most DBMSs that support user views, external schemas are
specified in the same data model that describes the conceptual-level information. Some DBMSs
allow different data models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at
the physical level. In a DBMS based on the three-schema architecture, each user group refers
only to its own external schema. Hence, the DBMS must transform a request specified on an
external schema into a request against the conceptual schema, and then into a request on the
internal schema for processing over the stored database. If the request is a database retrieval, the
data extracted from the stored database must be reformatted to match the user's external view.
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs, especially those that are meant to support small databases, do not support external views. Even in such systems, however, a certain amount
of mapping is necessary to transform requests between the conceptual and internal levels.
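The two mappings can be sketched as two small transformation functions. This is a toy illustration of the idea only; the record layout and function names are hypothetical, and the attribute names follow the ANSI-SPARC figure below.

```python
# Internal level: how records are actually stored (here, flat tuples).
stored = [(101, "Abebe", "Alemu", "1990-01-01", 5000.0, 7)]

CONCEPTUAL = ["Staff_No", "FName", "LName", "DOB", "Salary", "Branch_No"]
EXTERNAL_VIEW_1 = ["Staff_No", "FName", "Salary"]  # one user group's view

def internal_to_conceptual(row):
    """Mapping from the stored representation to the conceptual schema."""
    return dict(zip(CONCEPTUAL, row))

def conceptual_to_external(record, view):
    """Mapping to an external schema: hide everything outside the view."""
    return {a: record[a] for a in view}

# A retrieval request: data extracted from the stored database is
# reformatted to match the user's external view.
result = [conceptual_to_external(internal_to_conceptual(r), EXTERNAL_VIEW_1)
          for r in stored]
print(result[0])  # {'Staff_No': 101, 'FName': 'Abebe', 'Salary': 5000.0}
```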
ANSI-SPARC Three-level Architecture
External View 1: Staff_No, FName, Salary
External View 2: FName, DOB
Conceptual Level: Staff_No, FName, LName, DOB, Salary, Branch_No
Internal Level:
struct STAFF {
    int Staff_No;
    int Branch_No;
    char FName[15];
    char LName[15];
    struct Date DOB;   /* Date assumed to be a separate struct (day, month, year) */
    float salary;
};
Fig. Differences between Three Levels of ANSI-SPARC Architecture
The ANSI-SPARC architecture defines DBMS schemas at three levels:
Internal schema
o at the internal level to describe physical storage structures and access paths.
Typically uses a physical data model.
Conceptual schema
o at the conceptual level to describe the structure and constraints for the whole
database for a community of users. Uses a conceptual or an implementation data
model.
External schemas
o at the external level to describe the various user views. Usually uses the same data
model as the conceptual level.
Data Independence
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data independence:
Logical Data Independence
Logical data independence is the capacity to change the conceptual schema without having to
change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item). In the last case, external schemas that
refer only to the remaining data should not be affected. For example, the external schema of
Figure 1.4a should not be affected by changing the GRADE_REPORT file shown in Figure 1.2
into the one shown in Figure 1.5a. Only the view definition and the mappings need be changed in
a DBMS that supports logical data independence. After the conceptual schema undergoes a
logical reorganization, application programs that reference the external schema constructs must
work as before. Changes to constraints can be applied to the conceptual schema without affecting
the external schemas or application programs.
Logical Data Independence Summary
Refers to immunity of external schemas to changes in conceptual schema.
Conceptual schema changes (e.g. addition or removal of entities) should not require changes to the external schemas or rewrites of application programs.
The capacity to change the conceptual schema without having to change the external
schemas and their application programs.
Physical Data Independence
Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized, for example by creating additional access structures, to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema. For example, providing an access path to improve retrieval speed
of SECTION records (Figure 1.2) by Semester and Year should not require a query such as "list
all sections offered in fall 1998" to be changed, although the query would be executed more
efficiently by the DBMS by utilizing the new access path.
Physical Data Independence Summary
The ability to modify the physical schema without changing the logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should be well
defined so that changes in some parts do not seriously influence others.
The capacity to change the internal schema without having to change the conceptual
schema
Refers to immunity of conceptual schema to changes in the internal schema
Internal schema changes (e.g. using different file organizations or storage structures/devices) should not require changes to the conceptual or external schemas.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software
to accomplish these mappings by referring to the mapping information in the catalog. Data
independence occurs because when the schema is changed at some level, the schema at the next
higher level remains unchanged; only the mapping between the two levels is changed. Hence,
application programs referring to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data independence, both
physical and logical. However, the two levels of mappings create an overhead during
compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because
of this, few DBMSs have implemented the full three-schema architecture.
Fig. Data Independence and the ANSI-SPARC Three-level Architecture
CHAPTER THREE
Relational Data Model & ER-Model
In the database development life cycle, once all the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model (one of which is the ER model). This step is called conceptual design. The conceptual
schema is a concise description of the data requirements of the users and includes detailed
descriptions of the entity types, relationships, and constraints; these are expressed using the
concepts provided by the high-level data model. Because these concepts do not include
implementation details, they are usually easier to understand and can be used to communicate
with nontechnical users. The high-level conceptual schema enables the database designers to
concentrate on specifying the properties of the data, without being concerned with storage
details. Consequently, it is easier for them to come up with a good conceptual database design.
The next step in database design is the actual implementation of the database, using a
commercial DBMS. Most current commercial DBMSs use an implementation data model, such as the relational or the object-relational model, so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or data model mapping, and its result is a database schema in the implementation data model of the DBMS.
The last step is the physical design phase, during which the internal storage structures, indexes,
access paths, and file organizations for the database files are specified. In this chapter only the
basic ER model concepts for conceptual schema design are discussed.
Relational Data Model
The building blocks of the relational data model are:
- Entities: real-world physical or logical objects
- Attributes: properties used to describe each entity or real-world object
- Relationships: the associations between entities
- Constraints: rules that should be obeyed while manipulating the data
1. Entities/Relations/Tables
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes, the
particular properties that describe it. For example, an employee entity may be described by the
employee's name, age, address, salary, and job. A particular entity will have a value for each of
its attributes.
NB:
- The name given to an entity should always be a singular noun descriptive of each item to be stored in it, e.g. student NOT students.
- Every relation has a schema, which describes the columns, or fields.
- The relation itself corresponds to our familiar notion of a table.
- A relation is a collection of tuples, each of which contains values for a fixed number of attributes.
2. Attributes/Fields/Columns
Attributes are pieces of information ABOUT entities. The analysis must, of course, identify those which are actually relevant to the proposed application. At this level we need to know such
things as:
- The attribute name (explanatory words or phrases)
- The domain from which attribute values are taken (a DOMAIN is a set of values from which attribute values may be taken). Each attribute has values taken from a domain; for example, the domain of Name is string and that of Salary is real.
- Whether the attribute is part of the entity identifier (attributes which merely describe an entity versus those which help to identify it uniquely)
- Whether it is permanent or time-varying (which attributes may change their values over time)
- Whether it is required or optional for the entity (whose values will sometimes be unknown or irrelevant)
Several types of attributes occur in the ER model: simple versus composite, single-valued versus
multi-valued, and stored versus derived.
Composite versus Simple (Atomic) Attributes
Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of an employee entity can be subdivided into StreetAddress, City, State, and PostalCode. Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy; for example,
StreetAddress can be further subdivided into three simple attributes: Number, Street, and
ApartmentNumber. The value of a composite attribute is the concatenation of the values of its
constituent simple attributes.
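The "concatenation of constituent values" idea can be sketched directly. The nested representation below is a hypothetical illustration; attribute names follow the text.

```python
# A composite attribute whose StreetAddress subpart is itself composite.
address = {
    "StreetAddress": {"Number": "12", "Street": "Churchill Ave",
                      "ApartmentNumber": "4B"},
    "City": "Addis Ababa",
    "State": "AA",
    "PostalCode": "1000",
}

def flatten(attr):
    """The value of a composite attribute is the concatenation of the
    values of its constituent simple attributes."""
    if isinstance(attr, dict):
        return " ".join(flatten(v) for v in attr.values())
    return attr

print(flatten(address))  # 12 Churchill Ave 4B Addis Ababa AA 1000
```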
Single-Valued versus Multi-Valued Attributes
Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity, for example a Colors attribute for a car, or a
CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone
cars have two values for Colors. Similarly, one person may not have a college degree, another
person may have one, and a third person may have two or more degrees; therefore, different
persons can have different numbers of values for the CollegeDegrees attribute. Such attributes
are called multi-valued. A multi-valued attribute may have lower and upper bounds to constrain
the number of values allowed for each individual entity. For example, the Colors attribute of a
car may have between one and three values, if we assume that a car can have at most three
colors.
Stored versus Derived Attributes
In some cases, two (or more) attribute values are related, for example the Age and BirthDate
attributes of a person. For a particular person entity, the value of Age can be determined from the
current (today's) date and the value of that person's BirthDate. The Age attribute is hence called a
derived attribute and is said to be derivable from the BirthDate attribute, which is called a
stored attribute. Some attribute values can be derived from related entities; for example, an
attribute NumberOfEmployees of a department entity can be derived by counting the number of
employees related to (working for) that department.
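The Age/BirthDate example can be sketched as follows. The `Person` class and the dates used are hypothetical; the point is that BirthDate is stored while Age is computed on demand.

```python
from datetime import date

class Person:
    def __init__(self, name, birth_date):
        self.name = name
        self.birth_date = birth_date   # stored attribute

    def age(self, today):
        # Derived attribute: computed from BirthDate and the current date,
        # so it never has to be stored (and can never become stale).
        before_birthday = (today.month, today.day) < (self.birth_date.month,
                                                      self.birth_date.day)
        return today.year - self.birth_date.year - before_birthday

p = Person("Abebe", date(1990, 6, 15))
print(p.age(date(2013, 6, 14)))  # 22 (birthday not yet reached)
print(p.age(date(2013, 6, 15)))  # 23
```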
Null Values
In some cases a particular entity may not have an applicable value for an attribute. For example,
the ApartmentNumber attribute of an address applies only to addresses that are in apartment
buildings and not to other types of residences, such as single-family homes. Similarly, a
CollegeDegrees attribute applies only to persons with college degrees. For such situations, a
special value called null is created. An address of a single-family home would have null for its
ApartmentNumber attribute, and a person with no college degree would have null for
CollegeDegrees.
NB:
- NULL applies to attributes which are not applicable or which do not have values.
- You may enter the value NA (meaning not applicable).
- The value of a key attribute cannot be null.
- Default value: the value assumed if no explicit value is entered.
Key Attributes of Entity
An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute (primary key), and its values can be
used to identify each entity uniquely. For example, a Name attribute can be a key for a
COMPANY entity type, because no two companies are allowed to have the same name. For the
PERSON entity type, a typical key attribute is SocialSecurityNumber. Sometimes, several
attributes together form a key, meaning that the combination of the attribute values must be
distinct for each entity. If a set of attributes possesses this property, the proper way to represent
this is to define a composite attribute and designate it as a key attribute of the entity type.
Notice that such a composite key must be minimal; that is, all component attributes must be
included in the composite attribute to have the uniqueness property. An entity type may also
have no key, in which case it is called a weak entity type.
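The uniqueness constraint itself is easy to check mechanically. The sketch below is illustrative; the `violates_key` helper and the sample data are hypothetical.

```python
def violates_key(entities, key):
    """Return True if any two entities share a value of the key attribute,
    i.e. if `key` cannot serve as a key for this entity set."""
    seen = set()
    for e in entities:
        if e[key] in seen:
            return True
        seen.add(e[key])
    return False

# If two companies could share a Name, Name could not be a key.
companies = [{"Name": "EthioTel"}, {"Name": "AAU"}, {"Name": "EthioTel"}]
print(violates_key(companies, "Name"))  # True
```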
Summary of Key Attributes
o Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns.
o A group of columns that uniquely identifies a row in a table is called a CANDIDATE KEY.
o A column or combination of columns that matches the primary key of another table is called a FOREIGN KEY; it is used to cross-reference tables.
Value Sets (Domains) of Attributes
Each simple attribute of an entity type is associated with a value set (or domain of values), which
specifies the set of values that may be assigned to that attribute for each individual entity. For
instance, if the range of ages allowed for employees is between 16 and 70, we can specify the
value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and
70. Similarly, we can specify the value set for the Name attribute as a set of character strings.
3. Relationships
A relationship type R among n entity types E1, E2, ..., En defines a set of associations among
entities. For example, consider a relationship type WORKS_FOR between two entity types
EMPLOYEE and DEPARTMENT, which associates each employee with the department for
which the employee works.
Degree of a Relationship
The degree of a relationship type is the number of participating entity types. Hence, the
WORKS-FOR relationship is of degree two. A relationship type of degree two is called binary,
and one of degree three is called ternary. Relationships can generally be of any degree, but the most common are binary relationships. Higher-degree relationships are harder to model and are comparatively rare.
For example, suppose that we want to capture which employees use which skills on which
project. We might try to represent this data in a database as three binary relationships between
skills and project, project and employee, and employee and skill.
Fig. A ternary relationship
Works-on
o Abebe and Kebede have worked on projects A and B.
Used-on
o Abebe has used programming skills on project x.
Have-skill
o An employee has a certain skill. This is different from Used-on because there are some skills that an employee has that he or she may not have used on a particular project.
Needed
o A project needs a particular skill. This is different from Used-on because there may be some skills for which employees have not been assigned to the project yet.
Manages
o An employee manages a project. This is a completely different dimension than skill, so it could not be captured by Used-on.
Role Names and Recursive Relationships
Each entity that participates in a relationship plays a particular role in the relationship. The role
name signifies the role that a participating entity from the entity type plays in each relationship
instance, and helps to explain what the relationship means. For example, in the WORKS_FOR
relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays
the role of department or employer.
In some cases the same entity type participates more than once in a relationship type, in different roles. In such cases the role name is essential for distinguishing the meaning of each participation. Such relationship types are called recursive relationships. For example, the
SUPERVISION relationship type relates an employee to a supervisor, where both employee and
supervisor entities are members of the same EMPLOYEE entity.
Cardinality Ratios for Binary Relationships
An important concept about a relationship is the number of instances/tuples that can be associated with a single instance from one entity in a single relationship. The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in. The number of instances participating in, or associated with, a single instance from an entity in a relationship is called the CARDINALITY of the relationship.
For example, in the WORKS_FOR binary relationship type, DEPARTMENT: EMPLOYEE is of
cardinality ratio 1: N, meaning that each department can be related to (that is, employs) any
number of employees. But an employee can be related to (work for) only one department. The
possible cardinality ratios for binary relationship types are 1:1, 1:M, M:1, and M:M.
ONE-TO-ONE: one tuple is associated with only one other tuple.
o An example of a 1:1 binary relationship is MANAGES, which relates a department entity to the employee who manages that department. This means that at any point in time an employee can manage only one department and a department has only one manager.
o Another example of a 1:1 relationship is Building-Location: a single building will be located in a single location, and a single location will accommodate only a single building.
ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse.
o E.g. 1: Department-Student, as one department can have multiple students.
o E.g. 2: Employee-Department, as one employee can work in multiple departments.
MANY-TO-ONE: many tuples are associated with one tuple, but not the reverse.
o E.g. Employee-Department, as many employees belong to a single department.
MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side with a different role name, one tuple will be associated with many tuples.
o E.g. Student-Course, as a student can take many courses and a single course can be attended by many students.
o E.g. Employee-Project, as an employee works on many projects and a single project can be done by many employees.
4. Relational Constraints/Integrity Rules
Relational Integrity:
o Domain Integrity: no value of an attribute should be beyond the allowable limits.
o Entity Integrity: in a base relation, no attribute of a primary key can be null.
o Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value in its home relation or the foreign key value must be null (foreign key to primary key match-ups).
o Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated.
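Referential integrity in particular lends itself to a short sketch. The relations and the `referential_ok` helper below are hypothetical; the rule checked is exactly the one stated above: a foreign key value must either match a key value in its home relation or be null.

```python
def referential_ok(rows, fk, home_keys):
    """Every non-null foreign key value must appear among the key values
    of the home (referenced) relation."""
    return all(r[fk] is None or r[fk] in home_keys for r in rows)

departments = {10, 20}               # key values of the home relation
employees = [
    {"Name": "Abebe",  "DeptNo": 10},
    {"Name": "Sara",   "DeptNo": None},  # null foreign key is allowed
    {"Name": "Kebede", "DeptNo": 30},    # dangling reference: violation
]
print(referential_ok(employees[:2], "DeptNo", departments))  # True
print(referential_ok(employees, "DeptNo", departments))      # False
```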
Relational languages and views
The languages in relational database management systems are the DDL and the DML, which are used to define or create the database and to perform manipulations on it. There are two kinds of relations in a relational database. The difference is in how the relation is created, used and updated:
1. Base Relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database.
2. View
The dynamic result of one or more relational operations operating on the base relations to produce another virtual relation. A view is thus a virtually derived relation that does not necessarily exist in the database but can be produced upon request by a particular user at the time of request.
Purpose of a view
- Hides unnecessary information from users
- Provides powerful flexibility and security
- Provides a customized view of the database for users
- A view of one base relation can be updated
- Updates on views derived from various relations are not allowed
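The "virtual relation" idea can be sketched in a few lines. The base relation and the view function are hypothetical; the point is that the view stores no tuples of its own and is recomputed from the base relation at the time of each request.

```python
staff = [  # base relation: tuples physically stored
    {"Staff_No": 1, "FName": "Abebe", "Salary": 5000.0},
    {"Staff_No": 2, "FName": "Sara",  "Salary": 9000.0},
]

def high_earners_view():
    # Derived on request; also hides the Salary column from its users.
    return [{"Staff_No": s["Staff_No"], "FName": s["FName"]}
            for s in staff if s["Salary"] > 6000]

print(high_earners_view())  # [{'Staff_No': 2, 'FName': 'Sara'}]

# Because the view is virtual, it automatically reflects later updates
# to the base relation:
staff.append({"Staff_No": 3, "FName": "Kebede", "Salary": 7000.0})
print(len(high_earners_view()))  # 2
```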
Conceptual Database Design
Conceptual design revolves around discovering and analyzing organizational and user data
requirements. The important activities are to identify: Entities, Attributes, Relationships and
Constraints and based on these components develop the ER model using ER diagrams.
The Entity Relationship (E-R) Model
Entity-Relationship modeling is used to represent conceptual view of the database. The main
components of ER Modeling are: Entities, Attributes, Relationships and Constraints. Before
working on the conceptual design of the database, one has to know and answer the following
basic questions.
- What are the entities and relationships in the enterprise?
- What information about these entities and relationships should we store in the database?
- What are the integrity constraints that hold? Constraints on each item of data with respect to update, retrieval and storage.
- Represent this information pictorially in ER diagrams, then map the ER diagram into a relational database schema.
ER-Diagrams
Entity is represented by a RECTANGLE containing the name of the entity
Attributes are represented by OVALS and are connected to the entity by a line
A derived attribute is indicated by a DOTTED LINE
PRIMARY KEYS are underlined
Relationships are represented by DIAMOND shaped symbols
Structural Constraints on Relationships
Relationship types usually have certain constraints that limit the possible combinations of entities
that may participate in the corresponding relationship set. These constraints are determined from
the miniworld situation that the relationships represent. For example, if a company has a rule that
each employee must work for exactly one department, then we would like to describe this
constraint in the schema. We can distinguish two main types of relationship structural
constraints: cardinality ratio and participation.
1. Constraints on Relationships / Multiplicity / Cardinality Constraints
A multiplicity constraint is the number of, or range of, possible occurrences of an entity type/relation that may relate to a single occurrence/tuple of an entity type/relation through a particular relationship. These constraints are mostly used to ensure appropriate enterprise constraints.
- A (min, max) constraint specifies that each entity e in E participates in at least min and at most max relationship instances in R.
- Default (no constraint): min = 0, max = n (signifying no limit).
- Must have min <= max, min >= 0, max >= 1.
One-to-One Relationship
Relationship Manages between Employee and Department.
The multiplicity of the relationship is:
o One department can have only one manager (1..1).
o One employee could manage either one or no departments (0..1).
One-to-Many Relationship
Relationship Leads between Employee and Project.
The multiplicity of the relationship is:
o One employee may lead zero or more projects (0..*).
o One project is led by exactly one employee (1..1).
Many-to-Many Relationship
Relationship Teaches between Instructor and Course.
The multiplicity of the relationship is:
o One instructor teaches one or more courses (1..*).
o One course is taught by zero or more instructors (0..*).
2. Participation of an Entity Set in a Relationship Set
Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set. The entity with total participation is connected to the relationship using a double line.
Fig. The Employee-Manages-Department, Employee-Leads-Project, and Instructor-Teaches-Course relationships.
E.g. 1: Participation of EMPLOYEE in belongs to relationship with DEPARTMENT is total
since every employee should belong to a department.
E.g. 2: In the participation of EMPLOYEE in the manages relationship with DEPARTMENT, DEPARTMENT will have total participation but not EMPLOYEE.
Partial participation: some entities may not participate in any relationship in the relationship set.
E.g. 1: In the participation of EMPLOYEE in the manages relationship with DEPARTMENT, EMPLOYEE will have partial participation, since not all employees are managers.
Attributes of Relationship Types
Relationship types can also have attributes, similar to those of entity types. For example, to
record the number of hours per week that an employee works on a particular project, we can
include an attribute Hours for the WORKS_ON relationship type. Another example is to include
the date on which a manager started managing a department via an attribute StartDate for the
MANAGES relationship type.
Notice that attributes of 1:1 or 1:M relationship types can be migrated to one of the participating
entity types. For example, the StartDate attribute for the MANAGES relationship can be an
attribute of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to
MANAGES. This is because MANAGES is a 1:1 relationship, so every department or employee
entity participates in at most one relationship instance. Hence, the value of the StartDate attribute
can be determined separately, either by the participating department entity or by the participating
employee (manager) entity.
Weak Entity Types
Entity types that do not have key attributes of their own are called weak entity types. In contrast, regular entity types that do have a key attribute are called strong entity types. Entities belonging to a weak entity type are identified by being related to specific entities from another entity type, in combination with one of their attribute values.
A weak entity type always has a total participation constraint (existence dependency) with
respect to its identifying relationship, because a weak entity cannot be identified without an
owner entity. However, not every existence dependency results in a weak entity type. For
example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even
though it has its own key (LicenseNumber) and hence is not a weak entity.
Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep track of
the dependents of each employee via a 1:N relationship (Figure 3.2). The attributes of
DEPENDENT are Name (the first name of the dependent), BirthDate, Sex, and Relationship (to
the employee). Two dependents of two distinct employees may, by chance, have the same values
for Name, BirthDate, Sex, and Relationship, but they are still distinct entities. They are identified
as distinct entities only after determining the particular employee entity to which each dependent
is related. Each employee entity is said to own the dependent entities that are related to it. A
weak entity type normally has a partial key, which is the set of attributes that can uniquely
identify weak entities that are related to the same owner entity. In our example, if we assume that
no two dependents of the same employee ever have the same first name, the attribute Name of
DEPENDENT is the partial key. In the worst case, a composite attribute of all the weak entity's
attributes will be the partial key.
ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as the primary key for R. If the chosen key of E is composite, the set of simple attributes that form it will together form the primary key of R.
Example: For the company database ER schema (Figure 3.2):
o The regular entities are EMPLOYEE, DEPARTMENT, and PROJECT.
o The foreign key and relationship attributes, if any, are not included yet; they will be added during subsequent steps. These include the attributes SUPERSSN and DNO of EMPLOYEE, MGRSSN and MGRSTARTDATE of DEPARTMENT, and DNUM of PROJECT.
o In our example, we choose empId, DNUMBER, and PNUMBER as the primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT, respectively.
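Step 1 can be sketched as SQL DDL; the following minimal sketch uses Python's standard sqlite3 module. The non-key attributes (FName, DName, and so on) are illustrative assumptions, not the full attribute lists of the company schema:

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# One relation per strong entity type, with only simple attributes
# and the chosen key as primary key. Foreign keys come in later steps.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT,          -- illustrative attribute
    LName TEXT           -- illustrative attribute
);
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    DName   TEXT         -- illustrative attribute
);
CREATE TABLE PROJECT (
    PNUMBER INTEGER PRIMARY KEY,
    PName   TEXT         -- illustrative attribute
);
""")

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
print(sorted(tables))
```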
Step 2: Mapping of Weak Entity Types
For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R. In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s); this takes care of the identifying relationship type of W. The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.
In our example, we create the relation DEPENDENT in this step to correspond to the weak entity type DEPENDENT.
We include the primary key empId of the EMPLOYEE relation, which corresponds to the owner entity type, as a foreign key attribute of DEPENDENT.
The primary key of the DEPENDENT relation is the combination {empId, DEPENDENT_NAME}, because DEPENDENT_NAME is the partial key of DEPENDENT.
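A sketch of Step 2 in SQLite, again via Python's sqlite3 module. The foreign key check shows the existence dependency of the weak entity on its owner; the sample data is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled

conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT
);
-- Weak entity: owner's key comes in as a foreign key, and the primary
-- key is the combination of the owner's key and the partial key.
CREATE TABLE DEPENDENT (
    empId          INTEGER NOT NULL,
    DEPENDENT_NAME TEXT NOT NULL,
    BirthDate      TEXT,
    Sex            TEXT,
    Relationship   TEXT,
    PRIMARY KEY (empId, DEPENDENT_NAME),
    FOREIGN KEY (empId) REFERENCES EMPLOYEE(empId)
);
""")

conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Abebe')")
conn.execute(
    "INSERT INTO DEPENDENT VALUES (1, 'Sara', '2010-01-01', 'F', 'daughter')")

# A dependent with no owner employee violates the foreign key,
# mirroring the total participation of the weak entity type.
try:
    conn.execute(
        "INSERT INTO DEPENDENT VALUES (99, 'Lensa', NULL, 'F', 'daughter')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)
```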
Step 3: Mapping of Binary 1:1 Relationship Types
For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. Choose one of the relations, say S, and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S.
In our example, we map the 1:1 relationship type MANAGES by choosing the participating entity type DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship type is total (every department has a manager). We include the primary key of the EMPLOYEE relation as a foreign key in the DEPARTMENT relation and rename it MGRSSN. We also include the simple attribute STARTDATE of the MANAGES relationship type in the DEPARTMENT relation and rename it MGRSTARTDATE.
Note that it is possible to include the primary key of S as a foreign key in T instead. In our example, this amounts to having a foreign key attribute, say DEPARTMENT_MANAGED, in the EMPLOYEE relation, but it would have a null value for employee tuples who do not manage a department.
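Step 3 can be sketched as follows in SQLite via Python. The UNIQUE and NOT NULL constraints on MGRSSN are one way to approximate the 1:1 cardinality and total participation; the non-key attributes are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT
);
-- 1:1 MANAGES mapped into DEPARTMENT (the total-participation side):
-- EMPLOYEE's key migrates in as MGRSSN, and the relationship attribute
-- STARTDATE migrates in as MGRSTARTDATE.
CREATE TABLE DEPARTMENT (
    DNUMBER      INTEGER PRIMARY KEY,
    DName        TEXT,
    MGRSSN       INTEGER NOT NULL UNIQUE,  -- UNIQUE enforces the 1:1 side
    MGRSTARTDATE TEXT,
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(empId)
);
""")

cols = [row[1] for row in conn.execute("PRAGMA table_info(DEPARTMENT)")]
print(cols)
```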
Step 4: Mapping of Binary 1:N Relationship Types
For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type on the N-side of the relationship type. Include as a foreign key in S the primary key of the relation T that represents the other entity type participating in R; this is done because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type. Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.
In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION.
For WORKS_FOR, we include the primary key DNUMBER of the DEPARTMENT relation as a foreign key in the EMPLOYEE relation and call it DNO.
For SUPERVISION, we include the primary key of the EMPLOYEE relation as a foreign key in the EMPLOYEE relation itself, because the relationship is recursive, and call it SUPERSSN.
The CONTROLS relationship is mapped to the foreign key attribute DNUM of PROJECT, which references the primary key DNUMBER of the DEPARTMENT relation.
Step 5: Mapping of Binary M:N Relationship Types
For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S. Notice that we cannot represent an M:N relationship type by a single foreign key attribute in one of the participating relations (as we did for 1:1 or 1:N relationship types) because of the M:N cardinality ratio; we must create a separate relationship relation S.
In our example, we map the M:N relationship type WORKS_ON by creating the relation WORKS_ON.
We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them PNO and ESSN, respectively.
We also include an attribute HOURS in WORKS_ON to represent the HOURS attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {ESSN, PNO}.
Step 6: Mapping of Multivalued Attributes
For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.
In our example, we create a relation DEPT_LOCATIONS. The attribute DLOCATION represents the multivalued attribute LOCATIONS of DEPARTMENT, while DNUMBER, as a foreign key, represents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}. A separate tuple will exist in DEPT_LOCATIONS for each location that a department has.
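A sketch of Step 6 in SQLite via Python; the sample department and locations are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DName TEXT);
-- The multivalued LOCATIONS attribute becomes its own relation,
-- with one tuple per (department, location) pair.
CREATE TABLE DEPT_LOCATIONS (
    DNUMBER   INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER),
    DLOCATION TEXT NOT NULL,
    PRIMARY KEY (DNUMBER, DLOCATION)
);
""")

conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.executemany("INSERT INTO DEPT_LOCATIONS VALUES (?, ?)",
                 [(5, 'Addis Ababa'), (5, 'Adama')])

# One tuple per location of department 5.
n = conn.execute(
    "SELECT COUNT(*) FROM DEPT_LOCATIONS WHERE DNUMBER = 5").fetchone()[0]
print(n)
```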
Step 7: Mapping of N-ary Relationship Types
For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually the combination of all the foreign keys that reference the relations representing the participating entity types.
For example, consider the relationship type SUPPLY in the figure above. This can be mapped to the relation SUPPLY shown below, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
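A sketch of Step 7 for the ternary SUPPLY relationship in SQLite via Python. The SUPPLIER, PART, and PROJECT relations are assumed minimal stand-ins for the participating entity types:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (SNAME TEXT PRIMARY KEY);
CREATE TABLE PART     (PARTNO INTEGER PRIMARY KEY);
CREATE TABLE PROJECT  (PROJNAME TEXT PRIMARY KEY);
-- Ternary SUPPLY: one foreign key per participating entity type,
-- with all three together forming the primary key.
CREATE TABLE SUPPLY (
    SNAME    TEXT    NOT NULL REFERENCES SUPPLIER(SNAME),
    PARTNO   INTEGER NOT NULL REFERENCES PART(PARTNO),
    PROJNAME TEXT    NOT NULL REFERENCES PROJECT(PROJNAME),
    PRIMARY KEY (SNAME, PARTNO, PROJNAME)
);
""")

pk = [row[1] for row in conn.execute("PRAGMA table_info(SUPPLY)")
      if row[5] > 0]
print(pk)
```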
CHAPTER FOUR
Functional Dependency and Normalization
So far, we have assumed that attributes are grouped to form a relation schema by using the common sense
of the database designer or by mapping a database schema design from a conceptual data model such as
the ER or some other conceptual data model. These models make the designer identify entity types and
relationship types and their respective attributes, which leads to a natural and logical grouping of the
attributes into relations when the mapping procedures in Chapter 3 are followed. However, we still need
some formal measure of why one grouping of attributes into a relation schema may be better than another.
So far in our discussion of conceptual design and its mapping into the relational model, we have not
developed any measure of appropriateness or "goodness" to measure the quality of the design, other than
the intuition of the designer. In this chapter we discuss some of the theory that has been developed with
the goal of evaluating relational schemas for design quality, that is, to measure formally why one set of groupings of attributes into relation schemas is better than another.
This chapter defines the concept of functional dependency, a formal constraint among attributes that is the main tool for measuring the appropriateness of attribute groupings into relation schemas. We show how functional dependencies can be used to group attributes into relation schemas that
are in a normal form. A relation schema is in a normal form when it satisfies certain desirable properties.
The process of normalization consists of analyzing relations to meet increasingly stringent normal forms, leading to progressively better groupings of attributes. Normal forms are specified in terms of functional dependencies (which are identified by the database designer) and key attributes of relation schemas.
Informal Measures of the Goodness of a Relation
The ease with which the meaning of a relation's attributes can be explained is an informal measure of how
well the relation is designed.
GUIDELINE 1. Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation
schema corresponds to one entity type or one relationship type, it is straightforward to explain its
meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships,
semantic ambiguities will result and the relation cannot be easily explained.
GUIDELINE 2. Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that
the programs that update the database will operate correctly.
GUIDELINE 3. As far as possible, avoid placing attributes in a base relation whose values may
frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do
not apply to a majority of tuples in the relation.
Normalization
A relational database is merely a collection of data, organized in a particular manner. As the father of the
relational database approach, Codd created a series of rules called normal forms that help define that
organization.
Database normalization is a series of steps followed to obtain a database design that allows for consistent
storage and efficient access of data in a relational database. These steps reduce data redundancy and the
risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items and designing a database that will represent such associations without suffering from the update anomalies, which are: insertion anomalies, deletion anomalies, and modification anomalies.
Applying the normalization rules eventually removes the update anomalies that could otherwise arise during data manipulation after implementation. The underlying ideas in normalization are simple enough. Through normalization, we want to design for our relational database a set of tables that:
Contain all the data necessary for the purposes that the database is to serve,
Have as little redundancy as possible,
Accommodate multiple values for types of data that require them,
Permit efficient updates of the data in the database, and
Avoid the danger of losing data unknowingly.
Normalization may reduce system performance since data will be cross referenced from many tables.
Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency
guarantees.
Drawbacks of Normalization
Requires the data in order to see the problems,
May reduce the performance of the system,
Is time consuming,
Is difficult to design and apply, and
Is prone to human error.
The types of problems that can occur in an insufficiently normalized table are called update anomalies, which include:
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in
the database where information about that new entry needs to be stored. In a properly normalized
database, information about a new entry needs to be inserted into only one place in the database; in an
inadequately normalized database, information about a new entry may need to be inserted into more than
one place and, human fallibility being what it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time
to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of
entry needs to be deleted from only one place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more than one place, and, human fallibility
being what it is, some of the needed additional deletions may be missed.
3. Modification anomalies
A modification of a database involves changing some value of an attribute of a table. In a properly normalized database, a given fact is stored in only one place, so a modification is made once and takes effect everywhere the fact is used. In an inadequately normalized database, the same fact may be stored redundantly; if some copies are updated while others are missed, the data becomes inconsistent.
The purpose of normalization is to reduce the chances for anomalies to occur in a database.
Example of problems related to anomalies

EmpID  FName  LName    SkillID  Skill  SkillType    School  SchoolAddress  SkillLevel
12     Abebe  Mekuria  2        SQL    Database     AAU     Sidist_Kilo    5
16     Lemma  Alemu    5        C++    Programming  Unity   Gerji          6
28     Chane  Kebede
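The anomalies above can be demonstrated on this denormalized table. The following sketch loads the two complete rows shown and exhibits a deletion anomaly: the table name is an assumption, and the data is taken from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized table mixing employee, skill, and school facts
# (column names follow the example table above).
conn.execute("""
CREATE TABLE EMP_SKILL (
    EmpID INTEGER, FName TEXT, LName TEXT,
    SkillID INTEGER, Skill TEXT, SkillType TEXT,
    School TEXT, SchoolAddress TEXT, SkillLevel INTEGER
)""")
conn.executemany("INSERT INTO EMP_SKILL VALUES (?,?,?,?,?,?,?,?,?)", [
    (12, 'Abebe', 'Mekuria', 2, 'SQL', 'Database', 'AAU', 'Sidist_Kilo', 5),
    (16, 'Lemma', 'Alemu', 5, 'C++', 'Programming', 'Unity', 'Gerji', 6),
])

# Deletion anomaly: removing the only employee trained at Unity also
# loses the unrelated fact that Unity is located in Gerji.
conn.execute("DELETE FROM EMP_SKILL WHERE EmpID = 16")
remaining_schools = {row[0] for row in conn.execute(
    "SELECT School FROM EMP_SKILL")}
print(remaining_schools)
```

In a normalized design, the school and its address would live in their own table, so deleting an employee row could not destroy the school's address.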