8/13/2019 Database Lecture Note
1/106
AAU
2006
Database
Instructor: Seble M.T
WWW.AAU.EDU.ET
Fundamentals of Database
Chapter One: Introduction to Database Systems
AAU, Computer Science Department, 2013 Page 1
CHAPTER ONE
Database System
Database systems are designed to manage large data sets in an organization. Data
management involves both the definition and the manipulation of the data, ranging from
simple representation of the data to considerations of structures for the storage of information.
Data management also considers the provision of mechanisms for the manipulation of
information.
Today, databases are essential to every business. They are used to maintain internal records,
to present data to customers and clients on the World-Wide-Web, and to support many other
commercial processes. Databases are likewise found at the core of many modern
organizations.
The power of databases comes from a body of knowledge and technology that has developed
over several decades and is embodied in specialized software called a database management
system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of
data efficiently and allowing it to persist over long periods of time, safely. These systems are
among the most complex types of software available.
Thus, to our question: what is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years. In
common usage, the term database refers to a collection of data that is managed by a DBMS.
Thus the DB course is about:
How to organize data
Supporting multiple users
Efficient and effective data retrieval
Secured and reliable storage of data
Maintaining consistent data
Making information useful for decision making
Data management has passed through different levels of development along with the
development of technology and services. These are best described as three levels of
development. Even though each new level brought an advantage and overcame a problem of
the previous one, all methods of data handling are still in use to some extent. The three major
levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follow the primitive and traditional way
of information handling, where cards and paper are used for the purpose.
Files for as many event and objects as the organization has are used to store
information.
Each of the files containing various kinds of information is labeled and stored in one or
more cabinets.
The cabinets could be kept in safe places for security purpose based on the sensitivity
of the information contained in it.
Insertion and retrieval are done by searching first for the right cabinet, then for the right
file, then for the information.
One could have an indexing system to facilitate access to the data
Limitations of the Manual approach
Prone to error
Difficult to update, retrieve, integrate
You have the data but it is difficult to compile the information
Limited to small size information
Cross referencing is difficult
An alternative approach to data handling is a computerized way of dealing with the
information. The computerized approach could be either decentralized or centralized,
based on where the data resides in the system.
2. Traditional File Based Approach
After the introduction of computers for data processing to the business community, the need
to use the device for data storage and processing increased. There were, and still are, several
computer applications with file based processing used for the purpose of data handling. Even
though the approach evolved over time, the basic structure is still similar if not identical.
File based systems were an early attempt to computerize the manual filing system.
This approach is the decentralized computerized data handling method.
A collection of application programs performs services for the end-users. In such
systems, every application program that provides service to end-users defines and
manages its own data.
Such systems have a number of programs for each of the different applications in the
organization.
Since every application defines and manages its own data, the system is subject to
serious data duplication problems.
A file, in the traditional file-based approach, is a collection of records containing
logically related data.
Limitations of the Traditional File Based approach
As business applications became more complex, demanding more flexible and reliable data
handling methods, the shortcomings of the file-based system became evident. These
shortcomings include, but are not limited to:
Separation or Isolation of Data: information available in one application may not be
known to others.
Limited data sharing
Lengthy development and maintenance time
Duplication or redundancy of data
Data dependency on the application
Incompatible file formats between different applications and programs creating
inconsistency.
Fixed query processing which is defined during application development
The limitations of the traditional file-based data handling approach arise from two basic
reasons.
1. Definition of the data is embedded in the application program which makes it
difficult to modify the database definition easily.
2. No control over the access and manipulation of the data beyond that imposed by
the application programs.
The most significant problem experienced by the traditional file based approach of data
handling is the update anomalies. We have three types of update anomalies;
1. Modification Anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when a record set is deleted from one
application but remains untouched in other application programs.
3. Insertion Anomalies: a problem encountered when one cannot decide whether the data
to be inserted is valid and consistent with other similar data sets.
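These anomalies can be made concrete with a small sketch. The file names and data below are hypothetical, and CSV files stand in for the application programs' data files:

```python
import csv
import os
import tempfile

# Illustrative sketch: two application programs each manage their own
# copy of the same customer data, the duplication problem of the
# file-based approach.
dirpath = tempfile.mkdtemp()

def write_rows(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_rows(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))

billing = os.path.join(dirpath, "billing_customers.csv")
shipping = os.path.join(dirpath, "shipping_customers.csv")
customer = [["id", "name", "address"], ["1", "Abebe", "Addis Ababa"]]
write_rows(billing, customer)    # billing application's own file
write_rows(shipping, customer)   # shipping application's duplicate copy

# Modification anomaly: the address is updated in one application's
# file but not in the other's, so the same data set now disagrees.
rows = read_rows(billing)
rows[1][2] = "Bahir Dar"
write_rows(billing, rows)

billing_addr = read_rows(billing)[1][2]
shipping_addr = read_rows(shipping)[1][2]
print(billing_addr, shipping_addr)
```

Because neither program knows about the other's file, nothing in the system detects that the two copies have diverged.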
3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems changed
significantly. Codd proposed that database systems should present the user with a view of
data organized as tables called relations. Behind the scenes, there might be a complex data
structure that allowed rapid response to a variety of queries. But, unlike the user of earlier
database systems, the user of a relational system would not be concerned with the storage
structure. Queries could be expressed in a very high-level language, which greatly increased
the efficiency of database programmers. The database approach emphasizes the integration
and sharing of data throughout the organization.
Thus in Database Approach:
Database is just a computerized record keeping system or a kind of electronic filing
cabinet.
Database is a repository for collection of computerized data files.
Database is a shared collection of logically related data designed to meet the information
needs of an organization. Since it is a shared corporate resource, the database is
integrated with minimum amount of or no duplication.
Database is a collection of logically related data where these logically related data
comprises entities, attributes, relationships, and business rules of an organization's
information.
In addition to containing the data required by an organization, a database also contains a
description of the data, which is called Metadata, a Data Dictionary, a Systems
Catalogue, or Data about Data.
Since a database contains information about the data (metadata), it is called a self-
describing collection of integrated records.
The purpose of a database is to store information and to allow users to retrieve and
update that information on demand.
Database is designed once and used simultaneously by many users.
Unlike the traditional file-based approach, in the database approach there is program-data
independence: the separation of the data definition from the application. Thus the
application is not affected by changes made in the data structure and file organization.
Each database application will perform the combination of: Creating database, Reading,
Updating and Deleting data.
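The self-describing (metadata) property can be illustrated with a minimal sketch using Python's built-in sqlite3 module; the table and columns here are illustrative, not from the text:

```python
import sqlite3

# A database stores a description of the data (metadata) alongside the
# data itself: here, an in-memory SQLite database and its catalogue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Abebe')")

# The system catalogue (sqlite_master in SQLite) holds data about data:
# the name and stored definition of every schema object.
row = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(row[0])   # table name recorded in the catalogue
print(row[1])   # the stored definition (metadata) of the table
```

Applications query the catalogue rather than hard-coding the data definition, which is what makes program-data independence possible.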
Benefits of the database approach
Data can be shared: two or more users can access and use same data instead of storing
data in redundant manner for each user.
Improved accessibility of data: by using structured query languages, the users can easily
access data without programming experience.
Redundancy can be reduced: isolated data is integrated in database to decrease the
redundant data stored at different applications.
Quality data can be maintained: the different integrity constraints in the database
approach will maintain the quality leading to better decision making
Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the
data in the database to some extent.
Transaction support can be provided: the basic demands of any transaction support system
are implemented in a full-scale DBMS.
Integrity can be maintained: data at different applications will be integrated together with
additional constraints to facilitate shared data resource.
Security measures can be enforced: the shared data can be secured by having different levels
of clearance and other data security mechanisms.
Improved decision support: the database will provide information useful for decision
making.
Standards can be enforced: the different ways of using and dealing with data by different
units of an organization can be balanced and standardized by using the database
approach.
Compactness: since it is an electronic data handling method, the data is stored compactly
(no voluminous papers).
Speed: data storage and retrieval is fast as it will be using the modern fast computer
systems.
Less labour: unlike the other data handling methods, data maintenance will not demand
much resource.
Centralized information control: since relevant data in the organization will be stored at
one repository, it can be controlled and managed at the central level.
Limitations and risk of Database Approach
Introduction of new professional and specialized personnel
Complexity in designing and managing data
The cost and risk during conversion from the old to the new system
High cost incurred to develop and maintain
Complex backup and recovery services from the users' perspective
Reduced performance due to centralization
High impact on the system when failures occur
Database Management System (DBMS)
A Database Management System (DBMS) is a software package used for providing EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many people/programs accessing the same database, or even the
same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT (data outlives
the programs that operate on it) data. A DBMS also provides a systematic method for creating,
updating, storing, and retrieving data in a database, along with services for controlling
data access, enforcing data integrity, managing concurrency control, and recovery. Having
this in mind, a full-scale DBMS should provide at least the following services
to the user.
1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE transaction, which minimize data
inconsistency.
4. Concurrency Control Services: access and update on the database by different
users simultaneously should be implemented correctly.
5. Recovery Services: a mechanism for recovering the database after a failure must
be available.
6. Authorization Services (Security): must support the implementation of access
and authorization services for database administrators and users.
7. Support for Data Communication: should provide the facility to integrate with
data transfer software or data communication managers.
8. Integrity Services: rules about the data and the changes made to it, the
correctness and consistency of stored data, and the quality of data based on
business constraints.
9. Services to promote data independency between the data and the application
10. Utility services: sets of utility service facilities like
Importing data
Statistical analysis support
Index reorganization
Garbage collection
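The ALL-or-NONE transaction service and the recovery service can be sketched with Python's built-in sqlite3 module; the account table and the simulated failure are hypothetical:

```python
import sqlite3

# Sketch of an ALL-or-NONE transaction: either every update in the
# transaction is applied, or none of them is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

try:
    # Debit one account, then crash before the matching credit runs.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.rollback()   # recovery: undo the partial transaction

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

After the rollback both balances are unchanged, which is exactly the data-inconsistency the transaction service exists to prevent.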
DBMS and Components of DBMS Environment
A DBMS is a software package used to design, manage, and maintain databases. It provides the
following facilities:
Data Definition Language (DDL):
o Language used to define each data element required by the organization.
o Commands for setting up schema of database
Data Manipulation Language (DML):
o Language used by end-users and programmers to store, retrieve, and access the
data e.g. SQL
o Also called "query language"
Data Dictionary: tool used to store and organize information about the data
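As a small sketch of DDL and DML side by side, using Python's built-in sqlite3 module (the employee table and its columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: commands for setting up the schema of the database.
conn.execute("""CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary REAL)""")

# DML: the query language used to store, retrieve, and access the data.
conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Sara', 5000)")
names = [r[0] for r in
         conn.execute("SELECT name FROM employee WHERE salary >= 5000")]
print(names)
```

Note the division of labour: the CREATE TABLE statement defines the data elements once, while INSERT and SELECT manipulate the data without restating that definition.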
The DBMS is software that helps to design, handle, and use data using the database approach.
Taking a DBMS as a system, one can describe it with respect to its environment or the other
systems interacting with it. The DBMS environment has five components.
1. Hardware: Components that are comprised of personal computers, mainframe or
any server computers, network infrastructure, etc.
2. Software: those components like the DBMS software, application programs,
operating systems, network software, and other relevant software.
3. Data: This is the most important component to the user of the database. There are two
types of data in a database approach: Operational data and Metadata.
The structure of the data in the database is called the schema, which is composed of the
Entities, Properties of entities, and relationship between entities.
4. Procedure: these are the rules and regulations on how to design and use a database. It
includes procedures like how to log on to the DBMS, how to use facilities, how to start
and stop transaction, how to make backup, how to treat hardware and software failure,
how to change the structure of the database.
5. People: the people in the organization responsible for designing, implementing, managing,
administering, and using the database.
Database Development Life Cycle
As the database is one component in most information system development tasks, there are several steps
in designing a database system. Here, more emphasis is given to the design phases of the
system development life cycle. The major steps in database design are:
1. Planning: that is identifying information gap in an organization and propose a
database solution to solve the problem.
2. Analysis: that concentrates more on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination and structuring, and
selection of best design method are also performed at this phase.
3. Design: in database designing more emphasis is given to this phase. The phase is
further divided into three sub-phases.
a. Conceptual Design: concise description of the data, data type, relationship
between data and constraints on the data.
There is no implementation or physical detail consideration.
Used to elicit and structure all information requirements
b. Logical Design: maps the conceptual design onto a selected specific data
model used to implement the data structure.
It is independent of any particular DBMS, with no other physical
considerations.
c. Physical Design: physical implementation of the upper level design of the
database with respect to internal storage and file structure of the database for
the selected DBMS.
To develop all technology and organizational specification.
4. Implementation: the testing and deployment of the designed database for use.
5. Operation and Support: administering and maintaining the operation of the
database system and providing support to users.
Roles in Database Design and Use
As people are one of the components in the DBMS environment, there is a group of roles played
by different stakeholders in the design and operation of a database system.
1. DataBase Administrator (DBA)
Responsible for overseeing, controlling, and managing the database resources (the database itself,
the DBMS, and other related software)
Authorizing access to the database
Coordinating and monitoring the use of the database
Responsible for determining and acquiring hardware and software resources
Accountable for problems like poor security, poor performance of the system
Involved in all steps of database development
We can have further classifications of this role in big organizations having huge amounts of
data and many user requirements.
1. Data Administrator (DA): is responsible for the management of data resources.
Involves in database planning, development, maintenance of standards policies
and procedures at the conceptual and logical design phases.
2. DataBase Administrator (DBA): is more technically oriented role. Responsible
for the physical realization of the database. Involves in physical design,
implementation, security and integrity control of the database.
2. DataBase Designer (DBD)
Identifies the data to be stored and chooses the appropriate structures to represent and
store the data.
Should understand the user requirements and should choose how the user views the
database.
Involved in the design phase, before the implementation of the database system.
We have two distinct kinds of database designers: one involved in the logical and conceptual
design and another involved in the physical design.
1. Logical and Conceptual DBD
Identifies data (entity, attributes and relationship) relevant to the
organization
Identifies constraints on each data
Understand data and business rules in the organization
Sees the database independent of any data model at the conceptual level and
considers one specific data model at the logical design phase.
2. Physical DBD
Takes the logical design specification as input and decides how it should be
physically realized.
Maps the logical data model onto the specified DBMS with respect to tables and
integrity constraints. (DBMS-dependent designing)
Selects specific storage structures and access paths to the database
Designs security measures required on the database
3. Application Programmer and Systems Analyst
System analyst determines the user requirement and how the user wants to view
the database.
The application programmer implements these specifications as programs; code,
test, debug, document and maintain the application program.
Determines the interface on how to retrieve, insert, update and delete data in the
database.
The application could use any high level programming language according to the
availability, the facility and the required service.
4. End Users
Workers whose job requires accessing the database frequently for various purposes. There are
different groups of users in this category.
1. Naïve Users:
Sizable proportion of users
Unaware of the DBMS
Only access the database based on their access level and demand
Use standard and pre-specified types of queries.
2. Sophisticated Users
Are users familiar with the structure of the Database and facilities of the
DBMS.
Have complex requirements
Have higher level queries
Are most of the time engineers, scientists, business analysts, etc
3. Casual Users
Users who access the database occasionally.
Need different information from the database each time.
Use sophisticated database queries to satisfy their needs.
Are most of the time middle to high level managers.
These users can be again classified as Actors on the Scene and Workers Behind the Scene.
Actors On the Scene:
Data Administrator
Database Administrator
Database Designer
End Users
Workers Behind the Scene
DBMS designers and implementers: who design and implement different DBMS
software.
Tool Developers: experts who develop software packages that facilitate database
system design and use. Prototype, simulation, and code generator developers could be
examples. Independent software vendors could also be categorized in this group.
Operators and Maintenance Personnel: system administrators who are responsible
for actually running and maintaining the hardware and software of the database
system and the information technology facilities.
Fundamental of Database Systems: Data Models and Database System Architecture
Page 1
CHAPTER TWO
DATABASE SYSTEM ARCHITECTURE
Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some level of data
abstraction by hiding details of data storage that are not needed by most database users. A data
model is a collection of concepts that can be used to describe the structure of a database, and it
also provides the necessary means to achieve this abstraction. By the structure of a database, we mean
the data types, relationships, and constraints that should hold for the data. Most data models also
include a set of basic operations for specifying retrievals and updates on the database.
In addition to the basic operations provided by the data model, it is becoming more common to
include concepts in the data model to specify the dynamic aspect or behavior of a database
application. This allows the database designer to specify a set of valid user defined operations
that are allowed on the database objects. An example of a user-defined operation could be
COMPUTE_GPA, which can be applied to a STUDENT object. On the other hand, generic
operations to insert, delete, modify, or retrieve any kind of object are often included in the basic
data model operations.
Categories of Data Models
Many data models have been proposed, which we can categorize according to the types of
concepts they use to describe the database structure. High-level or conceptual data models
provide concepts that are close to the way many users perceive data, whereas low-level or
physical data models provide concepts that describe the details of how data is stored in the
computer. Concepts provided by low-level data models are generally meant for computer
specialists, not for typical end users. Between these two extremes is a class of representational
(or implementation) data models, which provide concepts that may be understood by end users
but that are not too far removed from the way data is organized within the computer.
Representational data models hide some details of data storage but can be implemented on a
computer system in a direct way.
Conceptual data models use concepts such as entities, attributes, and relationships. An entity
represents a real-world object or concept, such as an employee or a project that is described in
the database. An attribute represents some property of interest that further describes an entity,
such as the employee's name or salary. A relationship among two or more entities represents an
association among two or more entities, for example, a works-on relationship between an
employee and a project.
Representational or implementation data models are the models used most frequently in
traditional commercial DBMSs. These include the widely used relational data model, as well as
the so-called legacy data models-the network and hierarchical models-that have been widely
used in the past. Representational data models represent data by using record structures and
hence are sometimes called record-based data models. We can regard object data models as a
new family of higher-level implementation data models that are closer to conceptual data
models.
FIGURE 1.2 A database that stores student and course information.
Hierarchical Data Model
In the hierarchical data model, information is organized as a collection of inverted trees of
records. The inverted trees may be of arbitrary depth. The record at the root of a tree has zero or
more child records; the child records, in turn, serve as parent records for their immediate
descendants. This parent-child relationship recursively continues down the tree.
The records consist of fields, where each field may contain simple data values (e.g. integer, real,
text), or a pointer to a record. The pointer graph is not allowed to contain cycles. Some
combinations of fields may form the key for a record relative to its parent.
Applications navigate a hierarchical database by starting at a root and successively navigating
downward from parent to children until the desired record is found. Applications can interleave
parent-child navigation with traversal of pointers. Searching down a hierarchical tree is very fast,
since the storage layer for hierarchical databases uses contiguous storage for hierarchical
structures. All other types of queries require sequential search techniques.
The hierarchical data model is impoverished for expressing complex information models. Often
a natural hierarchy does not exist and it is awkward to impose a parent-child relationship.
Pointers partially compensate for this weakness, but it is still difficult to specify suitable
hierarchical schemas for large models.
Hierarchical Data Model Summary
The simplest data model
Record type is referred to as node or segment
The top node is the root node
Nodes are arranged in a hierarchical structure, a sort of upside-down tree
A parent node can have more than one child node
A child node can only have one parent node
The relationship between parent and child is one-to-many
Relationships are established by creating physical links between stored records (each is stored with a
predefined access path to other records)
Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in
manufacturing, personnel organization in companies
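The top-down, parent-to-child navigation described above can be sketched in Python; the segment names are hypothetical, and nested dictionaries stand in for stored records:

```python
# Sketch of a hierarchical (inverted-tree) database: every node has one
# parent and zero or more children, and applications navigate downward
# from the root until the desired record is found.
dept = {
    "name": "Computer Science",   # root node (segment)
    "children": [
        {"name": "Databases", "children": [
            {"name": "Lecture Note", "children": []}]},
        {"name": "Networks", "children": []},
    ],
}

def find(node, target):
    """Search top-down, parent to children, for a named segment."""
    if node["name"] == target:
        return node
    for child in node["children"]:
        hit = find(child, target)
        if hit is not None:
            return hit
    return None

found = find(dept, "Lecture Note")
print(found is not None)
```

Note that a record is reachable only through its single parent chain; asking, say, for all segments with a given property forces the sequential search the text mentions.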
Fig. Example of Hierarchical data model
Network Data Model
In the network data model, information is organized as a collection of graphs of records that are
related with pointers. Network data models represent data in a symmetric manner, unlike the
hierarchical data model (with its distinction between a parent and a child). A network data model is
more flexible than a hierarchical data model and still permits efficient navigation.
The records consist of lists of fields (fixed or variable length, with a maximum length), where
each field contains a simple value (fixed or variable size). The network data model also
introduces the notion of indexes of fields and records, sets of pointers, and physical placement of
records.
Network Data Model Summary
Allows record types to have more than one parent, unlike the hierarchical model
A network data model sees records as set members
Each set has an owner and one or more members
Allows many-to-many relationships between entities
Like the hierarchical model, the network model is a collection of physically linked records.
Allows member records to have more than one owner
The database contains a complex array of pointers that thread through a set of records.
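A minimal sketch of owner-member sets, with plain Python dictionaries standing in for pointer-linked records (the record names are hypothetical):

```python
# Sketch of the network model: records are nodes in a graph, sets have
# an owner and members, and a member may have several owners (unlike a
# hierarchical child, which has exactly one parent).
records = {
    "proj_db":  {"type": "project"},
    "proj_net": {"type": "project"},
    "emp_sara": {"type": "employee"},
}

# "works_on" sets: each owner (a project) points to its member records.
works_on = {
    "proj_db":  ["emp_sara"],
    "proj_net": ["emp_sara"],   # emp_sara has more than one owner
}

# Navigating the pointers in the other direction finds all owners of a
# member, the many-to-many relationship the hierarchical model forbids.
owners_of_sara = [owner for owner, members in works_on.items()
                  if "emp_sara" in members]
print(owners_of_sara)
```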
Fig. Many-to-many relationship defined using Network data type
Relational Data Model
In the relational data model, information is organized in relations (two-dimensional tables). Each
relation contains a set of tuples (records). Each tuple contains a number of fields. A field may
contain a simple value (fixed or variable size) from some domain (e.g. integer, real, text, etc.).
The relational data model is based on a mathematical foundation, called relational algebra. This
mathematical foundation is the cornerstone of some of the very attractive properties of relational
databases: it first of all offers data independence, and it offers a mathematical framework for
many of the optimizations possible in relational databases (e.g. query optimization).
Relational Data Model Summary
Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model of Data
for Large Shared Data Banks')
Viewed as a collection of tables called Relations, equivalent to a collection of record
types
Stores information or data in the form of tables (rows and columns)
A row of a table is called a tuple, equivalent to a record
A column of a table is called an attribute, equivalent to a field
The tables seem to be independent but are related somehow.
Conducts searches by using data in specified columns of one table to find additional data
in another table
In conducting searches, a relational database matches information from a field in one
table with information in a corresponding field of another table to produce a third table
that combines requested data from both tables
Can define more flexible and complex relationship
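The search behaviour described above, matching a field in one table with the corresponding field of another to produce a third table, can be sketched with Python's sqlite3 module (the tables and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT)")
conn.execute("CREATE TABLE grade_report (sid INTEGER, course TEXT, grade TEXT)")
conn.execute("INSERT INTO student VALUES (17, 'Smith')")
conn.execute("INSERT INTO grade_report VALUES (17, 'CS1310', 'A')")

# Match the sid column of one relation with the corresponding column of
# another to produce a third table combining data from both.
result = conn.execute("""
    SELECT s.name, g.course, g.grade
    FROM student s JOIN grade_report g ON s.sid = g.sid
""").fetchall()
print(result)
```

The two base tables stay independent; the relationship exists only through the matching sid values, not through physical links as in the hierarchical and network models.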
Schemas, Instances, and Database State
In any data model, it is important to distinguish between the descriptions of the database and the
database itself. The description of a database is called the database schema, which is specified
during database design and is not expected to change frequently. Most data models have certain
conventions for displaying schemas as diagrams. A displayed schema is called a schema
diagram. Figure 2.1 shows a schema diagram for the database shown in Figure 1.2; the diagram
displays the structure of each record type but not the actual instances of records. We call each
object in the schema, such as STUDENT or COURSE, a schema construct.
A schema diagram displays only some aspects of a schema, such as the names of record types
and data items, and some types of constraints. Other aspects are not specified in the schema
diagram; for example, Figure 2.1 shows neither the data type of each data item nor the
relationships among the various files. Many types of constraints are not represented in schema
diagrams. A constraint such as "students majoring in computer science must take CS1310 before
the end of their sophomore year" is quite difficult to represent.
The actual data in a database may change quite frequently. For example, the database shown in
Figure 1.2 changes every time we add a student or enter a new grade for a student. The data in
the database at a particular moment in time is called a database state or snapshot. It is also
called the current set of occurrences or instances in the database. In a given database state, each
schema construct has its own current set of instances; for example, the STUDENT construct will
contain the set of individual student entities (records) as its instances. Many database states can
be constructed to correspond to a particular database schema. Every time we insert or delete a
record or change the value of a data item in a record, we change one state of the database into
another state.
The distinction between database schema and database state is very important. When we define a
new database, we specify its database schema only to the DBMS. At this point, the
corresponding database state is the empty state with no data. We get the initial state of the
database when the database is first populated or loaded with the initial data. From then on, every
time an update operation is applied to the database, we get another database state. At any point in
time, the database has a current state. The DBMS is partly responsible for ensuring that every
state of the database is a valid state, that is, a state that satisfies the structure and constraints
specified in the schema. Hence, specifying a correct schema to the DBMS is extremely
important, and the schema must be designed with the utmost care. The DBMS stores the
descriptions of the schema constructs and constraints, also called the meta-data, in the DBMS
catalog so that DBMS software can refer to the schema whenever it needs to. The schema is
sometimes called the intension, and a database state an extension of the schema.
Although, as mentioned earlier, the schema is not supposed to change frequently, it is not
uncommon that changes need to be occasionally applied to the schema as the application
requirements change. For example, we may decide that another data item needs to be stored for
each record in a file, such as adding the DateOfBirth to the STUDENT schema in Figure 2.1.
This is known as schema evolution. Most modern DBMSs include some operations for schema
evolution that can be applied while the database is operational.
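The schema/state distinction can be made concrete with a small sketch. The attribute names follow Figure 2.1; the `Schema` class itself is a hypothetical illustration, not a DBMS API.

```python
class Schema:
    """The intension: described once at design time, rarely changed."""
    def __init__(self, name, attributes):
        self.name = name              # a schema construct, e.g. STUDENT
        self.attributes = attributes

    def insert(self, state, record):
        # Every update produces a new database state (extension); the DBMS
        # must check that the new state is valid, i.e. satisfies the schema.
        if set(record) != set(self.attributes):
            raise ValueError("record does not satisfy the schema")
        return state + [record]

student = Schema("STUDENT", ["Name", "StudentNumber", "Class", "Major"])
state0 = []   # empty state: schema defined, database not yet loaded
state1 = student.insert(state0, {"Name": "Abebe", "StudentNumber": 17,
                                 "Class": 2, "Major": "CS"})
print(len(state0), len(state1))  # 0 1
```

Note that `student` (the schema) is unchanged by the insert; only the state moves from `state0` to `state1`.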
STUDENT: Name, StudentNumber, Class, Major
COURSE: CourseName, CourseNumber, CreditHours, Department
SECTION: SectionIdentifier, CourseNumber, Semester, Year, Instructor
FIGURE 2.1 Schema diagram for the database in Figure 1.2.
THREE-SCHEMA ARCHITECTURE AND DATA INDEPENDENCE
Three of the four important characteristics of the database approach are (1) insulation of programs and data (program-data and program-operation independence), (2) support of multiple user views, and (3) use of a catalog to store the database description (schema). In this section we specify an architecture for database systems, called the three-schema architecture, also known as the ANSI/SPARC architecture, which was proposed to help achieve and visualize these characteristics. We then further discuss the concept of data independence.
The Three-Schema Architecture
The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the
following three levels:
Internal Level
The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.
Conceptual Level
The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations, and
constraints. Usually, a representational data model is used to describe the conceptual schema
when a database system is implemented. This implementation conceptual schema is often based
on a conceptual schema design in a high-level data model.
External Level
The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides
the rest of the database from that user group. As in the previous case, each external schema is
typically implemented using a representational data model, possibly based on an external schema design in a high-level data model.
The three-schema architecture is a convenient tool with which the user can visualize the schema
levels in a database system. Most DBMSs do not separate the three levels completely, but
support the three-schema architecture to some extent. Some DBMSs may include physical-level
details in the conceptual schema. In most DBMSs that support user views, external schemas are
specified in the same data model that describes the conceptual-level information. Some DBMSs
allow different data models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at
the physical level. In a DBMS based on the three-schema architecture, each user group refers
only to its own external schema. Hence, the DBMS must transform a request specified on an
external schema into a request against the conceptual schema, and then into a request on the
internal schema for processing over the stored database. If the request is a database retrieval, the
data extracted from the stored database must be reformatted to match the user's external view.
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs, especially those that are meant to support small databases, do not support external views. Even in such systems, however, a certain amount
of mapping is necessary to transform requests between the conceptual and internal levels.
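The two mappings can be sketched as two small transformation functions. This is a toy illustration of the idea only; the record layout and function names are hypothetical, and the attribute names follow the ANSI-SPARC figure below.

```python
# Internal level: how records are actually stored (here, flat tuples).
stored = [(101, "Abebe", "Alemu", "1990-01-01", 5000.0, 7)]

CONCEPTUAL = ["Staff_No", "FName", "LName", "DOB", "Salary", "Branch_No"]
EXTERNAL_VIEW_1 = ["Staff_No", "FName", "Salary"]  # one user group's view

def internal_to_conceptual(row):
    """Mapping from the stored representation to the conceptual schema."""
    return dict(zip(CONCEPTUAL, row))

def conceptual_to_external(record, view):
    """Mapping to an external schema: hide everything outside the view."""
    return {a: record[a] for a in view}

# A retrieval request: data extracted from the stored database is
# reformatted to match the user's external view.
result = [conceptual_to_external(internal_to_conceptual(r), EXTERNAL_VIEW_1)
          for r in stored]
print(result[0])  # {'Staff_No': 101, 'FName': 'Abebe', 'Salary': 5000.0}
```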
ANSI-SPARC Three-level Architecture
External View 1: Staff_No, FName, Salary
External View 2: FName, DOB
Conceptual Level: Staff_No, FName, LName, DOB, Salary, Branch_No
Internal Level:
struct STAFF {
    int Staff_No;
    int Branch_No;
    char FName[15];
    char LName[15];
    struct Date DOB;   /* Date assumed to be a separate struct (day, month, year) */
    float salary;
};
Fig. Differences between Three Levels of ANSI-SPARC Architecture
The ANSI-SPARC architecture defines DBMS schemas at three levels:
Internal schema
o at the internal level to describe physical storage structures and access paths.
Typically uses a physical data model.
Conceptual schema
o at the conceptual level to describe the structure and constraints for the whole
database for a community of users. Uses a conceptual or an implementation data
model.
External schemas
o at the external level to describe the various user views. Usually uses the same data
model as the conceptual level.
Data Independence
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data independence:
Logical Data Independence
Logical data independence is the capacity to change the conceptual schema without having to
change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item). In the last case, external schemas that
refer only to the remaining data should not be affected. For example, the external schema of
Figure 1.4a should not be affected by changing the GRADE_REPORT file shown in Figure 1.2
into the one shown in Figure 1.5a. Only the view definition and the mappings need be changed in
a DBMS that supports logical data independence. After the conceptual schema undergoes a
logical reorganization, application programs that reference the external schema constructs must
work as before. Changes to constraints can be applied to the conceptual schema without affecting
the external schemas or application programs.
Logical Data Independence Summary
Refers to immunity of external schemas to changes in conceptual schema.
Conceptual schema changes (e.g. addition or removal of entities) should not require changes to the external schemas or rewrites of application programs.
The capacity to change the conceptual schema without having to change the external
schemas and their application programs.
Physical Data Independence
Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized, for example by creating additional access structures, to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema. For example, providing an access path to improve retrieval speed
of SECTION records (Figure 1.2) by Semester and Year should not require a query such as "list
all sections offered in fall 1998" to be changed, although the query would be executed more
efficiently by the DBMS by utilizing the new access path.
Physical Data Independence Summary
The ability to modify the physical schema without changing the logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should be well
defined so that changes in some parts do not seriously influence others.
The capacity to change the internal schema without having to change the conceptual
schema
Refers to immunity of conceptual schema to changes in the internal schema
Internal schema changes (e.g. using different file organizations or storage structures/devices) should not require changes to the conceptual or external schemas.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software
to accomplish these mappings by referring to the mapping information in the catalog. Data
independence occurs because when the schema is changed at some level, the schema at the next
higher level remains unchanged; only the mapping between the two levels is changed. Hence,
application programs referring to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data independence, both
physical and logical. However, the two levels of mappings create an overhead during
compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because
of this, few DBMSs have implemented the full three-schema architecture.
Fig. Data Independence and the ANSI-SPARC Three-level Architecture
CHAPTER THREE
Relational Data Model & ER-Model
In the database development life cycle, once all the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model (one of which is the ER model). This step is called conceptual design. The conceptual
schema is a concise description of the data requirements of the users and includes detailed
descriptions of the entity types, relationships, and constraints; these are expressed using the
concepts provided by the high-level data model. Because these concepts do not include
implementation details, they are usually easier to understand and can be used to communicate
with nontechnical users. The high-level conceptual schema enables the database designers to
concentrate on specifying the properties of the data, without being concerned with storage
details. Consequently, it is easier for them to come up with a good conceptual database design.
The next step in database design is the actual implementation of the database, using a
commercial DBMS. Most current commercial DBMSs use an implementation data model, such as the relational or the object-relational model, so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or data model mapping, and its result is a database schema in the implementation data model of the DBMS.
The last step is the physical design phase, during which the internal storage structures, indexes,
access paths, and file organizations for the database files are specified. In this chapter only the
basic ER model concepts for conceptual schema design are discussed.
Relational Data Model
The building blocks of the relational data model are:
- Entities: real-world physical or logical objects
- Attributes: properties used to describe each entity or real-world object
- Relationships: the associations between entities
- Constraints: rules that should be obeyed while manipulating the data
1. Entities/Relations/Tables
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes, the
particular properties that describe it. For example, an employee entity may be described by the
employee's name, age, address, salary, and job. A particular entity will have a value for each of
its attributes.
NB:
- The name given to an entity should always be a singular noun descriptive of each item to be stored in it, e.g. student NOT students.
- Every relation has a schema, which describes the columns, or fields.
- The relation itself corresponds to our familiar notion of a table.
- A relation is a collection of tuples, each of which contains values for a fixed number of attributes.
2. Attributes/Fields/Columns
Attributes are pieces of information ABOUT entities. The analysis must, of course, identify those which are actually relevant to the proposed application. At this level we need to know such
things as:
- The attribute name (explanatory words or phrases)
- The domain from which attribute values are taken (a DOMAIN is a set of values from which attribute values may be taken). Each attribute has values taken from a domain; for example, the domain of Name is string and that of Salary is real.
- Whether the attribute is part of the entity identifier (attributes which merely describe an entity versus those which help to identify it uniquely)
- Whether it is permanent or time-varying (which attributes may change their values over time)
- Whether it is required or optional for the entity (whose values will sometimes be unknown or irrelevant)
Several types of attributes occur in the ER model: simple versus composite, single-valued versus
multi-valued, and stored versus derived.
Composite versus Simple (Atomic) Attributes
Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of an employee entity can be subdivided into StreetAddress, City, State, and PostalCode. Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy; for example,
StreetAddress can be further subdivided into three simple attributes: Number, Street, and
ApartmentNumber. The value of a composite attribute is the concatenation of the values of its
constituent simple attributes.
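The "concatenation of constituent values" idea can be sketched directly. The nested representation below is a hypothetical illustration; attribute names follow the text.

```python
# A composite attribute whose StreetAddress subpart is itself composite.
address = {
    "StreetAddress": {"Number": "12", "Street": "Churchill Ave",
                      "ApartmentNumber": "4B"},
    "City": "Addis Ababa",
    "State": "AA",
    "PostalCode": "1000",
}

def flatten(attr):
    """The value of a composite attribute is the concatenation of the
    values of its constituent simple attributes."""
    if isinstance(attr, dict):
        return " ".join(flatten(v) for v in attr.values())
    return attr

print(flatten(address))  # 12 Churchill Ave 4B Addis Ababa AA 1000
```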
Single-Valued versus Multi-Valued Attributes
Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity, for example a Colors attribute for a car, or a
CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone
cars have two values for Colors. Similarly, one person may not have a college degree, another
person may have one, and a third person may have two or more degrees; therefore, different
persons can have different numbers of values for the CollegeDegrees attribute. Such attributes
are called multi-valued. A multi-valued attribute may have lower and upper bounds to constrain
the number of values allowed for each individual entity. For example, the Colors attribute of a
car may have between one and three values, if we assume that a car can have at most three
colors.
Stored versus Derived Attributes
In some cases, two (or more) attribute values are related, for example the Age and BirthDate
attributes of a person. For a particular person entity, the value of Age can be determined from the
current (today's) date and the value of that person's BirthDate. The Age attribute is hence called a
derived attribute and is said to be derivable from the BirthDate attribute, which is called a
stored attribute. Some attribute values can be derived from related entities; for example, an
attribute NumberOfEmployees of a department entity can be derived by counting the number of
employees related to (working for) that department.
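The Age/BirthDate example can be sketched as follows. The `Person` class and the dates used are hypothetical; the point is that BirthDate is stored while Age is computed on demand.

```python
from datetime import date

class Person:
    def __init__(self, name, birth_date):
        self.name = name
        self.birth_date = birth_date   # stored attribute

    def age(self, today):
        # Derived attribute: computed from BirthDate and the current date,
        # so it never has to be stored (and can never become stale).
        before_birthday = (today.month, today.day) < (self.birth_date.month,
                                                      self.birth_date.day)
        return today.year - self.birth_date.year - before_birthday

p = Person("Abebe", date(1990, 6, 15))
print(p.age(date(2013, 6, 14)))  # 22 (birthday not yet reached)
print(p.age(date(2013, 6, 15)))  # 23
```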
Null Values
In some cases a particular entity may not have an applicable value for an attribute. For example,
the ApartmentNumber attribute of an address applies only to addresses that are in apartment
buildings and not to other types of residences, such as single-family homes. Similarly, a
CollegeDegrees attribute applies only to persons with college degrees. For such situations, a
special value called null is created. An address of a single-family home would have null for its
ApartmentNumber attribute, and a person with no college degree would have null for
CollegeDegrees.
NB:
- NULL applies to attributes which are not applicable or which do not have values.
- You may enter the value NA (meaning not applicable).
- The value of a key attribute cannot be null.
- Default value: the value assumed if no explicit value is entered.
Key Attributes of Entity
An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute (primary key), and its values can be
used to identify each entity uniquely. For example, a Name attribute can be a key for a
COMPANY entity type, because no two companies are allowed to have the same name. For the
PERSON entity type, a typical key attribute is SocialSecurityNumber. Sometimes, several
attributes together form a key, meaning that the combination of the attribute values must be
distinct for each entity. If a set of attributes possesses this property, the proper way to represent
this is to define a composite attribute and designate it as a key attribute of the entity type.
Notice that such a composite key must be minimal; that is, all component attributes must be
included in the composite attribute to have the uniqueness property. An entity type may also
have no key, in which case it is called a weak entity type.
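The uniqueness constraint itself is easy to check mechanically. The sketch below is illustrative; the `violates_key` helper and the sample data are hypothetical.

```python
def violates_key(entities, key):
    """Return True if any two entities share a value of the key attribute,
    i.e. if `key` cannot serve as a key for this entity set."""
    seen = set()
    for e in entities:
        if e[key] in seen:
            return True
        seen.add(e[key])
    return False

# If two companies could share a Name, Name could not be a key.
companies = [{"Name": "EthioTel"}, {"Name": "AAU"}, {"Name": "EthioTel"}]
print(violates_key(companies, "Name"))  # True
```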
Summary of Key Attributes
o Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns.
o A group of columns that uniquely identifies a row in a table is called a CANDIDATE KEY.
o A column or combination of columns that matches the primary key of another table is called a FOREIGN KEY; it is used to cross-reference tables.
Value Sets (Domains) of Attributes
Each simple attribute of an entity type is associated with a value set (or domain of values), which
specifies the set of values that may be assigned to that attribute for each individual entity. For
instance, if the range of ages allowed for employees is between 16 and 70, we can specify the
value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and
70. Similarly, we can specify the value set for the Name attribute as a set of character strings.
3. Relationships
A relationship type R among n entity types E1, E2, ..., En defines a set of associations among
entities. For example, consider a relationship type WORKS_FOR between two entity types
EMPLOYEE and DEPARTMENT, which associates each employee with the department for
which the employee works.
Degree of a Relationship
The degree of a relationship type is the number of participating entity types. Hence, the
WORKS-FOR relationship is of degree two. A relationship type of degree two is called binary,
and one of degree three is called ternary. Relationships can generally be of any degree, but the most common are binary relationships. Higher-degree relationships are harder to model and are comparatively rare.
For example, suppose that we want to capture which employees use which skills on which
project. We might try to represent this data in a database as three binary relationships between
skills and project, project and employee, and employee and skill.
Fig. A ternary relationship
Works-on
o Abebe and Kebede have worked on projects A and B.
Used-on
o Abebe has used programming skills on project x.
Have-skill
o An employee has a certain skill. This is different from Used-on because there are some skills that an employee has that he or she may not have used on a particular project.
Needed
o A project needs a particular skill. This is different from Used-on because there may be some skills for which employees have not been assigned to the project yet.
Manages
o An employee manages a project. This is a completely different dimension than skill, so it could not be captured by Used-on.
Role Names and Recursive Relationships
Each entity that participates in a relationship plays a particular role in the relationship. The role
name signifies the role that a participating entity from the entity type plays in each relationship
instance, and helps to explain what the relationship means. For example, in the WORKS_FOR
relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays
the role of department or employer.
In some cases the same entity type participates more than once in a relationship type, in different roles. In such cases the role name is essential for distinguishing the meaning of each participation. Such relationship types are called recursive relationships. For example, the
SUPERVISION relationship type relates an employee to a supervisor, where both employee and
supervisor entities are members of the same EMPLOYEE entity.
Cardinality Ratios for Binary Relationships
An important concept about a relationship is the number of instances/tuples that can be associated with a single instance from one entity in a single relationship. The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in. The number of instances participating in, or associated with, a single instance from an entity in a relationship is called the CARDINALITY of the relationship.
For example, in the WORKS_FOR binary relationship type, DEPARTMENT: EMPLOYEE is of
cardinality ratio 1: N, meaning that each department can be related to (that is, employs) any
number of employees. But an employee can be related to (work for) only one department. The
possible cardinality ratios for binary relationship types are 1:1, 1:M, M:1, and M:M.
ONE-TO-ONE: one tuple is associated with only one other tuple.
o An example of a 1:1 binary relationship is MANAGES, which relates a department entity to the employee who manages that department. This means that at any point in time an employee can manage only one department and a department has only one manager.
o Another example of a 1:1 relationship is Building-Location: a single building will be located in a single location, and a single location will accommodate only a single building.
ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse.
o E.g. 1: Department-Student, as one department can have multiple students.
o E.g. 2: Employee-Department, as one employee can work in multiple departments.
MANY-TO-ONE: many tuples are associated with one tuple, but not the reverse.
o E.g. Employee-Department, as many employees belong to a single department.
MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side with a different role name, one tuple will be associated with many tuples.
o E.g. Student-Course, as a student can take many courses and a single course can be attended by many students.
o E.g. Employee-Project, as an employee works on many projects and a single project can be done by many employees.
4. Relational Constraints/Integrity Rules
Relational Integrity:
o Domain Integrity: no value of an attribute should be beyond the allowable limits.
o Entity Integrity: in a base relation, no attribute of a primary key can be null.
o Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value in its home relation or the foreign key value must be null (foreign key to primary key match-ups).
o Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated.
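Referential integrity in particular lends itself to a short sketch. The relations and the `referential_ok` helper below are hypothetical; the rule checked is exactly the one stated above: a foreign key value must either match a key value in its home relation or be null.

```python
def referential_ok(rows, fk, home_keys):
    """Every non-null foreign key value must appear among the key values
    of the home (referenced) relation."""
    return all(r[fk] is None or r[fk] in home_keys for r in rows)

departments = {10, 20}               # key values of the home relation
employees = [
    {"Name": "Abebe",  "DeptNo": 10},
    {"Name": "Sara",   "DeptNo": None},  # null foreign key is allowed
    {"Name": "Kebede", "DeptNo": 30},    # dangling reference: violation
]
print(referential_ok(employees[:2], "DeptNo", departments))  # True
print(referential_ok(employees, "DeptNo", departments))      # False
```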
Relational languages and views
The languages in relational database management systems are the DDL and the DML, which are used to define or create the database and to perform manipulations on it. There are two kinds of relations in a relational database. The difference is in how the relation is created, used and updated:
1. Base Relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database.
2. View
The dynamic result of one or more relational operations operating on the base relations to produce another virtual relation. A view is thus a virtually derived relation that does not necessarily exist in the database but can be produced upon request by a particular user at the time of request.
Purpose of a view
- Hides unnecessary information from users
- Provides powerful flexibility and security
- Provides a customized view of the database for users
- A view of one base relation can be updated
- Updates on views derived from various relations are not allowed
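The "virtual relation" idea can be sketched in a few lines. The base relation and the view function are hypothetical; the point is that the view stores no tuples of its own and is recomputed from the base relation at the time of each request.

```python
staff = [  # base relation: tuples physically stored
    {"Staff_No": 1, "FName": "Abebe", "Salary": 5000.0},
    {"Staff_No": 2, "FName": "Sara",  "Salary": 9000.0},
]

def high_earners_view():
    # Derived on request; also hides the Salary column from its users.
    return [{"Staff_No": s["Staff_No"], "FName": s["FName"]}
            for s in staff if s["Salary"] > 6000]

print(high_earners_view())  # [{'Staff_No': 2, 'FName': 'Sara'}]

# Because the view is virtual, it automatically reflects later updates
# to the base relation:
staff.append({"Staff_No": 3, "FName": "Kebede", "Salary": 7000.0})
print(len(high_earners_view()))  # 2
```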
Conceptual Database Design
Conceptual design revolves around discovering and analyzing organizational and user data
requirements. The important activities are to identify: Entities, Attributes, Relationships and
Constraints and based on these components develop the ER model using ER diagrams.
The Entity Relationship (E-R) Model
Entity-Relationship modeling is used to represent conceptual view of the database. The main
components of ER Modeling are: Entities, Attributes, Relationships and Constraints. Before
working on the conceptual design of the database, one has to know and answer the following
basic questions.
- What are the entities and relationships in the enterprise?
- What information about these entities and relationships should we store in the database?
- What are the integrity constraints that hold? Constraints on each item of data with respect to update, retrieval and storage.
- Represent this information pictorially in ER diagrams, then map the ER diagram into a relational database schema.
ER-Diagrams
Entity is represented by a RECTANGLE containing the name of the entity
Attributes are represented by OVALS and are connected to the entity by a line
A derived attribute is indicated by a DOTTED LINE
PRIMARY KEYS are underlined
Relationships are represented by DIAMOND shaped symbols
Structural Constraints on Relationships
Relationship types usually have certain constraints that limit the possible combinations of entities
that may participate in the corresponding relationship set. These constraints are determined from
the miniworld situation that the relationships represent. For example, if a company has a rule that
each employee must work for exactly one department, then we would like to describe this
constraint in the schema. We can distinguish two main types of relationship structural
constraints: cardinality ratio and participation.
1. Constraints on Relationships / Multiplicity / Cardinality Constraints
A multiplicity constraint is the number of, or range of, possible occurrences of an entity type/relation that may relate to a single occurrence/tuple of an entity type/relation through a particular relationship. These constraints are mostly used to ensure appropriate enterprise constraints.
- A (min, max) constraint specifies that each entity e in E participates in at least min and at most max relationship instances in R.
- Default (no constraint): min = 0, max = n (signifying no limit).
- Must have min <= max, min >= 0, max >= 1.
One-to-One Relationship
Relationship Manages between Employee and Department.
The multiplicity of the relationship is:
o One department can have only one manager (1..1).
o One employee could manage either one or no departments (0..1).
One-to-Many Relationship
Relationship Leads between Employee and Project.
The multiplicity of the relationship is:
o One employee may lead zero or more projects (0..*).
o One project is led by exactly one employee (1..1).
Many-to-Many Relationship
Relationship Teaches between Instructor and Course.
The multiplicity of the relationship is:
o One instructor teaches one or more courses (1..*).
o One course is taught by zero or more instructors (0..*).
2. Participation of an Entity Set in a Relationship Set
Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set. The entity with total participation is connected to the relationship using a double line.
Fig. The Employee-Manages-Department, Employee-Leads-Project, and Instructor-Teaches-Course relationships.
E.g. 1: Participation of EMPLOYEE in belongs to relationship with DEPARTMENT is total
since every employee should belong to a department.
E.g. 2: In the participation of EMPLOYEE in the manages relationship with DEPARTMENT, DEPARTMENT will have total participation but not EMPLOYEE.
Partial participation: some entities may not participate in any relationship in the relationship set.
E.g. 1: In the participation of EMPLOYEE in the manages relationship with DEPARTMENT, EMPLOYEE will have partial participation, since not all employees are managers.
Attributes of Relationship Types
Relationship types can also have attributes, similar to those of entity types. For example, to
record the number of hours per week that an employee works on a particular project, we can
include an attribute Hours for the WORKS_ON relationship type. Another example is to include
the date on which a manager started managing a department via an attribute StartDate for the
MANAGES relationship type.
Notice that attributes of 1:1 or 1:M relationship types can be migrated to one of the participating
entity types. For example, the StartDate attribute for the MANAGES relationship can be an
attribute of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to
MANAGES. This is because MANAGES is a 1:1 relationship, so every department or employee
entity participates in at most one relationship instance. Hence, the value of the StartDate attribute
can be determined separately, either by the participating department entity or by the participating
employee (manager) entity.
Weak Entity Types
Entity types that do not have key attributes of their own are called weak entity types. In contrast, regular entity types that do have a key attribute are called strong entity types. Entities belonging to a weak entity type are identified by being related to specific entities from another entity type, in combination with one of their attribute values.
A weak entity type always has a total participation constraint (existence dependency) with
respect to its identifying relationship, because a weak entity cannot be identified without an
owner entity. However, not every existence dependency results in a weak entity type. For
example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even
though it has its own key (LicenseNumber) and hence is not a weak entity.
Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep track of
the dependents of each employee via a 1:N relationship (Figure 3.2). The attributes of
DEPENDENT are Name (the first name of the dependent), BirthDate, Sex, and Relationship (to
the employee). Two dependents of two distinct employees may, by chance, have the same values
for Name, BirthDate, Sex, and Relationship, but they are still distinct entities. They are identified
as distinct entities only after determining the particular employee entity to which each dependent
is related. Each employee entity is said to own the dependent entities that are related to it. A
weak entity type normally has a partial key, which is the set of attributes that can uniquely
identify weak entities that are related to the same owner entity. In our example, if we assume that
no two dependents of the same employee ever have the same first name, the attribute Name of
DEPENDENT is the partial key. In the worst case, a composite attribute of all the weak entity's
attributes will be the partial key.
ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as the primary key for R. If the chosen key of E is composite, the set of simple attributes that form it will together form the primary key of R.
Example: For the company database ER schema (Figure 3.2):
o The regular entities are EMPLOYEE, DEPARTMENT, and PROJECT.
o The foreign key and relationship attributes, if any, are not included yet; they will be added during subsequent steps. These include the attributes SUPERSSN and DNO of EMPLOYEE, MGRSSN and MGRSTARTDATE of DEPARTMENT, and DNUM of PROJECT.
o In our example, we choose empId, DNUMBER, and PNUMBER as the primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT, respectively.
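Step 1 can be sketched as SQL DDL; the following minimal sketch uses Python's standard sqlite3 module. The non-key attributes (FName, DName, and so on) are illustrative assumptions, not the full attribute lists of the company schema:

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# One relation per strong entity type, with only simple attributes
# and the chosen key as primary key. Foreign keys come in later steps.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT,          -- illustrative attribute
    LName TEXT           -- illustrative attribute
);
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    DName   TEXT         -- illustrative attribute
);
CREATE TABLE PROJECT (
    PNUMBER INTEGER PRIMARY KEY,
    PName   TEXT         -- illustrative attribute
);
""")

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
print(sorted(tables))
```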
Step 2: Mapping of Weak Entity Types
For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R. In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s); this takes care of the identifying relationship type of W. The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.
In our example, we create the relation DEPENDENT in this step to correspond to the weak entity type DEPENDENT.
We include the primary key empId of the EMPLOYEE relation, which corresponds to the owner entity type, as a foreign key attribute of DEPENDENT.
The primary key of the DEPENDENT relation is the combination {empId, DEPENDENT_NAME}, because DEPENDENT_NAME is the partial key of DEPENDENT.
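A sketch of Step 2 in SQLite, again via Python's sqlite3 module. The foreign key check shows the existence dependency of the weak entity on its owner; the sample data is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled

conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT
);
-- Weak entity: owner's key comes in as a foreign key, and the primary
-- key is the combination of the owner's key and the partial key.
CREATE TABLE DEPENDENT (
    empId          INTEGER NOT NULL,
    DEPENDENT_NAME TEXT NOT NULL,
    BirthDate      TEXT,
    Sex            TEXT,
    Relationship   TEXT,
    PRIMARY KEY (empId, DEPENDENT_NAME),
    FOREIGN KEY (empId) REFERENCES EMPLOYEE(empId)
);
""")

conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Abebe')")
conn.execute(
    "INSERT INTO DEPENDENT VALUES (1, 'Sara', '2010-01-01', 'F', 'daughter')")

# A dependent with no owner employee violates the foreign key,
# mirroring the total participation of the weak entity type.
try:
    conn.execute(
        "INSERT INTO DEPENDENT VALUES (99, 'Lensa', NULL, 'F', 'daughter')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)
```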
Step 3: Mapping of Binary 1:1 Relationship Types
For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. Choose one of the relations, say S, and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S.
In our example, we map the 1:1 relationship type MANAGES by choosing the participating entity type DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship type is total (every department has a manager). We include the primary key of the EMPLOYEE relation as a foreign key in the DEPARTMENT relation and rename it MGRSSN. We also include the simple attribute STARTDATE of the MANAGES relationship type in the DEPARTMENT relation and rename it MGRSTARTDATE.
Note that it is possible to include the primary key of S as a foreign key in T instead. In our example, this amounts to having a foreign key attribute, say DEPARTMENT_MANAGED, in the EMPLOYEE relation, but it would have a null value for employee tuples who do not manage a department.
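Step 3 can be sketched as follows in SQLite via Python. The UNIQUE and NOT NULL constraints on MGRSSN are one way to approximate the 1:1 cardinality and total participation; the non-key attributes are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    empId INTEGER PRIMARY KEY,
    FName TEXT
);
-- 1:1 MANAGES mapped into DEPARTMENT (the total-participation side):
-- EMPLOYEE's key migrates in as MGRSSN, and the relationship attribute
-- STARTDATE migrates in as MGRSTARTDATE.
CREATE TABLE DEPARTMENT (
    DNUMBER      INTEGER PRIMARY KEY,
    DName        TEXT,
    MGRSSN       INTEGER NOT NULL UNIQUE,  -- UNIQUE enforces the 1:1 side
    MGRSTARTDATE TEXT,
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(empId)
);
""")

cols = [row[1] for row in conn.execute("PRAGMA table_info(DEPARTMENT)")]
print(cols)
```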
Step 4: Mapping of Binary 1:N Relationship Types
For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type on the N-side of the relationship type. Include as a foreign key in S the primary key of the relation T that represents the other entity type participating in R; this is done because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type. Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.
In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION.
For WORKS_FOR, we include the primary key DNUMBER of the DEPARTMENT relation as a foreign key in the EMPLOYEE relation and call it DNO.
For SUPERVISION, we include the primary key of the EMPLOYEE relation as a foreign key in the EMPLOYEE relation itself, because the relationship is recursive, and call it SUPERSSN.
The CONTROLS relationship is mapped to the foreign key attribute DNUM of PROJECT, which references the primary key DNUMBER of the DEPARTMENT relation.
Step 5: Mapping of Binary M:N Relationship Types
For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S. Notice that we cannot represent an M:N relationship type by a single foreign key attribute in one of the participating relations (as we did for 1:1 or 1:N relationship types) because of the M:N cardinality ratio; we must create a separate relationship relation S.
In our example, we map the M:N relationship type WORKS_ON by creating the relation WORKS_ON.
We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them PNO and ESSN, respectively.
We also include an attribute HOURS in WORKS_ON to represent the HOURS attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {ESSN, PNO}.
Step 6: Mapping of Multivalued Attributes
For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.
In our example, we create a relation DEPT_LOCATIONS. The attribute DLOCATION represents the multivalued attribute LOCATIONS of DEPARTMENT, while DNUMBER, as a foreign key, represents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}. A separate tuple will exist in DEPT_LOCATIONS for each location that a department has.
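A sketch of Step 6 in SQLite via Python; the sample department and locations are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DName TEXT);
-- The multivalued LOCATIONS attribute becomes its own relation,
-- with one tuple per (department, location) pair.
CREATE TABLE DEPT_LOCATIONS (
    DNUMBER   INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER),
    DLOCATION TEXT NOT NULL,
    PRIMARY KEY (DNUMBER, DLOCATION)
);
""")

conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.executemany("INSERT INTO DEPT_LOCATIONS VALUES (?, ?)",
                 [(5, 'Addis Ababa'), (5, 'Adama')])

# One tuple per location of department 5.
n = conn.execute(
    "SELECT COUNT(*) FROM DEPT_LOCATIONS WHERE DNUMBER = 5").fetchone()[0]
print(n)
```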
Step 7: Mapping of N-ary Relationship Types
For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually the combination of all the foreign keys that reference the relations representing the participating entity types.
For example, consider the relationship type SUPPLY in the figure above. This can be mapped to the relation SUPPLY shown below, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
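A sketch of Step 7 for the ternary SUPPLY relationship in SQLite via Python. The SUPPLIER, PART, and PROJECT relations are assumed minimal stand-ins for the participating entity types:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (SNAME TEXT PRIMARY KEY);
CREATE TABLE PART     (PARTNO INTEGER PRIMARY KEY);
CREATE TABLE PROJECT  (PROJNAME TEXT PRIMARY KEY);
-- Ternary SUPPLY: one foreign key per participating entity type,
-- with all three together forming the primary key.
CREATE TABLE SUPPLY (
    SNAME    TEXT    NOT NULL REFERENCES SUPPLIER(SNAME),
    PARTNO   INTEGER NOT NULL REFERENCES PART(PARTNO),
    PROJNAME TEXT    NOT NULL REFERENCES PROJECT(PROJNAME),
    PRIMARY KEY (SNAME, PARTNO, PROJNAME)
);
""")

pk = [row[1] for row in conn.execute("PRAGMA table_info(SUPPLY)")
      if row[5] > 0]
print(pk)
```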
CHAPTER FOUR
Functional Dependency and Normalization
So far, we have assumed that attributes are grouped to form a relation schema by using the common sense
of the database designer or by mapping a database schema design from a conceptual data model such as
the ER or some other conceptual data model. These models make the designer identify entity types and
relationship types and their respective attributes, which leads to a natural and logical grouping of the
attributes into relations when the mapping procedures in Chapter 3 are followed. However, we still need
some formal measure of why one grouping of attributes into a relation schema may be better than another.
So far in our discussion of conceptual design and its mapping into the relational model, we have not
developed any measure of appropriateness or "goodness" to measure the quality of the design, other than
the intuition of the designer. In this chapter we discuss some of the theory that has been developed with
the goal of evaluating relational schemas for design quality, that is, to measure formally why one set of groupings of attributes into relation schemas is better than another.
This chapter defines the concept of functional dependency, a formal constraint among attributes that is the main tool for measuring the appropriateness of attribute groupings into relation schemas. We show how functional dependencies can be used to group attributes into relation schemas that
are in a normal form. A relation schema is in a normal form when it satisfies certain desirable properties.
The process of normalization consists of analyzing relations to meet increasingly stringent normal forms, leading to progressively better groupings of attributes. Normal forms are specified in terms of functional dependencies (which are identified by the database designer) and key attributes of relation schemas.
Informal Measures of the Goodness of a Relation
The ease with which the meaning of a relation's attributes can be explained is an informal measure of how
well the relation is designed.
GUIDELINE 1. Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation
schema corresponds to one entity type or one relationship type, it is straightforward to explain its
meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships,
semantic ambiguities will result and the relation cannot be easily explained.
GUIDELINE 2. Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that
the programs that update the database will operate correctly.
GUIDELINE 3. As far as possible, avoid placing attributes in a base relation whose values may
frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do
not apply to a majority of tuples in the relation.
Normalization
A relational database is merely a collection of data, organized in a particular manner. As the father of the
relational database approach, Codd created a series of rules called normal forms that help define that
organization.
Database normalization is a series of steps followed to obtain a database design that allows for consistent
storage and efficient access of data in a relational database. These steps reduce data redundancy and the
risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items and designing a database that will represent such associations without suffering from the update anomalies, which are: insertion anomalies, deletion anomalies, and modification anomalies.
Applying the normalization rules eventually removes the update anomalies that could otherwise arise during data manipulation after implementation. The underlying ideas in normalization are simple enough. Through normalization, we want to design for our relational database a set of tables that:
Contain all the data necessary for the purposes that the database is to serve,
Have as little redundancy as possible,
Accommodate multiple values for types of data that require them,
Permit efficient updates of the data in the database, and
Avoid the danger of losing data unknowingly.
Normalization may reduce system performance since data will be cross referenced from many tables.
Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency
guarantees.
Drawbacks of Normalization
Requires the data in order to see the problems,
May reduce the performance of the system,
Is time consuming,
Is difficult to design and apply, and
Is prone to human error.
The types of problems that can occur in an insufficiently normalized table are called update anomalies, which include:
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in
the database where information about that new entry needs to be stored. In a properly normalized
database, information about a new entry needs to be inserted into only one place in the database; in an
inadequately normalized database, information about a new entry may need to be inserted into more than
one place and, human fallibility being what it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time
to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of
entry needs to be deleted from only one place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more than one place, and, human fallibility
being what it is, some of the needed additional deletions may be missed.
3. Modification anomalies
A modification of a database involves changing some value of an attribute of a table. In a properly normalized database, a given fact is stored in only one place, so a modification is made once and takes effect everywhere the fact is used. In an inadequately normalized database, the same fact may be stored redundantly; if some copies are updated while others are missed, the data becomes inconsistent.
The purpose of normalization is to reduce the chances for anomalies to occur in a database.
Example of problems related to anomalies

EmpID  FName  LName    SkillID  Skill  SkillType    School  SchoolAddress  SkillLevel
12     Abebe  Mekuria  2        SQL    Database     AAU     Sidist_Kilo    5
16     Lemma  Alemu    5        C++    Programming  Unity   Gerji          6
28     Chane  Kebede
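The anomalies above can be demonstrated on this denormalized table. The following sketch loads the two complete rows shown and exhibits a deletion anomaly: the table name is an assumption, and the data is taken from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized table mixing employee, skill, and school facts
# (column names follow the example table above).
conn.execute("""
CREATE TABLE EMP_SKILL (
    EmpID INTEGER, FName TEXT, LName TEXT,
    SkillID INTEGER, Skill TEXT, SkillType TEXT,
    School TEXT, SchoolAddress TEXT, SkillLevel INTEGER
)""")
conn.executemany("INSERT INTO EMP_SKILL VALUES (?,?,?,?,?,?,?,?,?)", [
    (12, 'Abebe', 'Mekuria', 2, 'SQL', 'Database', 'AAU', 'Sidist_Kilo', 5),
    (16, 'Lemma', 'Alemu', 5, 'C++', 'Programming', 'Unity', 'Gerji', 6),
])

# Deletion anomaly: removing the only employee trained at Unity also
# loses the unrelated fact that Unity is located in Gerji.
conn.execute("DELETE FROM EMP_SKILL WHERE EmpID = 16")
remaining_schools = {row[0] for row in conn.execute(
    "SELECT School FROM EMP_SKILL")}
print(remaining_schools)
```

In a normalized design, the school and its address would live in their own table, so deleting an employee row could not destroy the school's address.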