Database Design Concepts · 2018-02-28 · Compiled by Michael Mapundu, Herbert Zuze and Nyasha...

Database Design Concepts C_ITDB211

Compiled by Michael Mapundu, Herbert Zuze and Nyasha Magutsa

Quality assured by Masimba Nyadzibaya, Martin Appiah and Sabelo Muchayiri

Edited by Isobel Coetzee

Version 1.0

NQF Level 5

Credit value: 12

November 2015 CTI Education Group

TABLE OF CONTENTS

INTRODUCTION
  Module aim
  Module abstract
  Learning outcomes and assessment criteria
  Summary of learning outcomes and assessment criteria
  Module content
  Lectures
  Class exercises and activities
  Information resources
  Recommended information sources
  Using this Study Guide
  Purpose
  Structure
  Individual units
  Glossary
  The use of icons
  Alignment to Study Guide
  Study Guide alignment
  Concluding remarks

UNIT 1 – DATABASE FUNDAMENTALS
  Learning objectives
  Introduction
  1.1 Introduction to databases
    1.1.1 Database uses
    1.1.2 Physical and logical views of databases
    1.1.3 Using relational database tables
    1.1.4 Processing relational database tables
    1.1.5 The database environment
  1.2 Database systems
    1.2.1 Database component
    1.2.2 DBMS component
    1.2.3 Database application component
    1.2.4 Common DBMS data types
    1.2.5 Personal vs enterprise-class database systems
    1.2.6 Data diagrams
  1.3 Relational Model
    1.3.1 Relations
    1.3.2 Types of key
  1.4 Functional dependency and normalisation
    1.4.1 Functional dependency
    1.4.2 Normalisation
  1.5 Concluding remarks
  1.6 Self-assessment

UNIT 2 – DATABASE DESIGN
  Learning objectives
  Introduction
  2.1 Database design methods and methodologies
    2.1.1 Database development stages
    2.1.2 Structured Systems Analysis and Design Method (SSADM)
    2.1.3 Agile Software Development methodology
    2.1.4 Rapid Application Development (RAD) methodology
    2.1.5 Spiral methodology
    2.1.6 Waterfall methodology
  2.2 Relational database design: ER modelling
    2.2.1 ER Data Model
    2.2.2 ER diagrams
    2.2.3 Sub-type entities
    2.2.4 Recursive relationships
    2.2.5 Developing ER diagrams
  2.3 Relational database design: normalisation
    2.3.1 Normalisation process
    2.3.2 Transforming data models into relational design
    2.3.3 Normalisation review
    2.3.4 Denormalisation
    2.3.5 Developing database design
  2.4 Concluding remarks
  2.5 Self-assessment

UNIT 3 – DATABASE DEVELOPMENT
  Learning objectives
  Introduction
  3.1 MySQL 5.6 database software
  3.2 MySQL GUI
    3.2.1 Creating a database
    3.2.2 Showing a database
    3.2.3 Selecting a database
    3.2.4 Deleting a database
    3.2.5 MySQL data types
    3.2.6 Creating a table
    3.2.7 Showing a table
    3.2.8 Describing a table
    3.2.9 Entering data into a table: Form Editor
    3.2.10 Entering data into a table: Datasheet View
    3.2.11 Deleting data from a table: Datasheet View
    3.2.12 Altering table structure
    3.2.13 Working with multiple tables
    3.2.14 Working with multiple tables: Relationships Window
    3.2.15 Adding a table: Relationships Window
    3.2.16 Creating a relationship: Relationships Window
    3.2.17 Working with multiple tables: Edit Relationships Window
    3.2.18 Accessing a table: Form Editor
  3.3 MySQL CMD
    3.3.1 MySQL background
    3.3.2 SQL for data definition
    3.3.3 Creating a database
    3.3.4 Showing a database
    3.3.5 Selecting a database
    3.3.6 Creating a table
    3.3.7 Showing and describing a table
    3.3.8 Creating a MySQL table
    3.3.9 Relational data
    3.3.10 Relational queries
    3.3.11 Comparison operators
    3.3.12 Sorting a query result
    3.3.13 Built-in functions and calculations
    3.3.14 Built-in functions and grouping
    3.3.15 Querying multiple tables with joins
    3.3.16 Relational data modification and deletion
    3.3.17 Table and constraint modification and deletion
    3.3.18 SQL views
  3.4 Concluding remarks
  3.5 Self-assessment

UNIT 4 – DATABASE MANAGEMENT
  Learning objectives
  Introduction
  4.1 Database administration
    4.1.1 Responsibilities
    4.1.2 Database processing environment I
    4.1.3 Concurrency control
    4.1.4 Security
    4.1.5 Backup and recovery
  4.2 Database applications processing
    4.2.1 Database processing environment II
  4.3 Internet applications processing
    4.3.1 Application programming interfaces (APIs)
    4.3.2 N-tier architecture
  4.4 Distributed database processing
    4.4.1 Object-relational database management
  4.5 Concluding remarks
  4.6 Self-assessment

GLOSSARY
BIBLIOGRAPHY

Introduction Page 1

© CTI Education Group

Introduction

This module covers relational database structures, intermediate database design, and database management and administration using MySQL. The normalisation principle is explored in depth and forms the basis for designing relational databases in this module. The normalised data is transformed into an Entity Relationship diagram (ERD) and translated into physical tables. All concepts taught are put into practice using MySQL. This module provides a foundation for other modules: the concepts taught enable the student to design a database that can interact with third-party applications. The module introduces students to practical interaction with MySQL through the command line interface and focuses on a client-server topology. It also explores how MySQL functions in graphical user interfaces (GUIs) and command line interfaces (CMDs); both are based on generic Structured Query Language (SQL) and thus differ only in syntax.

Upon completion of this module, you should be equipped with strong MySQL, database design and database development skills.

The main source of information for Database Design Concepts is this Study Guide.

In this introductory unit, we provide you with the following information on Database Design Concepts:

A brief description of the aim of the module
An abstract of the module
The learning outcomes and assessment criteria involved in the module
An outline of the module content
An outline of the module structure
An explanation of the design and proper use of the Study Guide

Module aim

The aim of this module is to afford you the opportunity to develop an understanding of the concepts and issues related to databases and database design as well as the practical skills to translate such an understanding into the design and creation of complex databases.

Module abstract

Databases play an integral part in commercial domains as they provide users with a tool with which to store, model and retrieve data. Database development is fundamental in the areas of computing and Information and Communication Technology (ICT). Database Management Systems (DBMSs) provide the systems, tools and interfaces with which an organisation can manage its information and use it to assist in the effective running of the organisation. Databases offer many links to other areas, such as programming, systems analysis and Human-Computer Interaction (HCI); they also embrace issues of compatibility and end-user interfacing.

This module also explores database architecture, DBMSs and the use of databases in organisational contexts. Database design techniques are investigated and successful students will be able to apply their theoretical understanding to design, create and document a database system.

Learning outcomes and assessment criteria

On successful completion of this module, you will be able to:

1. Discuss the role, purpose and use of databases and data management systems
2. Identify and explain database design techniques and principles
3. Design, create and document databases

The following table outlines the assessment criteria that are aligned to the learning outcomes.

Summary of learning outcomes and assessment criteria

Learning outcomes (on successful completion of this module, you will:) and assessment criteria to pass (you can:)

1. Discuss the role, purpose and use of databases and data management systems.
   1.1 Analyse the key issues and applications of databases within an organisational environment
   1.2 Evaluate the features and advantages of database management systems

2. Identify and explain database design techniques and principles
   2.1 Analyse a database developmental methodology – normalisation and ERD
   2.2 Apply relational database design for a given data set (3NF/ERD)

3. Design, create and document databases
   3.1 Apply the database developmental cycle to a given data set
   3.2 Design a fully functional database (containing at least four inter-relational tables) including a user interface
   3.3 Provide supporting user and technical documentation

These outcomes are covered in the module content and are assessed in the form of written assignments and semester tests. If you comply with and achieve all the pass criteria related to said outcomes, you will pass this module.

Learning and assessment may be performed across modules, at module level or at outcome level. Evidence may be required at outcome level, although opportunities exist for covering more than one outcome in an assignment.



Module content

1. Discuss the role, purpose and use of databases and data management systems

Databases: database architectures; files and record structures; physical and logical views of data; advantages of using databases; reduction of data redundancy; data consistency (validity, accuracy, usability and integrity); independence of data; data sharing possibilities; security; enforcement of standards; database utilities; data dictionaries; query languages and report generators.

Databases in an organisational context: database applications; role of the database administrator; key organisational issues, e.g. integrity, security, recovery, concurrency; industry standards, e.g. Microsoft SQL, Oracle, Sybase and dBase.

Database Management Systems (DBMS): structures; purposes; features and advantages; applications; methods of data organisation and access.

2. Identify and explain database design techniques and principles

Database design methods and methodology: database design within a system development methodology – normalisation practical to third normal form and Entity Relationship modelling/ERD.

Relational database design: logical tables design; physical table design; relations and primary/foreign/compound keys.

3. Design, create and document databases

Database development cycle: developing logical data model; implementing a physical data model based on the logical data model; testing the physical data model; comparing model with requirements analysis; user interface, e.g. input masks, drop-down lists, option buttons and command buttons.

Database software: using appropriate applications software, MySQL; database tools, e.g. create tables, add new rows, alter data, functions, and relational database languages.

Tools and techniques: field and table design; validation and verification techniques; forms including such features as drop-down lists or check boxes; reports; queries; macros.

Documentation: technical documentation and user documentation.

Lectures

Each week has four compulsory lecture hours for all students. It is recommended that the lecture hours be divided into two sessions of two hours each, but this may vary depending on the campus.

Each week has a lecture schedule, which indicates the approximate time that should be allocated to each activity. The week’s work schedule has also been divided into two lessons.

Class exercises and activities

You will be required to complete a number of exercises and activities in class. These activities and exercises may also contribute to obtaining a pass. It is therefore important that you are present in class so that you do not forfeit the opportunity to be exposed to these exercises and activities.

Activity sheets that are submitted should be kept by the lecturer so that they can be used as proof of criteria that were met, if necessary.

Information resources

You should have access to a resource centre or library with a wide range of relevant resources. Resources can include textbooks, e-books, newspaper articles, journal articles, organisational publications, databases, etc. You can access a range of academic journals in electronic format via EBSCOhost. You may have to ask a campus librarian to assist you with accessing EBSCOhost.

Recommended information sources

Avison, D. & Fitzgerald, G. 2006. Information systems development: methodologies, techniques and tools. London: McGraw Hill Higher Publishing.

Chao, L. 2006. Database development and management. Boston: CRC Press.

Connolly, T. & Begg, C. 2015. Database Systems: A Practical Approach to Design, Implementation and Management. 6th ed. Pearson. ISBN: 9780132943260.

Hoffer, J. et al. 2014. Essentials of Database Management. Pearson.

Howe, D. 2001. Data analysis for database design. Massachusetts: Butterworth-Heinemann.

Jukic, N. 2014. Database Systems: Introduction to Databases and Data Warehouses. Pearson.

Kroenke, D. 2014. Database Processing. Pearson.

Patrick, J.J. 2009. SQL Fundamentals. Pearson Education.

Ritchie, C. 2002. Relational database principles. London: Thomson Learning.

Anon., 2015. deeptraining.com. [Online] Available at: http://www.deeptraining.com/ [Accessed 1 October 2015].

Anon., 2015. w3resource. [Online] Available at: http://www.w3resource.com/mysql-exercises/ [Accessed 1 October 2015].

Sebastian, 2014. Database Design Guide. [Online] Available at: http://www.smart-it-consulting.com/database/progress-database-design-guide/ [Accessed 1 October 2015].

Vines, R., 2015. geekgirl's. [Online] Available at: http://geekgirls.com/category/office/databases/ [Accessed 1 October 2015].

Note

Web pages provide access to a further range of Internet information sources. Students must use this resource with care, justifying the use of information gathered.

Using this Study Guide

As indicated earlier, the Study Guide is your main source of information for this module.

The purpose of the Study Guide is to facilitate your learning and to help you to master the module content. It also helps you to structure your learning and manage your time, and provides outcomes and activities to help you master said outcomes.

The Study Guide has been carefully designed to optimise your study time and maximise your learning, so that your learning experience is as meaningful and successful as possible. To deepen your learning and enhance your chances of success, it is important that you read the Study Guide attentively and follow all instructions carefully. Pay special attention to the module outcomes at the beginning of the Study Guide and at the beginning of each unit.

It is essential that you complete the exercises and other learning activities in the Study Guide as your module assessments (examinations, tests and assignments) will be based on the assumption that you have completed them.

Purpose

The purpose of the Study Guide is to facilitate the learning process and to help you to structure your learning and to master the content of the module. It is important for you to work through the Study Guide attentively and to follow all instructions set out therein. In this way you should be able to deepen your learning and enhance your chances of success.

Structure

The Study Guide is structured as follows:

Introduction

Unit 1 Database fundamentals

Unit 2 Database design

Unit 3 Database development

Unit 4 Database management

Glossary

Bibliography

Individual units

The individual units in the Study Guide are structured in the same way. Each unit contains the following features, which should enhance your learning process:

Unit title: Each unit title is based on the title and/or content of a specific learning outcome or assessment criterion/criteria as discussed in the unit.

Learning outcomes and assessment criteria: The unit title is followed by an outline of the learning outcomes and assessment criteria, which will guide your learning process. It is important for you to become familiar with the learning outcomes and assessment criteria as they represent the overall purpose of the module as well as the end product of what you should have learnt in the unit.

Learning objectives: Learning objectives, which follow the learning outcomes and assessment criteria, are statements that define the expected goals of the unit in terms of specific knowledge and skills that you should acquire as a result of mastering the unit content. Learning objectives clarify, organise and prioritise learning and help you to evaluate your own progress, thereby taking responsibility for your learning.

Introduction: The learning objectives section is followed by an introduction that identifies the key concepts of the unit.

Content: The content of each unit contains the theoretical foundation of the module and is based on the work of experts in the field of the module. Theory is illustrated by means of relevant examples.

Concluding remarks: The concluding remarks at the end of each unit provide a brief summary of the unit as well as an indication of what you can expect in the following unit.

Self-assessment: The unit ends off with a number of theoretical self-assessment questions that test your knowledge of the unit content.



Glossary

As you will see, we include a glossary at the end of the Study Guide. Please refer to it as often as necessary in order to familiarise yourself with the exact meaning of terms and concepts.

The use of icons

Icons are used to highlight (emphasise) particular sections or points in the Study Guide, to draw your attention to important aspects of the work or to highlight activities. The following icons are used in the Study Guide:

Activity: This icon indicates learning activities/exercises that have to be completed, whether individually or in groups, in order to assess (evaluate) your understanding of the content of a particular section.

Definition: This icon appears when definitions of a particular term or concept are given in the text.

Example: This icon points to a section in the text where relevant examples of a particular topic (theme) or concept are provided.

Learning outcome alignment: This icon is used to indicate how individual units in the Study Guide are aligned to a specific outcome and its assessment criteria.

Supplementary reading: This icon indicates that you are expected to do some additional (supplementary) reading, i.e. you should obtain additional information by consulting relevant, external information sources.

Test your knowledge: This icon appears at the end of each unit, indicating that you are required to answer self-assessment questions to test your knowledge of the content of the foregoing unit.



Alignment to Study Guide

The following table reflects the alignment between the learning outcomes, assessment criteria and units in the Study Guide.

Study Guide alignment

Learning outcome 1: Discuss the role, purpose and use of databases and data management systems
   1.1 Analyse the key issues and application of databases within organisational environments (Study Guide unit: 1)
   1.2 Evaluate the features and advantages of database management systems (Study Guide unit: 1)

Learning outcome 2: Identify and explain database design techniques and principles
   2.1 Analyse a database developmental methodology – normalisation and ERD (Study Guide unit: 2)
   2.2 Apply relational database design for a given data set (3NF/ERD) (Study Guide unit: 2)

Learning outcome 3: Design, create and document databases
   3.1 Apply the database developmental life cycle to a given data set (Study Guide units: 2 and 3)
   3.2 Design a fully functional database (containing at least four inter-relational tables), which includes a user interface (Study Guide units: 2 and 3)
   3.3 Provide supporting user and technical documentation (Study Guide unit: 4)

Concluding remarks

At this point, you should be familiar with the module design and structure as well as with the use of the Study Guide.

In Unit 1, we start with the actual module content by exploring database fundamentals.



Unit 1 – Database fundamentals

Unit 1 is aligned with the following learning outcome and assessment criteria:

Learning outcome

LO1 Discuss the role, purpose and use of databases and data management systems

Assessment criteria

AC1.1 Analyse the key issues and applications of databases within an organisational environment
AC1.2 Evaluate the features and advantages of database management systems (DBMSs)

Learning objectives

After studying this unit, you should be able to:

Understand the reasons for using databases
Identify the potential problems associated with lists
Understand how using related tables helps to avoid the problems associated with lists
Know the components of database systems
Know the purpose of DBMSs
Understand the functions of database applications
Understand the conceptual foundation of the Relational Model
Understand the meaning and importance of keys
Apply the process of normalisation

Introduction

In this unit, we focus on databases and their role in maintaining information. We discuss key database design concepts, which form an integral part of storing and retrieving data optimally. We conclude the unit by exploring DBMSs and how these are applied by database designers.

Activity

Before we define databases, write down what you think a database is and give examples of where databases are used.



1.1 Introduction to databases

Database

A database is a collection of related information that is organised so that it can be easily accessed, managed and updated.

Relational (tabular) databases are the most popular as they allow data to be

reorganised and accessed in different ways.

In computing, databases are often classified according to their organisational approach. Modern DBMS products, such as MySQL and Microsoft SQL Server, have become popular database tools.

A database is, generally, not portable across different DBMSs; however, different DBMSs can operate in unison to some extent by using standards, such as Structured Query Language (SQL) and Open Database Connectivity (ODBC), to support a single application.

1.1.1 Database uses

Databases are used extensively in our day-to-day lives, mainly to keep track of items. The simplest way to track items, for example, valuable items, is to record them on a list. Lists, in turn, can be split into tables, which are a key component of databases.

Example A list of projects may include the project manager’s name,

identity (ID) and telephone extension. If a particular person is managing ten projects, their information would have to be

entered ten times.

In a list, each row could contain information on more than one theme. A list of projects may thus include the project manager’s

information (name, ID and telephone extension) as well as the project’s information (name, ID, start date and budget).

The advantages of using lists are:
• They are simple to design.
• Entries can be sorted, e.g. alphabetically.
• Data can easily be changed.
• Spreadsheets can be used to store data.

The disadvantages of using lists are:
• They cause modification problems.
• They cause data inconsistencies.
• They increase confusion and uncertainty when dealing with updates.

Relational databases are designed to address many information complexity

issues.



Using an old filing cabinet is one of the most well-known traditional manual processes used for information storage and management. In many paper-

based companies, folders are sorted and stored in the drawers of filing cabinets. Information, in turn, is stored in individual folders within each

drawer; there might even be a sequence of filing cabinets wherein a specific record can be found. Whether a manual process or a database is utilised,

organisation is key to managing information along with finding the correct information in the shortest possible amount of time.

Some of the most common database uses include:

• Tracking statistics/trends.
• Automating manual processes to eliminate paper-based systems.
• Managing different types of transaction performed by individuals or businesses.
• Maintaining historic information/student records/telephone records.
• Organising library card catalogues/online music stores/hospital patient records.

1.1.2 Physical and logical views of databases

In most cases, when using databases, you are primarily concerned with using information, not with the actual storage or size of each piece of data. You would be focused on ensuring that your request or action yields a correct result. When a database system provides you with this result, it does not include details about how data is stored, only a conceptual representation of said data.

A logical database structure refers to the structure of data and the relationship between different pieces of information. The logical view/user’s view of databases represents data in a format that is meaningful to both the user and the software program that processes said data. Information about how the data is physically stored is thus not presented. This type of data model uses logical concepts (such as objects, their properties and their relationships), which are easy to understand.

A physical database describes how the structures in a logical database and the search paths between them are implemented. Unless indicated otherwise, the term ‘database’ refers to a logical database. Database specialists use a physical view to make efficient use of storage and processing resources.

Therefore, a logical view refers to how users view data; a physical view refers to how data is physically stored and processed.

1.1.3 Using relational database tables

The problems with using lists were first identified in the 1960s; subsequently,

a number of techniques have been developed to solve them. Over time, a methodology called the ‘Relational Model’ emerged as the leading solution.

Today, almost every commercial database is based on the Relational Model.



A relational database contains a collection of separate tables. A table holds data about one – and only one – theme (in most circumstances). If a table has

two or more themes, it is broken up into two or more tables.

Tables and spreadsheets (also known as ‘worksheets’) are similar in that you can think of both as having rows, columns and cells. The details that define

tables as something different from spreadsheets are discussed in later units.

There are three basic table modifications, namely:
1. Insert: Insert data by adding data to a table
2. Update: Update data values by changing data in a table without unintended consequences
3. Delete: Delete data values in a table

Each table is assigned an ID column that assigns a unique identifying number to each row; this is necessary as some rows might otherwise hold the same values (two people, for example, might have the same name). The IDs are later used as linking columns to other tables.
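The idea of splitting a list into tables with ID columns can be sketched in a few lines of Python, using the built-in sqlite3 module. This is only a minimal illustration: the table names MANAGER and PROJECT, their columns and the sample rows are assumptions made for this sketch and are not part of the module's case studies.

```python
import sqlite3

# Hypothetical project list: the manager's details repeat on every row.
project_list = [
    ("Project X", "Manager A", "x101"),
    ("Project Y", "Manager A", "x101"),
    ("Project Z", "Manager B", "x205"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MANAGER (
        ManagerID INTEGER PRIMARY KEY,  -- ID column: one unique number per row
        Name      TEXT,
        Extension TEXT
    );
    CREATE TABLE PROJECT (
        ProjectID INTEGER PRIMARY KEY,
        Name      TEXT,
        ManagerID INTEGER REFERENCES MANAGER(ManagerID)  -- linking column
    );
""")

manager_ids = {}
for project_name, manager_name, extension in project_list:
    key = (manager_name, extension)
    if key not in manager_ids:
        # Each manager is stored once, no matter how many projects they run.
        cur = conn.execute(
            "INSERT INTO MANAGER (Name, Extension) VALUES (?, ?)", key)
        manager_ids[key] = cur.lastrowid
    conn.execute("INSERT INTO PROJECT (Name, ManagerID) VALUES (?, ?)",
                 (project_name, manager_ids[key]))

manager_count = conn.execute("SELECT COUNT(*) FROM MANAGER").fetchone()[0]
project_count = conn.execute("SELECT COUNT(*) FROM PROJECT").fetchone()[0]
```

Each table now holds one theme, and a manager's telephone extension needs to be changed in one row only.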

1.1.4 Processing relational database tables

Splitting lists up into separate tables eliminates processing problems, but what if a user wants to view data in the format of an original list? With the data separated into different tables, a user will have to jump from one table to the next to find the information that they want; this jumping around will become tedious.

Several approaches have been invented for combining, querying and

processing tables. Over time, one of those approaches, a language called ‘Structured Query Language’ (SQL), emerged as the leading technique for data

definition and manipulation. Today, SQL is an international standard.

Using SQL, you can:
• Reconstruct lists from their underlying tables.
• Query data that meets specific conditions.
• Perform computations on data in tables.
• Insert, update and delete data.
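The four uses of SQL listed above can be sketched as follows. The sketch assumes hypothetical MANAGER and PROJECT tables with made-up rows, and uses Python's built-in sqlite3 module purely as a convenient way to run the SQL statements; any SQL-based DBMS could be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MANAGER (ManagerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE PROJECT (ProjectID INTEGER PRIMARY KEY, Name TEXT,
                          Budget REAL, ManagerID INTEGER);
    INSERT INTO MANAGER VALUES (1, 'Manager A'), (2, 'Manager B');
    INSERT INTO PROJECT VALUES (1, 'Project X', 1000, 1),
                               (2, 'Project Y', 2500, 1),
                               (3, 'Project Z', 400, 2);
""")

list_query = """
    SELECT PROJECT.Name, MANAGER.Name
    FROM PROJECT JOIN MANAGER ON PROJECT.ManagerID = MANAGER.ManagerID
    ORDER BY PROJECT.ProjectID
"""

# 1. Reconstruct the original list by joining the tables on the ID column.
full_list = conn.execute(list_query).fetchall()

# 2. Query data that meets a specific condition.
large_projects = conn.execute(
    "SELECT Name FROM PROJECT WHERE Budget > 500 ORDER BY ProjectID").fetchall()

# 3. Perform a computation: the total budget per manager.
totals = conn.execute(
    "SELECT ManagerID, SUM(Budget) FROM PROJECT "
    "GROUP BY ManagerID ORDER BY ManagerID").fetchall()

# 4. Update data: the name is changed in one row only.
conn.execute("UPDATE MANAGER SET Name = 'Manager C' WHERE ManagerID = 1")
updated_list = conn.execute(list_query).fetchall()
```

Because the manager's name is stored once, the single UPDATE is reflected in every reconstructed row for that manager.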

1.1.5 The database environment

The database environment is, essentially, a place where databases reside. Within it, users and database administrators will require access to data.

Users might come from within the database environment or might have a remote connection to it. Users and administrators will, typically, perform a range of tasks. These may include activities, such as data mining (collection), data modification or data insertion. Certain users might be restricted, either physically or logically, from accessing data. Permissions to access data are, usually, configured by database owners or administrators.



We will now explore the different components of a database environment.

1.1.5.1 Human component

Data administrators

The function of data administrators is to manage or have direct control over a database.

System developers

System developers are responsible for the development of applications. They break tasks down into smaller steps and computer code, and present the results in forms that are more comfortable and understandable for end users.

End users

End users use a system/application.

1.1.5.2 Hardware/software component

User interface

This is what a system/application looks like. It gives end users an easy and understandable way to interact with the system/application.

Application programs

These carry out the tasks that users want the system/application to perform.

1.1.5.3 Database component

Repository

This contains all data definitions, screen and report formats, and definitions of other organisation and system components.

DBMS

A commercial software system used to create, maintain and provide controlled access to a database repository.

Database

A collection of data that meets the information needs of multiple users in an organisation.

We have now defined the different database components; however, various possible environments exist for databases. In our case, an ‘environment’ refers to the system/application configuration in which databases run.

1.2 Database systems

A database system implies that data is managed to some level of quality (measured in terms of accuracy, availability, usability and resilience); this, in

turn, often implies the use of a general-purpose DBMS.



Well-known DBMSs include Oracle, IBM DB2, Microsoft SQL Server, Microsoft Access, MySQL and SQLite.

A database system has four components (Figure 1), namely:

1. Users
2. Database application
3. DBMS
4. Database

Figure 1 – Components of a database system

Source: Herbert Zuze

A database is a collection of related tables and other structures.

A DBMS is a computer program used to create, process and administer the

database. It receives requests encoded in SQL and translates them into actions. A DBMS is a large, complicated program that is licensed from a

software vendor; companies almost never write their own DBMS programs.

A database application is a set of one or more computer programs that serves as an intermediary between users and the DBMS. Application programs

read or modify database data by sending SQL statements to the DBMS. They

also present data to users in form and report format. Application programs can be acquired from software vendors; they are also frequently written in-house.

Users employ a database application to keep track of items. They use forms to

read, enter and query data as well as to produce reports.

Of these mentioned components, we will consider the database, DBMS and database application in more detail.

1.2.1 Database component

As indicated earlier, in the broadest sense, a database can be defined as a self-

describing collection of related records or a self-describing collection of related tables.



The two key terms herein are:
1. Self-describing
2. Related tables

Self-describing

A description of the database structure is contained within the database itself. The content of such a database can always be determined by looking inside the database itself; this situation is similar to that of a library, where you can tell what is in the library by examining the catalogue that resides within it.

You already have a good idea of related tables. An example of related tables would be the ADVISOR and STUDENT tables, which are related by the common

column AdvisorName.

Data about the structure of databases is called ‘meta data’. Examples of meta

data are the names of tables, the names of columns and the tables to which they belong, the properties of tables and columns, etc.

All DBMS products provide a set of tools to display the structure of their databases. Some tools list the tables in a database, while others describe the structure of individual tables and their components.
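As a small illustration of a self-describing database, the sketch below asks a SQLite database for its own meta data. SQLite is used here only because it ships with Python; sqlite_master and PRAGMA table_info are SQLite-specific, and other DBMS products expose their meta data through different tools. The Phone, StudentID and Name columns are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ADVISOR (AdvisorName TEXT PRIMARY KEY, Phone TEXT)")
conn.execute(
    "CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, Name TEXT, "
    "AdvisorName TEXT)")

# The names of the tables are stored inside the database itself.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
```

No separate documentation was consulted: both lists were read from the database, in the same way that a library catalogue is read inside the library.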

The content of a database is illustrated in Figure 2: a database has user data and meta data, as already explained. It also has indices and other structures that exist to improve performance. Some databases may also contain application meta data; that is, data that describes application elements, such as forms and reports. For example, Microsoft Access carries application meta data as part of its databases:

Figure 2 – Database content




1.2.2 DBMS component

The purpose of a DBMS is to create, process and administer databases. DBMSs

are large, complicated products that are almost always licensed from software vendors. Open source DBMS products are also available.

A DBMS consists of three main components, namely:
1. A data definition language (DDL)
2. A data manipulation language (DML)
3. A data dictionary

A DDL is a formal language used by programmers to define a database’s structure. This could include the creation or alteration of columns, constraints, tables and indices as well as the specification of any database file storage location.

A DML is a specialised language that is used in conjunction with other programming languages to update data within a database. This language also contains commands that allow for the extraction of data from a database. The most prominent example is SQL, which contains both DDL and DML statements.
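The difference between DDL and DML statements can be seen in the following minimal sketch, which assumes a hypothetical STUDENT table and runs the SQL through Python's built-in sqlite3 module. The first two statements define structure (DDL); the remaining statements work with the data itself (DML).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and alter the structure of the database.
conn.execute("CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("ALTER TABLE STUDENT ADD COLUMN CourseName TEXT")

# DML: insert, update, extract and delete data within that structure.
conn.execute(
    "INSERT INTO STUDENT (Name, CourseName) VALUES ('Student A', 'BSc IT')")
conn.execute("UPDATE STUDENT SET CourseName = 'BCom' WHERE Name = 'Student A'")
rows_before_delete = conn.execute(
    "SELECT Name, CourseName FROM STUDENT").fetchall()
conn.execute("DELETE FROM STUDENT WHERE Name = 'Student A'")
rows_after_delete = conn.execute(
    "SELECT Name, CourseName FROM STUDENT").fetchall()
```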

A data dictionary is a file that stores definitions of data elements and

characteristics, such as usage, ownership, authorisation and security. Many data dictionaries can produce lists and reports. Most data dictionaries are

passive; they simply report. More advanced types are considered to be active

when a change in the dictionary is automatically utilised by any related program.

DBMS functions include:

• Creating databases
• Creating tables
• Creating supporting structures, e.g. indices
• Reading database data
• Modifying (inserting, updating or deleting) database data
• Maintaining database structures
• Enforcing rules
• Controlling concurrency
• Providing security
• Performing backup and recovery

DBMSs are used to create databases and tables as well as other supporting structures inside them.



Example

As an example of a supporting structure, suppose that we have a STUDENT table:
• The table has 10 000 rows
• It includes a column, CourseName, that records the name of the course for which a student is enrolled
• We frequently need to access student data via CourseName

Because this is a large table, searching through it to find, for example, all students in the BSc IT field, would take a long time. To improve performance, we can create an index (similar to the index at the back of a book) for CourseName to show which students are enrolled for which courses. Such an index is an example of a supporting structure that is created and maintained by a DBMS.
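The effect of such an index can be sketched with SQLite, which can report how it intends to run a query. The course names other than BSc IT, the student names and the index name idx_student_course are assumptions made for this illustration, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, Name TEXT, "
    "CourseName TEXT)")

# Fill the table with 10 000 made-up rows spread over hypothetical courses.
courses = ["BSc IT", "BCom", "BA"]
conn.executemany(
    "INSERT INTO STUDENT (Name, CourseName) VALUES (?, ?)",
    ((f"Student {i}", courses[i % len(courses)]) for i in range(10_000)))

query = "SELECT Name FROM STUDENT WHERE CourseName = 'BSc IT'"

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

plan_without_index = plan(query)  # the whole STUDENT table is scanned

# The supporting structure: created once, then maintained by the DBMS.
conn.execute("CREATE INDEX idx_student_course ON STUDENT (CourseName)")

plan_with_index = plan(query)     # the index is searched instead
bsc_it_count = len(conn.execute(query).fetchall())
```

The query itself does not change; only the way in which the DBMS finds the rows changes.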

DBMSs also read and modify database data. To do this, a DBMS receives SQL and other requests and transforms them into actions.

Another DBMS function is to maintain the database structure. For example, it might be necessary to change the format of a table or another supporting

structure from time to time.

With most DBMS products, it is possible to declare rules regarding data values.

Example In the IT COURSE database tables, what would happen if a user

mistakenly entered a value of 9 for CustomerID in the

ENROLMENT table? No such customer exists, so such a value would cause numerous errors.

To prevent this situation, it is possible to instruct the DBMS that

any value of CustomerID in the ENROLMENT table must already be a value of CustomerID in the CUSTOMER table. If no such

value exists, the insert or update request should be rejected. The DBMS then enforces these rules, which are called ‘referential

integrity constraints’.
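A minimal sketch of this rule follows, assuming simplified CUSTOMER and ENROLMENT tables with made-up rows. SQLite is used because it ships with Python; note that SQLite only enforces referential integrity constraints once PRAGMA foreign_keys has been switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ENROLMENT (
        EnrolmentID INTEGER PRIMARY KEY,
        -- Every CustomerID here must already exist in CUSTOMER.
        CustomerID  INTEGER NOT NULL REFERENCES CUSTOMER(CustomerID)
    );
    INSERT INTO CUSTOMER VALUES (1, 'Customer A'), (2, 'Customer B');
""")

# CustomerID 1 exists, so this insert is accepted.
conn.execute("INSERT INTO ENROLMENT (CustomerID) VALUES (1)")

# CustomerID 9 does not exist, so the DBMS rejects the insert.
try:
    conn.execute("INSERT INTO ENROLMENT (CustomerID) VALUES (9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrolment_count = conn.execute("SELECT COUNT(*) FROM ENROLMENT").fetchone()[0]
```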

The last three functions of a DBMS have to do with database administration: a DBMS controls concurrency by ensuring that one user’s work does not

inappropriately interfere with another user’s work. Also, a DBMS contains a security system that is used to ensure that only authorised users perform

authorised actions on a database. For example, users can be prevented from

seeing certain data. Similarly, user actions can be confined to making only certain types of data change on specified data. Finally, a DBMS provides

facilities for backing up database data and recovering it from backups, when necessary. Databases, as centralised repositories of data, are valuable

organisational assets.



Example

Consider the value of a book database to a company such as Amazon.com. Because their database is so important, steps need to be taken to ensure that no data will be lost in the event of errors, hardware or software problems, or natural or human catastrophes.

1.2.3 Database application component

A database application creates and processes forms. A typical form is used for entering and processing data.

The functions of database application programs include:

• Creating and processing forms
• Processing user queries
• Creating and processing reports
• Executing application logic
• Controlling applications

Forms hide the structure of underlying tables from users. The goal of a form is to present data in a format that is useful to users, regardless of the underlying

table structure. Behind forms, applications process a database in accordance with user actions. Applications generate SQL statements to insert, update or

delete data.

Application programs, further, process user queries. An application first generates a query request and then sends it to the DBMS; results are then formatted and returned to the user.

Applications also create and process reports. This function is somewhat similar

to the former as application programs first query DBMSs for data. The applications then format the query results as a report.

A report displays all database data in a defined order. A report, similar to a

form, is structured according to user needs and not according to the underlying table structure.

In addition to generating forms, queries and reports, application programs take

other actions to update databases in accordance with application-specific logic.

Example

Suppose that a user, using an order entry application, requests ten units of a particular item and when the application queries

the database (via the DBMS), it finds that only eight units are in stock. What should happen?



It depends on the logic of that particular application. Perhaps no units should be removed from inventory and the user should be

notified or perhaps the eight units should be removed and two more placed on back order. Whatever the case may be, it is the

duty of the application program to execute appropriate logic.
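The two possibilities described in this example can be sketched as a small function. The function name, its parameters and the allow_back_order switch are hypothetical; they simply stand in for whatever rule a particular application implements.

```python
def fill_order(in_stock: int, requested: int, allow_back_order: bool) -> dict:
    """Decide how many units to ship, back order and leave in stock."""
    if requested <= in_stock:
        # Enough stock: ship everything that was requested.
        return {"shipped": requested, "back_ordered": 0,
                "remaining": in_stock - requested}
    if allow_back_order:
        # Ship what is available and place the shortfall on back order.
        return {"shipped": in_stock, "back_ordered": requested - in_stock,
                "remaining": 0}
    # Otherwise remove nothing from inventory; the user must be notified.
    return {"shipped": 0, "back_ordered": 0, "remaining": in_stock}


with_back_order = fill_order(in_stock=8, requested=10, allow_back_order=True)
without_back_order = fill_order(in_stock=8, requested=10, allow_back_order=False)
```

The DBMS stores the stock levels, but the decision between these outcomes belongs to the application.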

To control an application, applications need to be written so that only logical options are presented to users.

Example

An application may generate a menu with choices. In such a case, the application needs to ensure that only appropriate

choices are available. The application then needs to control data activities with the DBMS. It might direct the DBMS, for example,

to make a certain set of data changes as a unit. The application might tell the DBMS to either make all of the changes or none of

them.
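Making a set of changes as a unit can be sketched as follows. The ITEM and ORDER_LINE tables and the place_order function are assumptions made for this illustration. In Python's sqlite3 module, a `with conn:` block commits all the changes if the block succeeds and rolls all of them back if an error occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (
        ItemID  INTEGER PRIMARY KEY,
        InStock INTEGER CHECK (InStock >= 0)  -- stock may never go negative
    );
    CREATE TABLE ORDER_LINE (
        OrderLineID INTEGER PRIMARY KEY,
        ItemID      INTEGER,
        Quantity    INTEGER
    );
    INSERT INTO ITEM VALUES (1, 8);
""")

def place_order(item_id, quantity):
    """Record the order and reduce the stock: both changes or neither."""
    try:
        with conn:  # commit on success, roll back on any error
            conn.execute(
                "INSERT INTO ORDER_LINE (ItemID, Quantity) VALUES (?, ?)",
                (item_id, quantity))
            conn.execute(
                "UPDATE ITEM SET InStock = InStock - ? WHERE ItemID = ?",
                (quantity, item_id))
        return True
    except sqlite3.IntegrityError:
        return False

too_many = place_order(1, 10)  # fails the CHECK, so the order line is undone
lines_after_failure = conn.execute(
    "SELECT COUNT(*) FROM ORDER_LINE").fetchone()[0]
in_stock_order = place_order(1, 5)
stock_left = conn.execute(
    "SELECT InStock FROM ITEM WHERE ItemID = 1").fetchone()[0]
```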

1.2.4 Common DBMS data types

Table 1 summarises the common DBMS data types:

Table 1 – Common DBMS data types

Data type    Description
Binary       Binary; length 0 to 8 000 bytes
Char         Character; length 0 to 8 000 bytes
DateTime     An 8 byte DateTime; ranges from January 1, 1753, to December 31, 9999, with an accuracy of one three-hundredth of a second
Image        Variable length binary data; maximum length of 2 147 483 647 bytes
Integer      A 4 byte Integer; ranges from –2 147 483 648 to 2 147 483 647
Numeric      Decimal; ranges from –10^38 + 1 to 10^38 – 1
Text         Variable length text; maximum length of 2 147 483 647 characters
Varchar      Variable length character; length 0 to 8 000 bytes


1.2.5 Personal vs enterprise-class database systems

Database technology can be used in a wide array of applications. On one end of the spectrum, a researcher might use database technology to track

experiment results performed in a laboratory. Such a database might include only a few tables and each table would have, at most, 700 rows. The

researcher would be the only user of this application; this would, typically, be a personal database system.

At the other end of the spectrum, enormous databases support international

organisations. They have hundreds of tables with millions of rows of data and support thousands of concurrent users. These databases are in use 24 hours a



day, 7 days a week. Just making a backup of such a database is a difficult task. These databases are enterprise-class database systems.

Figure 3 illustrates the four components of a personal database system. In this

figure, Microsoft Access takes on the role of both the database application and DBMS. Microsoft designed Microsoft Access in this way to make it easier for

people to build personal database systems. Using Microsoft Access, you can switch between DBMS functions and application functions and never know the

difference:

Figure 3 – Personal database system


Microsoft Access 2013 is a commonly used personal DBMS and is available as part of the Microsoft Office 2013 suite.

Personal DBMSs do have limitations: they support only a limited number of concurrent user connections and their database files can easily become corrupt.

Enterprise-class DBMSs are widely used in organisations to support business systems. They reside in central locations and can be accessed via ODBC

connections; they are capable of multiple user connections.

The problem with database technology often being hidden is that you do not understand what is being done on your behalf. If you want to develop larger database systems, you need to learn the hidden technology.



Figure 4 represents an enterprise-class database system:

Figure 4 – Enterprise-class database system


Supplementary reading

Kroenke, D.M. & Auer, D.J. 2013. Database concepts. 6th edition. New Jersey: Prentice Hall. Chapter 1.

1.2.6 Data diagrams

Different diagrams have been developed to manage data in databases. When

planning a database, you would draw a diagram to see how the elements inside such relate to one another. Each type of diagram is slightly different and

has a place in the development of a database plan.

There are four main types of data diagram, namely:
1. Hierarchical Model
2. Network Model
3. Object-Oriented Model
4. Relational Model

1.2.6.1 Hierarchical Model

The Hierarchical Model consists of files with pointers from parent files to child

files that link related information together. It uses a tree structure to represent data. A child file can only have one parent file, while a parent file can have

multiple child files. This means that the structure may have data elements that are repeated (mostly in child elements).



1.2.6.2 Network Model

The Network Model is similar to the Hierarchical Model, except that pointers that link related information can point in both directions. It permits many-to-many (N:M) relationships, which cannot be represented directly in the Hierarchical Model.

This model was created for three main purposes, namely:

1. To represent complex data relationships more effectively 2. To improve database performance

3. To impose a database standard

1.2.6.3 Object-Oriented Model

The Object-Oriented Model organises data into a hierarchy of object classes.

This form of modelling was developed to provide persistent storage for programming language objects.

1.2.6.4 Relational Model

The Relational Model represents data as a set of relations (tables) in which data is stored. Each column in a relation has a unique name and each row can be identified by a key. Relational databases are the most commonly used databases.

Advantages of the Relational Model are:

• It reduces data redundancy (this refers to unnecessary data, such as duplicated data or data that can be calculated)
• It ensures data integrity (this means that data is reliable)
• It enforces data security
• It can be used by many people concurrently
• It supports shared data
• It accommodates changes, growth and new information requirements

Apart from these models, many other models exist, such as object/relational,

semi-structured, associative, entity-attribute-value (EAV) and context diagrams.

1.3 Relational Model

1.3.1 Relations

Relations are two-dimensional tables that consist of rows and columns. Relations have the following characteristics:
• Each row holds data that pertains to some entity or a portion of some entity.
• Each column contains data that represents an attribute of an entity. For example, in the EMPLOYEE relation, each row would contain data about a particular employee and each column would contain data representing an attribute of said employee, such as LastName, Phone or E-mailAddress.
• Each cell must hold a single value; no repeating elements are allowed.
• All entries in a column must be of the same kind. For example, if the third column in the first row contains EmployeeNumber, then the third column in all other rows must also contain EmployeeNumber.
• Each column must have a unique name.
• The order of columns is unimportant.
• The order of rows is unimportant.
• The set of data values in each row must be unique; no two rows may hold identical sets of data values.

Figure 5 illustrates an example of a relation:

Figure 5 – Example of a relation


1.3.2 Types of key

A key is one or more columns of a relation that is used to identify a row. A key

can be unique or non-unique.

Example In the EMPLOYEE relation in Figure 5, EmployeeNumber is a

unique key; it identifies a unique row. A query to display all employees having an EmployeeNumber of 200 will produce a

single row result.

In contrast, Department is a non-unique key. It is a key because it is used to identify a row, but it is non-unique because a value

of Department potentially identifies more than one row.

1.3.2.1 Composite keys

A composite key is a key that contains two or more attributes.

Example Suppose that we are looking for a unique key for the EMPLOYEE

relation: users say that although LastName is not unique, the combination of LastName and Department is unique.

Let us assume that the users know that two people with the same

last name will never work in the same department. Two Johnsons, for example, will never work in Accounting. If that is



the case, then the combination (LastName, Department) is a unique composite key.

Alternatively, if the users know that the combination (LastName,

Department) is not unique, but that another combination (FirstName, LastName, Department) is unique, then the latter

combination is a composite key with three attributes.

1.3.2.2 Candidate and Primary Keys (PK)

Candidate keys are keys that uniquely identify each row in a relation; they can

be single column keys or they can be composite keys.

A PK is a candidate key that is chosen as the key the DBMS will use to uniquely

identify each row in a relation.

Example Suppose that we have the following EMPLOYEE relation:

EMPLOYEE (EmployeeNumber, FirstName, LastName,

Department, E-mail, Phone)

The users tell us that EmployeeNumber is a unique key, E-mail is a unique key and the composite key (FirstName,

LastName, Department) is a unique key. Therefore, we have three candidate keys.

When designing the database, we choose one of the candidate

keys to be the PK. In this case, for example, we use

EmployeeNumber as the PK. The PK is important not only because it can be used to identify unique rows, but also because

it can be used to represent rows in relationships.
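Whether a column, or a combination of columns, holds unique values in a given set of rows can be checked with a short function. The sketch below uses made-up EMPLOYEE rows (not the rows of Figure 5), and the function name is hypothetical. Keep in mind that sample data can only show that a key is not unique; whether a key is truly unique is a business rule that the users must confirm.

```python
def is_unique_key(rows, columns):
    """Return True if no two rows share the same values for the columns."""
    seen = set()
    for row in rows:
        value = tuple(row[column] for column in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical EMPLOYEE rows, for illustration only.
employees = [
    {"EmployeeNumber": 100, "LastName": "Johnson", "Department": "Accounting"},
    {"EmployeeNumber": 200, "LastName": "Johnson", "Department": "Marketing"},
    {"EmployeeNumber": 300, "LastName": "Dlamini", "Department": "Accounting"},
]

number_is_unique = is_unique_key(employees, ["EmployeeNumber"])
department_is_unique = is_unique_key(employees, ["Department"])
last_name_is_unique = is_unique_key(employees, ["LastName"])
composite_is_unique = is_unique_key(employees, ["LastName", "Department"])
```

In these rows, EmployeeNumber and the composite (LastName, Department) are candidate keys, while Department and LastName on their own are non-unique keys.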

1.3.2.3 Surrogate keys

A surrogate key is a column with a unique, DBMS-assigned identifier added to

a table to be the PK. The unique values of the surrogate key are assigned by the DBMS each time a row is created; in addition, the values never change.

An ideal PK is short, numeric and never changes. Sometimes, one column in a

table will meet these requirements or come close to them. EmployeeNumber in the EMPLOYEE relation should work very well as a PK.



Example Consider the following relation:

PROPERTY (Street, City, State, ZIP, OwnerID)

The PK of PROPERTY is (Street, City, State, ZIP), which is long

and non-numeric (although it probably will not change). This is thus not an ideal PK. In cases such as this, the database designer

will add a surrogate key, such as PropertyID:

PROPERTY (PropertyID, Street, City, State, ZIP, OwnerID)

Surrogate keys are short, numeric and never change; therefore, they are ideal PKs. Because the value of the surrogate PK will have no meaning to users, they are often hidden on forms, query results and reports. Most DBMS products have a facility that automatically generates key values. In Microsoft Access, for example, this is done by setting the data type of the column to AutoNumber.
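The following sketch shows a DBMS assigning surrogate key values for the PROPERTY relation. It uses SQLite's INTEGER PRIMARY KEY AUTOINCREMENT facility as a stand-in for AutoNumber in Microsoft Access; the addresses and owner numbers are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PROPERTY (
        PropertyID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        Street TEXT, City TEXT, State TEXT, ZIP TEXT,
        OwnerID INTEGER
    )
""")

insert = ("INSERT INTO PROPERTY (Street, City, State, ZIP, OwnerID) "
          "VALUES (?, ?, ?, ?, ?)")

# No PropertyID is supplied: the DBMS assigns one each time a row is created.
first_id = conn.execute(
    insert, ("1 Example Street", "Town A", "ST", "00001", 10)).lastrowid
second_id = conn.execute(
    insert, ("2 Example Street", "Town A", "ST", "00001", 11)).lastrowid
```

Other tables can now refer to a property by its short PropertyID instead of the four-column address.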

1.3.2.4 Foreign Keys (FK) and referential integrity

To represent a relationship, we place values from one relation into a second relation. The values we use are the PK values (including composite PK values, when necessary) of the first relation. When we do this, the attribute in the second relation that holds these values is the FK. A FK such as CustomerNumber is so called because it is the PK of a relation that is foreign to the table in which it resides.

Example Consider the following two relations, where (apart from the

EMPLOYEE relation) we now have a DEPARTMENT relation to hold

data about departments:

EMPLOYEE (EmployeeNumber, FirstName, LastName, Department, E-mail, Phone)

and

DEPARTMENT (DepartmentName, BudgetCode, OfficeNumber,

DepartmentPhone)

EmployeeNumber and DepartmentName are the PKs of EMPLOYEE and DEPARTMENT respectively.

Now, suppose that Department in EMPLOYEE contains the names

of the departments in which employees work and that

DepartmentName in DEPARTMENT also contains these names. In such a case, Department in EMPLOYEE is said to be a FK to

DEPARTMENT.



Note that it is not necessary for PKs and FKs to have the same column name. In most cases, it is important to ensure that every value of a FK matches a

value of the PK. Such a rule is called a ‘referential integrity constraint’. Whenever you see a FK, you should always look for an associated referential

integrity constraint.
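When data already exists, FK values that break the referential integrity constraint can be found with a query. The sketch below uses cut-down versions of the EMPLOYEE and DEPARTMENT relations with made-up rows, one of which contains a deliberately misspelt department name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DepartmentName TEXT PRIMARY KEY, BudgetCode TEXT);
    CREATE TABLE EMPLOYEE (EmployeeNumber INTEGER PRIMARY KEY,
                           LastName TEXT, Department TEXT);
    INSERT INTO DEPARTMENT VALUES ('Accounting', 'BC-100'),
                                  ('Marketing', 'BC-200');
    INSERT INTO EMPLOYEE VALUES (100, 'Employee A', 'Accounting'),
                                (200, 'Employee B', 'Marketing'),
                                (300, 'Employee C', 'Acounting');
""")

# The FK (Department) and the PK (DepartmentName) have different column names.
# A LEFT JOIN keeps every employee; rows without a matching department show
# NULL on the DEPARTMENT side and therefore violate the constraint.
violations = conn.execute("""
    SELECT EMPLOYEE.EmployeeNumber, EMPLOYEE.Department
    FROM EMPLOYEE
    LEFT JOIN DEPARTMENT
           ON EMPLOYEE.Department = DEPARTMENT.DepartmentName
    WHERE DEPARTMENT.DepartmentName IS NULL
""").fetchall()
```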

1.3.2.5 Null values

A null value is a missing value in a cell in a relation. Null values are ambiguous as they are not easy to interpret. There are three possible meanings that can be deduced from null values, namely:
1. There might be no appropriate value for the entry.
2. An appropriate value exists, but it has not yet been entered or decided.
3. The value is unknown.
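Because a null value is not an ordinary value, SQL treats it differently in comparisons. The sketch below, which assumes a cut-down EMPLOYEE table with made-up rows, shows that a comparison with `= NULL` never matches a row, whereas `IS NULL` finds the rows with missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EmployeeNumber INTEGER PRIMARY KEY, Phone TEXT);
    INSERT INTO EMPLOYEE VALUES (100, '555-0100'), (200, NULL);
""")

# Comparing with NULL yields "unknown", so no row is selected.
matches_equals_null = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE Phone = NULL").fetchone()[0]

# IS NULL is the correct test for a missing value.
matches_is_null = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE Phone IS NULL").fetchone()[0]
```

The query can tell us that employee 200 has no phone value, but not which of the three meanings applies.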

1.4 Functional dependency and normalisation

1.4.1 Functional dependency

A functional dependency exists when the value of one attribute (or set of attributes) uniquely determines the value of another attribute. The attribute that does the determining is called the ‘determinant’. A functional dependency acts as a constraint between two attribute sets in a relation, and the normalisation process starts with identifying such dependencies.

Functional dependency helps to identify redundancy problems and suggest refinements. Information redundancy results from the association of different types of information within a single relation schema. Decomposing such a relation schema may address the problem. However, decomposing may increase the time needed to evaluate queries.
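A functional dependency can be tested against sample rows with a short function. The function name and the rows below are assumptions made for this sketch. As with keys, sample data can disprove a dependency but cannot prove that it will always hold.

```python
def dependency_holds(rows, determinant, dependent):
    """Return True if equal determinant values always give equal dependents."""
    seen = {}
    for row in rows:
        left = tuple(row[attribute] for attribute in determinant)
        right = tuple(row[attribute] for attribute in dependent)
        # The first value seen for a determinant must match all later ones.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical rows: the department phone number repeats for each employee.
rows = [
    {"EmployeeNumber": 100, "Department": "Accounting", "DepartmentPhone": "x100"},
    {"EmployeeNumber": 200, "Department": "Marketing", "DepartmentPhone": "x200"},
    {"EmployeeNumber": 300, "Department": "Accounting", "DepartmentPhone": "x100"},
]

department_determines_phone = dependency_holds(
    rows, ["Department"], ["DepartmentPhone"])
department_determines_employee = dependency_holds(
    rows, ["Department"], ["EmployeeNumber"])
```

Here Department determines DepartmentPhone, so the phone number is stored redundantly; this is the kind of redundancy that decomposition removes.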

1.4.2 Normalisation

Normalisation is a systematic process of decomposing relations with anomalies into smaller, well-structured relations. The process eliminates data redundancy and ensures data integrity.



Normalisation ensures that:
• Rows contain data about an entity.
• Columns contain data about attributes of an entity.
• Cells hold a single value.
• All entries in a column are of the same kind.
• Each column has a unique name.
• No two rows hold identical sets of data values.

The benefits of normalisation include:
• The elimination of data redundancy reduces the size of a database, hence less money will be spent on storage.
• Less data for searching means faster querying of data.
• No duplication of data results, hence there is a lowered risk of errors.
• A change made on a linking record is instantly cascaded across all related tables.
• Fewer indices per table mean faster maintenance tasks, such as index rebuilding.

Relational design principles

A relation can be regarded as well formed if every determinant is a candidate key. Any relation that is not well formed is broken down into two or more relations that are well formed.

The above design principles are at the heart of normalisation.

There are many defined normal forms. Technically, well-formed relations are those that are said to be in Boyce-Codd Normal Form (BCNF). The normal forms that lead up to BCNF are:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)

Note that further steps (4NF, 5NF, 6NF) were created in order to cater for rare instances, such as ambiguity among attributes and tables. They are, however, outside the scope of this module; therefore, the focus will fall on the first three normal forms.

1.4.2.1 Normalisation terminology

Note the following terminology as applicable to normalisation:

• Composite PK: a combination of two or more attributes from a group or entity that form a PK for the group or entity involved.
• PK: a PK in normalisation is the same as a unique identifier in Entity-Relationship (ER) modelling.
• FK: a FK is a data item in a group that has the same value as the PK in another group; in normalisation, a FK links two groups.
• Dependency: a data item (in other words, an attribute) that is dependent on another data item as it has no meaning or purpose without its determinant.
• Normal form: a step or stage during the normalisation process; each normal form applies a certain rule to the data being normalised, which eventually causes the data to be separated into more logically related groups.
• Data group: a data group results from the data being normalised, depending on which normal form is being applied to it; a data group is not a table or an entity.
• Repeating group: a set of logically related data that repeats for a specific unique ID.

1.4.2.2 Normalisation steps

There are a number of steps that should be followed to reach 3NF, namely:

Collect the raw data (0NF).

List the raw data, grouping related data together, and identify a PK (0NF).

Remove repeating groups (1NF).

Remove part-key dependencies (2NF).

Remove inter-data dependencies (3NF).
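The steps above can be traced on a small data set. The following Python sketch is illustrative only: the order data, the attribute names and the variable names are hypothetical and do not come from this Study Guide.

```python
# 0NF: raw data for two orders, each with a repeating group of items.
# All names and values are hypothetical sample data.
raw_orders = [
    {
        "order_no": 1, "customer_no": 10, "customer_name": "A. Dube",
        "items": [
            {"item_no": 100, "description": "Desk", "quantity": 2},
            {"item_no": 200, "description": "Chair", "quantity": 4},
        ],
    },
    {
        "order_no": 2, "customer_no": 10, "customer_name": "A. Dube",
        "items": [{"item_no": 100, "description": "Desk", "quantity": 1}],
    },
]

# 1NF: remove the repeating group. Each row now holds single values and is
# identified by the composite PK (order_no, item_no).
first_nf = [
    {"order_no": o["order_no"], "customer_no": o["customer_no"],
     "customer_name": o["customer_name"], **line}
    for o in raw_orders for line in o["items"]
]

# 2NF: remove part-key dependencies. description depends only on item_no,
# and the customer columns depend only on order_no.
item = {r["item_no"]: {"description": r["description"]} for r in first_nf}
order_line = [
    {"order_no": r["order_no"], "item_no": r["item_no"], "quantity": r["quantity"]}
    for r in first_nf
]
order_2nf = {
    r["order_no"]: {"customer_no": r["customer_no"],
                    "customer_name": r["customer_name"]}
    for r in first_nf
}

# 3NF: remove inter-data dependencies. customer_name depends on customer_no,
# which is not the key of the order group, so it moves to its own group.
customer = {v["customer_no"]: {"customer_name": v["customer_name"]}
            for v in order_2nf.values()}
order = {k: {"customer_no": v["customer_no"]} for k, v in order_2nf.items()}

print(len(first_nf), len(item), len(customer))
```

Each data group that remains (item, order_line, order, customer) depends only on its own key, which is the goal of 3NF.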



1.5 Concluding remarks

In this unit, we explained databases and reviewed the key concepts related to them.

An overview of DBMSs was provided.

In the next unit, we will explain how databases are designed.

1.6 Self-assessment

Test your knowledge

1. Why is the study of database technology important?

2. Describe the purpose of databases.



3. What is a modification problem? What are the three possible types of modification problem?

4. What is an ID column?

5. What does SQL stand for and what purpose does it serve?

6. Why is the Relational Model important?

7. List the characteristics that tables must have to be considered as

relations.

8. Give an example of a relation (other than examples from the Study

Guide).

9. Under which circumstances can attributes be of variable length?

10. Define ‘unique’ key and give an example.

11. Define ‘referential integrity constraint’ and give an example.



Unit 2: Database design

Unit 2 is aligned with the following learning outcomes and assessment criteria:

Learning outcomes

LO2 Identify and explain database design techniques and principles

LO3 Design, create and document databases

Assessment criteria

AC2.1 Analyse a database developmental methodology – normalisation and ERD

AC2.2 Apply relational database design for a given data set (3NF/ERD)

AC3.1 Apply the database developmental life cycle to a given data set

AC3.2 Design a fully functional database (containing at least four inter-relational tables), which includes a user

interface

Learning objectives

After studying this unit, you should be able to:

Understand the basic stages of database development

Understand the purpose and role of data models

Know the principal components of the ER Data Model

Know how to present one-to-one (1:1), one-to-many (1:N) and many-to-many (N:M) binary relationships within the ER Data Model

Construct ER diagrams

Know how to interpret Information Engineering (IE) Crow’s Foot ER diagrams

Know how to present sub-type entities and recursive relationships within the ER Data Model

Understand the nature and background of normalisation theory

Understand the need for denormalisation

Introduction

In this unit, we focus on database design. We provide an overview of the

database design process and describe data modelling, a technique for

representing database requirements. We also explore how to transform data models into relational database design.



Activity

Before we explore database design, write down what you think about it.

2.1 Database design methods and methodologies

2.1.1 Database development stages

The three primary stages of database development are:

Requirements analysis

Component design

Implementation

The above stages form part of the Systems Development Life Cycle (SDLC).

The SDLC is a classic methodology used to develop information systems. It comprises the following stages (Figure 6), each of which should produce one or more deliverables:

System definition

Requirements analysis

Component design

Implementation

System maintenance



Figure 6 – SDLC


2.1.1.1 System definition

Input: the need for an information system to support a business process

Output: a project plan

During this stage, the following occurs:

The system project goals and scope of information are defined.

The feasibility of the project (cost, schedule, technical, organisational) is assessed.

The project team is chosen.

The project (specified tasks, assigned personnel, determined task dependencies, set schedules) is planned.



2.1.1.2 Requirements analysis

Input: the project plan

Output: a set of approved user requirements

The deliverables of this stage may include a data model, User Requirements Document (URD) and Statement of Work (SOW). Sources of information include user interviews, forms, reports, queries, use cases and business rules.

During this stage, the following occurs:

User interviews are conducted.

Existing systems are evaluated.

New required forms, reports and queries are determined.

New application features and functions are identified.

Security issues are considered.

A data model is created.

The five components of an information system are considered (hardware, software, data, procedures and people).

2.1.1.3 Component design

Input: approved user requirements

Output: a final system design

The database design and the documented system design may form part of this stage’s deliverables.

During this stage, the following occurs:

Hardware and software specifications are determined.

Database design, business procedures and job descriptions for business personnel are created.

2.1.1.4 Implementation

Input: the final system design

Output: a final system

During this stage, the following occurs:

System components are built.

Component tests are conducted.

Components are integrated.

Integrated components are tested.

The new system conversion takes place.



2.1.1.5 System maintenance

Input: the implemented system

Output: an updated system or a request for system modification using the SDLC

In this stage, the system is updated with patches, service packs and new software releases. Requests for system changes or enhancements are recorded and prioritised.

2.1.2 Structured Systems Analysis and Design Method (SSADM)

SSADM is a software development approach that divides application development projects into modules, stages, steps and tasks. It provides a

framework in which to describe projects.

SSADM is a formal approach that sets out a cascade or waterfall view of

information system development, in which there are a series of steps, each of which leads to the next.

The SSADM’s steps are:

Conduct a feasibility study.

Investigate the current environment.

Determine the business system options.

Define the requirements.

Determine the technical system options.

Create a logical design.

Create a physical design.

Advantages of the SSADM approach include:

It enables users to receive systems on time.

It delivers systems that meet user needs.

It delivers systems which respond to changes in the business environment.

It improves the effective and economic use of available skills.

It improves quality by reducing error rates.

It improves flexibility.

It improves productivity.

It avoids lock-in to single sources of supply.

2.1.3 Agile Software Development methodology

Agile is an iterative and incremental software development approach that focuses on continuous feedback, leading to the refined delivery of working, tested software. The Dynamic Systems Development Method (DSDM) and Scrum are examples of agile approaches.



When using the agile approach, risks are minimised by developing software in iterations, which last one to four weeks. Each iteration can be viewed as a mini project on its own, with tasks and deliverables at the end of the iteration period. Each iteration follows a project cycle of planning, requirements analysis, design, coding, testing and documentation. The intent of every iteration is to result in a meaningful, functional and shippable product.

After delivering an iteration, the team evaluates performance and priorities, noting all problems and successes.

Advantages of the agile approach include:

Rapid customer satisfaction owing to the continuous delivery of useful software

Clear communication of requirements owing to active client involvement

Easier prioritisation of requirements in accordance with user needs

2.1.4 Rapid Application Development (RAD) methodology

RAD is a software development approach that aims to deliver higher-quality software faster. RAD does the following:

It gathers information via focus groups or workshops

It employs prototyping and user testing

It reuses software components

It keeps review meetings and other team communication informal

Advantages of RAD include:

It reduces development time

It increases the reusability of components

It encourages customer feedback

2.1.5 Spiral methodology

The Spiral methodology focuses on the early identification and reduction of project risks. It iterates through four phases, namely:

Determine objectives

Identify and resolve risks

Develop and test

Plan the next iteration

Advantages of the Spiral methodology include:

It is suitable for large and high risk projects

It enhances risk avoidance owing to high amounts of risk analysis

Software is produced early on in the SDLC



2.1.6 Waterfall methodology

The Waterfall Methodology (Figure 7) is a popular, rigid and linear traditional

software development approach. Each phase has distinct goals and has to be completed before starting the next; there is no turning back:

Figure 7 – Waterfall Methodology


Advantages of the Waterfall methodology include:

It is easy and simple to understand and use.

It is easy to manage owing to the rigidity of the model.

Phases are processed and completed one at a time.

2.2 Relational database design: ER modelling

2.2.1 ER Data Model

After system requirements have been gathered, they are transformed into data models. A number of different techniques can be used to create these; by far

the most popular is the ER Data Model, first published by Peter Chen in 1976.

Chen’s basic model has since been extended to create the Extended ER Data Model.



The most important ER Data Model elements are:

Entities

Attributes

Identifiers

Relationships

2.2.1.1 Entities

Entity classes vs entity instances

An entity is something that a user wants to track. An entity class is a description of the structure and format of an entity. Entities of a given type are

grouped into entity classes. For example, the STUDENT entity class is a collection of all STUDENT entities. Entity classes are displayed with capital

letters. An entity instance is a specific occurrence of an entity within an entity class (Figure 8):

Figure 8 – ITEM entity with two entity instances


When developing data models, developers analyse forms, reports, queries and

other system requirements. Entities are, usually, the subject of one or more forms/reports or they are a major section in one or more forms/reports.

2.2.1.2 Attributes

Entities have attributes that describe entity characteristics, for example,

ProjectName, StartDate, ProjectType, ProjectDescription. An attribute has a data type (character, numeric, date, currency and the like) and properties that

are determined from system requirements. Properties specify whether an attribute is required, whether this has a default value and value limit as well as

any other constraints.



2.2.1.3 Identifiers

Entity instances have identifiers, which are attributes that name or identify a particular instance in an entity class, for example, SocialSecurityNumber,

StudentID, EmployeeID, etc. An identifier of an entity instance consists of one

or more of the entity’s attributes.

Types of identifier

Uniqueness: An identifier may be unique or non-unique. If an identifier is unique, the data value for it must be unique for all instances.

Composite: A composite identifier consists of two or more attributes, e.g. OrderNumber and LineItemNumber, which are both required.

Figure 9 illustrates the levels of entity attribute display:

Figure 9 – Levels of entity attribute display


2.2.1.4 Relationships

Entities can be associated with one another via relationships. The ER Data Model contains relationship classes and instances. Relationship classes are

associations among entity classes whereas relationship instances are associations among entity instances.

Relationship degrees define the number of entity classes participating in a

relationship:

Degree 2 is a binary relationship (Figure 10)

Degree 3 is a ternary relationship (Figure 11)



Figure 10 – Binary relationship


Figure 11 – Ternary relationship


There are three types of binary relationship, namely:

1:1

1:N

N:M

1:1 binary relationships

A single entity instance in one entity class is related to a single entity instance in another entity class. For example, an employee may have no more than one

locker and a locker may only be accessible to one employee (Figure 12):

Figure 12 – 1:1 binary relationship


1:N binary relationships

A single entity instance in one entity class is related to many entity instances in another entity class. For example, a quotation is associated with only one

item whereas an item may have several quotations (Figure 13):



Figure 13 – 1:N binary relationship


N:M binary relationships

Many entity instances in one entity class are related to many entity instances in another entity class. For example, a supplier may supply several items and a particular item may be supplied by several suppliers (Figure 14):

Figure 14 – N:M binary relationship


Maximum and minimum cardinality

Relationships are named and classified by their cardinality, which is a word

meaning ‘count’.

Each of the three types of binary relationship has different maximum cardinalities. Maximum cardinality is the maximum number of entity instances

that may participate in a relationship instance (one, many or a fixed amount).

Minimum cardinality is the minimum number of entity instances that must participate in a relationship instance. These values, typically, assume a value

of zero (optional) or one (mandatory).

Example In Figure 15, maximum cardinality is many for both ITEM and

SUPPLIER. Minimum cardinality is zero (optional) for ITEM and one (mandatory) for SUPPLIER: a supplier does not have to

supply an item, but an item must have a supplier:



Figure 15 – Relationship cardinalities


2.2.2 ER diagrams

The sketches in Figure 10 to Figure 15 are called ‘ER diagrams’. The relationships in these are called ‘HAS-A relationships’. This term is used

because each entity instance ‘has a’ relationship to a second entity instance, for example, an employee has a badge and a badge has an employee.

Note the following about ER diagrams:

Entity classes are shown with rectangles.

Relationships are shown with diamonds.

Maximum cardinality is shown inside diamonds.

Minimum cardinality is shown with ovals or hash marks next to entities.

The name of an entity is shown inside rectangles.

The name of a relationship is shown near diamonds.

2.2.2.1 Types of ER diagram

The types of ER diagram are:

IE: uses ‘crow’s feet’ to show the ‘many’ side of a relationship; it is often referred to as the ‘Crow’s Foot Model’

Integrated Definition 1, Extended (IDEF1X): a version of the ER Data Model that is a national standard

Unified Modeling Language (UML): a set of structures and techniques for modelling and designing object-oriented programs and applications

Crow’s Foot symbols

Figure 16 illustrates Crow’s Foot symbols:



Figure 16 – Crow’s Foot symbols


Crow’s Foot example: 1:N relationship

Figure 17 illustrates a 1:N relationship, including its Crow’s Foot version:

Figure 17 – Two versions of a 1:N relationship




Crow’s Foot example: N:M relationship

Figure 18 illustrates an N:M relationship, including its Crow’s Foot version:

Figure 18 – Two versions of an N:M relationship


Weak entities

A weak entity is an entity that cannot exist in a database without the existence of another entity. Any entity that is not a weak entity is called a ‘strong entity’.

ID-dependent weak entities

An ID-dependent weak entity is a weak entity that cannot exist without its parent entity. An ID-dependent weak entity has a composite identifier. The first part of this identifier is the identifier for the strong entity and the second part is the identifier for the weak entity.

An identifying relationship is a type of relationship that is shown by a solid line representing the relationship between an ID-dependent entity and its parent

entity. A relationship drawn with a dashed line used between strong entities is called a ‘non-identifying relationship’; there are no ID-dependent entities in

this kind of relationship.



Figure 19 illustrates ID-dependent weak entities:

Figure 19 – ID-dependent weak entities


Non-ID-dependent weak entities

All ID-dependent entities are weak entities; however, there are other entities that are weak but not ID-dependent. A non-ID-dependent weak entity may have a single or composite identifier; the identifier of the parent entity will be a FK.

Figure 20 illustrates non-ID-dependent weak entities:

Figure 20 – Non-ID-dependent weak entities




Figure 21 illustrates strong and weak entity examples:

Figure 21 – Strong and weak entity examples


2.2.3 Sub-type entities

A sub-type entity is a special case of another entity called a ‘super-type’. A super-type may include an attribute that indicates which sub-type is appropriate for a given instance; this attribute is called a ‘discriminator’.

Sub-types can be exclusive or inclusive:

If exclusive, the super-type relates to, at most, one sub-type.

If inclusive, the super-type can relate to one or more sub-types.

2.2.3.1 Sub-type entity identifiers

The relationships that connect super-types and sub-types are called ‘IS-A relationships’; sub-types are the same entities as super-types. The identifier of

a super-type and all of its sub-types is the same attribute.

Figure 22 illustrates sub-type entity examples:



Figure 22 – Sub-type entity examples


2.2.4 Recursive relationships

It is possible for an entity to have a relationship with itself; this is called a

‘recursive relationship’. As with binary relationships, recursive relationships (Figure 23) can be 1:1, 1:N and N:M:

Figure 23 – Recursive relationship


2.2.5 Developing ER diagrams

Example

Heather Sweeney Designs will be used as an ongoing example for the rest of the Study Guide. Heather Sweeney is an interior

designer who specialises in home kitchen design. She offers a variety of free seminars at home shows, kitchen and appliance

stores, and other public locations. She earns revenue by selling books and videos. She also offers custom-design consulting

services.

Figure 24 illustrates Heather Sweeney Designs: a seminar customer list:



Figure 24 – Seminar customer list


Figure 25 illustrates Heather Sweeney Designs: an initial ER diagram:

Figure 25 – Initial ER diagram




Figure 26 illustrates Heather Sweeney Designs: a customer form letter:

Figure 26 – Customer form letter




Figure 27 illustrates Heather Sweeney Designs: a data model with CONTACT:

Figure 27 – Data model with CONTACT




Figure 28 illustrates Heather Sweeney Designs: a sales invoice:

Figure 28 – Sales invoice




Figure 29 illustrates Heather Sweeney Designs: a data model with INVOICE:

Figure 29 – Data model with INVOICE


Figure 30 illustrates Heather Sweeney Designs: a data model with LINE_ITEM:

Figure 30 – Data model with LINE_ITEM




Figure 31 illustrates Heather Sweeney Designs: a final data model:

Figure 31 – Final data model


Business rules need to be recorded. Heather Sweeney Designs has a business rule that no more than one form letter or e-mail per day is to be sent to a customer. After the data model has been completed, it needs to be validated. Prototyping is commonly used to validate forms and reports.

Business rules are constraints on database activities and, generally, these arise from business policy and practice.

2.3 Relational database design: normalisation

Recall the introduction to normalisation as discussed in Unit 1 before starting

with this section.

2.3.1 Normalisation process

The process for normalising relations can be formulated as follows:

1. Identify all candidate keys in a relation.

2. Identify all functional dependencies in a relation.

3. Examine the determinants of said functional dependencies. If any determinant is not a candidate key, the relation is not well formed. In such a case:

o Place the columns of the functional dependency in a new relation.

o Make the determinant of the functional dependency the PK of the new relation.

o Leave a copy of the determinant as a FK in the original relation.

o Create a referential integrity constraint between the original and new relations.

4. Repeat the previous step as many times as necessary until every determinant of every relation is a candidate key.

Figure 32 acts as an example to explain the above:

Figure 32 – PRESCRIPTION relation and data


Example Consider the following relation:

PRESCRIPTION (PrescriptionNumber, Date, Drug, Dosage, CustomerName, CustomerPhone, CustomerE-mail)

Identify all candidate keys

PrescriptionNumber is a candidate key; it clearly determines

Date, Drug and Dosage. If we assume that a prescription is only for one person, then it also determines CustomerName,

CustomerPhone and CustomerE-mail.

Identify all functional dependencies

To know whether functional dependencies are true for a particular application, we need to look beyond the sample data and ask the

users.

PrescriptionNumber determines all other attributes as described.

By examining the customer columns, we find a further functional dependency in which CustomerE-mail acts as a determinant:

CustomerE-mail → (CustomerName, CustomerPhone)

This is an assumption that should be confirmed with the users: it holds only if every customer has an e-mail address and no two customers share one. If customers can share an e-mail address, or have none, the dependency does not hold.

Ask whether there are determinants that are not candidate keys

CustomerE-mail is a determinant but not a candidate key.

Therefore, PRESCRIPTION has normalisation problems and is not



well formed. According to this step, we must split the functional

dependency into a new relation of its own:

CUSTOMER (CustomerName, CustomerPhone,

CustomerE-mail)

We make the determinant of the functional dependency, CustomerE-mail, the PK of the new relation. We leave a copy of

such in the original relation as a FK. Thus, PRESCRIPTION is now:

PRESCRIPTION (PrescriptionNumber, Date, Drug, Dosage, CustomerE-mail)

Finally, we create a referential integrity constraint:

CustomerE-mail in PRESCRIPTION must exist in CustomerE-mail

in CUSTOMER.

At this point, if we move through the three steps, we find that

neither of these relations has a determinant that is not a candidate key. We can now say that the two relations are

normalised.
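The decomposition in this example can be sketched in code. The following Python sketch is an illustration under stated assumptions: the sample rows are invented, and decompose is a hypothetical helper, not part of any library. It places the columns of the dependency in a new relation keyed by the determinant and leaves the determinant behind as a FK.

```python
# Hypothetical sample rows for the original PRESCRIPTION relation.
prescription_rows = [
    {"PrescriptionNumber": "P1", "Date": "2015-10-01", "Drug": "DrugA",
     "Dosage": "10 mg", "CustomerName": "A. Smith",
     "CustomerPhone": "555-0101", "CustomerE-mail": "asmith@example.com"},
    {"PrescriptionNumber": "P2", "Date": "2015-10-02", "Drug": "DrugB",
     "Dosage": "5 mg", "CustomerName": "B. Jones",
     "CustomerPhone": "555-0102", "CustomerE-mail": "bjones@example.com"},
    {"PrescriptionNumber": "P3", "Date": "2015-10-09", "Drug": "DrugC",
     "Dosage": "20 mg", "CustomerName": "A. Smith",
     "CustomerPhone": "555-0101", "CustomerE-mail": "asmith@example.com"},
]


def decompose(rows, determinant, dependents):
    """Split rows on the dependency determinant -> dependents.

    Returns (new_relation, reduced_rows). new_relation is keyed by the
    determinant (its PK); reduced_rows keep the determinant as a FK.
    """
    new_relation = {}
    reduced_rows = []
    for row in rows:
        new_relation[row[determinant]] = {col: row[col] for col in dependents}
        reduced_rows.append(
            {col: val for col, val in row.items() if col not in dependents}
        )
    return new_relation, reduced_rows


customer, prescription = decompose(
    prescription_rows, "CustomerE-mail", ["CustomerName", "CustomerPhone"]
)

# Referential integrity: every FK value in PRESCRIPTION exists in CUSTOMER.
integrity_ok = all(p["CustomerE-mail"] in customer for p in prescription)
print(len(customer), integrity_ok)
```

The customer details that were repeated for every prescription are now stored once per e-mail address.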

Figure 33 illustrates the result of normalisation on the example:

Figure 33 – Normalised data


Supplementary reading Kroenke, D.M. & Auer, D.J. 2013. Database concepts. 6th

edition. New Jersey: Prentice Hall. Chapter 4.



2.3.2 Transforming data models into relational design

The steps for transforming data models into relational design involve the

following:

Create a table for each entity in a data model:

o Specify the PKs

o Specify the properties of each column, e.g. data types, null status (if any), default values and data constraints (if any)

o Normalise the table

Create the relationships between tables by placing FKs:

o Create strong entity relationships (1:1, 1:N, N:M)

o Create ID-dependent and non-ID-dependent weak entity relationships

o Create sub-types

o Create recursive relationships (1:1, 1:N, N:M)

2.3.2.1 Representing entities

Follow the process below:

Create a relation for each entity (a relation has a descriptive name and a set of attributes that describe the entity)

Specify a PK

Specify column properties, e.g. data types, null status (if any), default values and data constraints (if any)

Analyse the relation using normalisation rules

As normalisation issues arise, the initial relational design may need to change

2.3.2.2 Representing entities as tables

Figure 34 illustrates representing an entity as a table:

Figure 34 – ITEM entity and table


The ITEM table and its columns are as follows (Figure 35):

ITEM (ItemNumber, Description, Cost, ListPrice, QuantityOnHand)



Figure 35 – Final ITEM table
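To show how column properties translate into a table definition, the sketch below creates the ITEM table using Python's built-in sqlite3 module. This is an assumption-laden illustration: SQLite stands in for MySQL, and the data types, NOT NULL settings, default value and CHECK constraint are assumed for the example because the figure's column properties are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column properties below (types, null status, default, constraint) are
# assumptions made for illustration only.
conn.execute("""
    CREATE TABLE ITEM (
        ItemNumber     INTEGER       PRIMARY KEY,
        Description    VARCHAR(100)  NOT NULL,
        Cost           NUMERIC(9, 2) NOT NULL,
        ListPrice      NUMERIC(9, 2) NOT NULL,
        QuantityOnHand INTEGER       NOT NULL DEFAULT 0
                       CHECK (QuantityOnHand >= 0)
    )
""")

# QuantityOnHand is omitted, so the default value is applied.
conn.execute(
    "INSERT INTO ITEM (ItemNumber, Description, Cost, ListPrice) "
    "VALUES (1, 'Desk lamp', 12.50, 19.99)"
)
quantity = conn.execute(
    "SELECT QuantityOnHand FROM ITEM WHERE ItemNumber = 1"
).fetchone()[0]
print(quantity)
```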


2.3.3 Normalisation review

2.3.3.1 Solving problems

Tables that are not normalised will experience problems. These include:

Insertion problems: difficulties inserting data into relations

Modification problems: difficulties modifying data in relations

Deletion problems: difficulties deleting data from relations

Most problems are solved by breaking existing tables into two or more tables

through normalisation.

2.3.3.2 Concepts review

Important normalisation concepts are:

Functional dependency: the relationship within a relation that describes how the value of one attribute may be used to find the value of another

attribute

Determinant: an attribute that can be used to find the value of another attribute in a relation

Candidate key: the value of a candidate key can be used to find the value of every other attribute in a table; a simple candidate key consists of only

one attribute whereas a composite candidate key consists of more than one attribute
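A functional dependency can be tested against sample data. The sketch below is a minimal illustration; the function name holds and the sample rows are hypothetical. Note that sample data can only disprove a dependency: a result of True means that no counter-example was found, not that the dependency is guaranteed by the business rules.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by rows."""
    seen = {}
    for row in rows:
        key = row[determinant]
        # Same determinant value with a different dependent value: the
        # determinant does not determine the dependent attribute.
        if key in seen and seen[key] != row[dependent]:
            return False
        seen[key] = row[dependent]
    return True


# Hypothetical sample data.
rows = [
    {"ItemNumber": 1, "Description": "Desk", "Supplier": "Acme"},
    {"ItemNumber": 1, "Description": "Desk", "Supplier": "Bolt"},
    {"ItemNumber": 2, "Description": "Chair", "Supplier": "Acme"},
]

description_depends = holds(rows, "ItemNumber", "Description")
supplier_depends = holds(rows, "ItemNumber", "Supplier")
print(description_depends, supplier_depends)
```

Here ItemNumber determines Description, but it does not determine Supplier because item 1 appears with two different suppliers.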

2.3.3.3 Normal forms

There are many defined normal forms, namely:

1NF

2NF

3NF

BCNF

4NF

5NF

Domain key/normal form (DK/NF)



Figure 36 illustrates a CUSTOMER entity and table:

Figure 36 – CUSTOMER entity and table


Figure 37 illustrates the CUSTOMER entity’s normalised set of tables:

Figure 37 – Normalised CUSTOMER entity tables


2.3.4 Denormalisation

Denormalisation is the process of trying to optimise the read performance of databases by adding redundant data or by grouping data. In some cases,

denormalisation covers up the inefficiencies inherent in relational database software.

A relational normalised database imposes a heavy access load on physical

storage of data even if it is well-tuned for high performance. It may also significantly increase the complexity of the data structure. There are situations,

however, where denormalised relations are preferred.
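The trade-off can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the tables, columns and values are hypothetical. The normalised design needs a join to read a customer name with an invoice, whereas the denormalised table stores the name redundantly and can be read on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE INVOICE (
        InvoiceNumber INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL REFERENCES CUSTOMER (CustomerID),
        Total         NUMERIC NOT NULL
    );
    -- Denormalised copy: the customer name is repeated on every invoice.
    CREATE TABLE INVOICE_DENORM (
        InvoiceNumber INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL,
        CustomerName  TEXT NOT NULL,
        Total         NUMERIC NOT NULL
    );
    INSERT INTO CUSTOMER VALUES (1, 'Ndlovu');
    INSERT INTO INVOICE VALUES (1001, 1, 250.00);
    INSERT INTO INVOICE_DENORM VALUES (1001, 1, 'Ndlovu', 250.00);
""")

# Normalised read: two tables must be joined.
normalised = conn.execute("""
    SELECT i.InvoiceNumber, c.Name
    FROM INVOICE AS i
    JOIN CUSTOMER AS c ON c.CustomerID = i.CustomerID
""").fetchall()

# Denormalised read: one table, no join.
denormalised = conn.execute(
    "SELECT InvoiceNumber, CustomerName FROM INVOICE_DENORM"
).fetchall()
print(normalised == denormalised)
```

The cost of the simpler read is that a change of customer name must now be applied to every affected invoice row, which reintroduces the modification problems that normalisation removes.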




Figure 38 illustrates the CUSTOMER entity’s denormalised set of tables:

Figure 38 – Denormalised CUSTOMER entity tables


2.3.4.1 Representing weak entities

If a weak entity is not ID-dependent, use the same techniques as for strong entities. If it is ID-dependent, add the PK of the parent entity to the weak entity's table, where it forms part of the composite PK.

Figure 39 illustrates a weak entity:

Figure 39 – Weak entity
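As a sketch of an ID-dependent weak entity, the example below uses the INVOICE and LINE_ITEM relations that appear later in this unit, reduced to a few columns. It runs on Python's built-in sqlite3 module rather than MySQL, and the data types and sample values are assumptions. The parent's PK (InvoiceNumber) forms the first part of the child's composite PK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE INVOICE (InvoiceNumber INTEGER PRIMARY KEY);
    CREATE TABLE LINE_ITEM (
        InvoiceNumber INTEGER NOT NULL REFERENCES INVOICE (InvoiceNumber),
        LineNumber    INTEGER NOT NULL,
        Quantity      INTEGER NOT NULL,
        PRIMARY KEY (InvoiceNumber, LineNumber)
    );
    INSERT INTO INVOICE VALUES (1001), (1002);
    -- The same LineNumber is allowed under different parent invoices.
    INSERT INTO LINE_ITEM VALUES (1001, 1, 3), (1002, 1, 5);
""")

try:
    # Repeats the composite key (1001, 1), so it is rejected.
    conn.execute("INSERT INTO LINE_ITEM VALUES (1001, 1, 9)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

line_count = conn.execute("SELECT COUNT(*) FROM LINE_ITEM").fetchone()[0]
print(duplicate_allowed, line_count)
```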


2.3.4.2 Representing 1:1 relationships

Maximum cardinality determines how relationships are represented. In a 1:1

relationship, the key from one relation is placed in the other as a FK; it does not matter which table receives it.



Figure 40 illustrates a 1:1 relationship:

Figure 40 – 1:1 relationship
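The employee and locker example from the ER modelling section can be used to sketch a 1:1 relationship. The code uses Python's built-in sqlite3 module rather than MySQL; the column names are hypothetical, and the UNIQUE constraint on the FK is an added assumption that stops the relationship from behaving as 1:N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE LOCKER (
        LockerID   INTEGER PRIMARY KEY,
        -- FK to EMPLOYEE; UNIQUE keeps the relationship 1:1
        EmployeeID INTEGER UNIQUE REFERENCES EMPLOYEE (EmployeeID)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Moyo');
    INSERT INTO LOCKER VALUES (10, 1);
""")

try:
    # A second locker for the same employee violates the UNIQUE constraint.
    conn.execute("INSERT INTO LOCKER VALUES (11, 1)")
    second_locker_allowed = True
except sqlite3.IntegrityError:
    second_locker_allowed = False
print(second_locker_allowed)
```

The FK could equally have been placed in EMPLOYEE, pointing at LOCKER.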


2.3.4.3 Representing 1:N relationships

Similar to 1:1 relationships, 1:N relationships are represented by placing the key from one table into another as a FK. However, in a 1:N relationship, the FK always goes into the N-side of the relationship. The 1-side is called the ‘parent’ and the N-side is called the ‘child’.

Figure 41 illustrates a 1:N relationship:

Figure 41 – 1:N relationship
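The item and quotation example from the ER modelling section can be used to sketch a 1:N relationship. The code uses Python's built-in sqlite3 module rather than MySQL, and the column names and values are hypothetical. ITEM is the parent; QUOTATION is the child and carries the FK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE ITEM (
        ItemNumber  INTEGER PRIMARY KEY,
        Description TEXT NOT NULL
    );
    CREATE TABLE QUOTATION (
        QuoteID    INTEGER PRIMARY KEY,
        Price      NUMERIC NOT NULL,
        -- FK on the N-side (child) pointing at the 1-side (parent)
        ItemNumber INTEGER NOT NULL REFERENCES ITEM (ItemNumber)
    );
    INSERT INTO ITEM VALUES (1, 'Desk');
    INSERT INTO QUOTATION VALUES (100, 950.00, 1), (101, 899.50, 1);
""")

# One item has many quotations; each quotation belongs to one item.
quote_count = conn.execute(
    "SELECT COUNT(*) FROM QUOTATION WHERE ItemNumber = 1"
).fetchone()[0]
print(quote_count)
```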




2.3.4.4 Representing N:M relationships

To create N:M relationships, new tables must be created. Such tables are called ‘intersection tables’. An intersection table has a composite key consisting

of keys from each of the tables that it connects.

Figure 42 illustrates an N:M relationship data model:

Figure 42 – N:M relationship data model


Figure 43 illustrates an N:M relationship database design:

Figure 43 – N:M relationship database design
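The supplier and item example from the ER modelling section can be used to sketch an intersection table. The code uses Python's built-in sqlite3 module rather than MySQL; the table name SUPPLIER_ITEM, the column names and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE SUPPLIER (SupplierID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE ITEM (ItemNumber INTEGER PRIMARY KEY, Description TEXT NOT NULL);
    -- Intersection table: composite PK built from the keys of both parents.
    CREATE TABLE SUPPLIER_ITEM (
        SupplierID INTEGER NOT NULL REFERENCES SUPPLIER (SupplierID),
        ItemNumber INTEGER NOT NULL REFERENCES ITEM (ItemNumber),
        PRIMARY KEY (SupplierID, ItemNumber)
    );
    INSERT INTO SUPPLIER VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO ITEM VALUES (10, 'Desk'), (20, 'Chair');
    INSERT INTO SUPPLIER_ITEM VALUES (1, 10), (1, 20), (2, 10);
""")

# All suppliers of item 10, found through the intersection table.
suppliers_of_desk = [
    row[0]
    for row in conn.execute("""
        SELECT s.Name
        FROM SUPPLIER AS s
        JOIN SUPPLIER_ITEM AS si ON si.SupplierID = s.SupplierID
        WHERE si.ItemNumber = 10
        ORDER BY s.Name
    """)
]
print(suppliers_of_desk)
```

The N:M relationship is thereby replaced by two 1:N relationships, one from each parent table to the intersection table.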




2.3.4.5 Representing associated relationships

Figure 44 illustrates an associated relationship:

Figure 44 – Associated relationship


2.3.4.6 Representing super-type and sub-type relationships

The identifier of a super-type becomes the PK and FK of each sub-type (Figure 45):

Figure 45 – Super-type and sub-type relationships


2.3.4.7 Representing recursive relationships

A recursive relationship relates an entity to itself; such relationships adhere to the same rules as binary relationships. 1:1 and 1:N recursive relationships are represented using FKs, and N:M recursive relationships are represented by creating intersection tables.



Figure 46 illustrates recursive relationships:

Figure 46 – Recursive relationships


Figure 47 illustrates an N:M recursive relationship:

Figure 47 – N:M recursive relationship
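A 1:N recursive relationship can be sketched with a FK that points back at its own table. The example below uses Python's built-in sqlite3 module rather than MySQL; the EMPLOYEE table, the ManagerID column and the sample rows are hypothetical and are not taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        -- FK to the same table; NULL for an employee without a manager
        ManagerID  INTEGER REFERENCES EMPLOYEE (EmployeeID)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Moyo', NULL);
    INSERT INTO EMPLOYEE VALUES (2, 'Banda', 1), (3, 'Sithole', 1);
""")

# Self-join: the table plays both the parent (m) and the child (e) role.
reports = [
    row[0]
    for row in conn.execute("""
        SELECT e.Name
        FROM EMPLOYEE AS e
        JOIN EMPLOYEE AS m ON m.EmployeeID = e.ManagerID
        WHERE m.Name = 'Moyo'
        ORDER BY e.Name
    """)
]
print(reports)
```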




2.3.5 Developing database design

Example Heather Sweeney Designs will again be used as an example.

2.3.5.1 Final data models

Figure 48 illustrates Heather Sweeney Designs: the final data model:

Figure 48 – Final data model




2.3.5.2 Specifying column properties

Column properties must be specified for each table. The column properties for the Heather Sweeney Designs tables are illustrated in Figure 49 to Figure 54:

Figure 49 – Column properties (SEMINAR)


Figure 50 – Column properties (CUSTOMER)


Figure 51 – Column properties (CONTACT)




Figure 52 – Column properties (INVOICE)


Figure 53 – Column properties (LINE_ITEM)


Figure 54 – Column properties (PRODUCT)




Figure 55 illustrates Heather Sweeney Designs: the final data design:

Figure 55 – Final data design


The final data design for Heather Sweeney Designs is as follows:

SEMINAR (SeminarID, SeminarDate, SeminarTime, Location, SeminarTitle)

CUSTOMER (E-mailAddress, LastName, FirstName, Phone, StreetAddress, City, State, ZIP)

SEMINAR_CUSTOMER (SeminarID, E-mailAddress)

CONTACT (E-mailAddress, ContactDate, ContactNumber, ContactType,

SeminarID)

PRODUCT (ProductNumber, Description, UnitPrice, QuantityOnHand)

INVOICE (InvoiceNumber, InvoiceDate, PaymentType, Subtotal, Shipping, Tax, Total, E-mailAddress)

LINE_ITEM (InvoiceNumber, LineNumber, Quantity, UnitPrice, Total,

ProductNumber)



2.3.5.3 Referential integrity constraints

Figure 56 illustrates Heather Sweeney Designs: referential integrity constraint enforcement:

Figure 56 – Referential integrity constraint enforcement
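Referential integrity enforcement between CUSTOMER and INVOICE can be sketched as follows. The code uses Python's built-in sqlite3 module rather than MySQL. The tables are reduced to a few columns, the column is named EmailAddress to avoid quoting the hyphen in E-mailAddress, the data types and sample values are assumptions, and is_rejected is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE CUSTOMER (
        EmailAddress TEXT PRIMARY KEY,
        LastName     TEXT NOT NULL
    );
    CREATE TABLE INVOICE (
        InvoiceNumber INTEGER PRIMARY KEY,
        Total         NUMERIC NOT NULL,
        EmailAddress  TEXT NOT NULL REFERENCES CUSTOMER (EmailAddress)
    );
    INSERT INTO CUSTOMER VALUES ('jacobs@example.com', 'Jacobs');
    INSERT INTO INVOICE VALUES (35000, 100.00, 'jacobs@example.com');
""")


def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An invoice for a customer that does not exist is rejected.
orphan_rejected = is_rejected(
    "INSERT INTO INVOICE VALUES (35001, 50.00, 'nobody@example.com')"
)
# A customer that still has invoices cannot be deleted.
parent_delete_rejected = is_rejected(
    "DELETE FROM CUSTOMER WHERE EmailAddress = 'jacobs@example.com'"
)
print(orphan_rejected, parent_delete_rejected)
```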





2.4 Concluding remarks

This unit explored database design principles as well as various database design models.

In the next unit, we will equip you with practical MySQL knowledge as it pertains to database development.



2.5 Self-assessment

Test your knowledge

1. Give an example of a business rule that would need to be documented in a

database development project.

2. Explain the difference between entity classes and entity instances.

3. List the three types of binary relationship. Draw both a traditional ER diagram as well as an IE Crow’s Foot ER diagram for each.

4. Draw an IE Crow’s Foot ER diagram for the entities DEPARTMENT and

EMPLOYEE as well as the 1:N relationship between them. Assume that a

department does not need to have employees, but that every employee is assigned to a department. Include appropriate identifiers and attributes for

each entity.

5. Summarise the SDLC.

6. Describe the different types of relationship. Use examples to motivate and illustrate your answer.



Unit 3 – Database development

Unit 3 is aligned with the following learning outcome and assessment criteria:

Learning outcome

LO3 Be able to design, create and document databases

Assessment criteria

AC3.1 Apply the database developmental life cycle to a given data set

AC3.2 Design a fully functional database (containing at least four inter-relational tables), including user interface

AC3.3 Evaluate the effectiveness of the database solution and suggest methods of improvement

Learning objectives

After studying this unit, you should be able to:

Create a database
Create tables
Create relationships between tables
Understand basic SQL statements
Use basic SQL statements to add, modify and delete data
Use basic SQL statements to process single and multiple tables
Use SQL queries to extract data

Introduction

In this unit, you are introduced to MySQL. You will learn how to create a database and various database objects. You will also learn how to query a database as well as how to manipulate data using various techniques.

This unit elaborates on the previous units and is meant to provide a solution to a business problem.



3.1 MySQL 5.6 database software

Follow these steps:

Download MySQL for Windows from http://dev.mysql.com/downloads
Install MySQL with an administrator account:
  Install MySQL on Windows using the Microsoft Software Installer Package:
    o Download and start the MySQL Installation Wizard
    o Click Custom Installation > OK
  Install the additional optional MySQL components:
    o ODBC driver
    o Connector/NET driver
  Disable antivirus scanning on the main MySQL data directory (datadir)
  Disable antivirus scanning on the temporary MySQL data directory (tmpdir)

MySQL has both a graphical user interface (GUI) and a command line interface (CMD).

3.2 MySQL GUI

To access the MySQL GUI (Figure 57), click the following:

Start
All Programs
MySQL
MySQL Server 5.6
MySQL Workbench 6.1 CE




Figure 57 – MySQL GUI

Source: Nyasha Magutsa

3.2.1 Creating a database

Follow the steps displayed in Figure 58 to Figure 61 to create a database:

Figure 58 – Creating a database




Figure 59 – Creating a database (continued)








3.2.2 Showing a database

Follow the step displayed in Figure 62 to view all databases on a database server:

Figure 62 – Showing a database




3.2.3 Selecting a database

To use a database for an operation, you must first select it. Follow the step displayed in Figure 63 to select a database:

Figure 63 – Selecting a database


3.2.4 Deleting a database

Follow the step displayed in Figure 64 to delete a database. All of the database’s related objects are removed with it, and the process is irreversible:

Figure 64 – Deleting a database




3.2.5 MySQL data types

MySQL tables contain multiple columns with different data types. The data type of a column is determined by the kind of data that is stored in it. Table 2 to Table 5 summarise the different MySQL data types:

Table 2 – Numeric data types

Numeric type Description

TINYINT A very small integer

SMALLINT A small integer

MEDIUMINT A medium-sized integer

INT A standard integer

BIGINT A large integer

DECIMAL A fixed-point number

FLOAT A single precision floating-point number

DOUBLE A double precision floating-point number

BIT A bit field


Table 3 – String data types

String type Description

CHAR A fixed-length non-binary string

VARCHAR A variable-length non-binary string

BINARY A fixed-length binary string

VARBINARY A variable-length binary string

TINYBLOB A very small Binary Large Object (BLOB)

BLOB A small BLOB

MEDIUMBLOB A medium-sized BLOB

LONGBLOB A large BLOB

TINYTEXT A very small non-binary string

TEXT A small non-binary string

MEDIUMTEXT A medium-sized non-binary string

LONGTEXT A large non-binary string

ENUM An enumeration; each column value may be assigned to one enumeration member only

SET A set; each column value may be assigned to zero or more set members


Table 4 – Date and time data types

Date and time type Description

DATE A date value in ‘CCYY-MM-DD’ format

TIME A time value in ‘hh:mm:ss’ format

DATETIME A date and time value in ‘CCYY-MM-DD hh:mm:ss’ format

TIMESTAMP A timestamp value in ‘CCYY-MM-DD hh:mm:ss’ format

YEAR A year value in ‘CCYY’ or ‘YY’ format




Table 5 – Spatial data types

Spatial type Description

GEOMETRY A spatial value of any type

POINT A point

LINESTRING A curve

POLYGON A polygon

GEOMETRYCOLLECTION A collection of geometry values

MULTILINESTRING A collection of linestring values

MULTIPOINT A collection of point values

MULTIPOLYGON A collection of polygon values


3.2.6 Creating a table

Create a table identical to Table 6 below:

Table 6 – CONTACT

Column name Data type Key Required

E-mailAddress VARCHAR (100) PK/FK Yes

ContactDate DATE PK Yes

ContactNumber INT (11) No Yes

ContactType VARCHAR (15) No Yes


Follow the steps displayed in Figure 65 to Figure 68 to create the above table:

Figure 65 – Creating a table




Figure 66 – Creating a table (continued)








3.2.7 Showing a table

Follow the step displayed in Figure 69 to view all tables in a database:

Figure 69 – Showing a table




3.2.8 Describing a table

Follow the step displayed in Figure 70 to describe a table in a database:

Figure 70 – Describing a table




3.2.9 Entering data into a table: Form Editor

Follow the steps displayed in Figure 71 and Figure 72 to enter data into a table in Form Editor:

Figure 71 – Entering data into a table: Form Editor


Figure 72 – Entering data into a table: Form Editor (continued)




3.2.10 Entering data into a table: Datasheet View

Follow the step displayed in Figure 73 to enter data into a table in Datasheet View:

Figure 73 – Entering data into a table: Datasheet View


3.2.11 Deleting data from a table: Datasheet View

Follow the steps displayed in Figure 74 and Figure 75 to delete data from a table in Datasheet View:

Figure 74 – Deleting data from a table: Datasheet View




Figure 75 – Deleting data from a table: Datasheet View


3.2.12 Altering table structure

Add the following columns (Table 7) to the table that you created (refer to Table 6):

Table 7 – Additional columns to be added to CONTACT

Column name Data type Key Required

ContactID AUTONUMBER PK Yes

CustomerID INT (9) FK Yes


Follow the steps displayed in Figure 76 and Figure 77 to add the columns:

Figure 76 – Adding columns




Figure 77 – Adding columns (continued)


3.2.13 Working with multiple tables

The CustomerID column in CUSTOMER must also exist in CONTACT, where it serves as a foreign key (Figure 78):

Figure 78 – Working with multiple tables




3.2.14 Working with multiple tables: Relationships Window

Follow the steps displayed in Figure 79 and Figure 80 to work with multiple tables in the Relationships Window:

Figure 79 – Working with multiple tables: Relationships Window


Figure 80 – Working with multiple tables: Relationships Window (continued)




3.2.15 Adding a table: Relationships Window

Follow the step displayed in Figure 81 to add a table in the Relationships Window:

Figure 81 – Adding a table: Relationships Window


3.2.16 Creating a relationship: Relationships Window

To create a relationship between two tables, do the following (Figure 82):

In the Relationships Window, click on Relationship Type
Click the Primary Key column in the first table
Click the Relating Column in the second table



Figure 82 – Creating a relationship: Relationships Window


3.2.17 Working with multiple tables: Edit Relationships Window

Follow the step displayed in Figure 83 to work with multiple tables in the Edit Relationships Window:

Figure 83 – Working with multiple tables: Edit Relationships Window




3.2.18 Accessing a table: Form Editor

Follow the step displayed in Figure 84 to access a table in Form Editor:

Figure 84 – Accessing a table: Form Editor


3.3 MySQL CMD

3.3.1 MySQL background

SQL is a standard language used to access and manipulate databases. It is not a complete programming language but rather a data sub-language.

SQL was originally developed by IBM in the 1970s and is an American National Standards Institute (ANSI) standard.

SQL commands can be broken down into two categories, namely:

1. Data Definition Language (DDL): used to define database structures

2. Data Manipulation Language (DML): used to query and modify database data

Oracle, Sybase, Microsoft SQL Server, Microsoft Access and Ingres are some of the most common relational DBMSs that use SQL.

3.3.2 SQL for data definition

DDL is used to create and alter database structures, such as tables. Inserting, modifying and deleting data is the task of DML, not DDL. Before creating tables, you must first create a database. Although there is an SQL statement for creating a database, most developers use graphical tools; these tools are usually DBMS-specific.

SQL data definition statements include:

CREATE: used to create database objects

ALTER: used to modify the structure and/or characteristics of database objects

DROP: used to delete database objects



TRUNCATE: used to delete table data while keeping the structure
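The four statements can be tried in sequence on a throwaway table. The sketch below is a minimal illustration; DEMO_NOTE and its columns are hypothetical names, and it assumes that you are connected to the server and have already selected a scratch database.

```sql
-- CREATE defines a new object. DEMO_NOTE is a hypothetical scratch table.
CREATE TABLE DEMO_NOTE (
  NoteID   INT          NOT NULL,
  NoteText VARCHAR(100) NOT NULL,
  PRIMARY KEY (NoteID)
);

-- ALTER changes the structure: here, a column is added.
ALTER TABLE DEMO_NOTE
  ADD COLUMN CreatedOn DATE NULL;

-- TRUNCATE removes every row but keeps the (altered) structure.
TRUNCATE TABLE DEMO_NOTE;

-- DROP removes the table itself, structure and data together.
DROP TABLE DEMO_NOTE;
```

None of these statements asks for confirmation, so it is worth practising them on scratch objects only.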

To access the CMD, click the following:

Start
All Programs
MySQL
MySQL Server 5.6
MySQL 5.6 Command Line Client

Enter the root password (Figure 85):

Figure 85 – Entering the root password


You can now see the default command prompt (Figure 86):

Figure 86 – Default command prompt




3.3.3 Creating a database

Before doing anything with your data, you need to create a database. A database is a container of data; it stores information of any kind. In SQL, a database is a collection of objects that are used to store and manipulate data.

To create a database in SQL, use the CREATE DATABASE statement:

CREATE DATABASE [IF NOT EXISTS] database_name;

A database name should be meaningful and descriptive. No two databases on a server can have the same name, so CREATE DATABASE fails if the name is already in use. IF NOT EXISTS is an optional clause that prevents this error: if a database with that name already exists in your database catalogue, the statement leaves it untouched (Figure 87):

Figure 87 – Creating a database




3.3.4 Showing a database

To view all databases on a database server, the SHOW DATABASES statement is used (Figure 88):

SHOW DATABASES;

Figure 88 – Showing a database


3.3.5 Selecting a database

To use a database for an operation, you must first select it. The USE statement is employed in this case (Figure 89):

USE database_name;

Figure 89 – Selecting a database


After a database has been selected, all subsequent DDL and DML statements are applied to it.
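Put together, creating, listing and selecting a database might look as follows at the command prompt. The database name hsd is a hypothetical example.

```sql
-- 'hsd' is a hypothetical database name.
CREATE DATABASE IF NOT EXISTS hsd;

-- Lists every database on the server, including the new one.
SHOW DATABASES;

-- Makes hsd the default database for the statements that follow.
USE hsd;

-- Confirms which database is currently selected.
SELECT DATABASE();
```

Running the CREATE DATABASE line a second time does not raise an error, because of the IF NOT EXISTS clause.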



3.3.6 Creating a table

To create a table, the CREATE TABLE statement is used. This creates a structure with defined data types and, optionally, primary and other keys. MySQL also lets you specify the table type, that is, the storage engine. Older MySQL versions used the keyword TYPE for this; MySQL 5.6 requires ENGINE (Figure 90):

CREATE TABLE [IF NOT EXISTS] table_name(
column_list
) ENGINE = table_type;

Figure 90 – Specifying a table type (syntax)


Column constraints to consider are PRIMARY KEY, NOT NULL, NULL and UNIQUE. In addition to these, there is also a CHECK column constraint; note that MySQL 5.6 accepts a CHECK clause but does not enforce it. Finally, the DEFAULT keyword (which is not considered a column constraint) can be used to set initial values.
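A minimal sketch of these constraints in one statement is given below, using the PRODUCT relation from the Heather Sweeney Designs design. The data types, the UNIQUE rule on Description, the default of 0 and the constraint name are assumptions made for illustration.

```sql
-- Sketch only: data types and the UNIQUE/DEFAULT choices are assumptions.
CREATE TABLE IF NOT EXISTS PRODUCT (
  ProductNumber  CHAR(10)      NOT NULL,            -- a value is required
  Description    VARCHAR(100)  NOT NULL,
  UnitPrice      DECIMAL(9, 2) NULL,                -- may be left empty
  QuantityOnHand INT           NOT NULL DEFAULT 0,  -- initial value if none given
  PRIMARY KEY (ProductNumber),
  CONSTRAINT Product_Desc_UQ UNIQUE (Description)   -- no two rows may share it
) ENGINE = InnoDB;

-- QuantityOnHand is not supplied, so the default of 0 is stored.
INSERT INTO PRODUCT (ProductNumber, Description, UnitPrice)
VALUES ('VK001', 'Kitchen remodelling video', 14.95);
```

A second INSERT with the same ProductNumber or the same Description would be rejected.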

3.3.7 Showing and describing a table

To view all tables in a database as well as their table structures, the SHOW and DESCRIBE statements are used (Figure 91 and Figure 92):

SHOW TABLES;

DESCRIBE table_name;

Figure 91 – Showing and describing a table (syntax)


Figure 92 – Showing and describing a table




3.3.8 Creating a MySQL table

Create a table identical to Table 8 below:

Table 8 – CUSTOMER

Column name Type Key Required

CustomerID AUTONUMBER PK Yes

LastName CHAR (25) No Yes

FirstName CHAR (25) No Yes

Address CHAR (35) No No

City CHAR (35) No No

State CHAR (2) No No

Zip CHAR (10) No No


Your table should look as follows (Figure 93):

Figure 93 – Creating a MySQL table


The column constraint NOT NULL indicates that a value must be supplied when a new row is created.

NULL indicates that null values are allowed, which means that a row can be created without a value.

You can create database tables using PRIMARY KEY and FOREIGN KEY constraints.
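Table 8 could be expressed in SQL roughly as follows. AUTONUMBER is not a MySQL data type; the sketch assumes that it corresponds to an INT column with the AUTO_INCREMENT attribute. The constraint name Customer_PK is hypothetical, and the sketch assumes that no CUSTOMER table exists yet in the selected database.

```sql
-- Assumption: AUTONUMBER in Table 8 is implemented as INT AUTO_INCREMENT.
CREATE TABLE CUSTOMER (
  CustomerID INT      NOT NULL AUTO_INCREMENT,
  LastName   CHAR(25) NOT NULL,   -- Required: Yes
  FirstName  CHAR(25) NOT NULL,   -- Required: Yes
  Address    CHAR(35) NULL,       -- Required: No
  City       CHAR(35) NULL,
  State      CHAR(2)  NULL,
  Zip        CHAR(10) NULL,
  CONSTRAINT Customer_PK PRIMARY KEY (CustomerID)
);

-- Shows the resulting column names, types, nullability and keys.
DESCRIBE CUSTOMER;
```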



3.3.8.1 ALTER statement

Add a PK constraint to an existing table as follows (Figure 94 and Figure 95):

ALTER TABLE EMPLOYEE

ADD CONSTRAINT Emp_PK PRIMARY KEY(EmpID);

Figure 94 – Altering a table (syntax)


Figure 95 – Altering a table


Add a composite PK constraint to an existing table as follows:

Drop the existing key (Figure 96 and Figure 97):

ALTER TABLE EMPLOYEE

DROP PRIMARY KEY;

Figure 96 – Altering a table (syntax) (continued)




Figure 97 – Altering a table (continued)


Add the new composite key (Figure 98 and Figure 99):

ALTER TABLE EMPLOYEE
ADD CONSTRAINT EmpName_PK
PRIMARY KEY(EmpID, EmpSurname);







Add an FK constraint to an existing table as follows (Figure 100 to Figure 103):

ALTER TABLE EMPLOYEE
ADD COLUMN ContactID MEDIUMINT(9);

ALTER TABLE EMPLOYEE
ADD CONSTRAINT Emp_FK FOREIGN KEY(ContactID)
REFERENCES CONTACT(ContactID);







3.3.9 Relational data

DML is used to query databases and to modify table data. There are three

possible data modification operations, namely:

1. INSERT

2. UPDATE

3. DELETE

Data can be added to a relation by using the INSERT statement. This statement has two forms, depending on whether data is supplied for all columns. Non-numeric data must be enclosed in straight single quotes (').

If all column data is supplied, the INSERT statement can be used to insert data into the EMPLOYEE table (Figure 104):

INSERT INTO EMPLOYEE VALUES ('700', 'Golden', 'Barker', 'B', '3');

Figure 104 – Inserting data into a table


If the DBMS provides a surrogate key, the PK value does not need to be specified. SQL comments are enclosed in the symbols /* and */; any text between these symbols is ignored when SQL statements are executed.
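The second form of INSERT names the columns for which data is supplied. The sketch below assumes a CUSTOMER table shaped like Table 8 whose CustomerID is an AUTO_INCREMENT surrogate key; the customer details are made-up sample values.

```sql
-- Assumed structure: CUSTOMER as in Table 8, with an AUTO_INCREMENT key.
CREATE TABLE IF NOT EXISTS CUSTOMER (
  CustomerID INT      NOT NULL AUTO_INCREMENT,
  LastName   CHAR(25) NOT NULL,
  FirstName  CHAR(25) NOT NULL,
  Address    CHAR(35) NULL,
  City       CHAR(35) NULL,
  State      CHAR(2)  NULL,
  Zip        CHAR(10) NULL,
  PRIMARY KEY (CustomerID)
);

/* Second form: name the columns being supplied.
   CustomerID is omitted, so MySQL generates it;
   the unlisted nullable columns are set to NULL. */
INSERT INTO CUSTOMER (LastName, FirstName, City)
VALUES ('Barker', 'Golden', 'Durban');

-- Returns the key generated by the most recent INSERT on this connection.
SELECT LAST_INSERT_ID();
```

Naming the columns also protects the statement against later changes in column order.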

3.3.10 Relational queries

After defining and populating a table, DML can be used to query data in many

ways as well as to change and delete it.



3.3.10.1 SELECT/FROM/WHERE framework

SELECT is the best-known SQL statement; it retrieves information that matches specified criteria from databases by using the SELECT/FROM/WHERE framework (Figure 105):

SELECT ColumnName

FROM TableName

WHERE SomeConditionExists;

Figure 105 – SELECT/FROM/WHERE framework (syntax)


The result of a SELECT statement is always a relation. A SELECT statement starts with one or more relations, manipulates them in some way and then produces a relation. Even if the result of the manipulation is a single number, the number is considered to be a relation with one row and one column. The WHERE clause in a SELECT statement specifies which rows are to be retrieved (Figure 106 to Figure 108):

SELECT EmpName

FROM EMPLOYEE

WHERE EmpID = 700;

Figure 106 – SELECT and WHERE statements (syntax)


or

SELECT *

FROM CUSTOMER

WHERE CustomerID = 2;

Figure 107 – SELECT and WHERE statements (syntax) (continued)




Figure 108 – SELECT and WHERE statements


The latter retrieves all rows in CUSTOMER that have the CustomerID 2.

The asterisk (*) wildcard operator, after the keyword SELECT, can be used to select all columns. The syntax is as follows (Figure 109):

SELECT *

FROM CUSTOMER;

Figure 109 – * wildcard operator (syntax)


This statement retrieves every row (all columns) from the table CUSTOMER (Figure 110):

Figure 110 – * wildcard operator




By default, DBMS products do not check for duplication. Thus, in practice, duplicate rows can occur. If you want a DBMS to check for and eliminate duplicate rows, you must use the DISTINCT keyword (Figure 111):

SELECT DISTINCT FirstName

FROM CUSTOMER;

Figure 111 – DISTINCT keyword (syntax)


Many different conditions can be placed in a WHERE clause. Select all columns from CUSTOMER where the value of the CustomerID column is greater than 2 by typing the following (Figure 112):

SELECT *

FROM CUSTOMER

WHERE CustomerID > 2;

Figure 112 – WHERE statement (syntax)


You can also combine two or more conditions in a WHERE clause by using the AND and OR keywords. If the AND keyword is used, only rows meeting all of the conditions will be selected. If the OR keyword is used, rows that meet any of the conditions will be selected. For example, the query below uses the AND keyword to retrieve customers who have the letter ‘i’ in their first name and who also reside in the ‘NW’ state (Figure 113 to Figure 115):

SELECT *
FROM CUSTOMER
WHERE FirstName LIKE '%i%'

AND State = 'NW';

Figure 113 – AND keyword (syntax)




Figure 114 – AND keyword


SELECT EmpName

FROM EMPLOYEE

WHERE EmpID < 9

OR ContactID > 2;

SELECT EmpName

FROM EMPLOYEE

WHERE EmpGrade = 'A'

AND EmpID < 800;

Figure 115 – AND keyword (syntax) (continued)


A WHERE clause may also include the IN keyword to specify that a particular column value must match one of the values in a list (Figure 116):

SELECT EmpName

FROM EMPLOYEE

WHERE EmpGrade IN ('A', 'B');

Figure 116 – IN keyword (syntax)


Any condition may be preceded by the NOT operator, in which case all rows are shown except those matching the specified criteria (Figure 117 and Figure 118):

SELECT EmpName

FROM EMPLOYEE

WHERE ContactID NOT IN (1, 2, 9);

Figure 117 – NOT operator (syntax)




Figure 118 – NOT operator


The BETWEEN keyword allows a user to specify minimum and maximum values in a single condition; both boundary values are included in the range (Figure 119 and Figure 120):

SELECT EmpName

FROM EMPLOYEE

WHERE ContactID BETWEEN 1 AND 2;

Figure 119 – BETWEEN keyword (syntax)


Figure 120 – BETWEEN keyword




3.3.11 Comparison operators

Table 9 lists the most common comparison operators:

Table 9 – Comparison operators

Operator Description

= Equal to

> Greater than

< Less than

>= Greater than or equal to

<= Less than or equal to

<> Not equal to


The LIKE keyword is used to select partial values. It is used in combination with wildcard characters, which represent unknown characters in patterns.

The underscore (_) is used to represent a single, unspecified character. The percent sign (%) is used to represent a series of zero or more unspecified characters.

In the following query, LIKE is used to find values that fit a pattern (Figure 121):

SELECT *

FROM STUDENT

WHERE ModuleName LIKE 'CINT 1_Networking';

Figure 121 – LIKE keyword (syntax)


A % query could be written as follows (Figure 122):

SELECT *

FROM STUDENT

WHERE Phone LIKE '021 360-%';

Figure 122 – % query (syntax)


If you want to query all students who work in departments that end in ‘ing’, you can use % as follows (Figure 123):

SELECT *

FROM STUDENT

WHERE DepartmentName LIKE '%ing';

Figure 123 – % query (syntax) (continued)




3.3.12 Sorting a query result

The order of rows in the result of a SELECT statement is not guaranteed. If this is undesirable, you can use the ORDER BY clause to sort the rows. For example, the following syntax will display the names, telephone numbers and modules of all students, sorted by LastName (Figure 124):

SELECT FirstName, LastName, Phone, ModuleName

FROM STUDENT

ORDER BY LastName;

Figure 124 – ORDER BY statement (syntax)


By default, SQL sorts rows in ascending order. The ASC and DESC keywords can be used to specify ascending and descending order. Thus, to sort students in descending order by LastName, type the following (Figure 125):

SELECT FirstName, LastName, Phone, ModuleName

FROM STUDENT

ORDER BY LastName DESC;

Figure 125 – DESC keyword (syntax)


3.3.13 Built-in functions and calculations

SQL allows you to calculate values based on table data. You can use arithmetic formulae or SQL built-in functions, such as:

COUNT: used to count the number of rows that match specified criteria
MIN: used to find the minimum value in a specific column
MAX: used to find the maximum value in a specific column
SUM: used to calculate the sum of a specific column
AVG: used to calculate the numeric average of a specific column

Examples are illustrated in Figure 126 to Figure 128:

SELECT COUNT(EmpID)

FROM EMPLOYEE;

SELECT COUNT(*)

FROM CUSTOMER;

Figure 126 – COUNT built-in function (syntax)




Figure 127 – COUNT built-in function


SELECT MIN(Hours) AS MinimumHours,
MAX(Hours) AS MaximumHours,
AVG(Hours) AS AverageHours
FROM PROJECT
WHERE ProjID > 7;

Figure 128 – MIN, MAX and AVG built-in functions (syntax)
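The figures above cover COUNT, MIN, MAX and AVG. As a small sketch of the remaining cases, an arithmetic formula and SUM, the statements below use a hypothetical cut-down version of the LINE_ITEM relation with made-up sample rows. Note that MySQL, by default, expects no space between a built-in function name and its opening parenthesis.

```sql
-- Hypothetical cut-down LINE_ITEM table with sample rows.
CREATE TABLE LINE_ITEM_DEMO (
  InvoiceNumber INT           NOT NULL,
  LineNumber    INT           NOT NULL,
  Quantity      INT           NOT NULL,
  UnitPrice     DECIMAL(9, 2) NOT NULL,
  PRIMARY KEY (InvoiceNumber, LineNumber)
);

INSERT INTO LINE_ITEM_DEMO VALUES
  (1, 1, 2, 10.00),
  (1, 2, 1, 25.50),
  (2, 1, 3,  4.00);

-- Arithmetic formula on each row: one result row per line item.
SELECT InvoiceNumber, LineNumber,
       Quantity * UnitPrice AS LineTotal
FROM LINE_ITEM_DEMO;

-- Arithmetic inside a built-in function: a single result row.
SELECT SUM(Quantity * UnitPrice) AS TotalSales
FROM LINE_ITEM_DEMO;
```

With the three sample rows, the line totals are 20.00, 25.50 and 12.00, so TotalSales is 57.50.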


3.3.14 Built-in functions and grouping

In SQL, you can use the GROUP BY clause to group rows by common values; this increases the utility of built-in functions as you can apply them to groups of rows.

Figure 129 serves as an example:

SELECT ModuleName, Count(*) AS NumberOfStudents

FROM STUDENT

GROUP BY ModuleName;

Figure 129 – GROUP BY statement (syntax)


A GROUP BY clause tells a DBMS to sort a table by the named column and then to apply the built-in function to groups of rows that have the same value for the named column. When GROUP BY is used, the name of the grouping column and the built-in function may both appear in the SELECT list. This is the only time that a column name and built-in function can appear together.

You can further restrict results by using the HAVING clause to apply conditions to the groups that are formed. For example, if you want to consider only groups with more than two members, you could specify this as follows (Figure 130):

SELECT ModuleName, Count(*) AS NumberOfStudents

FROM STUDENT

GROUP BY ModuleName

HAVING COUNT(*) > 2;

Figure 130 – HAVING statement (syntax)


3.3.15 Querying multiple tables with joins

Sometimes more than one table must be processed to obtain the desired information. Sub-queries are effective for processing multiple tables as long as the results come from a single table. If, however, you need to display data from two or more tables, sub-queries will not work. You then need to use a JOIN statement. There are five types of JOIN statement, namely:

1. INNER JOIN

2. LEFT OUTER JOIN

3. RIGHT OUTER JOIN

4. FULL OUTER JOIN

5. CROSS JOIN

Note that MySQL does not support FULL OUTER JOIN directly.

An INNER JOIN compares each row in one table with each row in the other; if the join condition is met, the combined row is included in the result set. LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN also include rows that have no match in the other table. This is useful to identify rows that have things in common along with the ones that do not. A CROSS JOIN returns every combination of rows from the two tables.

Examples are illustrated in Figure 131 to Figure 133:

SELECT EmpName

FROM EMPLOYEE AS E, CONTACT AS C

WHERE E.ContactID = C.ContactID

AND E.EmpGrade LIKE 'A';

Figure 131 – JOIN statement (syntax)




Figure 132 – JOIN statement


SELECT EmpName

FROM EMPLOYEE AS E JOIN DEPARTMENT AS D

ON E.DeptID = D.DeptID

WHERE D.DeptName LIKE 'Account%';

SELECT EmpName
FROM EMPLOYEE AS E
LEFT JOIN DEPARTMENT AS D
ON E.DeptID = D.DeptID;

SELECT EmpName
FROM EMPLOYEE AS E
RIGHT JOIN DEPARTMENT AS D
ON E.DeptID = D.DeptID;

Figure 133 – JOIN statements (syntax)
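Two cases mentioned in this section are not shown in the figures: a sub-query, and a FULL OUTER JOIN, which MySQL does not provide as a keyword. The sketch below illustrates both on hypothetical demo tables with made-up rows; combining a LEFT JOIN and a RIGHT JOIN with UNION is one common way of approximating a full outer join, not a built-in feature.

```sql
-- Hypothetical demo tables and sample rows.
CREATE TABLE DEPARTMENT_DEMO (
  DeptID   INT      NOT NULL PRIMARY KEY,
  DeptName CHAR(35) NOT NULL
);
CREATE TABLE EMPLOYEE_DEMO (
  EmpID   INT      NOT NULL PRIMARY KEY,
  EmpName CHAR(25) NOT NULL,
  DeptID  INT      NULL
);
INSERT INTO DEPARTMENT_DEMO VALUES (1, 'Accounting'), (2, 'Marketing');
INSERT INTO EMPLOYEE_DEMO VALUES (700, 'Golden', 1), (701, 'Thandi', NULL);

-- Sub-query: two tables are processed, but every displayed
-- column comes from EMPLOYEE_DEMO alone.
SELECT EmpName
FROM EMPLOYEE_DEMO
WHERE DeptID IN (SELECT DeptID
                 FROM DEPARTMENT_DEMO
                 WHERE DeptName LIKE 'Account%');

-- Approximate FULL OUTER JOIN: unmatched rows from both sides are kept.
-- UNION also removes exact duplicate result rows.
SELECT E.EmpName, D.DeptName
FROM EMPLOYEE_DEMO AS E
LEFT JOIN DEPARTMENT_DEMO AS D ON E.DeptID = D.DeptID
UNION
SELECT E.EmpName, D.DeptName
FROM EMPLOYEE_DEMO AS E
RIGHT JOIN DEPARTMENT_DEMO AS D ON E.DeptID = D.DeptID;
```

With the sample rows, the second query returns Golden with Accounting, Thandi with no department, and Marketing with no employee.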


3.3.16 Relational data modification and deletion

DML contains commands for three possible data modification operations. We have already discussed inserting data; we will now consider modifying and deleting data.

INSERT and UPDATE statements can be combined into one statement. This uses the equivalent of if-then-else logic to decide whether to use INSERT or UPDATE. This is an advanced feature; learn to use INSERT and UPDATE separately first and then consult your DBMS documentation.
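In MySQL, one form of this combined statement is INSERT ... ON DUPLICATE KEY UPDATE. The sketch below is only an illustration on a hypothetical table: if no row with the given key exists, the row is inserted; otherwise, the existing row is updated.

```sql
-- Hypothetical table; ProductNumber is the key that decides
-- between the insert path and the update path.
CREATE TABLE PRODUCT_DEMO (
  ProductNumber  CHAR(10) NOT NULL PRIMARY KEY,
  QuantityOnHand INT      NOT NULL
);

-- No row 'VK001' exists yet, so a row with quantity 10 is inserted.
INSERT INTO PRODUCT_DEMO (ProductNumber, QuantityOnHand)
VALUES ('VK001', 10)
ON DUPLICATE KEY UPDATE
  QuantityOnHand = QuantityOnHand + VALUES(QuantityOnHand);

-- 'VK001' now exists, so the existing row is updated instead:
-- 10 + 5 gives a quantity of 15.
INSERT INTO PRODUCT_DEMO (ProductNumber, QuantityOnHand)
VALUES ('VK001', 5)
ON DUPLICATE KEY UPDATE
  QuantityOnHand = QuantityOnHand + VALUES(QuantityOnHand);

SELECT * FROM PRODUCT_DEMO;
```

Other DBMS products offer different syntax for the same idea, which is why the product documentation should be consulted.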



3.3.16.1 Modifying data

The UPDATE . . . SET command can be used to modify existing data. This is a powerful command that needs to be used with care. It can modify more than one column value at a time.

For example, if the customer with CustomerID 6 changes their LastName to ‘Waver’, the following syntax can be used to update their data (Figure 134 and Figure 135):

UPDATE CUSTOMER

SET LastName = 'Waver'

WHERE CustomerID = 6;

Figure 134 – UPDATE statement (syntax)


Figure 135 – UPDATE statement


This statement changes the value of LastName for the customer whose CustomerID is 6. You can use a SELECT statement to view the result (Figure 136):

SELECT *
FROM CUSTOMER
WHERE CustomerID = 6;

Figure 136 – UPDATE statement result (syntax)
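An UPDATE that changes several column values at once lists them in the SET clause, separated by commas. The sketch below uses a hypothetical CUSTOMER_DEMO table and made-up values; running a SELECT with the same WHERE clause first is a simple way to check which rows will be affected.

```sql
-- Hypothetical table and sample row.
CREATE TABLE CUSTOMER_DEMO (
  CustomerID INT      NOT NULL PRIMARY KEY,
  LastName   CHAR(25) NOT NULL,
  City       CHAR(35) NULL,
  State      CHAR(2)  NULL
);
INSERT INTO CUSTOMER_DEMO VALUES (6, 'Smith', 'Mafikeng', 'NW');

-- Check which rows the WHERE clause matches before changing them.
SELECT * FROM CUSTOMER_DEMO WHERE CustomerID = 6;

-- Three column values are changed in one statement.
UPDATE CUSTOMER_DEMO
SET LastName = 'Waver',
    City     = 'Durban',
    State    = 'KZ'
WHERE CustomerID = 6;
```

If the WHERE clause is left out, every row in the table is updated.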


3.3.16.2 Deleting data

A DELETE statement can be used to eliminate rows. DELETE is deceptively simple to use and easy to apply in unintended ways, so care should be taken. The following, for example, deletes all employees whose EmpID is 1000 (Figure 137 and Figure 138):

DELETE
FROM EMPLOYEE
WHERE EmpID = 1000;

Figure 137 – DELETE statement (syntax)


Figure 138 – DELETE statement


To delete all rows in EMPLOYEE, use the following syntax (Figure 139):

DELETE
FROM EMPLOYEE;

Figure 139 – DELETE statement (syntax) (continued)


3.3.17 Table and constraint modification and deletion

3.3.17.1 DROP TABLE statement

The DROP TABLE statement is also a dangerous SQL statement as it drops a table’s structure along with all of the table’s data. For example, to drop the EMPLOYEE table, use the following syntax:

DROP TABLE EMPLOYEE;

A DROP TABLE statement does not work if the table contains or could contain values needed to fulfil referential integrity constraints. In such a case, an attempt to issue the statement fails and an error message is generated.
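The sketch below illustrates this behaviour on two hypothetical InnoDB tables. The parent table cannot be dropped while the child table still refers to it; dropping the child first, or dropping the foreign key constraint first, resolves the problem.

```sql
-- Hypothetical parent and child tables.
CREATE TABLE PARENT_DEMO (
  ParentID INT NOT NULL PRIMARY KEY
) ENGINE = InnoDB;

CREATE TABLE CHILD_DEMO (
  ChildID  INT NOT NULL PRIMARY KEY,
  ParentID INT NOT NULL,
  CONSTRAINT Child_FK FOREIGN KEY (ParentID)
    REFERENCES PARENT_DEMO (ParentID)
) ENGINE = InnoDB;

-- This would fail with a foreign key constraint error,
-- because CHILD_DEMO still references PARENT_DEMO:
-- DROP TABLE PARENT_DEMO;

-- Alternative: remove only the constraint and keep the child table.
-- ALTER TABLE CHILD_DEMO DROP FOREIGN KEY Child_FK;

-- Dropping the child first, then the parent, succeeds.
DROP TABLE CHILD_DEMO;
DROP TABLE PARENT_DEMO;
```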

3.3.17.2 Deleting a database

To remove a database, the DROP DATABASE statement is used. All of the database’s related objects are deleted with it, and the process is irreversible (Figure 140):

DROP DATABASE [IF EXISTS] database_name;



Figure 140 – DROP statement


3.3.18 SQL views

An SQL view is a virtual table created from a SELECT statement that is stored by the DBMS. A view can combine access to data in multiple tables and even in other views. A view has no data of its own; it uses data stored in tables or in other views.

Views are created by using a SELECT statement and are used just as if they were tables. As a rule, the statement that creates a view should not contain an ORDER BY clause. If the results of a query need to be sorted, the sort order must be provided by the SELECT statement that processes the view.

A CREATE VIEW statement is used to create view structures (Figure 141 to Figure 143):

CREATE VIEW ViewName AS
{SQL SELECT statement};

Figure 141 – CREATE VIEW statement (syntax)


CREATE VIEW EmployeeNames AS

SELECT EmpName, EmpSurname,

EmpGrade AS EmployeeRank

FROM EMPLOYEE;

Figure 142 – CREATE VIEW statement (syntax) (continued)




Figure 143 – CREATE VIEW statement


Once a view has been created, it can be used in the FROM clause of a SELECT statement (Figure 144 and Figure 145):

SELECT *

FROM EmployeeNames

ORDER BY EmpName;

Figure 144 – FROM clause (syntax)


Figure 145 – FROM clause


SQL views can also be used to:

Hide columns or rows
Display computation results
Hide complicated SQL syntax
Layer built-in functions
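A short sketch of three of these uses follows. The table, the view names and the pay columns are hypothetical and serve only to illustrate the idea.

```sql
-- Hypothetical base table.
CREATE TABLE EMPLOYEE_PAY_DEMO (
  EmpID       INT           NOT NULL PRIMARY KEY,
  EmpName     CHAR(25)      NOT NULL,
  HourlyRate  DECIMAL(7, 2) NOT NULL,
  HoursWorked INT           NOT NULL
);
INSERT INTO EMPLOYEE_PAY_DEMO VALUES
  (700, 'Golden', 100.00, 40),
  (701, 'Thandi', 120.00, 35);

-- Hides columns: the pay columns are not visible through this view.
CREATE VIEW EmployeeDirectoryView AS
  SELECT EmpID, EmpName
  FROM EMPLOYEE_PAY_DEMO;

-- Displays a computation result as if it were a stored column.
CREATE VIEW EmployeePayView AS
  SELECT EmpID, EmpName,
         HourlyRate * HoursWorked AS GrossPay
  FROM EMPLOYEE_PAY_DEMO;

-- Layers a built-in function on top of the computed column.
SELECT SUM(GrossPay) AS TotalPay
FROM EmployeePayView;
```

Users who are given access to EmployeeDirectoryView only can look up names without ever seeing pay data.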






This unit provided a practical approach to database development.

In the next unit, we will highlight how databases are managed and documented.

3.5 Self-assessment

Test your knowledge

1. Create a database using the design of Figure 55 in Unit 2. Use the MySQL GUI to create three tables and the CMD to create four tables and their relationships.

2. Explain DDL and DML.



Unit 4 – Database management

Unit 4 is aligned with the following learning outcome and Assessment Criterion:

Learning outcome

LO3 Design, create and document databases

Assessment Criterion

AC3.3 Provide supporting user and technical documentation

Learning objectives

After studying this unit, you should be able to:

Understand the need for and importance of database administration
Understand the need for concurrency, security, backup and recovery
Understand the different ways of processing databases
Understand the use of locking and the problem of deadlock
Be familiar with ACID (Atomic, Consistent, Isolated, Durable) transactions
Know the difference between recovery via reprocessing and rollback/rollforward
Understand rollback/rollforward
Understand Web database processing
Understand the basic concepts of big data and structured storage
Understand the basic concepts of data warehouses and data marts

Introduction

In this unit, we focus on database management; this includes sections on problems that can occur when databases are processed concurrently by more than one user. You will also learn about how databases support data warehouses, modern Business Intelligence (BI) systems and big data.

Activity

Write down what you think database management is about and what kind of business problems can be solved by proper database management.



4.1 Database administration

4.1.1 Responsibilities

The three major concerns of database administration are:

1. Concurrency control

2. Security

3. Reliability

A database administrator needs to ensure that a system exists to gather and record user-reported errors and other problems. A means needs to be devised to prioritise such errors and problems and to ensure that they are corrected accordingly.

A database administrator also needs to create and manage a process to control database configuration. This entails procedures to create projects and tasks, record change requests and conduct user and developer reviews of such requests.

Database administrators are responsible for ensuring that appropriate documentation is maintained for:

Database structures
Concurrency control
Security
Backup and recovery
Applications

4.1.2 Database processing environment

Databases come in a variety of sizes and scopes, from single-user databases to large organisational databases, such as inventory management systems. Databases also vary in the way that they are processed.



Figure 146 illustrates the database processing environment:

Figure 146 – Database processing environment


Three necessary database administration functions are:

1. Concurrency control

2. Security

3. Backup and recovery

We will now investigate some of the issues surrounding these.

4.1.3 Concurrency control

Concurrency control ensures that one user’s actions do not adversely affect another user’s actions. At the core of concurrency is accessibility. At one extreme, data becomes inaccessible to other users once a user touches it; this ensures that data that is being considered for update is not shown. At the other extreme, data is always readable, even when it is locked for update.

Interdependence refers to changes required by one user that may affect another user. Concurrency refers to people or applications that may try to update the same information at the same time. Record retention concerns when information should be discarded.



4.1.3.1 Need for atomic transactions

Database operations typically involve several transactions. These transactions are atomic and are often called ‘Logical Units of Work’ (LUW). Before an operation is committed, all LUWs must be successfully completed. If one or more LUWs are unsuccessful, a rollback is performed and no changes are saved to the database. If two transactions are being processed against a database at the same time, they are termed ‘concurrent transactions’ (Figure 147):

Figure 147 – Concurrent transaction
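In MySQL, the boundaries of an atomic unit of work can be marked with START TRANSACTION, COMMIT and ROLLBACK. The sketch below is a minimal illustration; the ACCOUNT_DEMO table and its amounts are hypothetical, and the InnoDB engine is assumed because it supports transactions.

```sql
-- Hypothetical table; InnoDB is assumed because it supports transactions.
CREATE TABLE ACCOUNT_DEMO (
  AccountID INT           NOT NULL PRIMARY KEY,
  Balance   DECIMAL(9, 2) NOT NULL
) ENGINE = InnoDB;
INSERT INTO ACCOUNT_DEMO VALUES (1, 500.00), (2, 100.00);

-- A transfer consists of two steps that must succeed or fail together.
START TRANSACTION;
UPDATE ACCOUNT_DEMO SET Balance = Balance - 50.00 WHERE AccountID = 1;
UPDATE ACCOUNT_DEMO SET Balance = Balance + 50.00 WHERE AccountID = 2;
-- Both steps succeeded, so the changes are made permanent.
COMMIT;

-- A second unit of work that is abandoned part-way.
START TRANSACTION;
UPDATE ACCOUNT_DEMO SET Balance = Balance - 9999.00 WHERE AccountID = 1;
-- The application decides that this step is not acceptable: undo it.
ROLLBACK;

-- Only the committed transfer is visible: 450.00 and 150.00.
SELECT * FROM ACCOUNT_DEMO;
```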


4.1.3.2 Lost update problem

If two or more users are attempting to update the same piece of data at the same time, it is possible that one update may overwrite the other (Figure 148):



Figure 148 – Lost update problem


Other concurrency issues include the following:

Dirty read: a transaction reads a changed record that has not been committed to a database
Inconsistent read: a transaction rereads a data set and finds that data has changed
Phantom read: a transaction rereads a data set and finds that a new record has been added

4.1.3.3 Resource locking

To avoid concurrency issues, resource locking will not allow transactions to read from, modify or write to data sets that have been locked. Implicit locks are issued automatically by DBMSs, whereas explicit locks are issued by users requesting exclusive rights to data. Figure 149 illustrates concurrent processing with explicit locks:



Figure 149 – Concurrent processing with explicit locks
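One way to request an explicit lock in MySQL is SELECT ... FOR UPDATE inside a transaction on an InnoDB table. The sketch below is an illustration only; ITEM_DEMO and its values are hypothetical.

```sql
-- Hypothetical table; InnoDB is assumed because it supports row locks.
CREATE TABLE ITEM_DEMO (
  ItemID   INT NOT NULL PRIMARY KEY,
  Quantity INT NOT NULL
) ENGINE = InnoDB;
INSERT INTO ITEM_DEMO VALUES (100, 10);

START TRANSACTION;

-- Explicit lock: another transaction that tries to update or lock
-- this row must wait until this transaction ends.
SELECT Quantity
FROM ITEM_DEMO
WHERE ItemID = 100
FOR UPDATE;

-- The row cannot have changed since it was read, so the update is safe.
UPDATE ITEM_DEMO
SET Quantity = Quantity - 5
WHERE ItemID = 100;

-- COMMIT makes the change permanent and releases the lock.
COMMIT;
```

Because the row is locked between the read and the update, the lost update problem described earlier cannot occur for this row.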


4.1.3.4 Serialisable transactions

When two or more transactions are processed concurrently, database results should be logically consistent with the results that would have been achieved had the transactions been processed one after another in some arbitrary serial order. A scheme for processing concurrent transactions in this way is said to be ‘serialisable’.

One way to achieve serialisable transactions is by using two-phase locking. Two-phase locking allows a transaction to obtain locks as they are needed but, once the transaction has released a lock, it may not obtain any further locks. The growing phase is when a transaction continues to request additional locks whereas the shrinking phase is when the transaction begins to release locks.
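The two phases can be sketched as a small class that refuses any lock request made after the first release. This is a simplified illustration: it tracks the locks of a single transaction only and does not model conflicts between transactions. The class and resource names are hypothetical.

```python
class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True after the first release

    def lock(self, resource):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {resource} after releasing a lock")
        self.held.add(resource)

    def unlock(self, resource):
        self.shrinking = True
        self.held.discard(resource)

t1 = TwoPhaseTransaction("T1")
t1.lock("CUSTOMER")      # growing phase
t1.lock("SALE")          # growing phase
t1.unlock("CUSTOMER")    # shrinking phase begins

try:
    t1.lock("ITEM")      # not allowed once the shrinking phase has begun
    violation_detected = False
except RuntimeError:
    violation_detected = True
```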

4.1.3.5 Deadlock

As a transaction begins to lock resources, it may have to wait for a particular resource to be released by another transaction. On occasion, two transactions may wait indefinitely for each other to release a resource; this condition is known as ‘deadlock’ or a ‘deadly embrace’ (Figure 150):



Figure 150 – Deadlock
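One common way to recognise a deadlock is to record which transaction is waiting for which and to look for a cycle. The sketch below assumes, for simplicity, that each transaction waits for at most one other transaction; the function and transaction names are illustrative.

```python
def find_deadlock(waits_for):
    """Return the transactions that form a waiting cycle, or None.

    `waits_for` maps each waiting transaction to the transaction
    that holds the resource it needs.
    """
    for start in waits_for:
        path = []
        current = start
        # Follow the chain of waits until it ends or repeats.
        while current in waits_for and current not in path:
            path.append(current)
            current = waits_for[current]
        if current in path:
            return path[path.index(current):]
    return None

deadlocked = find_deadlock({"T1": "T2", "T2": "T1"})      # T1 and T2 wait on each other
not_deadlocked = find_deadlock({"T1": "T2", "T2": "T3"})  # T3 is not waiting
```

When a cycle is found, a DBMS typically breaks the deadlock by rolling back one of the transactions involved.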


4.1.3.6 Optimistic vs pessimistic locking

Optimistic locking (Figure 151) entails:

Reading data

Processing transactions

Issuing updates

Looking for conflict

Figure 151 – Optimistic locking


If no conflict occurs, the transaction is committed; otherwise, a rollback occurs and the transaction is repeated.
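A common way to look for conflict is to store a version number with each row and to make the update conditional on that number. The sketch below uses Python's sqlite3 module; the item table, the version column and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)")
conn.execute("INSERT INTO item VALUES (100, 10, 1)")
conn.commit()

def read_item(item_id):
    return conn.execute(
        "SELECT quantity, version FROM item WHERE id = ?", (item_id,)).fetchone()

def try_update(item_id, new_quantity, version_read):
    """Issue the update, then check for conflict via the number of rows changed."""
    cur = conn.execute(
        "UPDATE item SET quantity = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_quantity, item_id, version_read))
    if cur.rowcount == 1:      # row was unchanged since it was read
        conn.commit()
        return True
    conn.rollback()            # someone else changed the row first
    return False

qty_a, ver_a = read_item(100)               # user A reads
qty_b, ver_b = read_item(100)               # user B reads the same version
a_ok = try_update(100, qty_a - 5, ver_a)    # no conflict
b_ok = try_update(100, qty_b - 3, ver_b)    # conflict: the version has moved on

# User B repeats the transaction from the start with fresh data.
qty_b, ver_b = read_item(100)
b_retry_ok = try_update(100, qty_b - 3, ver_b)
final_quantity = read_item(100)[0]
```

No locks are held while the users work; the conflict is only detected when the second update is issued.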



Pessimistic locking (Figure 152) entails:

Locking required resources

Reading data

Processing transactions

Issuing commits

Releasing locks

Figure 152 – Pessimistic locking


4.1.3.7 Consistent transactions

Consistent transactions are often referred to by the acronym ACID (Atomic, Consistent, Isolated, Durable).

Atomic

An atomic transaction is one in which all database actions occur or none occur. A transaction consists of a series of steps; each step must be successful for the transaction to be saved. This ensures that transactions complete everything as intended before saving changes.

Consistent

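A consistent transaction is one in which no other transaction is permitted to change the records it uses until it has finished. Consistency may be applied at statement level, where the records are protected while a single SQL statement runs, or at transaction level, where they are protected for the whole transaction.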



Isolated

Within multiuser environments, different transactions may be operating on the same data. In such a case, the interleaving of uncommitted updates, commits and rollbacks can change the data content that each transaction sees. The 1992 ANSI SQL standard (Figure 153) defines four isolation levels (read uncommitted, read committed, repeatable read and serialisable) that specify which concurrency control problems are allowed to occur:

Figure 153 – Isolation levels
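The relationship between the four levels and the read problems listed earlier can be written down as a simple lookup. The sketch below follows the ANSI SQL-92 definitions and uses the standard term 'non-repeatable read' for what this guide calls an inconsistent read; the helper function is illustrative only.

```python
# Which read problems each ANSI SQL-92 isolation level still allows,
# ordered from least to most restrictive.
ISOLATION_LEVELS = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED":   {"non-repeatable read", "phantom read"},
    "REPEATABLE READ":  {"phantom read"},
    "SERIALIZABLE":     set(),
}

def weakest_level_preventing(problems):
    """Return the least restrictive level that allows none of `problems`."""
    for level, allowed in ISOLATION_LEVELS.items():
        if not (allowed & set(problems)):
            return level

level_for_dirty = weakest_level_preventing({"dirty read"})
level_for_phantom = weakest_level_preventing({"phantom read"})
```

The stricter the level, the fewer problems can occur, but the more locking is needed and the lower the throughput tends to be.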


Durable

A durable transaction is one in which all committed changes are permanent.

4.1.3.8 Cursors

A cursor is a pointer into the set of rows that results from a SQL SELECT statement. Cursors are usually defined using SELECT statements (Figure 154):

DECLARE TransCursor CURSOR FOR
SELECT *
FROM SALE_TRANSACTION
WHERE PurchasePrice > 10000;

Figure 154 – Cursor (syntax)
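From an application program, a cursor is typically used to step through the result rows one at a time. The sketch below uses Python's sqlite3 module, whose cursors are forward only; the TransactionID column and the sample prices are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE_TRANSACTION "
    "(TransactionID INTEGER PRIMARY KEY, PurchasePrice REAL)")
conn.executemany(
    "INSERT INTO SALE_TRANSACTION VALUES (?, ?)",
    [(1, 8000.0), (2, 12500.0), (3, 47000.0)])

# The cursor points into the result set of the SELECT statement.
trans_cursor = conn.execute(
    "SELECT TransactionID, PurchasePrice FROM SALE_TRANSACTION "
    "WHERE PurchasePrice > 10000 ORDER BY TransactionID")

first_row = trans_cursor.fetchone()    # advance to the first qualifying row
remaining = trans_cursor.fetchall()    # read forward through the rest
```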


There are forward-only and scrollable cursor types. In SQL, a forward-only or scrollable cursor may be one of three types, namely:

Static cursors

Keyset cursors

Dynamic cursors

Other DBMS products may define a different set of cursors. In some classifications, the forward-only cursor is considered a separate cursor type whereas a scrollable cursor may be static, keyset or dynamic (Figure 155):

Figure 155 – Cursor types


4.1.4 Security

Database security strives to ensure that only authenticated users perform authorised activities (Figure 156):

Figure 156 – Database security authentication and authorisation


4.1.4.1 Processing rights and responsibilities

Processing rights define who is permitted to do what, and when. Individuals who perform activities assume full responsibility for the implications of their actions. Each individual is identified by a username and password.



A DBMS security model (Figure 157) may look like this:

Figure 157 – DBMS security model


Processing rights at Heather Sweeney Designs (Figure 158) may look like this:

Figure 158 – Processing rights


4.1.4.2 Granting permissions

Database users are known both as individuals and as members of one or more roles. Access and processing rights may be granted to individuals and/or roles. A user possesses the combination of the rights granted to him or her individually and the rights granted to all roles of which he or she is a member.
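The way a user's rights are compiled can be sketched as a set union. The user, role and table names below are hypothetical and are not the Heather Sweeney Designs rights shown in Figure 158.

```python
# Hypothetical grants, used only for illustration.
ROLE_RIGHTS = {
    "sales": {("CUSTOMER", "read"), ("INVOICE", "insert")},
    "management": {("CUSTOMER", "read"), ("CUSTOMER", "update")},
}
USER_RIGHTS = {"thandi": {("SEMINAR", "read")}}
USER_ROLES = {"thandi": ["sales", "management"]}

def effective_rights(user):
    """Union of the rights granted to the user directly and via each role."""
    rights = set(USER_RIGHTS.get(user, set()))
    for role in USER_ROLES.get(user, []):
        rights |= ROLE_RIGHTS[role]
    return rights

def is_permitted(user, table, action):
    return (table, action) in effective_rights(user)

can_update_customer = is_permitted("thandi", "CUSTOMER", "update")
can_delete_invoice = is_permitted("thandi", "INVOICE", "delete")
```

Granting rights to roles rather than to individuals usually makes administration simpler, because a change to a role applies to all of its members at once.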



4.1.4.3 Database security guidelines

The following database security guidelines are useful:

Run a DBMS behind a firewall

Apply the latest operating system and DBMS service packs and patches

Limit DBMS functionality to needed features only

Protect computers that run a DBMS

Manage accounts and passwords

4.1.5 Backup and recovery

Common causes of database failures are:

Hardware failures

Programming bugs

Human errors/mistakes

Malicious actions

The above are impossible to avoid completely; recovery procedures are thus essential.

4.1.5.1 Recovery via reprocessing

In reprocessing, all activities performed since the last backup are redone; this is a brute force technique. The procedure is costly in terms of effort as data needs to be re-entered. It is also risky in that human error is likely and paper record keeping may not be accurate.

4.1.5.2 Recovery via rollback and rollforward

Most DBMSs provide a mechanism to record activity in a log file.

To undo a transaction, the log file must contain a copy of every database record before it was changed; such records are called ‘before-images’. A transaction is undone by applying before-images to a database.

To redo a transaction, the log file must contain a copy of every database record after it was changed; such records are called ‘after-images’. A transaction is redone by applying after-images to a database.

A log file is used for recovery via rollback or rollforward.



Rollback

A log file saves activities in sequence. Activities can therefore be undone in reverse order, returning the affected data to its original state. This is done to undo erroneous or malicious transactions after a database has been recovered from a full backup (Figure 159):

Figure 159 – Rollback


Rollforward

Activities recorded in a log file may be replayed. In doing so, all activities are reapplied to a database. This procedure is used to resynchronise restored database data by adding transactions to the last full backup (Figure 160):

Figure 160 – Rollforward
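Rollback and rollforward can be sketched with a log that stores a before-image and an after-image for every change. The log layout, keys and values below are simplified assumptions and do not represent the format used by any particular DBMS.

```python
# Each log entry records the key changed, its before-image and its after-image.
log = [
    {"key": "A", "before": 100, "after": 70},
    {"key": "B", "before": 50, "after": 80},
    {"key": "A", "before": 70, "after": 60},
]

def rollforward(backup, entries):
    """Reapply after-images, in log order, to a copy of the backup."""
    db = dict(backup)
    for entry in entries:
        db[entry["key"]] = entry["after"]
    return db

def rollback(current, entries):
    """Apply before-images, in reverse log order, to a copy of the database."""
    db = dict(current)
    for entry in reversed(entries):
        db[entry["key"]] = entry["before"]
    return db

backup = {"A": 100, "B": 50}
recovered = rollforward(backup, log)         # backup plus all logged changes
undone_last = rollback(recovered, log[2:])   # undo only the last change
undone_all = rollback(recovered, log)        # undo every logged change
```

Applying before-images in reverse order matters: the same record may have been changed more than once, and only the oldest before-image holds its original value.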




An example of a transaction log (Figure 161) may look like this:

Figure 161 – Transaction log


An example of a recovery (Figure 162 and Figure 163) may look like this:

Figure 162 – Problem processing




Figure 163 – Recovery processing




4.2 Database applications processing

4.2.1 Database processing environment II

As stated earlier, databases vary considerably in size, scope and the way in which they are processed. Some databases only have a few forms and reports whereas others are processed via applications using Internet technology, such as Active Server Pages (ASP.NET) or JavaServer Pages (JSP). Databases can also be processed via application programs or stored procedures and triggers.

A database processing environment is clearly complex. It has multiple:

Users

Queries

Forms

Reports

Application programs



4.2.1.1 Processing constraints

Database processing constraints include:

Enforcing referential integrity

Cascading modifications

Cascading deletions

Data type constraints

Data size constraints

Data value constraints

Null constraints

Uniqueness constraints
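Several of the constraints listed above can be declared when tables are created, after which the DBMS enforces them. The sketch below uses SQLite through Python's sqlite3 module; the customer and invoice tables are hypothetical, and other DBMS products use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE                -- null and uniqueness constraints
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id)    -- referential integrity
                ON DELETE CASCADE,                  -- cascading deletion
    total       REAL CHECK (total >= 0)             -- data value constraint
);
INSERT INTO customer VALUES (1, 'a@example.com');
INSERT INTO invoice VALUES (10, 1, 250.0);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = rejected("INSERT INTO invoice VALUES (11, 99, 10.0)")
negative_rejected = rejected("INSERT INTO invoice VALUES (12, 1, -5.0)")
duplicate_rejected = rejected("INSERT INTO customer VALUES (2, 'a@example.com')")

with conn:
    conn.execute("DELETE FROM customer WHERE customer_id = 1")
invoices_left = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

Deleting the customer also removes the customer's invoice, because the deletion cascades through the foreign key.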

4.2.1.2 Triggers and stored procedures

Enterprise-class DBMS products include features that enable developers to create modules of logic and database actions (called ‘triggers’ and ‘stored procedures’). Triggers and stored procedures are written in a language provided by a DBMS.

A trigger is a stored procedure that is automatically invoked by a DBMS when a specified activity occurs. A trigger may be set to fire before, after or instead of the activity that invokes it.

A stored procedure is a module similar to a sub-routine or function that performs database actions.
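An 'after' trigger can be sketched in SQLite through Python's sqlite3 module. The item and sale tables and the trigger name are hypothetical. SQLite does not provide stored procedures, so only the trigger is illustrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, item_id INTEGER, quantity INTEGER);

-- AFTER trigger: the DBMS runs this whenever a row is inserted into sale.
CREATE TRIGGER reduce_stock AFTER INSERT ON sale
BEGIN
    UPDATE item SET quantity = quantity - NEW.quantity
    WHERE item_id = NEW.item_id;
END;

INSERT INTO item VALUES (100, 10);
""")

with conn:
    conn.execute("INSERT INTO sale VALUES (1, 100, 4)")   # fires reduce_stock

stock_after_sale = conn.execute(
    "SELECT quantity FROM item WHERE item_id = 100").fetchone()[0]
```

The application only inserts the sale; the stock level is adjusted by the DBMS itself, so the rule is applied no matter which program records the sale.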

4.3 Internet applications processing

Internet application processing is more complicated than traditional application processing. With Internet application processing, the network becomes an integral part of the application (Figure 164):

Figure 164 – Internet application processing




4.3.1 Application programming interfaces (APIs)

Every DBMS product has an API. An API is a collection of objects, methods and properties used to execute DBMS functions. The API varies from one DBMS product to another. To simplify this situation, the computer industry has developed standards for database access.

4.3.1.1 ODBC and Object Linking and Embedding Database (OLE DB)

The ODBC standard was developed in the early 1990s; it provides a DBMS-independent means of processing relational database data.

OLE DB was created by Microsoft in the mid-1990s; it is an object-oriented interface that encapsulates data-server functionality. It was designed not only for relational database access but also for many other types of data access. OLE DB is readily accessible to programming languages such as C# and Java; it is, however, not as accessible to Visual Basic or other scripting languages.

4.3.1.2 ActiveX Data Objects (ADO) and ADO.NET

Microsoft developed ADO as a set of objects for using OLE DB. ADO can be used from any language, including Visual Basic, VBScript and JScript.

ADO has been followed by ADO.NET, which is an improved version; it was developed as part of the Microsoft .NET initiative. The role of ADO.NET in a Web database processing environment is illustrated in Figure 165:

Figure 165 – ADO.NET


4.3.2 N-tier architecture

Tiers refer to the number of computers involved in a Web database application:

The workstation with a Web browser is the first tier

In a two-tier architecture, the second tier is a single computer that runs both the Web server and the DBMS

In a three-tier architecture, the Web server is the second tier and the DBMS runs on a separate computer, which is the third tier

http://en.wikipedia.org/wiki/ODBC

http://en.wikipedia.org/wiki/OLE_DB

http://en.wikipedia.org/wiki/ActiveX_Data_Objects

http://en.wikipedia.org/wiki/ADO.NET



Figure 166 illustrates an ODBC three-tier Web server architecture:

Figure 166 – ODBC three-tier Web server architecture




4.4 Distributed database processing

A database is distributed when it is:

Partitioned

Replicated

Both partitioned and replicated

Distributed database processing is fairly straightforward in the case of read-only replicas but can be very difficult in other configurations.



There are different types of distributed database, as shown in Figure 167:

Figure 167 – Types of distributed database


4.4.1 Object-relational database management

Object-Oriented Programming (OOP) is based on objects; it is used as a basis for many computer programming languages, for example, Java, Visual Basic and C#.

4.4.1.1 Object classes

Object classes have identifiers, properties (data items associated with objects) and methods (programs that allow objects to perform tasks). The only difference between entity classes and object classes is that object classes have methods.

4.4.1.2 Object persistence

Object persistence means that the values of object properties are storable and retrievable. Object persistence can be achieved via various techniques; database technology is one of the main techniques, and relational databases can be used for this purpose.
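Object classes and object persistence can be sketched together: a class with an identifier, properties and a method, whose property values are stored in and retrieved from a relational table. The Customer class, its fields and the save and load helpers are hypothetical names chosen for this illustration.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int          # identifier
    name: str                 # property
    balance: float            # property

    def apply_payment(self, amount):   # method: what an entity class lacks
        self.balance -= amount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

def save(obj):
    """Store the object's property values as one row."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO customer VALUES (?, ?, ?)",
            (obj.customer_id, obj.name, obj.balance))

def load(customer_id):
    """Rebuild an object from its stored property values."""
    row = conn.execute(
        "SELECT customer_id, name, balance FROM customer WHERE customer_id = ?",
        (customer_id,)).fetchone()
    return Customer(*row) if row else None

original = Customer(1, "Heather", 120.0)
original.apply_payment(20.0)
save(original)
restored = load(1)
```

Only the property values are stored; the method belongs to the class definition in the program and is not kept in the table.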



4.4.1.3 Object-Oriented DBMSs (OODBMSs)

OODBMSs have never achieved commercial success because it is too expensive to transfer existing data from relational or legacy databases. OODBMSs are, therefore, not cost justifiable.

4.4.1.4 Object-relational DBMSs

Some relational DBMS vendors, for example Oracle, have added object-oriented features to their products. These products are known as ‘object-relational DBMSs’; they support object-relational databases.




This unit explored basic database management aspects. Emphasis was placed on how to resolve highlighted issues.

4.6 Self-assessment

Test your knowledge

1. What is the purpose of database administration?

2. What is the purpose of concurrency control?

3. What is the goal of database security systems?

4. Explain two-phase locking.

5. Explain serialisable transactions.

6. Describe, in your own words, the nature of traditional database processing applications.

7. Name three types of trigger.



8. What are stored procedures and how are they used?

9. Name the three major components of Web database applications.

10. What are BI systems?

11. Name and describe the two main categories of BI systems.

12. What are the three sources of data in BI systems?



Glossary

1:1
An abbreviation for a one-to-one relationship between the rows of two tables

1:N
An abbreviation for a one-to-many relationship between the rows of two tables

ActiveX Data Objects (ADO)
An implementation of OLE DB that is accessible via object-oriented and non-object-oriented languages; it is used primarily as a scripting language interface

Application Programming Interface (API)
A set of objects, methods and properties used to access the functionality of a program

Association entity
An entity that represents the combination of at least two objects

Attribute
A value that represents a characteristic of an entity; also, a column of a relation

Big data
An enormous data set created by Web applications

Boyce-Codd Normal Form (BCNF)
A relation in third normal form, in which every determinant is a candidate key

Business Intelligence (BI) system
An information system used to assist managers and other professionals to analyse current and past activities as well as to predict future events

Candidate key
An attribute or group of attributes that identifies a unique row in a relation; one candidate key is chosen to be the primary key

Cardinality
In a binary relationship, the maximum or minimum number of elements allowed on each side of the relationship: maximum cardinality can be 1:1, 1:N or N:M; minimum cardinality can be optional/optional, optional/mandatory, mandatory/optional or mandatory/mandatory

Composite key
A key in a relation that consists of two or more columns

Data administration
An enterprise-wide function concerned with the effective use and control of organisational data assets

Data Definition Language (DDL)
A language used to describe the structure of a database

Data Manipulation Language (DML)
A language used to describe the processing of a database

Data mart
A facility similar to a data warehouse but with a restricted domain; data is often restricted to a particular type, business function or business unit

Data model
A model detailing user data requirements, usually expressed in terms of the Entity-Relationship Model; also, a language used to describe the structure and processing of a database

Data warehouse
A store of enterprise data that is designed to facilitate management decision making; this includes not only data, but also metadata, tools, procedures, training, personnel information and other resources

Database
A self-describing collection of related records, or, in the case of a relational database, related tables

Database administration
A function concerned with the effective use and control of a particular database and its related applications

Database administrator
A person or group responsible for establishing policies and procedures to control and protect a database

Database backup
A copy of database files that can be used to restore a database to some previous consistent state

Database design
A graphic display of tables (files) and their relationships; tables are shown in rectangles and relationships are shown with lines



Database Management System (DBMS)
A set of programs used to define, administer and process a database and its applications

Database schema
A complete logical view of a database

Deadlock
A condition that can occur during concurrent processing, in which two (or more) transactions are each waiting to access data that is locked by the other

Denormalisation
A process of intentionally designing a relation that is not normalised; this is performed to improve performance or security

Distributed database
A database that is stored and processed on two or more computers

Enterprise-class database system
A database management system capable of supporting the operating requirements of a large organisation

Entity
Something of importance to a user that needs to be represented in a database

Entity class
A set of entities of the same type

Entity instance
A particular occurrence of an entity

Entity-relationship (ER) Model
A model detailing user data, which is represented by entities and relationships

First normal form (1NF)
A table that fits the definition of a relation

Foreign key (FK)
An attribute that is a key in one or more relations other than the one in which it appears

Form
A structured onscreen presentation of selected data from a database; this is used to input and read data

Functional dependency
A relationship between attributes, in which one attribute or group of attributes determines the value of the other

Identifier
In an entity, a group of one or more attributes that determines an entity instance

Identity (ID)-dependent entity
An entity that cannot logically exist without the existence of another entity

Information Engineering (IE) Crow’s Foot Model
A system of symbols used to construct Entity-Relationship diagrams in data modelling and database design

Insertion anomaly
In a relation, a condition that exists when adding a complete row to a table requires facts about two or more logically different themes to be added

N:M
An abbreviation for a many-to-many relationship between the rows of two tables

Normal Form (NF)
A rule or set of rules governing the allowed structure of a relation; rules apply to attributes, functional dependencies, domains and constraints

Normalisation
A process wherein a relation is evaluated to determine whether it is in a specified normal form; if necessary, the relation is converted to that normal form

Primary key (PK)
A candidate key selected to be the key of a relation; this uniquely identifies the records (rows) in a table (relation)

Record
A group of fields pertaining to the same entity; in the Relational Model, a synonym for row and tuple

Referential integrity constraint
A relationship constraint imposed on foreign key values

Relation
A two-dimensional array that contains single value entries and no duplicate rows; the meaning of columns is the same in every row whereas the order of rows and columns is immaterial

Relational database
A database that consists of relations; in practice, it may contain relations with duplicate rows



Relational Model
A data model in which data is stored in relations and relationships between rows are represented by data values

Relational schema
A set of relations with referential integrity constraints

Relationship
An association between two entities, objects or rows of relations

Report
A formatted set of information created to meet user needs

Rollback
A process that involves recovering a database, in which before-images are applied to said database to return it to an earlier checkpoint or other point

Rollforward
A process that involves recovering a database, in which after-images are applied to a saved copy of said database to bring it forward to a checkpoint or other point

Row
A group of columns in a table; all columns in a row pertain to the same entity

Second normal form (2NF)
A relation in first normal form, in which all non-key attributes are dependent on the whole of the key

Serialisable isolation level
A transaction isolation level that does not allow dirty, non-repeatable or phantom reads

Star schema
In a dimensional database, a structure in which a central fact table is linked to dimension tables

Stored procedure
A collection of structured query language statements stored as a file that can be invoked via a single command

Strong entity
In the Entity-Relationship Model, an entity whose existence in a database does not depend on the existence of any other entity

Structured Query Language (SQL)
A language used to define the structure and processing of a relational database; it can be used as a standalone query language or can be embedded in application programs

Surrogate key
A unique, system-supplied identifier used as the primary key of a relation

Table
A database structure of rows and columns that create cells that hold data values; it is known as a relation in a relational database

Third normal form (3NF)
A relation in second normal form that has no transitive dependencies

Three-tier architecture
A Web database processing architecture, in which a database management system and Web server reside on separate computers

Trigger
A special type of stored procedure that is invoked by a database management system when a specified condition occurs

Two-tier architecture
A Web database processing architecture, in which a database management system and Web server reside on the same computer

Unique identifier
An identifier that determines exactly one entity instance

Unique key
A key that identifies a unique row

Weak entity
In the Entity-Relationship Model, an entity whose existence in a database depends on the existence of another entity



Bibliography

Kroenke, D.M. & Auer, D.J. 2013. Database concepts. 6th edition. New Jersey: Prentice Hall.


CTI is part of Pearson, the world’s leading learning company. Pearson is the corporate owner, not a registered

provider nor conferrer of qualifications in South Africa. CTI Education Group (Pty) Ltd. is registered with the Department of Higher Education and Training as a private higher education institution under the

Higher Education Act, 101, of 1997. Registration Certificate number: 2004/HE07/004. www.cti.ac.za.
