Chapter 2. The Relational Model - University of Cape Town · Chapter 2. The Relational Model Table...

37
Chapter 2. The Relational Model Table of contents • Objectives • Introduction • Context • Structure of the Relational model Theoretical foundations Uniform representation of data Relation Attribute Domain Tuple Degree Cardinality Primary key Foreign keys Integrity constraints ∗ Nulls ∗ Entity integrity ∗ Referential integrity ∗ General constraints • Data manipulation: The Relational Algebra Restrict Project Union Intersection Difference Cartesian product Division Join Activities ∗ Activity 1: Relational Algebra I ∗ Activity 2: Relational Algebra II ∗ Activity 3: Relational Algebra III • Review questions • Discussion topics • Additional content and activities Objectives At the end of this chapter you should be able to: • Describe the structure of the Relational model, and explain why it provides 1

Transcript of Chapter 2. The Relational Model - University of Cape Town · Chapter 2. The Relational Model Table...

Chapter 2. The Relational Model

Table of contents

• Objectives• Introduction• Context• Structure of the Relational model

– Theoretical foundations– Uniform representation of data– Relation– Attribute– Domain– Tuple– Degree– Cardinality– Primary key– Foreign keys– Integrity constraints

∗ Nulls∗ Entity integrity∗ Referential integrity∗ General constraints

• Data manipulation: The Relational Algebra– Restrict– Project– Union– Intersection– Difference– Cartesian product– Division– Join– Activities

∗ Activity 1: Relational Algebra I∗ Activity 2: Relational Algebra II∗ Activity 3: Relational Algebra III

• Review questions• Discussion topics• Additional content and activities

Objectives

At the end of this chapter you should be able to:

• Describe the structure of the Relational model, and explain why it provides

1

a simple but well-founded approach to the storage and manipulation ofdata.

• Explain basic concepts of the Relational model, such as primary and for-eign keys, domains, null values, and entity and referential integrity.

• Be able to discuss in terms of business applications, the value of the aboveconcepts in helping to preserve the integrity of data across a range ofapplications running on a corporate database system.

• Explain the operators used in Relational Algebra.

• Use Relational Algebra to express queries on Relational databases.

Introduction

In parallel with this chapter, you should read Chapter 3 and Chapter 4 ofThomas Connolly and Carolyn Begg, “Database Systems A Practical Approachto Design, Implementation, and Management”, (5th edn.).

The aim of this chapter is to explain in detail the ideas underlying the Relationalmodel of database systems. This model, developed through the ’70s and ’80s,has grown to be by far the most commonly used approach for the storing andmanipulation of data. Currently all of the major suppliers of database systems,such as Oracle, IBM with DB2, Sybase, Informix, etc, base their products onthe Relational model. Two of the key reasons for this are as follows.

Firstly, there is a widely understood set of concepts concerning what constitutesa Relational database system. Though some of the details of how these ideasshould be implemented continue to vary between different database systems,there is sufficient consensus concerning what a Relational database should pro-vide that a significant skill base has developed in the design and implementationof Relational systems. This means that organisations employing Relational tech-nology are able to draw on this skill-base, as well as on the considerable literatureand consultancy know-how available in Relational systems development.

The second reason for the widespread adoption of the Relational model is ro-bustness. The core technology of most major Relational products has been triedand tested over the last 12 or so years. It is a major commitment for an organi-sation to entrust the integrity, availability and security of its data to software.The fact that Relational systems have proved themselves to be reliable and se-cure over a significant period of time reduces the risk an organisation faces incommitting what is often its most valuable asset, its data, to a specific softwareenvironment.

Relational Algebra is a procedural language which is a part of the Relationalmodel. It was originally developed by Dr E. F. Codd as a means of accessing datain Relational databases. It is independent of any specific Relational databaseproduct or vendor, and is therefore useful as an unbiased measure of the power

2

of Relational languages. We shall see in a later chapter the further value ofRelational Algebra, in helping gain an understanding of how transactions areprocessed internally within the database system.

Context

The Relational model underpins most of the major database systems in com-mercial use today. As such, an understanding of the ideas described in thischapter is fundamental to these systems. Most of the remaining chapters ofthe module place a strong emphasis on the Relational approach, and even inthose that examine research issues that use a different approach, such as thechapter on Object databases, an understanding of the Relational approach isrequired in order to draw comparisons and comprehend what is different aboutthe new approach described. The material on Relational Algebra provides avendor-independent and standard approach to the manipulation of Relationaldata. This information will have particular value when we move on to learn theStructured Query Language (SQL), and also assist the understanding of howdatabase systems can alter the ways queries were originally specified to reducetheir execution time, a topic covered partially in the chapter called DatabaseAdministration and Tuning.

Structure of the Relational model

Theoretical foundations

Much of the theory underpinning the Relational model of data is derived frommathematical set theory. The seminal work on the theory of Relational databasesystems was developed by Dr E. F. Codd, (Codd 1971). The theoretical devel-opment of the model has continued to this day, but many of the core principleswere described in the papers of Codd and Date in the ’70s and ’80s.

The application of set theory to database systems provides a robust foundationto both the structural and data-manipulation aspects of the Relational model.The relations, or tables, of Relational databases, are based on the concepts ofmathematical sets. Sets in mathematics contain members, and these correspondto the rows in a relational table. The members of a set are unique, i.e. duplicatesare not allowed. Also the members of a set are not considered to have anyorder; therefore, in Relational theory, the rows of a relation or table cannot beassumed to be stored in any specific order (note that some database systemsallow this restriction to be overridden at the physical level, as in some situations,for example to improve the performance response of the database, it can bedesirable to ensure the ordering of records in physical storage).

3

Uniform representation of data

We saw in the previous chapter, that the Relational model uses one simple datastructure, the table, to represent information. The rows in tables correspondto specific instances of records, for example, a row of a customer table containsinformation about a particular customer. Columns in a table contain informa-tion about a particular aspect of a record, for example, a column in a customerrecord might contain a customer’s contact telephone number.

Much of the power and robustness of the Relational approach derives from theuse of simple tabular structures to represent data. To illustrate this, considerthe information that might typically be contained in part of the data dictionaryof a database. In a data dictionary, we will store the details of logical tablestructures, the physical allocations of disk space to tables, security information,etc. This data dictionary information will be stored in tables, in just the sameway that, for example, customer and other information relevant to the end usersof the system will be stored.

This consistency of data representation means that the same approach for dataquerying and manipulation can be applied throughout the system.

Relation

A relation is a table with columns and tuples. A database can contain as manytables as the designer wants. Each table is an implementation of a real-worldentity. For example, the university keeps information about students. A studentis represented as an entity during database design stage. When the design isimplemented, a student is represented as a table. You will learn about databasedesign is later modules.

Attribute

An attribute is a named column in the table. A table can contain as manyattributes as the designer wants. Entities identified during database designmay contain attributes/characteristics that describe the entity. For example, astudent has a student identification number and a name. Student identificationand name will be implemented as columns in the student table.

Domain

The domain is the allowable values for a column in the table. For example,a name of a student can be made of a maximum of 30 lower and upper casecharacters. Any combination of lower and upper case characters less or equalto 30 is the domain for the name column.

4

Tuple

A tuple is the row of the table. Each tuple represents an instance of an entity.For example, a student table can contain a row holding information about Moses.Moses is an instance of the student entity.

Degree

The degree of a relation/table is the number of columns it contains.

Cardinality

The cardinality of a relation/table is the number of rows it contains.

Primary key

In the previous chapter, we described the use of primary keys to identify eachof the rows of a table. The essential point to bear in mind when choosing aprimary key is that it must be guaranteed to be unique for each different row ofthe table, and so the question you should always ask yourself is whether thereis any possibility that there could be duplicate values of the primary key underconsideration. If there is no natural candidate from the data items in the tablethat may be used as the primary key, there is usually the option of using asystem-generated primary key. This will usually take the form of an ascendingsequence of numbers, a new number being allocated to each new instance of arecord as it is created. System-generated primary keys, such as this, are knownas surrogate keys. A drawback to the use of surrogate keys is that the uniquenumber generated by the system has no other meaning within the application,other than serving as a unique identifier for a row in a table. Whenever the op-tion exists therefore, it is better to choose a primary key from the available dataitems in the rows of a table, rather than opting for an automatically generatedsurrogate key.

Primary keys may consist of a single column, or a combination of columns. Anexample of a single table column would be the use of a unique employee numberin a table containing information about employees.

As an example of using two columns to form a primary key, imagine a table inwhich we wish to store details of project tasks. Typical data items we mightstore in the columns of such a task table might be: the name of the task, datethe task was started, expected completion date, actual completion date, andthe employee number of the person responsible for ensuring the completion ofthe task. There is a convenient, shorthand representation for the description ofa table as given above: we write the name of the table, followed by the name of

5

the columns of the table in brackets, each column being separated by a comma.For example:

TASK (TASK_NAME, START_DATE, EXPECTED_COMP_DATE,COMP_DATE, EMPNO)

We shall use this convention for expressing the details of the columns of a tablein examples in this and later chapters. Choosing a primary key for the task tableis straightforward while we can assume that task names are unique. If that isthe case, then we may simply use task name as the primary key. However, if wedecide to store in the task table, the details of tasks for a number of differentprojects, it is less likely that we can still be sure that task names will be unique.For example, supposing we are storing the details of two projects, the first tobuy a new database system, and the second to move a business to new premises.For each of these projects, we might have a task called ‘Evaluate alternatives’. Ifwe wish to store the details of both of these tasks in the same task table, we cannow no longer use TASK_NAME as a unique primary key, as it is duplicatedacross these two tasks.

As a solution to this problem, we can combine the TASK_NAME column withsomething further to add the additional context required to provide a uniqueidentifier for each task. In this case, the most sensible choice is the project name.So we will use the combination of the PROJECT_NAME and TASK_NAMEdata items in our task table in order to identify uniquely each of the tasks inthe table. The task table becomes:

TASK (PROJECT_NAME, TASK_NAME, START_DATE, EXPECTED_COMP_DATE,COMP_DATE, EMPNO)

We may, on occasions, choose to employ more than two columns as a primarykey in a table, though where possible this should be avoided as it is both un-wieldy to describe, and leads to relatively complicated expressions when it comesto querying or updating data in the database. Notice also that we might haveused a system-generated surrogate key as the solution to the problem of pro-viding a primary key for tasks, but the combination of PROJECT_NAME andTASK_NAME is a much more meaningful key to users of the application, andis therefore to be preferred.

Foreign keys

Very often we wish to relate information stored in different tables. For example,we may wish to link together the tasks stored in the task table described above,with the details of the projects to which those tasks are related. The simplicityby which this is achieved within the Relational model, is one of the model’smajor strengths. Suppose the Task and Project tables contain the followingattributes:

6

TASKS (TASK_NAME, START_DATE, EXP_COMP_DATE, COMP_DATE,EMPNO)

PROJECT (PROJECT_NAME, START_DATE, EXP_COMP_DATE,COMP_DATE, PROJECT_LEADER)

We assume that TASK_NAME is an appropriate primary key for the TASKtable, and PROJECT_NAME is an appropriate primary key for the PROJECTtable.

In order to relate a record of a task in a task table to a record of a correspondingproject in a project table, we use a concept called a foreign key. A foreign key issimply a piece of data that allows us to link two tables together. In the case ofthe projects and tasks example, we will assume that each project is associatedwith a number of tasks. To form the link between the two tables, we place theprimary key of the PROJECT table into the TASK table. The task table thenbecomes:

TASK (TASK_NAME, START_DATE, EXP_COMP_DATE, COMP_DATE,EMPNO, PROJECT_NAME)

Through the use of PROJECT_NAME as a foreign key, we are now able to see,for any given task, the project to which it belongs. Specifically, the tasks associ-ated with a particular project can now be identified simply by virtue of the factthat they contain that project’s name as a data item as one of their attributes.Thus, all tasks associated with a research project called GENOME RESEARCH,will contain the value of GENOME RESEARCH in their PROJECT_NAMEattribute.

The beauty of this approach is that we are forming the link using a data item.We are still able to maintain the tabular structure of the data in the database,but can relate that data in whatever ways we choose. Prior to the developmentof Relational systems, and still in many non-Relational systems today, ratherthan using data items in this way to form the link between different entity typeswithin a database, special link items are used, which have to be created, alteredand removed. These activities are in addition to the natural insertions, updatesand deletions of the data itself. By using a uniform representation of data bothfor the data values themselves, and the links between different entity types, weachieve uniformity of expression of queries and updates on the data.

The need to link entity types in this way is a requirement of all, other than themost trivial of, database applications. That it occurs so commonly, allied withthe simplicity of the mechanism for achieving it in Relational systems, has beena major factor in the widespread adoption of Relational databases.

Below is the summary of the concepts we have covered so far:

7

Integrity constraints

Integrity constraints are restrictions that affect all the instances of the database.

Nulls

There is a standard means of representing information that is not currentlyknown or unavailable within Relational database systems. We say that a columnfor which the value is not currently known, or for which a value is not applicable,is null. Null values have attracted a large amount of research within the databasecommunity, and indeed for the developers and users of database systems theycan be an important consideration in the design and use of database applications(as we shall see in the chapters on the SQL language).

An important point to grasp about null values is that they are a very specific wayof representing the fact that the data item in question literally is not currentlyset to any value at all. Prior to the use of null values, and still in some systemstoday, if it is desired to represent the fact that a data item is not currently setto some value, an alternative value such as 0, or a blank space, will be givento that data item. This is poor practice, as of course 0, or a blank space, areperfectly legitimate values in their own right. Use of null values overcomes thisproblem, in that null is a value whose meaning is simply that there is no valuecurrently allocated to the data item in question.

There are a number of situations in which the use of null values is appropriate.In general we use it to indicate that a data item currently has no value allocatedto it. Examples of when this might happen are:

• When the value of the data item is not yet known.

8

• When the value for that data item is yet to be entered into the system.

• When it is not appropriate that this particular instance of the data itemis given a value.

An example of this last situation might be where we are recording the detailsof employees in a table, including their salary and commission. We would storethe salaries of employees in one table column, and the details of commission inanother. Supposing that only certain employees, for example sales staff, are paidcommission. This would mean that all employees who are not sales staff wouldhave the value of their commission column set to null, indicating that they arenot paid commission. The use of null in this situation enables us to representthe fact that some commissions are not set to any specific value, because it isnot appropriate to pay commission to these staff.

Another result of this characteristic of null values is that where two data itemsboth contain null, if you compare them with one another in a query language,the system will not find them equal. Again, the logic behind this is that thefact that each data item is null does not mean they are equal, it simply meansthat they contain no value at all.

Entity integrity

As briefly discussed in Chapter 1, in Relational databases, we usually use eachtable to store the details of particular entity types in a system. Therefore, wemay have a table for Customers, Orders, etc. We have also seen the importanceof primary keys in enabling us to distinguish between different instances ofentities that are stored in the different rows of a table.

Consider for the moment the possibility of having null values in primary keys.What would be the consequences for the system?

Null values denote the fact that the data item is not currently set to any realvalue. Imagine, however, that two rows in a table are the same, apart from thefact that part of their primary keys are set to null. An attempt to test whetherthese two entity instances are the same will find them not equal, but is thisreally the case? What is really going on here is that the two entity instancesare the same, other than the fact that a part of their primary keys are as yetunknown. Therefore, the occurrence of nulls in primary keys would stop usbeing able to compare entity instances. For this reason, the column or columnsused to form a primary key are not allowed to contain null values. This rule isknown as the Entity Integrity Rule, and is a part of the Relational theory thatunderpins the Relational model of data. The rule does not ensure that primarykeys will be unique, but by not allowing null values to be included in primarykeys, it does avoid a major source of confusion and failure of primary keys.

Referential integrity

9

If a foreign key is present in a given table, it must either match some candidatevalue in the home table or be set to null. For example, in our project and taskexample above, every value of PROJECT_NAME in the task table must existas a value in the PROJECT_NAME column of the project table, or else it mustbe set to null.

General constraints

These are additional rules specified by the users or database administrators of adatabase, which define or constrain some aspect of the enterprise. For example,a database administrator can contain the PROJECT_NAME column to havea maximum of 30 characters for each value inserted.

Data manipulation: The Relational Algebra

Restrict

Restrict (also known as ‘select’) is used on a single relation, producing a newrelation by excluding (restricting) tuples in the original relation from the newrelation if they do not satisfy a condition; thus only the tuples required areselected.

This operation has the effect of choosing certain tuples (rows) from the table,as illustrated in the diagram below.

The use of the term ‘select’ here is quite specific as an operation in the RelationalAlgebra. Note that in the database query language SQL, all queries are phrasedusing the term ‘select’. The Relational Algebra ‘select’ means ‘extract tupleswhich meet specific criteria’. The SQL ‘select’ is a command that means ‘producea table from existing relations using Relational Algebra operations’.

10

Using the Relational Algebra, extract from the relation Singers those individualswho are over 40 years of age, and create a new relation called Mature-Singers.

Relational Algebra operation: Restrict from Singers where age > 40 givingMature-Singers

We can see which tuples are chosen from the relation Singers, and these areidentified below:

The new relation, Mature-Singers, contains only those tuples for singers agedover 40 extracted from the relation Singers. These were shown highlighted inthe relation above.

Note that Anne Freeman (age 40) is not shown. The query explicitly statedthose over 40, and therefore anyone aged exactly 40 is excluded.

If we wanted to include singers aged 40 and above, we could use either of thefollowing operations which would have the same effect.

Either:

Relational Algebra operation: Restrict from Singers where age >= 40 givingMature-Singers2

11

or

Relational Algebra operation: Restrict from Singers where age > 39 givingMature-Singers2

The result of either of these operations would be as shown below:

Project

Project is used on a single relation, and produces a new relation that includesonly those attributes requested from the original relation. This operation hasthe effect of choosing columns from the table, with duplicate entries removed.

The task here is to extract the names of the singers, without their ages or anyother information as shown in the diagram below. Note that if there were twosingers with the same name, they would be distinguished from one another inthe relation Singers by having different singer-id numbers. If only the names areextracted, the information that there are singers with the same name will notbe preserved in the new relation, as only one copy of the name would appear.

12

The column for the attribute singer-names is shown highlighted in the relationabove.

The following Relational Algebra operation creates a new relation with singer-name as the only attribute.

Relational Algebra operation: Project Singer-name over Singers giving Singer-names

13

Union

Union (also called ‘append’) forms a new relation of all tuples from two or moreexisting relations, with any duplicate tuples deleted. The participating relationsmust be union compatible.

It can be seen that the relations Singers and Actors have the same number ofattributes, and these attributes are of the same data types (the identificationnumbers are numeric, the names are character fields, the address field is alphanu-meric, and the ages are integer values). This means that the two relations areunion compatible. Note that it is not important whether the relations have thesame number of tuples.

The two relations Singers and Actors will be combined in order to produce anew relation Performers, which will include details of all singers and all actors.An individual who is a singer as well as an actor need only appear once in thenew relation. The Relational Algebra operation union (or append) will be used

14

in order to generate this new relation.

The union operation will unite these two union-compatible relations to form anew relation containing details of all singers and actors.

Relational Algebra operation: Union Singers and Actors giving Performers

We could also express this activity in the following way:

Relational Algebra operation: Union Actors and Singers giving Performers

These two operations would generate the same result; the order in which theparticipating relations are given is unimportant. When an operation has thisproperty it is known as commutative - other examples of this include additionand multiplication in arithmetic. Note that this does not apply to all RelationalAlgebra operations.

The new relation Performers contains one tuple for every tuple that was in therelation Singers or the relation Actors; if a tuple appeared in both Singers and

15

Actors, it will appear only once in the new relation Performers. This is whythere is only one tuple in the relation Performers for each of the individualswho are both actors and singers (Helen Drummond and Desmond Venables), asthey appear in both the original relations.

Intersection

Intersection creates a new relation containing tuples that are common to boththe existing relations. The participating relations must be union compatible.

16

As before, it can be seen that the relations Singers and Actors are union compat-ible as they have the same number of attributes, and corresponding attributesare of the same data type.

We can see that there are some tuples that are common to both relations, asillustrated in the diagram below. It is these tuples that will form the newrelation as a result of the intersection operation.

17

The Relational Algebra operation intersection will extract the tuples commonto both relations, and use these tuples to create a new relation.

Relational Algebra operation: Actors Intersection Singers giving Actor-Singers

The intersection of Actors and Singers produces a new relation containing onlythose tuples that occur in both of the original relations. If there are no tuplesthat are common to both relations, the result of the intersection will be anempty relation (i.e. there will be no tuples in the new relation).

Note that an empty relation is not the same as an error; it simply means thatthere are no tuples in the relation. An error would occur if the two relationswere found not to be union compatible.

Difference

Difference (sometimes referred to as ‘remove’) forms a new relation by excludingtuples from one relation that occur in another. The resulting relation containstuples that were present in the first relation only, but not those that occur inboth the first and the second relations, or those that occur in the second relationalone. This operation can be regarded as ‘subtracting’ tuples in one relationfrom another relation. The participating relations must be union compatible.

18

If we want to find out which actors are not also singers, the following RelationalAlgebra operation will achieve this:

Relational Algebra operation: Actors difference Singers giving Only-Actors

The result of this operation is to remove from the relation Actors those tuplesthat also occur in the relation Singers. In effect, we are removing the tuples inthe intersection of Actors and Singers in order to create a new relation that con-tains only actors. The diagram below shows which tuples are in both relations.

19

The new relation Only-Actors contains tuples from the relation Actors only ifthey were not also present in the relation Singers.

It can be seen that this operation produces a new relation Only-Actors contain-ing all tuples in the relation Actors except those that also occur in the relationSingers.

20

If it is necessary to find out which singers are not also actors, this can be doneby the relational operation ‘remove Actors from Singers’, or ‘Singers differenceActors’. This operation would not produce the same result as ‘Actors differenceSingers’ because the relational operation difference is not commutative (theparticipating relations cannot be expressed in reverse order and achieve thesame result).

Relational Algebra operation: Singers difference Actors giving Only-Singers

The effect of this operation is to remove from the relation Singers those tuplesthat are also present in the relation Actors. The intersection of the two relationsis removed from the relation Singers to create a new relation containing tuples ofthose who are only singers. The intersection of the relations Singers and Actorsis, of course, the same as before.

21

This operation produces a new relation Only-Singers containing all tuples in therelation Singers except those that also occur in the relation Actors. The tuplesthat are removed from one relation when the difference between two relationsis generated are those that are in the intersection of the two relations.

Cartesian product

If a relation called Relation-A has a certain number of tuples (call this numberN), this can be represented as NA (meaning the number of tuples in Relation-A).Similarly, Relation-B may have a different number of tuples (call this numberM), which can be shown as MB (meaning the number of tuples in Relation-B).

The resulting relation from the operation ‘Relation-A Cartesian productRelation-B’ forms a new relation containing NA * MB tuples (meaning thenumber of tuples in Relation-A times the number of tuples in Relation-B).Each tuple in the new relation created as a result of this operation will consistof each tuple from Relation-A paired with each tuple from Relation-B, whichincludes all possible combinations.

22

In order to create a new relation which pairs each singer with each role, we needto use the relational operation Cartesian product.

Relational Algebra operation: Singers Cartesian product Roles giving Singers-Roles

23

The result of Singers Cartesian product Roles gives a new relation Singers-Roles,showing each tuple from one relation with each tuple of the other, producing thetuples in the new relation. Each singer is associated with all roles; this producesa relation with 16 tuples, as there were 8 tuples in the relation Singers, and 2tuples in the relation Roles.

What do you think would be the result of the following?

Relational Algebra operation: Singers Cartesian product Singers giving Singers-Extra

Relational Algebra operation: Actors Cartesian product Roles giving Actors-Roles

Division

The Relational Algebra operation ‘Relation-A divide by Relation-B givingRelation-C’ requires that the attributes of Relation-B must be a subset of thoseof Relation-A. The relations do not need to be union compatible, but they musthave some attributes in common. The attributes of the result, Relation-C, willalso be a subset of those of Relation-A. Division is the inverse of Cartesianproduct; it is sometimes easier to think of the operation as similar to divisionin simple algebra and arithmetic.

In arithmetic:

24

If, for example:

Value-A = 2

Value-B = 3

Value-C = 6

then: Value-C is the result of Value-A times Value-B (i.e. 6 = 2 * 3) and: Value-C divided by Value-A gives Value-B as a result(i.e. 6/2 = 3) and: Value-Cdivided by Value-B gives Value-A as a result (i.e. 6/3 = 2)

In Relational Algebra:

If:

Relation-A = Singers

Relation-B = Roles

Relation-C = Singers-Roles

then: Relation-C is the result of Relation-A Cartesian product Relation-B and:Relation-C divided by Relation-A gives Relation-B as a result

and: Relation-C divided by Relation-B gives Relation-A as a result.

If we start with two relations, Singers and Roles, we can create a new relationSingers-Roles by performing the Cartesian product of Singers and Roles. Thisnew relation shows every role in turn with every singer.

25

The new relation Singers-Roles has a special relationship with the relationsSingers and Roles from which it was created, as will be demonstrated below.

We can see that the attributes of the relation Singers are a subset of the at-tributes of the relation Singers-Roles. Similarly, the attributes of the relationRoles are also a subset (although a different subset) of the relation Singers-Roles.

If we now divide the relation Singers-Roles by the relation Roles, the resultingrelation will be the same as the relation Singers.

Relational Algebra operation: Singers-Roles divide by Roles giving Our-Singers

26

Similarly, if we divide the relation Singers-Roles by the relation Singers, therelation that results from this will be the same as the original Roles relation.

Relational Algebra operation: Singers-Roles divide by Singers giving Our-Roles

Note that there are only two tuples in this relation, although the attributesrole-id and role-name appeared eight times in the relation Singer-Roles, as eachrole was associated with each singer in turn.

In the case where not all tuples in one relation have corresponding tuples in the‘dividing’ relation, the resulting relation will only contain those tuples whichare represented in both the ‘dividing’ and ‘divided’ relations. In such a case itwould not be possible to recreate the ‘divided’ relation from a Cartesian productof the ‘dividing’ and resulting relations. The next example demonstrates this.

Consider the relation Recordings shown below, which holds details of the songsrecorded by each of the singers.

27

Three individuals, Chris, Mel and Sam, have each created two new relationslisting their favourite songs and their favourite singers. The use of the divisionoperation will enable Chris, Mel and Sam to find out which singers have recordedtheir favourite songs, and also which songs their favourite singers have recorded.

The table above contains Chris’s favourite songs. In order to find out whichsingers have made a recording of these songs, we need to divide this relationinto the Recordings relation. The result of this Relational Algebra operation isa new relation containing the details of singers who have recorded all of Chris’sfavourite songs. Singers who have recorded some, but not all, of Chris’s favouritesongs are not included.

Relational Algebra operation: Recordings divide by Chris-Favourite-Songs giv-

28

ing Chris-Singers-Songs

Note that the singers in this relation are not the same as those in the relationChris-Favourite-Singers. The reason for this is that Chris’s favourite singershave not all recorded Chris’s favourite songs.

The next relation shows Chris’s favourite singers. Chris wants to know whichsongs these singers have recorded. If we divide the Recordings relation by thisrelation, we will get a new relation that contains the songs that all these singershave recorded; songs that have been recorded by some, but not all, of the singerswill not be included in the new relation.

The singers in this relation are not the same as those who sing Chris’s favouritesongs. The reason for this is that not all of Chris’s favourite singers haverecorded these songs.

In order to discover the songs recorded by Chris’s favourite singer, we need todivide the relation Recordings by Chris-Favourite-Songs.

Relational Algebra operation: Recordings divide by Chris-Favourite-Songs giv-ing Chris-Songs-by-Singers

The new relation created by this operation will provide us with the informationabout which of Chris’s favourite singers has recorded all of Chris’s chosen songs.Any singer who has recorded some, but not all, of Chris’s favourite songs willbe excluded from this new relation.

29

We can see that the relation created to identify the songs recorded by Chris’sfavourite singers is not the same as Chris’s list of favourite songs, because thesesingers have not all recorded the songs listed as Chris’s favourites.

In this example, we have been able to generate two new relations by dividinginto the Recordings relation. These two new relations do not correspond withthe other two relations that were divided into the relation Recordings, becausethere is no direct match. This means that we could not recreate the Record-ings relation by performing a Cartesian product operation on the two relationscontaining Chris’s favourite songs and singers.

In the next example, we will identify the songs recorded by Mel’s favouritesingers, and which singers have recorded Mel’s favourite songs. In commonwith Chris’s choice, we will find that the singers and the songs do not matchas not all singers have recorded all songs. If all singers had recorded all songs,the relation Recordings would be the result of Singers Cartesian product Songs,but this is not the case.

Mel’s favourite songs include all those that have been recorded, but not allsingers have recorded all songs. Mel wants to find out who has recorded thesesongs, but the result will only include those singers who have recorded all thesongs.

There are no other songs that have been recorded from the list available; Melhas indicated that all of these are favourites.

The relational operation below will produce a new relation, Mel-Singers-Songs,

30

which will contain details of those singers who have recorded all of Mel’sfavourite songs.

Relational Algebra operation: Recordings divide by Mel-Favourite-Songs givingMel-Singers-Songs

We can see from this relation, and the one below, that none of Mel’s favouritesingers has recorded all of the songs selected as Mel’s favourites.

We can use the relation Mel-Favourite-Singers to find out which songs have beenrecorded by both these performers.

Relational Algebra operation: Recordings divide by Mel-Favourite-Singers giv-ing Mel-Songs-by-Singers

There is only one song that has been recorded by the singers Mel has chosen, asshown in the relation below.

We can now turn our attention to Sam’s selection of songs and singers. Ithappens that Sam’s favourite song is the same one that Mel’s favourite singershave recorded.

31

If we now perform the reverse query to find out who has recorded this song, wefind that there are more singers who have recorded this song than the two whowere Mel’s favourites. The reason for this difference is that Mel’s query wasto find out which song had been recorded by particular singers. This contrastswith Sam’s search for any singer who has recorded this song.

Relational Algebra operation: Recordings divide by Sam-Favourite-Songs givingSam-Singer-Songs

Here we can see that Mel’s favourite singers include the performers who haverecorded Sam’s favourite song, but there are many other singers who have alsomade a recording of the same song. Indeed, there are only two singers who havenot recorded this song (Desmond Venables and Swee Hor Tan). This could beconsidered unfortunate for Sam, as these are the only two singers named asSam’s favourites.

32

We know that the only two singers who have not recorded Sam’s favourite songare in fact Sam’s favourite singers. It is now our task to discover which songsthese two singers have recorded.

Relation Algebra operation: Recordings divide by Sam-Favourite-Singers givingSam-Songs-by-Singers

This operation creates a new relation that reveals the identity of the song thathas been recorded by all of Sam’s favourite singers.

There is only one song that these two singers have recorded. Indeed, Swee HorTan has only recorded this song, and therefore whatever other songs had beenrecorded by Desmond Venables, this song is the only one that fulfils the criteriaof being recorded by both these performers.

Join

Join forms a new relation with all tuples from two relations that meet a condition.The relations might happen to be union compatible, but they do not have tobe.

The following two relations have a conceptual link, as the stationery orders havebeen made by some of the singers. Invoices can now be generated for each singerwho placed an order. (Note that we would not wish to use Cartesian producthere, as not all singers have placed an order, and not all orders are the same.)

The relation Orders identifies the stationery items (from the Stationery relation)that have been requested, and shows which customer ordered each item (here

33

the customer-id matches the singer-id).

The Singers relation contains the names and addresses of all singers (who arethe customers), allowing invoices to be prepared by matching the customers whohave placed orders with individuals in the Singers relation.

Relational Algebra operation: Join Singers to Orders where Singers Singer-id =Orders Customer-id giving Invoices

The relation Singers is joined to the relation Orders where the attribute Singer-id in Singers has the same value as the attribute Customer-id in Orders, to forma new relation Invoices.

34

The attribute which links together the two relations (the identification number)occurs in both original relations, and thus is found twice in the resulting newrelation; the additional occurrence can be removed by means of a project oper-ation. A version of the join operation, in which such a project is assumed tooccur automatically, is known as a natural join.

The types of join operation that we have used so far, and that are in fact by farmost commonly in use, are called equi-joins. This is because the two attributesto be compared in the process of evaluating the join operation are compared forequality with one another. It is possible, however, to have variations on the joinoperation using operators other than equality. Therefore it is possible to havea GREATER THAN (>) JOIN, or a LESS THAN (<) JOIN.

It would have been possible to create the relation Invoices by producing theCartesian product of Singers and Orders, and then selecting only those tupleswhere the Singer-id attribute from Singers and the Customer-id attribute fromOrders has the same value. The join operation enables two relations whichare not union compatible to be linked together to form a new relation withoutgenerating a Cartesian product, and then extracting only those tuples whichare required.

Activities

Activity 1: Relational Algebra I

Let X be the set of student tuples for students studying databases, and Y theset of students who started university in 1995. Using this information, whatwould be the result of:

1. X union Y

2. X intersect Y

3. X difference Y

35

Activity 2: Relational Algebra II

Describe, using examples, the characteristics of an equi-join and a natural join.

Activity 3: Relational Algebra III

Consider the following relation A with attributes X and Y,

and a relation B with only one attribute (attribute Y). Assume that attribute Yof relation A and the attribute of relation B are defined on a common domain.What would be the result of A divided by B if:

1. B = Attribute Y C1

2. B = Attribute Y C2 C3

Review questions

1. Briefly describe the theoretical foundations of Relational database sys-tems.

2. Describe what is meant if a data item contains the value ‘null’.

3. Why is it necessary sometimes to have a primary key that consists of morethan one attribute?

36

4. What happens if you test two attributes, each of which contains the valuenull, to find out if they are equal?

5. What is the Entity Integrity Rule?

6. How are tables linked together in the Relational model?

7. What is Relational Algebra?

8. Explain the concept of union compatibility.

9. Describe the operation of the Relational Algebra operators RESTRICT,PROJECT, JOIN and DIVIDE.

Discussion topics

1. Now that you have been introduced to the structure of the Relationalmodel, and having seen important mechanisms such as primary keys, do-mains, foreign keys and the use of null values, discuss what you feel atthis point to be the strengths and weaknesses of the model. Bear in mindthat, although the Relational Algebra is a part of the Relational model,it is not generally the language used for manipulating data in commer-cial database products. That language is SQL, which will be covered insubsequent chapters.

2. Consider the operations of Relational Algebra. Why do you think Rela-tional Algebra is not used as a general approach to querying and manip-ulating data in Relational databases? Given that it is not used as such,what value can you see in the availability of a language for manipulatingdata which is not specific to any one developer of database systems?

Additional content and activities

As we have seen, the Relational Algebra is a useful, vendor-independent, stan-dard mechanism for discussing the manipulation of data. We have seen, however,that the Relational Algebra is rather procedural and manipulates data one stepat a time. Another vendor-independent means of manipulating data has beendeveloped, known as Relational Calculus. For students interested in investigat-ing the language aspect of the Relational model further, it would be valuable tocompare what we have seen so far of the Relational Algebra, with the approachused in Relational Calculus. Indeed, it possible to map expressions betweenthe Algebra and the Calculus, and it has also been shown that it is possible toconvert any expression in one language to an equivalent expression in the other.In this sense, the Algebra and Calculus are formally equivalent.

37