CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

Post on 30-Dec-2015

78 views 8 download

Tags:

description

CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS. Introduction. - We discuss here two mathematical formalisms which can be used as the basis for stating and obeying queries in some specific user-oriented language.  SQL is based on relational algebra. - PowerPoint PPT Presentation

Transcript of CHAPTER 4 RELATIONAL ALGEBRA AND CALCULUS

1

CHAPTER 4

RELATIONAL ALGEBRA AND CALCULUS

2

Introduction- We discuss here two mathematical formalisms which

can be used as the basis for stating and obeying queries in some specific user-oriented language.

SQL is based on relational algebra.

- QBE (not included in the new edition of the text) and Datalog (see chapter 24) are based on relational calculus.

- The mathematical formalisms introduce various mathematical operators whose application yields answers to the queries.

3

PRELIMINARIES- The inputs and outputs of a query are relational instances.- We shall present the mathematical operators as operating on

relations (sets of lists). However, the cost of eliminating duplicate rows has also led to the practical usage of “multisets” (also called bags) rather than sets. Therefore we shall also present the necessary mathematical operations as operating on relations which consist of multisets of rows.

- We shall always illustrate the mathematical operations through examples. To that effect we shall use a sample database defined by the schema (shown on p. 101) and the instances shown in figures 4.1, 4.2, 4.3. In some cases we shall use larger instances under the same schema.

-It will be noted that queries can always be expressed by several different expressions. We shall often look at these various approaches.

-We shall give unique identifying numbers to our illustrative queries.

4

SAMPLE SCHEMAS AND INSTANCES

The Schemas:Sailors(sid: integer, sname: string, rating: integer, age: real)Boats(bid: integer, bname: string, color: string)Reserves(sid: integer, bid: integer, day: date)

The Instances:

5

SEMANTICS OF THE SAMPLE RELATIONS

• Sailors: Entity set; lists the relevant properties of sailors.

• BoatsBoats: Entity set; lists the relevant properties of boats.

• Reserves: Relationship set: links sailors and boats by describing the boat number and date for which a sailor made a reservation.

Example of the declarative sentences for which rows stand:

Row 1: “Sailor ’22’ reserved boat number ‘101’ on 10/10/98”.

N.B. The declarative sentence is obvious in this case, but this may not always be the case, especially for relations which express complex relationship sets with complicated constraints. For example we might add the constraint that a sailor can only reserve one boat on any given date; or, worse yet that a sailor can only reserve up to two boats on any given date.

6

Relational Algebra

7

Selection and Projection• Selection Operator: σrating>8 (S2)Retrieves from the current instance of relation named S2 those rows

where the value of the attribute ‘rating’ is greater than 8.Applying the above selection operator to the sample instance of S2

shown in figure 4.2 yields the relational instance on figure 4.4 as shown below:

• π

condition

8

Projection Operator πsname,rating(S2)Retrieves from the current instance of the relation named S2 those

columns whose names are ‘sname’ and ‘rating’.

Applying the above operator to the sample instance of S2 shown in figure 4.2 yields the relational instance on figure 4.5 as shown below:

N. B.: Note that the projection operator can produce duplicate rows in the resulting instance.

9

Remark Concerning Duplicate Rows

• Duplicate rows are not permitted in relational algebra and calculus.

• Duplicate rows can occur in SQL, though they may be controlled by explicit keywords.

10

- Projection Operator (cont’d)

Similarly πage(S2) yields the following relational instance

11

FUNCTIONAL APPLICATION OF SELECTION AND PROJECTION OPERATORS

The selection and projection operators can be applied successively as many times as desired in the usual functional denotation as illustrated below.

Thus the expression

πsname,rating(σrating>8(S2))

yields the following relational instance

12

SET OPERATIONSThe following set operations are also available in relational algebra:

• Union*

• Intersection*

• Set-difference*

• Cross-product

N.B.

(1) The asterisks indicate operations whose operand relations must be union-compatible. Two relation instances are said to be union-compatible if:

- they have the same number of fields,

- corresponding fields have the same domains.

- they have the same semantics.

(2) The results of set operations on sets and multisets are different, therefore we shall examine both of these separately.

13

EXAMPLES OF SET OPERATIONS ON RELATIONS

• Union: Given the sample instances S1 and S2

The union of S1 and S2, i.e. S1 ∪ S2 is shown below

14

• Given the two sample relational instances

We can form:Intersection: S1∩S2

Set-Difference: S1 – S2

15

Given the two relational samples S1 and S2

We can form the Cross-product S1 R1:

16

MULTISET OPERATIONSExamples: Given two multisets

M1 = {a, a, b, b, c, d, f} M2 = {a, b, b, d, e, e}We can form

MU = M1 ∪ M2 = {a, a, a, b, b, b, b, c, d, d, d, f, e, e}MI = M1 ∩ M2 = {a, b, b, d}MD = M1 – M2 = {a, c, f}

MC= M1 M2 = {<a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <a, a>, <a, b>, <a, b>, <a, d>, <a, e>, <a, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>, <b, a>, <b, b>, <b, b>, <b, d>, <b, e>,<b, e>, <c, a>, <c, b>, <c, b>, <c, d>, <c, e>, <c, e>, <d, a>, <d, b>, <d, b>, <e, d>, <d, e>, <d, e>, <f, a>, <f, b>, <f, b>, <f, d>, <f, e>, <f, e>}

17

RENAMING OPERATOR, Purpose: to avoid name conflicts and to permit naming

anonymous relational instances (like query answers).Example: the expression

(C(1 sid1, 5 sid2), S1 R1)Returns a relational instance that contains the tuples

shown in figure 4.11 and has the schemaC(sid1: integer, sname: string, rating: integer, age: real,

sid2: integer, bid: integer, day: date)

N.B. The second operand may be a simple relational instance name, thereby renaming that relational instance.

Instance being renamed(can be a name or an operation)Attributes being renamed

(1st and 5th)Name of new relation

18

SPECIAL RELATIONAL OPERATORS

The following operators are peculiar to relations:

- Join operators

There are several kind of join operators. We only consider three of these here (others will be considered when we discuss null values – see section 5.6.4):

- (1) Condition Joins

- (2) Equijoins

- (3) Natural Joins

- Division

19

JOIN OPERATORS Condition Joins: - Defined as a cross-product followed by a selection:

R ⋈c S = σc(R S) ( is called the bow-⋈tie)where c is the condition.

- Example:Given the sample relational instances S1 and R1

The condition join S ⋈S1.sid<R1.sid R1 yields

20

Equijoin:Special case of the condition join where the join condition consists solely of

equalities between two fields in R and S connected by the logical AND operator ( ).∧

Example: Given the two sample relational instances S1 and R1

The operator S1 R.sid=Ssid R1 yields

21

Natural Join

- Special case of equijoin where equalities are implicitly specified on all fields having the same name in R and S.

- The condition c is now left out, so that the “bow tie” operator by itself signifies a natural join.

- N. B. If the two relations have no attributes in common, the natural join is simply the cross-product.

22

DIVISION

- The division operator is used for queries which involve the ‘all’

qualifier such as “Find the names of sailors who have reserved all boats”.

- The division operator is a bit tricky to explain, and perhaps best approached through examples as will be done here.

23

EXAMPLES OF DIVISION Consider the relational instances A, B1, B2, and B3. The division operation A/Bi yields the three relational instances on the right, which are constructed as follows: search among A for all those rows whose A.pno are the same as all the B.pno, and whose A.sno is the same. The corres- ponding A.sno then belong in the

answer relation.

Thus, B1 contains only p2, so we find in A rows 2, 6, 7, 8 for which A.pno = p2. Therefore we must include in the answer (A/B1) the A.sno values in rows 2, 6, 7, 8. Next, B2 contains p2, p4. So we locate in A rows 2 and 4 which contain respectively p2 and p4 and which have the same value of A.sno (namely s1); then we locate in A rows 8, 9 which again contain respectively p2 and p4 with the same value of A.sno (namely s4); so we place in the answer s1 and s4. Next, B3 contains p1, p2, p4; so we search in A for three rows with the same sno, containing respectively p1, p2, p4. We find only rows 1, 2, 4 for which A.sno is s1. So s1 goes in the answer.

Figure 4.14 Examples Illustrating Division

24

DIVISIONInterpretation of the division operation A/B:

- Divide the attributes of A into 2 sets: A1 and A2.

- Divide the attributes of B into 2 sets: B2 and B3.

- Where the sets A2 and B2 have the same attributes.

- For each set of values in B2:

- Search in A2 for the sets of rows (having the same A1 values) whose A2 values (taken together) form a set which is the same as the set of B2’s.

- For all the set of rows in A which satisfy the above search, pick out their A1 values and put them in the answer.

25

DIVISIONExample: Find the names of sailors who have reserved all boats:

(1) A = sid,bid(Reserves). A1 = sid(Reserves) A2 = bid(Reserves)

(2) B2 = bid(Boats) B3 is the rest of B.

Thus, B2 ={101, 102, 103, 104}

(3) Find the rows of A such that their A.sid is the same and their combined A.bid is the set B2.

Thus we find A1 = {22}

(4) Get the set of A2 corresponding to A1: A2 = {Dustin}

26

FORMAL DEFINITION OF DIVISION

The formal definition of division is as follows:

A/B = x(A) - x((x(A) B) – A)

27

EXAMPLES OF ALGEBRA QUERIESIn the rest of this chapter we shall illustrate queries using the

following new instances S3 of sailors, R2 of Reserves and B1 of boats.

28

QUERY Q1Given the relational instances:

(Q1) Find the names of sailors who have reserved boat 103

sname((σbid=103 Reserves) ⋈ Sailors)

The answer is thus the following relational instance

{<Dustin>, <Lubber>, <Horatio>}

29

QUERY Q1 (cont’d)There are of course several ways to express Q1 in relational algebra.

Here is another:

sname(σbid=103(Reserves Sailors))⋈ Sailors))

Which of these expressions should we use?

That is a question of optimization. Indeed, when we describe how to state queries in SQL, we can leave it to the optimizer in the DBMS to select the nest approach.

30

QUERY Q2(Q2) Find the names of sailors who have reserved a red boat.

sname((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors

31

QUERY Q3(Q3) Find the colors of boats reserved by Lubber.

color((σsname=‘Lubber’Sailors) ⋈ Reserves ⋈ Boats

32

QUERY Q4(Q4) Find the names of Sailors who have reserved at least one boat

sname(Sailors ⋈ Reserves)

33

QUERY Q5 (Q5) Find the names of sailors who have reserved a red or a green boat.

(Tempboats, (σcolor=‘red’) ∪ (σcolor=‘green’Boats))

sname(Tempboats ⋈ Reserves ⋈ Sailors)

34

QUERY Q6(Q6) Find the names of Sailors who have reserved a red and a green

boat.It seems tempting to use the expression used in Q5, replacing simply

by . However, this won’t work, for such an expression is requesting the names of sailors who have requested a boat that is both red and green! The correct expression is as follows:

(Tempred, sid((σcolor=‘red’Boats) ⋈ Reserves))

(Tempgreen, sid((σcolor=‘green’Boats) ⋈ Reserves))

sname ((Tempred ∩ Tempgreen) ⋈ Sailors)

35

QUERY Q7

(Q7) Find the names of sailors who have reserved at least two boats.

(Reservations, sid,sname,bid(Sailors ⋈ Reserves))

(Reservationpairs(1sid1, 2sname, 3bid1, 4sid2,

5sname, 6bid2), ReservationsReservations)

sname1σ(sid1=sid2)(bid1bid2)Reservationpairs)

36

QUERY 8(Q8) Find the sids of sailors with age over 20 who have not reserved a

red boat.

sid(σage>20Sailors) - sid((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors)

37

QUERY 9(Q) Find the names of sailors who have reserved all boats.

(Tempsids, (sid,bidReserves)/(bidBoats))

sname(Tempsids ⋈ Sailors

38

QUERY Q10(Q10) Find the names of sailors who have reserved all boats called

Interlake.

(Tempsids, (sid,bidReserves)/(bid(σbname=‘Interlake’Boats)))

sname(Tempsids ⋈ Sailors)

39

RELATIONAL CALCULUS

40

Introduction

- Relational algebra is procedural: it specifies the procedure to be followed in order to get the answer to the query.

- Relational calculus is declarative: it describes (declares) the answer to the query without specifying how to get it.

- Relational calculus strongly resembles First Order Predicate Logic, or simply first order logic.

- There are two variants of relational calculus:

- Tuple relational calculus (TRC)

- Domain relational calculus (DRC)

41

TUPLE RELATIONAL CALCULUS

- A query statement in TRC is a set declaration having the form:

{ P first-order logic formula}

- This is to be read as ‘the set of tuple variables, P, for which the specified first order logic formula is true’.

- Thus a TRC query is a request (to the DBMS) to produce a set of tuples corresponding to the tuples of the relational answer in SQL.

- Example

Given the following query:

(Q11) Find all sailors with a rating above 7.

The TRC statement of this query is

{S S Sailors S.rating > 7}.

42

SYNTAX AND SEMANTICS OF TRC

• The syntax and semantics of TRC is that of first-order logic. It is stated quite precisely in the text and there is no need to repeat it here. Instead we shall examine a few query applications.

43

QUERY Q12(Q12) Find the names and ages of sailors with a rating above 7.

{P S ∃ Sailors (S.rating > 7 P.name = S.name P.age = S.age)}

Remarks

1. The fact that the tuple variable P occurs with two attributes (using the dot notation) means that solely these two attributes are required in the answer relation.

2. The symbols used are the usual first-order logic symbols:

∀: for all : there exists ∃ ⋀ : and : or ¬ : not : implies⋁ ⇒

44

QUERIES 1,2,7,9,14

The TRC statements for these queries are pretty well self explanatory, especially with the added English statements of how to read them.

45

DOMAIN RELATIONAL CALCULUS (1)

- The form of a DRC query is as follows:

{<X1, X2, … , Xn> logical DRC formula}

signifying that the system must construct (and output) a set of all the tuples which satisfy the stated logical DRC formula in terms of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational instance with attributes X1, X2, … , Xn, these attributes corresponding to those of some of the relations in the database.

- Again, the approach used by the system is left unspecified.

- The Syntax and the semantics of the DRC are explicitly and precisely described in the text.

46

DOMAIN RELATIONAL CALCULUS (2)

Example:

(Q11) Find all sailors with a rating above 7.

{ < I, N, T, A > <I, N, T, A > ∈ Sailors ⋀ T > 7 }

Other queries are illustrated and described in the text with all necessary explanation.

47

EXPRESSIVE POWER OF ALGEBRA AND CALCULUS( 1)

Safety- Certain queries stated in the relational calculus may lead to

answers which contain an infinite number of tuples (or at least as many as the system can handle).

Example:

Consider the TRC query {S ¬(S ∈ Sailors)}. Since there is a quasi-infinite number of tuples that can be created with the attributes of sailors, the answer is (quasi)-infinite.

- A query which yields a (quasi)-infinite answer is said to be unsafe, and, of course, should not be allowed by the system.

- It is possible to define a safe formula in TRC (see text, section 4.4).

48

EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (2)

• A query language is said to be relationally complete if it can express all the queries that can be expressed in relational algebra.

• SQL is relationally complete.

• Every query that can be expressed using a safe relational calculus query can be also be expressed as a relational algebra query.

• SQL provides additional expressive power beyond relational algebra.

49

QUERY Q1