1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.

CS 430 Database Theory

Winter 2005

Lecture 5: Relational Algebra

What is the Relational Algebra?
  Answer: A collection of operations that can be applied to Relations, yielding new Relations

What’s the idea behind the Relational Algebra?
  Define a complete universe of operations on relations
  Define the notion of Relationally Complete: a system that can do anything that can be done with the Relational Algebra

What are the Operations?

Original Operations (as defined by Codd):
  SELECT or RESTRICT (σ)
  PROJECT (π)
  RENAME (ρ)
  Set Operations: UNION, INTERSECTION, and MINUS or DIFFERENCE
  CARTESIAN PRODUCT
  Joins: JOIN or THETA JOIN, EQUIJOIN, NATURAL JOIN
  DIVISION

What are the Operations?

Additional Operations:
  AGGREGATE
  OUTER JOIN (and OUTER UNION)
  EXTEND (not in book)
  Recursive Closure

SELECT

σ<selection condition>(R)
  <selection condition> is a predicate (Boolean condition) on the attributes of the relation R
  Result is a relation with just those tuples of R that satisfy <selection condition>

Example:
  σ(DNO = 5 AND SALARY > 30000)(EMPLOYEE)
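A minimal Python sketch of SELECT, assuming a relation is modeled as a list of dicts keyed by attribute name. The helper name `select` and the sample rows are illustrative only and are not taken from the textbook.

```python
# Toy EMPLOYEE relation: each tuple is a dict keyed by attribute name.
# The rows are made-up sample data.
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"SSN": "222", "LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"SSN": "333", "LNAME": "Zelaya", "DNO": 4, "SALARY": 25000},
]


def select(condition, relation):
    """SELECT: keep only the tuples for which the predicate is true."""
    return [t for t in relation if condition(t)]


# Same condition as the slide example: DNO = 5 AND SALARY > 30000
result = select(lambda t: t["DNO"] == 5 and t["SALARY"] > 30000, EMPLOYEE)
```

The result has the same attributes as EMPLOYEE; only the set of tuples shrinks.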

Notes for SELECT

Booleans AND, OR, NOT have usual interpretation

σ<cond1>(σ<cond2>(R))
  = σ<cond2>(σ<cond1>(R))
  = σ(<cond1> AND <cond2>)(R)
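The three forms above can be checked on a small relation. This is only a sanity check on made-up data, not a proof; `select`, `cond1`, and `cond2` are illustrative names.

```python
# A 16-tuple toy relation R(A, B).
R = [{"A": a, "B": b} for a in range(4) for b in range(4)]


def select(condition, relation):
    """SELECT: keep only the tuples for which the predicate is true."""
    return [t for t in relation if condition(t)]


def cond1(t):
    return t["A"] > 1


def cond2(t):
    return t["B"] % 2 == 0


nested_12 = select(cond1, select(cond2, R))   # cond2 applied first
nested_21 = select(cond2, select(cond1, R))   # cond1 applied first
combined = select(lambda t: cond1(t) and cond2(t), R)
```

All three lists hold the same tuples, so the order in which the conditions are applied does not matter.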

PROJECT

π<attribute list>(R)
  <attribute list> is a list of some subset of the attributes of R
  Result is a relation with only those columns named in the attribute list
  Order of columns is as given in the attribute list

Examples:
  π<LNAME, FNAME, SALARY>(EMPLOYEE)
  π<SEX, SALARY>(EMPLOYEE)
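A possible Python sketch of PROJECT, again assuming a relation is a list of dicts. The helper `project` and the sample rows are illustrative. Because the result must be a relation (a set), the sketch drops duplicate rows.

```python
# Made-up sample rows; two employees share the same SEX and SALARY.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Ahmad", "LNAME": "Jabbar", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SEX": "F", "SALARY": 25000},
]


def project(attributes, relation):
    """PROJECT: keep the listed columns, in the listed order, without duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


names = project(["LNAME", "FNAME", "SALARY"], EMPLOYEE)
sex_salary = project(["SEX", "SALARY"], EMPLOYEE)
```

Here `names` keeps all three rows, while `sex_salary` collapses the two identical (M, 30000) rows into one.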

Notes for PROJECT

Duplicates are eliminated
  The number of rows after a projection is always less than or equal to the number of rows in the original relation

π<List1>(π<List2>(R)) = π<List1>(R), provided every attribute in <List1> also appears in <List2>

Sequences of Relational Operations and RENAME

R1 ← Relational Expression
  Defines an intermediate relation R1
  Column names are determined by the expression

R1(A1, … , An) ← Relational Expression
  Columns are named A1, … , An

Book defines RENAME operation: ρS(B1, … , Bn)(R), or ρ(B1, … , Bn)(R), or ρS(R)
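A small Python sketch of attribute renaming. Unlike the book's positional notation, this hypothetical `rename` helper takes an old-name to new-name mapping, which is an assumption made for readability. The sample row is made up.

```python
def rename(relation, mapping):
    """RENAME: same tuples, with attributes renamed per mapping (old -> new)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


DEPARTMENT = [{"DNAME": "Research", "DNUMBER": 5}]

# Rename DNUMBER to DNO; attributes not in the mapping keep their names.
renamed = rename(DEPARTMENT, {"DNUMBER": "DNO"})
```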

Set Theory

A relation is a set of tuples

Two relations R(A1,…, An) and S(B1,…, Bn) are union compatible if dom(Ai) = dom(Bi) for all i
  Concept is that the tuples of R and S have the same type

If two relations are union compatible, we can define their UNION (∪), INTERSECTION (∩), and DIFFERENCE (MINUS, -)
  Attribute names are determined by the attribute names of the first relation
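A rough Python sketch of the three set operations, assuming each relation is a (schema, set of positional tuples) pair. The compatibility check only compares degree, since this toy model does not track domains. All names and rows are illustrative.

```python
# Each relation is a (schema, tuples) pair; tuples are positional.
STUDENT = (("FN", "LN"), {("Susan", "Yao"), ("Ramesh", "Shah")})
INSTRUCTOR = (("FNAME", "LNAME"), {("Susan", "Yao"), ("John", "Smith")})


def check_compatible(r, s):
    # Stand-in check: equal degree only. Real union compatibility also
    # requires dom(Ai) = dom(Bi), which is not modeled here.
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union compatible")


def union(r, s):
    check_compatible(r, s)
    return (r[0], r[1] | s[1])   # attribute names come from the first relation


def intersection(r, s):
    check_compatible(r, s)
    return (r[0], r[1] & s[1])


def difference(r, s):
    check_compatible(r, s)
    return (r[0], r[1] - s[1])
```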

More Set Theory

Usual Set Theory identities hold (possibly with appropriate attribute renaming):
  R ∪ S = S ∪ R, (R ∪ S) ∪ T = R ∪ (S ∪ T)
  R ∩ S = S ∩ R, (R ∩ S) ∩ T = R ∩ (S ∩ T)
  R - (S ∪ T) = (R - S) ∩ (R - T)
  R - (S ∩ T) = (R - S) ∪ (R - T)

CARTESIAN PRODUCT

Given R(A1,…, Am) and S(B1,…, Bn), the Cartesian Product R × S is the table with attributes (A1,…, Am, B1,…, Bn) and one row for every combination of a row in R and a row in S
  This assumes that the Ai and Bj are distinct
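A short Python sketch of the Cartesian product over relations modeled as lists of dicts. The helper name and data are illustrative; the assertion inside mirrors the assumption that attribute names are distinct.

```python
from itertools import product


def cartesian_product(r, s):
    """One output tuple for every pairing of a tuple in r with a tuple in s."""
    result = []
    for tr, ts in product(r, s):
        # The definition assumes the attribute names of r and s are distinct.
        assert not (tr.keys() & ts.keys()), "attribute names must be distinct"
        result.append({**tr, **ts})
    return result


R = [{"A": 1}, {"A": 2}]
S = [{"B": "x"}, {"B": "y"}, {"B": "z"}]
RxS = cartesian_product(R, S)
```

With 2 tuples in R and 3 in S, the product has 2 × 3 = 6 tuples.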

Example

Get all female employees who have dependents, together with their dependents’ names:

FEMALE_EMPS ← σ<SEX = ‘F’>(EMPLOYEE)
EMPNAMES ← π<FNAME,LNAME,SSN>(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ<SSN=ESSN>(EMP_DEPENDENTS)
RESULT ← π<FNAME,LNAME,DEPENDENT_NAME>(ACTUAL_DEPENDENTS)

See Figure 6.5, Text Book
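The same sequence of steps can be traced in Python on a tiny data set. The rows below are made up for illustration and are not the contents of the textbook figure; relations are modeled as lists of dicts.

```python
from itertools import product

# Made-up sample data.
EMPLOYEE = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "1", "SEX": "F"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "2", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "3", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "2", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "3", "DEPENDENT_NAME": "Alice"},
]

# SELECT female employees.
female_emps = [t for t in EMPLOYEE if t["SEX"] == "F"]
# PROJECT onto FNAME, LNAME, SSN (no duplicates arise in this data).
empnames = [{k: t[k] for k in ("FNAME", "LNAME", "SSN")} for t in female_emps]
# CARTESIAN PRODUCT with DEPENDENT: every employee paired with every dependent.
emp_dependents = [{**e, **d} for e, d in product(empnames, DEPENDENT)]
# SELECT only the pairings where the dependent belongs to the employee.
actual_dependents = [t for t in emp_dependents if t["SSN"] == t["ESSN"]]
# PROJECT onto the final columns.
result = [
    {k: t[k] for k in ("FNAME", "LNAME", "DEPENDENT_NAME")}
    for t in actual_dependents
]
```

The product step produces 2 × 2 = 4 candidate rows, and the selection on SSN = ESSN keeps the one that is an actual employee and dependent pair.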

Joins

Join two tables
  Generalization of Cartesian Product

JOIN(R, S, <join condition>)
  Same as SELECT(<join condition>, R × S)
  <join condition> usually has form <cond1> and <cond2> and … and <condn>
  <condi> is of form Ai θ Bj
  θ is a comparison operator

This general kind of join is called a θ-JOIN (THETA JOIN)
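A minimal Python sketch of a θ-join as a selection over the Cartesian product. The `theta_join` helper, the triple format for conditions, and the sample rows are assumptions made for illustration.

```python
import operator
from itertools import product


def theta_join(r, s, conditions):
    """conditions: list of (attr_of_r, comparison, attr_of_s); all must hold."""
    return [
        {**tr, **ts}
        for tr, ts in product(r, s)                          # R x S
        if all(op(tr[a], ts[b]) for a, op, b in conditions)  # SELECT
    ]


# Made-up sample data.
DEPARTMENT = [
    {"DNAME": "Research", "MGRSSN": "333"},
    {"DNAME": "Administration", "MGRSSN": "987"},
]
EMPLOYEE = [
    {"SSN": "333", "LNAME": "Wong"},
    {"SSN": "123", "LNAME": "Smith"},
]

# Join condition MGRSSN = SSN, expressed with operator.eq as the comparison.
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, [("MGRSSN", operator.eq, "SSN")])
```

Any comparison function (`operator.lt`, `operator.ge`, and so on) can stand in for θ. With equality only, the result keeps both MGRSSN and SSN even though they carry the same value.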

More types of Joins

EQUIJOIN: θ-JOIN where all comparisons are for equality (=)
  Note: EQUIJOIN has redundant attributes

NATURAL JOIN
  Standard Definition: EQUIJOIN with same named attributes, eliminating redundant attributes
  Non-standard: include renaming of attributes
  Notation: R*S
  Examples:
    PROJ_DEPT ← PROJECT * DEPARTMENT
    DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
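A possible Python sketch of the standard NATURAL JOIN (no renaming), assuming relations are lists of dicts. The attribute names DNUMBER, DNAME, and DLOCATION and the rows are assumed for illustration.

```python
def natural_join(r, s):
    """Equijoin on all same-named attributes, keeping one copy of each."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                # Merging the dicts keeps a single copy of each common attribute.
                result.append({**tr, **ts})
    return result


# Made-up sample data.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 4, "DNAME": "Administration"},
]
DEPT_LOCATIONS = [
    {"DNUMBER": 5, "DLOCATION": "Houston"},
    {"DNUMBER": 5, "DLOCATION": "Bellaire"},
    {"DNUMBER": 4, "DLOCATION": "Stafford"},
]

dept_locs = natural_join(DEPARTMENT, DEPT_LOCATIONS)
```

Unlike an EQUIJOIN, the result has DNUMBER only once.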

Division

Used for universal quantification
  E.g. Find all employees that work on all projects that …

Given relations R(X), S(Y) with X ⊇ Y
  Let Z = X - Y, that is, Z is the set of attributes of R that are not attributes of S

T(Z) is the set of all tuples tT such that for every tS in S there is a tuple tR in R such that tR[Z] = tT and tR[Y] = tS

Alternately, T is the biggest table such that T × S ⊆ R
  Written as T ← R ÷ S

Picture of Division

R
A   B
a1  b1
a2  b1
a3  b1
a4  b1
a1  b2
a3  b2
a2  b3
a3  b3
a4  b3
a1  b4
a2  b4
a3  b4

S
A
a1
a2
a3

T
B
b1
b4

T ← R ÷ S
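The picture can be reproduced with a direct Python reading of the definition: keep every B value that appears in R together with every A value in S. The `divide` helper is an illustrative name; R is modeled as a set of (A, B) pairs and S as a set of A values.

```python
R = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
S = {"a1", "a2", "a3"}


def divide(r, s):
    """r holds (a, b) pairs, s holds a-values; return each b paired with all of s."""
    candidates = {b for _, b in r}
    return {b for b in candidates if all((a, b) in r for a in s)}


T = divide(R, S)
```

b2 is dropped because (a2, b2) is missing from R, and b3 is dropped because (a1, b3) is missing.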

Minimum Set of Operations

We have more operations than we (minimally) need

Examples:
  Join can be defined using × (Cartesian product) and σ (selection)
  Divide:
    T1 ← π<Z>(R)
    T2 ← π<Z>((S × T1) - R)
    T ← T1 - T2
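The three-step construction of Divide can be traced with Python sets. This sketch assumes R(A, B) is a set of (A, B) pairs, S(A) is a set of A values, and Z = {B}; the data is the same as in the division picture, written compactly.

```python
from itertools import product

# R(A, B) built from "which A values appear with each B value".
pairs_by_b = {
    "b1": "a1 a2 a3 a4",
    "b2": "a1 a3",
    "b3": "a2 a3 a4",
    "b4": "a1 a2 a3",
}
R = {(a, b) for b, a_values in pairs_by_b.items() for a in a_values.split()}
S = {"a1", "a2", "a3"}

# T1 <- project Z from R: every B value that occurs at all.
T1 = {b for _, b in R}
# S x T1: every (a, b) pairing that would have to be in R for b to qualify.
S_x_T1 = set(product(S, T1))
# T2 <- project Z from ((S x T1) - R): B values with at least one pairing missing.
T2 = {b for _, b in S_x_T1 - R}
# T <- T1 - T2: B values with no pairing missing.
T = T1 - T2
```

T2 collects the disqualified values (b2 and b3 here), so subtracting it from T1 leaves exactly the quotient.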

Aggregation and Grouping

Aggregation or Summarization Functions: SUM, AVERAGE, MIN, MAX, COUNT, and others

Grouping of tuples
  Group all tuples that have the same value in some subset of the columns
  E.g. group all employees in the same department

Aggregation and Grouping cannot be expressed with the prior set of operations

Aggregate Function Operation

AGGREGATE(<grouping attributes>, <function list>, R)
  <function list> is a list of <function> <attribute> pairs
    <function> is an aggregation function
    <attribute> is an attribute of R
  <grouping attributes> is a list of attributes that group the tuples of R
  The result is a relation with one attribute for each grouping attribute plus one attribute for each function

Book notation:
  <grouping attributes> ℱ <function list>(R)

Example:

Get Number of Employees and Average Salary by Department

AGGREGATE(
  DNO,
  COUNT SSN, AVERAGE SALARY,
  EMPLOYEE)

Notes on Aggregation

Duplicates are not eliminated before applying the aggregation function
  This gives functions like SUM and AVERAGE their normal interpretation

The result of aggregation is a relation, even if it consists of a single value
  E.g. get the average salary:
    AGGREGATE( , AVERAGE SALARY, EMPLOYEE)
  Yields a table with one tuple with one attribute
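A rough Python sketch of AGGREGATE covering both the grouped and the ungrouped case. The helper signature, the output column names (COUNT_SSN, AVERAGE_SALARY), and the rows are assumptions made for illustration. Note that duplicate salary values are kept, so the averages behave as expected.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample data.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 40000},
    {"SSN": "3", "DNO": 4, "SALARY": 26000},
]


def aggregate(grouping, functions, relation):
    """functions: list of (output_name, function, attribute) triples."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[g] for g in grouping)].append(t)
    result = []
    for key, members in groups.items():
        row = dict(zip(grouping, key))          # one attribute per grouping attribute
        for name, fn, attr in functions:        # plus one attribute per function
            row[name] = fn([m[attr] for m in members])
        result.append(row)
    return result


by_dept = aggregate(
    ["DNO"],
    [("COUNT_SSN", len, "SSN"), ("AVERAGE_SALARY", mean, "SALARY")],
    EMPLOYEE,
)
# No grouping attributes: the whole relation forms a single group.
overall = aggregate([], [("AVERAGE_SALARY", mean, "SALARY")], EMPLOYEE)
```

`overall` is still a relation: a list holding one tuple with one attribute.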

Outer Join

A JOIN eliminates tuples in one table that have no match in the other table
  Example: Natural Join (R*S)
  Tuples with NULL join attributes are also eliminated

An OUTER JOIN keeps unmatched tuples in either R, S, or both
  Additional attributes are padded with null values
  LEFT (RIGHT) OUTER JOINs keep the unmatched tuples in the first (second) table being joined

Outer Join

Example:
  DEPARTMENT (LEFT OUTER JOIN) DEPT_LOCATIONS would preserve departments that had no associated location

Notes:
  An OUTER JOIN can (almost) be constructed from the original operations
  It’s the union of the standard join and the unmatched rows extended with nulls
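A minimal Python sketch of a LEFT OUTER JOIN on the same-named attributes, with `None` standing in for NULL. The helper, its `s_attributes` parameter (needed to know which columns to pad), and the rows are illustrative assumptions.

```python
def left_outer_join(r, s, s_attributes):
    """Natural join of r and s that also keeps unmatched r tuples, padded with None."""
    result = []
    for tr in r:
        matches = [
            ts for ts in s
            # A NULL (None) join attribute never matches anything.
            if all(tr[a] is not None and tr[a] == ts[a] for a in tr.keys() & ts.keys())
        ]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            # Unmatched row of r: pad the attributes of s with None.
            result.append({**{a: None for a in s_attributes}, **tr})
    return result


# Made-up sample data: Headquarters has no location recorded.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 1, "DNAME": "Headquarters"},
]
DEPT_LOCATIONS = [{"DNUMBER": 5, "DLOCATION": "Houston"}]

joined = left_outer_join(DEPARTMENT, DEPT_LOCATIONS, ["DNUMBER", "DLOCATION"])
```

The two branches of the loop correspond to the two parts of the union described above: matched rows from the standard join, and unmatched rows extended with nulls.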

Outer Union

Union of two relations which are not union compatible

Outer Union of R(X, Y) and S(X, Z) is T(X, Y, Z)
  Tuples are matched if the common attributes match
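One way to read this definition in Python: tuples that agree on the common attributes X are merged, and every unmatched tuple from either side is kept and padded with `None`. The helper, its explicit schema parameters, and the STUDENT and INSTRUCTOR rows are assumptions made for illustration.

```python
def outer_union(r, s, r_attrs, s_attrs):
    """Outer union of r(X, Y) and s(X, Z) into T(X, Y, Z), padding with None."""
    common = [a for a in r_attrs if a in s_attrs]
    all_attrs = list(r_attrs) + [a for a in s_attrs if a not in r_attrs]

    def key(t):
        return tuple(t[a] for a in common)

    result = []
    matched_s = set()
    for tr in r:
        partners = [i for i, ts in enumerate(s) if key(ts) == key(tr)]
        if partners:
            for i in partners:
                matched_s.add(i)
                result.append({**tr, **s[i]})
        else:
            result.append({**dict.fromkeys(all_attrs), **tr})
    for i, ts in enumerate(s):
        if i not in matched_s:
            result.append({**dict.fromkeys(all_attrs), **ts})
    return result


# Made-up sample data: Cy appears in both relations.
STUDENT = [{"NAME": "Ann", "ADVISOR": "Bob"}, {"NAME": "Cy", "ADVISOR": "Bob"}]
INSTRUCTOR = [{"NAME": "Bob", "RANK": "Prof"}, {"NAME": "Cy", "RANK": "TA"}]

T = outer_union(STUDENT, INSTRUCTOR, ["NAME", "ADVISOR"], ["NAME", "RANK"])
```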

EXTEND

Extend a table with additional attributes

EXTEND(R, <attribute name>, <expression>)
  Add a column to R with name <attribute name> and value <expression>
  <expression> is an expression using the attributes of R

EXTEND is not expressible using the original operations

EXTEND provides a mechanism for performing arithmetic using attributes that is otherwise missing
  Could be expressed as a join if our Universe contained the appropriate (infinite) relations containing results of computations
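A small Python sketch of EXTEND, assuming relations are lists of dicts and the expression is passed as a function of one tuple. The NEW_SALARY column and the 10% raise are made-up examples.

```python
def extend(relation, attribute_name, expression):
    """EXTEND: add a column whose value is computed from each tuple's attributes."""
    return [{**t, attribute_name: expression(t)} for t in relation]


# Made-up sample data.
EMPLOYEE = [
    {"SSN": "1", "SALARY": 30000},
    {"SSN": "2", "SALARY": 40000},
]

# Add NEW_SALARY = SALARY * 1.1, using integer arithmetic to stay exact.
with_raise = extend(EMPLOYEE, "NEW_SALARY", lambda t: t["SALARY"] * 11 // 10)
```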

Recursive Closure

Examples:
  Find all employees who work for (either directly or indirectly) a specific manager
  Find all the constituent parts of a given part
    Including parts of subassemblies, etc. etc.

Relational Algebra can express any fixed depth of recursion

The SQL3 standard includes a syntax for recursive closure
  No standard syntax as part of the relational algebra
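A Python sketch of the first example, computed by iterating to a fixed point. Each pass of the loop corresponds to one more level of join, which is why any fixed depth is expressible in the algebra but the unbounded case is not. The SUPERVISION relation and the helper name are illustrative assumptions.

```python
def subordinates(supervision, manager):
    """supervision: set of (employee, supervisor) pairs.
    Returns everyone who reports to manager directly or indirectly."""
    found = set()
    frontier = {manager}
    while frontier:
        # One more level: employees supervised by someone in the frontier.
        next_level = {e for e, s in supervision if s in frontier} - found
        found |= next_level
        frontier = next_level      # stops once no new employees turn up
    return found


# Made-up sample data: 2 and 3 report to 1, 4 reports to 2, 5 reports to 4.
SUPERVISION = {("2", "1"), ("3", "1"), ("4", "2"), ("5", "4")}

all_under_1 = subordinates(SUPERVISION, "1")
all_under_2 = subordinates(SUPERVISION, "2")
```

Subtracting `found` on each pass keeps the loop from running forever if the data contains a cycle.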

Examples of Relational Algebra

See Examples, Section 6.5 of Text Book