CS 430 Database Theory, Winter 2005, Lecture 5: Relational Algebra

Post on 21-Jan-2016


1

CS 430 Database Theory

Winter 2005

Lecture 5: Relational Algebra

2

What is the Relational Algebra?
  Answer: A collection of operations that can be applied to relations, yielding new relations

What’s the idea behind the Relational Algebra?
  Define a complete universe of operations on relations
  Define the notion of Relationally Complete: a system that can do anything that can be done with the Relational Algebra

3

What are the Operations?

Original Operations (as defined by Codd):
  SELECT or RESTRICT (σ)
  PROJECT
  RENAME
  Set Operations: UNION, INTERSECTION, and MINUS or DIFFERENCE
  CARTESIAN PRODUCT
  Joins: JOIN or THETA JOIN, EQUIJOIN, NATURAL JOIN
  DIVISION

4

What are the Operations?

Additional Operations:
  AGGREGATE
  OUTER JOIN (and OUTER UNION)
  EXTEND (not in book)
  Recursive Closure

5

SELECT

σ<selection condition>(R)
  <selection condition> is a predicate (Boolean condition) on the attributes of the relation R
  Result is a relation with just those tuples of R that satisfy <selection condition>

Example:
  σ(DNO = 5 AND SALARY > 30000)(EMPLOYEE)

6

Notes for SELECT

Booleans AND, OR, NOT have their usual interpretation

σ<cond1>(σ<cond2>(R))
  = σ<cond2>(σ<cond1>(R))
  = σ(<cond1> AND <cond2>)(R)
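The identities above can be checked on a small example. The following is a minimal sketch, assuming a relation is modelled as a list of dicts; the `select` helper and the sample EMPLOYEE rows are made up for illustration and are not from the text book.

```python
# A relation is modelled as a list of dicts, one dict per tuple.
# The EMPLOYEE rows are hypothetical sample data.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 40000},
    {"SSN": "2", "DNO": 5, "SALARY": 25000},
    {"SSN": "3", "DNO": 4, "SALARY": 43000},
]


def select(condition, relation):
    """SELECT: keep the tuples of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


def in_dept_5(t):
    return t["DNO"] == 5


def well_paid(t):
    return t["SALARY"] > 30000


# Cascaded selections in both orders, and one combined selection.
nested_a = select(in_dept_5, select(well_paid, EMPLOYEE))
nested_b = select(well_paid, select(in_dept_5, EMPLOYEE))
combined = select(lambda t: in_dept_5(t) and well_paid(t), EMPLOYEE)
```

All three results contain the same tuples, which is what the cascade identity states.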

7

PROJECT

π<attribute list>(R)
  <attribute list> is a list of some subset of the attributes of R
  Result is a relation with only those columns named in the attribute list
  Order of columns is as given in the attribute list

Examples:
  π<LNAME, FNAME, SALARY>(EMPLOYEE)
  π<SEX, SALARY>(EMPLOYEE)

8

Notes for PROJECT

Duplicates are eliminated
  The number of rows after a projection is always less than or equal to the number of rows in the original relation

π<List1>(π<List2>(R)) = π<List1>(R), provided every attribute in <List1> also appears in <List2>
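Duplicate elimination and the nested-projection identity can be illustrated with a short sketch. It assumes a relation is a list of dicts; the `project` helper and the sample rows are hypothetical.

```python
def project(attrs, relation):
    """PROJECT: keep only the columns in `attrs`, dropping duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # relations are sets, so duplicates are removed
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Hypothetical sample data.
EMPLOYEE = [
    {"LNAME": "Smith", "SEX": "M", "SALARY": 30000},
    {"LNAME": "Wong", "SEX": "M", "SALARY": 30000},
    {"LNAME": "Zelaya", "SEX": "F", "SALARY": 25000},
]

sex_salary = project(["SEX", "SALARY"], EMPLOYEE)
nested = project(["SEX"], project(["SEX", "SALARY"], EMPLOYEE))
direct = project(["SEX"], EMPLOYEE)
```

Here `sex_salary` has two rows rather than three, because Smith and Wong collapse into one (SEX, SALARY) pair.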

9

Sequences of Relational Operations and RENAME

R1 ← Relational Expression
  Defines an intermediate relation R1
  Column names are determined by the expression

R1(A1, … , An) ← Relational Expression
  Columns are named A1, … , An

Book defines RENAME operation: ρS(B1, … , Bn)(R), or ρ(B1, … , Bn)(R), or ρS(R)

10

Set Theory

A relation is a set of tuples

Two relations R(A1,…, An) and S(B1,…, Bn) are union compatible if they have the same number of attributes and dom(Ai) = dom(Bi) for all i
  Concept is that the tuples of R and S have the same type

If two relations are union compatible, we can define their UNION (∪), INTERSECTION (∩), and DIFFERENCE (MINUS, -)
  Attribute names of the result are determined by the attribute names of the first relation

11

More Set Theory

Usual Set Theory identities hold (possibly with appropriate attribute renaming):
  R ∪ S = S ∪ R, (R ∪ S) ∪ T = R ∪ (S ∪ T)
  R ∩ S = S ∩ R, (R ∩ S) ∩ T = R ∩ (S ∩ T)
  R - (S ∪ T) = (R - S) ∩ (R - T)
  R - (S ∩ T) = (R - S) ∪ (R - T)
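Because a relation is a set of tuples, the set identities can be checked directly with Python's built-in `set` type. This is a small sketch on made-up, union-compatible relations R, S and T.

```python
# Three union-compatible relations, each a set of (name, number) tuples.
# The contents are hypothetical sample data.
R = {("a", 1), ("b", 2), ("c", 3)}
S = {("b", 2), ("d", 4)}
T = {("c", 3), ("d", 4)}

# Commutativity and associativity of UNION (|) and INTERSECTION (&).
union_commutes = (R | S) == (S | R)
union_associates = ((R | S) | T) == (R | (S | T))
intersection_commutes = (R & S) == (S & R)
intersection_associates = ((R & S) & T) == (R & (S & T))

# DIFFERENCE (-) distributes over UNION and INTERSECTION as stated.
diff_over_union = (R - (S | T)) == ((R - S) & (R - T))
diff_over_intersection = (R - (S & T)) == ((R - S) | (R - T))
```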

12

CARTESIAN PRODUCT

Given R(A1,…, Am) and S(B1,…, Bn), the Cartesian Product R × S is the table with attributes (A1,…, Am, B1,…, Bn) and one row for every combination of a row in R and a row in S
  This assumes that the Ai and Bj are distinct

13

Example

Get all female employees who have dependents, together with their dependents’ names:

FEMALE_EMPS ← σ<SEX = ‘F’>(EMPLOYEE)

EMPNAMES ← π<FNAME,LNAME,SSN>(FEMALE_EMPS)

EMP_DEPENDENTS ← EMPNAMES × DEPENDENT

ACTUAL_DEPENDENTS ← σ<SSN=ESSN>(EMP_DEPENDENTS)

RESULT ← π<FNAME,LNAME,DEPENDENT_NAME>(ACTUAL_DEPENDENTS)

See Figure 6.5, Text Book
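The sequence of operations in this example can be traced step by step. The sketch below assumes relations are lists of dicts; the helper functions and the sample rows are illustrative assumptions, not the data of Figure 6.5.

```python
def select(condition, relation):
    """SELECT: keep tuples satisfying `condition`."""
    return [t for t in relation if condition(t)]


def project(attrs, relation):
    """PROJECT: keep the columns in `attrs`, eliminating duplicates."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


def cartesian_product(r, s):
    """CARTESIAN PRODUCT: assumes attribute names of r and s are distinct."""
    return [{**tr, **ts} for tr in r for ts in s]


# Hypothetical sample data.
EMPLOYEE = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999", "SEX": "F"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "987", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123", "DEPENDENT_NAME": "Alice"},
]

FEMALE_EMPS = select(lambda t: t["SEX"] == "F", EMPLOYEE)
EMPNAMES = project(["FNAME", "LNAME", "SSN"], FEMALE_EMPS)
EMP_DEPENDENTS = cartesian_product(EMPNAMES, DEPENDENT)
ACTUAL_DEPENDENTS = select(lambda t: t["SSN"] == t["ESSN"], EMP_DEPENDENTS)
RESULT = project(["FNAME", "LNAME", "DEPENDENT_NAME"], ACTUAL_DEPENDENTS)
```

Note that EMP_DEPENDENTS pairs every female employee with every dependent; only the final selection on SSN = ESSN keeps the pairs that actually belong together.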

14

Joins

Join two tables
  Generalization of Cartesian Product

JOIN(R, S, <join condition>)
  Same as SELECT(<join condition>, R × S)
  <join condition> usually has the form <cond1> and <cond2> and … and <condn>
  <condi> is of the form Ai θ Bj
  θ is a comparison operator

This general kind of join is called a θ-JOIN (THETA JOIN)
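A θ-JOIN can be sketched as a selection applied to each combination of rows. The `theta_join` helper and the tiny relations R(A) and S(B) below are assumptions for illustration; the join condition here is A < B.

```python
def theta_join(r, s, condition):
    """THETA JOIN: SELECT(condition, r x s), built one row pair at a time.

    Assumes the attribute names of r and s are distinct.
    """
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]


# Hypothetical sample relations R(A) and S(B).
R = [{"A": 1}, {"A": 3}]
S = [{"B": 2}, {"B": 4}]

# Join condition A < B (theta is the comparison operator "<").
joined = theta_join(R, S, lambda tr, ts: tr["A"] < ts["B"])
```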

15

More types of Joins

EQUIJOIN: θ-JOIN where all comparisons are for equality (=)
  Note: EQUIJOIN has redundant attributes

NATURAL JOIN
  Standard Definition: EQUIJOIN with same named attributes, eliminating redundant attributes
  Non-standard: include renaming of attributes
  Notation: R*S

Examples:
  PROJ_DEPT ← PROJECT * DEPARTMENT
  DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
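The standard definition of NATURAL JOIN can be sketched as follows, assuming relations are lists of dicts. The `natural_join` helper and the sample DEPARTMENT and DEPT_LOCATIONS rows are hypothetical; the shared attribute name is assumed to be DNUMBER.

```python
def natural_join(r, s):
    """NATURAL JOIN: equijoin on all same-named attributes.

    Merging the two dicts keeps each shared attribute only once,
    which is how the redundant columns are eliminated.
    """
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})
    return result


# Hypothetical sample data.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]
DEPT_LOCATIONS = [
    {"DNUMBER": 5, "DLOCATION": "Houston"},
    {"DNUMBER": 5, "DLOCATION": "Bellaire"},
]

DEPT_LOCS = natural_join(DEPARTMENT, DEPT_LOCATIONS)
```

In this sample the Administration department has no matching location, so it does not appear in DEPT_LOCS.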

16

Division

Used for universal quantification
  E.g. Find all employees that work on all projects that …

Given relations R(X), S(Y) with Y ⊆ X
  Let Z = X - Y, that is, Z is the set of attributes of R that are not attributes of S

T(Z) is the set of all tuples tT such that for every tuple tS in S there is a tuple tR in R such that tR[Z] = tT and tR[Y] = tS

Alternately, T is the biggest table such that T × S ⊆ R

Written as T ← R ÷ S

17

Picture of Division

R            S        T
A    B       A        B
a1   b1      a1       b1
a2   b1      a2       b4
a3   b1      a3
a4   b1
a1   b2
a3   b2
a2   b3
a3   b3
a4   b3
a1   b4
a2   b4
a3   b4

T ← R ÷ S
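The picture can be reproduced with a direct implementation of the definition of division. This is a minimal sketch in which R is a set of (A, B) pairs and S is a set of A values; the `divide` helper is an illustrative assumption that only handles this two-column case.

```python
# R(A, B) as a set of (a, b) pairs, and S(A) as a set of a values,
# taken from the picture of division.
R = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
S = {"a1", "a2", "a3"}


def divide(r, s):
    """DIVISION: the b values that are paired in r with every a in s."""
    candidates = {b for (_, b) in r}
    return {b for b in candidates if all((a, b) in r for a in s)}


T = divide(R, S)
```

Only b1 and b4 are paired with all of a1, a2 and a3, so they are the only values in T.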

18

Minimum Set of Operations

We have more operations than we (minimally) need

Examples:
  Join can be defined using × (Cartesian product) and σ (selection)
  Divide:
    T1 ← πZ(R)
    T2 ← πZ((S × T1) - R)
    T ← T1 - T2
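The three-step construction of Divide can be traced on the data from the division picture. The sketch below assumes R is a set of (A, B) pairs and S a set of A values, so that Z is the single attribute B; set comprehensions stand in for projection and Cartesian product.

```python
# R(A, B) and S(A) from the division picture.
R = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}
S = {"a1", "a2", "a3"}

# T1: project R onto Z = {B}, giving every candidate answer.
T1 = {b for (_, b) in R}

# S x T1: every (a, b) pair that would have to be in R for b to qualify.
S_times_T1 = {(a, b) for a in S for b in T1}

# T2: candidates with at least one required pair missing from R.
T2 = {b for (_, b) in S_times_T1 - R}

# T: candidates with no missing pair.
T = T1 - T2
```

T2 collects the disqualified candidates (b2 lacks a2, b3 lacks a1), and removing them from T1 leaves the quotient.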

19

Aggregation and Grouping

Aggregation or Summarization Functions:
  SUM, AVERAGE, MIN, MAX, COUNT, and others

Grouping of tuples
  Group all tuples that have the same value in some subset of the columns
  E.g. group all employees in the same department

Aggregation and Grouping cannot be expressed with the prior set of operations

20

Aggregate Function Operation

AGGREGATE(<grouping attributes>, <function list>, R)
  <function list> is a list of <function> <attribute> pairs
    <function> is an aggregation function
    <attribute> is an attribute of R
  <grouping attributes> is a list of attributes that group the tuples of R
  The result is a relation with one attribute for each grouping attribute plus one attribute for each function

Book notation:
  <grouping attributes> ℱ <function list>(R)

21

Example:

Get Number of Employees and Average Salary by Department

AGGREGATE(DNO, COUNT SSN, AVERAGE SALARY, EMPLOYEE)
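The AGGREGATE call above can be sketched in a few lines. This assumes relations are lists of dicts; the `aggregate` helper, the (name, function, attribute) triple format, the result column names such as COUNT_SSN, and the sample rows are all choices made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(grouping_attrs, function_list, relation):
    """AGGREGATE: group tuples, then apply each function to its attribute."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in grouping_attrs)].append(t)

    result = []
    for key, tuples in groups.items():
        row = dict(zip(grouping_attrs, key))
        for name, func, attr in function_list:
            # Duplicate values are kept, so SUM/AVERAGE behave as usual.
            row[f"{name}_{attr}"] = func([t[attr] for t in tuples])
        result.append(row)
    return result


# Hypothetical sample data.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 40000},
    {"SSN": "3", "DNO": 4, "SALARY": 25000},
]

by_department = aggregate(
    ["DNO"],
    [("COUNT", len, "SSN"), ("AVERAGE", mean, "SALARY")],
    EMPLOYEE,
)
```

The result has one row per department, with one column for the grouping attribute and one for each function.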

22

Notes on Aggregation

Duplicates are not eliminated before applying the aggregation function
  This gives functions like SUM and AVERAGE their normal interpretation

The result of aggregation is a relation, even if it consists of a single value
  E.g. get the average salary:
    AGGREGATE( , AVERAGE SALARY, EMPLOYEE)
  Yields a table with one tuple with one attribute

23

Outer Join

A JOIN eliminates tuples in one table that have no match in the other table
  Example: Natural Join (R*S)
  Tuples with NULL join attributes are also eliminated

An OUTER JOIN keeps unmatched tuples in either R, S or both
  Additional attributes are padded with null values
  LEFT (RIGHT) OUTER JOINs keep the unmatched tuples in the first (second) table being joined

24

Outer Join

Example:
  DEPARTMENT (LEFT OUTER JOIN) DEPT_LOCATIONS would preserve departments that had no associated location

Notes:
  An OUTER JOIN can (almost) be constructed from the original operations
  It’s the union of the standard join and the unmatched rows extended with nulls
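The construction in the notes (standard join plus unmatched rows padded with nulls) can be sketched as below. It assumes relations are lists of dicts, uses `None` for null, and joins on the same-named attributes; the `left_outer_join` helper and the sample rows are hypothetical.

```python
def left_outer_join(r, s, s_attrs):
    """LEFT OUTER JOIN on same-named attributes.

    `s_attrs` lists the attributes of s, so that unmatched tuples of r
    can be padded even when s has no rows to inspect.
    """
    result = []
    for tr in r:
        matches = [
            ts for ts in s
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys())
        ]
        if matches:
            # The standard join part.
            result.extend({**tr, **ts} for ts in matches)
        else:
            # The unmatched row, extended with nulls.
            padding = {a: None for a in s_attrs if a not in tr}
            result.append({**tr, **padding})
    return result


# Hypothetical sample data.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Headquarters", "DNUMBER": 1},
]
DEPT_LOCATIONS = [{"DNUMBER": 5, "DLOCATION": "Houston"}]

joined = left_outer_join(DEPARTMENT, DEPT_LOCATIONS, ["DNUMBER", "DLOCATION"])
```

The Headquarters department has no location in this sample, yet it is preserved with a null DLOCATION.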

25

Outer Union

Union of two relations which are not union compatible

Outer Union of R(X, Y) and S(X, Z) is T(X, Y, Z)
  Tuples are matched if the common attributes match
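A rough sketch of OUTER UNION follows. It assumes relations are lists of dicts, uses `None` for null, and additionally assumes that each value of the common attributes X occurs at most once in each relation; the `outer_union` helper and the sample data are made up for illustration.

```python
def outer_union(r, r_attrs, s, s_attrs):
    """OUTER UNION of R(X, Y) and S(X, Z), giving T(X, Y, Z).

    Assumes the common attributes identify at most one tuple per relation.
    """
    common = [a for a in r_attrs if a in s_attrs]
    all_attrs = r_attrs + [a for a in s_attrs if a not in r_attrs]

    merged = {}
    for t in r + s:
        key = tuple(t[a] for a in common)
        # Start from an all-null row, then fill in the known values.
        merged.setdefault(key, {a: None for a in all_attrs}).update(t)
    return list(merged.values())


# Hypothetical sample relations R(X, Y) and S(X, Z).
R = [{"X": 1, "Y": "y1"}, {"X": 2, "Y": "y2"}]
S = [{"X": 2, "Z": "z2"}, {"X": 3, "Z": "z3"}]

T = outer_union(R, ["X", "Y"], S, ["X", "Z"])
```

The tuples with X = 2 match and are combined; the others are kept and padded with nulls.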

26

EXTEND

Extend a table with additional attributes

EXTEND(R, <attribute name>, <expression>)
  Add a column to R with name <attribute name> and value <expression>
  <expression> is an expression using the attributes of R

EXTEND is not expressible using the original operations
  EXTEND provides a mechanism for performing arithmetic using attributes that is otherwise missing
  Could be expressed as a join if our Universe contained the appropriate (infinite) relations containing results of computations
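EXTEND is simple to sketch when relations are lists of dicts. The `extend` helper, the MONTHLY_SALARY attribute and the sample rows below are assumptions for illustration.

```python
def extend(relation, attribute_name, expression):
    """EXTEND: add a column whose value is computed from each tuple."""
    return [{**t, attribute_name: expression(t)} for t in relation]


# Hypothetical sample data.
EMPLOYEE = [
    {"SSN": "1", "SALARY": 30000},
    {"SSN": "2", "SALARY": 36000},
]

# MONTHLY_SALARY is a made-up attribute computed by integer division.
with_monthly = extend(EMPLOYEE, "MONTHLY_SALARY", lambda t: t["SALARY"] // 12)
```

The expression is evaluated once per tuple, using only that tuple's attributes.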

27

Recursive Closure

Examples:
  Find all employees who work for (either directly or indirectly) a specific manager
  Find all the constituent parts of a given part
    Including parts of subassemblies, etc.

Relational Algebra can express any fixed depth of recursion

The SQL3 standard includes a syntax for recursive closure
  No standard syntax as part of the relational algebra
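The first example (all direct and indirect subordinates of a manager) can be sketched as a loop that repeats a join-like step until nothing new is found. This goes beyond the fixed-depth expressions of the algebra; the `subordinates` helper and the SUPERVISION pairs are hypothetical.

```python
def subordinates(supervision, manager):
    """Return everyone who works for `manager`, directly or indirectly.

    `supervision` is a set of (employee, supervisor) pairs.
    """
    found = set()
    frontier = {manager}
    while frontier:
        # One level of the hierarchy: employees supervised by the frontier.
        next_level = {e for (e, boss) in supervision if boss in frontier}
        next_level -= found  # skip anyone already seen, so cycles terminate
        found |= next_level
        frontier = next_level
    return found


# Hypothetical (employee, supervisor) pairs.
SUPERVISION = {("B", "A"), ("C", "B"), ("D", "C"), ("E", "X")}

team_of_a = subordinates(SUPERVISION, "A")
```

Each pass through the loop corresponds to one more level of depth, and the loop stops when a level adds no new employees.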

28

Examples of Relational Algebra

See Examples, Section 6.5 of Text Book