CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra
1
CS 430 Database Theory
Winter 2005
Lecture 5: Relational Algebra
2
What is the Relational Algebra?
Answer: A collection of operations that can be applied to relations, yielding new relations
What's the idea behind the Relational Algebra?
Define a complete universe of operations on relations
Define the notion of Relationally Complete: a system that can do anything that can be done with the Relational Algebra
3
What are the Operations?
Original Operations (as defined by Codd):
SELECT or RESTRICT (σ)
PROJECT (π)
RENAME (ρ)
Set Operations: UNION (∪), INTERSECTION (∩), and MINUS or DIFFERENCE (−)
CARTESIAN PRODUCT (×)
Joins: JOIN or THETA JOIN, EQUIJOIN, NATURAL JOIN
DIVISION (÷)
4
What are the Operations?
Additional Operations:
AGGREGATE
OUTER JOIN (and OUTER UNION)
EXTEND (not in book)
Recursive Closure
5
SELECT
σ<selection condition>(R)
<selection condition> is a predicate (Boolean condition) on the attributes of the relation R
The result is a relation with the same attributes as R, containing just those tuples of R that satisfy <selection condition>
Example:
σ(DNO = 5 AND SALARY > 30000)(EMPLOYEE)
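To make SELECT concrete, here is a minimal sketch, assuming a relation is modelled as a pair (attribute-name tuple, set of row tuples); that representation and the sample EMPLOYEE rows are illustrative assumptions, not part of the lecture. The predicate receives each tuple as a dict so it can name attributes.

```python
# Sketch only: a relation is modelled here as (attrs, rows), where attrs is a
# tuple of attribute names and rows is a set of tuples aligned with attrs.

def select(rel, pred):
    """σ<pred>(R): keep only the tuples of R that satisfy pred."""
    attrs, rows = rel
    # pred sees each tuple as a dict so it can refer to attributes by name
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

# Hypothetical sample data using the slide's attribute names
EMPLOYEE = (("LNAME", "SALARY", "DNO"), {
    ("Smith", 30000, 5),
    ("Wong", 40000, 5),
    ("Zelaya", 25000, 4),
    ("Narayan", 38000, 5),
})

result = select(EMPLOYEE, lambda t: t["DNO"] == 5 and t["SALARY"] > 30000)
print(sorted(row[0] for row in result[1]))  # ['Narayan', 'Wong']
```

Smith is excluded because 30000 is not strictly greater than 30000, and the result keeps all of EMPLOYEE's attributes.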
6
Notes for SELECT
Booleans AND, OR, NOT have their usual interpretation
SELECT is commutative, and a cascade of SELECTs can be combined into one:
σ<cond1>(σ<cond2>(R))
= σ<cond2>(σ<cond1>(R))
= σ(<cond1> AND <cond2>)(R)
7
PROJECT
π<attribute list>(R)
<attribute list> is a list of some subset of the attributes of R
The result is a relation with only those columns named in the attribute list
The order of columns is as given in the attribute list
Examples:
π<LNAME, FNAME, SALARY>(EMPLOYEE)
π<SEX, SALARY>(EMPLOYEE)
8
Notes for PROJECT
Duplicates are eliminated
The number of rows after a projection is always less than or equal to the number of rows in the original relation (equal when the attribute list includes a key of R)
π<List1>(π<List2>(R)) = π<List1>(R), provided <List1> is contained in <List2>
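A minimal PROJECT sketch, assuming the same (attribute-name tuple, set of row tuples) representation; storing rows in a Python set is what performs the duplicate elimination. The EMPLOYEE rows are hypothetical.

```python
# Same assumed representation: (attrs, rows) with rows a set of tuples.

def project(rel, attr_list):
    """π<attr_list>(R): keep the named columns, in the order given.
    Rows are stored in a set, so duplicate result tuples collapse."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in attr_list]
    return tuple(attr_list), {tuple(r[i] for i in idx) for r in rows}

# Hypothetical sample data
EMPLOYEE = (("FNAME", "LNAME", "SEX", "SALARY"), {
    ("John", "Smith", "M", 30000),
    ("Joyce", "English", "F", 25000),
    ("Alicia", "Zelaya", "F", 25000),
    ("James", "Borg", "M", 55000),
})

sex_salary = project(EMPLOYEE, ["SEX", "SALARY"])
print(len(EMPLOYEE[1]), len(sex_salary[1]))  # 4 3  (('F', 25000) is kept once)

# Cascade rule: valid because ["SEX", "SALARY"] is contained in the inner list
inner = project(EMPLOYEE, ["LNAME", "SEX", "SALARY"])
assert project(inner, ["SEX", "SALARY"]) == sex_salary
```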
9
Sequences of Relational Operations and RENAME
R1 ← <relational expression>
Defines an intermediate relation R1; its column names are determined by the expression
R1(A1, … , An) ← <relational expression>
Columns are named A1, … , An
The book defines a RENAME operation: ρS(B1, … , Bn)(R) renames both the relation and its attributes, ρ(B1, … , Bn)(R) renames only the attributes, and ρS(R) renames only the relation
10
Set Theory
A relation is a set of tuples
Two relations R(A1, …, An) and S(B1, …, Bn) are union compatible if they have the same degree n and dom(Ai) = dom(Bi) for all i
The idea is that the tuples of R and S have the same type
If two relations are union compatible, we can define their UNION (∪), INTERSECTION (∩), and DIFFERENCE (MINUS, −)
Attribute names of the result are taken from the first relation
11
More Set Theory
Usual Set Theory identities hold (possibly with appropriate attribute renaming):
R ∪ S = S ∪ R, (R ∪ S) ∪ T = R ∪ (S ∪ T)
R ∩ S = S ∩ R, (R ∩ S) ∩ T = R ∩ (S ∩ T)
R − (S ∪ T) = (R − S) ∩ (R − T)
R − (S ∩ T) = (R − S) ∪ (R − T)
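The set operations map directly onto Python's set operators. This sketch assumes the (attribute-name tuple, set of row tuples) representation. Because it does not model domains, it approximates union compatibility by checking only the degree. It keeps the first relation's attribute names and checks the two difference identities on hypothetical data.

```python
# Sketch: (attrs, rows) representation as before. Domains are not modelled,
# so union compatibility is approximated by checking the degree only.

def check_compatible(r, s):
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union compatible")

def union(r, s):
    check_compatible(r, s)
    return r[0], r[1] | s[1]      # result takes R's attribute names

def intersection(r, s):
    check_compatible(r, s)
    return r[0], r[1] & s[1]

def difference(r, s):
    check_compatible(r, s)
    return r[0], r[1] - s[1]

# Hypothetical single-column relations of SSNs with different attribute names
R = (("SSN",), {("1",), ("2",), ("3",)})
S = (("ESSN",), {("2",), ("4",)})
T = (("MGRSSN",), {("3",), ("4",)})

print(union(R, S)[0])  # ('SSN',)
# The two difference identities from the slide, compared on rows
assert difference(R, union(S, T))[1] == intersection(difference(R, S), difference(R, T))[1]
assert difference(R, intersection(S, T))[1] == union(difference(R, S), difference(R, T))[1]
```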
12
CARTESIAN PRODUCT
Given R(A1, …, Am) and S(B1, …, Bn), the Cartesian Product R × S is the table with attributes (A1, …, Am, B1, …, Bn) and one row for every combination of a row in R and a row in S
If R has r rows and S has s rows, R × S has r · s rows
This assumes that the Ai and Bj are distinct (rename first if they are not)
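A Cartesian product sketch under the same assumed (attrs, rows) representation. It pairs every row of R with every row of S using `itertools.product` and refuses overlapping attribute names, as the slide requires. The sample relations are hypothetical.

```python
from itertools import product

def cartesian(r, s):
    """R × S: one output row for every (row of R, row of S) pair."""
    (ra, rrows), (sa, srows) = r, s
    if set(ra) & set(sa):
        raise ValueError("attribute names must be distinct; rename first")
    return ra + sa, {x + y for x, y in product(rrows, srows)}

# Hypothetical relations with 2 and 3 rows
R = (("A1", "A2"), {(1, "x"), (2, "y")})
S = (("B1",), {("p",), ("q",), ("r",)})
C = cartesian(R, S)
print(C[0], len(C[1]))  # ('A1', 'A2', 'B1') 6
```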
13
Example
Get all female employees who have dependents, together with their dependents’ names:
FEMALE_EMPS ← σ<SEX = ‘F’>(EMPLOYEE)
EMPNAMES ← π<FNAME, LNAME, SSN>(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ<SSN = ESSN>(EMP_DEPENDENTS)
RESULT ← π<FNAME, LNAME, DEPENDENT_NAME>(ACTUAL_DEPENDENTS)
See Figure 6.5, Text Book
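The five-step query above can be replayed step by step. This sketch assumes the (attrs, rows) representation and small SELECT, PROJECT and × helpers. The EMPLOYEE and DEPENDENT rows are hypothetical and include only the columns the query uses.

```python
from itertools import product

def select(rel, pred):
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(rel, attr_list):
    attrs, rows = rel
    idx = [attrs.index(a) for a in attr_list]
    return tuple(attr_list), {tuple(r[i] for i in idx) for r in rows}

def cartesian(r, s):
    return r[0] + s[0], {x + y for x, y in product(r[1], s[1])}

# Hypothetical sample rows; only the columns the query needs are included
EMPLOYEE = (("FNAME", "LNAME", "SSN", "SEX"), {
    ("Alicia", "Zelaya", "999887777", "F"),
    ("Jennifer", "Wallace", "987654321", "F"),
    ("Franklin", "Wong", "333445555", "M"),
})
DEPENDENT = (("ESSN", "DEPENDENT_NAME"), {
    ("987654321", "Abner"),
    ("333445555", "Alice"),
})

FEMALE_EMPS = select(EMPLOYEE, lambda t: t["SEX"] == "F")
EMPNAMES = project(FEMALE_EMPS, ["FNAME", "LNAME", "SSN"])
EMP_DEPENDENTS = cartesian(EMPNAMES, DEPENDENT)      # 2 x 2 = 4 rows
ACTUAL_DEPENDENTS = select(EMP_DEPENDENTS, lambda t: t["SSN"] == t["ESSN"])
RESULT = project(ACTUAL_DEPENDENTS, ["FNAME", "LNAME", "DEPENDENT_NAME"])
print(RESULT[1])  # {('Jennifer', 'Wallace', 'Abner')}
```

Most of the intermediate product EMP_DEPENDENTS is discarded by the selection on SSN = ESSN. The join operations on the next slides package exactly this pattern.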
14
Joins
Join two tables: a generalization of the Cartesian Product
JOIN(R, S, <join condition>) is the same as SELECT(<join condition>, R × S)
<join condition> usually has the form <cond1> AND <cond2> AND … AND <condn>
Each <condi> is of the form Ai θ Bj, where θ is a comparison operator (=, ≠, <, ≤, >, ≥)
This general kind of join is called a θ-JOIN (THETA JOIN)
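A θ-JOIN sketch that follows the definition literally: form R × S, then keep the combined rows satisfying every (Ai, θ, Bj) condition. It assumes the (attrs, rows) representation and uses Python's `operator` functions for θ. The EMP and BAND relations are hypothetical.

```python
import operator
from itertools import product

def theta_join(r, s, conditions):
    """JOIN(R, S, cond) = SELECT(cond, R × S).
    conditions: list of (Ai, op, Bj) triples, all ANDed together."""
    attrs = r[0] + s[0]
    out = set()
    for x, y in product(r[1], s[1]):
        t = dict(zip(attrs, x + y))
        if all(op(t[a], t[b]) for a, op, b in conditions):
            out.add(x + y)
    return attrs, out

# Hypothetical relations: employees joined to salary bands with <= and <
EMP = (("NAME", "SALARY"), {("Smith", 30000), ("Borg", 55000)})
BAND = (("GRADE", "LOW", "HIGH"), {("G1", 0, 40000), ("G2", 40000, 100000)})
J = theta_join(EMP, BAND, [("SALARY", operator.ge, "LOW"), ("SALARY", operator.lt, "HIGH")])
print(sorted((row[0], row[2]) for row in J[1]))  # [('Borg', 'G2'), ('Smith', 'G1')]
```

Passing only `operator.eq` conditions turns this into an EQUIJOIN. Both equated columns then survive in the result, which is the redundancy noted on the next slide.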
15
More types of Joins
EQUIJOIN: a θ-JOIN where all comparisons are for equality (=)
Note: an EQUIJOIN has redundant attributes, since each pair of equated attributes holds identical values
NATURAL JOIN
Standard definition: EQUIJOIN on all same-named attributes, eliminating the redundant attributes
Non-standard: include renaming of attributes
Notation: R * S
Examples:
PROJ_DEPT ← PROJECT * DEPARTMENT
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
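A NATURAL JOIN sketch following the standard definition: equate every attribute name the two relations share and keep one copy of each shared column. It assumes the (attrs, rows) representation. The DEPARTMENT and DEPT_LOCATIONS rows are hypothetical, with DNUMBER as the shared attribute.

```python
def natural_join(r, s):
    """R * S: equijoin on every shared attribute name, keeping one copy
    of each shared column."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [i for i, a in enumerate(sa) if a not in ra]   # S columns not in R
    out = set()
    for x in rrows:
        for y in srows:
            if all(x[ra.index(a)] == y[sa.index(a)] for a in common):
                out.add(x + tuple(y[i] for i in extra))
    return ra + tuple(sa[i] for i in extra), out

# Hypothetical rows; DNUMBER is the shared attribute
DEPARTMENT = (("DNAME", "DNUMBER"), {("Research", 5), ("Headquarters", 1), ("Admin", 4)})
DEPT_LOCATIONS = (("DNUMBER", "DLOCATION"), {(5, "Houston"), (5, "Bellaire"), (1, "Houston")})
DEPT_LOCS = natural_join(DEPARTMENT, DEPT_LOCATIONS)
print(DEPT_LOCS[0])        # ('DNAME', 'DNUMBER', 'DLOCATION')
print(len(DEPT_LOCS[1]))   # 3  (Admin has no location and is dropped)
```

If the two relations share no attribute names, every pair of rows matches and the result is the Cartesian product.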
16
Division
Used for universal quantification
E.g. Find all employees that work on all projects that …
Given relations R(X) and S(Y) with Y ⊆ X, let Z = X − Y, that is, Z is the set of attributes of R that are not attributes of S
T(Z) is the set of all tuples t_T such that for every t_S in S there is a tuple t_R in R such that t_R[Z] = t_T and t_R[Y] = t_S
Alternately, T is the biggest table such that T × S ⊆ R
Written as T ← R ÷ S
17
Picture of Division
R(A, B):
A   B
a1  b1
a2  b1
a3  b1
a4  b1
a1  b2
a3  b2
a2  b3
a3  b3
a4  b3
a1  b4
a2  b4
a3  b4
S(A):
A
a1
a2
a3
T(B):
B
b1
b4
T ← R ÷ S
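Division can be computed straight from the definition. For each Z-value in R, collect the Y-values it is paired with, and keep the Z-values whose collection contains all of S. This sketch assumes the (attrs, rows) representation and reuses the R and S from the picture.

```python
def divide(r, s):
    """R ÷ S, where S's attributes Y are a subset of R's attributes X.
    Returns T(Z), Z = X − Y: the Z-values paired in R with every tuple of S."""
    (ra, rrows), (sa, srows) = r, s
    z = [a for a in ra if a not in sa]
    zi = [ra.index(a) for a in z]
    yi = [ra.index(a) for a in sa]       # positions in R of S's attributes
    seen = {}                            # Z-value -> set of Y-values it appears with
    for row in rrows:
        seen.setdefault(tuple(row[i] for i in zi), set()).add(tuple(row[i] for i in yi))
    return tuple(z), {zv for zv, ys in seen.items() if srows <= ys}

# R and S from the picture of division
R = (("A", "B"), {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
})
S = (("A",), {("a1",), ("a2",), ("a3",)})
T = divide(R, S)
print(T[0], sorted(T[1]))  # ('B',) [('b1',), ('b4',)]
```

b2 fails because it is missing a2, and b3 fails because it is missing a1. The extra a4 paired with b1 does not matter.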
18
Minimum Set of Operations
We have more operations than we (minimally) need
Examples:
Join can be defined using × (Cartesian product) and σ (selection)
Divide:
T1 ← π<Z>(R)
T2 ← π<Z>((S × T1) − R)
T ← T1 − T2
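The three-step rewrite of division can be checked against the picture. This sketch assumes the (attrs, rows) representation and uses only π, ×, and −. Before the difference, it reorders the columns of S × T1 to match R, standing in for the attribute renaming or reordering the algebra assumes.

```python
from itertools import product

def project(rel, attr_list):
    attrs, rows = rel
    idx = [attrs.index(a) for a in attr_list]
    return tuple(attr_list), {tuple(r[i] for i in idx) for r in rows}

def divide_basic(r, s):
    """R ÷ S using only π, ×, and −:
        T1 ← π<Z>(R);  T2 ← π<Z>((S × T1) − R);  T ← T1 − T2"""
    ra, sa = r[0], s[0]
    z = [a for a in ra if a not in sa]
    t1 = project(r, z)                                   # every candidate Z-value
    st1 = (sa + t1[0], {y + x for y, x in product(s[1], t1[1])})
    st1 = project(st1, list(ra))                         # reorder columns to match R
    t2 = project((ra, st1[1] - r[1]), z)                 # candidates missing some S tuple
    return t1[0], t1[1] - t2[1]

# R and S from the picture of division
R = (("A", "B"), {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
})
S = (("A",), {("a1",), ("a2",), ("a3",)})
print(sorted(divide_basic(R, S)[1]))  # [('b1',), ('b4',)]
```

Here (S × T1) − R contains exactly the "missing" pairs (a2, b2) and (a1, b3), so T2 = {b2, b3} and T = {b1, b4}.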
19
Aggregation and Grouping
Aggregation or Summarization Functions: SUM, AVERAGE, MIN, MAX, COUNT, and others
Grouping of tuples:
Group all tuples that have the same value in some subset of the columns
E.g. group all employees in the same department
Aggregation and Grouping cannot be expressed with the prior set of operations
20
Aggregate Function Operation
AGGREGATE(<grouping attributes>, <function list>, R)
<function list> is a list of <function> <attribute> pairs
<function> is an aggregation function; <attribute> is an attribute of R
<grouping attributes> is a list of attributes that group the tuples of R
The result is a relation with one attribute for each grouping attribute plus one attribute for each function
Book notation:
<grouping attributes> ℱ <function list>(R)
21
Example:
Get Number of Employees and Average Salary by Department:
AGGREGATE(DNO, COUNT SSN, AVERAGE SALARY, EMPLOYEE)
22
Notes on Aggregation
Duplicates are not eliminated before applying the aggregation function
This gives functions like SUM and AVERAGE their normal interpretation
The result of aggregation is a relation, even if it consists of a single value
E.g. get the average salary:
AGGREGATE( , AVERAGE SALARY, EMPLOYEE)
Yields a table with one tuple with one attribute
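A sketch of AGGREGATE, assuming the (attrs, rows) representation. Each function in the list is supplied as (label, fn, attribute), and result columns are named LABEL_ATTRIBUTE. Both conventions are hypothetical choices for illustration. Values are collected per tuple, so equal salaries of different employees are not merged before averaging.

```python
from collections import defaultdict

def aggregate(grouping, functions, rel):
    """AGGREGATE(<grouping attributes>, <function list>, R).
    functions: list of (label, fn, attribute); fn gets the list of that
    attribute's values in a group, duplicates included."""
    attrs, rows = rel
    gi = [attrs.index(a) for a in grouping]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in gi)].append(dict(zip(attrs, row)))
    # Hypothetical naming scheme for result columns, e.g. COUNT_SSN
    out_attrs = tuple(grouping) + tuple(f"{label}_{a}" for label, _, a in functions)
    out = {key + tuple(fn([m[a] for m in members]) for _, fn, a in functions)
           for key, members in groups.items()}
    return out_attrs, out

def average(values):
    return sum(values) / len(values)

# Hypothetical sample data; two employees in department 5 share a salary
EMPLOYEE = (("SSN", "SALARY", "DNO"), {
    ("111", 30000, 5), ("222", 40000, 5), ("333", 40000, 5), ("444", 38000, 5),
    ("555", 25000, 4), ("666", 43000, 4),
})

by_dept = aggregate(["DNO"], [("COUNT", len, "SSN"), ("AVERAGE", average, "SALARY")], EMPLOYEE)
print(by_dept[0])          # ('DNO', 'COUNT_SSN', 'AVERAGE_SALARY')
print(sorted(by_dept[1]))  # [(4, 2, 34000.0), (5, 4, 37000.0)]

# No grouping attributes: a relation with one tuple and one attribute
print(aggregate([], [("AVERAGE", average, "SALARY")], EMPLOYEE))  # (('AVERAGE_SALARY',), {(36000.0,)})
```

Eliminating the duplicate 40000 first would give department 5 an average of 36000.0 instead of 37000.0. This is why the slide keeps duplicates.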
23
Outer Join
A JOIN eliminates tuples in one table that have no match in the other table
Example: Natural Join (R * S)
Tuples with NULL join attributes are also eliminated
An OUTER JOIN keeps unmatched tuples in R, S, or both
The additional attributes of unmatched tuples are padded with nulls
LEFT (RIGHT) OUTER JOINs keep the unmatched tuples in the first (second) table being joined; a FULL OUTER JOIN keeps unmatched tuples from both
24
Outer Join
Example: DEPARTMENT (LEFT OUTER JOIN) DEPT_LOCATIONS would preserve departments that had no associated location
Notes:
An OUTER JOIN can (almost) be constructed from the original operations
It's the union of the standard join and the unmatched rows extended with nulls
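A natural LEFT OUTER JOIN sketch, assuming the (attrs, rows) representation and using Python's `None` to stand in for NULL. As the notes describe, it builds the standard join and adds the unmatched R rows padded with nulls. A NULL join value is treated as matching nothing. The data are hypothetical.

```python
def left_outer_join(r, s):
    """Natural LEFT OUTER JOIN: the ordinary natural join, plus every R tuple
    with no partner, padded with None (standing in for NULL).
    NULL join values never match, as in an ordinary join."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [i for i, a in enumerate(sa) if a not in ra]
    out = set()
    for x in rrows:
        xs = [x[ra.index(a)] for a in common]
        matched = False
        for y in srows:
            if None not in xs and xs == [y[sa.index(a)] for a in common]:
                out.add(x + tuple(y[i] for i in extra))
                matched = True
        if not matched:
            out.add(x + (None,) * len(extra))
    return ra + tuple(sa[i] for i in extra), out

# Hypothetical rows: Admin (DNUMBER 4) has no location
DEPARTMENT = (("DNAME", "DNUMBER"), {("Research", 5), ("Headquarters", 1), ("Admin", 4)})
DEPT_LOCATIONS = (("DNUMBER", "DLOCATION"), {(5, "Houston"), (5, "Bellaire"), (1, "Houston")})
LOJ = left_outer_join(DEPARTMENT, DEPT_LOCATIONS)
print(LOJ[0])                                 # ('DNAME', 'DNUMBER', 'DLOCATION')
print(("Admin", 4, None) in LOJ[1], len(LOJ[1]))  # True 4
```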
25
Outer Union
Union of two relations which are not union compatible
The Outer Union of R(X, Y) and S(X, Z) is T(X, Y, Z)
Tuples of R and S are matched (combined into a single tuple of T) if their common attributes X match; unmatched tuples are padded with nulls
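An OUTER UNION sketch, assuming the (attrs, rows) representation and `None` for NULL. Tuples that agree on the common attributes X are merged into one T tuple. Every other tuple of R or S appears once, padded with nulls in the columns it lacks. The STUDENT and INSTRUCTOR relations are hypothetical.

```python
def outer_union(r, s):
    """Outer union of R(X, Y) and S(X, Z), giving T(X, Y, Z).
    Tuples that agree on the common attributes X are combined; every other
    tuple is padded with None (NULL) in the columns it lacks."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    s_extra = [a for a in sa if a not in ra]
    out_attrs = ra + tuple(s_extra)
    out, matched_s = set(), set()
    for x in rrows:
        xkey = tuple(x[ra.index(a)] for a in common)
        hit = False
        for y in srows:
            if tuple(y[sa.index(a)] for a in common) == xkey:
                out.add(x + tuple(y[sa.index(a)] for a in s_extra))
                matched_s.add(y)
                hit = True
        if not hit:
            out.add(x + (None,) * len(s_extra))
    for y in srows - matched_s:          # S tuples that matched nothing in R
        row = dict(zip(sa, y))
        out.add(tuple(row.get(a) for a in out_attrs))
    return out_attrs, out

# Hypothetical relations sharing the attribute NAME
STUDENT = (("NAME", "ADVISOR"), {("Ann", "Lee"), ("Bob", "Kim")})
INSTRUCTOR = (("NAME", "RANK"), {("Bob", "TA"), ("Cy", "Prof")})
T = outer_union(STUDENT, INSTRUCTOR)
print(T[0])  # ('NAME', 'ADVISOR', 'RANK')
```

Matched on NAME, Bob becomes a single tuple ('Bob', 'Kim', 'TA'). Ann and Cy each keep a null in the column their own relation lacks.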
26
EXTEND
Extend a table with additional attributes
EXTEND(R, <attribute name>, <expression>)
Add a column to R with name <attribute name> and value <expression>
<expression> is an expression using the attributes of R
EXTEND is not expressible using the original operations
EXTEND provides a mechanism for performing arithmetic on attributes, which is otherwise missing
Could be expressed as a join if our universe contained the appropriate (infinite) relations containing the results of computations
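An EXTEND sketch, assuming the (attrs, rows) representation. The expression is a Python function of each tuple, which supplies the per-tuple arithmetic that the original operations lack. The MONTHLY attribute and the EMPLOYEE rows are hypothetical.

```python
def extend(rel, name, expr):
    """EXTEND(R, name, expr): append a column computed from each tuple."""
    attrs, rows = rel
    if name in attrs:
        raise ValueError(f"attribute {name} already exists")
    return attrs + (name,), {r + (expr(dict(zip(attrs, r))),) for r in rows}

EMPLOYEE = (("SSN", "SALARY"), {("111", 30000), ("222", 40000)})
# Hypothetical derived attribute: whole-dollar monthly salary
WITH_MONTHLY = extend(EMPLOYEE, "MONTHLY", lambda t: t["SALARY"] // 12)
print(sorted(WITH_MONTHLY[1]))  # [('111', 30000, 2500), ('222', 40000, 3333)]
```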
27
Recursive Closure
Examples:
Find all employees who work for (either directly or indirectly) a specific manager
Find all the constituent parts of a given part, including parts of subassemblies, etc.
Relational Algebra can express recursion to any fixed depth, but not to an arbitrary (unbounded) depth
The SQL3 standard includes a syntax for recursive closure
There is no standard syntax for it as part of the relational algebra
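This sketch contrasts recursive closure with fixed-depth recursion, assuming a hypothetical binary SUPERVISION(ESSN, SUPERSSN) relation stored as a set of pairs. Each pass of the loop is one ordinary join of the result so far with the base relation. Any fixed number of passes is expressible in the algebra, but looping until nothing new appears is the extra power closure adds. SQL3 provides this as `WITH RECURSIVE`.

```python
def closure(pairs):
    """Transitive closure of (employee, supervisor) pairs.
    Each pass joins the result so far with the base relation once more and
    stops at a fixed point; the unbounded loop is what plain relational
    algebra cannot express."""
    result = set(pairs)
    while True:
        # join result(E, M) with pairs(M, M2) on M, giving (E, M2) pairs
        step = {(e, m2) for (e, m) in result for (m1, m2) in pairs if m == m1}
        if step <= result:
            return result
        result |= step

# Hypothetical supervision hierarchy
SUPERVISION = {("Smith", "Wong"), ("Wong", "Borg"), ("Zelaya", "Wallace"), ("Wallace", "Borg")}
C = closure(SUPERVISION)
print(sorted(e for e, m in C if m == "Borg"))  # ['Smith', 'Wallace', 'Wong', 'Zelaya']
```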
28
Examples of Relational Algebra
See Section 6.5 of the text book for examples