RELATIONAL ALGEBRA

Post on 31-Dec-2015

18 views 0 download

description

CS157A. Lecture 8. RELATIONAL ALGEBRA. Prof. Sin-Min LEE Department of Computer Science. Relation schema Named relation defined by a set of attribute and domain name pairs. Relational database schema Set of relation schemas, each with a distinct name. - PowerPoint PPT Presentation

Transcript of RELATIONAL ALGEBRA

RELATIONAL ALGEBRA

Prof. Sin-Min LEE

Department of Computer Science

•Relation schema–Named relation defined by a set of attribute and domain name pairs.

•Relational database schema–Set of relation schemas, each with a distinct name.

•Each tuple is distinct; there are no duplicate tuples.•Order of attributes has no significance.•Order of tuples has no significance, theoretically.•Relation name is distinct from all other relation names in relational schema.•Each cell of relation contains exactly one atomic (single) value.•Each attribute has a distinct name.•Values of an attribute are all from the same domain.•Each tuple is distinct; there are no duplicate tuples.•Order of attributes has no significance.•Order of tuples has no significance, theoretically.

Database Scheme

A relational database scheme, or schema, corresponds to a set of table definitions.

Eg: product(p_id, name, category, description) supply(p_id, s_id, qnty_per_month) supplier(s_id, name, address, ph#)

* remember the difference between a DB instance and a DB scheme.

SAMPLE SCHEMAS AND INSTANCESThe Schemas:

Sailors(sid: integer, sname: string, rating: integer, age: real)

Boats(bid: integer, bname: string, color: string)Reserves(sid: integer, bid: integer, day: date)

The Instances:

What is Relational Algebra?

Relational algebra is a procedural query language.

It consists of the select, project, union, set difference, Cartesian product, and rename operations.

Set intersection, division, natural join, and assignment combine the fundamental operations.

SQL is based on relational algebra

•Relational algebra and relational calculus are formal languages associated with the relational model.•Both are equivalent to one another

It is an abstract language. We use it to express the set of operations that any relational query language must perform.

Two types of operations: 1.set-theoretic operations: tables are

essentially sets of rows 2.native relational operations: focus on the

structure of the rows Query languages are specialized languages for asking questions,or queries,that involve the data in database.

What are the query languages ?

Query languages

procedural vs. non-proceduralcommercial languages have some

of bothwe will study:

relational algebra (which is procedural, i.e. tells you how to process a query)

relational calculus (which is non-procedural i.e. tells what you want)

SEMANTICS OF THE SAMPLE RELATIONS

Sailors: Entity set; lists the relevant properties of sailors.

BoatsBoats: Entity set; lists the relevant properties of boats.

Reserves: Relationship set: links sailors and boats by describing the boat number and date for which a sailor made a reservation.Example of the declarative sentences for which rows stand:Row 1: “Sailor ’22’ reserved boat number ‘101’ on 10/10/98”.

Selection and ProjectionSelection Operator: σrating>8 (S2)Retrieves from the current instance of relation named S2

those rows where the value of the attribute ‘rating’ is greater than 8.

Applying the above selection operator to the sample instance of S2 shown in figure 4.2 yields the relational instance on figure 4.4 as shown below:

π

condition

Projection Operator πsname,rating(S2)Retrieves from the current instance of the relation

named S2 those columns whose names are ‘sname’ and ‘rating’.

Applying the above operator to the sample instance of S2 shown in figure 4.2 yields the relational instance on figure 4.5 as shown below:

N. B.: Note that the projection operator can produce duplicate rows in the resulting instance.

- Projection Operator (cont’d)

Similarly πage(S2) yields the following

relational instance

Note here the elimination of duplicates

SQL would yield

For πage (S2):

age

35.0

55.0

35.0

35.0

Introduction

one of the two formal query languages of the relational model

collection of operators for manipulating relations

Operators: two types of operators Set Operators: Union(),Intersection(),

Difference(-), Cartesian Product (x) New Operators: Select (), Project (),

Join (⋈)

Introduction – cont’d

A Relational Algebra Expression: a sequence of relational algebra operators and operands (relations), formed according to a set of rules.

The result of evaluating a relational algebra expression is a relation.

Selection

Denoted by c(R)

Selects the tuples (rows) from a relation R that satisfy a certain selection condition c.

It is a unary operatorThe resulting relation has the same

attributes as those in R.

Example 1:

SNO SNAME AGE STATE

S1 MIKE 21 IL

S2 STEVE 20 LA

S3 MARY 18 CA

S4 MING 19 NY

S5 OLGA 21 NY

S:

state=‘IL’(S)

Example 2:

CNOCNAME

CREDIT

DEPT

C1Database

3 CS

C2Statistics

3 MATH

C3 Tennis 1SPORTS

C4 Violin 4 MUSIC

C5 Golf 2SPORTS

C6 Piano 5 MUSIC

C:

CREDIT 3(C)

Example 3

SNO CNO Grade

S1 C1 90

S1 C2 80

S1 C3 75

S1 C4 70

S1 C5 100

S1 C6 60

S2 C1 90

S2 C2 80

S3 C2 90

S4 C2 80

S4 C4 85

S4 C5 100

E:

SNO=‘S1’and CNO=‘C1’(E)

Selection - Properties

Selection Operator is commutative C1(C2 (R)) = C2(C1 (R))

The Selection is an unary operator, it cannot be used to select tuples from more than one relations.

Projection

Denoted by L(R), where L is list of attribute names and R is a relation name or some other relational algebra expression.

The resulting relation has only those attributes of R specified in L.

The projection is also an unary operation.  Duplicate rows are not permitted in relational

algebra. Duplication is removed from the result.

Duplicate rows can occur in SQL, though they may be controlled by explicit keywords.

Projection - Example

Example 1: STATE (S)

SNO SNAME AGE STATE

S1 MIKE 21 IL

S2 STEVE 20 LA

S3 MARY 18 CA

S4 MING 19 NY

S5 OLGA 21 NY

STATE

IL

LA

CA

NY

Projection - Example

Example 2: CNAME, DEPT(C)

CNOCNAME

CREDIT

DEPT

C1Database

3 CS

C2Statistics

3 MATH

C3 Tennis 1SPORTS

C4 Violin 4 MUSIC

C5 Golf 2SPORTS

C6 Piano 5 MUSIC

CNAME

DEPT

Database

CS

Statistics

MATH

TennisSPORTS

Violin MUSIC

GolfSPORTS

Piano MUSIC

Projection - Example

Example 3: S#(STATE=‘NY'(S))

SNO SNAME AGE STATE

S1 MIKE 21 IL

S2 STEVE 20 LA

S3 MARY 18 CA

S4 MING 19 NY

S5 OLGA 21 NY

SNO

S4

S5

SET Operations

UNION: R1 R2

INTERSECTION: R1 R2

DIFFERENCE: R1 - R2

CARTESIAN PRODUCT: R1 R2

Union Compatibility

For operators , , -, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of the corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1,2,...,n.

The resulting relation for , , or - has the same attribute names as the first operand relation R1 (by convention).

Union Compatibility - Examples

Are S(SNO, SNAME, AGE, STATE) and C(CNO, CNAME, CREDIT, DEPT) union compatible?

Are S(S#, SNAME, AGE, STATE) and C(CNO, CNAME, CREDIT_HOURS, DEPT_NAME) union compatible?

UNION, SET DIFFERENCE & SET INTERSECT

Union puts all tuples of two relations in one relation. To use this operator, two conditions must hold:1. The two relations must be of the same arity.2. The domain of ith attribute of the two participating relation must

be the same.

Set difference operator computes tuples that are in one relation, but not in another.

Set intersect operator computes tuples that are common in two relations:

The five fundamental operations of the relational algebra are: select, project, cartesian product, Union, and set difference

All other operators can be constructed using these operators

EXAMPLE

Assume a database with the following three relations:Sailors (sid, sname, rating)Boats (bid, bname, color)Reserve (sid, bid, date)

Query 1: Find the bid of red colored boats:

EXAMPLE

Assume a database with the following three relations:Sailors (sid, sname, rating)Boats (bid, bname, color)Reserve (sid, bid, date)

Query 1: Find the bid of red colored boats: ∏bid(бcolor=red(Boats))

EXAMPLE

Assume a database with the following three relations:Sailors (sid, sname, rating)Boats (bid, bname, color)Reserve (sid, bid, date)

Query 1: Find the name of sailors who have reserved Boat number 2.

EXAMPLE

Assume a database with the following three relations:Sailors (sid, sname, rating)Boats (bid, bname, color)Reserve (sid, bid, date)

Query 1: Find the name of sailors who have reserved Boat number 2. ∏sname(бbid=2(Sailors (sid)Reserve))

EXAMPLE

Assume a database with the following three relations:Sailors (sid, sname, rating)Boats (bid, bname, color)Reserve (sid, bid, date)

Query 1: Find the name of sailors who have reserved both a red and a green boat.

Union, Intersection, Difference

T= R U S : A tuple t is in relation T if and only if t is in relation R or t is in relation S

T = R S: A tuple t is in relation T if and only if t is in both relations R and S

T= R - S :A tuple t is in relation T if and only if t is in R but not in S

Set-Intersection

Denoted by the symbol . Results in a relation that contains

only the tuples that appear in both relations.

R S = R – (R – S)Since set-intersection can be written

in terms of set-difference, it is not a fundamental operation.

Examples

A1 A2

1 Red

3 White

4 green

B1 B2

3 White

2 Blue

R S

Examples

A1 A2

1 Red

3 White

4 Green

2 Blue

A1 A2

3 White

R SR S

S - R

B1 B2

2 Blue

A1 A2

1 Red

4 Green

R - S

RENAME OPERATOR

Rename operator changes the name of its input table to its subscript, ρe2(Emp) Changes the name of Emp table to e2

RELATIONAL ALGEBRA INTRODUCTION

Assume the following two relations:Emp (SS#, name, age, salary, dno) Dept (dno, dname, floor, mgrSS#)

Relational algebra is a procedural query language, i.e., user must define both “how” and “what” to retrieve.

Relational algebra consists of a set of operators that consume either one or two relations as input. An operator produces one relation as its output.

Unary operators include: select, project, and rename

Binary operators include: cartesian product, equality join, natural join, join, semi-join, division, union, and set difference.

SELECT OPERATOR Select (б): selects tuples that satisfy a predicate; e.g., retrieve

the employees whose salary is 30,000 бSalary=30,000(Employee)

Conjunctive ( ) and disjunctive ( ) selection predicates are allowed;e.g., retrieve employees whose salary is higher than 30,000 and are younger than 25 years old:бSalary>30,000 age<25(Employee)

Note that only selection predicates are allowed. A selection predicate is either (1) a comparison (=, ≠, ≤, ≥, <, >) between an attribute and a constant (e.g., salary = 30,000) or (2) a comparison between two different attributes of the same relation (e.g., salary = age × 100).

Note: This operator is different than the SELECT command of SQL.

<

<<

EXAMPLE

Emp table:

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

EXAMPLEEmp table:

бSalary=30,000(Employee)

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

EXAMPLEEmp table:

бSalary=30,000(Employee)

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

SS# Name Age Salary dno

4 Kathy 30 30000 5

EXAMPLEEmp table:

бAge>22(Employee)

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

EXAMPLEEmp table:

бAge>22(Employee)

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

SS# Name Age Salary dno

1 Joe 24 20000 2

4 Kathy 30 30000 5

PROJECT OPERATOR

Project (∏) retrieves a column. It is a unary operator that eliminate duplicates.

e.g., name of employees: ∏ name(Employee)

e.g., name of employees earning more than 30,000:

∏ name(бSalary>30,000(Employee))

EXAMPLE

Emp table:

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

EXAMPLEEmp table:

∏ age(Emp)

SS# Name Age Salary

dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

Age

24

20

22

30

4

EXAMPLEEmp table:

∏ name,age(бSalary=4000 (Emp) )

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

EXAMPLEEmp table:

∏ name,age(бSalary=4000 (Emp) )

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

SS# Name Age Salary dno

5 Shideh 4 4000 1

EXAMPLEEmp table:

∏ name,age(бSalary=4000 (Emp) )

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

Name Age

Shideh 4

CARTESIAN PRODUCTCartesian Product (R1 × R2) combines two

relations by concatenating their tuples together, evaluating all possible combinations. If the name of a column is identical for two relations, this ambiguity is resolved by attaching the name of each relation to a column. e.g., Emp × Dept (SS#, name, age, salary, Emp.dno, Dept.dno, dname,

floor, mgrSS#)

If t(Emp) and t(Dept) is the cardinality of the Employee and Dept relations respectively, then the cardinality of Emp × Dept is: t(Emp) × t(Dept)

CARTESIAN PRODUCT (Cont…)Example:

Emp table:

Dept table:

345 John Doe 23 25,000 1943 Jane Java 25 28,000 2876 Joe SQL 22 32,000 1

SS# Name age salary dno

1 Toy 1 3452 Sho

e2 943

dno dname floor mgrSS#

CARTESIAN PRODUCT (Cont…) Cartesian product of Emp and Dept: Emp × Dept:

345

John Doe

23 25,000

1 1 Toy 1 345

943

Jane Java

25 28,000

2 1 Toy 1 345

876

Joe SQL 22 32,000

1 1 Toy 1 345

345

John Doe

23 25,000

1 2 Shoe 2 943

943

Jane Java

25 28,000

2 2 Shoe 2 943

876

Joe SQL 22 32,000

1 2 Shoe 2 943

SS# Name age salary Emp.dno Dept.dno dname floor mgrSS#

CARTESIAN PRODUCT

Example: retrieve the name of employees that work in the toy department:

CARTESIAN PRODUCT

Example: retrieve the name of employees that work in the toy department: ∏name(бEmp.dno=Dept.dno(Emp ×

бdname=‘toy’(Dept)))

CARTESIAN PRODUCT (Cont…)∏name(бdname=‘toy’ (б Emp.dno=Dept.dno(Emp × Dept)))

345 John Doe 23 25,000 1 1 Toy 1 345

943 Jane Java 25 28,000 2 1 Toy 1 345

876 Joe SQL 22 32,000 1 1 Toy 1 345

345 John Doe 23 25,000 1 2 Shoe 2 943

943 Jane Java 25 28,000 2 2 Shoe 2 943

876 Joe SQL 22 32,000 1 2 Shoe 2 943

SS# Name age salary Emp.dno Dept.dno dname floor mgrSS#

CARTESIAN PRODUCT (Cont…)∏name(бdname=‘toy’ (б Emp.dno=Dept.dno(Emp × Dept)))

345 John Doe

23 25,000 1 1 Toy 1 345

876 Joe SQL

22 32,000 1 1 Toy 1 345

943 Jane Java

25 28,000 2 2 Shoe 2 943

SS# Name age salary Emp.dno Dept.dno dname floor mgrSS#

CARTESIAN PRODUCT (Cont…) ∏name(бdname=‘toy’ (б Emp.dno=Dept.dno(Emp × Dept)))

345 John Doe

23 25,000 1 1 Toy 1 345

876 Joe SQL 22 32,000 1 1 Toy 1 345

SS# Name age salary Emp.dno Dept.dno dname floor mgrSS#

CARTESIAN PRODUCT (Cont…) ∏name(бdname=‘toy’ (б Emp.dno=Dept.dno(Emp × Dept)))

John Doe

Joe SQL

Name

•Join is a derivative of Cartesian product.

•Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.

•One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.•Various forms of join operation

–Natural join (defined by Codd)–Outer join –Theta join–Equijoin (a particular type of Theta join)–Semijoin

EQUALITY JOIN, NATURAL JOIN, JOIN, SEMI-JOIN

Equality join connects tuples from two relations that match on certain attributes. The specified joining columns are kept in the resulting relation. ∏name(бdname=‘toy’(Emp Dept)))

Natural join connects tuples from two relations that match on the specified common attributes ∏name(бdname=‘toy’(Emp Dept)))

How is an equality join between Emp and Dept using dno different than a natural join between Emp and Dept using dno? Equality join: SS#, name, age, salary, Emp.dno, Dept.dno, … Natural join: SS#, name, age, salary, dno, dname, …

Join is similar to equality join using different comparison operators A S op = {=, ≠, ≤, ≥, <, >} att op att

(dno)

(dno)

EXAMPLE JOIN

Equality Join, (Emp Dept)))

SS# Name Age Salary

dno

1 Joe 24 20000 2

2 Mary 20 25000 1

3 Bob 22 27000 1

4 Kathy 30 30000 2

5 Shideh 4 4000 1EMP

dno dname

floor mgrss#

1 Toy 1 5

2 Shoe 2 1

Dept

(dno)

SS# Name Age Salary

EMP.dno

Dept.dno

dname

floor mgrss#

1 Joe 24 20000 2 2 Shoe 2 1

2 Mary 20 25000 1 1 Toy 1 5

3 Bob 22 27000 1 1 Toy 1 5

4 Kathy 30 30000 2 2 Shoe 2 1

5 Shideh 4 4000 1 1 Toy 1 5

EXAMPLE JOIN

Natural Join, (Emp Dept)))

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 1

3 Bob 22 27000 1

4 Kathy 30 30000 2

5 Shideh 4 4000 1

EMP

dno dname

floor mgrss#

1 Toy 1 5

2 Shoe 2 1

Dept

(dno)

SS# Name Age Salary dno dname floor mgrss#

1 Joe 24 20000 2 Shoe 2 1

2 Mary 20 25000 1 Toy 1 5

3 Bob 22 27000 1 Toy 1 5

4 Kathy 30 30000 2 Shoe 2 1

5 Shideh 4 4000 1 Toy 1 5

EXAMPLE JOIN

Join, (Emp ρx(Emp))))

SS# Name Age Salary

dno

1 Joe 24 20000 2

2 Mary 20 25000 1

3 Bob 22 27000 1

4 Kathy 30 30000 2

5 Shideh 4 4000 1EMP

dno dname

floor mgrss#

1 Toy 1 5

2 Shoe 2 1Dept

Salary > 5 * salary

SS#

Name

Age Salary

dno x.SS#

x.Name

x.Age

x.Salary

x.dno

2 Mary 20 25000 1 2 Shideh 4 4000 1

3 Bob 22 27000 1 3 Shideh 4 4000 1

4 Kathy 30 30000 2 4 Shideh 4 4000 1

EQUALITY JOIN, NATURAL JOIN, JOIN, SEMI-JOIN (Cont…)

Example: retrieve the name of employees who earn more than Joe: ∏name(Emp (sal>x.sal)бname=‘Joe’(ρ x(Emp)))

Semi-Join selects the columns of one relation that joins with another. It is equivalent to a join followed by a projection: Emp (dno)Dept ≡∏SS#, name, age, salary, dno(Emp

Dept)